Senior Infrastructure Engineer - GenAI

Citi

Location:
India, Chennai

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are seeking an experienced Senior Backend Engineer to design, develop, and maintain the infrastructure powering our generative AI applications. You will work closely with AI engineers, platform teams, and product stakeholders to build scalable, reliable backend systems that support AI model deployment, inference, and integration. This role combines traditional backend engineering expertise with cutting-edge AI infrastructure challenges to deliver robust solutions at enterprise scale.

Job Responsibility:

  • Design and implement scalable backend services and APIs for generative AI applications using microservices architecture and cloud-native patterns
  • Build and maintain model serving infrastructure with load balancing, auto-scaling, caching, and failover capabilities for high-availability AI services
  • Deploy and orchestrate containerized AI workloads using Docker, Kubernetes, ECS, and OpenShift across development, staging, and production environments
  • Develop serverless AI functions using AWS Lambda, ECS Fargate, and other cloud services for scalable, cost-effective inference
  • Implement robust CI/CD pipelines for automated deployment of AI services, including model versioning and gradual rollout strategies
  • Create comprehensive monitoring, logging, and alerting systems for AI service performance, reliability, and cost optimization
  • Integrate with various LLM APIs (OpenAI, Anthropic, Google) and open-source models, implementing efficient batching and optimization techniques
  • Build data pipelines for training data preparation, model fine-tuning workflows, and real-time streaming capabilities
  • Ensure adherence to security best practices, including authentication, authorization, API rate limiting, and data encryption
  • Collaborate with AI researchers and product teams to translate AI capabilities into production-ready backend services

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
  • 4–6 years of experience in backend engineering with a focus on scalable production systems
  • 2+ years of hands-on experience with containerization, Kubernetes, and cloud infrastructure in production environments
  • Demonstrated experience with AI/ML model deployment and serving in production systems
  • Strong experience with backend development in Python, and familiarity with Go, Node.js, or Java for building scalable web services and APIs
  • Hands-on experience with containerization using Docker and orchestration platforms including Kubernetes, OpenShift, and AWS ECS in production environments
  • Proficiency with cloud infrastructure, particularly AWS services (Lambda, ECS, EKS, S3, RDS, ElastiCache) and serverless architectures
  • Experience with CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or similar tools, including Infrastructure as Code with Terraform or CloudFormation
  • Strong knowledge of databases including PostgreSQL, MongoDB, Redis, and experience with vector databases for AI applications
  • Familiarity with message queues (RabbitMQ, Apache Kafka, AWS SQS/SNS) and event-driven architectures
  • Experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, or equivalent platforms
  • Knowledge of AI/ML model serving frameworks like MLflow, Kubeflow, TensorFlow Serving, or Triton Inference Server
  • Understanding of API design principles, load balancing, caching strategies, and performance optimization techniques
  • Experience with microservices architecture, distributed systems, and handling high-traffic, low-latency applications

Additional Information:

Job Posted:
April 16, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Senior Infrastructure Engineer - GenAI

Senior Machine Learning Engineer

We’re seeking a Senior Machine Learning Engineer (P50) to join our new GenAI Mod...
Location:
Singapore
Salary:
Not provided
Company:
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Extensive experience (generally 5+ years) in ML systems engineering, backend engineering, or infrastructure roles
  • Strong background in one or more of: LLMs, NLP, search/retrieval, embeddings, or applied ML
  • Hands-on experience with at least one GenAI area: RAG pipelines, fine-tuning, hybrid retrieval, or orchestration frameworks
  • Proficiency with modern ML frameworks (PyTorch, TensorFlow, Hugging Face, LangChain, LlamaIndex)
  • Familiarity with vector databases (Weaviate, Pinecone, FAISS, etc.) and large-scale serving infra
  • Strong coding skills (Python, backend engineering) and ability to move fast from idea to prototype
  • Comfort working in fast-paced, experimental environments with evolving direction
  • Bachelor’s or Master’s in Computer Science, Machine Learning, or related field—or equivalent experience
Job Responsibility:
  • Build and apply advanced GenAI models
  • Develop and fine-tune LLMs and embeddings for Atlassian’s unique knowledge and enterprise data
  • Implement retrieval-augmented generation (RAG), hybrid retrieval, and knowledge-grounded modeling approaches
  • Work hands-on with modern frameworks, contributing directly to high-value prototypes and experiments
  • Prototype and experiment quickly
  • Build proof-of-concept systems for GenAI-powered assistants, agentic workflows, and innovative user experiences
  • Run experiments, collect feedback, and iterate fast to validate impact
  • Design and implement evaluation methods for quality, groundedness, and user value
  • Collaborate and contribute
  • Work closely with peers across ML, engineering, and product teams to bring new ideas to life
What we offer:
  • Health and wellbeing resources
  • Paid volunteer days

Senior AI Engineer

We are seeking an experienced Senior Python Software Engineer (Senior AI Develop...
Location:
Poland, Warsaw
Salary:
Not provided
Company:
Inetum
Expiration Date:
Until further notice
Requirements:
  • Degree in Computer Science, Data Science, Artificial Intelligence, or a related field, or equivalent practical experience
  • Several years of experience in AI and Machine Learning development, ideally within Customer Care solutions
  • Strong proficiency in Python and NLP frameworks
  • Hands-on experience with Azure AI services (e.g., Azure Machine Learning, Cognitive Services, Bot Services)
  • Solid understanding of cloud architectures and microservices on Azure
  • Experience with CI/CD pipelines and MLOps
  • Analytical mindset and strong problem-solving capabilities
  • Polish & English speaker
Job Responsibility:
  • Design, develop, and integrate AI/ML solutions, with a particular focus on Generative AI (GenAI), LLMs, and multi-modal (chat, voice) interfaces
  • Architect and deliver customer-facing AI agents that provide real-time, intelligent automation for support, marketing, or transactional use cases
  • Build and maintain multi-model pipelines for inference, fine-tuning, chunking, and embedding-based retrieval (RAG) systems
  • Deploy, monitor, and optimize AI models in production-grade environments using Kubernetes and Azure-native services
  • Integrate GenAI agents with cross-company APIs, backend services, and partner systems through MCP for dynamic tool use and data enrichment
  • Collaborate closely with DevOps engineers to implement scalable CI/CD pipelines, infrastructure-as-code, and secure AI workload automation
  • Evaluate and integrate open-source and proprietary LLMs, embeddings, and vector databases
  • Optimize prompt engineering strategies and implement orchestration tools (e.g., LangChain, MCP) to enable complex task execution
  • Build robust model evaluation frameworks, A/B testing environments, and experiment tracking for iterative development
  • Design privacy-first AI workflows that comply with GDPR, anonymization, and auditability (e.g., PII scrubbing, user consent)
What we offer:
  • Flexible working hours
  • Hybrid work model, allowing employees to divide their time between home and modern offices in key Polish cities
  • A cafeteria system that allows employees to personalize benefits by choosing from a variety of options
  • Generous referral bonuses, offering up to PLN 6,000 for referring specialists
  • Additional revenue sharing opportunities for initiating partnerships with new clients
  • Ongoing guidance from a dedicated Team Manager for each employee
  • Tailored technical mentoring from an assigned technical leader, depending on individual expertise and project needs
  • Dedicated team-building budget for online and on-site team events
  • Opportunities to participate in charitable initiatives and local sports programs
  • A supportive and inclusive work culture with an emphasis on diversity and mutual respect
Employment Type:
Full-time

Senior DevOps Engineer (GCP)

Our client is a global UK-based financial services and investment banking organi...
Location:
Salary:
Not provided
Company:
N-iX
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in DevOps, Cloud Engineering, or SRE roles
  • Strong hands-on experience with Google Cloud Platform, including: GKE / Kubernetes, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage, VPC, IAM, networking, security
  • Expertise in Terraform, Helm, or other IaC tools
  • Experience building CI/CD pipelines (GitHub Actions, GitLab CI, CircleCI, Jenkins, etc.)
  • Strong understanding of containerization and orchestration: Docker, Kubernetes
  • Solid experience with monitoring, observability, and logging stacks
  • Familiarity with networking, load balancing, security hardening, and zero-trust principles
  • Experience supporting production systems in high-availability, distributed environments
  • Strong scripting skills (Python, Bash, or similar)
  • Experience working with agile engineering teams
Job Responsibility:
  • Design, implement, and maintain cloud infrastructure on Google Cloud (GKE, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage)
  • Build and optimize CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
  • Develop infrastructure-as-code using Terraform or similar tools
  • Set up and maintain container orchestration (Kubernetes, GKE) and automated deployment workflows
  • Implement monitoring, alerting, and observability using tools such as Prometheus, Grafana, ELK/Elastic, Stackdriver, or OpenTelemetry
  • Ensure compliance with security and governance standards across all environments
  • Collaborate closely with engineering teams to ensure scalable, high-performance deployment architectures
  • Support AI/ML and GenAI workloads (Vertex AI pipelines, model hosting, GPU workloads, inference optimization)
  • Manage environment strategies, release pipelines, configuration management, and secrets management
  • Optimize cloud costs and recommend improvements for performance and reliability
What we offer:
  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Senior Principal Cloud Developer

The role involves designing and building innovative Agentic AI applications and ...
Location:
United States, San Jose
Salary:
157500.00 - 361500.00 USD / Year
Company:
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 10-15 years of experience in developing highly scalable cloud and cloud-native applications using technology stacks, architecture, design, development, and support
  • at least one year of recent multi-agent Agentic and RAG GenAI Software Development experience applied to Networking and/or Observability domains
  • experience developing Network Observability software for large scale Network Monitoring, Network Performance, Network Configuration or Network Capacity Management products
  • deep understanding and experience in Networking Protocol and Networking Best Practices for Enterprise and Service Provider networks
  • proven skills and programming experience in Golang, scalable concurrent processing, REST, Data Caching Services, DB schema design and data access technologies
  • experience in building, orchestrating, and deploying highly scalable REST based stateless APIs/web services for web applications in Kubernetes environment
  • familiarity with code versioning tools such as Git
  • knowledge of Network and NetFlow Logs processing and indexing
  • ability to communicate with senior Executives and with customers
Job Responsibility:
  • design and build large scale distributed systems
  • apply best practices for high availability, scalability, resilience, performance, and security requirements in the cloud
  • transition proof-of-concept implementations into R&D teams to accelerate new product delivery
  • create technical content such as designs, specifications, and initial software implementations
  • mentor less-experienced staff members
  • collect product feedback from field interactions to provide input into Engineering and Product Management
  • maintain knowledge of OpsRamp SaaS product and roadmap, as well as competition
  • collaborate with product team to translate functional requirements into technical solutions
  • develop monitoring solutions using tools and services that are part of the cloud infrastructure
  • facilitate CI/CD by integrating development processes
What we offer:
  • comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • personal and professional development programs
  • unconditional inclusion and flexibility to manage work and personal needs
Employment Type:
Full-time

Senior Staff Data Engineer - ML & AI Platform

At Marktplaats, data is at the heart of everything we do, but Intelligence is wh...
Location:
Netherlands, Amsterdam
Salary:
Not provided
Company:
Adevinta
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience with a specific focus on the intersection of Data Engineering, MLOps, and AI Infrastructure
  • Deep knowledge of Spark internals, structured streaming, and performance tuning for large-scale data processing
  • Proven experience architecting end-to-end ML platforms for Traditional ML (Classic MLOps) while actively enabling the organization on Generative AI concepts
  • Strong background in building automated pipelines and ensuring system observability
  • Practical experience building infrastructure for Large Language Models, including managing the complexity of chaining models and tools
  • Solid experience serving models at low latency and high concurrency using containerized solutions
  • Ability to speak the language of AI/ML Engineers and effectively bridge the gap between experimental code and production systems
  • Expert level Python
  • Experience with PyTorch, Terraform, Terragrunt, Docker, Kubernetes, GitHub Actions, Datadog
  • Experience with Databricks AI Stack: MLflow, Mosaic AI, Unity Catalog, Feature Store, Databricks Model Serving, Vector Databases
Job Responsibility:
  • Lead the evolution of our Machine Learning & AI Platform, designing the architecture for AI Agents and establishing patterns for Vector Databases
  • Act as a first mover: validate new Databricks features and integrate them into the platform
  • Write the guidelines for GenAI development, helping teams transition from notebook experiments to production-grade LLM applications
  • Design the Feature Store, manage the Model Registry, and set up the infrastructure for Vector Search and RAG (Retrieval Augmented Generation) workflows
  • Elevate the technical bar of the team, mentoring Staff and Senior engineers on design patterns, code quality, and architectural decisions
  • Translate complex requirements from ML Engineers and Data Scientists into robust engineering tickets and infrastructure roadmaps
What we offer:
  • An attractive Base Salary
  • Participation in our Short Term Incentive plan (annual bonus)
  • Work From Anywhere: Enjoy up to 20 days a year of working from anywhere
  • A 24/7 Employee Assistance Program for you and your family
Employment Type:
Full-time

Senior MLOps Engineer

Prolific is not just another player in the AI space – we are the architects of t...
Location:
United Kingdom
Salary:
Not provided
Company:
Prolific
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience with cloud infrastructure and infrastructure as code
  • Previous experience with the ML and LLM lifecycle - training, hosting, optimisation, observability
  • Used to working closely with researchers and data scientists - taking experiments from worksheets into production
  • Strong grasp of ML fundamentals and modern GenAI stack
Job Responsibility:
  • Infrastructure & Platform Engineering: Design and maintain scalable cloud environments (GCP/AWS) using Terraform
  • Manage GPU/TPU resource allocation for training, fine-tuning, and interactive notebooks
  • Build internal services and CLI tools to streamline the developer experience for the AI team
  • ML & LLM Orchestration: Design CI/CD/CT (Continuous Training) pipelines using tools such as GitHub Actions, MLflow, Vertex AI Pipelines
  • Develop reusable patterns for model serving
  • Manage service deployments to Kubernetes
  • Manage and optimize vector databases and embedding pipelines for RAG-based systems
  • Performance & Optimization: Implement techniques to reduce latency and increase throughput
  • Solve scaling bottlenecks for serverless or containerized model deployments
  • Optimize GPU utilization and cloud spend without compromising performance
What we offer:
  • competitive salary
  • benefits
  • remote working
  • impactful, mission-driven culture

Senior AI Security Engineer

The Senior AI Security Engineer is a technical leader and engineering manager wi...
Location:
Hungary, Budapest
Salary:
22713830.00 - 38083370.00 HUF / Month
Company:
Citi
Expiration Date:
Until further notice
Requirements:
  • 8-10+ years of experience in software engineering, with demonstrable experience as a technical lead or engineering manager
  • Python mastery: Deep, hands-on experience building and maintaining production-grade Python applications and services
  • LLM engineering: Practical experience with LLM APIs (OpenAI, Anthropic, Google), prompt engineering, model evaluation, and input/output guardrails
  • Production systems: Track record of deploying and operating AI/ML systems in production at enterprise scale
  • Software engineering fundamentals: Clean code, design patterns, testable architecture, CI/CD, infrastructure-as-code
  • 3+ years leading or managing engineering teams, including performance management, hiring, and career development
  • Track record of delivering complex software products in environments where priorities shift rapidly
  • Experience setting engineering standards and driving quality across a team's output
  • Demonstrated ability to mentor and develop engineers through code review, architectural guidance, and knowledge sharing
  • Proven capability to attract, develop, and retain engineering talent
Job Responsibility:
  • Agentic AI Engineering & Use Case Incubation (40%): Own and evolve the Incubator Environment — the platform and tooling that enables CISO teams to move from idea to working PoC to validated use case
  • Partner with cybersecurity domain teams to understand their challenges, identify high-value AI use cases, and rapidly prototype agentic solutions
  • Design, build, and deploy agentic AI systems that autonomously perform cybersecurity tasks — including threat analysis, security control validation, intelligent triage, and response orchestration
  • Architect multi-agent orchestration systems, defining how AI agents collaborate, delegate, and escalate across security workflows
  • Implement robust agent infrastructure: tool use frameworks, memory and context management, planning/execution loops, guardrails, and human-in-the-loop controls
  • Build and maintain RAG pipelines, knowledge retrieval systems, and dynamic context assembly that underpin agent decision-making
  • Shepherd validated use cases through to production readiness and handoff to the dedicated product support team
  • Drive adoption and effective use of AI development tooling (Devin, GitHub Copilot, Claude Code) to maximize team velocity
  • Make key technical decisions on architecture, technology selection, and build-vs-integrate trade-offs
  • Incubator Platform & Technical Architecture (25%): Design and maintain the Incubator Environment architecture — a scalable, secure platform that enables rapid prototyping and validation of agentic AI use cases
What we offer:
  • Cafeteria Program
  • Home Office Allowance (for colleagues working in hybrid work models)
  • Paid Parental Leave Program (maternity and paternity leave)
  • Private Medical Care Program and onsite medical rooms at our offices
  • Pension Plan Contribution to voluntary pension fund
  • Group Life Insurance
  • Employee Assistance Program
  • Access to a wide variety of learning and development programs, online course libraries and upskilling platforms, such as Udemy and Degreed
  • Flexible work arrangements to support you in managing work-life balance
  • Career progression opportunities across geographies and business lines
Employment Type:
Full-time

Forward Deployed Engineering Manager, GenAI Applications

At Scale AI, we are not just building AI tools. We are pioneering the next era o...
Location:
Germany; United Kingdom, Berlin; London
Salary:
Not provided
Company:
Scale
Expiration Date:
Until further notice
Requirements:
  • 3–5+ years of engineering management experience, ideally in high-growth tech environments with direct customer engagement
  • A strong engineering background and a track record of leading technical teams to deliver impactful products
  • Hands-on experience building or deploying AI-powered systems, with an understanding of how model behavior shapes user experience
  • A deep interest in Generative AI and a belief in its potential to transform enterprise workflows
  • Proven ability to make fast, sound decisions in high-pressure or ambiguous situations
  • The leadership presence to earn trust and respect from engineers, peers, and senior stakeholders
  • Strategic thinking with a clear ability to turn business goals into actionable engineering plans
  • Operational rigor in planning, prioritizing, and delivering complex software systems at scale
  • Experience with cloud infrastructure (AWS, GCP, or Azure), DevOps, and scalable platform architecture
  • Strong communication and collaboration skills, with the ability to align cross-functional teams toward common goals
Job Responsibility:
  • Lead, mentor, and grow a high-performing team of forward deployed engineers supporting enterprise GenAI initiatives
  • Set clear technical direction and foster a culture of ownership, speed, and trust
  • Work hands-on with customers to deeply understand and solve complex business and technical challenges
  • Scope, deliver, and guide the adoption of solutions by embedding directly with customer teams
  • Make smart tradeoffs between speed, scope, reusability, and bespoke solutions
  • Drive technical decisions and help the team navigate system integrations and architecture choices
  • Shape the product roadmap by translating customer needs into scalable platform improvements
  • Build internal tooling and shared capabilities to accelerate delivery across diverse use cases
  • Collaborate across engineering, product, and customer teams to streamline execution and reduce friction
  • Ensure fast, high-quality delivery of features, experiments, and customer launches