Sr. Cloud Infrastructure Engineer (AI & LLM Platforms)

Q6 Cyber

Location:

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are seeking a specialized Infrastructure Engineer to bridge the gap between our large data repositories, Cloud Platform and the rapidly evolving world of Large Language Models (LLMs). You will be responsible for building the 'plumbing' that allows our internal teams and external users to leverage AI effectively. This includes deploying Model Context Protocol (MCP) servers, building agentic execution environments, and scaling our internal Retrieval-Augmented Generation (RAG) architecture.

Job Responsibility:

  • Guide the architecture that will allow us to leverage AI tools with our large existing data stores and incoming streams of real-time intelligence
  • Work closely with other infrastructure engineers and software development teams to integrate AI tools into existing systems
  • Design, deploy, and maintain Model Context Protocol (MCP) servers to allow LLMs to securely interact with our internal databases, APIs, and external tooling
  • Build and orchestrate sandboxed, scalable environments (e.g., using Docker or specialized runtimes) where users can safely build and execute AI agents
  • Develop and manage the infrastructure for our internal RAG (Retrieval-Augmented Generation) pipeline, including vector database management (e.g., Pinecone, Weaviate, or pgvector) and automated embedding pipelines
  • Utilize Kubernetes (K8s) and Infrastructure as Code (Terraform/Pulumi) to deploy LLM-related tools, ensuring high availability and low latency for model inference and data retrieval
  • Implement strict guardrails for data privacy within LLM workflows, ensuring internal datasets remain secure while being accessible to authorized AI tools

Requirements:

  • 5+ years of experience in DevOps, Platform Engineering, or SRE, with at least 1-2 years specifically focused on AI/ML infrastructure
  • Proven track record of building production-grade RAG pipelines or LLM-integrated applications
  • Thrives in 'day zero' environments where the tools and protocols (like MCP) are evolving weekly
  • Deep understanding of the security implications of LLMs (prompt injection, data leakage, and secure tool execution)
  • Experience working with substantial datasets (over 1bn objects, dozens or hundreds of TBs) and the challenges of leveraging AI tools with these data sets
  • Bachelor's degree or equivalent in computer science or related field
  • Cloud & Orchestration: AWS/GCP/Azure, Kubernetes, Terraform, Helm
  • AI Frameworks: LangChain, LlamaIndex, LangGraph
  • Data & Vectors: Pinecone, Milvus, Qdrant, or pgvector
  • Apache Kafka/Pulsar
  • Elasticsearch/OpenSearch
  • Traditional SQL RDBMS
  • Languages: Python (Expert), TypeScript/Node.js (for MCP development), Go
  • AI Protocols: Model Context Protocol (MCP), REST/gRPC

What we offer:

We offer a competitive compensation package and comprehensive benefits.

Additional Information:

Job Posted:
May 14, 2026

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Sr. Cloud Infrastructure Engineer (AI & LLM Platforms)

Sr. Software Engineer (Agentic Runtime)

Dialpad’s AI Engineering organization is responsible for building and maintainin...
Location:
Argentina, Buenos Aires
Salary:
Not provided
Dialpad
Expiration Date
Until further notice
Requirements
  • 3–6 years of experience in distributed systems, platform engineering, or ML infrastructure, with exposure to LLM-based or agentic systems strongly preferred
  • Strong understanding of agent architectures, including ReAct, plan-and-execute, and multi-agent coordination patterns
  • Deep knowledge of context management, prompt lifecycle, tool-call protocols (e.g., function calling, MCP), and agent memory strategies (short-term, episodic, and long-term)
  • Experience integrating and managing external tool ecosystems, including web search, code interpreters, databases, and third-party APIs
  • Familiarity with retrieval-augmented generation (RAG) and how retrieval fits into broader agentic pipelines
  • Understanding of LLM output reliability challenges — hallucination, non-determinism, and retry/fallback strategies at runtime
  • Proficiency in Go and Python 3 (experience with Rust or TypeScript is a plus)
  • Strong understanding of distributed systems, microservices, and event-driven architectures suited to long-running agent tasks
  • Passion for real-time performance optimization, including streaming responses, async execution, and parallel tool invocation
  • Experience with API design using OpenAPI, Swagger, or equivalent, with an eye toward agentic interaction patterns
Job Responsibility
  • Contribute to the design, development, and maintenance of agentic runtime systems, including agent orchestration, tool execution pipelines, and multi-step reasoning loops
  • Build and optimize core runtime components, including task planners, action dispatchers, memory managers, and context window management systems
  • Work on agent coordination techniques, including dynamic tool selection, parallel agent execution, state management, and result aggregation across multi-agent workflows
  • Maintain and enhance highly scalable agentic platforms with a focus on low-latency execution, cost efficiency, and deterministic behavior
  • Ensure high availability, reliability, and fault tolerance in agent runtime services, including graceful degradation when LLM or tool calls fail
  • Collaborate with cross-functional teams — including ML researchers, product, and platform engineers — to translate agentic product requirements into robust runtime infrastructure
  • Develop and optimize real-time distributed systems, microservices, and event-driven architectures powering agentic task execution
  • Design and implement sandboxed execution environments for safe agent use of tools, code execution, and external API calls
  • Implement and maintain monitoring, alerting, and performance metrics covering agent run success rates, token consumption, latency, and cost attribution
  • Evaluate and integrate emerging agentic frameworks, LLM APIs, and tooling ecosystems to continuously improve platform capabilities
What we offer
  • Competitive benefits and perks
  • Robust training program
  • Inclusive office environment
  • Recognized Great Place to Work culture

Sr Cloud Solution Architect

As a Cloud Solution Architect aligned to the Azure AI platform for Microsoft's C...
Location:
United States, Multiple Locations
Salary:
106400.00 - 203600.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in computer science, Information Technology, Engineering, Business or related field AND 4+ years’ experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or Business Applications consulting OR equivalent experience
  • Bachelor's Degree in Computer Science, Information Technology, Engineering, Business, Liberal Arts, or related field AND 8+ years experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or consulting OR Master's Degree in Computer Science, Information Technology, Engineering, Business, Liberal Arts, or related field AND 6+ years experience in cloud/infrastructure technologies, technology solutions, practice development, architecture, and/or consulting OR equivalent experience
  • 4+ years experience working in a customer-facing role (e.g., internal and/or external)
  • 4+ years experience working on technical projects
  • Technical Certification in Cloud (e.g., Azure, Amazon Web Services, Google, security certifications)
  • Breadth of technical experience and knowledge in foundational security, foundational AI, architecture design, with depth / Subject Matter Expertise in one or more of the following: Deep Domain Expertise in Azure AI Areas: Deep domain expertise in one of the Azure AI specific areas, such as Cognitive Services, Azure OpenAI and Copilot OR hands-on experience working with the respective products at the expert level
  • Expertise with Azure AI Search and/or Vector Indexes, Azure Document Processing and/or equivalent OCR technology
  • Programming Languages and Integration: Proficient with Python, C#, R, JavaScript, or similar programming languages in the context of application development, and ability to integrate Azure AI with other services (e.g., Azure Functions, Azure Container Apps, Docker, API Management)
  • Architecting Enterprise-Grade Solutions: The ability to create and explain 3-tier architecture diagrams, system context diagrams, system interaction diagrams, etc.
  • Proven experience building enterprise-grade, AI-focused solutions on the cloud (Azure, AWS, GCP) for customers, from Minimum Viable Products (MVPs) leading to production deployments
Job Responsibility
  • Play a pivotal role in the AI Factory, providing technical enablement, operational support, and strategic engagement across customer projects
  • Understand customers' overall data estate, business priorities, and IT success measures
  • Innovate with AI solutions that drive business value
  • Facilitate scalable delivery through strong technical program management utilizing a factory model/approach, driving program awareness and demand across the regional operating units
  • Attend in-flight project status meetings to monitor progress and identify support needs
  • Engage directly with complex or non-standard customer use cases beyond existing accelerators
  • Participate in intake reviews for milestone sizing, objection handling, and technical scoping
  • Deliver solutions with high performance, security, scalability, maintainability, repeatability, reusability, and reliability upon deployment
  • Gather insights from customers and partners
  • Develop opportunities to enhance Customer Success and help customers extract value from their Microsoft investments
Employment Type:
Full-time

Sr Staff ML Engineer - Production & MLOps Focus - GenAI Security Platform

Join our team building a cutting-edge multi-tenanted GenAI Security Platform tha...
Location:
India, Bengaluru
Salary:
Not provided
Palo Alto Networks
Expiration Date
Until further notice
Requirements
  • 4+ years of ML engineering experience with hands-on LLM/NLP work
  • Practical experience building LLM-based applications (agents, multi-turn systems, evaluators)
  • Understanding of model fine-tuning, embedding optimization, and prompt engineering
  • Experience with LLM APIs (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI)
  • Knowledge of LLM orchestration frameworks (LangChain, LlamaIndex, Pydantic AI, custom solutions)
  • Familiarity with model architectures and when to fine-tune vs prompt engineer
  • Strong experience deploying ML models to production at scale
  • Experience with model serving frameworks (vLLM preferred; TensorRT-LLM, Ray Serve, or similar a plus)
  • Kubernetes and Docker proficiency for ML workload orchestration
Job Responsibility
  • Build and deploy LLM-based agents and multi-step evaluation workflows
  • Fine-tune models, optimize embeddings, and manage model weights and artifacts
  • Deploy and scale ML services on Kubernetes with proper monitoring and resource management
  • Implement experiment tracking, model versioning, and deployment automation
  • Develop observability dashboards for ML metrics, costs, latency, and quality
  • Optimize LLM API usage through caching, batching, and intelligent routing strategies
  • Manage vector database infrastructure and semantic search systems
  • Create CI/CD pipelines for ML artifacts and automated testing frameworks
  • Collaborate with ML researchers to productionize prototypes and scale experiments
Employment Type:
Full-time

Apps Dev Tech Sr Lead Analyst Java SVP

Apps Dev Tech Sr Lead Analyst Java SVP at Citi. Own and drive end-to-end migrati...
Location:
India, Chennai, Pune
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or a related technical discipline
  • 12–16 years of progressive software engineering experience, with at least 5 years in a senior technical leadership role (Tech Lead, Staff Engineer, Principal)
  • Demonstrated experience leading or significantly contributing to at least one large-scale Mainframe modernisation or legacy platform migration programme
  • Prior experience in financial services, banking, or a similarly regulated industry strongly preferred
  • Java 17/21 (Expert), Python, COBOL / JCL (reading & assessment level)
  • Spring Boot, Spring Cloud, Spring Security, Spring Data JPA, Project Reactor / WebFlux
  • OpenShift, Kubernetes, Docker, Helm
  • Tekton, Harness, Jenkins, Git (Bitbucket / GitHub), Artifactory, SonarQube
  • Oracle, MongoDB, PostgreSQL, MS SQL Server, Redis, DB2/z
  • Apache Kafka, IBM MQ
Job Responsibility
  • Own and drive end-to-end migration of legacy Mainframe workloads (COBOL, JCL, CICS, IMS, DB2/z) to modern Java-based microservices deployed on enterprise container platforms (OpenShift / Kubernetes)
  • Conduct application assessments to identify migration candidates, define target-state architectures, and produce sequenced migration roadmaps with risk registers and rollback plans
  • Establish reusable migration patterns, tooling, and runbooks to accelerate successive migration waves
  • Leverage AI-assisted code translation tools (e.g., autonomous AI coding agents such as Devin) to automate COBOL-to-Java conversion at scale, with human-in-the-loop review gates
  • Validate functional parity post-migration through automated testing strategies (unit, integration, regression, performance)
  • Identify and quantify cost-reduction opportunities across MIPS consumption, software licensing, infrastructure footprint, and operational overhead
  • Build and maintain a technology cost model; track savings realisation against committed targets on a monthly cadence
  • Drive rationalisation of redundant systems, decommission end-of-life platforms, and consolidate tooling to reduce Total Cost of Ownership (TCO)
  • Partner with Finance and Vendor Management to renegotiate contracts and optimise spend through right-sizing, reserved capacity, and FinOps practices
Employment Type:
Full-time

Sr Staff Engineer Software, Fullstack (Prisma AIRS) - NetSec

Join our team building a cutting-edge multi-tenanted GenAI Security Platform tha...
Location:
India, Bengaluru
Salary:
Not provided
Palo Alto Networks
Expiration Date
Until further notice
Requirements
  • Proven experience building and scaling multi-tenant SaaS platforms with strict data isolation
  • Strong knowledge of API design, RESTful principles, and OpenAPI specifications
  • Proficiency in modern JavaScript frameworks (React, Vue, or Svelte) with TypeScript
  • Experience building data-intensive dashboards with complex visualisations and real-time data
  • Strong CSS/styling skills and responsive design principles
  • Demonstrated experience working with production AI/ML systems at scale
  • Practical experience integrating LLM APIs and managing inference at scale
  • Understanding of LLM operational challenges: rate limiting, cost optimisation, latency management, fallback strategies
  • Familiarity with AI agent frameworks (LangChain, AutoGen, MCP, or similar)
  • Knowledge of prompt engineering, semantic search, and vector databases
Job Responsibility
  • Design and implement high-performance REST APIs with enterprise-grade multi-tenant isolation and strict security boundaries
  • Work on distributed systems architecture handling high-throughput workloads with mission-critical uptime requirements
  • Build responsive dashboards and administrative interfaces for platform management, data visualisation, and system configuration
  • Integrate multiple LLM providers, implement semantic search capabilities, and build intelligent agent workflows
  • Architect complex, multi-step AI evaluation pipelines for asynchronous job execution and large-scale data processing
  • Design and implement database schemas with proper indexing, query optimisation, and data isolation strategies
  • Build and maintain scalable micro-services with async/await patterns and type-safe code
  • Develop data-intensive UIs with real-time updates, complex state management, and intuitive user experiences
  • Deploy and manage containerised applications on Kubernetes with comprehensive observability
  • Write thorough tests (frontend and backend) and maintain high code quality standards with automated tooling
Employment Type:
Full-time

Classroom Teacher

We have an exciting opportunity to appoint a KS1/KS2 Teacher from 1st September ...
Location:
United Kingdom, Sittingbourne
Salary:
Not provided
Swale Academies Trust
Expiration Date
May 18, 2026
Requirements
  • Qualified Teacher Status
  • KS1/KS2 teaching experience
  • Commitment to safeguarding
Job Responsibility
  • Deliver quality first teaching
  • Provide a broad and balanced curriculum
  • Create the right environment for children to learn
What we offer
  • Teachers Pension Scheme
  • Employee Referral Recruitment Incentive
  • Enhanced Maternity Pay
  • Discounts with local and national retailers, cinemas and restaurants
  • Access to training and development
  • Employee Assistance Programme
  • Cycle to Work scheme
  • On-site Parking
Employment Type:
Full-time

Pharmacy Technician

We’re building a world of health around every individual — shaping a more connec...
Location:
United States, Dracut
Salary:
17.00 - 27.00 USD / Hour
CVS Health
Expiration Date
July 04, 2026
Requirements
  • Must comply with any state board of pharmacy requirements or laws governing the practice of pharmacy
  • If the state board of pharmacy does not address or mandate a minimum age requirement, must be at least 16 years of age
  • If the state board of pharmacy does not address or mandate a minimum educational requirement, must have a high school diploma or equivalent, or be actively enrolled in high school or high school equivalency program
  • State-level licensure and national certification requirements vary by state
  • Regular and predictable attendance, including nights and weekends
  • Ability to complete required training within designated timeframe
  • Attention and Focus: Ability to concentrate on a task over a period of time
  • Ability to pivot quickly from one task to another to meet patient and business needs
  • Ability to confirm prescription information and label accuracy, ensuring patient safety
  • Customer Service and Team Orientation: Actively look for ways to help people, and do so in a friendly manner
Job Responsibility
  • Living our purpose by following all company SOPs at each workstation to help our Pharmacists manage and improve patient health
  • Following pharmacy workflow procedures at each pharmacy workstation (i.e., production, pick-up, drive-thru, and drop-off) for safe and accurate prescription fulfillment
  • Contributing to positive patient experiences by showing empathy and genuine care: creating heartfelt and personalized moments while serving patients at pick-up, drive-thru, and over the phone; keeping patients healthy by offering immunizations and other services at the register and over the phone; and demonstrating compassionate care by solving or escalating patient problems
  • Completing basic inventory activities, as permitted by law, and as directed by the pharmacy leadership team, such as accurately putting away medication deliveries and completing cycle counts, returns-to-stocks, waiting bin inventories, etc.
  • Contributing to a high-performing team, embracing a growth mindset, and being receptive to feedback; actively seeking opportunities to expand clinical and technical knowledge needed to better assist patients
  • Remaining flexible for both scheduling and business needs, while contributing to a safe, inclusive, and engaging team dynamic; voluntarily traveling to stores in the market to work shifts as needed by the business
What we offer
  • dental
  • vision
  • wellness resources
  • employee discounts
  • access to certain voluntary benefits
  • other programs
Employment Type:
Part-time

Manager, C&Q Document Preparation (Facilities and Utilities)

In this vital role, you will lead the preparation and delivery of commissioning ...
Location:
India, Hyderabad
Salary:
Not provided
Amgen
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Engineering, Life Sciences, Pharmaceutical Sciences, or a related technical field
  • 8–13 years of experience in GMP commissioning, qualification, validation, engineering, or technical operations within the pharmaceutical or biotechnology industry
  • Experience supporting commissioning and qualification activities for manufacturing systems, packaging equipment, or facilities and utilities
  • Experience preparing or overseeing qualification documentation including protocols, reports, risk assessments, traceability matrices, and testing documentation
  • Experience working in regulated GMP manufacturing environments with knowledge of inspection readiness and compliance expectations
  • Strong understanding of GMP commissioning, qualification, and validation practices within regulated pharmaceutical or biotechnology environments
  • Experience leading technical teams including full-time employees and contingent or outsourced resources
  • Strong understanding of risk-based qualification methodologies and lifecycle validation approaches
  • Knowledge of GMP documentation practices, data integrity expectations, and inspection readiness principles
  • Experience collaborating across Engineering, Validation, Quality, Manufacturing, Facilities, and Project Management teams
Job Responsibility
  • Lead a team responsible for preparation and delivery of C&Q documentation supporting commissioning and qualification activities across Engineering projects and systems
  • Manage work across a blended team of employees and contingent resources, ensuring effective planning, prioritization, and execution of deliverables
  • Oversee development of qualification protocols, reports, risk assessments, traceability matrices, and related qualification documentation
  • Execute C&Q documentation preparation using established CoE standards, templates, and processes to ensure consistency, compliance, and efficiency
  • Partner with Engineering, Validation, Quality, Manufacturing, Facilities, and Project teams to support qualification execution and operational readiness activities
  • Monitor documentation progress, identify risks, and drive timely resolution of issues that may impact project timelines
  • Maintain inspection-ready documentation and support regulatory inspections and internal audits as required
  • Support implementation of standardized C&Q practices, templates, and procedures across Engineering projects and sites
  • Provide leadership, coaching, and development for team members while fostering a culture of accountability, collaboration, and quality execution
  • Contribute to continuous improvement initiatives that enhance C&Q documentation quality, efficiency, and compliance
Employment Type:
Full-time