Senior Software Engineer, AI Runtime

Apollo GraphQL

Location:
United States

Contract Type:
Not provided

Salary:
157000.00 - 198900.00 USD / Year

Job Description:

We’re seeking a Senior Software Engineer to help power the future of agentic AI workflows. You’ll take our MCP Server to the next level, turning it into an enterprise-grade service that lets diverse tools and systems be exposed effortlessly to AI agents. Looking ahead, you’ll also help architect the MCP Gateway—a new layer that will route requests across tools, enforce policies, and provide the runtime foundation for scalable multi-agent systems. Along the way, you’ll tackle challenges in scalability, performance, and developer experience to ensure our platform feels seamless, powerful, and enterprise-ready.

Job Responsibility:

  • Scale an enterprise AI/MCP Server and Gateway that power multi-agent workflows across Apollo, including routing, orchestration, and integration boundaries
  • Implement robust server infrastructure to ensure reliability, performance, and security at scale
  • Build and maintain tools for agent discovery, communication, and coordination
  • Define deployment strategies and runtime optimizations to maximize efficiency and minimize operational overhead
  • Develop frameworks and patterns that enable seamless multi-agent collaboration and AI-driven orchestration
  • Integrate observability, logging, and monitoring for full visibility into server and agent behavior
  • Explore and implement AI-enhanced developer workflows to optimize orchestration and agent interactions
  • Collaborate with teams within our org to ensure the MCP Server meets evolving product and developer needs

Requirements:

  • Expertise in agent-to-tool orchestration, routing, and coordination in scalable, fault-tolerant systems
  • Deep expertise in the Rust programming language
  • Strong background in distributed systems, server architecture, and high-performance backend development
  • Proven experience with protocol design, message routing, and building server-side orchestration frameworks that enable scalable, reliable multi-tool agent workflows
  • Experience building and maintaining robust runtime infrastructure that supports AI-driven workflows and enables reliable agent-to-tool interactions
  • Hands-on experience with observability, monitoring, and debugging frameworks for complex systems
  • Passion for clean, maintainable code, high system reliability, and scalable architecture
  • Experience in strategic system design, making architectural trade-offs, and planning for long-term scalability and maintainability
  • Strong technical leadership and mentorship, including guiding junior engineers and driving engineering best practices across teams
  • Ability to influence cross-team architecture decisions and align engineering efforts with product and business objectives
  • Production ownership experience: leading incident response, debugging, and performance optimization in high-impact backend systems

Nice to have:

  • Exposure to AI/ML-enabled developer tooling or autonomous system orchestration
  • Familiarity with cloud-native architectures, containerization, or orchestration frameworks
  • Experience with performance optimization and cost-efficient scaling of high-throughput distributed systems

Additional Information:

Job Posted:
December 06, 2025

Work Type:
Remote work

Similar Jobs for Senior Software Engineer, AI Runtime

Senior Solution Engineer

JFrog is expanding in APAC, and we are looking for a strong Senior Solution Engi...
Location:
Singapore
Salary:
Not provided
Company:
JFrog
Expiration Date:
Until further notice
Requirements:
  • Experience in pre-sales, solutions engineering, DevOps consulting, or platform engineering
  • Hands-on knowledge of Docker, Kubernetes, Git, Jenkins/GitHub/GitLab, cloud-native architectures
  • Strong communication and customer-facing skills
  • Based in Singapore and open to travel across SEA and Korea
Job Responsibility:
  • Lead technical discovery, demos, and POCs for customers in SEA + Korea
  • Architect CI/CD, DevSecOps, and software supply chain solutions using the JFrog Platform
  • Work closely with sales, product, R&D, and channel partners
  • Represent JFrog at regional events, workshops, and customer sessions
  • Support enterprise adoption of Artifactory, Xray, Curation, Advanced Security, AI Catalog, Runtime, and more

Distinguished Engineer- AI Agentics Engineering

At CVS Health, we’re building a world of health around every consumer and surrou...
Location:
United States, Woonsocket
Salary:
175100.00 - 334750.00 USD / Year
Company:
CVS Health
Expiration Date:
March 31, 2026
Requirements:
  • 15+ years of Software Engineering experience required
  • 7+ years in AI/ML engineering with 3+ years specifically in agentic AI or autonomous systems
  • Proven experience building multi-agent systems from scratch (not just fine-tuning existing models)
  • Deep expertise in:
    • Multi-agent system architectures: Actor model frameworks, distributed consensus protocols, agent communication standards (FIPA-ACL, KQML, MCP, A2A), and coordination patterns (hierarchical, peer-to-peer, marketplace-based)
    • LLM Integration Platforms: OpenAI API, Anthropic Claude API, Azure OpenAI Service, Google Vertex AI, and on-premises LLM deployment (vLLM, TensorRT-LLM, Ollama)
    • Agentic Frameworks: LangChain, LangGraph, AutoGen, CrewAI, Semantic Kernel, and custom agent runtime environments
    • Tool-using AI Systems: Function calling implementations, API integration patterns, IDEs (Cursor, Windsurf), notebooks (Jupyter), tool selection algorithms, and sandbox execution environments for safe tool usage
    • Agent Orchestration Platforms: Kubernetes-based agent deployment, Apache Airflow for agent workflows, Temporal for durable agent executions, Agentspace, and event-driven architectures (Apache Kafka, RabbitMQ)
    • Vector Databases & Knowledge Systems: Pinecone, Weaviate, Chroma, Qdrant for agent memory systems, and knowledge graph technologies (Neo4j, Amazon Neptune, Apache Jena)
    • Real-time Inference Infrastructure: NVIDIA Triton Inference Server, Ray Serve, TorchServe, and streaming architectures for sub-100ms agent response times
Job Responsibility:
  • Strategic Agentic Architecture & Design: Drive the end-to-end architecture for highly scalable, multi-agent systems that can operate autonomously across complex enterprise workflows
  • Partner with other Principal Engineers, AI Architects, and executive leadership to shape the long-term agentic roadmap
  • Champion best practices for agent reliability, interpretability, safety, and performance optimization
  • Agent Platform Development & Orchestration: Oversee the design and development of new AI agent platforms from the ground up
  • Implement robust agent lifecycle management, including spawning, monitoring, termination, and inter-agent communication protocols
  • Foster an engineering culture that values agent autonomy, emergent intelligence, and continuous learning capabilities
  • Multi-Agent Systems & Emerging AI Technologies: Provide thought leadership on how multi-agent systems, large language models, and reinforcement learning create unique demands on infrastructure
  • Understand how to move AI agents from proof-of-concept to production-ready autonomous systems
  • Evaluate and recommend emerging agentic technologies and guide their integration into the broader technology stack
  • Cross-Functional Leadership & AI Mentoring: Serve as a key technical advisor for C-level executives and product leaders
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
Job Type:
Full-time

Director, Digital Ecosystem Applications

This position is responsible for the Software Platforms group at the Innovation ...
Location:
United States, Belmont
Salary:
240000.00 - 285000.00 USD / Year
Company:
Volkswagen AG
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience, including 2+ years in a technical leadership role
  • CS, EE, M.S. Engineering (or equivalent) REQUIRED
  • M.S. Engineering (or equivalent) or PhD PREFERRED
  • Analytical and conceptual thinking – using logic and reason, creative and strategic
  • Communication skills – interpersonal, presentation and written
  • Managing interdisciplinary teams on individual projects
  • Integration – joining people, processes or systems
  • Influencing and negotiation skills
  • Problem solving
  • Resource management
Job Responsibility:
  • Define the technical mission, architecture strategy, and long‑term platform vision for the In‑Vehicle Computing & Digital Ecosystem Applications team, spanning Android Automotive OS (AAOS), in‑vehicle compute platforms, Software‑Defined Vehicle (SDV) architecture, and AI‑driven cockpit intelligence
  • Provide technical leadership across the full software stack, including Android Framework, System Services, HAL layers, middleware, connectivity stacks, media/audio frameworks, HMI toolchains, and cloud‑connected AI runtimes within an SDV‑aligned architecture
  • Lead and mentor engineering teams in platform bring‑up, system integration, performance optimization, and development of AI‑agentic features, multimodal interaction models, and next‑generation speech technologies
  • Manage multi‑year budgets for platform development, AI integration, SDV‑aligned compute evolution, SoC evaluations, cloud services, and prototype programs
  • Deliver executive‑level technical reporting on architecture decisions, platform readiness, SDV integration milestones, AI progress, risks, and strategic recommendations
  • Drive strategic planning for ICC’s infotainment and cockpit portfolio, including AAOS evolution, hybrid cloud/edge AI pipelines, intelligent mobile agent technologies, and SDV‑centric software and compute roadmaps
  • Align technical roadmaps with global VW Group Innovation teams across infotainment, connectivity, AI/ML, vehicle architecture, cloud services, and SDV platform strategy, ensuring cross‑platform consistency and shared component reuse
  • Build strategic relationships with SoC vendors, Tier‑1 suppliers, cloud providers, and AI technology partners to influence cockpit compute and SDV platform evolution
  • Maintain partnerships with Silicon Valley companies specializing in AI runtimes, LLMs, speech, multimodal interaction, and automotive‑grade SDV‑compatible software frameworks
  • Collaborate with academic and research institutions on AI‑agentic systems, embedded ML, HMI, and in‑vehicle compute architectures aligned with SDV principles
What we offer:
  • Eligibility for annual performance bonus
  • Healthcare benefits
  • 401(k), with company match
  • Defined contribution retirement program
  • Tuition reimbursement
  • Company lease car program
  • Paid time off
Job Type:
Full-time

Senior Software Engineer, Managed AI - AI model LifeCycle

The Senior Software Engineer for the Model LifeCycle team will contribute to bui...
Location:
United States, San Francisco
Salary:
172425.00 - 209000.00 USD / Year
Company:
Crusoe
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or a related field
  • Experience delivering production-ready features
  • Familiarity with essential cloud-based services (e.g., compute, storage, networking)
  • Familiarity with Generative AI (Large Language Models, Multimodal)
  • Experience with AI infrastructure components (training, inference)
  • 4-5+ years of industry experience with demonstrated history of consistent success leading a varied portfolio of initiatives across your function
Job Responsibility:
  • Implement and maintain systems for fine-tuning large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling
  • Implement and maintain end-to-end training pipelines for Large Language Models
  • Implement components for distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling)
  • Develop and maintain core agent execution infrastructure
  • Implement features for dataset, model, and experiment management, focusing on versioning, lineage, evaluation, and reproducible fine-tuning
  • Work closely with Senior Engineers and Principal Engineers, as well as product and platform teams, to implement system abstractions and APIs
  • Contribute to technical discussions on training runtimes, scheduling, storage, and model lifecycle management
  • Engage with the open-source LLM ecosystem
What we offer:
  • Restricted Stock Units
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Job Type:
Full-time

Senior Software Engineer, Managed AI - AI Platform

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'...
Location:
United States, San Francisco, CA; Sunnyvale, CA
Salary:
172425.00 - 209000.00 USD / Year
Company:
Crusoe
Expiration Date:
Until further notice
Requirements:
  • Advanced degree in Computer Science/Engineering
  • 4-5+ years of industry experience with demonstrated history of consistent success leading a varied portfolio of initiatives across your function
  • Experience with distributed systems, cloud services (compute, storage, networking, database), and delivering early-stage projects quickly
  • Experience with Generative AI (LLMs, Multimodal) and familiarity with AI infrastructure (training, inference, ETL pipelines)
  • Proficient with container runtimes (e.g., Kubernetes), microservices, REST APIs, gRPC, and the full software development lifecycle including CI/CD
Job Responsibility:
  • Lead the design and implementation of core AI services, including:
    • Resilient fault-tolerant queues for efficient task distribution
    • Model catalogs for managing and versioning AI models
    • Scheduling mechanisms optimized for cost and performance
  • Architect and scale infrastructure to handle millions of API requests per second
  • Implement robust monitoring and alerting to ensure system health and 24/7 availability
  • Collaborate closely with product management, business strategy, and other engineering teams to define the AI platform roadmap
  • Influence the long-term vision and architectural decisions of the platform
  • Contribute to open-source AI frameworks and actively participate in the AI community
  • Prototype and rapidly iterate on emerging technologies and new features
What we offer:
  • Restricted Stock Units
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Job Type:
Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
Company:
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • Short-term and long-term incentive programs
  • Robust offering of benefit and wellness plans
Job Type:
Full-time

Forward Deployment Specialist

Wind River seeks an outstanding Forward Deployment Specialist to join our new el...
Location:
United States, Walnut Creek; Troy; San Diego
Salary:
160000.00 USD / Year
Company:
Aptiv plc
Expiration Date:
April 10, 2026
Requirements:
  • 10+ years in technical sales engineering for embedded software and cloud technologies at the edge demonstrating both technical and business acumen
  • Strong background in operating systems preferred (enterprise Linux, embedded Linux, RTOS, mixed-criticality) including boot, kernel, drivers, and performance tuning
  • Knowledge of middleware and runtimes supporting Edge AI, e.g., container runtimes (Docker, containerd), Kubernetes / K3s / MicroK8s at the edge, and AI runtimes and accelerators (TensorRT, OpenVINO, ONNX Runtime, vendor SDKs)
  • Ability to translate complex technical solutions into clear business value and ROI for executives
  • Ability to design end-to-end Edge AI solutions spanning data ingestion, model lifecycle, deployment, runtime, and operations
  • Hands-on understanding of AI frameworks and model formats (e.g., PyTorch, TensorFlow, ONNX)
  • Proven ability to lead customer-facing technical engagements, including discovery workshops, executive briefings, and architecture reviews
  • Skilled at eliciting business objectives, operational constraints, and success criteria, then translating them into viable Edge AI architectures
  • Comfortable running meetings end-to-end with stakeholders ranging from developers to C-suite executives
  • Trusted technical advisor capable of aligning AI initiatives with business value, ROI, and long-term platform strategy
Job Responsibility:
  • Act as the primary technical expert on Edge AI and emerging technologies, offering guidance to the sales team, clients, and internal stakeholders
  • Design and architect sophisticated, scalable solutions to solve critical business challenges and drive value for our clients and Wind River
  • Collaborate with C-level executives and senior technical leaders to align AI and emerging solutions with their strategic goals
  • Create and deliver advanced technical demonstrations and proofs of concept that highlight Wind River’s capabilities
  • Develop a detailed technical understanding of the market and our competition in Edge AI and emerging technology
  • Work closely with Product and R&D to provide market insights and client feedback, shaping the product roadmap
  • Help develop and implement technical sales strategies that drive value to Wind River and differentiate Wind River from the competition
  • Research, design, and write requirements where required
  • Generate outward-facing whitepapers, blogs, presentations and talks at conferences
  • Investigate new technologies and techniques and research ongoing industry developments
Job Type:
Full-time

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Company:
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Strong background in one or more of the following areas:
    • AI accelerator or GPU architectures
    • Distributed systems and large-scale AI training/inference
    • High-performance computing (HPC) and collective communications
    • ML systems, runtimes, or compilers
    • Performance modeling, benchmarking, and systems analysis
    • Hardware–software co-design for AI workloads
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility:
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
Job Type:
Full-time