CrawlJobs Logo

Staff Software Engineer, AI Runtime

apollographql.com Logo

Apollo GraphQL

Location Icon

Location:
United States

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

185000.00 - 215000.00 USD / Year

Job Description:

We’re seeking a Staff Software Engineer to help power the future of agentic AI workflows. You’ll take our MCP Server to the next level, turning it into an enterprise-grade service that lets diverse tools and systems be exposed effortlessly to AI agents. Looking ahead, you’ll also help architect the MCP Gateway—a new layer that will route requests across tools, enforce policies, and provide the runtime foundation for scalable multi-agent systems. Along the way, you’ll tackle challenges in scalability, performance, and developer experience to ensure our platform feels seamless, powerful, and enterprise-ready.

Job Responsibility:

  • Architect and scale an enterprise AI/MCP Server and Gateway that powers multi-agent workflows across Apollo, including routing, orchestration, and integration boundaries
  • Design and implement robust server infrastructure to ensure reliability, performance, and security at scale
  • Build and maintain tools for agent discovery, communication, and coordination
  • Define deployment strategies and runtime optimizations to maximize efficiency and minimize operational overhead
  • Develop frameworks and patterns that enable seamless multi-agent collaboration and AI-driven orchestration
  • Integrate observability, logging, and monitoring for full visibility into server and agent behavior
  • Explore and implement AI-enhanced developer workflows to optimize orchestration and agent interactions
  • Collaborate with teams across Apollo to ensure the MCP Server meets evolving product and developer needs

Requirements:

  • Expertise in agent-to-tool orchestration, routing, and coordination in scalable, fault-tolerant systems
  • Deep expertise in Rust programming language
  • Strong background in distributed systems, server architecture, and high-performance backend development
  • Proven experience with protocol design, message routing, and server-side orchestration frameworks
  • Experience building and maintaining robust runtime infrastructure that supports AI-driven workflows and enables reliable agent-to-tool interactions
  • Proven experience with protocol design, message routing, and building server-side frameworks that enable scalable, reliable multi-tool agent workflows
  • Hands-on experience with observability, monitoring, and debugging frameworks for complex systems
  • Passion for clean, maintainable code, high system reliability, and scalable architecture
  • Experience in strategic system design, making architectural trade-offs, and planning for long-term scalability and maintainability
  • Strong technical leadership and mentorship, including guiding junior engineers and driving engineering best practices across teams
  • Ability to influence cross-team architecture decisions and align engineering efforts with product and business objectives
  • Production ownership experience: leading incident response, debugging, and performance optimization in high-impact backend systems

Nice to have:

  • Exposure to AI/ML-enabled developer tooling or autonomous system orchestration
  • Familiarity with cloud-native architectures, containerization, or orchestration frameworks
  • Experience with performance optimization and cost-efficient scaling of high-throughput distributed systems
What we offer:
  • Offers Equity
  • Choice of 3 Anthem Blue Cross medical plans (California residents can also choose from an additional 2 Kaiser medical plans)
  • Dental and Vision benefits are provided by Sun Life Financial

Additional Information:

Job Posted:
December 06, 2025

Employment Type:
Fulltime
Work Type:
Remote work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Staff Software Engineer, AI Runtime

Staff AI Developer Productivity Engineer

This is a staff-level, hands-on role at the intersection of Developer Productivi...
Location
Location
United Kingdom , London OR Chester
Salary
Salary:
Not provided
equalsplc.com Logo
Equals Group PLC
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years in software engineering with a track record of building internal developer tooling/DevEx (CLI/IDE plugins, bots, internal services or web apps)
  • Strong proficiency in TypeScript/Node.js, C#/.NET and front-end experience with React/Next.js
  • Hands-on LLM integration: prompting, tool/function calling, retrieval (embeddings/vector stores) and basic evaluation of quality
  • Solid understanding of CI/CD, test strategy and code-quality tooling
  • able to design fast inner loops and reliable pipelines
  • Product mindset: discovery interviews, hypothesis-driven experiments, before/after evaluation and clear business cases
  • Excellent communication and influence skills
  • comfortable enabling teams via docs, workshops and remote office hours
  • A role model for our values: Make it happen
  • Succeed together
Job Responsibility
Job Responsibility
  • AI-first discovery roadmap - Interview engineers, analyse telemetry and prioritise the highest-leverage opportunities across coding, review, testing, releases and knowledge flows. Set baselines and target deltas and maintain a transparent backlog/roadmap grounded in measurable outcomes
  • Ship internal tools - Build and maintain AI-augmented dev tools across IDE/CLI, GitHub, Slack, Notion, Retool and internal web apps that reduce manual effort and cycle time
  • Agentic flows across the SDLC - Design structured, agentic workflows (including MCP and tool-use with human-in-the-loop gates) to improve code authoring/review, test creation/selection and the release, incident response and onboarding processes
  • Frontier scouting & evaluation - Continuously experiment with cutting-edge AI models, frameworks and runtimes
  • run short, hypothesis-driven pilots with clear success criteria and developer-fit assessments
  • productionise successful approaches
  • Automation pipelines - Implement LLM-assisted and AST-based code-mod workflows for migrations, boilerplate generation, test creation and docs updates
  • Enablement & change management - Drive adoption with playbooks, golden prompts, demos, office hours and a champions network across build teams
  • measure usage and satisfaction and iterate
What we offer
What we offer
  • 25 days holiday per year + your birthday off
  • Opportunities for progression, development and learning new skills - £250 towards the cost of learning & development
  • Free onsite Nuffield Health gym & pool (London) and discounted gym membership elsewhere
  • GetActive with Aviva - Health and Wellbeing discounts on services and products
  • Interbank currency rates on travel money and international transfers
  • Bupa Private Healthcare
  • Free Eye Test and £50 up to the cost of glasses
  • EAP Service - Mental Health Services
  • Life Assurance Policy - x3 annual salary
  • Contributory pension scheme
Read More
Arrow Right

Staff Product Security Engineer

We’re looking for a Staff Product Security Engineer to lead the design and imple...
Location
Location
United States
Salary
Salary:
184000.00 - 252000.00 USD / Year
alpha-sense.com Logo
AlphaSense
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years of experience in product, application, or cloud security engineering
  • Deep understanding of secure SDLC, threat modeling, and secure architecture design
  • Proven expertise with AWS cloud security concepts and best practices
  • Strong experience with container security, orchestration, and runtime protection
  • Proficiency in Python, Java, and/or JavaScript for security automation, code review, and tooling
  • Experience securing AI/ML pipelines, data workflows, or model-serving infrastructure
  • Familiarity with DevSecOps and continuous integration/deployment environments
Job Responsibility
Job Responsibility
  • Embed robust security practices throughout the software and AI development lifecycle (SDLC)
  • Lead secure design reviews, threat modeling, and risk assessments for AI-driven products, APIs, and backend services
  • Partner with engineering and product teams to ensure security, privacy, and compliance by design
  • Build and maintain security automation and governance frameworks that integrate seamlessly into development workflows
  • Architect and enforce security controls for AI/ML systems, including model training, data pipelines, and inference environments
  • Identify and mitigate AI-specific attack vectors such as data poisoning, model inversion, prompt injection, and model theft
  • Collaborate with governance and compliance teams to align with ethical AI principles and frameworks like NIST AI RMF and the EU AI Act
  • Implement model provenance, integrity, and auditability controls to ensure responsible and secure AI operations
  • Partner with DevOps and SRE teams to secure service meshes, container networking, and secrets management
  • Drive software supply chain security, including artifact integrity, dependency management, and vulnerability reduction
What we offer
What we offer
  • Competitive compensation, benefits, and career growth opportunities
  • Opportunity to shape and drive product security strategy
  • Collaborative and security-minded engineering culture
  • Work on cutting-edge security challenges in a fast-growing company
  • Performance-based bonus, equity, and a generous benefits program
  • Fulltime
Read More
Arrow Right

Staff Infrastructure Software Engineer - AI Platform

We are currently seeking a Staff Software Engineer to join the AI Platform team ...
Location
Location
Canada
Salary
Salary:
Not provided
addepar.com Logo
Addepar
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Extensive experience as a Software/Backend Engineer, with a track record of taking on increasing responsibility
  • Experience across the full product lifecycle: designing, implementing, shipping, scaling, operationalizing, and maintaining technology/SaaS products
  • Exceptional Programming skills and fundamentals in Python/Go/Java, with a proven track record of building large scale production systems
  • Proficient experience with diverse compute environments including microservices (K8s), Databricks and serverless architectures (e.g. AWS Lambda)
  • Demonstrable experience leading initiatives with infrastructure-as-code tools such as Terraform in complex, multi-account environments
  • Proficient experience with comprehensive monitoring and alerting stacks (e.g. Prometheus/Grafana/Sentry/cloud-native tools), with a focus on observability strategy
  • Excellent interpersonal and communication skills to effectively collaborate with multi-functional teams, articulate complex technical concepts, and influence outcomes
  • Applicants must have legal authorization to work in the country where this role is based on the first day of employment
Job Responsibility
Job Responsibility
  • Design and build the production runtime for LLM-based agents and products, creating the services and infrastructure that serve autonomous agents
  • Develop deep application-level knowledge to proactively inform and influence requirements, constraints and best practices for implementing composable, complex AI systems
  • Lead the design, implementation, and automation of production infrastructure on a variety of cloud environments (Kubernetes/Databricks), to enable us to ship and scale AI features instantly
  • Evangelize and promote disciplined, best engineering practices to enforce strong production hygiene and culture
  • Initiate and lead collaborations with cross-functional teams to identify and resolve complex application or infrastructure issues, serving as a technical subject matter expert
  • Architect, build, and maintain advanced, automated CI/CD pipelines e.g. using Jenkins, ArgoCD, AWS CodeBuild/Pipeline, GitHub Actions, or similar, establishing best practices for deployment strategies (e.g., blue/green, canary)
  • Develop systems and best practices monitoring, alerting, and troubleshooting of our probabilistic and AI-driven systems and broader software stack
Read More
Arrow Right

Staff Infrastructure Software Engineer - AI Platform

We are currently seeking a Staff Software Engineer to join the AI Platform team ...
Location
Location
United Kingdom , Edinburgh
Salary
Salary:
Not provided
addepar.com Logo
Addepar
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Extensive experience as a Software/Backend Engineer, with a track record of taking on increasing responsibility
  • Experience across the full product lifecycle: designing, implementing, shipping, scaling, operationalizing, and maintaining technology/SaaS products
  • Exceptional Programming skills and fundamentals in Python/Go/Java, with a proven track record of building large scale production systems
  • Proficient experience with diverse compute environments including microservices (K8s), Databricks and serverless architectures (e.g. AWS Lambda)
  • Demonstrable experience leading initiatives with infrastructure-as-code tools such as Terraform in complex, multi-account environments
  • Proficient experience with comprehensive monitoring and alerting stacks (e.g. Prometheus/Grafana/Sentry/cloud-native tools), with a focus on observability strategy
  • Excellent interpersonal and communication skills to effectively collaborate with multi-functional teams, articulate complex technical concepts, and influence outcomes
Job Responsibility
Job Responsibility
  • Design and build the production runtime for LLM-based agents and products, creating the services and infrastructure that serve autonomous agents
  • Develop deep application-level knowledge to proactively inform and influence requirements, constraints and best practices for implementing composable, complex AI systems
  • Lead the design, implementation, and automation of production infrastructure on a variety of cloud environments (Kubernetes/Databricks), to enable us to ship and scale AI features instantly
  • Evangelize and promote disciplined, best engineering practices to enforce strong production hygiene and culture
  • Initiate and lead collaborations with cross-functional teams to identify and resolve complex application or infrastructure issues, serving as a technical subject matter expert
  • Architect, build, and maintain advanced, automated CI/CD pipelines e.g. using Jenkins, ArgoCD, AWS CodeBuild/Pipeline, GitHub Actions, or similar, establishing best practices for deployment strategies (e.g., blue/green, canary)
  • Develop systems and best practices monitoring, alerting, and troubleshooting of our probabilistic and AI-driven systems and broader software stack
Read More
Arrow Right

Staff Software Engineer, Managed AI - AI Platform

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'...
Location
Location
United States , San Francisco, CA; Sunnyvale, CA
Salary
Salary:
208725.00 - 253000.00 USD / Year
crusoe.ai Logo
Crusoe
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Advanced degree in Computer Science/Engineering
  • 8-10+ years of industry experience with demonstrated history of consistent success leading a varied portfolio of initiatives across your function
  • Experience with distributed systems, cloud services (compute, storage, networking, database), and delivering early-stage projects quickly
  • Experience with Generative AI (LLMs, Multimodal) and familiar with AI infrastructure (training, inference, ETL pipelines)
  • Proficient with container runtimes (e.g., Kubernetes), microservices, REST APIs, gRPC, and the full software development lifecycle including CI/CD
Job Responsibility
Job Responsibility
  • Lead the design and implementation of core AI services, including: Resilient fault-tolerant queues for efficient task distribution
  • Model catalogs for managing and versioning AI models
  • Scheduling mechanisms optimized for cost and performance
  • Architect and scale infrastructure to handle millions of API requests per second
  • Implement robust monitoring and alerting to ensure system health and 24/7 availability
  • Collaborate closely with product management, business strategy, and other engineering teams to define the AI platform roadmap
  • Influence the long-term vision and architectural decisions of the platform
  • Contribute to open-source AI frameworks and actively participate in the AI community
  • Prototype and rapidly iterate on emerging technologies and new features
What we offer
What we offer
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Fulltime
Read More
Arrow Right

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location
Location
United States , Mountain View
Salary
Salary:
139900.00 - 274800.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Strong background in one or more of the following areas: AI accelerator or GPU architectures
  • Distributed systems and large-scale AI training/inference
  • High-performance computing (HPC) and collective communications
  • ML systems, runtimes, or compilers
  • Performance modeling, benchmarking, and systems analysis
  • Hardware–software co-design for AI workloads
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility
Job Responsibility
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
  • Fulltime
Read More
Arrow Right

Staff Software Engineer, SDK (Language Runtime)

Our team makes the Temporal SDKs — our SDKs are the primary mechanism through wh...
Location
Location
United States
Salary
Salary:
196000.00 - 245000.00 USD / Year
temporal.io Logo
Temporal
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Minimum 10 years experience post graduation writing scalable software
  • BS or MS in Computer Science (or a closely-related degree), or equivalent work experience writing production-grade software
  • Fluency in multiple programming languages, and an affinity for learning new ones
  • Deep experience with concurrent programming
  • Deep experience with distributed systems
  • Experience designing APIs and writing documentation for publicly-available libraries or modules
  • A methodical, detail-oriented approach to your work
  • Strong technical communication skills—written and verbal—in English
  • A deep sense of ownership and personal accountability
  • A proactive approach to managing your work
Job Responsibility
Job Responsibility
  • Take end-to-end ownership of new features
  • Design and build Temporal SDKs used by customers
  • Tightly integrate Temporal SDKs with their respective languages
  • Develop features that provide a foundation for the reliable execution of the current wave of agentic AI systems
  • Work directly with our community to debug issues and get feedback
  • Write publicly-readable technical documentation
  • Go the extra mile to support a customer in need
  • Travel to meet your coworkers for a week once or twice a year
  • Attend the occasional developer conference
What we offer
What we offer
  • Unlimited PTO, 12 Holidays + 2 Floating Holidays
  • 100% Premiums Coverage for Medical, Dental, and Vision
  • AD&D, LT & ST Disability, and Life Insurance
  • Empower 401K Plan
  • Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend
  • $3,600 / Year Work from Home Meals
  • $1,800 / Year Professional Enrichment
  • $1,200 / Year Lifestyle Spending Account
  • $1,000 / Year In-Home Office Setup
  • $74 / Month Reimbursement for Internet
  • Fulltime
Read More
Arrow Right

Staff Software Engineer, Model LifeCycle

The Staff Software Engineer for the Model LifeCycle team will play a key role in...
Location
Location
United States , San Francisco
Salary
Salary:
208725.00 - 253000.00 USD / Year
crusoe.ai Logo
Crusoe
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • 8-10+ years of industry experience with demonstrated history of consistent success leading a varied portfolio of initiatives across your function
  • Proven track record of delivering production features on time
  • Experience in using cloud-based services, such as, elastic compute, object storage, virtual private networks, managed database, etc.
  • Experience with Generative AI (Large Language Models, Multimodal)
  • Experience with AI infrastructure, including training, inference
Job Responsibility
Job Responsibility
  • Contribute to fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling
  • Implement and maintain end-to-end training pipelines for Large Language Models
  • Contribute to distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling)
  • Develop and maintain agent execution infrastructure
  • Implement features for dataset, model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale
  • Work closely with Principal Engineers, product, business, and platform teams to implement the core abstractions and APIs of the system
  • Contribute to architectural decisions around training runtimes, scheduling, storage, and model lifecycle management
  • Engage with the open-source LLM ecosystem
What we offer
What we offer
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Fulltime
Read More
Arrow Right