Staff Software Engineer, AI Runtime

Apollo GraphQL

Location:
United States

Contract Type:
Not provided

Salary:

185000.00 - 215000.00 USD / Year

Job Description:

We’re seeking a Staff Software Engineer to help power the future of agentic AI workflows. You’ll take our MCP Server to the next level, turning it into an enterprise-grade service that lets diverse tools and systems be exposed effortlessly to AI agents. Looking ahead, you’ll also help architect the MCP Gateway—a new layer that will route requests across tools, enforce policies, and provide the runtime foundation for scalable multi-agent systems. Along the way, you’ll tackle challenges in scalability, performance, and developer experience to ensure our platform feels seamless, powerful, and enterprise-ready.

Job Responsibility:

  • Architect and scale an enterprise AI/MCP Server and Gateway that powers multi-agent workflows across Apollo, including routing, orchestration, and integration boundaries
  • Design and implement robust server infrastructure to ensure reliability, performance, and security at scale
  • Build and maintain tools for agent discovery, communication, and coordination
  • Define deployment strategies and runtime optimizations to maximize efficiency and minimize operational overhead
  • Develop frameworks and patterns that enable seamless multi-agent collaboration and AI-driven orchestration
  • Integrate observability, logging, and monitoring for full visibility into server and agent behavior
  • Explore and implement AI-enhanced developer workflows to optimize orchestration and agent interactions
  • Collaborate with teams across Apollo to ensure the MCP Server meets evolving product and developer needs

Requirements:

  • Expertise in agent-to-tool orchestration, routing, and coordination in scalable, fault-tolerant systems
  • Deep expertise in the Rust programming language
  • Strong background in distributed systems, server architecture, and high-performance backend development
  • Proven experience with protocol design, message routing, and server-side orchestration frameworks
  • Experience building and maintaining robust runtime infrastructure that supports AI-driven workflows and enables reliable agent-to-tool interactions
  • Proven experience with protocol design, message routing, and building server-side frameworks that enable scalable, reliable multi-tool agent workflows
  • Hands-on experience with observability, monitoring, and debugging frameworks for complex systems
  • Passion for clean, maintainable code, high system reliability, and scalable architecture
  • Experience in strategic system design, making architectural trade-offs, and planning for long-term scalability and maintainability
  • Strong technical leadership and mentorship, including guiding junior engineers and driving engineering best practices across teams
  • Ability to influence cross-team architecture decisions and align engineering efforts with product and business objectives
  • Production ownership experience: leading incident response, debugging, and performance optimization in high-impact backend systems

Nice to have:

  • Exposure to AI/ML-enabled developer tooling or autonomous system orchestration
  • Familiarity with cloud-native architectures, containerization, or orchestration frameworks
  • Experience with performance optimization and cost-efficient scaling of high-throughput distributed systems

What we offer:

  • Equity
  • Choice of 3 Anthem Blue Cross medical plans (California residents can also choose from an additional 2 Kaiser medical plans)
  • Dental and Vision benefits are provided by Sun Life Financial

Additional Information:

Job Posted:
December 06, 2025

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Staff Software Engineer, AI Runtime

Staff AI Developer Productivity Engineer

This is a staff-level, hands-on role at the intersection of Developer Productivi...
Location:
United Kingdom, London or Chester
Salary:
Not provided
Equals Group PLC
Expiration Date:
Until further notice
Requirements:
  • 8+ years in software engineering with a track record of building internal developer tooling/DevEx (CLI/IDE plugins, bots, internal services or web apps)
  • Strong proficiency in TypeScript/Node.js, C#/.NET and front-end experience with React/Next.js
  • Hands-on LLM integration: prompting, tool/function calling, retrieval (embeddings/vector stores) and basic evaluation of quality
  • Solid understanding of CI/CD, test strategy and code-quality tooling; able to design fast inner loops and reliable pipelines
  • Product mindset: discovery interviews, hypothesis-driven experiments, before/after evaluation and clear business cases
  • Excellent communication and influence skills; comfortable enabling teams via docs, workshops and remote office hours
  • A role model for our values: Make it happen; Succeed together
Job Responsibility:
  • AI-first discovery roadmap - Interview engineers, analyse telemetry and prioritise the highest-leverage opportunities across coding, review, testing, releases and knowledge flows. Set baselines and target deltas and maintain a transparent backlog/roadmap grounded in measurable outcomes
  • Ship internal tools - Build and maintain AI-augmented dev tools across IDE/CLI, GitHub, Slack, Notion, Retool and internal web apps that reduce manual effort and cycle time
  • Agentic flows across the SDLC - Design structured, agentic workflows (including MCP and tool-use with human-in-the-loop gates) to improve code authoring/review, test creation/selection and the release, incident response and onboarding processes
  • Frontier scouting & evaluation - Continuously experiment with cutting-edge AI models, frameworks and runtimes; run short, hypothesis-driven pilots with clear success criteria and developer-fit assessments; productionise successful approaches
  • Automation pipelines - Implement LLM-assisted and AST-based code-mod workflows for migrations, boilerplate generation, test creation and docs updates
  • Enablement & change management - Drive adoption with playbooks, golden prompts, demos, office hours and a champions network across build teams; measure usage and satisfaction and iterate
What we offer:
  • 25 days holiday per year + your birthday off
  • Opportunities for progression, development and learning new skills - £250 towards the cost of learning & development
  • Free onsite Nuffield Health gym & pool (London) and discounted gym membership elsewhere
  • GetActive with Aviva - Health and Wellbeing discounts on services and products
  • Interbank currency rates on travel money and international transfers
  • Bupa Private Healthcare
  • Free eye test and up to £50 towards the cost of glasses
  • EAP Service - Mental Health Services
  • Life Assurance Policy - 3x annual salary
  • Contributory pension scheme

Staff Product Security Engineer

We’re looking for a Staff Product Security Engineer to lead the design and imple...
Location:
United States
Salary:
184000.00 - 252000.00 USD / Year
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in product, application, or cloud security engineering
  • Deep understanding of secure SDLC, threat modeling, and secure architecture design
  • Proven expertise with AWS cloud security concepts and best practices
  • Strong experience with container security, orchestration, and runtime protection
  • Proficiency in Python, Java, and/or JavaScript for security automation, code review, and tooling
  • Experience securing AI/ML pipelines, data workflows, or model-serving infrastructure
  • Familiarity with DevSecOps and continuous integration/deployment environments
Job Responsibility:
  • Embed robust security practices throughout the software and AI development lifecycle (SDLC)
  • Lead secure design reviews, threat modeling, and risk assessments for AI-driven products, APIs, and backend services
  • Partner with engineering and product teams to ensure security, privacy, and compliance by design
  • Build and maintain security automation and governance frameworks that integrate seamlessly into development workflows
  • Architect and enforce security controls for AI/ML systems, including model training, data pipelines, and inference environments
  • Identify and mitigate AI-specific attack vectors such as data poisoning, model inversion, prompt injection, and model theft
  • Collaborate with governance and compliance teams to align with ethical AI principles and frameworks like NIST AI RMF and the EU AI Act
  • Implement model provenance, integrity, and auditability controls to ensure responsible and secure AI operations
  • Partner with DevOps and SRE teams to secure service meshes, container networking, and secrets management
  • Drive software supply chain security, including artifact integrity, dependency management, and vulnerability reduction
What we offer:
  • Competitive compensation, benefits, and career growth opportunities
  • Opportunity to shape and drive product security strategy
  • Collaborative and security-minded engineering culture
  • Work on cutting-edge security challenges in a fast-growing company
  • Performance-based bonus, equity, and a generous benefits program
Employment Type:
Full-time

Staff Infrastructure Software Engineer - AI Platform

We are currently seeking a Staff Software Engineer to join the AI Platform team ...
Location:
United Kingdom, Edinburgh
Salary:
Not provided
Addepar
Expiration Date:
Until further notice
Requirements:
  • Extensive experience as a Software/Backend Engineer, with a track record of taking on increasing responsibility
  • Experience across the full product lifecycle: designing, implementing, shipping, scaling, operationalizing, and maintaining technology/SaaS products
  • Exceptional programming skills and fundamentals in Python/Go/Java, with a proven track record of building large-scale production systems
  • Proficiency with diverse compute environments, including microservices (K8s), Databricks, and serverless architectures (e.g., AWS Lambda)
  • Demonstrable experience leading initiatives with infrastructure-as-code tools such as Terraform in complex, multi-account environments
  • Proficiency with comprehensive monitoring and alerting stacks (e.g., Prometheus/Grafana/Sentry/cloud-native tools), with a focus on observability strategy
  • Excellent interpersonal and communication skills to effectively collaborate with multi-functional teams, articulate complex technical concepts, and influence outcomes
Job Responsibility:
  • Design and build the production runtime for LLM-based agents and products, creating the services and infrastructure that serve autonomous agents
  • Develop deep application-level knowledge to proactively inform and influence requirements, constraints and best practices for implementing composable, complex AI systems
  • Lead the design, implementation, and automation of production infrastructure across a variety of cloud environments (Kubernetes/Databricks) to enable us to ship and scale AI features instantly
  • Evangelize and promote disciplined engineering best practices to enforce strong production hygiene and culture
  • Initiate and lead collaborations with cross-functional teams to identify and resolve complex application or infrastructure issues, serving as a technical subject matter expert
  • Architect, build, and maintain advanced, automated CI/CD pipelines (e.g., using Jenkins, ArgoCD, AWS CodeBuild/Pipeline, GitHub Actions, or similar), establishing best practices for deployment strategies (e.g., blue/green, canary)
  • Develop systems and best practices for monitoring, alerting, and troubleshooting our probabilistic and AI-driven systems and broader software stack

Staff Software Engineer, Managed AI - AI Platform

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'...
Location:
United States, San Francisco, CA; Sunnyvale, CA
Salary:
208725.00 - 253000.00 USD / Year
Crusoe
Expiration Date:
Until further notice
Requirements:
  • Advanced degree in Computer Science/Engineering
  • 8-10+ years of industry experience with a demonstrated history of consistent success leading a varied portfolio of initiatives across your function
  • Experience with distributed systems, cloud services (compute, storage, networking, database), and delivering early-stage projects quickly
  • Experience with Generative AI (LLMs, multimodal) and familiarity with AI infrastructure (training, inference, ETL pipelines)
  • Proficient with container runtimes (e.g., Kubernetes), microservices, REST APIs, gRPC, and the full software development lifecycle including CI/CD
Job Responsibility:
  • Lead the design and implementation of core AI services, including resilient fault-tolerant queues for efficient task distribution, model catalogs for managing and versioning AI models, and scheduling mechanisms optimized for cost and performance
  • Architect and scale infrastructure to handle millions of API requests per second
  • Implement robust monitoring and alerting to ensure system health and 24/7 availability
  • Collaborate closely with product management, business strategy, and other engineering teams to define the AI platform roadmap
  • Influence the long-term vision and architectural decisions of the platform
  • Contribute to open-source AI frameworks and actively participate in the AI community
  • Prototype and rapidly iterate on emerging technologies and new features
What we offer:
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment Type:
Full-time

Member of Technical Staff, Software Engineer

Help build the infrastructure that powers training, evaluation, and data platfor...
Location:
Switzerland, Zürich
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering background building reliable, scalable production systems (Python preferred)
  • Hands‑on experience supporting large‑scale ML / LLM training, evaluation, or experimentation infrastructure
  • Operating GPU‑heavy workloads in cloud environments using Docker and Kubernetes (scheduling, utilization, isolation)
  • Designing and running data / compute pipelines and orchestration (e.g., Airflow, Argo) with object storage (Azure Blob / S3)
  • Platform reliability and operability: observability, metrics, logging, tracing, alerting (Prometheus, Grafana, OpenTelemetry)
Job Responsibility:
  • Design and build core platform services for scalable training and evaluation, including cluster orchestration, job scheduling, data and compute pipelines, and artifact management
  • Standardize containerized workflows by maintaining Docker images, CI/CD, and runtime configurations; advocate for best practices in security, reproducibility, and cost efficiency
  • Implement end-to-end observability and operations through metrics, tracing, logging, dashboard development, monitoring, and automated alerts for model training and platform health (using Prometheus, Grafana, OpenTelemetry)
  • Architect and operate services on Azure cloud platforms, managing infrastructure-as-code (Terraform/Helm), secrets, networking, and storage
  • Enhance developer experience by creating tools, CLIs, and portals that simplify job submission, metrics analysis, and experiment management for generalist software engineering and research teams
  • Enforce security and compliance policies for data access, container hardening, and supply-chain integrity, and partner with security and privacy teams to maintain robust practices in multi-tenant environments and secret management
  • Collaborate cross-functionally with data, model, and product teams to align infrastructure roadmaps with training needs, evaluation protocols, and Copilot product goals
Employment Type:
Full-time

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Strong background in one or more of the following areas: AI accelerator or GPU architectures; distributed systems and large-scale AI training/inference; high-performance computing (HPC) and collective communications; ML systems, runtimes, or compilers; performance modeling, benchmarking, and systems analysis; hardware–software co-design for AI workloads.
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility:
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
Employment Type:
Full-time

Staff / Senior Backend Engineer, Agentic AI

This team designs intelligent, autonomous systems that eliminate financial opera...
Location:
United States, San Francisco
Salary:
150000.00 - 299000.00 USD / Year
Airwallex
Expiration Date:
Until further notice
Requirements:
  • 5+ years backend experience with Python at scale
  • Deep knowledge of SQL/NoSQL database design
  • Experience with distributed systems, event-driven architectures, or microservices
  • Familiarity with GCP services (Pub/Sub, Cloud Run, Cloud SQL, BigQuery, KMS, etc.)
  • Strong grasp of security and compliance best practices
  • Passion for building the backend that powers intelligent products
Job Responsibility:
  • Design and operate distributed, event-driven systems that process high-volume, high-sensitivity financial data with strict correctness and latency guarantees
  • Build typed, production-grade APIs and services in Python (FastAPI + Pydantic) that power agentic workflows, AI copilots, and real-time financial infrastructure
  • Own data architecture end-to-end: schema design, event models, ledger primitives, and storage layers optimized for auditability, traceability, and scale
  • Develop resilient infrastructure primitives (queues, retries, idempotency, observability, backpressure handling) that let teams ship fast without sacrificing reliability
  • Work at the intersection of systems + AI, building feedback loops where product data continuously improves models and models autonomously improve system behavior
  • Ship platform foundations, not one-off features: internal frameworks, SDKs, and tooling that accelerate every engineer in the org
  • Partner closely with ML + product teams to productionize research-grade ideas into real customer-facing systems
  • Help define the architecture of an AI-native financial stack: not bolting AI onto software, but designing software assuming AI is a first-class runtime
What we offer:
  • Equity
  • Bonus
  • Medical, dental, and vision insurance
  • 401(k) plan
  • Short-term and long-term disability
  • Basic life insurance
  • Well-being benefits
  • 20 paid days of vacation
  • 12 paid days of company holidays
Employment Type:
Full-time

Sr Staff Engineer Software (Full Stack Prisma AIRS)

With Prisma AIRS, Palo Alto Networks is building the world's most comprehensive ...
Location:
United States, Santa Clara
Salary:
147000.00 - 237500.00 USD / Year
Palo Alto Networks
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science or a related field with 5+ years of experience, or a Master's degree with 3+ years of experience
  • Expertise in modern React (Functional components) and JavaScript/TypeScript
  • Expertise in building scalable distributed systems with excellent Python or Golang programming skills
  • Expertise in writing comprehensive unit, integration, and end-to-end tests
  • Proven experience with modern backend frameworks, databases (SQL or NoSQL), and cloud platforms, specifically GCP (Google Cloud Platform)
  • Excellent written and verbal communication, able to collaborate and convey ideas effectively
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
Job Responsibility:
  • Design and build innovative, scalable software products to ensure our customers can use AI securely
  • Own new features and functionality from start to finish. Participate in all phases of the product development cycle, from definition and design through implementation and testing. Ensure that applications are production-ready, scalable, and reliable
  • Collaborate with product managers, backend software engineers, product designers, and infrastructure engineers to shape the future of Prisma AIRS
  • Deliver high-quality UX and provide Prisma AIRS customers with a seamless, intuitive experience
  • Proactively identify problems and opportunities, proposing and developing simple, attainable solutions to enhance the team's development process and product quality
  • Serve as a role model in establishing and implementing engineering best practices, including test-driven development for AI runtime services
Employment Type:
Full-time