Senior Inference ML Runtime Engineer

Cerebras Systems

Location:
United States; Canada, Sunnyvale

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The Inference ML Engineering team at Cerebras Systems is dedicated to enabling our fast generative inference solution through simple APIs powered by a distributed runtime that runs on large clusters of our own hardware. Our mission is to empower enterprises, developers, and researchers to unlock the full potential of our platform, leveraging its performance, scalability, and flexibility. The team works closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver high-impact solutions that redefine the boundaries of ML performance and usability.

Job Responsibility:

  • Drive and provide technical guidance to a team of software engineers working on complex machine learning integration projects
  • Design and implement ML features (e.g., structured outputs, biased sampling, predicted outputs) that improve performance of generative AI models at inference time
  • Design and implement high-throughput, low-latency multimodal inference models that support delivery of image, audio, and video inputs and outputs
  • Maintain our scalable serving backend for handling many concurrent requests per minute
  • Scale our inference service by implementing detailed observability throughout the entire stack
  • Analyze and improve latency, throughput, memory usage, and compute efficiency on the service and the implementation of various features
  • Optimize software to accelerate generative LLM inference by achieving high throughput and low latency
  • Stay up-to-date with advancements in machine learning and deep learning, and apply state-of-the-art techniques to enhance our solutions
  • Evaluate trade-offs between different approaches, clearly articulate design choices, and develop detailed proposals for implementing new features
  • Uncover, scope, and prioritize significant areas of technical debt across the software stack to ensure continued high quality of the inference service
  • Build and maintain robust automated test suites to ensure software quality, performance, and reliability
  • Contribute to an agile team environment by delivering high-quality software and adhering to agile development practices
  • Lead cross-functional initiatives across the company to deliver high-quality inference solutions

Requirements:

  • Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Mathematics, or a related field
  • 8+ years of experience in large-scale software engineering, with a focus on deep learning or related domains
  • Proficiency in Python for building and maintaining scalable systems
  • Advanced proficiency in C++, with an emphasis on multi-threaded programming, performance optimization, and system-level development
  • Demonstrated experience driving cross-functional projects
  • Experience building and scaling large-scale inference systems for LLMs or multimodal models
  • Familiarity with LLM serving frameworks, such as vLLM, SGLang, and TensorRT-LLM
  • Solid understanding of software architectural patterns for large-scale, high-performance applications
  • Hands-on experience with ML frameworks, such as PyTorch, and a strong understanding of their underlying architectures
  • Strong problem-solving skills, with the ability to balance technical depth with practical implementation constraints
  • Exceptional communication and presentation skills, with the ability to work both independently and collaboratively across multidisciplinary teams

What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026


Similar Jobs for Senior Inference ML Runtime Engineer

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements:
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility:
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Fulltime

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location:
United States
Salary:
210000.00 - 309000.00 USD / Year
Assembly
Expiration Date
Until further notice
Requirements:
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility:
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer:
  • competitive equity grants
  • 100% employer-paid benefits
  • flexibility of being fully remote
  • Fulltime

Senior Software Engineer

At JFrog, we’re reinventing DevOps and MLOps to help the world’s greatest compan...
Location:
Israel, Netanya/Tel Aviv
Salary:
Not provided
JFrog
Expiration Date
Until further notice
Requirements:
  • 5+ years of proven experience in software development
  • Strong background in designing, developing, and debugging complex distributed systems (e.g., microservices, event-driven architectures)
  • Hands-on experience with containerized environments, microservices, and Kubernetes
  • Proven experience with at least one major cloud provider (e.g., AWS, GCP, Azure)
  • Ability to lead technical discussions, mentor engineers, and drive architectural decisions
Job Responsibility:
  • Be an integral part of a highly skilled team working to build the leading MLOps platform in the industry
  • Maintain and evolve the Runtime team’s products, ensuring their reliability and scalability
  • Design and develop a complete hosting system that supports various types of inference, analytics, monitoring, distribution, and more – enabling customers to run large-scale real-time, batch, and streaming ML pipelines
  • Play a key role in shaping our cross-company engineering culture
  • Conduct high-quality design reviews with a strong emphasis on scalability, maintainability, security, and sound use of design patterns
  • Write maintainable, well-tested code in multiple programming languages
  • Continuously improve the efficiency, scalability, and stability of critical system components

Senior Machine Learning Engineer

Start.io, a leading mobile marketing and audience platform, empowers the app eco...
Location:
Salary:
Not provided
Start.io
Expiration Date
Until further notice
Requirements:
  • B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related technical discipline
  • 5+ years of experience building high-performance backend or ML inference systems
  • Deep expertise in Python and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML)
  • Experience with scalable service architecture, message queues (Kafka, Pub/Sub), and async processing
  • Strong understanding of model deployment practices, online/offline feature parity, and real-time monitoring
  • Experience in cloud environments (AWS, GCP, or OCI) and container orchestration (Kubernetes)
  • Experience working with in-memory and NoSQL databases (e.g. Aerospike, Redis, Bigtable) to support ultra-fast data access in production-grade ML services
  • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) and best practices for alerting and diagnostics
  • A strong sense of ownership and the ability to drive solutions end-to-end
  • Passion for performance, clean architecture, and impactful systems
Job Responsibility:
  • Own and lead the design and development of low-latency Algo inference services handling billions of requests per day
  • Build and scale robust real-time decision-making engines, integrating ML models with business logic under strict SLAs
  • Collaborate closely with DS to deploy models seamlessly and reliably in production
  • Design systems for model versioning, shadowing, and A/B testing at runtime
  • Ensure high availability, scalability, and observability of production systems
  • Continuously optimize latency, throughput, and cost-efficiency using modern tooling and techniques
  • Work independently while interfacing with cross-functional stakeholders from Algo, Infra, Product, Engineering, BA & Business
What we offer:
  • Lead the mission-critical inference engine that drives our core product
  • Join a high-caliber Algo group solving real-time, large-scale, high-stakes problems
  • Work on systems where every millisecond matters, and every decision drives real value
  • Enjoy a fast-paced, collaborative, and empowered culture with full ownership of your domain

Senior AI Backend Engineer

As a core member of our AI Engineering team, you will collaborate with data scie...
Location:
Israel, Tel-Aviv
Salary:
Not provided
K Health
Expiration Date
Until further notice
Requirements:
  • 6+ years of backend engineering experience
  • Strong proficiency in more than one major programming language (such as Python, Java, Go, Rust, or Kotlin)
  • Solid understanding of AI systems architecture and experience working in environments involving AI agents, LLMs, or inference pipelines
  • Proven experience in building and scaling backend APIs, microservices, and background jobs
  • Strong experience with relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Redis), including schema design, data modeling, and query optimization
  • Experience with message brokers and distributed systems (e.g., Pub/Sub, Redis Streams, RabbitMQ)
  • Highly adaptable, a strong team player, with the ability to quickly learn and apply new technologies
Job Responsibility:
  • Build and maintain reliable, scalable backend services to support AI agent execution and orchestration
  • Develop AI agent systems for complex operational workflows using LangChain, LangGraph, LiteLLM, and Langfuse
  • Orchestrate a hybrid model stack that includes OpenAI and Google Gemini alongside self-hosted and fine-tuned LLMs like Gemma and Llama
  • Build and maintain integrations with clinical systems (FHIR, EMR)
  • Drive observability and reliability using OpenTelemetry, Datadog, and Langfuse
  • Design APIs (GraphQL, REST), background workers, and event-driven systems that interface with AI inference engines and agent runtimes
  • Collaborate with Data Science, ML, and engineering teams to deploy AI features and improve the performance, scalability, and reliability of backend systems
  • Participate in code reviews, knowledge sharing, and mentoring to elevate the team’s technical capabilities
What we offer:
  • Competitive compensation packages based on industry benchmarks for function, level, and geographic location

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Senior Software Development Engineer in Test (SDET) - AI Cluster Networking and Security

In AI infrastructure organization, simplifying large hardware deployments with p...
Location:
India, Bengaluru
Salary:
Not provided
Cerebras Systems
Expiration Date
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, AI, Data Science, or a related field
  • 10+ years of experience testing in areas such as enterprise software, distributed systems, or datacenter hardware and software
  • Experience working in large enterprise or cloud networking infrastructure, high speed switches, routers, firewalls
  • Experience in qualifying networking vendor platforms like Juniper, Arista or Cisco and network test equipment like Ixia/Spirent
  • Experience in Datacenter technology like BGP, ECN, PFC
  • Experience testing networking security, compliance and firewalls
  • Strong coding skills in at least one programming language such as Python, Go, or C/C++
  • Strong debugging skills to debug issues in large distributed systems, hardware, and software. Experience with debugging tools like gdb, strace, networking monitors
  • Strong understanding of operating systems internals like memory management, file system working, security basics and performance
  • Strong understanding of datacenter layout, device performance characteristics like PCIe, networking and storage
Job Responsibility:
  • Innovate and execute tests on cutting edge AI infrastructure
  • Define optimized test strategies and methodologies
  • Be a quick learner, adapt to new technologies
  • Build a strong understanding of how to break these large distributed-systems challenges into smaller components that can be unit tested
  • Take an automation-first approach: aim for 100% automated tests covering all cluster features across high availability, failure scenarios, performance, stress, and security
  • Champion cluster security, reliability (targeting 99.9999% uptime), and ease of use with observability
  • Test all components of the AI cluster, including cluster software (Kubernetes, Prometheus, Grafana) and cluster hardware such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights over the MemoryX interconnect
  • Qualify cluster networking solutions consisting of high-speed switches, routers, and optics from various vendors
  • Qualify cluster security features including OS security, network security, cloud compliance, user access, and security certifications
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Senior AI/ML Validation Engineer

We are seeking an experienced and versatile professional with expertise in valid...
Location:
India, Hyderabad
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements:
  • 8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
  • Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
  • Proven expertise testing drivers and firmware with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models
  • Failure modes: error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles
  • Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems
  • Ability to read kernel logs and symbols, trace with ftrace/perf/ETW, and perform cross-layer debugging
  • Build custom kernels/modules and analyze crash dumps (kdump, WinDbg)
  • Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation
  • C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks
  • Bash/PowerShell for environment setup, CI scripting, and reproducibility
Job Responsibility:
  • Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
  • Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
  • Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
  • Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
  • Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
  • Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
  • Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
  • Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
What we offer:
  • AMD benefits at a glance