Senior Inference ML Runtime Engineer

Cerebras Systems

Location:
United States; Canada, Sunnyvale

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The Inference ML Engineering team at Cerebras Systems is dedicated to enabling our fast generative inference solution through simple APIs powered by a distributed runtime that runs on large clusters of our own hardware. Our mission is to empower enterprises, developers, and researchers to unlock the full potential of our platform, leveraging its performance, scalability, and flexibility. The team works closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver high-impact solutions that redefine the boundaries of ML performance and usability.

Job Responsibility:

  • Drive and provide technical guidance to a team of software engineers working on complex machine learning integration projects
  • Design and implement ML features (e.g., structured outputs, biased sampling, predicted outputs) that improve performance of generative AI models at inference time
  • Design and implement high-throughput, low-latency multimodal inference models that support delivery of image, audio, and video inputs and outputs
  • Maintain our scalable serving backend for handling many concurrent requests per minute
  • Scale our inference service by implementing detailed observability throughout the entire stack
  • Analyze and improve latency, throughput, memory usage, and compute efficiency on the service and the implementation of various features
  • Optimize software to accelerate generative LLM inference by achieving high throughput and low latency
  • Stay up-to-date with advancements in machine learning and deep learning, and apply state-of-the-art techniques to enhance our solutions
  • Evaluate trade-offs between different approaches, clearly articulate design choices, and develop detailed proposals for implementing new features
  • Uncover, scope, and prioritize significant areas of technical debt across the software stack to ensure continued high quality of the inference service
  • Build and maintain robust automated test suites to ensure software quality, performance, and reliability
  • Contribute to an agile team environment by delivering high-quality software and adhering to agile development practices
  • Lead cross-functional initiatives across the company to deliver high-quality inference solutions

Requirements:

  • Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Mathematics, or a related field
  • 8+ years of experience in large-scale software engineering, with a focus on deep learning or related domains
  • Proficiency in Python for building and maintaining scalable systems
  • Advanced proficiency in C++, with an emphasis on multi-threaded programming, performance optimization, and system-level development
  • Demonstrated experience driving cross-functional projects
  • Experience building and scaling large-scale inference systems for LLMs or multimodal models
  • Familiarity with LLM serving frameworks, such as vLLM, SGLang, and TensorRT-LLM
  • Solid understanding of software architectural patterns for large-scale, high-performance applications
  • Hands-on experience with ML frameworks, such as PyTorch, and a strong understanding of their underlying architectures
  • Strong problem-solving skills, with the ability to balance technical depth with practical implementation constraints
  • Exceptional communication and presentation skills, with the ability to work both independently and collaboratively across multidisciplinary teams

What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026


Similar Jobs for Senior Inference ML Runtime Engineer

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements:
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility:
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Fulltime

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location:
United States
Salary:
210000.00 - 309000.00 USD / Year
Assembly
Expiration Date
Until further notice
Requirements:
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility:
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer:
  • competitive equity grants
  • 100% employer-paid benefits
  • flexibility of being fully remote
  • Fulltime

Senior Software Engineer

At JFrog, we’re reinventing DevOps and MLOps to help the world’s greatest compan...
Location:
Israel, Netanya/Tel Aviv
Salary:
Not provided
JFrog
Expiration Date
Until further notice
Requirements:
  • 5+ years of proven experience in software development
  • Strong background in designing, developing, and debugging complex distributed systems (e.g., microservices, event-driven architectures)
  • Hands-on experience with containerized environments, microservices, and Kubernetes
  • Proven experience with at least one major cloud provider (e.g., AWS, GCP, Azure)
  • Ability to lead technical discussions, mentor engineers, and drive architectural decisions
Job Responsibility:
  • Be an integral part of a highly skilled team working to build the leading MLOps platform in the industry
  • Maintain and evolve the Runtime team’s products, ensuring their reliability and scalability
  • Design and develop a complete hosting system that supports various types of inference, analytics, monitoring, distribution, and more – enabling customers to run large-scale real-time, batch, and streaming ML pipelines
  • Play a key role in shaping our cross-company engineering culture
  • Conduct high-quality design reviews with a strong emphasis on scalability, maintainability, security, and sound use of design patterns
  • Write maintainable, well-tested code in multiple programming languages
  • Continuously improve the efficiency, scalability, and stability of critical system components

Senior Machine Learning Engineer

Start.io, a leading mobile marketing and audience platform, empowers the app eco...
Location:
Salary:
Not provided
Start.io
Expiration Date
Until further notice
Requirements:
  • B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related technical discipline
  • 5+ years of experience building high-performance backend or ML inference systems
  • Deep expertise in Python and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML)
  • Experience with scalable service architecture, message queues (Kafka, Pub/Sub), and async processing
  • Strong understanding of model deployment practices, online/offline feature parity, and real-time monitoring
  • Experience in cloud environments (AWS, GCP, or OCI) and container orchestration (Kubernetes)
  • Experience working with in-memory and NoSQL databases (e.g. Aerospike, Redis, Bigtable) to support ultra-fast data access in production-grade ML services
  • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) and best practices for alerting and diagnostics
  • A strong sense of ownership and the ability to drive solutions end-to-end
  • Passion for performance, clean architecture, and impactful systems
Job Responsibility:
  • Own and lead the design and development of low-latency Algo inference services handling billions of requests per day
  • Build and scale robust real-time decision-making engines, integrating ML models with business logic under strict SLAs
  • Collaborate closely with DS to deploy models seamlessly and reliably in production
  • Design systems for model versioning, shadowing, and A/B testing at runtime
  • Ensure high availability, scalability, and observability of production systems
  • Continuously optimize latency, throughput, and cost-efficiency using modern tooling and techniques
  • Work independently while interfacing with cross-functional stakeholders from Algo, Infra, Product, Engineering, BA & Business
What we offer:
  • Lead the mission-critical inference engine that drives our core product
  • Join a high-caliber Algo group solving real-time, large-scale, high-stakes problems
  • Work on systems where every millisecond matters, and every decision drives real value
  • Enjoy a fast-paced, collaborative, and empowered culture with full ownership of your domain

Senior AI Backend Engineer

As a core member of our AI Engineering team, you will collaborate with data scie...
Location:
Israel, Tel-Aviv
Salary:
Not provided
K Health
Expiration Date
Until further notice
Requirements:
  • 6+ years of backend engineering experience
  • Strong proficiency in more than one major programming language (such as Python, Java, Go, Rust, or Kotlin)
  • Solid understanding of AI systems architecture and experience working in environments involving AI agents, LLMs, or inference pipelines
  • Proven experience in building and scaling backend APIs, microservices, and background jobs
  • Strong experience with relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Redis), including schema design, data modeling, and query optimization
  • Experience with message brokers and distributed systems (e.g., Pub/Sub, Redis Streams, RabbitMQ)
  • Highly adaptable, a strong team player, with the ability to quickly learn and apply new technologies
Job Responsibility:
  • Build and maintain reliable, scalable backend services to support AI agent execution and orchestration
  • Develop AI agent systems for complex operational workflows using LangChain, LangGraph, LiteLLM, and Langfuse
  • Orchestrate a hybrid model stack that includes OpenAI and Google Gemini alongside self-hosted and fine-tuned LLMs like Gemma and Llama
  • Build and maintain integrations with clinical systems (FHIR, EMR)
  • Drive observability and reliability using OpenTelemetry, Datadog, and Langfuse
  • Design APIs (GraphQL, REST), background workers, and event-driven systems that interface with AI inference engines and agent runtimes
  • Collaborate with Data Science, ML, and engineering teams to deploy AI features and improve the performance, scalability, and reliability of backend systems
  • Participate in code reviews, knowledge sharing, and mentoring to elevate the team’s technical capabilities
What we offer:
  • Competitive compensation packages based on industry benchmarks for function, level, and geographic location

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Senior Software Development Engineer in Test (SDET) - AI Cluster Networking and Security

In AI infrastructure organization, simplifying large hardware deployments with p...
Location:
India, Bengaluru
Salary:
Not provided
Cerebras Systems
Expiration Date
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, AI, Data Science, or a related field
  • 10+ years of experience testing in areas such as enterprise software, distributed systems, or datacenter hardware and software
  • Experience working in large enterprise or cloud networking infrastructure, high speed switches, routers, firewalls
  • Experience in qualifying networking vendor platforms like Juniper, Arista or Cisco and network test equipment like Ixia/Spirent
  • Experience in Datacenter technology like BGP, ECN, PFC
  • Experience testing networking security, compliance and firewalls
  • Strong coding skills in at least one programming language such as Python, Go, or C/C++
  • Strong debugging skills to debug issues in large distributed systems, hardware, and software. Experience with debugging tools like gdb, strace, networking monitors
  • Strong understanding of operating systems internals like memory management, file system working, security basics and performance
  • Strong understanding of datacenter layout, device performance characteristics like PCIe, networking and storage
Job Responsibility:
  • Innovate and execute tests on cutting edge AI infrastructure
  • Define optimized test strategies and methodologies
  • Be a quick learner, adapt to new technologies
  • Build a strong understanding of how to break these large distributed-systems challenges into smaller components that can be unit tested
  • Take an automation-first approach: aim for 100% automated tests covering all cluster features across high availability, failure scenarios, performance, stress, and security
  • Champion cluster security, reliability (targeting 99.9999% uptime), and ease of use with observability
  • Test all components of the AI cluster, including cluster software (Kubernetes, Prometheus, Grafana) and cluster hardware such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights over the MemoryX interconnect
  • Qualify cluster networking solutions consisting of high-speed switches, routers, and optics from various vendors
  • Qualify cluster security features including OS security, network security, cloud compliance, user access, and security certifications
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Senior AI/ML Validation Engineer

We are seeking an experienced and versatile professional with expertise in valid...
Location:
India, Hyderabad
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements:
  • 8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
  • Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
  • Proven expertise testing drivers and firmware with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models
  • Failure modes: error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles
  • Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems
  • Ability to read kernel logs and symbols, trace with ftrace/perf/ETW, and perform cross-layer debugging
  • Build custom kernels/modules and analyze crash dumps (kdump, WinDbg)
  • Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation
  • C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks
  • Bash/PowerShell for environment setup, CI scripting, and reproducibility
Job Responsibility:
  • Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
  • Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
  • Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
  • Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
  • Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
  • Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
  • Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
  • Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
What we offer:
  • AMD benefits at a glance