Senior Runtime Engineer

Cerebras Systems

Location:
United States; Canada, Sunnyvale

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are building the next generation of large-scale AI systems that power training and inference workloads at unprecedented scale and efficiency. You will design and develop high-performance distributed software that orchestrates large compute and data pipelines across heterogeneous clusters. Your work will push the limits of concurrency, throughput, and scalability, enabling models to run efficiently at massive scale. This role sits at the intersection of systems engineering and machine learning performance and demands both architectural depth and low-level implementation skills. You will help shape how models are executed and optimized end to end, from data ingestion to distributed execution, across cutting-edge hardware platforms. We’re hiring for runtime roles across both Training and Inference.

Job Responsibility:

  • Design and implement distributed runtime components to efficiently manage large-scale execution workloads
  • Develop and optimize high-performance data and communication pipelines that fully utilize CPU, memory, storage, and network resources
  • Enable scalable execution across multiple compute nodes, ensuring high concurrency and minimal bottlenecks
  • Collaborate closely with ML and compiler teams to integrate new model architectures, training regimes, and hardware-specific optimizations
  • Diagnose and resolve complex performance issues across the software stack using profiling and instrumentation tools
  • Contribute to overall system design, architecture reviews, and roadmap planning for large-scale AI workloads

Requirements:

  • 3+ years of experience developing high-performance or distributed system software
  • Strong programming skills in C/C++, with expertise in multi-threading, memory management, and performance optimization
  • Experience with distributed systems, networking, or inter-process communication
  • Solid understanding of data structures, concurrency, and system-level resource management (CPU, I/O, and memory)
  • Proven ability to debug, profile, and optimize code across scales, from threads to clusters
  • Bachelor’s, Master’s, or equivalent experience in Computer Science, Electrical Engineering, or related field

Nice to have:

  • Familiarity with machine learning training or inference pipelines, especially distributed training and large-model scaling
  • Exposure to Python and PyTorch, particularly in the context of model training or performance tuning
  • Experience with compiler internals, custom hardware interfaces, or low-level protocol design
  • Prior work on high-performance clusters, HPC systems, or custom hardware/software co-design
  • Deep curiosity about how to unlock new levels of performance for large-scale AI workloads

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for Senior Runtime Engineer

Senior Software Engineer, AI Runtime

We’re seeking a Senior Software Engineer to help power the future of agentic AI ...
Location
United States
Salary:
157000.00 - 198900.00 USD / Year
Apollo GraphQL
Expiration Date
Until further notice
Requirements
  • Expertise in agent-to-tool orchestration, routing, and coordination in scalable, fault-tolerant systems
  • Deep expertise in the Rust programming language
  • Strong background in distributed systems, server architecture, and high-performance backend development
  • Proven experience with protocol design, message routing, and server-side orchestration frameworks
  • Experience building and maintaining robust runtime infrastructure that supports AI-driven workflows and enables reliable agent-to-tool interactions
  • Proven experience with protocol design, message routing, and building server-side frameworks that enable scalable, reliable multi-tool agent workflows
  • Hands-on experience with observability, monitoring, and debugging frameworks for complex systems
  • Passion for clean, maintainable code, high system reliability, and scalable architecture
  • Experience in strategic system design, making architectural trade-offs, and planning for long-term scalability and maintainability
  • Strong technical leadership and mentorship, including guiding junior engineers and driving engineering best practices across teams
Job Responsibility
  • Scale an enterprise AI/MCP Server and Gateway that powers multi-agent workflows across Apollo, including routing, orchestration, and integration boundaries
  • Implement robust server infrastructure to ensure reliability, performance, and security at scale
  • Build and maintain tools for agent discovery, communication, and coordination
  • Define deployment strategies and runtime optimizations to maximize efficiency and minimize operational overhead
  • Develop frameworks and patterns that enable seamless multi-agent collaboration and AI-driven orchestration
  • Integrate observability, logging, and monitoring for full visibility into server and agent behavior
  • Explore and implement AI-enhanced developer workflows to optimize orchestration and agent interactions
  • Collaborate with teams within our org to ensure the MCP Server meets evolving product and developer needs

Senior Engineer

We’re seeking a hands-on Senior Engineer with strong Unreal Engine expertise and...
Location
United States, Loveland, Colorado
Salary:
105000.00 - 135000.00 USD / Year
Snail Games
Expiration Date
Until further notice
Requirements
  • 5+ years professional software development experience (games preferred)
  • Strong Unreal Engine experience (UE4 or UE5), including Blueprints and C++
  • Demonstrated experience with console development and/or porting projects
  • Strong debugging, profiling, and performance optimization skills across multiple platforms
  • Ability to work full-stack across gameplay, tools, build systems, and runtime features
  • Clear communication skills and the ability to work effectively in a small, fast-moving team
Job Responsibility
  • Implement and maintain gameplay systems, tools, pipelines, and technical features across Unreal Engine projects
  • Own console development workflows including optimization, platform-specific features, certification prep, and debugging across Xbox, PlayStation, Switch, and PC
  • Support porting initiatives by profiling performance, addressing platform constraints, and ensuring compliance with TRCs/XRs
  • Operate as a full-stack engineer: contribute to gameplay code, tools, build/CI improvements, and runtime systems as needed
  • Collaborate with the Technical Director on architecture decisions, performance budgets, and risk assessment
  • Work closely with design, art, and production to estimate tasks, scope technical needs, and maintain alignment across disciplines
  • Conduct code reviews, uphold engineering standards, and contribute to improving workflows and pipelines
What we offer
  • True focus on work/life balance
  • Paid company holidays, vacation, and separate sick leave
  • Medical, dental, vision, and Life/LTD
  • 401k with company match
  • Full-time

Senior Security Engineer

PagerDuty is seeking a Senior Security Engineer to join our diverse, customer-fo...
Location
Canada, Toronto
Salary:
137000.00 - 207000.00 CAD / Year
PagerDuty
Expiration Date
Until further notice
Requirements
  • Proficiency with Application & Product Security typically associated with 4 - 5 years of experience in a Security Engineering role working with a cloud-native, microservices environment, preferably AWS
  • Familiarity with cloud-native product technologies, including:
    • Vulnerability detection via multiple approaches, including SAST, DAST, SCA, and runtime (e.g., Qualys/Nessus, Wiz, Snyk, GHAS, Semgrep)
    • CI/CD technologies and integrations (e.g., CircleCI, Buildkite, Helm, Terraform, Chef)
    • Product security event logging standards and analysis tools (e.g., a SIEM such as Sumo Logic, LogRhythm, or Splunk)
    • Security Incident Response & Risk Management processes and tools
  • Proficiency in at least one programming language and framework (e.g. Python, Bash, Phoenix/Elixir, Java, Ruby on Rails), typically associated with 3 - 4 years of experience with the language/framework
  • Exceptional written and oral communication skills, plus strong interpersonal skills
  • Organizational skills with the ability to successfully manage multiple priorities and deadlines
Job Responsibility
  • Embrace the role of hands-on technical lead in defining product security standards and guiding platform protections
  • Establish criteria and conduct comprehensive security reviews throughout all stages of product development to identify and address security risks
  • Perform regular threat assessments, coordinate with third-party testers for penetration testing, and conduct internal penetration testing to identify and mitigate security risks
  • Mentor and guide team members to ensure product and business objectives are prioritized in project implementations, fostering a strong documentation culture with project charters and design documents
  • Work with loosely defined requirements where you exercise your analytical skills to clarify questions, share your approach, and collaborate with the team to design and implement effective security frameworks. Maintain a strong appetite for challenging problems with a high degree of ownership
  • Participate in the team’s On-Call rotation, triaging and addressing security issues as they arise, and implement measures to prevent future occurrences
  • Enable service team security implementations by developing security-as-code constructs, including infrastructure-as-code (IaC) modules, libraries and frontend components, while creating and maintaining developer-focused documentation to promote easy adoption
  • Establish and uphold baseline standards and hardened configurations for platform components
  • Continuously enhance security frameworks by focusing on product security standards and software supply chain protections, tailored for application security in cloud-native, microservices environments
What we offer
  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent (some countries have longer leave standards and we comply with local laws)
  • Full-time

Senior Platform Engineer

Glide is looking for a Senior Platform Engineer to join our Infrastructure team ...
Salary:
Not provided
Glide
Expiration Date
Until further notice
Requirements
  • 5+ years of experience as a platform engineer/SRE
  • 3+ years of experience building and maintaining highly available, scalable distributed data sources
  • Experience with Google Cloud Platform services like Cloud SQL, Cloud Run, AlloyDB, or equivalent
  • Experience orchestrating complex systems with Kubernetes
  • Proficiency in TypeScript development
  • Strong SQL skills, including the ability to speak to covering-index optimization strategies
  • Experience designing, building and running data-intensive event-driven architectures
  • You are a clear and effective communicator, be it when you write code, write emails, or explain complex technical issues to non-technical co-workers
  • Passionate and self-motivated, with a demonstrated ability to work in a fast-paced and evolving environment
Job Responsibility
  • Managing our existing infrastructure in GCP
  • Driving our platform's evolution as our product continues to grow in complexity and sophistication
  • Managing our GitHub/GitHub Actions-based build pipeline
  • Provide build, test, and runtime infrastructure to service teams
  • Ensure patterns are established (e.g., for database throttling, request rate limiting, etc.) to protect Glide’s uptime
  • Monitor infrastructure costs and coordinate improvements when necessary
  • Drive SRE tooling and best practices around observability and alerting
  • Write, review, and maintain code primarily in TypeScript
  • Write architecture briefs and proposals, carry out code experiments, and build prototypes to learn how we can achieve reliable scale with our systems
  • Provide technical leadership, mentorship, pairing opportunities, and code review to encourage the growth of others
What we offer
  • competitive salary and benefits package
  • a supportive and dynamic remote work environment
  • opportunities for career growth
  • Full-time

Senior Software Engineer, Rust

We’re helping organizations deploy supergraphs at scale using Apollo Federation....
Location
United States
Salary:
157300.00 - 198900.00 USD / Year
Apollo GraphQL
Expiration Date
Until further notice
Requirements
  • Experience with Rust and enjoyment in writing performant, maintainable code
  • Expertise in systems engineering, including knowledge of stateless/fault-tolerant systems, event-driven patterns, and distributed paradigms
  • Excel at cross-team collaboration and have a “rising tide lifts all boats” mentality
  • Passionate about GraphQL, modern developer tooling, and contributing to industry-leading innovations
  • Have a growth mindset and actively seek opportunities to learn and stay current with industry trends
Job Responsibility
  • Build, test, and maintain fault-tolerant infrastructure for GraphQL runtime platforms, primarily in idiomatic Rust
  • Operate and improve durable, stable public APIs used by the world’s most demanding GraphQL workloads
  • Engage directly with users to understand their needs, debug issues, and bring insights back to influence the platform’s evolution
  • Design scalable, observable systems that integrate seamlessly into diverse customer infrastructure stacks
  • Collaborate with engineers across teams using supportive communication and constructive code reviews
  • Mentor and guide teammates in architecting and writing idiomatic Rust code
  • Lead architectural discussions and cross-team initiatives
  • Develop comprehensive technical designs and documentation that address cost efficiency, security, and observability
  • Participate in on-call rotations to ensure the reliability of mission-critical systems
What we offer
  • Equity
  • Choice of 3 Anthem Blue Cross medical plans
  • Dental and Vision benefits provided by Sun Life Financial
  • Full-time

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product and ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Full-time

Senior Frontend Engineer

NorthBay is seeking a Senior Front-End Engineer with deep expertise in JavaScrip...
Salary:
Not provided
NorthBay
Expiration Date
Until further notice
Requirements
  • 5+ years of front-end development experience with strong command over JavaScript and TypeScript
  • Deep expertise in React, Vue.js (Nuxt3), Next.js, and Svelte
  • Strong knowledge of responsive design, accessibility (WCAG), and front-end performance optimization
  • Experience with Lit.js for building modular and reusable UI components
  • Familiarity with Git workflows, modern CI/CD pipelines, and build tools like Vite, Webpack, or Rollup
Job Responsibility
  • Lead the development of scalable and dynamic front-end applications using React, Vue.js (Nuxt3), Next.js, Svelte, and TypeScript
  • Build reusable components and design systems, including lightweight Lit.js web components
  • Optimize front-end performance through SSR, code splitting, lazy loading, hydration, and runtime optimization
  • Collaborate closely with ML engineers, backend developers, and designers to deliver intuitive, AI-powered user experiences
  • Work with state management libraries like Redux, Vuex, Pinia, or Svelte Stores
  • Build accessible and responsive applications that are cross-browser and cross-device compatible
  • Contribute to testing, documentation, and CI/CD automation to maintain high code quality
  • Participate in architectural discussions, code reviews, and mentor junior front-end engineers
  • Full-time

Senior Infrastructure Engineer

Coralogix is seeking a Senior Infrastructure Engineer to join our Core SRE team ...
Location
Germany, Berlin
Salary:
Not provided
Coralogix
Expiration Date
Until further notice
Requirements
  • 5+ years of experience in DevOps, SRE, platform engineering, or infrastructure roles
  • Deep understanding of Kubernetes: API, CNI, scheduling, container runtimes, and related internals
  • Strong hands-on experience with Kafka and Istio (or similar technologies) and core networking protocols (HTTP, gRPC, TLS)
  • Proven experience managing large-scale cloud infrastructure (AWS, GCP, etc.)
  • Experience in incident response and troubleshooting complex distributed systems
  • Some software engineering experience, preferably in Golang
  • Passion for automation, performance tuning, and operational excellence
Job Responsibility
  • Act as a hands-on technical leader with deep expertise in modern cloud infrastructure
  • Serve as a go-to person in the team — leading through influence, not hierarchy
  • Collaborate cross-functionally to refine requirements and propose innovative, scalable solutions
  • Drive long-term, high-impact infrastructure projects across multiple teams, from design to implementation, within defined timelines
  • Contribute to improving system reliability, performance, and cost-efficiency at scale
  • Full-time