Systems Engineer (Performance / Runtime / Infrastructure)

Orbis Consultants

Location:
United States, New York City

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We’re working with a high-growth AI company based in New York that has reached $10M ARR within its first year and is now scaling its engineering team.

This is a systems-focused role working on the core execution layer of the platform. The team is building and improving the systems that determine how workloads are executed, scheduled, and optimised in production, with a strong focus on performance, latency, throughput, and reliability under real-world load.

A key part of the work involves running untrusted, AI-generated code safely at scale. This includes building sandboxed execution environments, working with containers and isolation mechanisms, and designing systems that can securely handle thousands of concurrent workloads.

Unlike typical infrastructure roles, this sits much closer to how systems actually behave under the hood. Engineers have direct control over system design, execution, and performance rather than primarily working with higher-level cloud abstractions.

Job Responsibility:

  • Debugging and improving system performance (latency, throughput, efficiency)
  • Identifying bottlenecks and optimising systems under real production load
  • Building and improving runtime systems and execution environments
  • Working on sandboxing, containers, and isolation for running untrusted code
  • Designing systems that handle thousands of concurrent workloads
  • Building orchestration systems for stateless containers
  • Contributing to multi-tenant infrastructure and resource management
  • Working on high-throughput, real-time systems

Requirements:

  • Strong software engineering fundamentals (Go is used, but not required)
  • Experience working on performance-critical or distributed systems
  • Experience with containers and orchestration (e.g. Docker, Kubernetes, container runtimes), with a focus on execution, performance, and systems behaviour
  • Experience or interest in sandboxing, isolation, or execution environments
  • Understanding of concurrency, multithreading, or networking fundamentals
  • Comfortable debugging complex systems in production

Nice to have:

Experience with multi-tenant or high-throughput systems is a plus

What we offer:

Equity

Additional Information:

Job Posted:
May 04, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Systems Engineer (Performance / Runtime / Infrastructure)

Software Engineer, C++ Middleware and Runtime Infrastructure

You will develop and optimize the core infrastructure that facilitates reliable,...
Location:
United States, Santa Clara
Salary:
120000.00 - 200000.00 USD / Year
PlusAI
Expiration Date:
Until further notice
Requirements:
  • BS in Computer Science, Electrical Engineering, or related field
  • Solid hands-on coding experience using C++14 (or later)
  • Strong understanding of Linux development tools, including build systems, compilers, debuggers, and performance analyzers
  • Excellent written and verbal communication skills
  • Proactive problem-solving mindset: ability to identify, propose, and implement solutions
Job Responsibility:
  • Design and optimize low-latency interprocess communication and data flow monitoring
  • Develop high-performance event logging and structured telemetry
  • Implement safety-enhanced memory allocators and efficient memory provisioning policies
  • Build lock-free data structures and algorithms to support real-time system requirements
  • Work on network communication and coherency protocols
  • Develop on-the-fly component health monitoring and rapid response mechanisms for critical events
  • Manage on-vehicle configurations and system state validation
  • Optimize low-level OS interactions and fine-tune system performance
  • Ensure that your work is performed in accordance with the company’s Quality Management System (QMS) requirements and contribute to continuous improvement efforts
  • Ensure team compliance with QMS, monitor quality, and drive process improvements
Employment Type:
Full-time

Senior Software Engineer, AI Runtime

We’re seeking a Senior Software Engineer to help power the future of agentic AI ...
Location:
United States
Salary:
157000.00 - 198900.00 USD / Year
Apollo GraphQL
Expiration Date:
Until further notice
Requirements:
  • Expertise in agent-to-tool orchestration, routing, and coordination in scalable, fault-tolerant systems
  • Deep expertise in Rust programming language
  • Strong background in distributed systems, server architecture, and high-performance backend development
  • Proven experience with protocol design, message routing, and server-side orchestration frameworks
  • Experience building and maintaining robust runtime infrastructure that supports AI-driven workflows and enables reliable agent-to-tool interactions
  • Proven experience with protocol design, message routing, and building server-side frameworks that enable scalable, reliable multi-tool agent workflows
  • Hands-on experience with observability, monitoring, and debugging frameworks for complex systems
  • Passion for clean, maintainable code, high system reliability, and scalable architecture
  • Experience in strategic system design, making architectural trade-offs, and planning for long-term scalability and maintainability
  • Strong technical leadership and mentorship, including guiding junior engineers and driving engineering best practices across teams
Job Responsibility:
  • Scale an enterprise AI/MCP Server and Gateway that powers multi-agent workflows across Apollo, including routing, orchestration, and integration boundaries
  • Implement robust server infrastructure to ensure reliability, performance, and security at scale
  • Build and maintain tools for agent discovery, communication, and coordination
  • Define deployment strategies and runtime optimizations to maximize efficiency and minimize operational overhead
  • Develop frameworks and patterns that enable seamless multi-agent collaboration and AI-driven orchestration
  • Integrate observability, logging, and monitoring for full visibility into server and agent behavior
  • Explore and implement AI-enhanced developer workflows to optimize orchestration and agent interactions
  • Collaborate with teams within our org to ensure the MCP Server meets evolving product and developer needs

Staff Software Engineer, AI Runtime

We’re seeking a Staff Software Engineer to help power the future of agentic AI w...
Location:
United States
Salary:
185000.00 - 215000.00 USD / Year
Apollo GraphQL
Expiration Date:
Until further notice
Requirements:
  • Expertise in agent-to-tool orchestration, routing, and coordination in scalable, fault-tolerant systems
  • Deep expertise in Rust programming language
  • Strong background in distributed systems, server architecture, and high-performance backend development
  • Proven experience with protocol design, message routing, and server-side orchestration frameworks
  • Experience building and maintaining robust runtime infrastructure that supports AI-driven workflows and enables reliable agent-to-tool interactions
  • Proven experience with protocol design, message routing, and building server-side frameworks that enable scalable, reliable multi-tool agent workflows
  • Hands-on experience with observability, monitoring, and debugging frameworks for complex systems
  • Passion for clean, maintainable code, high system reliability, and scalable architecture
  • Experience in strategic system design, making architectural trade-offs, and planning for long-term scalability and maintainability
  • Strong technical leadership and mentorship, including guiding junior engineers and driving engineering best practices across teams
Job Responsibility:
  • Architect and scale an enterprise AI/MCP Server and Gateway that powers multi-agent workflows across Apollo, including routing, orchestration, and integration boundaries
  • Design and implement robust server infrastructure to ensure reliability, performance, and security at scale
  • Build and maintain tools for agent discovery, communication, and coordination
  • Define deployment strategies and runtime optimizations to maximize efficiency and minimize operational overhead
  • Develop frameworks and patterns that enable seamless multi-agent collaboration and AI-driven orchestration
  • Integrate observability, logging, and monitoring for full visibility into server and agent behavior
  • Explore and implement AI-enhanced developer workflows to optimize orchestration and agent interactions
  • Collaborate with teams across Apollo to ensure the MCP Server meets evolving product and developer needs
What we offer:
  • Offers Equity
  • Choice of 3 Anthem Blue Cross medical plans (California residents can also choose from an additional 2 Kaiser medical plans)
  • Dental and Vision benefits are provided by Sun Life Financial
Employment Type:
Full-time

Senior Infrastructure Engineer

Coralogix is seeking a Senior Infrastructure Engineer to join our Core SRE team ...
Location:
Germany, Berlin
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in DevOps, SRE, platform engineering, or infrastructure roles
  • Deep understanding of Kubernetes: API, CNI, scheduling, container runtimes, and related components
  • Strong hands-on experience with Kafka and Istio (or similar technologies), and core networking protocols (HTTP, gRPC, TLS)
  • Proven experience managing large-scale cloud infrastructure (AWS, GCP, etc.)
  • Experience in incident response and troubleshooting complex distributed systems
  • Some software engineering experience, preferably in Golang
  • Passion for automation, performance tuning, and operational excellence
Job Responsibility:
  • Act as a hands-on technical leader with deep expertise in modern cloud infrastructure
  • Serve as a go-to person in the team — leading through influence, not hierarchy
  • Collaborate cross-functionally to refine requirements and propose innovative, scalable solutions
  • Drive long-term, high-impact infrastructure projects across multiple teams, from design to implementation, within defined timelines
  • Contribute to improving system reliability, performance, and cost-efficiency at scale
Employment Type:
Full-time

Senior Infrastructure Engineer

Coralogix is seeking a Senior Infrastructure Engineer to join our Core SRE team ...
Location:
Israel, Ramat Gan
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in DevOps, SRE, platform engineering, or infrastructure roles
  • Deep understanding of Kubernetes: API, CNI, scheduling, container runtimes, and related components
  • Strong hands-on experience with Kafka and Istio (or similar technologies), and core networking protocols (HTTP, gRPC, TLS)
  • Proven experience managing large-scale cloud infrastructure (AWS, GCP, etc.)
  • Experience in incident response and troubleshooting complex distributed systems
  • Some software engineering experience, preferably in Golang
  • Passion for automation, performance tuning, and operational excellence
Job Responsibility:
  • Act as a hands-on technical leader with deep expertise in modern cloud infrastructure
  • Serve as a go-to person in the team — leading through influence, not hierarchy
  • Collaborate cross-functionally to refine requirements and propose innovative, scalable solutions
  • Drive long-term, high-impact infrastructure projects across multiple teams, from design to implementation, within defined timelines
  • Contribute to improving system reliability, performance, and cost-efficiency at scale
Employment Type:
Full-time

Senior Software Engineer, Rust

We’re helping organizations deploy supergraphs at scale using Apollo Federation....
Location:
United States
Salary:
157300.00 - 198900.00 USD / Year
Apollo GraphQL
Expiration Date:
Until further notice
Requirements:
  • Have experience with Rust and enjoy writing performant, maintainable code
  • Expertise in systems engineering, including knowledge of stateless/fault-tolerant systems, event-driven patterns, and distributed paradigms
  • Excel at cross-team collaboration and have a “rising tide lifts all boats” mentality
  • Passionate about GraphQL, modern developer tooling, and contributing to industry-leading innovations
  • Have a growth mindset and actively seek opportunities to learn and stay current with industry trends
Job Responsibility:
  • Build, test, and maintain fault-tolerant infrastructure for GraphQL runtime platforms, primarily in idiomatic Rust
  • Operate and improve durable, stable public APIs used by the world’s most demanding GraphQL workloads
  • Engage directly with users to understand their needs, debug issues, and bring insights back to influence the platform’s evolution
  • Design scalable, observable systems that integrate seamlessly into diverse customer infrastructure stacks
  • Collaborate with engineers across teams using supportive communication and constructive code reviews
  • Mentor and guide teammates in architecting and writing idiomatic Rust code
  • Lead architectural discussions and cross-team initiatives
  • Develop comprehensive technical designs and documentation that address cost efficiency, security, and observability
  • Participate in on-call rotations to ensure the reliability of mission-critical systems
What we offer:
  • Equity
  • Choice of 3 Anthem Blue Cross medical plans
  • Dental and Vision benefits provided by Sun Life Financial
Employment Type:
Full-time

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility:
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type:
Full-time

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Strong background in one or more of the following areas: AI accelerator or GPU architectures; distributed systems and large-scale AI training/inference; high-performance computing (HPC) and collective communications; ML systems, runtimes, or compilers; performance modeling, benchmarking, and systems analysis; hardware–software co-design for AI workloads
  • Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
  • Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Job Responsibility:
  • Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
  • Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
  • Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
  • Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
  • Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
  • Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
  • Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
Employment Type:
Full-time