Software Engineer, Load Balancing - Inference Job at OpenAI (San Francisco)

Software Engineer, Networking - Inference

We’re looking for a senior engineer to design and build the load balancer that w...

Location

United States, San Francisco

Salary:

325000.00 - 490000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Deep experience designing and operating large-scale distributed systems, particularly load balancers, service gateways, or traffic routing layers
5+ years of experience with both the theory and the hands-on debugging of the algorithmic and systems challenges of consistent hashing, sticky routing, and low-latency connection management
5+ years of experience as a software engineer and systems architect working on high-scale, high-reliability infrastructure
Have a strong debugging mindset and enjoy spending time in traces, logs, and metrics to untangle distributed failures
Comfortable writing and reviewing production code in Rust or similar systems languages (C/C++, Java, Go, Zig, etc)
Operated in big tech or high-growth environments and are excited to apply that experience in a faster-moving setting
Take ownership of problems end-to-end and are excited to build something foundational to how our models interact with the world

Job Responsibility

Architect and build the gateway / network load balancer that fronts all research jobs, ensuring long-lived connections remain consistent and performant
Design traffic stickiness and routing strategies that optimize for both reliability and throughput
Instrument and debug complex distributed systems — with a focus on building world-class observability and debuggability tools (distributed tracing, logging, metrics)
Collaborate closely with researchers and ML engineers to understand how infrastructure decisions impact model performance and training dynamics
Own the end-to-end system lifecycle: from design and code to deploy, operate, and scale
Work in an outcome-oriented environment where everyone contributes across layers of the stack, from infra plumbing to performance tuning

What we offer

Offers Equity
Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth

Fulltime

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...

Location

United States, Redmond

Salary:

139900.00 - 331200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
Strong collaboration and communication skills, with the ability to work across organizational boundaries

Job Responsibility

Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement
Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability

Fulltime

Senior Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...

Location

United States, Redmond

Salary:

119800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
Strong collaboration and communication skills, with the ability to work across organizational boundaries.

Job Responsibility

Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement.
Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.

What we offer

Benefits and other compensation

Fulltime

Software Engineer 2

Microsoft Azure AI Inference platform is the next generation cloud business posi...

Location

United States, Redmond

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science or a related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements for this role
Technical background with a solid foundation in software engineering principles, distributed computing, and system architecture
Experience working on high-scale, reliable online systems
Experience with real-time online services requiring low latency and high throughput
Experience working with Layer 7 (L7) network proxies and gateways
Knowledge of network architecture and concepts, including HTTP and TCP protocols, authentication, and session management
Knowledge and experience with OSS, Docker, Kubernetes, C++, Golang, or equivalent programming languages
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
Ability to independently lead projects

Job Responsibility

Design and implement core inference infrastructure for serving frontier AI models in production
Identify and drive improvements to end-to-end inference performance and efficiency of state-of-the-art LLMs and GenAI models from OpenAI, Anthropic and xAI hosted on AI Foundry
Design and implement efficient load scheduling and balancing strategies, by leveraging key insights and features of the model and workload
Scale the platform to support the growing inferencing demand and maintain high availability
Deliver critical capabilities required to serve the latest and greatest GenAI models such as GPT-5, Realtime audio, and Sora, and enable fast time to market for them
Drive generic features to cater to the needs of customers such as GitHub, M365, Microsoft AI and third-party companies
Collaborate with our partners both internal and external
Embody Microsoft's Culture and Values

Fulltime

Software Engineer, Caching Infrastructure

The Caching Infrastructure team is responsible for building a caching layer that...

Location

United States, San Francisco

Salary:

230000.00 - 385000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

5+ years of experience building and scaling distributed systems, with a strong focus on caching, load balancing, or storage systems
Deep expertise with Redis, Memcached, or similar solutions, including clustering, durability configurations, client-side connection patterns, and performance tuning
Production experience with Kubernetes, service meshes (e.g., Envoy), and autoscaling systems
Think rigorously about latency, reliability, throughput, and cost in designing platform capabilities
Thrive in a fast-paced environment and enjoy balancing pragmatic engineering with long-term technical excellence

Job Responsibility

Design, build, and operate OpenAI’s multi-tenant caching platform used across inference, identity, quota, and product experiences
Define the long-term vision and roadmap for caching as a core infra capability, balancing performance, durability, and cost
Collaborate with other infra teams (e.g., networking, observability, databases) and product teams to ensure our caching platform meets their needs

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Sr Staff Engineer Software, Fullstack (Prisma AIRS) - NetSec

Join our team building a cutting-edge multi-tenanted GenAI Security Platform tha...

Location

India, Bengaluru

Salary:

Not provided

Palo Alto Networks

Expiration Date

Until further notice

Requirements

Proven experience building and scaling multi-tenant SaaS platforms with strict data isolation
Strong knowledge of API design, RESTful principles, and OpenAPI specifications
Proficiency in modern JavaScript frameworks (React, Vue, or Svelte) with TypeScript
Experience building data-intensive dashboards with complex visualisations and real-time data
Strong CSS/styling skills and responsive design principles
Demonstrated experience working with production AI/ML systems at scale
Practical experience integrating LLM APIs and managing inference at scale
Understanding of LLM operational challenges: rate limiting, cost optimisation, latency management, fallback strategies
Familiarity with AI agent frameworks (LangChain, AutoGen, MCP, or similar)
Knowledge of prompt engineering, semantic search, and vector databases

Job Responsibility

Design and implement high-performance REST APIs with enterprise-grade multi-tenant isolation and strict security boundaries
Work on distributed systems architecture handling high-throughput workloads with mission-critical uptime requirements
Build responsive dashboards and administrative interfaces for platform management, data visualisation, and system configuration
Integrate multiple LLM providers, implement semantic search capabilities, and build intelligent agent workflows
Architect complex, multi-step AI evaluation pipelines for asynchronous job execution and large-scale data processing
Design and implement database schemas with proper indexing, query optimisation, and data isolation strategies
Build and maintain scalable micro-services with async/await patterns and type-safe code
Develop data-intensive UIs with real-time updates, complex state management, and intuitive user experiences
Deploy and manage containerised applications on Kubernetes with comprehensive observability
Write thorough tests (frontend and backend) and maintain high code quality standards with automated tooling

Fulltime

Site Reliability Engineer

As a Site Reliability Engineer (SRE), you will be a key player in ensuring our p...

Location

Portugal, Lisboa

Salary:

Not provided

Tekever

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field
3+ years of experience in Site Reliability Engineering, DevOps, or a related software/systems engineering role
Proficiency in one or more programming languages such as Python, Go, or Bash for automation and tooling
Deep understanding of Linux/Unix operating systems and networking fundamentals (TCP/IP, DNS, HTTP, load balancing)
Experience with cloud platforms such as AWS, Azure, or Google Cloud, with a focus on Google Cloud
Strong knowledge of CI/CD tools like Jenkins, GitLab CI, or CircleCI
Strong hands-on experience operating Kubernetes in production, including troubleshooting of networking, storage, scheduling, autoscaling, and stateful workloads
Experience with Infrastructure as Code (IaC) tools such as Terraform and Ansible
Understanding of version control systems (e.g., Git) and of CI/CD principles and tools (e.g., GitLab CI, Jenkins)
Knowledge of monitoring, logging and tracing tools (e.g., Prometheus, Grafana, ELK stack)

Job Responsibility

Design, build, and maintain highly available, scalable infrastructure for distributed and stateful workloads, supporting real-time data ingestion, AI inference pipelines, and hybrid cloud/edge deployment
Automate repetitive manual tasks, infrastructure provisioning, and operational workflows to reduce toil and improve system efficiency
Implement and manage robust monitoring, logging, and alerting solutions to proactively detect and address issues
Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Participate in an on-call rotation to respond to production incidents
Lead blameless post-mortem analyses for incidents in complex distributed systems, identifying root causes, systemic weaknesses, and implementing long-term preventative measures
Manage and provision cloud and on-premise infrastructure using IaC principles and tools like Terraform and Ansible
Conduct performance analysis, system tuning, and capacity planning to ensure our services meet performance and cost-efficiency goals
Develop, test, and maintain disaster recovery plans and business continuity strategies to ensure service resilience
Work closely with software development teams to consult on system design, platform choices, and reliability best practices for new features and services

What we offer

An excellent work environment and an opportunity to create a real impact in the world
A truly high-tech, state-of-the-art engineering company with flat structure and no politics
Working with the very latest technologies in Data & AI, including Edge AI, Swarming - both within our software platforms and within our embedded on-board systems
Flexible work arrangements
Professional development opportunities
Collaborative and inclusive work environment
Salary compatible with the level of proven experience

Fulltime

Assistant Director of Video Services

K-State Athletics is seeking qualified applicants for an Assistant Director of V...

Location

United States, Manhattan

Salary:

47500.00 - 50000.00 USD / Year

Kansas State University

Expiration Date

Until further notice

Requirements

Bachelor's degree or equivalent combination of education and training beyond high school, with an emphasis in professional video production, broadcast media, sports media, digital media, or a related field
Strong proficiency in Adobe Premiere Pro, Adobe After Effects, and Adobe Photoshop
Bachelor's degree in video production, broadcast media, sports media, communications, or a related field
Minimum of one year of full-time experience in a collegiate or live-event video production environment
Working knowledge of Ross Video Kiva, Expression, and Acuity systems; Daktronics Show Control; EVS Replay systems; and Yamaha audio boards
Strong verbal and written communication skills with the ability to collaborate effectively across departments and production teams
Demonstrated ability to perform in a fast-paced, high-pressure live production environment while meeting deadlines and adapting to last-minute changes with strong attention to detail

Job Responsibility

Plan, develop, edit, organize, and manage creative content for videoboards and other in-venue display systems
Manage all game day control room operations, including production setup, supervision of student and freelance crew members, live event directing, and overall execution of in-venue videoboard productions
Oversee operation and content management of Daktronics display systems across all athletic venues, ensuring proper functionality and timely content updates
Perform additional duties and special projects as assigned

What we offer

Excellent medical, dental and vision health plans
Retirement plan (No Vesting Period!!)
Generous earned leave plan – vacation and sick
Parental leave plan
Term life insurance
Accidental death and dismemberment insurance
Long term disability insurance
Paid KSU designated holidays

Fulltime
