Software Engineer, Load Balancing - Inference

OpenAI

Location:
San Francisco, United States

Contract Type:
Not provided

Salary:
293,000 - 490,000 USD / Year

Job Description:

We’re looking for a senior engineer to design and build the load balancer that will sit at the very front of our research inference stack - routing the world’s largest AI models with millisecond precision and bulletproof reliability. This system will serve research jobs where requests must stay “sticky” to the same model instance for hours or days and where even subtle errors can directly degrade model performance.

Job Responsibility:

  • Architect and build the gateway / network load balancer that fronts all research jobs, ensuring long-lived connections remain consistent and performant
  • Design traffic stickiness and routing strategies that optimize for both reliability and throughput
  • Instrument and debug complex distributed systems, with a focus on building world-class observability and debuggability tools (distributed tracing, logging, metrics)
  • Collaborate closely with researchers and ML engineers to understand how infrastructure decisions impact model performance and training dynamics
  • Own the end-to-end system lifecycle: from design and code to deploy, operate, and scale
  • Work in an outcome-oriented environment where everyone contributes across layers of the stack, from infra plumbing to performance tuning

Requirements:

  • Deep experience designing and operating large-scale distributed systems, particularly load balancers, service gateways, or traffic routing layers
  • 5+ years of experience with the algorithmic and systems challenges of consistent hashing, sticky routing, and low-latency connection management, both designing for them in theory and debugging them in practice
  • 5+ years of experience as a software engineer and systems architect working on high-scale, high-reliability infrastructure
  • Strong debugging mindset; enjoys spending time in traces, logs, and metrics to untangle distributed failures
  • Comfort writing and reviewing production code in Rust or a similar systems language (C/C++, Java, Go, Zig, etc.)
  • Experience operating in big tech or high-growth environments, and eagerness to apply that experience in a faster-moving setting
  • End-to-end ownership of problems, and excitement about building something foundational to how our models interact with the world

Nice to have:

  • Experience with gateway or load balancing systems (e.g., Envoy, gRPC, custom LB implementations)
  • Familiarity with inference workloads (e.g., reinforcement learning, streaming inference, KV cache management)
  • Exposure to debugging and operational excellence practices in large production environments

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided
  • Equity
  • Performance-related bonus(es) for eligible employees

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Work Type:
On-site work
