Software Engineer, Networking - Inference

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

325000.00 - 490000.00 USD / Year

Job Description:

We’re looking for a senior engineer to design and build the load balancer that will sit at the very front of our research inference stack - routing the world’s largest AI models with millisecond precision and bulletproof reliability. This system will serve research jobs where requests must stay “sticky” to the same model instance for hours or days and where even subtle errors can directly degrade model performance.

Job Responsibility:

  • Architect and build the gateway / network load balancer that fronts all research jobs, ensuring long-lived connections remain consistent and performant
  • Design traffic stickiness and routing strategies that optimize for both reliability and throughput
  • Instrument and debug complex distributed systems — with a focus on building world-class observability and debuggability tools (distributed tracing, logging, metrics)
  • Collaborate closely with researchers and ML engineers to understand how infrastructure decisions impact model performance and training dynamics
  • Own the end-to-end system lifecycle: from design and code to deploy, operate, and scale
  • Work in an outcome-oriented environment where everyone contributes across layers of the stack, from infra plumbing to performance tuning

Requirements:

  • Deep experience designing and operating large-scale distributed systems, particularly load balancers, service gateways, or traffic routing layers
  • 5+ years of experience with the algorithmic and systems challenges of consistent hashing, sticky routing, and low-latency connection management, both in design and in hands-on debugging
  • 5+ years of experience as a software engineer and systems architect working on high-scale, high-reliability infrastructure
  • Strong debugging mindset and enjoy spending time in tracing, logs, and metrics to untangle distributed failures
  • Comfortable writing and reviewing production code in Rust or similar systems languages (C/C++, Java, Go, Zig, etc.)
  • Operated in big tech or high-growth environments and are excited to apply that experience in a faster-moving setting
  • Take ownership of problems end-to-end and are excited to build something foundational to how our models interact with the world
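
The consistent hashing and sticky routing named in the requirements above can be illustrated in a few lines. The following is a minimal Python sketch under assumed names (`ConsistentHashRing`, `route`, the `"node#vnode"` key scheme are all illustrative), not the posting's actual implementation:

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring: a request key always routes to the
    same backend instance ("stickiness"), and removing one instance only
    remaps the keys that instance was serving."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes          # virtual nodes per backend, for balance
        self._ring = []               # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Deterministic hash so routing is stable across processes.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def route(self, key):
        if not self._ring:
            raise LookupError("no backends registered")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]
```

The sticky property follows from the ring structure: a key's position is fixed by its hash, so as long as the backend owning that arc stays up, every request for that key lands on the same instance.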

Nice to have:

  • Experience with gateway or load balancing systems (e.g., Envoy, gRPC, custom LB implementations)
  • Familiarity with inference workloads (e.g., reinforcement learning, streaming inference, KV cache management, etc.)
  • Exposure to debugging and operational excellence practices in large production environments

What we offer:
  • Offers Equity
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Software Engineer, Networking - Inference

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility:
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Full-time

Software Engineer Staff

This Software Engineer Staff will be engaged in data science-related research an...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Utilize analytical and programming skills and open-source systems such as Apache Storm, Apache Spark, Elasticsearch, Cassandra, and graph databases to develop data processing pipelines with the required efficacy and latency
  • Good knowledge and experience with big data tool sets and techniques for distributed storage and computation engines
  • Experience developing reusable and highly scalable data processing components
  • Good knowledge and experience working with cloud-based CI/CD tools and cloud DevOps teams to collect stats and create monitors for data processing pipelines
  • Develop good quality Python APIs to support microservices
  • Knowledge of APIs for various NoSQL storage systems such as Elasticsearch, Cassandra, and Redis
  • Good understanding of Python Flask web services and the ability to develop good quality code
  • Troubleshoot production environment and customer-reported issues
  • Knowledge of multi-cloud production environments
  • Agility to troubleshoot open-source data processing engines such as Apache Spark, Apache Storm, and Apache Flink
Job Responsibility:
  • Designs, develops, troubleshoots and debugs software programs for software enhancements and new products
  • Develops software including operating systems, compilers, routers, networks, utilities, databases and Internet-related tools
  • Determines hardware compatibility and/or influences hardware design
  • Engaged in data science-related research and software application development and engineering duties related to our enterprise-grade Wi-Fi technology and autonomous platform, providing unprecedented visibility into the user experience
  • Collaborate with other engineers and product managers to build the next generation of autonomous Wi-Fi networks leveraging big data and predictive models
  • Use knowledge of wireless communication networks, machine learning and software engineering to develop and implement scalable algorithms to process a large amount of streaming data to detect anomalies, predict problems, and classify them in real-time
  • Leverage the data collected from the Wi-Fi network to empower the inference engine of our Mist platform and systems, including the Mist virtual assistant chat bot
  • Determine the likelihood of failures across the Wi-Fi network and perform failure scope analysis
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Full-time

Principal Software Engineer

Principal Software Engineer role at Hewlett Packard Enterprise to design, develo...
Location:
United States, San Jose
Salary:
148000.00 - 340500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
  • 10+ years of experience in software engineering with a focus on Python, Go or Java
  • Strong understanding of RESTful API design and development
  • 2+ years of experience working with large-scale distributed systems based on either cloud technologies or Kubernetes
  • 2+ years of experience with event-driven technologies like Kafka and Apache Storm/Flink
  • 2+ years of experience with big data technologies like Apache Spark/Databricks
  • Proficient in working with Redis and databases like Cassandra/Datastax
  • Must hold U.S. citizenship
Job Responsibility:
  • Design, develop, and test software related to the cloud-based network configuration and reporting system
  • Solve complex problems and design subsystems for the Mist platform
  • Develop software for highly scalable and fault-tolerant cloud-scale distributed applications
  • Develop microservices using Python, and/or Go (golang)
  • Develop event-driven systems using Python and Java
  • Develop software for AIDE's real-time data pipeline and batch processing
  • Develop ETL pipelines aiding in training and inference of various ML models using big-data frameworks like Apache Spark
  • Build metrics, monitoring and structured logging into the product
  • Write unit, integration and functional tests
  • Participate in collaborative, DevOps style, lean practices
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive benefits suite supporting physical, financial and emotional wellbeing
  • Full-time

Principal Software Engineer

Principal Software Engineer role at Hewlett Packard Enterprise to design, develo...
Location:
United States, San Jose
Salary:
148000.00 - 340500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
  • 10+ years of experience in software engineering with a focus on Python, Go or Java
  • Strong understanding of RESTful API design and development
  • 2+ years of experience working with large-scale distributed systems based on either cloud technologies or Kubernetes
  • 2+ years of experience with event-driven technologies like Kafka and Apache Storm/Flink
  • 2+ years of experience with big data technologies like Apache Spark/Databricks
  • Proficient in working with Redis and databases like Cassandra/Datastax
  • Excellent problem-solving and analytical skills
  • Strong communication and collaboration skills
Job Responsibility:
  • Design, develop, and test software related to the cloud-based network configuration and reporting system
  • Solve complex problems and design subsystems for the Mist platform
  • Develop software for highly scalable and fault-tolerant cloud-scale distributed applications
  • Develop microservices using Python, and/or Go (golang)
  • Develop event-driven systems using Python and Java
  • Develop software for AIDE's real-time data pipeline and batch processing
  • Develop ETL pipelines aiding in training and inference of various ML models using big-data frameworks like Apache Spark
  • Build metrics, monitoring and structured logging into the product
  • Write unit, integration and functional tests
  • Participate in collaborative, DevOps style, lean practices
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Full-time

Software Engineer, AI Infrastructure

As a Software Engineer on our AI Infrastructure team, you will help design the c...
Location:
United States, New York, NY; San Mateo, CA
Salary:
Not provided
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 3 years of experience in software engineering, with a focus on infrastructure or machine learning systems
  • Strong programming skills in Python, Go, or a similar language
  • Proven experience in ML infrastructure and tooling (e.g., PyTorch, MLflow, Vertex AI, SageMaker, Kubernetes, etc.)
  • Basic understanding of LLM concepts (e.g., context length, disaggregated prefill, KV cache memory estimation, etc.)
Job Responsibility:
  • Contribute to the design and development of scalable backend infrastructure that supports distributed training, inference, and data pipelines
  • Build and maintain core backend services such as LLM CI/CD pipeline, control plane, and model serving systems
  • Support performance optimization, cost efficiency, and reliability improvements across compute, storage, and networking layers
  • Build frameworks and safeguards to ensure Fireworks AI has the best model quality in the industry
  • Collaborate with performance, training, and product teams to translate research and product needs into infrastructure solutions
  • Participate in code reviews, technical discussions, and continuous integration and deployment processes
What we offer:
  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure
  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally
  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results
  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation
  • Full-time

AI Software Engineer - NLP/LLM

At Moody's, we unite the brightest minds to turn today’s risks into tomorrow’s o...
Location:
United States, New York
Salary:
159300.00 - 230850.00 USD / Year
Moody's
Expiration Date:
Until further notice
Requirements:
  • 5+ years of demonstrated experience building production-grade machine learning systems with measurable impacts; expertise in NLP and search and recommendation systems is preferred
  • Hands-on experience with large language model (LLM) applications and AI agents, including retrieval-augmented generation, prompt optimization, fine-tuning, agent design, and evaluation methodologies; familiarity with prompt optimization frameworks like DSPy is preferred
  • Deep expertise in machine learning models and systems design, including classic models (e.g., XGBoost), modern deep learning and graph machine learning architectures (e.g., transformers-based models, graph neural networks (GNN)), and reinforcement learning systems
  • Proven ability to take models and agents from research to production, including optimization for latency and cost, implementation of monitoring and tracing, and development of reusable platforms or frameworks
  • Strong technical leadership and mentorship skills, with a track record of growing engineers, improving team velocity through automation, documentation, and tooling, and influencing architectural decisions without direct authority
  • Excellent communication and strategic thinking abilities, capable of aligning technical decisions with business outcomes, navigating ambiguity, and driving cross-functional collaboration
  • Bachelor’s degree or higher in Computer Science, Engineering, or a related field
Job Responsibility:
  • Design and deploy end-to-end AI and machine learning solutions including machine learning and graph-based models, natural language processing (NLP) models, and large language model (LLM) based AI agents
  • Build robust pipelines for data ingestion, feature engineering, model training, validation, and real-time or batch inference
  • Develop and integrate large language model (LLM) applications using techniques such as fine-tuning, retrieval-augmented generation, and reinforcement learning
  • Build autonomous agents capable of multi-step reasoning and tool use in production environments
  • Lead the full model and agent development lifecycle, from problem definition and data exploration through experimentation, implementation, deployment, and monitoring
  • Ensure solutions are scalable, reliable, and aligned with business goals
  • Advocate and implement machine learning operations (MLOps) best practices including data monitoring and tracing, error analysis, automated retraining, model and prompt versioning, business metrics monitoring, and incident response
  • Collaborate across disciplines and provide technical leadership, working with product managers, engineers, and researchers to deliver impactful solutions
  • Mentor team members, lead design reviews, and promote best practices in AI and machine learning systems development
What we offer:
  • medical
  • dental
  • vision
  • parental leave
  • paid time off
  • a 401(k) plan with employee and company contribution opportunities
  • life, disability, and accident insurance
  • a discounted employee stock purchase plan
  • tuition reimbursement
  • Full-time

Software Engineer, Infrastructure

As a Software Engineer on our Infrastructure team, you will help design and buil...
Location:
United States, New York; San Mateo; Redwood City
Salary:
140000.00 - 150000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • Strong programming skills in Python, C++, or a similar language
  • Solid understanding of computer systems concepts such as networking, storage, and distributed computing
  • Familiarity with cloud platforms like AWS, GCP, or Azure, and containerization tools like Docker or Kubernetes
  • Knowledge and interest in cloud infrastructure, distributed systems, and machine learning
Job Responsibility:
  • Contribute to the design and development of scalable backend infrastructure that supports distributed training, inference, and data pipelines
  • Build and maintain core backend services such as job schedulers, autoscalers, resource managers, and model serving systems
  • Support performance optimization, cost efficiency, and reliability improvements across compute, storage, and networking layers
  • Collaborate with ML, DevOps, and product teams to translate research and product needs into infrastructure solutions
  • Learn and apply modern cloud technologies including Kubernetes, Ray, Kubeflow, and MLFlow
  • Participate in code reviews, technical discussions, and continuous integration and deployment processes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary and comprehensive benefits package
  • Full-time

Senior Applied Scientist

Microsoft Ads powers experiences at global scale through large-scale machine lea...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's/Master's degree in Computer Science, Mathematics, Software Engineering, Computer Engineering, or a related technical field, and 5+ years of related experience in machine learning systems, distributed systems, inference infrastructure, or software engineering
  • OR Doctorate in Computer Science, Mathematics, Software Engineering, Computer Engineering, or related technical field, and 2+ years of related experience
  • Strong programming skills in Python, C++, or C#
  • Hands-on experience in one or more of the following areas: large-scale ML/LLM inference serving in production; MLSys for model deployment, serving, or runtime optimization
  • Experience building or optimizing systems for online inference, batch inference, or near-real-time inference
  • Strong understanding of inference bottlenecks such as batching, queuing, tail latency, KV-cache pressure, memory bandwidth limits, caching, and heterogeneous resource utilization
  • Experience with one or more modern inference stacks or runtimes such as vLLM, TensorRT-LLM, SGLang, Triton, ONNX Runtime, DeepSpeed, or PyTorch inference tooling
  • Experience with modern LLM inference and serving techniques, including areas such as KV-cache management, prefix caching, speculative decoding, quantization, prefill/decode disaggregation, or MoE inference optimization
  • Experience with production-scale model serving platforms and distributed inference systems, including multi-node or multi-tenant deployments, resource-aware scheduling, and optimization across heterogeneous workloads
Job Responsibility:
  • Design and optimize end-to-end ML/LLM inference workflows across online low-latency serving, near-real-time inference, and large-scale batch inference scenarios
  • Build scalable serving and execution systems for large-scale models, including scheduling, batching, routing, admission control, and resource-aware execution
  • Improve inference performance and efficiency across compute, memory, storage, network, and concurrency dimensions, with strong focus on latency, throughput, reliability, and cost
  • Develop and apply modern serving techniques such as continuous or dynamic batching, prefix caching, KV-cache optimization, request shaping, tail-latency reduction, and runtime-level performance tuning
  • Optimize systems for key generative inference metrics such as time to first token, inter-token latency, throughput, accelerator utilization, and cost per request
  • Work on runtime and serving optimizations for modern inference stacks such as vLLM, TensorRT-LLM, SGLang, Triton, ONNX Runtime, and PyTorch-based serving systems
  • Partner with applied scientists to productionize new models and inference patterns, including agentic workflows with tool use, structured outputs, and long-context workloads, and evaluate quality-latency-cost tradeoffs in real production scenarios
  • Design and improve scheduling and resource management for heterogeneous and multi-tenant inference workloads, including GPU-aware placement, admission control, burst handling, and workload isolation
  • Build strong observability and diagnostics for inference services, including bottleneck analysis, performance regression detection, and end-to-end latency and cost measurement
  • Full-time
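
The generative-inference metrics named in the responsibilities above (time to first token, inter-token latency, throughput) reduce to simple arithmetic over per-token arrival timestamps. A minimal illustrative Python sketch (the function name and return keys are assumptions, not any serving stack's API):

```python
def latency_metrics(request_sent, token_times):
    """Given the request send time and the arrival time of each generated
    token (all in seconds), compute time to first token (TTFT), mean
    inter-token latency (ITL), and decode throughput in tokens/sec."""
    if not token_times:
        raise ValueError("no tokens generated")
    # TTFT: delay from sending the request to receiving the first token.
    ttft = token_times[0] - request_sent
    # ITL: average gap between consecutive tokens during decoding.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    # Decode throughput: tokens emitted per second after the first token.
    span = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / span if span > 0 else float("inf")
    return {"ttft": ttft, "itl": itl, "tokens_per_sec": tps}
```

For example, a request sent at t=0 whose tokens arrive at 0.5 s, 0.6 s, 0.7 s, and 0.8 s has a TTFT of 0.5 s, an ITL of 0.1 s, and a decode throughput of 10 tokens/sec.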