Software Engineer, Model Inference Job at OpenAI (San Francisco)

Principal Software Engineer - CoreAI Model Inference & Serving

Join our team within CoreAI, where we are building the AI data-plane that powers...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Java OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Be a hands-on technical leader, designing, coding, and shipping core serving systems, smart routing, and request distribution for a broad portfolio of LLMs, including OpenAI, Mistral, Grok, DeepSeek, and others
Build large-scale AI services and platform capabilities that power new products and customer experiences
Drive cutting-edge innovation in AI systems alongside world-class engineers and cross-functional partners
Lead through architecture, code reviews, mentorship, and technical excellence while staying close to implementation
Improve reliability, scalability, observability, efficiency, and performance across mission-critical services

Fulltime

Senior Software Engineer - CoreAI Model Inference & Serving

Join our team within CoreAI, where we are building the AI data-plane that powers...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Java OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements
Microsoft Cloud Background Check
4+ years of design and problem-solving experience, with understanding of system performance, scalability, and engineering best practices
Understanding of distributed systems, specifically in request serving at scale (e.g., inferencing, L7 gateways, high-performance storage, distributed databases across global-scale infrastructure)
Demonstrated experience in building high-quality, reliable systems at scale
Experience using modern AI-assisted development tools and workflows to move faster, improve quality, and amplify engineering impact
Customer-obsessed approach to problem solving, with empathy and a drive to deliver impactful solutions

Job Responsibility

Be a hands-on technical leader, designing, coding, and shipping core serving systems, smart routing, and request distribution for a broad portfolio of LLMs
Build large-scale AI services and platform capabilities that power new products and customer experiences
Drive cutting-edge innovation in AI systems alongside world-class engineers and cross-functional partners
Lead through architecture, code reviews, mentorship, and technical excellence while staying close to implementation
Improve reliability, scalability, observability, efficiency, and performance across mission-critical services

Fulltime

Senior Software Engineer and Principal Software Engineer - PowerPoint AI Team

The PowerPoint team is embarking on an exciting new chapter - evolving a product...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
8+ years of experience in backend service engineering, including work on high-scale infrastructures
Proficiency in one or more systems programming languages such as C#, C++
1+ years of experience in software engineering, designing and developing systems (and APIs) that deploy and integrate with AI models
2+ years of experience working with rich telemetry, making data driven decisions, and carrying out rapid experimentation
2+ years of experience building software for scale, performance, and reliability
Academic or industry experience with building, finetuning, deploying or building eval-driven systems utilizing the models (any category)

Job Responsibility

Lead design and delivery of complex, scalable AI features ensuring resilience and exceptional user experience
Drive technical strategy and architecture decisions across multiple services, influencing partner teams and aligning with compliance and security requirements
Champion modern engineering practices, including AI-driven approaches, automation, and cloud-native patterns, across the full development lifecycle
Mentor and guide engineers, fostering technical excellence and continuous improvement in security, reliability, and performance
Collaborate cross-org to solve challenging technical problems, streamline processes, and reduce operational costs while improving live-site health
Design and implement scalable backend services optimized for machine learning workflows and large language model integration
Develop and maintain evaluation-driven systems that leverage text and multimodal inputs (e.g., images) to power visual-creation experiences
Build and optimize APIs and infrastructure to support high-performance model inference and experimentation at scale
Collaborate with product, ML, and design teams to integrate models into user-facing features, ensuring seamless functionality and performance
Conduct model evaluations and experiments, analyze results, and iterate on improvements to enhance accuracy and user experience

Fulltime

Software Engineer II and Senior Software Engineer - Performance

The Artificial Intelligence Performance team at Microsoft develops AI software t...

Location

United States, Mountain View

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
Optimize and monitor performance of LLMs and build SW tooling to enable insights into performance opportunities ranging from the model level to the systems and silicon level to improve customer experience and reduce the footprint of the computing fleet
Enable fast time to market of LLMs/models and their deployments at scale by building SW tools that afford velocity in porting models on new Nvidia and AMD GPUs
Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools
Speed up and reduce the complexity of key components and pipelines to improve the performance and/or efficiency of our systems
Communicate and collaborate with our partners both internal and external
Embody Microsoft's Culture and Values

Fulltime

Software Engineer, Inference – AMD GPU Enablement

We’re hiring engineers to scale and optimize OpenAI’s inference infrastructure a...

Location

United States, San Francisco

Salary:

295000.00 - 555000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Experience writing or porting GPU kernels using HIP, CUDA, or Triton
Familiarity with communication libraries like NCCL/RCCL
Experience working on distributed inference systems
Ability to solve end-to-end performance challenges across hardware, system libraries, and orchestration layers
Ability to thrive in a small, fast-moving team building new infrastructure from first principles

Job Responsibility

Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware
Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems
Debug and optimize distributed inference workloads across memory, network, and compute layers
Validate correctness, performance, and scalability of model execution on large GPU clusters
Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks
Collaborate with partner teams to build, integrate and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Software Engineer, Inference - Multi Modal

OpenAI’s Inference team powers the deployment of our most advanced models - incl...

Location

United States, San Francisco

Salary:

295000.00 - 555000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Experience building and scaling inference systems for LLMs or multimodal models
Worked with GPU-based ML workloads and understand the performance dynamics of large models, especially with complex data like images or audio
Enjoy experimental, fast-evolving work and collaborating closely with research
Comfortable dealing with systems that span networking, distributed compute, and high-throughput data handling
Familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel systems
Own problems end-to-end and are excited to operate in ambiguous, fast-moving spaces

Job Responsibility

Design and implement inference infrastructure for large-scale multimodal models
Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs
Enable experimental research workflows to transition into reliable production services
Collaborate closely with researchers, infra teams, and product engineers to deploy state-of-the-art capabilities
Contribute to system-level improvements including GPU utilization, tensor parallelism, and hardware abstraction layers

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Staff Software Engineer, Model LifeCycle

The Staff Software Engineer for the Model LifeCycle team will play a key role in...

Location

United States, San Francisco

Salary:

208725.00 - 253000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related field
8-10+ years of industry experience with demonstrated history of consistent success leading a varied portfolio of initiatives across your function
Proven track record of delivering production features on time
Experience using cloud-based services such as elastic compute, object storage, virtual private networks, managed databases, etc.
Experience with Generative AI (Large Language Models, Multimodal)
Experience with AI infrastructure, including training, inference

Job Responsibility

Contribute to fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling
Implement and maintain end-to-end training pipelines for Large Language Models
Contribute to distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling)
Develop and maintain agent execution infrastructure
Implement features for dataset, model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale
Work closely with Principal Engineers, product, business, and platform teams to implement the core abstractions and APIs of the system
Contribute to architectural decisions around training runtimes, scheduling, storage, and model lifecycle management
Engage with the open-source LLM ecosystem

What we offer

Restricted Stock Units in a fast growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Fulltime

Staff Software Engineer, Inference Infrastructure

Our mission is to scale intelligence to serve humanity. We’re training and deplo...

Location

San Francisco, Toronto, London, New York, Montreal

Salary:

Not provided

Cohere

Expiration Date

Until further notice

Requirements

5+ years of engineering experience running production infrastructure at a large scale
Experience designing large, highly available distributed systems with Kubernetes, and GPU workloads on those clusters
Experience with Kubernetes dev and production coding and support
Experience with GCP, Azure, AWS, OCI, multi-cloud on-prem / hybrid serving
Experience in designing, deploying, supporting, and troubleshooting in complex Linux-based computing environments
Experience in compute/storage/network resource and cost management
Excellent collaboration and troubleshooting skills to build mission-critical systems, and ensure smooth operations and efficient teamwork
The grit and adaptability to solve complex technical challenges that evolve day to day
Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
Strong understanding or working experience with distributed systems

Job Responsibility

Developing, deploying, and operating the AI platform delivering Cohere's large language models through easy-to-use API endpoints
Working closely with many teams to deploy optimized NLP models to production in low latency, high throughput, and high availability environments
Interfacing with customers and creating customized deployments to meet their specific needs

What we offer

An open and inclusive culture and work environment
Work closely with a team on the cutting edge of AI research
Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% Parental Leave top-up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
6 weeks of vacation (30 working days!)

Fulltime
