
Software Engineer, Inference - Multi Modal

United States, San Francisco · 295,000.00 - 555,000.00 USD / Year · Job Posted February 21, 2026

Job Description

OpenAI’s Inference team powers the deployment of our most advanced models (including our GPT models, 4o Image Generation, and Whisper) across a variety of platforms. Our work ensures these models are available, performant, and scalable in production, and we partner closely with Research to bring the next generation of models into the world. We're a small, fast-moving team of engineers focused on delivering a world-class developer experience while pushing the boundaries of what AI can do.

We’re expanding into multimodal inference, building the infrastructure needed to serve models that handle image, audio, and other non-text modalities. These workloads are inherently more heterogeneous and experimental, involving diverse model sizes and interactions, more complex input/output formats, and tighter coordination with product and research.

We’re looking for a software engineer to help us serve OpenAI’s multimodal models at scale. You’ll be part of a small team responsible for building reliable, high-performance infrastructure for serving real-time audio, image, and other multimodal workloads in production. This work is inherently cross-functional: you’ll collaborate directly with researchers training these models and with product teams defining new modalities of interaction. You'll build and optimize the systems that let users generate speech, understand images, and interact with models in ways far beyond text.

Job Responsibility

  • Design and implement inference infrastructure for large-scale multimodal models
  • Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs
  • Enable experimental research workflows to transition into reliable production services
  • Collaborate closely with researchers, infra teams, and product engineers to deploy state-of-the-art capabilities
  • Contribute to system-level improvements including GPU utilization, tensor parallelism, and hardware abstraction layers

Requirements

  • Experience building and scaling inference systems for LLMs or multimodal models
  • Have worked with GPU-based ML workloads and understand the performance dynamics of large models, especially with complex data like images or audio
  • Enjoy experimental, fast-evolving work and collaborating closely with research
  • Are comfortable working with systems that span networking, distributed compute, and high-throughput data handling
  • Familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel systems
  • Own problems end-to-end and are excited to operate in ambiguous, fast-moving spaces

Nice to have

  • Experience working with image generation or audio synthesis models in production
  • Exposure to distributed ML training or system-efficient model design

What we offer

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends

Similar Jobs for Software Engineer, Inference - Multi Modal

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location: United States, Redmond
Salary: 139,900.00 - 331,200.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries
Job Responsibility
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA fabrics such as InfiniBand or RoCE) for distributed inference, without owning low-level kernel/driver enablement
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability
Full-time

Member of Technical Staff, Multimodal Infrastructure

Microsoft AI is looking for a Member of Technical Staff, Multimodal Infrastructu...
Location: United States, Mountain View
Salary: 139,900.00 - 274,800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience in multi-modal data processing: Strong proficiency in distributed data processing infrastructure (resource utilization management, fault tolerance, Ray and Spark) and CPU/GPU batch processing optimizations
  • Experience with state-of-the-art model inference and serving frameworks
  • Experience with image/video/audio data processing
  • Experience with common data formats for efficient I/O
  • Experience in multi-modal pretraining and post-training: Strong proficiency in deep learning frameworks such as PyTorch, Megatron, and DeepSpeed
  • Knowledge of auto-regressive and diffusion transformer models
  • Experience with distributed training techniques such as data parallelism, model parallelism, and pipeline parallelism
  • Proven experience in at least one of the following areas: image/video generation and editing, or efficient architectures (e.g., MoE, window attention)
Job Responsibility
  • Design, develop and maintain large-scale multimodal data processing pipelines
  • Design, develop and maintain large-scale multimodal model pretraining and post-training frameworks
  • Design, develop and maintain large-scale multimodal model inference and serving frameworks
  • Work with research scientists and product engineers to solve infra-related problems
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values
Full-time

Senior Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location: United States, Redmond
Salary: 119,800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries.
Job Responsibility
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA fabrics such as InfiniBand or RoCE) for distributed inference, without owning low-level kernel/driver enablement.
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
What we offer
  • Benefits and other compensation
Full-time

Senior Software Engineer - Wayve Foundation Model

This is a rare opportunity to join the small but high-leverage engineering team ...
Location: Canada, Vancouver
Salary: Not provided
Wayve
Expiration Date: Until further notice
Requirements
  • Strong software engineering skills with experience building and maintaining distributed systems, data pipelines, or backend platforms at scale
  • Experience developing infrastructure that supports machine learning workflows—such as training orchestration, evaluation tooling, or inference systems
  • Comfort working closely with research or ML teams to understand their iteration needs and build systems that accelerate them
  • Familiarity with technologies like Flyte, Ray, Spark, Airflow, or Kubernetes, and an understanding of how to use them to scale data and compute
  • Ownership mindset with the ability to identify bottlenecks, operate across team boundaries, and “get stuff done” in ambiguous, fast-moving environments
Job Responsibility
  • Design and scale infrastructure for data ingestion, filtering, and curation of multi-modal embodied data
  • Build robust, efficient training, evaluation, and inference pipelines to support foundation model development
  • Partner closely with scientists and MLEs to accelerate experimentation and unblock research
  • Improve ML systems performance, scalability, and automation across the stack
  • Act as a cross-functional force multiplier—connecting Science, Software, and Data teams through well-designed tooling and systems
Full-time

Senior Data Scientist

At Boeing, we innovate and collaborate to make the world a better place. We’re c...
Location: United States, Arlington; Seattle
Salary: 173,400.00 - 234,600.00 USD / Year
Boeing
Expiration Date: June 2, 2026
Requirements
  • Bachelor's degree in Computer Science, Machine Learning, Applied Mathematics, Computer Engineering, Software Engineering, Artificial Intelligence, Physics, or a closely related field
  • 5+ years of experience in deep learning frameworks
  • 1+ year of experience fine-tuning open-source LLMs and integrating APIs from commercial providers
  • 5+ years of programming experience in Python, and experience with data engineering workflows (e.g., Spark, Airflow, SQL)
  • Must be a U.S. Person as defined by 22 C.F.R. §120.15
Job Responsibility
  • Lead the development and deployment of advanced GenAI models, including LLMs and multi-modal systems
  • Design and implement robust pipelines for model fine-tuning and evaluation
  • Develop and evaluate prompt engineering strategies and embedding techniques
  • Prototype and productionize GenAI applications that solve complex business problems
  • Own model performance evaluation and bias/fairness assessments to ensure ethical deployment
  • Collaborate with MLOps and engineering teams to scale model inference and monitor performance
  • Provide insights on GenAI strategy, tools, and industry trends to the team
  • Mentor junior and mid-level data scientists and contribute to team development
What we offer
  • competitive base pay
  • variable compensation opportunities
  • health insurance
  • flexible spending accounts
  • health savings accounts
  • retirement savings plans
  • life and disability insurance programs
  • paid and unpaid time away from work
  • Generous company match to your 401(k)
  • Industry-leading tuition assistance program pays your institution directly
Full-time

Engineering Director, AI Solutions and Automation (ASA)-AI Product Acceleration

We are seeking a highly accomplished Engineering Director with extensive technic...
Location: United States, Bellevue, WA
Salary: 271,000.00 - 347,000.00 USD / Year
Meta
Expiration Date: Until further notice
Requirements
  • 15+ years of experience growing and leading successful Engineering teams, with a proven ability to recruit, land, and grow both engineering technical managers and individual contributors
  • Extensive expertise (15+ years) in Machine Learning (ML) and Artificial Intelligence (AI), with a history of functioning as a technical leader or lead architect on production systems
  • Extensive experience building and deploying complex, large-scale, distributed AI/ML software systems from the ground up
  • Experience as a great collaborator, building models and processes for aligning work across large, multi-disciplinary teams (Engineering, Data Science, Product Management)
  • Hands-on technical experience in relevant ML/AI languages (e.g., Python, C++) and applying data-driven methodologies to define and manage large software projects
  • Demonstrated ability to drive technical strategy and execution in cutting-edge AI domains like multi-modal processing, model evaluation, or RL-based post-training
Job Responsibility
  • Lead and manage teams of AI applied researchers and engineers, providing extensive technical guidance, mentorship, and support to ensure the successful end-to-end delivery of high-quality, scalable AI/ML systems
  • Serve as the technical authority, driving the design, development, and deployment of complex AI solutions, including LLM post-training techniques (like Reinforcement Learning and Fine-Tuning), Multi-modal Content Understanding, and Agentic AI platforms
  • Define and lead the long-term technical strategy and roadmap for large, enterprise-wide AI efforts, ensuring alignment with the ASA mission to deliver cost-efficient and performant AI models
  • Foster an environment of innovation, rapid prototyping, and technical excellence, encouraging experimentation and continuous improvement in the pursuit of SoTA performance
  • Identify new, high-leverage opportunities for LLM-based automation across Meta's product portfolio and influence cross-functional partners for appropriate staffing and prioritization
  • Supervise the development of AI-centric platforms, such as the AI Evaluation and scalable inference and serving infrastructure for 1P, 2P, and 3P models
What we offer
  • bonus
  • equity
  • benefits
Full-time

Applied Scientist

We are reimagining Windows in the era of AI. As an Applied Scientist you would pl...
Location: India, Hyderabad
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Python, OR equivalent experience
  • 4+ years of overall experience shipping commercial software end to end, with at least 3 years of experience in AI/ML, predictive analytics, or research, and exposure to generative AI/LLM/SLM algorithms
  • A customer-focused innovation mindset
  • Passionate about craftsmanship in engineering
  • Experience building AI/ML solutions is good to have
  • Aptitude to learn and adapt with intensity and agility
Job Responsibility
  • Design, implement, and experiment with end-to-end AI-powered user experiences
  • Build scalable full-stack solutions that integrate AI models (LLMs, vision, speech) via SDKs, APIs, and custom pipelines
  • Collaborate with other engineers to optimize model selection, inference performance, and user interaction loops
  • Partner with PMs, designers, and researchers to prototype and validate new interaction paradigms
  • Contribute to the architecture and infrastructure for AI-first features, ensuring reliability, privacy, and compliance
  • Drive engineering excellence through code reviews, testing, telemetry, and continuous improvement
  • Mentor junior engineers and contribute to a culture of innovation and inclusion
  • Be a Subject Matter Expert in a specific domain or tech
  • Be customer- and telemetry-focused, and reduce mean time to market and mean time to recover through engineering excellence
  • Research and implement state-of-the-art solutions using foundation models, prompt engineering, RAG, graphs, and multi-agent architectures, as well as classical machine learning techniques
Full-time