Inference Technical Lead

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

380000.00 USD / Year

Job Description:

The Sora team is pioneering multimodal capabilities for OpenAI’s foundation models. We’re a hybrid research and product team focused on integrating multimodal functionality into our AI products, ensuring they are reliable, user-friendly, and aligned with our mission of broad societal benefit. We’re looking for a GPU Inference Engineer to improve model serving efficiency for Sora. This is a high-impact role in which you’ll drive initiatives to optimize inference performance and scalability. You’ll also take part in model design, helping our researchers develop inference-friendly models. This role is critical to scaling the team’s broader goals: by building a stronger technical foundation, it will directly enable leadership to focus on higher-leverage initiatives.

Job Responsibility:

  • Perform engineering efforts focused on improving model serving, inference performance, and system efficiency
  • Drive optimizations from a kernel and data movement perspective to improve system throughput and reliability
  • Partner closely with research and product teams to ensure our models perform effectively at scale
  • Design, build, and improve critical serving infrastructure to support Sora’s growth and reliability needs
  • Contribute to improvements in model serving efficiency for Sora
  • Drive initiatives to optimize inference performance and scalability
  • Take part in model design, helping our researchers develop inference-friendly models

Requirements:

  • Deep expertise in model performance optimization, particularly at the inference layer
  • Strong background in kernel-level systems, data movement, and low-level performance tuning
  • Excited about scaling high-performing AI systems that serve real-world, multimodal workloads
  • Can navigate ambiguity, set technical direction, and drive complex initiatives to completion

What we offer:

  • Equity
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided
  • Performance-related bonus(es) for eligible employees

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Inference Technical Lead

Technical Lead

At Spectro Cloud, we are in search of a talented individual to become an integra...
Location:
United States, San Jose
Salary:
Not provided
Spectro Cloud
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science or related technical field
  • 8+ years of software development experience (or 6+ years with a Master's degree)
  • Strong LLM/GenAI fundamentals: Solid understanding of large language models, prompt engineering, embeddings, vector search, RAG systems, and lightweight fine-tuning (LoRA/PEFT preferred)
  • Python expertise: Proficiency in Python and hands-on experience with AI/ML libraries such as Hugging Face, PyTorch, LangChain, LangGraph, FastAPI, or similar frameworks
  • LLM deployment experience: Familiarity with Kubernetes-based inference stacks including vLLM, llm-d, TensorRT, PyTorch Serve, or comparable deployment frameworks
  • Proficiency in at least one modern programming language such as Go, Java, or equivalent
  • Solid understanding of containerization and orchestration concepts, including Kubernetes
  • Deep understanding of microservices architecture and REST API design principles
  • Experience designing and building scalable, cloud-native applications
  • Analytical problem-solving: Ability to debug model outputs, improve retrieval accuracy, optimize latency, and iterate quickly through experiments
Job Responsibility:
  • Building production-grade AI systems - designing, implementing, and maintaining LLM-powered applications, agentic AI workflows, and RAG pipelines across multiple product use-cases
  • Actively participate in guided technical labs covering prompt engineering, vector databases, LLM deployment tooling, multi-agent orchestration, fine-tuning strategies, and evaluation techniques
  • Develop, refine, and operationalize LLM solutions, including prompt design, retrieval strategies, embedding pipelines, LangChain/LangGraph workflows, and API integrations using Python, Hugging Face, FastAPI, and similar frameworks
  • Ensuring the seamless operation of our platform through a combination of automation, scripting, and rigorous testing
  • Stay ahead of emerging AI trends - small models, efficient inference (vLLM/TensorRT), multimodal systems, on-device LLMs - and recommend tools, frameworks, or integrations that enhance our platform
  • Work closely with cross-functional teams to create scalable, dependable, and secure solutions that push boundaries
  • Stay current with industry trends and emerging technologies, thereby ensuring that our solutions remain innovative and ahead of the curve

Pyspark Technical Lead

We are seeking a highly skilled and motivated Data Engineer to join our dynamic ...
Location:
India, Chennai
Salary:
Not provided
Sopra Steria
Expiration Date:
Until further notice
Requirements:
  • Proficiency in advanced SQL (window functions), Spark architecture, PySpark or Scala with Spark, and Hadoop
  • Proven expertise in designing and deploying data pipelines
  • Strong problem-solving skills and ability to work effectively in a collaborative team environment
  • Excellent communication skills and ability to translate technical concepts for non-technical stakeholders
Job Responsibility:
  • Work in tandem with Data Scientists to design, develop, and implement machine learning pipelines
  • Utilize PySpark for data processing, transformation, and preparation for model training
  • Leverage AWS EMR and S3 for scalable and efficient data storage and processing
  • Implement and manage ETL workflows using Streamsets for data ingestion and transformation
  • Design and construct pipelines to deliver high-quality training and inference datasets
  • Collaborate with cross-functional teams to ensure smooth deployment and real-time/near real-time inferencing capabilities
  • Optimize and fine-tune pipelines for performance, scalability, and reliability
  • Ensure IAM policies and permissions are appropriately configured for secure data access and management
  • Implement Spark architecture and optimize Spark jobs for scalable data processing
What we offer:
  • Inclusive and respectful work environment
  • Open positions for people with disabilities
  • Full-time

Technical Team Lead – LLM Systems

We’re hiring a hands-on Technical Team Lead to join our core LLM engineering tea...
Location:
India, Delhi NCR
Salary:
Not provided
Balbix
Expiration Date:
Until further notice
Requirements:
  • Strong CS fundamentals (B.Tech/M.Tech or equivalent)
  • 5+ years of backend or systems engineering experience
  • Experience with LLM orchestration tools like LangGraph, LangChain, or Bedrock agents
  • Deep Python skills with experience in async and event-driven programming
  • Proven track record shipping and maintaining production systems
  • Ability to work across layers — prompt logic, orchestration, infrastructure
Job Responsibility:
  • Architect and implement LangGraph-powered workflows and Bedrock-based inference
  • Collaborate closely with the founder, and with the head of AI on system design and product strategy
  • Build and manage stateful agent flows, tool orchestration, retries, and memory handling
  • Debug real-world issues across prompts, agent logic, and runtime behavior
  • Mentor and lead an initial team of 5 engineers, shaping engineering best practices
  • Own the performance, cost-efficiency, and observability of LLM pipelines
What we offer:
  • Competitive salary
  • Meaningful equity
  • Fast-moving builder culture
  • Full-time

Data Science Lead, Guest Funnel Science

This tech lead will be at the core of using AI and causal inference to understan...
Location:
United States
Salary:
194000.00 - 240000.00 USD / Year
Airbnb
Expiration Date:
Until further notice
Requirements:
  • Causal inference expertise with marketplace experience
  • Preferred domain experience in search, UX discovery, personalized evidence systems
  • Advanced degree in Computer Science, Statistics, Econometrics or related field
  • 9+ years of industry experience with a PhD (or 12+ years with a Master's)
  • Strong communication with cross-functional (XFN) partners in product, engineering, and design to enable data-driven product development with a focus on the user experience
  • Expert in at least one programming language for data analysis (Python or R) with familiarity in SQL
  • Comfort with developing proof-of-concept prototypes
  • Passionate about AI and possessing a learner’s mindset towards LLMs and dynamic systems
  • Proven ability to succeed in both collaborative and independent work environments
  • Demonstrated willingness and track record of engagement with the technical community
Job Responsibility:
  • Learn: Develop deep understanding of how guests navigate and re-engage with our app via analysis, research, and by leveraging granular user action and sequence datasets
  • Partner: With product and engineering, drive technical frameworks and science leadership to explore innovative paradigms for detecting revealed preferences and quantifying online frictions
  • Build: Write code for prototypes to detect and quantify taxonomy of guest preferences via iterative development of data frameworks, models and artefacts derived from AI toolkits
  • Evaluate: Assess assumptions and efficacy of derived guest preferences by measuring and validating hypotheses linked to online guest action and engagement. Set up experiments and data feedback loops to own a high bar over continuous impact
  • Influence: Regularly present findings and recommendations to leadership audiences to inform strategy and cross-functional deliverables
What we offer:
  • bonus
  • equity
  • benefits
  • Employee Travel Credits
  • Full-time

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...
Location:
Canada; United States
Salary:
300000.00 - 450000.00 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
  • Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
  • Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering
  • Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
  • Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
  • Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
  • Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
  • Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
  • Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
  • Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency
Job Responsibility:
  • Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
  • Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
  • Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
  • Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
  • Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
  • Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
  • Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
  • Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
  • Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
  • Ensure robust observability, monitoring, and safety guardrails for all AI systems in production
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA
  • Full-time

Research Engineer, RealTime AI, MSL PAR

We are seeking research engineers to join the Product and Applied Research (PAR)...
Location:
United States, Bellevue, WA
Salary:
257000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 2+ years of industry experience in LLM/NLP, audio, or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross-functional partners to impact, and/or influencing strategy across multiple teams
  • Skilled in model training, data, or inference & efficiency for LLMs
  • Experience building products/systems based on machine learning, reinforcement learning and/or deep learning methods
  • Programming experience in Python and hands-on experience with frameworks like PyTorch
Job Responsibility:
  • Collaborate with cross-functional teams to develop Meta’s AI Characters products
  • Lead the development of new algorithms and systems for LLM post-training, evaluation and efficiency
  • Support creative data sourcing, high-quality post-training data curation, and scale and optimize data pipelines for large language models (LLMs)
  • Develop and integrate models, orchestrations, and RAGs in production
  • Analyze and interpret experimental results, iterate on model architectures, and drive continuous improvement
  • Lead complex technical projects end-to-end
What we offer:
  • bonus
  • equity
  • benefits

Senior Principal Technical Program Manager - ML Platform

Salary:
231300.00 - 301975.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience on software teams as Development Manager, Technical Product Manager or TPM leading technical platforms areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Partner with Engineering in defining direction, strategy and execution at Platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of systems involved. Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility:
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Tech Lead

At JFrog, we’re reinventing DevOps and MLOps to help the world’s greatest compan...
Location:
Israel, Netanya/Tel Aviv
Salary:
Not provided
JFrog
Expiration Date:
Until further notice
Requirements:
  • 7+ years building large-scale backend or distributed systems
  • Strong foundation in distributed systems (consistency, replication, concurrency, fault tolerance)
  • Proficiency in Java / Go or similar languages
  • Hands-on experience with high-performance, scalable, and reliable systems
  • Ability to lead design discussions and influence technical direction across teams
  • Curiosity and willingness to work with ML systems and workload patterns
  • Experience with Kubernetes, container orchestration, or cloud-native infrastructure
  • Thrive in a collaborative, ownership-driven engineering culture
Job Responsibility:
  • Design and evolve components for managing and distributing ML/AI models and artifacts at scale
  • Extend the platform to support reliable, high-performance inference and training workflows
  • Lead cross-team technical initiatives and serve as a reference for distributed systems and ML infra design
  • Write maintainable, high-quality code in performance-critical areas
  • Mentor engineers and drive strong engineering practices
  • Collaborate with adjacent teams to ensure seamless end-to-end ML platform behavior
  • Improve the reliability, efficiency, and observability of core services