Software Engineer: ML Infra

Generalist AI

Location:
United States, San Mateo

Contract Type:
Not provided

Salary:

200,000 - 350,000 USD / Year

Job Description:

Generalist trains very large robot foundation models. This requires large numbers of latest-generation GPUs (currently NVIDIA) and the supporting infrastructure to run distributed training jobs and researcher experiments. Our storage and data-loading requirements are extreme, demanding that we get the most out of cloud infrastructure and build custom solutions. You will also own inference infrastructure. For our robots, this means a fleet of on-prem GPUs attached to the robots themselves, operating under tight real-time and latency budgets in compute-constrained environments.

Job Responsibility:

  • Owning our GPU compute fleets
  • Ensuring our GPUs are easy for researchers to use and maximally utilized
  • Optimizing and improving ML data loading, transport, and storage in highly distributed, fully utilized environments
  • Orchestrating robot inference fleets

Requirements:

  • Have managed large fleets of GPUs doing large-scale, long-term, highly distributed training runs or inference
  • Deep experience in Slurm or Kubernetes for ML workload orchestration
  • Have built high-scale ML data loaders and data preparation systems
  • Deeply understand every layer of the ML hardware, storage, and networking stacks
  • Have experience in the NVIDIA GPU ecosystem

What we offer:

Offers Equity

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Software Engineer: ML Infra

Senior Software Engineer – ML Model Compliance & Automation

We are seeking a highly skilled and motivated Senior Software Engineer to lead t...
Location:
India, Jaipur
Salary:
Not provided
InfoObjects
Expiration Date
Until further notice
Requirements
  • Experience Required: 3 - 7 yrs
  • GoLang (preferred)
  • Python (preferred)
  • Bash
  • MLOps Tools: KitOps, MLModelCI, MLflow, ONNX, TensorFlow, PyTorch, Docker
  • SBOM & Security: Syft, Grype, Trivy, CycloneDX, SPDX
  • CI/CD: GitHub Actions, GitLab CI, Jenkins, ArgoCD
  • Infra: Kubernetes, Docker, Helm, Terraform
  • Cloud: AWS, GCP, Azure (EKS/GKE/ECS preferred)
  • Version Control: Git, GitOps
Job Responsibility
  • Model Packaging & Artifact Management: Design and implement workflows for packaging ML models using KitOps, ONNX, MLflow, or TensorFlow SavedModel
  • Manage model artifact versioning, registries, and reproducibility
  • Ensure artifact integrity, consistency, and traceability across CI/CD pipelines
  • Model Profiling & Optimization: Automate model profiling (latency, size, ops) using MLModelCI, TorchServe, or ONNX Runtime
  • Apply quantization, pruning, and format conversions (e.g., FP32→INT8) for optimization
  • Embed profiling and optimization checks into CI/CD pipelines to assess deployment readiness
  • Compliance & SBOM Generation: Develop pipelines to generate and validate SBOMs for ML models
  • Implement compliance checks for licensing, vulnerabilities, and security using CycloneDX, SPDX, Syft, or Trivy
  • Validate schema, dependencies, and runtime environments for production readiness
  • Cloud Integration & Deployment: Automate model registration, endpoint creation, and monitoring setup in AWS/GCP/Azure
Employment Type: Full-time

Software Engineer, Systems ML - SW/HW Co-design

Meta is seeking an AI Software Engineer to join our Research & Development teams...
Location:
United States, Sunnyvale
Salary:
257,000 USD / Year
Meta
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Specialized experience in one or more of the following machine learning/deep learning domains: hardware accelerator architecture, GPU architecture, machine learning compilers, ML systems, AI infrastructure, high-performance computing, performance optimization, machine learning frameworks (e.g., PyTorch), numerics, or SW/HW co-design
  • Experience developing AI system infrastructure or AI algorithms in C/C++ or Python
Job Responsibility
  • Apply relevant AI infrastructure and hardware acceleration techniques to build & optimize our intelligent ML systems that improve Meta’s products and experiences
  • Set goals related to project impact, AI system design, and infrastructure/developer efficiency
  • Deliver impact, directly or by influencing partners, through deep, thorough, data-driven analysis
  • Drive large efforts across multiple teams
  • Define use cases, and develop methodology & benchmarks to evaluate different approaches
  • Apply in-depth knowledge of how ML infrastructure interacts with the systems around it
  • Mentor other engineers / research scientists & improve the quality of engineering work in the broader team
What we offer
  • bonus
  • equity
  • benefits

Software Engineer, Systems ML - SW/HW Co-design

Meta is seeking an AI Software Engineer to join our Research & Development teams...
Location:
United States, Sunnyvale
Salary:
217,000 USD / Year
Meta
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Specialized experience in one or more of the following machine learning/deep learning domains: hardware accelerator architecture, GPU architecture, machine learning compilers, ML systems, AI infrastructure, high-performance computing, performance optimization, machine learning frameworks (e.g., PyTorch), numerics, or SW/HW co-design
  • Experience developing AI system infrastructure or AI algorithms in C/C++ or Python
Job Responsibility
  • Apply relevant AI infrastructure and hardware acceleration techniques to build & optimize our intelligent ML systems that improve Meta’s products and experiences
  • Set goals related to project impact, AI system design, and infrastructure/developer efficiency
  • Deliver impact, directly or by influencing partners, through deep, thorough, data-driven analysis
  • Drive large efforts across multiple teams
  • Define use cases, and develop methodology & benchmarks to evaluate different approaches
  • Apply in-depth knowledge of how ML infrastructure interacts with the systems around it
  • Mentor other engineers / research scientists & improve the quality of engineering work in the broader team
What we offer
  • bonus
  • equity
  • benefits

AI Engineering Leader—Robotics Innovation

Ready to architect the future of “physical AI”? Lead the buildout of next-gen da...
Location:
United States, Burlington
Salary:
Not provided
Nondestructive & Visual Inspection
Expiration Date
Until further notice
Requirements
  • 10+ years driving data infra, ML systems, or end-to-end AI engineering at scale
  • Hands-on with orchestration tools, feature stores, and cloud infra (AWS, GCP, Azure)
  • Deep software engineering skills (Python, Scala, Java) & streaming frameworks (Spark, Flink)
  • Background with robotics, CV data, and edge deployment preferred
Job Responsibility
  • Spearhead full-stack data & ML pipelines for sensor, video, and telemetry data powering real-time robotics and vision systems
  • Design scalable infrastructure with strong foundations (schema, lineage, validation, and anomaly detection) for embedded AI
  • Integrate edge intelligence, observability, & feedback loops into robotics and perception
  • Set technical standards, mentor talent, and align architecture to real-world product goals
What we offer
  • base + bonus + equity

Senior ML Engineer

As a Senior ML Engineer on the Content Platform team, you will help build the co...
Location:
India, Bangalore
Salary:
Not provided
Uber
Expiration Date
Until further notice
Requirements
  • 5+ years of professional software engineering experience
  • 3+ years working on machine-learning or information-retrieval systems in production, including ownership of reliability, observability, and quality metrics
  • Hands-on experience with retrieval and relevance technologies, such as semantic search, embeddings, ranking algorithms, RAG pipelines, or large-scale content indexing
  • Strong proficiency in at least one modern programming language (e.g., Python, Java, Go, or C++)
  • Demonstrated experience building end-to-end ML systems at scale, from offline experimentation and evaluation to online deployment, monitoring, and feedback loops, ideally in a customer-facing or platform environment
Job Responsibility
  • Define and implement observability and evaluation frameworks to measure response quality, relevance, coverage gaps, latency, and failure modes across customer interactions
  • Develop and iterate on advanced retrieval, ranking, and coverage algorithms (e.g., semantic search, RAG improvements, content expansion strategies) to continuously improve answer relevance
  • Build automated feedback loops that surface insights from customer queries back to content authors and partner teams, enabling proactive identification and resolution of coverage issues
  • Collaborate closely with product, ML, infra, and content stakeholders to translate ambiguous problem spaces into measurable improvements and production-ready systems with real customer impact

Staff Software Engineer, Backend (AI Platform)

Cresta is on a mission to turn every customer conversation into a competitive ad...
Location:
United States
Salary:
Not provided
Cresta
Expiration Date
Until further notice
Requirements
  • 5+ years writing production software
  • 2+ years focused on ML platform or infra
  • Expert Python (async, typing, packaging, performance)
  • Working Golang knowledge for systems components
  • Proven experience with one or more serving frameworks (e.g., vLLM, Triton, TorchServe)
  • Kubernetes and cloud-native ops
  • Solid grasp of distributed systems, networking, and container security
  • Culture of rigorous testing, code review, and continuous delivery
Job Responsibility
  • Own model serving: Design, build, and maintain low-latency, highly-available serving stacks for in-house ML model serving and integrating with LLM serving partners
  • Automate training pipelines: Orchestrate data prep, training, evaluation, and registry workflows on Kubernetes with solid MLOps practices
  • Optimize at scale: Profile and tune throughput, memory, and cost; introduce caching, sharding, batching, and GPU/CPU autoscaling where it pays off
  • Build platform primitives: Create reusable SDKs, templates, and CLI tools that let research and product teams ship models independently and safely
  • Raise the bar: Instrument deep observability (tracing, metrics, alerts), drive blameless post-mortems, and mentor engineers on production ML best practices
What we offer
  • Comprehensive medical, dental, and vision coverage with plans to fit you and your family
  • Flexible PTO to take the time you need, when you need it
  • Paid parental leave for all new parents welcoming a new child
  • Retirement savings plan to help you plan for the future
  • Remote work setup budget to help you create a productive home office
  • Monthly wellness and communication stipend to keep you connected and balanced
  • In-office meal program and commuter benefits provided for onsite employees

Senior Software Engineer

Microsoft’s Azure Data engineering team is leading the transformation of analyti...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science or related field AND 8+ years of full-stack experience with AI, building software tools, internal platforms, automation systems, or developer productivity solutions
  • Hands-on experience with AI / Agentic AI / LLM / MCP servers
  • Strong coding expertise in C#, TypeScript, Java, Python, or JavaScript
Job Responsibility
  • Architect, design, and implement internal tools and platforms that modernize engineering workflows for Power BI Desktop and Service
  • Build scalable, secure, and reliable systems that support productivity, automation, and compliance across engineering teams
  • Lead tool-driven automation that improves validation, release readiness, diagnostic workflows, and engineering observability
  • Integrate GenAI, telemetry, and ML-based intelligence into tools to enhance coverage, detection, and engineering insights
  • Modernize legacy systems by migrating them to cloud-native, maintainable architectures
  • Partner with PM, QA, Release Engineering, and Partner Teams (Desktop, Service, Infra, ADO/GitHub) to align tooling needs and engineering OKRs
  • Mentor engineers and foster a culture of craftsmanship, learning, and technical excellence
  • Influence engineering strategy through data-driven insights, metrics, and tooling analytics
  • Drive continuous improvement through measurable success indicators and iterative tool enhancements
Employment Type: Full-time

Software Engineer II

Microsoft’s Azure Data engineering team is leading the transformation of analyti...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science or related field AND 4+ years of full-stack experience building software tools, internal platforms, automation systems, or developer productivity solutions
  • Strong coding expertise in C#, TypeScript, Java, Python, or JavaScript
Job Responsibility
  • Develop internal tools and platforms that modernize engineering workflows for Power BI Desktop and Service
  • Implement scalable, secure, and reliable systems that support productivity, automation, and compliance across engineering teams
  • Integrate GenAI, telemetry, and ML-based intelligence into tools to enhance coverage, detection, and engineering insights
  • Modernize legacy systems by migrating them to cloud-native, maintainable architectures
  • Build telemetry-driven insights to improve observability, diagnostics, and product quality
  • Develop automation solutions that support rapid validation cycles for weekly Power BI releases
  • Partner with PM, QA, Release Engineering, and Partner Teams (Desktop, Service, Infra, ADO/GitHub) to align tooling needs and engineering OKRs
  • Influence engineering strategy through data-driven insights, metrics, and tooling analytics
  • Drive continuous improvement through measurable success indicators and iterative tool enhancements
Employment Type: Full-time