
Staff Software Engineer, Model LifeCycle

United States, San Francisco · 208725.00 - 253000.00 USD / Year · Job Posted February 21, 2026

Job Description

The Staff Software Engineer for the Model LifeCycle team will play a key role in building a comprehensive managed platform for the entire application development lifecycle, with a specific focus on leveraging Machine Learning models, including Large Language Models (LLMs). This role offers significant scope for ownership — you'll be implementing and contributing to the design of core systems.

Job Responsibility

  • Contribute to fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling
  • Implement and maintain end-to-end training pipelines for Large Language Models
  • Contribute to distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling)
  • Develop and maintain agent execution infrastructure
  • Implement features for dataset, model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale
  • Work closely with Principal Engineers, product, business, and platform teams to implement the core abstractions and APIs of the system
  • Contribute to architectural decisions around training runtimes, scheduling, storage, and model lifecycle management
  • Engage with the open-source LLM ecosystem

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • 8-10+ years of industry experience with a demonstrated history of consistent success leading a varied portfolio of initiatives across your function
  • Proven track record of delivering production features on time
  • Experience using cloud-based services such as elastic compute, object storage, virtual private networks, and managed databases
  • Experience with Generative AI (Large Language Models, Multimodal)
  • Experience with AI infrastructure, including training and inference

Nice to have

  • Proficiency in Golang or Python for large-scale, production-level services
  • Experience contributing to open-source AI projects
  • Experience with performance optimizations on GPU systems and inference frameworks
  • Experience working with PyTorch
  • Experience with training and fine-tuning LLMs
  • Proactive and collaborative approach with the ability to work independently
  • Strong communication and interpersonal skills
  • Passion for building cutting-edge AI products and solving challenging technical problems

What we offer

  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit ($300/month)

Similar Jobs for Staff Software Engineer, Model LifeCycle

8 matching positions

Staff Software Engineer

You'll help build Confluent Cloud's AI capabilities — the layer that lets custom...
Location: Canada
Salary: 225100.00 - 260500.00 CAD / Year
Confluent
Expiration Date: Until further notice
Requirements
  • 10+ years of significant experience designing, building, and operating distributed systems or cloud-native backend infrastructure in production
  • Strong working knowledge of Kubernetes and distributed-systems patterns (control loops, API servers, high-scale control planes), plus the fundamentals — containerization, networking, resource isolation
  • Proficiency in at least one of Go, Java, or Python, and the willingness to work across all three
  • A track record of leading cross-team technical work: turning ambiguous requirements into designs others can rally behind
  • Excellent written and verbal communication — you can write a design doc that aligns people who don't report to you
Job Responsibility
  • Design and build the backend services (primarily Go, Java, and Python) that run AI and model inference on real-time data
  • Own features end to end — drafting the design, aligning stakeholders inside and outside the team, and driving the decision to a conclusion
  • Make the technical calls on systems that span teams: model lifecycle, inference routing, and agent execution
  • Own the quality of what you ship — code, test coverage, documentation, operability, and rollout safety
  • Make the engineers around you better through code review, design feedback, and being someone the team trusts with ambiguous, cross-cutting work
  • Participate in on-call for the services your team owns, and help keep the team's processes and rituals healthy
What we offer
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Offers Equity
Full-time

Staff, Software Engineer

Are you passionate about building well-governed, cost-efficient cloud platforms ...
Location: United States of America, Denver
Salary: 121000.00 - 242000.00 USD / Year
Walmart
Expiration Date: Until further notice
Requirements
  • Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 4 years' experience in software engineering or related area
  • Option 2: 6 years' experience in software engineering or related area
Job Responsibility
  • Design and implement automated FinOps solutions to optimize cloud spend across AWS, GCP, and Azure
  • Build and maintain cost allocation, tagging enforcement, and usage governance frameworks across multiple clouds
  • Develop automation for savings analysis, rightsizing, idle resource detection, and anomaly identification across providers
  • Partner with Finance and Engineering teams to translate technical usage into actionable financial insights and forecasts
  • Support and execute cloud migration initiatives from AWS to GCP and Azure, including workload analysis, readiness assessment, migration planning, and post-migration optimization
  • Compare cloud-native services and pricing models to guide migration decisions and architectural tradeoffs
  • Apply best-practice multi-cloud architectures with an emphasis on cost efficiency, security, and operational sustainability
  • Help define standards, landing zones, and reference architectures for AWS, GCP, and Azure
  • Build and maintain scalable Infrastructure as Code using Terraform and Terragrunt across multiple cloud providers
  • Develop Python and Bash automation for operational workflows, governance, migrations, and infrastructure lifecycle management
What we offer
  • Medical, vision and dental coverage
  • 401(k)
  • stock purchase
  • company-paid life insurance
  • PTO (including sick leave)
  • parental leave
  • family care leave
  • bereavement
  • jury duty
  • voting leave
Full-time

Staff Software Engineer – Secondary Driving System

At General Motors, our Embodied AI teams are redefining what’s possible in drive...
Location: United States, Sunnyvale
Salary: 218800.00 - 335300.00 USD / Year
General Motors
Expiration Date: Until further notice
Requirements
  • BS, MS, or PhD in Computer Science, Robotics, Electrical/Mechanical Engineering, or a related field, or equivalent practical experience
  • 8+ years of professional software engineering experience building production systems in robotics, autonomous vehicles, or other complex real‑time/control systems, including significant experience in perception and/or prediction
  • Strong proficiency in modern C++ (e.g., C++14/17 or later) in large, multi‑contributor codebases; experience using Python for tooling, data analysis, and ML experimentation
  • Demonstrated experience leading technical design and delivery of perception, tracking, or prediction systems in real‑time environments, including:
      • Multi‑sensor fusion across camera, radar, and/or lidar (e.g., object‑level fusion, occupancy/freespace fusion, early/late fusion architectures)
      • Classical computer vision and geometric algorithms (feature extraction, multi‑view geometry, stereo, SfM, SLAM/visual odometry)
      • Multi‑object tracking (Kalman/extended/unscented filters, track‑to‑track fusion, track lifecycle management)
      • Motion prediction for road users (analytical kinematic models, maneuver‑based prediction, or learned trajectory forecasting models)
  • Proven track record of delivering reliable, high‑quality robotics or autonomous driving software to production, including:
      • Testing strategies (simulation, HIL, scenario‑based testing, regression suites)
Job Responsibility
  • Serve as a technical lead for SDS software across multiple components of the stack, setting direction for algorithms, architectures, and system interfaces across features and releases
  • Own the end‑to‑end technical strategy for key SDS behaviors and features, spanning perception/prediction integration, planning, controls, and system‑level interactions
  • Balance hands‑on technical work with cross‑team leadership: you will still design and implement critical components in modern C++, while also guiding other senior and mid‑level engineers to deliver at scale
  • Collaborate closely with experts in perception, tracking, prediction, state estimation, localization, mapping, planning, controls, systems engineering, and safety to deliver robust, fail‑operational behaviors for Super Cruise and future products
  • Define technical vision & architecture:
      • Set the technical direction for SDS software components with a focus on correctness, robustness, and predictable runtime behavior under tight latency and compute budgets
      • Architect scalable, modular multi‑sensor perception pipelines for camera, radar, and lidar, including detection, classification, lane/road feature extraction, freespace/occupancy, and environmental context
      • Establish and evolve interfaces and contracts between perception/prediction and upstream/downstream components (state estimation, localization, mapping, planning, controls, autonomy management)
  • Lead high‑impact projects:
      • Lead design and delivery of multi‑object tracking systems (e.g., Kalman/extended/unscented filters, IMM, probabilistic data association, track lifecycle management) that provide stable, high‑quality tracks under real‑world noise and edge cases
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
Full-time

Staff Software Engineer, Search & Distributed Systems

We are looking for a Staff Software Engineer who would thrive on being accountab...
Location: USA, Buffalo
Salary: 165000.00 - 260000.00 USD / Year
ACV Auctions
Expiration Date: Until further notice
Requirements
  • 8+ years of software engineering experience, with at least 3 years operating at a Senior or Staff level focusing on distributed systems and high-throughput platforms.
  • Deep, authoritative knowledge of Elasticsearch internals. You have managed large-scale clusters and deeply understand mapping, analysis, query optimization, cluster state management, and split-brain mitigation.
  • Proficiency in the systems upstream and downstream of Search. You have hands-on experience with Kubernetes (EKS/GKE), API Gateway/BFF architectures, and event streams (Kafka).
  • A proven track record of implementing fault-tolerant patterns (retries, rate limiting, circuit breaking, dead letter queues) in microservice architectures.
  • Expert-level ability to instrument systems and diagnose complex performance issues using modern observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry).
  • Strong communication skills with a proven ability to influence cross-functional teams, build consensus around architectural decisions (the Knoster model!), and mentor mid-level and senior engineers.
Job Responsibility
  • Architect for Scale: Design, configure, and scale our Elasticsearch clusters. You will define our global strategies for shard routing, Index Lifecycle Management (ILM), heap tuning, and data tiering to support massive auction throughput.
  • Master the Failure Modes: Anticipate and engineer away points of failure. You will design circuit breakers, implement backpressure mechanisms, and tune asymmetric timeouts to prevent retry storms between our BFFs, K8s services, and the Search layer.
  • Expert Troubleshooting & IR: Act as the ultimate technical escalation point for complex, cross-system performance degradation. You will dive deep into JVM metrics, Garbage Collection pauses, K8s network bottlenecks, and slow logs to uncover and remediate root causes.
  • Holistic System Ownership: Manage the entire data lifecycle. You will optimize the ingestion pipelines syncing our event datastreams driven by producers and consumers (Kafka) to Elasticsearch, ensuring eventual consistency and data integrity at scale.
  • Drive Engineering Excellence: Draft authoritative architectural Blueprints, SOPs, and Runbooks. You will elevate the surrounding engineering culture by coaching teams on distributed systems design, observability best practices, and incident management.
  • Modernize & Innovate: Scan the horizon for emerging technologies. You will help evaluate and integrate next-generation search capabilities (e.g., Vector Search, RAG architectures) to support our broader AI and machine learning initiatives.
What we offer
  • Multiple medical plans including a high deductible, low cost health plan
  • Company-sponsored (paid) Short-Term Disability, Long-Term Disability, and Life Insurance
  • Comprehensive optional benefits such as Dental, Vision, Supplemental Life/AD&D, Legal/ID Protection, and Accident and Critical Illness Insurance
  • Generous paid time off options, including uncapped vacation days, the greater of 3 paid sick days or in accordance with the applicable state or local paid sick leave law, 6 paid company holidays, 2 floating holidays, parental leave, bereavement leave, jury duty leave, voting leave, and other forms of paid leave as required by applicable law or regulation
  • Employee Stock Purchase Program with additional opportunities to earn stock in the Company
  • Retirement planning through the Company's 401(k)
Full-time

Staff Software Engineer - Data

Uber's mission is to reimagine the way the world moves for the better. Here, bol...
Location: India, Bangalore
Salary: Not provided
Uber
Expiration Date: Until further notice
Requirements
  • Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
  • Experience Level: 10+ years of hands-on experience in Data Engineering, with a proven track record of delivering results at a Staff Engineer level (or equivalent scope) at a premier technology company
  • Expert SQL Competency: 10+ years of hands-on, expert-level SQL experience
  • Data Modeling & Warehousing: Extensive experience designing dimensional data models (Star/Snowflake schemas) and data warehouses
  • Software Engineering Fundamentals: Proficiency in at least one high-level programming language (Java, Scala, Python, or Go)
  • Big Data Ecosystem: 10+ years of experience working with distributed data systems (Hadoop, Hive, Spark) and MPP databases (Vertica, Redshift, etc.)
  • End-to-End Architecture: Experience designing full-lifecycle data systems, including logging, ingestion (Batch/Stream), quality frameworks, and monitoring
  • Technical Leadership: Excellent written and verbal communication skills
  • Mentorship & Growth: A strong passion for driving engineering excellence and mentoring engineers
Job Responsibility
  • Own the Technical Vision: You will own and drive the technical roadmap for the Payments data ecosystem, balancing long-term architectural scalability with short-term business critical deliveries
  • Navigate Ambiguity: Actively identify strategically important problems and inefficiencies without waiting for instruction
  • Drive Alignment: See the big picture and drive consensus on complex technical decisions across the organization
  • Architect at Scale: Design and implement resilient, cost-effective, and high-scale batch and streaming pipelines that power critical support operations and financial analytics
  • Elevate Data Standards: Define and enforce robust data modeling standards, data contracts, and governance frameworks
  • Optimize & Automate: Identify opportunities to automate manual workflows (like SLA tracking and issue detection) and optimize infrastructure efficiency to lower TCO
  • Raise the Bar: Champion sustainable engineering practices
  • Be a Trusted Mentor: Serve as a humble mentor and technical advisor to both junior engineers and peer leaders
  • Force Multiplier: Act as a role model for judgment and responsibility
Full-time

Sr Staff Software Engineer

As a Senior Staff Software Engineer, you will be a key contributor to Teradata’s...
Location: United States, San Diego
Salary: 156400.00 - 234700.00 USD / Year
Teradata
Expiration Date: Until further notice
Requirements
  • 10 to 12 years of working experience as a Software Developer
  • Experience with large-scale, enterprise-grade software development in C/C++, including low-level TCP/IP protocols, inter-process communication, and debugging multi-threaded applications
  • Strong data structure, multi-threading and algorithms fundamentals
  • Multi-cloud and On Premises platforms exposure
  • Very good understanding of common public cloud technologies - storage, communication, and security
  • Knowledge of SQL and understanding of relational databases, including referential integrity, columnar vs. row storage, triggers, and stored procedures
  • Strong background in database internals and analytics through working exposure
  • Knowledge of diverse concepts and techniques for creating systems with High Availability and Resilience
  • Experience working on high-availability data replication solutions achieving near‑zero RTO and RPO
  • Knowledge of modern storage options, including object stores, sharded data, and data replication techniques
Job Responsibility
  • Design, develop, test, and maintain Teradata’s In-Database Replication offering, a critical business continuity solution for its customers worldwide
  • Work across the full product development lifecycle, including requirement analysis, architecture and design, development, testing, and ongoing maintenance of new and existing features
  • Engage with the associated technologies and environments necessary to ensure successful, high-quality product delivery
  • Provide support for released products when needed
What we offer
  • healthcare
  • life and disability insurance plans
  • 401(k) retirement savings plan
  • time-off programs
Full-time

Staff Software Engineer

Join the team as our next Staff Software Engineer in the Enterprise AI Engineeri...
Location: Canada
Salary: 160320.00 - 200400.00 USD / Year
Stytch
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 8+ years of experience in data engineering, software development, or a related field, with at least 3 years in a technical leadership role
  • Experience with full-stack development building web apps, using modern languages and frameworks such as JavaScript, TypeScript, or React
  • Proven track record of architecting and delivering complex data projects at scale, with a deep understanding of data infrastructure and distributed systems
  • Strong understanding of data modeling, data warehousing, and ETL processes, with experience designing and optimizing data pipelines
  • Excellent communication and collaboration skills, with the ability to influence technical decisions and drive alignment across teams
  • Strong leadership skills, with a track record of mentoring and developing high-performing engineering teams
  • Demonstrated ability to thrive in a fast-paced, dynamic environment and deliver results under tight timelines
Job Responsibility
  • Co-lead the design and development of our software infrastructure, driving technical vision and strategy to ensure scalability, reliability, and performance
  • Drive the development of sophisticated, stateful web applications. You will oversee the integration of complex React-based front-ends with backend modular services, ensuring a seamless UI experience
  • Serve as a developer leader in distributed systems and data technologies, with strong software engineering skills
  • Drive technical innovation and research to stay at the forefront of emerging data technologies and best practices
  • Mentor and elevate a team of high-performing engineers. You don’t just write great code; you foster a culture of technical excellence, helping senior and mid-level engineers level up through deep-dive code reviews and architectural workshops
  • Collaborate closely with cross-functional teams to understand business requirements and translate them into scalable and efficient technical solutions
  • Continuously adapt to the evolving JavaScript ecosystem to maximize engineering efficiency
  • Ensure data quality, integrity, and security throughout the data lifecycle, adhering to industry best practices and compliance standards
What we offer
  • competitive pay
  • generous time off
  • ample parental and wellness leave
  • healthcare
  • a retirement savings program
  • incentive programs
  • commissions
  • equity grants
  • health and wellness benefits
  • retirement contributions
Full-time

Staff Software Engineer (Data)

As a Staff Software Engineer, you will be a technical leader in our Data Science...
Location: United Kingdom, London
Salary: Not provided
Arrive
Expiration Date: Until further notice
Requirements
  • Extensive history of building and scaling data-intensive applications in production, with a track record of leading technical initiatives from conception to deployment
  • Expert-level Python and its data ecosystem (NumPy, Pandas), including designing frameworks for data tasks
  • Deep understanding of distributed data processing engines like Apache Spark
  • A strong command of Linux, containers (Docker), and infrastructure as code for cloud deployments (AWS preferred)
  • A passion for elevating engineering standards through pair programming and detailed code reviews to help other engineers grow their technical depth
Job Responsibility
  • Architect and Implement: Own the technical roadmap for our Spark-based data processes to ensure our Airflow pipelines are performant, cost-effective, and scalable. You will enhance our existing services through hands-on development and solve any complex performance bottlenecks, concurrency issues, and systemic bugs
  • Drive Engineering Excellence: Define standards for efficient, testable and reusable Python code across the organisation that ensure our services remain reliable, robust, and easy for other engineers to extend
  • Bridge Strategy and Execution: Partner with Data Scientists to translate modeling requirements into high-performance production services. You will design the architectures necessary to meet sophisticated data-serving needs, ensuring our parking and EV products remain accurate and responsive at scale
  • Modernize Infrastructure: Evolve our infrastructure-as-code (AWS) and CI/CD pipelines to keep up with cutting-edge approaches. You will personally contribute to the automation and observability patterns that allow us to deploy fresh data and production services with high confidence and zero downtime
  • Advance Data Capabilities: Lead the hands-on development of platform enhancements, such as establishing feature stores for machine learning and building automated data monitoring systems to ensure data integrity and model reproducibility
  • Scale AI Practices: Lead the adoption of AI throughout the software development lifecycle, evolving our internal coding practices while ensuring systems remain reliable and maintainable
What we offer
  • Flexible working - hybrid home and office-based opportunities
  • Paid Leave if you participate in an event for Charity
  • 25 Days holiday entitlement
  • An enhanced Workplace Pension Scheme - 5% by Arrive, 3% by you
  • Private Medical Health Insurance
  • Fantastic wellbeing programmes, including On-site Sports massages, Reiki and Head massages every week
  • Discounted gym membership
  • Access to Blue Call, a mental health support platform
  • Enhanced Maternity and Paternity offering
Full-time