Together AI is building the AI Inference & Model Shaping Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints, enabling developers, enterprises, and researchers to harness the latest LLMs, multimodal, image, audio, video, and reasoning models at scale. We are looking for an exceptional MLOps Engineering Lead to partner closely with our cross-functional engineering, infrastructure, research, and sales teams to ensure the excellence of our ML API offerings. Your primary focus will be on delivering world-class inference and fine-tuning through our public APIs and customer deployments by building automation and operations processes.
Job Responsibilities:
Own availability and performance SLAs for production inference and fine-tuning services across serverless and dedicated deployments
Own & improve testing, deployment, configuration management, and monitoring practices for multi-cluster ML infrastructure – partnering closely with Infra SREs
Build self-serve tooling and automation that reduce operational toil, support internal users (MLOps, customer experience), and power self-serve offerings
Define and enforce configuration best practices for inference engines (vLLM, TRT-LLM, Pulsar) to prevent runtime issues
Lead incident response, conduct postmortems, and drive reliability improvements
Hire, mentor, and grow an MLOps engineering team
Partner with infrastructure and ML engineering teams to improve system reliability and cost efficiency
Requirements:
5+ years operating production ML inference or training systems at scale
2+ years leading engineering teams, with experience building teams from scratch
Deep expertise with Kubernetes, multi-cluster orchestration, and ML serving frameworks
Strong track record owning production SLAs (e.g. availability, TTFT, TPS)
Experience with LLM inference serving systems (vLLM, TRT-LLM, or similar)
Ability to influence cross-functional teams and make deployment/architecture decisions
Nice to have:
Experience building internal developer platforms or self-serve tooling
Background in cost optimization for GPU infrastructure
Contributions to open-source ML infrastructure projects