Machine Learning Platform Engineer

United States, San Francisco 160000.00 - 250000.00 USD / Year · Job Posted February 18, 2026

Job Description

Our team focuses on enabling custom models and dedicated inference on Together. We are responsible for building a container platform, optimizing autoscaling, minimizing cold starts, achieving the best end-to-end model performance, and providing a best-in-class developer experience with great tooling. We often focus on video or audio generation across the stack: CUDA kernels, PyTorch optimization, inference engines, container orchestration, queueing theory, etc. An ideal candidate will either be great at profiling and optimization and at least know the word Kubernetes, or be intimately familiar with multi-cluster scheduling and have some sense of ML bottlenecks.

Job Responsibility

  • New hires may work on multi-cluster orchestration, portfolio optimization, predictive autoscaling, control planes, model bring-up, model optimization, APIs for managing deployments, inference worker SDKs, and CLI tools
  • Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  • Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  • Write clear, well-tested, and maintainable software and IaC for both new and existing systems
  • Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance

Requirements

  • 5+ years of demonstrated experience in building large-scale, fault-tolerant distributed systems
  • Experience running serverless inference platforms, doing model bring-up on short notice, being on call, or running a cloud provider is a very big plus
  • Good taste and ability to thoughtfully discuss how what you’ve built has failed over time
  • Experience designing, analyzing, and improving the efficiency, scalability, and stability of various system resources
  • Excellent understanding of low-level operating system concepts, including concurrency, networking, storage, performance, and scale
  • Expert-level programmer in one or more of Python, Golang, Rust, C++, or Haskell
  • Proficiency in writing and maintaining Infrastructure as Code (IaC) using tools like Terraform
  • Experience with Kubernetes internals or other container orchestration systems
  • Sound judgement for when to use and when to not use LLMs for code
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
  • Experience in writing-heavy roles or at writing-heavy companies is a plus

What we offer

  • competitive compensation
  • startup equity
  • health insurance
  • other competitive benefits

Similar Jobs for Machine Learning Platform Engineer

8 matching positions

Machine Learning Platform Engineer

We’re looking for builders–intellectually curious, highly entrepreneurial engine...
Location: United States, San Francisco
Salary: 245000.00 - 345000.00 USD / Year
Company: Whatnot
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Statistics, Applied Mathematics or a related technical field, or equivalent work experience
  • 4+ years of professional experience developing machine learning systems and algorithms
  • 3+ years of software engineering experience building and maintaining production systems for consumer-scale loads
  • 1+ years of professional experience developing software in Python
  • Ability to work autonomously and drive initiatives across multiple product areas and communicate findings with leadership and product teams
  • Experience with operational, search, and key-value databases such as PostgreSQL, DynamoDB, Elasticsearch, Redis
  • Firm grasp of visualization tools for monitoring and logging e.g. DataDog, Grafana
  • Familiarity with cloud computing platforms and managed services such as AWS SageMaker, Lambda, Kinesis, S3, EC2, EKS/ECS, Apache Kafka, Flink
  • Professionalism around collaborating in a remote working environment and well-tested, reproducible work
  • Exceptional documentation and communication skills
Job Responsibility
  • Own the infrastructure powering AI and ML models across critical business surfaces–supporting growth, recommendations, trust and safety, fraud, seller tooling, and more
  • Prototype, deploy, and productionalize novel ML architectures that directly shape user experience and marketplace dynamics
  • Design and scale inference infrastructure capable of serving large models with low latency and high throughput
  • Build distributed training and inference pipelines leveraging GPUs and both model and data parallelism
  • Stretch beyond your comfort zone to take on new technical challenges as we scale AI across Whatnot’s ecosystem
What we offer
  • Flexible Time off Policy and Company-wide Holidays (including a spring and winter break)
  • Health Insurance options including Medical, Dental, Vision
  • Work From Home Support
  • Home office setup allowance
  • Monthly allowance for cell phone and internet
  • Care benefits
  • Monthly allowance for wellness
  • Annual allowance towards Childcare
  • Lifetime benefit for family planning, such as adoption or fertility expenses
  • Retirement
  • Fulltime

Staff Full Stack Software Engineer, Machine Learning Platform

At Cloudera, we empower people to transform complex data into clear and actionab...
Location: Hungary, Budapest
Salary: Not provided
Company: Cloudera
Expiration Date: Until further notice
Requirements
  • BSc/MSc in a related field or equivalent experience
  • 4+ years of experience with web applications using Node.js or other modern web services technologies
  • Expertise in full stack development with client- and server-side JavaScript/TypeScript (utilizing Node.js, yarn, npm, webpack, babel) and SQL in a microservices architecture
  • Experience with modern JavaScript frameworks such as React, Angular, etc.
  • Experience with generative AI assisted software development
  • Hands-on experience with Docker containers, Kubernetes and Linux at the user level
  • Self-driven and motivated, with a strong sense of ownership and craftsmanship
  • Strong written and verbal communication skills
  • This role is not eligible for sponsorship
Job Responsibility
  • Help build the leading platform for AI and machine learning in the enterprise
  • Design, code, and implement clean and elegant user interfaces and workflows
  • Work to enhance developer velocity and team agility
  • Build strong relationships and collaborate with UX designers, other developers, and quality engineers, as well as Product Management, Field Engineering, and other external partners
What we offer
  • Generous PTO Policy
  • Support work life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
  • Fulltime

Staff Full Stack Software Engineer, Machine Learning Platform

At Cloudera, we empower people to transform complex data into clear and actionab...
Location: Hungary, Budapest
Salary: Not provided
Company: Cloudera
Expiration Date: Until further notice
Requirements
  • BSc/MSc in a related field or equivalent experience
  • 4+ years of experience with web applications using Node.js or other modern web services technologies
  • Expertise in full stack development with client- and server-side JavaScript/TypeScript (utilizing Node.js, yarn, npm, webpack, babel) and SQL in a microservices architecture
  • Experience with modern JavaScript frameworks such as React, Angular, etc.
  • Experience with generative AI assisted software development
  • Hands-on experience with Docker containers, Kubernetes and Linux at the user level
  • Self-driven and motivated, with a strong sense of ownership and craftsmanship
  • Strong written and verbal communication skills
  • This role is not eligible for sponsorship
Job Responsibility
  • Help build the leading platform for AI and machine learning in the enterprise
  • Design, code, and implement clean and elegant user interfaces and workflows
  • Work to enhance developer velocity and team agility
  • Build strong relationships and collaborate with UX designers, other developers, and quality engineers, as well as Product Management, Field Engineering, and other external partners
What we offer
  • Generous PTO Policy
  • Support work life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
  • Fulltime

Staff Full Stack Software Engineer, Machine Learning Platform

At Cloudera, our Data Services Pillar is the heart of data innovation. We don’t ...
Location: Hungary, Budapest
Salary: Not provided
Company: Cloudera
Expiration Date: Until further notice
Requirements
  • BSc/MSc in a related field or equivalent experience
  • 4+ years of experience with web applications using Node.js or other modern web services technologies
  • Expertise in full stack development with client- and server-side JavaScript/TypeScript (utilizing Node.js, yarn, npm, webpack, babel) and SQL in a microservices architecture
  • Experience with modern JavaScript frameworks such as React, Angular, etc.
  • Experience with generative AI assisted software development
  • Hands-on experience with Docker containers, Kubernetes and Linux at the user level
  • Self-driven and motivated, with a strong sense of ownership and craftsmanship
  • Strong written and verbal communication skills
  • This role is not eligible for sponsorship
Job Responsibility
  • Help build the leading platform for AI and machine learning in the enterprise
  • Design, code, and implement clean and elegant user interfaces and workflows
  • Work to enhance developer velocity and team agility
  • Build strong relationships and collaborate with UX designers, other developers, and quality engineers, as well as Product Management, Field Engineering, and other external partners
What we offer
  • Generous PTO Policy
  • Support work life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
  • Fulltime

Senior Machine Learning Engineer, Platform

We seek an outstanding, creative, and passionate Machine Learning Platform Engin...
Location: United States, San Jose
Salary: 229500.00 - 367100.00 USD / Year
Company: Roku
Expiration Date: Until further notice
Requirements
  • 5+ years of experience building software solutions to concrete problems
  • Strong CS fundamentals. Should be able to write an algorithm with ease
  • Fluent in a high-level programming language such as Java, Scala, Kotlin, or Python
  • Experience with big data systems (Spark, Kafka, Flink, S3, Airflow)
  • Familiar with modern ML frameworks and tools: Ray, PyTorch, Hugging Face, AWS SageMaker
  • AI literacy and curiosity. You have either tried Gen AI in your previous work or outside of work or are curious about Gen AI and have explored it
  • MS in Computer Science or related field
Job Responsibility
  • Design, build, and maintain scalable platform services: feature store, real-time inference services, vector DBs etc., that serve millions of transactions per second
  • Run and monitor online A/B tests via robust platform services, analyzing platform metrics and business KPIs to optimize recommendation system performance
  • Collaborate closely with US-based engineering and cross-functional teams to translate business requirements into modular platform components and APIs
  • Enhance and evolve the ML platform ecosystem to support high developer velocity, system scalability, and adaptability to future business needs
  • Contribute to onboarding, training, and mentoring new team members on emerging platform engineering best practices and technologies
What we offer
  • health insurance
  • equity awards
  • life insurance
  • disability benefits
  • parental leave
  • wellness benefits
  • paid time off
  • global access to mental health and financial wellness support and resources
  • commuter benefits
  • retirement options (401(k)/pension)
  • Fulltime

Senior Machine Learning Platform Engineer

We are looking for a Senior Machine Learning Platform Engineer to join the growi...
Location: United States, San Francisco
Salary: 180000.00 - 200000.00 USD / Year
Company: Strava
Expiration Date: Until further notice
Requirements
  • Have worked on complex, ambiguous platform challenges and broken them down into manageable tasks, combining strategy with tactical execution
  • Demonstrated technical leadership in leading projects and the ability to mentor and grow early-career team members
  • Have demonstrated strong interpersonal and communication skills, and a collaborative approach to drive business impact across teams
  • Have worked with a variety of MLOps tools that fulfill different ML needs (like FastAPI, LitServe, Metaflow, MLflow, Kubeflow, Feast)
  • Are experienced in production ML model operational excellence and best practices, like automated model retraining, performance monitoring, feature logging, A/B testing
  • Experience with generative AI technologies around LLM evaluation, vector stores, and agent frameworks
  • Have built backend production tools and services in cloud environments such as (but not limited to) AWS, using Python, Terraform, and similar technologies
  • Have built and worked on data pipelines using large scale data technologies (like Spark, SQL, Snowflake)
  • Have experience building, shipping, and supporting ML models in production at scale
  • Have experience with exploratory data analysis and model prototyping, using languages such as Python or R and tools like scikit-learn, pandas, NumPy, PyTorch, TensorFlow, and SageMaker
Job Responsibility
  • Own End to End Systems: Drive key projects to power AI/ML at Strava end-to-end, from gathering stakeholder requirements to ground-up development to driving adoption and optimizing the experience
  • Interact with a Rich and Large Dataset: Explore and help leverage Strava’s extensive unique fitness and geo datasets from millions of users to extract actionable insights, inform product decisions, and optimize existing features
  • Contribute to a Well Loved Consumer Product: Work at the intersection of AI and fitness to help launch and maintain product experiences that will be used by tens of millions of active people worldwide
What we offer
  • Offers Equity
  • Fulltime

Senior Platform Machine Learning Engineer

Machine learning is the crucial enabler for every financial service EarnIn provi...
Location: United States, Mountain View
Salary: 232200.00 - 283800.00 USD / Year
Company: EarnIn
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master’s degree in Computer Science, Engineering, or a related field, or relevant equivalent experience
  • 4+ years of industry machine learning experience and excellent software engineering skills
  • Strong programming skills in Python, with familiarity in ML frameworks such as TensorFlow or PyTorch
  • Experience with ML cloud platforms like AWS SageMaker, Databricks, or GCP Vertex AI
  • Experience with LLMOps, foundation model APIs, and AI engineering
  • Familiarity with data pipeline and workflow management tools
  • Strong communication and collaboration skills
  • Passion for learning and staying updated with the latest machine learning and platform engineering industry trends
Job Responsibility
  • Design, build, and maintain the ML and AI platform and tools to support the end-to-end machine learning lifecycle
  • Work closely with other machine learning engineers to understand their workflows, optimize model training and deployment processes, and ensure the reproducibility of results
  • Ensure scalability, reliability, cost efficiency, and ease of use of the machine learning platform
  • Contribute to evaluating and adopting new technologies and tools to enhance our machine-learning capabilities
  • Set examples of outstanding operational excellence. Be the catalyst for step-jump changes
What we offer
  • equity and benefits
  • Fulltime

Staff Machine Learning Engineer - AI Platform

You will join our Data Department to support the development of Phantom Intellig...
Location: Not provided
Salary: Not provided
Company: PhantomBuster
Expiration Date: Until further notice
Requirements
  • 5+ years of experience as a Data Scientist or Machine Learning Engineer
  • Experience working with LLMs (e.g., prompt engineering, fine-tuning, retrieval-augmented generation)
  • Experience working with Agents for Amazon Bedrock AgentCore or similar agent setups
  • Strong understanding of machine learning algorithms, statistical methods, and data preprocessing techniques
  • Experience with cloud platforms for model training and deployment, especially AWS
  • Proficiency in Python, including experience with libraries such as LangChain, Scikit-Learn, NumPy, Pandas, and PyTorch/TensorFlow
  • Proficiency in SQL and experience working with data warehouses (e.g., Snowflake, GCP)
  • Knowledge of MLOps best practices, including CI/CD pipelines, model monitoring, and versioning (e.g., MLflow, Airflow)
  • Experience deploying models to production and supporting them post-deployment
  • Fluency in English
Job Responsibility
  • Define and evolve our infrastructure to allow for better ML and AI capabilities, with a focus on LLM-based and agentic systems
  • Contribute to the development and expansion of our agentic AI framework powered by AWS Bedrock, enabling both internal tools and customer-facing features
  • Identify, source, and refine datasets to allow tuning models, powering retrieval pipelines, or expanding agentic workflows
  • Pre-process data by using techniques such as data cleaning, feature engineering, and transformation
  • Train, evaluate, and deploy both LLM-based systems and traditional machine learning models into production
  • Monitor, debug, and continuously improve deployed models and AI tools
  • Support machine learning usage throughout the company, including selecting the right modeling approach for the use case (LLM vs. traditional ML)
  • Support the integration and use of LLMs, including approaches such as fine-tuning, prompt tuning, and retrieval-augmented generation (RAG), to improve accuracy
What we offer
  • Fully remote working environment
  • €40/month for remote work
  • Flexible working time
  • Home office budget up to €1500
  • 100% of an Alan Blue subscription (French-based contracts)
  • Lunch vouchers - €8 (50% The Phantom Company) / worked day (French-based contracts)
  • Partnership with MokaCare
  • €70 a month benefit for entertainment expenses
  • Book Allowance and Sharing Program
  • Fulltime