Engineering Manager - Machine Learning Infrastructure

United States, New York · 216000.00 - 367200.00 USD / Year · Job Posted March 22, 2026

Job Description

We build simple yet innovative consumer products and developer APIs that shape how everybody interacts with money and the financial system. Plaid is evolving into an AI-first company, where data and machine learning are the key enablers of smarter, more secure insight products built on top of Plaid’s vast financial data network.

The Machine Learning Infrastructure team sits at the center of this transformation. We build the platforms that enable model developers to experiment, train, deploy, and monitor machine learning systems reliably and at scale, from feature stores and pipelines to deployment frameworks and inference tooling. We are in the midst of a pivotal shift: replacing legacy systems with a modern feature store and establishing a standardized ML Ops “golden path.” Our mission is to enable Plaid’s product teams to move faster with trustworthy insights, deploy models with confidence, and unlock the next generation of AI-powered financial experiences.

As the Engineering Manager for Machine Learning Infrastructure, you will be responsible for guiding a senior engineering team through the design, delivery, and operation of Plaid’s ML infrastructure. We are looking for a leader who combines deep technical expertise in ML infrastructure with proven experience scaling and managing senior engineering teams. You’ll ensure clarity of execution, help your team deliver high-quality systems, and partner closely with ML product teams to meet their needs. This role is execution-driven: you will translate strategy into action, remove blockers, and build a culture of ownership and technical excellence.

Job Responsibility

  • Lead and support the ML Infra team, driving project execution and ensuring delivery on key commitments
  • Build and launch Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Define and drive adoption of an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines, deployment tooling, and inference systems
  • Partner with ML product teams to understand requirements and deliver solutions that accelerate model development and iteration
  • Recruit, mentor, and develop engineers, fostering a collaborative and high-performing team culture

Requirements

  • 8–10 years of experience in ML infrastructure, including direct hands-on expertise as an engineer (IC/TL)
  • 2+ years of experience managing infrastructure or ML platform engineers
  • Proven experience delivering and operating ML or AI infrastructure at scale
  • Solid technical depth across ML/AI infrastructure domains (e.g., feature stores, pipelines, deployment, inference, observability)
  • Demonstrated ability to drive execution on complex technical projects with cross-team stakeholders
  • Strong communication and stakeholder management skills

Similar Jobs for Engineering Manager - Machine Learning Infrastructure

8 matching positions

Engineering Manager - Machine Learning Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location: United States, San Francisco
Salary: 241200.00 - 400000.00 USD / Year
Company: Plaid
Expiration Date: Until further notice
Requirements
  • 8–10 years of experience in ML infrastructure, including direct hands-on expertise as an engineer (IC/TL)
  • 2+ years of experience managing infrastructure or ML platform engineers
  • Proven experience delivering and operating ML or AI infrastructure at scale
  • Solid technical depth across ML/AI infrastructure domains (e.g., feature stores, pipelines, deployment, inference, observability)
  • Demonstrated ability to drive execution on complex technical projects with cross-team stakeholders
  • Strong communication and stakeholder management skills
Job Responsibility
  • Lead and support the ML Infra team, driving project execution and ensuring delivery on key commitments
  • Build and launch Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Define and drive adoption of an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines, deployment tooling, and inference systems
  • Partner with ML product teams to understand requirements and deliver solutions that accelerate model development and iteration
  • Recruit, mentor, and develop engineers, fostering a collaborative and high-performing team culture
What we offer
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Fulltime

Senior Manager, Machine Learning Engineering

As a Capital One Machine Learning Engineer (MLE), you'll be part of an Agile tea...
Location: United States, McLean; New York
Salary: 229900.00 - 286200.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree
  • At least 8 years of experience designing and building data-intensive solutions using distributed computing
  • At least 4 years of experience programming with Python, Scala, or Java
  • At least 3 years of experience building, scaling, and optimizing ML systems
  • At least 2 years of experience leading teams developing ML solutions
  • At least 4 years of people management experience
  • Master's or Doctoral Degree in computer science, electrical engineering, mathematics, or a similar field
  • 4+ years of on-the-job experience with an industry recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow
  • 3+ years of experience developing performant, resilient, and maintainable code
  • 3+ years of experience with data gathering and preparation for ML models
Job Responsibility
  • Design, build, and/or deliver ML models and components that solve real-world business problems
  • Inform ML infrastructure decisions using understanding of ML modeling techniques and issues
  • Solve complex problems by writing and testing application code, developing and validating ML models, and automating tests and deployment
  • Collaborate as part of a cross-functional Agile team to create and enhance software
  • Retrain, maintain, and monitor models in production
  • Leverage or build cloud-based architectures, technologies, and/or platforms to deliver optimized ML models at scale
  • Construct optimized data pipelines to feed ML models
  • Leverage continuous integration and continuous deployment best practices
  • Ensure all code is well-managed to reduce vulnerabilities and that models are well-governed
  • Use programming languages like Python, Scala, or Java
What we offer
  • Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • comprehensive, competitive, and inclusive set of health, financial and other benefits
  • Fulltime

Engineering Manager, Machine Learning

We are looking for an Engineering Manager, Machine Learning to lead a team of ML...
Location: India, Bengaluru
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • 10+ years of experience in machine learning engineering, with a strong track record of shipping models to production in consumer-facing products (recommendations, search, ads, personalization, or similar domains)
  • Experience managing an ML or software engineering team
  • BS/MS in Computer Science, Mathematics, Statistics, or a related quantitative field
  • Deep expertise in recommendation system architectures, deep neural networks, and ranking models
  • Strong software engineering fundamentals and experience writing production-quality code
  • Experience with large-scale ML tooling and infrastructure: PyTorch/TensorFlow, Spark, Airflow, cloud-native MLOps platforms
  • Experience with multi-objective optimization, reinforcement learning, or Bayesian methods in production settings
  • Familiarity with LLM-based approaches for recommendations, content understanding, or generative personalization
  • Demonstrated ability to connect ML work to measurable product and business outcomes
  • Experience building and scaling ML teams in a distributed or multi-site setting
Job Responsibility
  • Lead, mentor, and grow a team of ML engineers, fostering a culture of technical excellence, ownership, and collaboration
  • Set the technical roadmap, aligning priorities across the team and within the broader Recommendations organization
  • Drive system design and architecture decisions for ML models powering content ranking, user modeling, multi-objective optimization, and personalization across Roku's key surfaces
  • Provide technical leadership on model architecture choices, training and serving infrastructure, and evaluation methodologies
  • Own the A/B experimentation and measurement strategy for your team's surfaces, ensuring ML work is tied to measurable product and business outcomes
  • Champion the adoption of generative AI to push the boundaries of recommendation and personalization
  • Partner with Product, Engineering, and cross-functional stakeholders to translate business goals into ML solutions
  • Recruit and develop ML talent in Bengaluru
What we offer
  • Global access to mental health and financial wellness support and resources
  • Local benefits include statutory and voluntary benefits which may include healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Employees are supported in taking time off, in accordance with local leave policies and other personal needs
  • Fulltime

Engineering Manager, Machine Learning Platform

Dandy is transforming the massive and antiquated dental industry. We are establi...
Location: United States
Salary: 216750.00 - 255000.00 USD / Year
Company: Dandy
Expiration Date: Until further notice
Requirements
  • Bachelor’s in Computer Science, Electrical Engineering, Robotics, or a related field
  • 5+ years of industry experience in applied machine learning, including 2+ years in a leadership or management role
  • Deep understanding of ML Platform best practices: CI/CD for ML, model deployment, observability, and lifecycle management
  • Strong people management skills with a track record of team development, hiring, mentorship, and retention
  • Strong Python skills and experience with ML frameworks (PyTorch, TensorFlow, scikit-learn)
  • Hands-on experience with model optimization, hyperparameter tuning, evaluation, monitoring, and benchmarking
  • Familiarity with Docker, Kubernetes, Kubeflow, and cloud platforms (AWS, GCP, or Azure), with a preference for GCP tools (Vertex AI, BigQuery, Dataflow, GKE)
  • Demonstrated ability to lead cross-functional efforts across product, engineering, research, and operations.
Job Responsibility
  • Lead execution of the ML roadmap, with a focus on building scalable, production-grade ML Platform infrastructure for 3D deep learning and computer vision applications
  • Build and lead a high-performing ML Platform team, with an emphasis on reliability, reproducibility, and efficient model lifecycle management
  • Partner with product, engineering, and research stakeholders to translate ML innovations into robust, deployable systems that support key business objectives
  • Oversee the end-to-end deployment of computer vision models that extract structure from complex 3D scan data, ensuring production readiness and performance at scale
  • Drive experimentation and implementation of SOTA techniques for 3D generative AI, while ensuring they are integrated into robust ML Platform workflows
  • Collaborate cross-functionally to identify pain points and bottlenecks in current ML pipelines and deliver improvements through automation and tooling
  • Manage large-scale data pipelines, versioning, and labeling workflows to support both generalized and fine-tuned model development
  • Define and implement rigorous evaluation and monitoring strategies to maintain high model quality, mitigate drift, and ensure reliability in production
  • Establish and maintain best practices for CI/CD of ML models, including model versioning, rollback mechanisms, and monitoring for performance and integrity.
What we offer
  • Offers Equity
  • healthcare
  • dental
  • mental health support
  • parental planning resources
  • retirement savings options
  • generous paid time off
  • Fulltime

Engineering Manager (Python + Machine Learning)

We are seeking a hands-on Machine Learning Engineering Manager to lead cross-fun...
Location: India, Noida
Salary: Not provided
Company: AquSag Technologies
Expiration Date: Until further notice
Requirements
  • 9+ years of experience with a strong background in Machine Learning, NLP, and modern deep learning architectures (Transformers, LLMs)
  • Hands-on experience with frameworks such as PyTorch, TensorFlow, Hugging Face, or DeepSpeed
  • 2+ years of proven experience managing teams delivering ML/LLM models in production environments
  • Knowledge of distributed training, GPU/TPU optimization, and cloud platforms (AWS, GCP, Azure)
  • Familiarity with MLOps tools like MLflow, Kubeflow, or Vertex AI for scalable ML pipelines
  • Excellent leadership, communication, and cross-functional collaboration skills
  • Bachelor’s or Master’s in Computer Science, Engineering, or related field
Job Responsibility
  • Lead and mentor a cross-functional team of ML engineers, data scientists, and MLOps professionals
  • Oversee the full lifecycle of LLM and ML projects — from data collection to training, evaluation, and deployment
  • Collaborate with Research, Product, and Infrastructure teams to define goals, milestones, and success metrics
  • Provide technical direction on large-scale model training, fine-tuning, and distributed systems design
  • Implement best practices in MLOps, model governance, experiment tracking, and CI/CD for ML
  • Manage compute resources and budgets, and ensure compliance with data security and responsible AI standards
  • Communicate progress, risks, and results to stakeholders and executives effectively

Manager, Machine Learning - Community Support Engineering

The Community Support Platform (CSP) at Airbnb is a critical system that drives ...
Location: United States
Salary: 204000.00 - 255000.00 USD / Year
Company: Airbnb
Expiration Date: Until further notice
Requirements
  • Expertise in various machine learning and AI methodologies, including LLMs and non-LLMs, tailored for user-facing products
  • Proven experience in leading teams that develop large-scale ML models and systems to improve online user experiences
  • Strong leadership skills with a track record of nurturing an innovative and collaborative team environment
  • Exceptional verbal and written communication abilities, with a keen eye for detail
  • Demonstrated capability to work effectively with stakeholders at all organizational levels, both internally and externally
  • Skilled in navigating and resolving ambiguous challenges through proactive and strategic approaches
  • PhD, or Master's degree in Computer Science, Mathematics, Statistics, or related technical field
  • 10+ years of experience in building and shipping AI models and products, including 2+ years of experience with LLMs
  • 5+ years managing machine learning teams that deliver large impact
  • Expert knowledge of machine learning algorithms and techniques
Job Responsibility
  • Lead and mentor a dynamic team of highly skilled applied scientists and machine learning engineers in the research, design and optimization of AI models and services
  • Develop and refine the overarching strategy for the ML and AI aspects of our community support products, focusing on scalability, quality, safety, performance, and reliability
  • Foster rapid development cycles without sacrificing quality, collaborating closely with platform, backend, and frontend engineers to engineer robust ML models and systems that enhance community support initiatives
  • Evaluate technical trade-offs in key decisions, ensuring optimal outcomes through data-backed strategies
  • Conduct thorough design and architecture reviews to continually elevate our standards of technical excellence
What we offer
  • bonus
  • equity
  • benefits
  • Employee Travel Credits
  • Fulltime

Sr. Manager, Machine Learning

Roku is changing how the world watches TV... The person in this role will levera...
Location: India, Bengaluru
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • 10+ years of total experience in software/ML engineering, with at least 5 years in a dedicated people management leadership role
  • MS or BS in Computer Science, Mathematics, or a related quantitative field
  • Deep theoretical and practical understanding of Machine Learning
  • Understanding of the business of streaming and how ML directly impacts the bottom line
  • Ability to explain complex models to non-technical executives
  • Proven track record of deploying large-scale ML models into production environments
  • Experience with modern ML stacks (e.g., LLMs, PyTorch, TensorFlow, Spark, and cloud-native MLOps tools)
Job Responsibility
  • Define the roadmap for ML within the Content Platform
  • Provide high-level guidance on model architecture and MLOps infrastructure
  • Lead and grow a high-performing team of ML Engineers
  • Partner with Product, Engineering, and Content Strategy to align ML initiatives with business goals
  • Champion a culture of incremental delivery
  • Balance long-term R&D with measurable improvements
What we offer
  • global access to mental health and financial wellness support
  • healthcare (medical, dental, and vision)
  • life, accident, disability, commuter, and retirement options (401(k)/pension)
  • time off in accordance with local leave policies
  • Fulltime

Machine Learning Manager - Applied ML

We are looking for a Machine Learning Manager to lead our East Coast Applied ML ...
Location: United States, New York
Salary: Not provided
Company: Cohere
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Machine Learning, or a related field (Master’s or PhD preferred)
  • 8+ years in AI/ML, including several years in technical leadership roles
  • Proven success leading large ML teams and delivering complex AI solutions at scale
  • Experience with enterprise deployments, including custom model development and fine-tuning
  • Deep understanding of LLMs, their training, deployment, and real-world constraints
  • Hands-on experience with RAG pipelines, agentic systems, and multi-modal applications
  • Proficiency in ML frameworks such as PyTorch or TensorFlow
  • Familiarity with modern cloud platforms (AWS, GCP, Azure) and ML infrastructure best practices
  • Strong ability to translate business requirements into scalable ML solutions
  • Track record of product thinking and technical decision-making aligned with customer needs
Job Responsibility
  • Define and drive the long-term vision for the Applied ML team in alignment with Cohere’s product and business goals
  • Shape the roadmap for custom model development, fine-tuning, and advanced implementations that address nuanced enterprise challenges
  • Collaborate closely with executive leadership to prioritize high-impact initiatives and strategic customer engagements
  • Lead and grow a high-performing team of ML engineers through hiring, coaching, and mentorship
  • Foster a culture of ownership, innovation, and continuous learning
  • Establish and evolve team processes to maximize productivity and execution speed
  • Partner with Product to define and deliver novel, scalable ML solutions that differentiate Cohere in the market
  • Guide the development of reusable frameworks and abstractions that streamline deployment across customer use cases
  • Oversee performance optimization and evaluation of models in real-world enterprise environments
  • Act as a trusted technical advisor to strategic customers—translating needs into actionable plans
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime