Engineering Manager - Machine Learning Infrastructure Job at Plaid (New York)

Engineering Manager - Machine Learning Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...

Location

United States, San Francisco

Salary:

241200.00 - 400000.00 USD / Year

Plaid

Expiration Date

Until further notice

Requirements

8–10 years of experience in ML infrastructure, including direct hands-on experience as an engineer (IC/TL)
2+ years of experience managing infrastructure or ML platform engineers
Proven experience delivering and operating ML or AI infrastructure at scale
Solid technical depth across ML/AI infrastructure domains (e.g., feature stores, pipelines, deployment, inference, observability)
Demonstrated ability to drive execution on complex technical projects with cross-team stakeholders
Strong communication and stakeholder management skills

Job Responsibility

Lead and support the ML Infra team, driving project execution and ensuring delivery on key commitments
Build and launch Plaid’s next-generation feature store to improve reliability and velocity of model development
Define and drive adoption of an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
Ensure operational excellence of ML pipelines, deployment tooling, and inference systems
Partner with ML product teams to understand requirements and deliver solutions that accelerate model development and iteration
Recruit, mentor, and develop engineers, fostering a collaborative and high-performing team culture

What we offer

medical
dental
vision
401(k)
equity
commission

Fulltime

Senior Manager, Machine Learning Engineering

As a Capital One Machine Learning Engineer (MLE), you'll be part of an Agile tea...

Location

United States, McLean; New York

Salary:

229900.00 - 286200.00 USD / Year

Capital One

Expiration Date

Until further notice

Requirements

Bachelor’s Degree
At least 8 years of experience designing and building data-intensive solutions using distributed computing
At least 4 years of experience programming with Python, Scala, or Java
At least 3 years of experience building, scaling, and optimizing ML systems
At least 2 years of experience leading teams developing ML solutions
At least 4 years of people management experience
Master's or Doctoral Degree in computer science, electrical engineering, mathematics, or a similar field
4+ years of on-the-job experience with an industry recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow
3+ years of experience developing performant, resilient, and maintainable code
3+ years of experience with data gathering and preparation for ML models

Job Responsibility

Design, build, and/or deliver ML models and components that solve real-world business problems
Inform ML infrastructure decisions using understanding of ML modeling techniques and issues
Solve complex problems by writing and testing application code, developing and validating ML models, and automating tests and deployment
Collaborate as part of a cross-functional Agile team to create and enhance software
Retrain, maintain, and monitor models in production
Leverage or build cloud-based architectures, technologies, and/or platforms to deliver optimized ML models at scale
Construct optimized data pipelines to feed ML models
Leverage continuous integration and continuous deployment best practices
Ensure all code is well-managed to reduce vulnerabilities and that models are well-governed
Use programming languages like Python, Scala, or Java

What we offer

Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
Comprehensive, competitive, and inclusive set of health, financial and other benefits

Fulltime

Engineering Manager, Machine Learning

We are looking for an Engineering Manager, Machine Learning to lead a team of ML...

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

10+ years of experience in machine learning engineering, with a strong track record of shipping models to production in consumer-facing products (recommendations, search, ads, personalization, or similar domains)
Experience managing an ML or software engineering team
BS/MS in Computer Science, Mathematics, Statistics, or a related quantitative field
Deep expertise in recommendation system architectures, deep neural networks, and ranking models
Strong software engineering fundamentals and experience writing production-quality code
Experience with large-scale ML tooling and infrastructure: PyTorch/TensorFlow, Spark, Airflow, cloud-native MLOps platforms
Experience with multi-objective optimization, reinforcement learning, or Bayesian methods in production settings
Familiarity with LLM-based approaches for recommendations, content understanding, or generative personalization
Demonstrated ability to connect ML work to measurable product and business outcomes
Experience building and scaling ML teams in a distributed or multi-site setting

Job Responsibility

Lead, mentor, and grow a team of ML engineers
Foster a culture of technical excellence, ownership, and collaboration
Set the technical roadmap, aligning priorities across the team and within the broader Recommendations organization
Drive system design and architecture decisions for ML models powering content ranking, user modeling, multi-objective optimization, and personalization across Roku's key surfaces
Provide technical leadership on model architecture choices, training and serving infrastructure, and evaluation methodologies
Own the A/B experimentation and measurement strategy for your team's surfaces
Ensure ML work is tied to measurable product and business outcomes
Champion the adoption of generative AI to push the boundaries of recommendation and personalization
Partner with Product, Engineering, and cross-functional stakeholders to translate business goals into ML solutions
Recruit and develop ML talent in Bengaluru

What we offer

Global access to mental health and financial wellness support and resources
Local benefits include statutory and voluntary benefits, which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension)
Employees are supported in taking time off, in accordance with local leave policies and other personal needs

Fulltime

Engineering Manager, Machine Learning Platform

Dandy is transforming the massive and antiquated dental industry. We are establi...

Location

United States

Salary:

216750.00 - 255000.00 USD / Year

Dandy

Expiration Date

Until further notice

Requirements

Bachelor’s in Computer Science, Electrical Engineering, Robotics, or a related field
5+ years of industry experience in applied machine learning, including at least 2 years in a leadership or management role
Deep understanding of ML Platform best practices: CI/CD for ML, model deployment, observability, and lifecycle management
Strong people management skills with a track record of team development, hiring, mentorship, and retention
Strong Python skills and experience with ML frameworks (PyTorch, TensorFlow, scikit-learn)
Hands-on experience with model optimization, hyperparameter tuning, evaluation, monitoring, and benchmarking
Familiarity with Docker, Kubernetes, Kubeflow, and cloud platforms (AWS, GCP, or Azure), with a preference for GCP tools (Vertex AI, BigQuery, Dataflow, GKE)
Demonstrated ability to lead cross-functional efforts across product, engineering, research, and operations

Job Responsibility

Lead execution of the ML roadmap, with a focus on building scalable, production-grade ML Platform infrastructure for 3D deep learning and computer vision applications
Build and lead a high-performing ML Platform team, with an emphasis on reliability, reproducibility, and efficient model lifecycle management
Partner with product, engineering, and research stakeholders to translate ML innovations into robust, deployable systems that support key business objectives
Oversee the end-to-end deployment of computer vision models that extract structure from complex 3D scan data, ensuring production readiness and performance at scale
Drive experimentation and implementation of SOTA techniques for 3D generative AI, while ensuring they are integrated into robust ML Platform workflows
Collaborate cross-functionally to identify pain points and bottlenecks in current ML pipelines and deliver improvements through automation and tooling
Manage large-scale data pipelines, versioning, and labeling workflows to support both generalized and fine-tuned model development
Define and implement rigorous evaluation and monitoring strategies to maintain high model quality, mitigate drift, and ensure reliability in production
Establish and maintain best practices for CI/CD of ML models, including model versioning, rollback mechanisms, and monitoring for performance and integrity

What we offer

equity
healthcare
dental
mental health support
parental planning resources
retirement savings options
generous paid time off

Fulltime

Engineering Manager (Python + Machine Learning)

We are seeking a hands-on Machine Learning Engineering Manager to lead cross-fun...

Location

India, Noida

Salary:

Not provided

AquSag Technologies

Expiration Date

Until further notice

Requirements

9+ years of experience in Machine Learning, NLP, and modern deep learning architectures (Transformers, LLMs)
Hands-on experience with frameworks such as PyTorch, TensorFlow, Hugging Face, or DeepSpeed
2+ years of proven experience managing teams delivering ML/LLM models in production environments
Knowledge of distributed training, GPU/TPU optimization, and cloud platforms (AWS, GCP, Azure)
Familiarity with MLOps tools like MLflow, Kubeflow, or Vertex AI for scalable ML pipelines
Excellent leadership, communication, and cross-functional collaboration skills
Bachelor’s or Master’s in Computer Science, Engineering, or related field

Job Responsibility

Lead and mentor a cross-functional team of ML engineers, data scientists, and MLOps professionals
Oversee the full lifecycle of LLM and ML projects — from data collection to training, evaluation, and deployment
Collaborate with Research, Product, and Infrastructure teams to define goals, milestones, and success metrics
Provide technical direction on large-scale model training, fine-tuning, and distributed systems design
Implement best practices in MLOps, model governance, experiment tracking, and CI/CD for ML
Manage compute resources and budgets, and ensure compliance with data security and responsible AI standards
Communicate progress, risks, and results to stakeholders and executives effectively

Manager, Machine Learning - Community Support Engineering

The Community Support Platform (CSP) at Airbnb is a critical system that drives ...

Location

United States

Salary:

204000.00 - 255000.00 USD / Year

Airbnb

Expiration Date

Until further notice

Requirements

Expertise in various machine learning and AI methodologies, including LLMs and non-LLMs, tailored for user-facing products
Proven experience in leading teams that develop large-scale ML models and systems to improve online user experiences
Strong leadership skills with a track record of nurturing an innovative and collaborative team environment
Exceptional verbal and written communication abilities, with a keen eye for detail
Demonstrated capability to work effectively with stakeholders at all organizational levels, both internally and externally
Skilled in navigating and resolving ambiguous challenges through proactive and strategic approaches
PhD, or Master's degree in Computer Science, Mathematics, Statistics, or related technical field
10+ years of experience in building and shipping AI models and products, including 2+ years of experience with LLMs
5+ years managing machine learning teams that deliver large impact
Expert knowledge of machine learning algorithms and techniques

Job Responsibility

Lead and mentor a dynamic team of highly skilled applied scientists and machine learning engineers in the research, design and optimization of AI models and services
Develop and refine the overarching strategy for the ML and AI aspects of our community support products, focusing on scalability, quality, safety, performance, and reliability
Foster rapid development cycles without sacrificing quality, collaborating closely with platform, backend, and frontend engineers to engineer robust ML models and systems that enhance community support initiatives
Evaluate technical trade-offs in key decisions, ensuring optimal outcomes through data-backed strategies
Conduct thorough design and architecture reviews to continually elevate our standards of technical excellence

What we offer

bonus
equity
benefits
Employee Travel Credits

Fulltime

Sr. Manager, Machine Learning

Roku is changing how the world watches TV... The person in this role will levera...

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

10+ years of total experience in software/ML engineering, with at least 5 years in a dedicated people management leadership role
MS or BS in Computer Science, Mathematics, or a related quantitative field
Deep theoretical and practical understanding of Machine Learning
Understanding of the business of streaming and how ML directly impacts the bottom line
Ability to explain complex models to non-technical executives
Proven track record of deploying large-scale ML models into production environments
Experience with modern ML stacks (e.g., LLMs, PyTorch, TensorFlow, Spark, and cloud-native MLOps tools)

Job Responsibility

Define the roadmap for ML within the Content Platform
Provide high-level guidance on model architecture and MLOps infrastructure
Lead and grow a high-performing team of ML Engineers
Partner with Product, Engineering, and Content Strategy to align ML initiatives with business goals
Champion a culture of incremental delivery
Balance long-term R&D with measurable improvements

What we offer

global access to mental health and financial wellness support
healthcare (medical, dental, and vision)
life
accident
disability
commuter
retirement options (401(k)/pension)
time off in accordance with local leave policies

Fulltime

Machine Learning Manager - Applied ML

We are looking for a Machine Learning Manager to lead our East Coast Applied ML ...

Location

United States, New York

Salary:

Not provided

Cohere

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Machine Learning, or a related field (Master’s or PhD preferred)
8+ years in AI/ML, including several years in technical leadership roles
Proven success leading large ML teams and delivering complex AI solutions at scale
Experience with enterprise deployments, including custom model development and fine-tuning
Deep understanding of LLMs, their training, deployment, and real-world constraints
Hands-on experience with RAG pipelines, agentic systems, and multi-modal applications
Proficiency in ML frameworks such as PyTorch or TensorFlow
Familiarity with modern cloud platforms (AWS, GCP, Azure) and ML infrastructure best practices
Strong ability to translate business requirements into scalable ML solutions
Track record of product thinking and technical decision-making aligned with customer needs

Job Responsibility

Define and drive the long-term vision for the Applied ML team in alignment with Cohere’s product and business goals
Shape the roadmap for custom model development, fine-tuning, and advanced implementations that address nuanced enterprise challenges
Collaborate closely with executive leadership to prioritize high-impact initiatives and strategic customer engagements
Lead and grow a high-performing team of ML engineers through hiring, coaching, and mentorship
Foster a culture of ownership, innovation, and continuous learning
Establish and evolve team processes to maximize productivity and execution speed
Partner with Product to define and deliver novel, scalable ML solutions that differentiate Cohere in the market
Guide the development of reusable frameworks and abstractions that streamline deployment across customer use cases
Oversee performance optimization and evaluation of models in real-world enterprise environments
Act as a trusted technical advisor to strategic customers—translating needs into actionable plans

What we offer

An open and inclusive culture and work environment
Work closely with a team on the cutting edge of AI research
Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% Parental Leave top-up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
6 weeks of vacation (30 working days!)

Fulltime
