Engineering Manager - Machine Learning Infrastructure

Plaid

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

241200.00 - 400000.00 USD / Year

Job Description:

We build simple yet innovative consumer products and developer APIs that shape how everybody interacts with money and the financial system. Plaid is evolving into an AI-first company, where data and machine learning are the key enablers of smarter, more secure insight products built on top of Plaid’s vast financial data network. The Machine Learning Infrastructure team sits at the center of this transformation. We build the platforms that enable model developers to experiment, train, deploy, and monitor machine learning systems reliably and at scale — from feature stores and pipelines, to deployment frameworks and inference tooling. We are in the midst of a pivotal shift: replacing legacy systems with a modern feature store, and establishing a standardized ML Ops “golden path.” Our mission is to enable Plaid’s product teams to move faster with trustworthy insights, deploy models with confidence, and unlock the next generation of AI-powered financial experiences.

Job Responsibility:

  • Lead and support the ML Infra team, driving project execution and ensuring delivery on key commitments
  • Build and launch Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Define and drive adoption of an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines, deployment tooling, and inference systems
  • Partner with ML product teams to understand requirements and deliver solutions that accelerate model development and iteration
  • Recruit, mentor, and develop engineers, fostering a collaborative and high-performing team culture

Requirements:

  • 8–10 years of experience in ML infrastructure, including direct hands-on experience as an engineer (IC/TL)
  • 2+ years of experience managing infrastructure or ML platform engineers
  • Proven experience delivering and operating ML or AI infrastructure at scale
  • Solid technical depth across ML/AI infrastructure domains (e.g., feature stores, pipelines, deployment, inference, observability)
  • Demonstrated ability to drive execution on complex technical projects with cross-team stakeholders
  • Strong communication and stakeholder management skills

What we offer:

  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission

Additional Information:

Job Posted:
December 11, 2025

Employment Type:
Full-time

Similar Jobs for Engineering Manager - Machine Learning Infrastructure

Senior AI and Machine Learning Engineer

We are seeking a Senior AI/ML & Innovation Engineer who will be leading initiative...
Location
United States, Aguadilla
Salary
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or a closely related quantitative discipline
  • Typically 7–10 years of experience
  • Deep understanding of machine learning algorithms, such as linear regression, decision trees, support vector machines, random forests, deep learning models (e.g., neural networks), and reinforcement learning
  • A strong foundation in mathematics and statistics
  • Proficiency in programming languages such as Python, R, or Java
  • Strong understanding of GitHub Copilot, Cursor, n8n, vibe coding, Windsurf, and similar technologies
  • Experience in cloud infrastructure (AWS, Azure, etc.)
  • Knowledge of open source, Linux, etc.
  • Understanding of DevOps and SRE
  • Advanced knowledge and experience in deep learning
Job Responsibility
  • Conducts research and stays up to date with the latest advancements in AI and machine learning technologies, frameworks, and algorithms
  • Collaborates with cross-functional teams to understand business requirements and design AI and machine learning solutions
  • Develops, implements, and optimizes machine learning models and algorithms
  • Deploys machine learning models into production environments
  • Monitors the performance of deployed models
  • Organizes and leads comprehensive design review sessions
  • Works collaboratively with the engineering manager and team lead to set design and implementation standards
  • Regularly leads meetings
  • Provides technical leadership, mentorship, and guidance to junior team members
  • Develops and delivers strategic presentations and reports to senior stakeholders
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type
Full-time

Lead Product Manager, Data and Machine Learning (ML)

As a Lead Data and Machine Learning Product Manager, you’ll own the vision, stra...
Location
United States, New York
Salary
200000.00 - 230000.00 USD / Year
Clear
Expiration Date
Until further notice
Requirements
  • 8+ years of product management experience, ideally owning data platforms, ML-powered features, or analytics products
  • Strong fluency in data infrastructure, machine learning workflows, and responsible AI concepts
  • Experience delivering end-user and partner-facing features (e.g. personalization, core identity, insights, fraud detection)
  • Skilled in partnering with Data, Engineering, and Security teams to ship performant, privacy-safe, and compliant systems
  • Adept at leading cross-functional teams, navigating ambiguity, and driving from vision to execution
  • Excellent communication skills; able to explain complex concepts to technical and non-technical audiences alike
  • Deep curiosity and user-first mindset, with a passion for unlocking value through intelligence
Job Responsibility
  • Define and own the product strategy and roadmap for CLEAR’s data and ML-powered capabilities, spanning internal intelligence platforms and customer/partner-facing use cases
  • Lead the design and delivery of systems that support ML model development, deployment, and monitoring, in partnership with Engineering and Data teams
  • Define and evolve reporting and analytics capabilities, helping us understand and optimize user identity verification, usage, and conversion
  • Work closely with stakeholders across Engineering, Data, Security, Product, GTM, and Ops to ensure robust, compliant, and scalable solutions
  • Translate regulatory requirements (e.g. GDPR, CCPA) into data handling features that preserve trust and transparency
  • Set clear KPIs and success metrics, and use them to inform prioritization, iteration, and impact storytelling
What we offer
  • Hybrid work environment
  • Meals and snacks in offices
  • Well-being and learning & development stipend and reimbursement programs
  • Comprehensive healthcare plans
  • Family building benefits (fertility and adoption/surrogacy support)
  • Flexible time off
  • Free OneMedical memberships for you and your dependents
  • 401(k) retirement plan with employer match
  • Annual bonuses
  • Commission
Employment Type
Full-time

Senior Machine Learning Engineer, Personalization and Recommendations

As a Senior Machine Learning Engineer on the Personalization & Recommendations t...
Location
United States, San Francisco
Salary
183360.00 - 248000.00 USD / Year
EdTech Jobs
Expiration Date
Until further notice
Requirements
  • 5+ years of experience in applied machine learning or ML-heavy software engineering, with a strong focus on personalization, ranking, or recommendation systems
  • Demonstrated impact improving key metrics such as CTR, retention, or engagement through recommender or search systems in production
  • Strong hands-on skills in Python and PyTorch, with expertise in data and feature engineering, distributed training and inference on GPUs, and familiarity with modern MLOps practices — including model registries, feature stores, monitoring, and drift detection
  • Deep understanding of retrieval and ranking architectures, such as Two-Tower models, deep cross networks, Transformers, or MMoE, and the ability to apply them to real-world problems
  • Experience with large-scale embedding models and vector search, including FAISS, ScaNN, or similar systems
  • Proficiency in experiment design and evaluation, connecting offline metrics (AUC, NDCG, calibration) with online A/B test outcomes to drive product decisions
  • Clear, effective communication, collaborating well with product managers, data scientists, engineers, and cross-functional partners
  • A growth and mentorship mindset, helping elevate team quality in modeling, experimentation, and reliability
  • Commitment to responsible and inclusive personalization, ensuring our systems respect learner privacy, fairness, and diverse goals
Job Responsibility
  • Design and implement personalization models across candidate retrieval, ranking, and post-ranking layers, leveraging user embeddings, contextual signals and content features
  • Develop scalable retrieval and serving systems using architectures such as Two-Tower models, deep ranking networks, and ANN-based vector search for real-time personalization
  • Build and maintain model training, evaluation, and deployment pipelines, ensuring reliability, training–serving consistency, observability, and robust monitoring
  • Partner with Product and Data Science to translate learner objectives (engagement, retention, mastery) into measurable modeling goals and experiment designs
  • Advance evaluation methodologies, contributing to offline metric design (e.g., NDCG, CTR, calibration) and supporting rigorous A/B testing to measure learner and business impact
  • Collaborate with platform and infrastructure teams to optimize distributed training, inference latency, and serving cost in production environments
  • Stay informed on industry and research trends, evaluating opportunities to meaningfully apply them within Quizlet’s ecosystem
  • Mentor junior and mid-level engineers, supporting technical growth, experimentation rigor, and responsible ML practices
  • Champion collaboration, inclusion, curiosity, and data-driven problem solving, contributing to a healthy and productive team culture
What we offer
  • 20 vacation days
  • Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
  • Employer-sponsored 401(k) plan with company match
  • Access to LinkedIn Learning and other resources to support professional growth
  • Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
  • 40 hours of annual paid time off to participate in volunteer programs of choice
Employment Type
Full-time

Machine Learning Platform / Backend Engineer

We are seeking a Machine Learning Platform/Backend Engineer to design, build, an...
Location
Serbia; Romania, Belgrade; Timișoara
Salary
Not provided
Everseen
Expiration Date
Until further notice
Requirements
  • 4-5+ years of work experience in either ML infrastructure, MLOps, or Platform Engineering
  • Bachelor's degree or equivalent in computer science is preferred
  • Excellent communication and collaboration skills
  • Expert knowledge of Python
  • Experience with CI/CD tools (e.g., GitLab, Jenkins)
  • Hands-on experience with Kubernetes, Docker, and cloud services
  • Understanding of ML training pipelines, data lifecycle, and model serving concepts
  • Familiarity with workflow orchestration tools (e.g., Airflow, Kubeflow, Ray, Vertex AI, Azure ML)
  • A demonstrated understanding of the ML lifecycle, model versioning, and monitoring
  • Experience with ML frameworks (e.g., TensorFlow, PyTorch)
Job Responsibility
  • Design, build, and maintain scalable infrastructure that empowers data scientists and machine learning engineers
  • Own the design and implementation of the internal ML platform, enabling end-to-end workflow orchestration, resource management, and automation using cloud-native technologies (GCP/Azure)
  • Design and manage Kubernetes-based infrastructure for multi-tenant GPU and CPU workloads with strong isolation, quota control, and monitoring
  • Integrate and extend orchestration tools (Airflow, Kubeflow, Ray, Vertex AI, Azure ML or custom schedulers) to automate data processing, training, and deployment pipelines
  • Develop shared services for model behavior/performance tracking, data/datasets versioning, and artifact management (MLflow, DVC, or custom registries)
  • Build out documentation covering architecture, policies, and operations runbooks
  • Share skills, knowledge, and expertise with members of the data engineering team
  • Foster a culture of collaboration and continuous learning by organizing training sessions, workshops, and knowledge-sharing sessions
  • Collaborate and drive progress with cross-functional teams to design and develop new features and functionalities
  • Ensure that the developed solutions meet project objectives and enhance user experience
Employment Type
Full-time

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...
Location
United States, San Francisco
Salary
216500.00 - 324500.00 USD / Year
GoFundMe
Expiration Date
Until further notice
Requirements
  • 9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
  • Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
  • Extensive experience designing, developing, and operating scalable backend systems
  • Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
  • Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
  • Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
  • Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
  • Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
  • Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)
Job Responsibility
  • Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
  • Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
  • Design, build, and optimize scalable model training, fine tuning, and inference pipelines, ensuring robust integration with production systems
  • Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
  • Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
  • Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
  • Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
  • Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
  • Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
  • Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure
What we offer
  • Competitive pay
  • Comprehensive healthcare benefits
  • Financial assistance for things like hybrid work and family planning
  • Generous parental leave
  • Flexible time-off policies
  • Mental health and wellness resources
  • Learning, development, and recognition programs
Employment Type
Full-time

Senior Machine Learning Engineer

Our Machine Learning Team is developing next-generation AI capabilities for High...
Location
Canada, Vancouver
Salary
146000.00 - 220000.00 CAD / Year
Highspot
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field
  • 5+ years of experience in developing and deploying machine learning models and systems
  • Strong proficiency in at least one programming language (Python, Java, C++, etc.) and experience with machine learning libraries (TensorFlow, Keras, PyTorch, etc.)
  • In-depth understanding of machine learning concepts, such as supervised and unsupervised learning, deep learning, and natural language processing
  • Experience with data processing tools and technologies, such as Spark, Flink, Hadoop, and SQL databases
  • Strong problem-solving skills and ability to analyze complex data sets
  • Excellent communication skills and ability to articulate complex technical concepts to non-technical stakeholders
  • An entrepreneurial spirit: you’re agile, creative, resourceful, and tenacious as you solve problems and achieve team and company goals
  • Comfortable with modern open source technologies and tools
  • Passion to learn new technologies and infrastructure
Job Responsibility
  • Collaborate with product and data science teams to identify opportunities that drive customer value and define machine learning requirements; lead the development and deployment of machine learning solutions and applications while ensuring scalability, reliability, and maintainability
  • Develop and maintain a robust machine learning platform that enables the efficient deployment and management of machine learning models across our product offerings
  • Stay up to date with the latest machine learning research and trends and apply them to our product offerings, while mentoring and collaborating with junior team members to improve their technical skills
  • Partner cross-functionally across all of Highspot’s feature engineering crews, as well as with QA/DevOps/Site-Reliability
What we offer
  • Comprehensive medical, dental, vision, disability, and life benefits
  • Group Retirement Savings Plan (RRSP) and matching employer contributions (DPSP) with immediate vesting
  • Flexible PTO
  • Generous Holiday Schedule + 5 Days for Annual Holiday Week
  • Quarterly Recharge Fridays (paid days off for mental health recharge)
  • Flexible work schedules
  • Access to Coaches and Therapists through Modern Health
  • 2 Volunteer days per year
  • Monthly transportation allowance for employees that work in our Vancouver Hub location
  • Employees are eligible to receive stock options
Employment Type
Full-time

Sr. Machine Learning Engineer

Our Machine Learning Team is developing next-generation AI capabilities for High...
Location
United States, Seattle
Salary
171000.00 - 254100.00 USD / Year
Highspot
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field
  • 5+ years of experience in developing and deploying machine learning models and systems
  • Strong proficiency in at least one programming language (Python, Java, C++, etc.) and experience with machine learning libraries (TensorFlow, Keras, PyTorch, etc.)
  • In-depth understanding of machine learning concepts, such as supervised and unsupervised learning, deep learning, and natural language processing
  • Experience with data processing tools and technologies, such as Spark, Flink, Hadoop, and SQL databases
  • Strong problem-solving skills and ability to analyze complex data sets
  • Excellent communication skills and ability to articulate complex technical concepts to non-technical stakeholders
  • An entrepreneurial spirit: you’re agile, creative, resourceful, and tenacious as you solve problems and achieve team and company goals
  • Comfortable with modern open source technologies and tools
  • Passion to learn new technologies and infrastructure
Job Responsibility
  • Collaborate with product and data science teams to identify opportunities that drive customer value and define machine learning requirements; lead the development and deployment of machine learning solutions and applications while ensuring scalability, reliability, and maintainability
  • Develop and maintain a robust machine learning platform that enables the efficient deployment and management of machine learning models across our product offerings
  • Stay up to date with the latest machine learning research and trends and apply them to our product offerings, while mentoring and collaborating with junior team members to improve their technical skills
  • Partner cross-functionally across all of Highspot’s feature engineering crews, as well as with QA/DevOps/Site-Reliability
What we offer
  • Comprehensive medical, dental, vision, disability, and life benefits
  • Health Savings Account (HSA) with employer contribution
  • 401(k) Matching with immediate vesting on employer match
  • Flexible PTO
  • 8 paid holidays and 5 paid days for Annual Holiday Week
  • Quarterly Recharge Fridays (paid days off for mental health recharge)
  • 18 weeks paid parental leave
  • Access to Coaches and Therapists through Modern Health
  • 2 volunteer days per year
  • Commuting benefits
Employment Type
Full-time

Senior Machine Learning Infrastructure Engineer

As a Senior ML Infrastructure Engineer at Plus, you will design scalable archite...
Location
United States, Santa Clara
Salary
160000.00 - 200000.00 USD / Year
PlusAI
Expiration Date
Until further notice
Requirements
  • PhD or MS in Computer Science, Electrical Engineering, or a related field
  • Good oral and written communication skills
  • New PhD graduate, or Master's degree with 3+ years of software engineering experience focused on ML infrastructure or distributed systems
  • Proficiency in Python, C++, and SQL
  • Deep understanding of containerization, orchestration technologies, distributed ML workloads, and experiment tracking tools (e.g., Docker, Kubernetes, multiprocessing, Kubeflow, and MLflow)
  • Experience deploying and managing resources across multiple cloud platforms (AWS, GCP, or on-prem environments)
  • Proficiency in at least one deep learning framework, such as PyTorch, and in data pipeline tools (e.g., Apache Airflow, Prefect)
  • Strong knowledge of distributed systems, databases, and storage solutions
  • Extensive software design and development skills
  • Ability to learn and adapt to new technologies and contribute in a productive environment
Job Responsibility
  • Design and develop scalable, high-performance systems for ML model training, inference, deployment, and monitoring at scale
  • Build and maintain efficient data pipelines, model versioning systems, and experiment tracking frameworks
  • Collaborate with cross-functional teams, including ML researchers and engineers, to identify bottlenecks and improve platform usability
  • Implement distributed systems and storage solutions optimized for machine learning workloads
  • Drive improvements in CI/CD workflows for ML models and infrastructure
  • Ensure high availability and reliability of the ML platform by implementing robust monitoring, logging, and alerting systems
  • Stay current with industry trends and integrate relevant tools and frameworks to enhance the platform
  • Mentor junior engineers and contribute to a culture of technical excellence
  • Ensure that your work is performed in accordance with the company’s Quality Management System (QMS) requirements and contribute to continuous improvement efforts
  • Ensure team compliance with QMS, monitor quality, and drive process improvements
What we offer
  • Work, learn and grow in a highly future-oriented, innovative and dynamic field
  • Wide range of opportunities for personal and professional development
  • Catered free lunch, unlimited snacks and beverages
  • Highly competitive salary and benefits package, including 401(k) plan
Employment Type
Full-time