Senior ML Ops Engineer

United States, Philadelphia · 95300.00 - 158800.00 USD / Year · Job Posted January 13, 2026

Job Description

Join Elsevier as a Senior ML Ops Engineer to lead the development of impactful AI-based features within health platforms while bridging the gap between data science and engineering. You will work on AI-based features (GenAI, Agentic AI, RAG, etc.), search/ranking quality, and knowledge-graph-aware retrieval while enforcing content rights and editorial confidentiality.

Job Responsibility

  • Automate and orchestrate machine learning workflows across major cloud and AI platforms (AWS, Azure, Databricks, and foundation model APIs such as OpenAI)
  • Maintain and version model registries and artifact stores to ensure reproducibility and governance
  • Develop and manage CI/CD for ML, including automated data validation, model testing, and deployment
  • Implement ML Engineering solutions using popular MLOps platforms such as AWS SageMaker, MLflow, Azure ML
  • Scale end-to-end custom SageMaker pipelines
  • Design and implement the engineering components of GAR+RAG systems (e.g., query interpretation and reflection, chunking, embeddings, hybrid retrieval, semantic search), manage prompt libraries, guardrails and structured output for LLMs hosted on Bedrock/SageMaker or self-hosted
  • Design and implement ML pipelines that utilize Elasticsearch/OpenSearch/Solr, vector DBs, and graph DBs
  • Build evaluation pipelines: offline IR metrics (NDCG, MAP, MRR), LLM quality metrics (faithfulness, grounding), and A/B testing
  • Optimize infrastructure costs through monitoring, scaling strategies, and efficient resource utilization
  • Stay current with the latest GAI, NLP, and RAG research and apply the state of the art in our experiments and systems
  • Partner with Subject-Matter Experts, Product Managers, Data Scientists and Responsible AI experts to translate business problems into cutting-edge data science solutions
  • Collaborate and interface with Operations Engineers who deploy and run production infrastructure

Requirements

  • Current experience in ML Engineering and MLOps platforms, including shipping ML or search/GenAI systems to production
  • Strong Python, Java, and/or Scala experience
  • Hands-on experience with major cloud vendor solutions (AWS, Azure and/or Google)
  • Experience with Search/vector/graph technologies (e.g., Elasticsearch / OpenSearch / Solr / Neo4j)
  • Experience evaluating LLMs
  • A strong understanding of the Data Science Life Cycle including feature engineering, model training, and evaluation metrics
  • Familiarity with ML frameworks, e.g., PyTorch, TensorFlow, PySpark
  • Experience with large-scale data processing systems, e.g., Spark
  • Experience with statistical analysis, machine learning theory and natural language processing

Nice to have

  • Background in health technology and/or medical content workflows

What we offer

  • Annual incentive bonus
  • Country specific benefits
  • Fair and accessible hiring process with accommodation support

Similar Jobs for Senior ML Ops Engineer

8 matching positions

Senior ML Ops Engineer - Architecture & Strategy

We own the platform blueprint for our ML infrastructure: designing systems that ...
Location: Germany, Munich
Salary: Not provided
BMW
Expiration Date: Until further notice
Requirements
  • University degree in Computer Science, Computer/Electrical Engineering or related subjects
  • 5–8+ years in ML platform or infrastructure engineering, with at least two years in a tech lead or architect role
  • Deep expertise in either AWS, Azure or Google cloud, ideally with multi-region or multi-account setups
  • Proven track record designing systems for PB-scale data and hundreds of concurrent training jobs as well as understanding of large vision models and the challenges of compressing them for automotive-grade SoCs
  • Strong knowledge of Kubernetes platform design, GitOps, and infrastructure-as-code
  • Excellent communication skills to align ML researchers, embedded engineers, data teams, and executives
  • Familiarity with edge model compilation toolchains for Qualcomm (QNN, AIMET) and/or NVIDIA (TensorRT, Triton) and experience with automotive data at scale, such as MDF4, MCAP, ROS bags, and multi-sensor synchronisation
Job Responsibility
  • You design the reference architecture for the ML platform end-to-end: data ingestion, PB-scale data lake, heterogeneous training clusters, model registry, and deployment-ready artefacts
  • You design the data-format backbone, setting standards for data flows, ingestion, cataloguing, transcoding, and partitioning at PB scale, integrated with dataset management tooling
  • You define the platform component topology and integration contracts for pipeline orchestration, experiment tracking, hyperparameter optimisation, dataset management, observability, and metadata
  • You establish model lifecycle governance, including experiment tracking, approval gates, validation criteria, and clear handoff contracts to deployment teams
  • You drive cost governance at PB scale, including accelerator spot strategies, S3 tiering, cross-AZ traffic reduction, and Kubernetes cluster right-sizing
  • You partner with Security, Legal, and Functional-Safety teams on ISO 26262, ISO 8800, and data-protection compliance
What we offer
  • Challenging projects with which we shape the mobility of tomorrow together
  • Wide range of personal and professional development opportunities
  • Attractive, fair and performance-related remuneration
  • High level of job security
  • Annual special payments such as vacation pay, Christmas bonus, and profit sharing
  • Flexible working hours including six weeks annual leave and overtime compensation
  • Discounted BMW & MINI conditions
  • Many other benefits at bmw.jobs/benefits
  • Fulltime

Senior Software Engineer, AI & ML Ops

Hyundai AutoEver America seeks a seasoned Senior AI/ML Engineer to architect, de...
Location: United States, Irvine
Salary: 103170.00 - 158873.00 USD / Year
Hyundai AutoEver America
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, AI, or related field; advanced degrees/certifications are a plus
  • 8+ years of software engineering experience, including 3+ years in AI/ML solution development
  • Proven experience designing and deploying LLM-based solutions, traditional ML models, RAG systems, and agent workflows
  • Strong expertise in Python, TensorFlow/PyTorch, Hugging Face, prompt engineering, vector databases, and AI orchestration
  • Hands-on experience with AWS SageMaker/Bedrock, Azure OpenAI, or Azure ML Studio, plus MLOps best practices (CI/CD, testing, model monitoring)
  • Proficiency in frontend frameworks (React), cloud-native deployment (Docker/Kubernetes), microservice APIs, and relational/NoSQL databases
Job Responsibility
  • Architect and develop scalable AI/ML and LLM-based systems, including RAG pipelines, agentic workflows, predictive models, and generative AI solutions
  • Build full‑stack AI applications, including React-based dashboards and front‑end interfaces integrated with backend services and cloud infrastructure
  • Develop data pipelines and ML Ops workflows using Python, SQL, AWS/Azure platforms, and monitoring tools to train, deploy, and optimize models
  • Lead cross-functional AI initiatives, deliver PoCs/MVPs, ensure compliance with AI governance, and integrate AI features into enterprise and user-facing systems
  • Provide technical leadership and mentorship, guiding standards, code reviews, model documentation, and best practices in AI/ML development
  • Continuously improve AI performance and reliability through prompt engineering, architecture enhancements, and data optimization
What we offer
  • comprehensive medical/dental coverage
  • generous PTO
  • education assistance
  • annual merit increase eligibility
  • Fulltime

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location: United States, New York
Salary: 190800.00 - 286800.00 USD / Year
Plaid
Expiration Date: Until further notice
Requirements
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
  • Fulltime

Senior ML Systems Engineer, Frameworks & Tooling

We’re looking for a senior engineer to help build, maintain and evolve the train...
Salary: Not provided
Cohere
Expiration Date: Until further notice
Requirements
  • Strong engineering experience in large-scale distributed training or HPC systems
  • Deep familiarity with JAX internals, distributed training libraries, or custom kernels/fused ops
  • Experience with multi-node cluster orchestration (Slurm, Ray, Kubernetes, or similar)
  • Comfort debugging performance issues across CUDA/NCCL, networking, IO, and data pipelines
  • Experience working with containerized environments (Docker, Singularity/Apptainer)
  • A track record of building tools that increase developer velocity for ML teams
  • Excellent judgment around trade-offs: performance vs complexity, research velocity vs maintainability
  • Strong collaboration skills — you’ll work closely with infra, research, and deployment teams
Job Responsibility
  • Build and own the training framework responsible for large-scale LLM training
  • Design distributed training abstractions (data/tensor/pipeline parallelism, FSDP/ZeRO strategies, memory management, checkpointing)
  • Improve training throughput and stability on multi-node clusters (e.g., GB200/300, AMD, H200/100)
  • Develop and maintain tooling for monitoring, logging, debugging, and developer ergonomics
  • Collaborate closely with infra teams to ensure our cluster, container environments, and hardware configurations support high-performance training
  • Investigate and resolve performance bottlenecks across the ML systems stack
  • Build robust systems that ensure reproducible, debuggable, large-scale runs
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Senior Software Engineer – ML Model Compliance & Automation

We are seeking a highly skilled and motivated Senior Software Engineer to lead t...
Location: India, Jaipur
Salary: Not provided
InfoObjects
Expiration Date: Until further notice
Requirements
  • Experience Required: 3 - 7 yrs
  • GoLang (preferred)
  • Python (preferred)
  • Bash
  • MLOps Tools: KitOps, MLModelCI, MLflow, ONNX, TensorFlow, PyTorch, Docker
  • SBOM & Security: Syft, Grype, Trivy, CycloneDX, SPDX
  • CI/CD: GitHub Actions, GitLab CI, Jenkins, ArgoCD
  • Infra: Kubernetes, Docker, Helm, Terraform
  • Cloud: AWS, GCP, Azure (EKS/GKE/ECS preferred)
  • Version Control: Git, GitOps
Job Responsibility
  • Model Packaging & Artifact Management: Design and implement workflows for packaging ML models using KitOps, ONNX, MLflow, or TensorFlow SavedModel
  • Manage model artifact versioning, registries, and reproducibility
  • Ensure artifact integrity, consistency, and traceability across CI/CD pipelines
  • Model Profiling & Optimization: Automate model profiling (latency, size, ops) using MLModelCI, TorchServe, or ONNX Runtime
  • Apply quantization, pruning, and format conversions (e.g., FP32→INT8) for optimization
  • Embed profiling and optimization checks into CI/CD pipelines to assess deployment readiness
  • Compliance & SBOM Generation: Develop pipelines to generate and validate SBOMs for ML models
  • Implement compliance checks for licensing, vulnerabilities, and security using CycloneDX, SPDX, Syft, or Trivy
  • Validate schema, dependencies, and runtime environments for production readiness
  • Cloud Integration & Deployment: Automate model registration, endpoint creation, and monitoring setup in AWS/GCP/Azure
  • Fulltime

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location: United States, San Francisco
Salary: 180000.00 - 270000.00 USD / Year
Plaid
Expiration Date: Until further notice
Requirements
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer
  • medical, dental, vision, and 401(k)
  • Fulltime

Senior Software Consultant - ML Ops

10Pearls is seeking an MLOps Engineer – ML Platform & Feature Store to build, op...
Location: Pakistan, Islamabad
Salary: Not provided
10Pearls
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field (preferred)
  • 3–5 years of experience in ML engineering, data engineering, or MLOps roles
  • Strong Python skills with experience in pandas, numpy, pyarrow, scikit-learn
  • Hands-on experience with feature stores (Feast preferred) or similar feature pipeline systems
  • Experience with MLflow or similar experiment tracking/model registry tools
  • Familiarity with distributed computing frameworks (Spark or equivalent)
  • Working knowledge of Docker, Kubernetes (kubectl, Helm), and containerized workflows
  • Experience handling GPU-based workloads
  • Strong problem-solving skills and ability to support cross-functional teams
Job Responsibility
  • Build and maintain feature pipelines using Feast, including feature definitions and materialisation jobs (batch + streaming)
  • Develop and manage training pipelines, including containerization, scheduling, dataset access, and artifact handling
  • Operate and maintain MLflow tracking server, managing experiments, models, and artifact storage
  • Execute model evaluation workflows, run evaluation suites, and support model promotion decisions
  • Enable data scientists by resolving issues related to environment setup, data access, compute, and reproducibility
  • Manage GPU-based workloads and ensure efficient scheduling and utilization
  • Support distributed data processing using Spark or similar frameworks
  • Ensure air-gap readiness by managing dependencies, pre-building images, and enabling offline deployments
  • Collaborate with MLOps Lead on platform improvements, scalability, and long-term architecture
  • Fulltime

Senior Machine Learning Ops Engineer

As our Senior MLOps Engineer, you will take ownership of our machine learning pl...
Location: Germany, Berlin
Salary: Not provided
Enpal
Expiration Date: Until further notice
Requirements
  • 4+ years of experience in MLOps, ML engineering, or applied machine learning in production environments
  • Strong ownership mentality—you care about impact, not just implementation
  • Clear communicator who can explain complex ML concepts to non-technical stakeholders
  • Solid experience with cloud infrastructure (Azure preferred), container orchestration (Docker/Kubernetes), and IaC (Terraform)
  • Proven track record with ML lifecycle tooling—model versioning, monitoring, retraining, CI/CD
  • Familiarity with MLflow, Airflow, or similar platforms
  • Strong programming skills in Python and experience with ML frameworks
  • Hands-on experience with Snowflake, Databricks, or modern data stack tools
Job Responsibility
  • Act as AI Act Steward within Enpal - ensure compliance with the EU AI Act and future regulations
  • Build and maintain a central registry of all ML and GenAI use cases and models
  • Design processes to monitor high-risk models, ensuring explainability, robustness, and fairness
  • Design and implement core infrastructure, including a centralized Model Registry, a scalable Feature Store, automated Monitoring systems for both ML and GenAI models, and orchestration pipelines for retraining and redeployment (e.g. Airflow-based)
  • Collaborate with Data Engineering to ensure seamless CI/CD for ML workflows
  • Drive implementation of GenAI-based agents that interface with our DWH (e.g. access distribution, text-to-SQL, semantic search, natural language querying)
  • Prototype and deploy agentic LLM workflows using Snowflake and other enterprise data assets
What we offer
  • Competitive salary
  • Flexible working arrangements
  • Empowering team culture
  • Hybrid working model
  • Modern office in Berlin-Friedrichshain with amenities (ping-pong table, yoga corner, roof terrace, stocked drinks fridges)
  • Onboarding day to get to know the company, team colleagues and founder
  • Monthly all-hands meetings
  • Lunch & Learn sessions
  • Legendary team spirit and unforgettable team events
  • Strong feedback culture
  • Fulltime