Senior ML Ops Engineer - Architecture & Strategy

BMW

Location:
Germany, Munich

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We own the platform blueprint for our ML infrastructure: we design systems that integrate with a data mesh of domain-owned data products, leverage Qualcomm Cloud AI 100 and NVIDIA GPU clusters for training at petabyte scale, and produce optimised model artefacts ready for deployment to vehicle hardware. We set technical direction, make build-vs-buy decisions, and ensure the platform scales to hundreds of engineers.

Job Responsibility:

  • You design the reference architecture for the ML platform end-to-end: data ingestion, PB-scale data lake, heterogeneous training clusters, model registry, and deployment-ready artefacts
  • You design the data-format backbone, setting standards for data flows, ingestion, cataloguing, transcoding, and partitioning at PB scale, integrated with dataset management tooling
  • You define the platform component topology and integration contracts for pipeline orchestration, experiment tracking, hyperparameter optimisation, dataset management, observability, and metadata
  • You establish model lifecycle governance, including experiment tracking, approval gates, validation criteria, and clear handoff contracts to deployment teams
  • You drive cost governance at PB scale, including accelerator spot strategies, S3 tiering, cross-AZ traffic reduction, and Kubernetes cluster right-sizing
  • You partner with Security, Legal, and Functional-Safety teams on ISO 26262, ISO 8800, and data-protection compliance

Requirements:

  • University degree in Computer Science, Computer/Electrical Engineering or related subjects
  • 5–8+ years in ML platform or infrastructure engineering, with at least two years in a tech lead or architect role
  • Deep expertise in AWS, Azure, or Google Cloud, ideally with multi-region or multi-account setups
  • Proven track record designing systems for PB-scale data and hundreds of concurrent training jobs, plus an understanding of large vision models and the challenges of compressing them for automotive-grade SoCs
  • Strong knowledge of Kubernetes platform design, GitOps, and infrastructure-as-code
  • Excellent communication skills to align ML researchers, embedded engineers, data teams, and executives
  • Familiarity with edge model compilation toolchains for Qualcomm (QNN, AIMET) and/or NVIDIA (TensorRT, Triton)
  • Experience with automotive data at scale, such as MDF4, MCAP, and ROS bags, including multi-sensor synchronisation

What we offer:

  • Challenging projects in which we shape the mobility of tomorrow together
  • Wide range of personal and professional development opportunities
  • Attractive, fair and performance-related remuneration
  • High level of job security
  • Annual special payments such as vacation pay, Christmas bonus, and profit sharing
  • Flexible working hours, six weeks of annual leave, and overtime compensation
  • Discounted BMW & MINI conditions
  • Many other benefits at bmw.jobs/benefits

Additional Information:

Job Posted:
March 21, 2026

Employment Type:
Full-time

Similar Jobs for Senior ML Ops Engineer - Architecture & Strategy

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180,000 - 270,000 USD / year
Company:
Plaid
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility:
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer:
  • Medical, dental, and vision insurance, and a 401(k)
Employment Type:
Full-time

Senior Principal Technical Program Manager - ML Platform

Salary:
231,300 - 301,975 USD / year
Company:
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience on software teams as a Development Manager, Technical Product Manager, or TPM leading technical platform areas
  • Deep domain experience in AI and/or Search, for example model inference, model evaluation, model training, LLM Ops, semantic search, or search relevance
  • Ability to partner with Engineering in defining direction, strategy, and execution at the platform level
  • Strategic thinking and the ability to understand business objectives and translate them into technical problems and programs
  • Technical understanding of the systems involved, and willingness to develop domain expertise in the areas they operate in: storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to read or write code, but are expected to understand system flows, block architectures, APIs, and similar concepts
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility:
  • Understand and stay up to date on the latest innovations in AI and Search. Partner closely with engineering teams to translate these into a practical platform evolution for Atlassian that brings value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, and industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Take a systematic approach to engineering problems. This includes prioritizing tasks, scoping the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing their entire lifecycle, from initiation through forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer:
  • Health and wellbeing resources
  • Paid volunteer days

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, New York
Salary:
190,800 - 286,800 USD / year
Company:
Plaid
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility:
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
Employment Type:
Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251,000 - 314,500 USD / year
Company:
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years leading MLOps or AIOps platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • Short-term and long-term incentive programs
  • Robust offering of benefit and wellness plans
Employment Type:
Full-time

Principal Data And Analytics Engineer

The Principal Data and Analytics Engineer holds comprehensive responsibility for...
Location:
United States
Salary:
108,086 - 180,144 USD / year
Company:
O'Reilly Auto Parts
Expiration Date:
Until further notice
Requirements:
  • Proven experience architecting enterprise-scale data platforms and ecosystems, including hybrid and cloud-native environments (e.g., GCP BigQuery, Snowflake, Iceberg, Advanced SQL, Erwin, dbt, Kafka, Alation, Collibra)
  • Deep expertise in designing and scaling highly available, secure, and fault-tolerant batch and streaming pipelines with strong emphasis on cost optimization, observability, and latency control
  • Advanced proficiency in semantic modeling, reusable data asset design, and cross-functional data product delivery aligned to medallion architecture
  • Leadership in implementing CI/CD-enabled pipelines, RBAC frameworks, schema evolution strategies, and interoperable data exchange using Iceberg or equivalent table formats
  • Ownership of organization-wide metrics store and semantic layers, ensuring consistency, governance, and performance across reporting, AI, and ML use cases
  • Advanced expertise in programming languages such as Python and Scala, with the ability to architect complex data solutions
  • Demonstrated leadership in designing and overseeing the implementation of scalable, idempotent workflows using orchestration frameworks such as Airflow and Prefect
  • Demonstrated ability to translate business transformation goals into scalable data solutions and reusable patterns
  • Deep understanding of business processes, KPIs, and capability maps across functions such as supply chain, customer, store ops, and finance
  • Proven experience in driving cross-functional data product prioritization, influencing senior stakeholders, and quantifying impact of data initiatives
Job Responsibility:
  • Help define and evolve enterprise data engineering blueprints, including data mesh, medallion architecture, and hybrid cloud data platforms
  • Set strategic direction for data platforms, tools, and services (e.g., Snowflake, GCP BigQuery, dbt, Kafka, Airflow/Prefect) in alignment with future-state architecture and business priorities
  • Architect and design highly scalable, resilient, cost-optimal, and secure data platforms
  • Lead the design and implementation of next-generation data platforms, ensuring fault tolerance, high availability, and optimal performance for petabyte-scale data
  • Establish and enforce organization-wide best practices for data pipeline development, CI/CD for data workflows, automated deployment playbooks, and robust rollback strategies
  • Lead technology evaluation and adoption, proactively researching, evaluating, and championing the integration of cutting-edge data technologies, frameworks, and methodologies
  • Define and scale enterprise knowledge management frameworks that ensure consistent documentation, discoverability, and reusability of data assets across domains
  • Establish and govern standards for metadata management, data lineage, architectural diagrams, and runbooks
  • Lead the design of federated governance models that empower domain-aligned teams to operate autonomously while conforming to centralized policies, frameworks and playbooks
  • Collaborate with data governance, compliance, and security teams to operationalize policy-as-code frameworks for data retention, access control, and PII handling
What we offer:
  • Competitive Wages & Paid Time Off
  • Stock Purchase Plan & 401k with Employer Contributions Starting Day One
  • Medical, Dental, & Vision Insurance with Optional Flexible Spending Account (FSA)
  • Team Member Health/Wellbeing Programs
  • Tuition Educational Assistance Programs
  • Opportunities for Career Growth
Employment Type:
Full-time

Senior Manager, AI Engineering

By leading the strategic adoption and scaling of AI across the organisation this...
Location:
United Kingdom, London or Newbury
Salary:
Not provided
Company:
Vodafone
Expiration Date:
Until further notice
Requirements:
  • Proven experience in AI strategy, delivery, and enablement
  • Strong understanding of GenAI, ML Ops, and AI governance
  • Familiarity with infrastructure provisioning and model lifecycle
  • Ability to influence cross-functional teams and stakeholders
  • Experience in training, consulting, and change management
  • Knowledge of privacy, security, and ethical AI practices
Job Responsibility:
  • Define and deliver the AI strategy and roadmap
  • Build and maintain self-service AI environments and infrastructure
  • Implement use cases to demonstrate business value
  • Operate and monitor AI models for accuracy and performance
  • Collaborate with architecture, governance, and security teams
  • Establish best practice and enable reuse across solutions
  • Drive AI enablement through training and consulting
  • Evangelise AI adoption across internal and customer-facing teams
  • Monitor industry trends and pilot emerging opportunities
  • Measure and report on efficiency gains and impact
What we offer:
  • Great pay, bonuses, up to 28 days off plus bank holidays, and paid time for charity work
  • Personalise benefits for you and your family, like discounts, vouchers, a pension plan and loads more
  • Amazing learning tools and top-notch parental leave policies
Employment Type:
Full-time

LGV Driver

Do you hold a Cat C+E Licence and are ready to dive into a new, exciting challen...
Location:
United Kingdom, Peterborough
Salary:
44,373.05 GBP / year
Company:
360 Resourcing Solutions
Expiration Date:
Until further notice
Requirements:
  • LGV Licence Cat C+E
  • Good work ethic
  • Time management skills
Job Responsibility:
  • Deliveries and collections
  • Visiting different customers on a daily basis
  • Must be flexible for early starts/late finishes
What we offer:
  • Overtime premiums available
  • Annual pay review
  • Company sick pay
  • Modern fleet of vehicles with Bluetooth capability
  • Premium heated seats in the cabs
  • Free inhouse CPC Training
  • Company uniform provided
  • Free onsite parking
  • Employee Assistance Programme
  • Staff Discount Scheme
Employment Type:
Full-time

Sr. Engineer, Software - EPM

This role is responsible for designing, building, and supporting scalable Oracle...
Location:
United States, Bellevue; Frisco
Salary:
113,600 - 205,000 USD / year
Company:
T-Mobile
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Information Systems, Software Engineering, or related field, or equivalent experience
  • 5+ years of software engineering or EPM Platform experience (Oracle, OneStream)
  • 3+ years of hands-on Oracle EPBCS / Planning Cloud experience
  • Proven experience owning end-to-end solution design for enterprise EPM implementations
  • Demonstrated experience with enterprise metadata management solutions (e.g., Oracle EDMCS)
  • Strong experience developing business rules and calculations within Oracle EPM platforms
  • Experience supporting integrations using OIC, Data Management, Data Exchange, or related technologies
  • Strong analytical, problem-solving, and communication skills
  • Experience collaborating with FP&A or other enterprise business stakeholders
  • Ability to facilitate requirements workshops with FP&A stakeholders and translate outputs into technical design documents
Job Responsibility:
  • Design, develop, and enhance Oracle EPBCS (Planning Cloud) applications supporting enterprise planning models across revenue, customers, capex, etc.
  • Lead end-to-end solution design from requirements through deployment, including technical documentation and peer review
  • Build scalable planning applications supporting forecasting, workforce planning, and connected planning capabilities
  • Develop driver-based planning models, scenario analysis, and rolling forecast capabilities
  • Translate FP&A business requirements into scalable, enterprise-grade planning solutions
  • Develop and maintain business rules and calculations using Groovy scripting and native Oracle EPM capabilities
  • Support application performance optimization, cube tuning, metadata management, and usability improvements
  • Contribute to scalable application design, reusable components, and engineering best practices
  • Participate in new capability development and platform expansion initiatives
  • Design and support integrations using Oracle Integration Cloud (OIC), Data Management, and Data Exchange
What we offer:
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Flexible spending account
  • 401(k)
  • Employee stock grants
  • Employee stock purchase plan
  • Paid time off
  • Paid holidays
  • Paid parental leave
Employment Type:
Full-time