Senior ML Ops Engineer - Architecture & Strategy

BMW

Location:
Germany, Munich

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We own the platform blueprint for our ML infrastructure. We design systems that integrate with a data mesh of domain-owned data products, leverage Qualcomm Cloud AI 100 and NVIDIA GPU clusters for training at petabyte scale, and produce optimised model artefacts ready for deployment to vehicle hardware. We set technical direction, make build-vs-buy decisions, and ensure the platform scales to hundreds of engineers.

Job Responsibility:

  • You design the reference architecture for the ML platform end-to-end: data ingestion, PB-scale data lake, heterogeneous training clusters, model registry, and deployment-ready artefacts
  • You design the data-format backbone, setting standards for data flows, ingestion, cataloguing, transcoding, and partitioning at PB scale, integrated with dataset management tooling
  • You define the platform component topology and integration contracts for pipeline orchestration, experiment tracking, hyperparameter optimisation, dataset management, observability, and metadata
  • You establish model lifecycle governance, including experiment tracking, approval gates, validation criteria, and clear handoff contracts to deployment teams
  • You drive cost governance at PB scale, including accelerator spot strategies, S3 tiering, cross-AZ traffic reduction, and Kubernetes cluster right-sizing
  • You partner with Security, Legal, and Functional-Safety teams on ISO 26262, ISO 8800, and data-protection compliance

Requirements:

  • University degree in Computer Science, Computer/Electrical Engineering or related subjects
  • 5–8+ years in ML platform or infrastructure engineering, with at least two years in a tech lead or architect role
  • Deep expertise in AWS, Azure, or Google Cloud, ideally with multi-region or multi-account setups
  • Proven track record of designing systems for PB-scale data and hundreds of concurrent training jobs, as well as an understanding of large vision models and the challenges of compressing them for automotive-grade SoCs
  • Strong knowledge of Kubernetes platform design, GitOps, and infrastructure-as-code
  • Excellent communication skills to align ML researchers, embedded engineers, data teams, and executives
  • Familiarity with edge model compilation toolchains for Qualcomm (QNN, AIMET) and/or NVIDIA (TensorRT, Triton) and experience with automotive data at scale, such as MDF4, MCAP, ROS bags, and multi-sensor synchronisation

What we offer:

  • Challenging projects with which we shape the mobility of tomorrow together
  • Wide range of personal and professional development opportunities
  • Attractive, fair and performance-related remuneration
  • High level of job security
  • Annual special payments such as vacation pay, Christmas bonus, and profit sharing
  • Flexible working hours, six weeks of annual leave, and overtime compensation
  • Discounted BMW & MINI conditions
  • Many other benefits at bmw.jobs/benefits

Additional Information:

Job Posted:
March 21, 2026

Employment Type:
Full-time

Similar Jobs for Senior ML Ops Engineer - Architecture & Strategy

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer
  • medical, dental, vision, and 401(k)
Employment Type:
Full-time

Senior Principal Technical Program Manager - ML Platform

Location:
Not provided
Salary:
231300.00 - 301975.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements
  • 8+ years of experience on software teams as Development Manager, Technical Product Manager or TPM leading technical platforms areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Partner with Engineering in defining direction, strategy and execution at Platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of systems involved. Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer
  • health and wellbeing resources
  • paid volunteer days

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, New York
Salary:
190800.00 - 286800.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
Employment Type:
Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Full-time

Principal Data And Analytics Engineer

The Principal Data and Analytics Engineer holds comprehensive responsibility for...
Location:
United States
Salary:
108086.00 - 180144.00 USD / Year
O'Reilly Auto Parts
Expiration Date:
Until further notice
Requirements
  • Proven experience architecting enterprise-scale data platforms and ecosystems, including hybrid and cloud-native environments (e.g., GCP BigQuery, Snowflake, Iceberg, Advanced SQL, Erwin, dbt, Kafka, Alation, Collibra)
  • Deep expertise in designing and scaling highly available, secure, and fault-tolerant batch and streaming pipelines with strong emphasis on cost optimization, observability, and latency control
  • Advanced proficiency in semantic modeling, reusable data asset design, and cross-functional data product delivery aligned to medallion architecture
  • Leadership in implementing CI/CD-enabled pipelines, RBAC frameworks, schema evolution strategies, and interoperable data exchange using Iceberg or equivalent table formats
  • Ownership of organization-wide metrics store and semantic layers, ensuring consistency, governance, and performance across reporting, AI, and ML use cases
  • Advanced expertise in programming languages such as Python and Scala, with the ability to architect complex data solutions
  • Demonstrated leadership in designing and overseeing the implementation of scalable, idempotent workflows using orchestration frameworks such as Airflow and Prefect
  • Demonstrated ability to translate business transformation goals into scalable data solutions and reusable patterns
  • Deep understanding of business processes, KPIs, and capability maps across functions such as supply chain, customer, store ops, and finance
  • Proven experience in driving cross-functional data product prioritization, influencing senior stakeholders, and quantifying impact of data initiatives
Job Responsibility
  • Help define and evolve enterprise data engineering blueprints, including data mesh, medallion architecture, and hybrid cloud data platforms
  • Set strategic direction for data platforms, tools, and services (e.g., Snowflake, GCP BigQuery, dbt, Kafka, Airflow/Prefect) in alignment with future-state architecture and business priorities
  • Architect and design highly scalable, resilient, cost optimal and secure data platforms
  • Lead the design and implementation of next-generation data platforms, ensuring fault tolerance, high availability, and optimal performance for petabyte-scale data
  • Establish and enforce organization-wide best practices for data pipeline development, CI/CD for data workflows, automated deployment playbooks, and robust rollback strategies
  • Lead technology evaluation and adoption, proactively researching, evaluating, and championing the integration of cutting-edge data technologies, frameworks, and methodologies
  • Define and scale enterprise knowledge management frameworks that ensure consistent documentation, discoverability, and reusability of data assets across domains
  • Establish and govern standards for metadata management, data lineage, architectural diagrams, and runbooks
  • Lead the design of federated governance models that empower domain-aligned teams to operate autonomously while conforming to centralized policies, frameworks and playbooks
  • Collaborate with data governance, compliance, and security teams to operationalize policy-as-code frameworks for data retention, access control, and PII handling
What we offer
  • Competitive Wages & Paid Time Off
  • Stock Purchase Plan & 401k with Employer Contributions Starting Day One
  • Medical, Dental, & Vision Insurance with Optional Flexible Spending Account (FSA)
  • Team Member Health/Wellbeing Programs
  • Tuition Educational Assistance Programs
  • Opportunities for Career Growth
Employment Type:
Full-time

Senior Manager, AI Engineering

By leading the strategic adoption and scaling of AI across the organisation this...
Location:
United Kingdom, London or Newbury
Salary:
Not provided
Vodafone
Expiration Date:
Until further notice
Requirements
  • Proven experience in AI strategy, delivery, and enablement
  • Strong understanding of GenAI, ML Ops, and AI governance
  • Familiarity with infrastructure provisioning and model lifecycle
  • Ability to influence cross-functional teams and stakeholders
  • Experience in training, consulting, and change management
  • Knowledge of privacy, security, and ethical AI practices
Job Responsibility
  • Define and deliver the AI strategy and roadmap
  • Build and maintain self-service AI environments and infrastructure
  • Implement use cases to demonstrate business value
  • Operate and monitor AI models for accuracy and performance
  • Collaborate with architecture, governance, and security teams
  • Establish best practice and enable reuse across solutions
  • Drive AI enablement through training and consulting
  • Evangelise AI adoption across internal and customer-facing teams
  • Monitor industry trends and pilot emerging opportunities
  • Measure and report on efficiency gains and impact
What we offer
  • Great pay, bonuses, up to 28 days off plus bank holidays, and paid time for charity work
  • Personalise benefits for you and your family, like discounts, vouchers, a pension plan and loads more
  • Amazing learning tools and top-notch parental leave policies
Employment Type:
Full-time

Surgery RN

As a Surgery Nurse at CHI St Vincent, you will prepare patients for surgical pro...
Location:
United States, Sherwood
Salary:
32.28 - 43.58 USD / Hour
American Nursing Care
Expiration Date:
Until further notice
Requirements
  • Graduate of an accredited school of nursing, upon hire
  • 1 year of recent experience in this discipline
  • Registered Nurse: AR, upon hire
  • Basic Life Support - CPR, within 30 Days
Job Responsibility
  • Prepare patients for surgical procedures and assist in the operating room
  • Go over consent forms, answer questions about the procedure, do preoperative assessments, make sure the equipment is ready to go, and update family members on the surgery status during the operation
  • Directly assist the surgeon during the actual surgery
  • Assist surgeon and surgical team in the OR
  • Care for patients before, during, and after procedures
  • Develop and implement nursing care plan for surgical patients
  • Ensure operating rooms are prepped with all necessary equipment and supplies
  • Demonstrate a high level of compassion for patients
What we offer
  • Sign on bonus up to $25,000
  • Additional pay for participation in our clinical ladder
  • Referral bonuses
  • Excellent benefits
  • Tuition reimbursement
  • Relocation assistance
  • Medical
  • Prescription drug
  • Dental
  • Vision plans
Employment Type:
Full-time

Electrical Hardware Architect

The Hardware Architect Engineer is responsible for customer HW requirements mana...
Location:
Poland, Krakow
Salary:
Not provided
BorgWarner
Expiration Date:
Until further notice
Requirements
  • Master's degree in Electrical/Electronic Engineering or an associated field
  • 5 years of experience in Automotive Electronic or e-Mobility (Hardware)
  • Good foundation knowledge in hardware embedded systems
  • Experience within effective problem-solving methodologies
  • Strong communication skills are essential
  • Fluency in English is required
Job Responsibility
  • Analyze Hardware Stakeholder Requirements and interface with all involved parties (Customer and Internal)
  • Decompose the requirements into architectural blocks, I/O interfaces, and system requirements/design that are assigned to the Hardware Electronic, Mechanical, Validation, and Manufacturing engineering teams
  • Generate Product Definition Document and perform Change Management
  • Develop requirements for embedded test software
  • Develop product functional test plan, define hardware test setup and test benches
  • Perform product functional hardware bench testing
  • Provide technical support to hardware environmental and electromagnetic compatibility validation
  • Provide technical support to manufacturing plant to set up production changes
  • Support customer test and validation programs
  • Participate in problem solving activities
What we offer
  • Private Medicover medical care for the employee and his/her family
  • Co-financing for the Multisport card
  • Possibility to join the UNIQA insurance
  • Flexible working hours
  • Competitive salary, adequate to skills and experience
  • Co-financing for holidays, Christmas gifts for employees’ children
  • Hard and soft trainings, language courses
Employment Type:
Full-time