CrawlJobs Logo

Senior Platform Engineer - Cloud & Infrastructure

ZenML

Location Icon

Location:
Germany , Munich

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

Architect the Infrastructure of MLOps. This is a unique hybrid role where you won't just be maintaining internal clusters; you will be building core product features AND helping our most advanced customers architect their MLOps stacks.

Job Responsibility:

  • Build 'Infra-Heavy' Product Features like native schedulers and workload manager
  • Own the ZenML Pro (SaaS) Infrastructure ensuring resilience, scalability, and security
  • Enterprise Architecture & PoCs for complex customer deployments
  • Developer Experience by abstracting Kubernetes complexity from Data Scientists

Requirements:

  • Deep knowledge of Kubernetes (CKA level)
  • Experience with Docker, Terraform, Helm
  • Proficiency in Python and likely Go
  • Experience with AWS (EKS), GCP (GKE), Azure (AKS)
  • Experience with PostgreSQL, SQLModel, FastAPI
  • Infrastructure as Code (IaC) mastery
  • Ability to write production-quality code
  • Customer empathy and communication skills
  • Problem-solving skills for complex deployments
What we offer:
  • Inspiring international team
  • Genuine connection & lots of fun with team events
  • Annual company offsite
  • Office in the heart of Munich
  • Flexible hours & trust-based work
  • Remote-friendly culture
  • Competitive compensation

Additional Information:

Job Posted:
January 16, 2026

Employment Type:
Fulltime
Work Type:
On-site work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Senior Platform Engineer - Cloud & Infrastructure

Senior Distributed Systems Engineer - Platform Engineering

For our Platform Engineering team, we are looking for programmers with strong in...
Location
Location
Poland
Salary
Salary:
Not provided
rtbhouse.com Logo
RTB House
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Excellent understanding of how complex IT systems work - from the hardware level, through software, to algorithms
  • Ability to proactively define requirements, ask appropriate questions and draw conclusions that will combine technical constraints and business needs
  • Ability to lead the design and implementation of a solution
  • Experience in leading project teams
  • Willingness to be involved in topics that go beyond programming and design, such as responsibility for technical areas or communication with other teams
  • Proactive attitude, independence in taking action
  • Extensive experience in programming and readiness to implement key system elements as well as involvement in code reviews
  • Good knowledge of methods of creating concurrent programs and distributed systems
  • Ability to critically analyze created solutions in terms of performance (from estimating the theoretical performance of designed systems to detecting and removing actual performance problems in production)
  • C1 level in English and Polish
Job Responsibility
Job Responsibility
  • Plan and then hands-on lead further development within a given technical area like deployment, monitoring, databases or load balancing, in the context of existing infrastructure within RTB House
  • Coordinate the work of a project team of 3-4 people, also making arrangements with other teams and units within RTB House
  • Ensure the reliability and scalability of the solutions built
What we offer
What we offer
  • Attractive compensation
  • Work in a team of enthusiasts who are willing to share their knowledge and experience
  • Flexible cooperation conditions - we do not have core hours, we do not have holiday limits
  • Access to the latest technologies and the possibility of real use of them in a large-scale and highly dynamic project
Read More
Arrow Right

Senior Software Engineer, Infrastructure

You’ll help shape the future of infrastructure automation for law enforcement sy...
Location
Location
United States , Seattle; Boston
Salary
Salary:
141000.00 - 225600.00 USD / Year
axon.com Logo
Axon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 8+ years of professional software development experience
  • Strong background building cloud-native, distributed solutions
  • Experience designing tooling and automation to simplify the operational management of SaaS/PaaS systems
  • Proficiency in backend services with multiple managed languages (e.g., Java, Scala, Go, C#, or similar)
  • Expertise with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and building modular, reusable, testable components
  • Familiarity with Kubernetes platforms (e.g., AKS, EKS, or similar)
  • Hands-on experience with CI/CD platforms for automating infrastructure, builds, testing, and releases
  • Strong collaboration and communication skills, with empathy for the needs of engineering teams
Job Responsibility
Job Responsibility
  • Lead engineering architecture design reviews
  • Set a high technical bar for the team through code and architecture design reviews
  • Mentoring engineers
  • Working across teams with Product, Design, and Engineering to create integrated solutions that delight our customers
  • Improve our Engineering process, including long-term thinking, sprint planning and stand-ups
  • Building services that adhere to our high bar on availability and latency in this mission-critical space
  • Working with the latest open source technologies
What we offer
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
  • Fulltime
Read More
Arrow Right

Senior Engineering Manager - Infra Platform

Lead the team that powers every product at Intercom, building the platform that ...
Location
Location
Ireland , Dublin
Salary
Salary:
Not provided
intercom.com Logo
Intercom
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Experience leading an infrastructure or platform team in a cloud environment, ideally AWS, with accountability for availability and costs
  • A track record of independently leading and delivering on complex initiatives, often spanning multiple projects, teams, and quarters
  • Expert grasp of modern web and distributed systems
  • Clear, timely communication
  • Comfort with data
  • Ability to learn quickly, iterate, unblock yourself and your team, and persist until the right problem is solved
Job Responsibility
Job Responsibility
  • Lead and inspire a high-performing, autonomous team of engineers working on scaling and optimizing Intercom’s core infrastructure and supporting systems
  • Navigate complex scaling and infrastructure challenges, guiding the team through strategic decisions on how we build and scale Intercom’s core components
  • Collaborate closely with domain experts and other parts of R&D
  • Drive strategic initiatives and execute high-impact projects, ensuring they are delivered reliably and efficiently
  • Use the best tools in the industry
  • Foster a culture of agility and simplicity, prioritising scalability and uptime while supporting continuous deployment and incremental improvements
  • Mentor and coach engineers and future leaders, playing an active role in their hiring, onboarding, and career development
  • Adapt and thrive across various functions to ensure objectives are met
What we offer
What we offer
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews
  • Pension scheme & match up to 4%
  • Life assurance
  • Comprehensive health and dental insurance for you and your dependents
  • Flexible paid time off policy
  • Paid maternity leave
  • 6 weeks paternity leave for fathers
  • Cycle-to-Work Scheme
  • Fulltime
Read More
Arrow Right

Senior Software Engineer, Infrastructure

You’ll help shape the future of infrastructure automation for law enforcement sy...
Location
Location
United States , Seattle; Boston
Salary
Salary:
141000.00 - 225600.00 USD / Year
axon.com Logo
Axon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 8+ years of professional software development experience
  • Strong background building cloud-native, distributed solutions
  • Experience designing tooling and automation to simplify the operational management of SaaS/PaaS systems
  • Proficiency in backend services with multiple managed languages (e.g., Java, Scala, Go, C#, or similar)
  • Expertise with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and building modular, reusable, testable components
  • Familiarity with Kubernetes platforms (e.g., AKS, EKS, or similar)
  • Hands-on experience with CI/CD platforms for automating infrastructure, builds, testing, and releases
  • Strong collaboration and communication skills, with empathy for the needs of engineering teams
Job Responsibility
Job Responsibility
  • Lead engineering architecture design reviews
  • Set a high technical bar for the team through code and architecture design reviews
  • Mentoring engineers
  • Working across teams with Product, Design, and Engineering to create integrated solutions that delight our customers
  • Improve our Engineering process, including long-term thinking, sprint planning and stand-ups
  • Building services that adhere to our high bar on availability and latency in this mission-critical space
  • Working with the latest open source technologies
What we offer
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
  • Fulltime
Read More
Arrow Right

Senior Infrastructure Engineer

We are seeking a skilled and proactive individual to play a key role in supporti...
Location
Location
United Kingdom , Manchester
Salary
Salary:
Not provided
ans.co.uk Logo
ANS Group
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Exposure to secure architecture design and implementation
  • Experience with the deployment and management Carbon Black or other EDR solutions across cloud infrastructure
  • Significant previous experience as an infrastructure engineer working on a large scale enterprise or multi-tenant environment
  • VMware 7.0+
  • Significant experience troubleshooting and analysing complex failures
  • Operational experience of NSX 3.0+
  • Scripting abilities in Powershell and PowerCLI
  • Experience with Cisco UCS or other enterprise blade systems
  • Significant Experience with Storage Technologies (HPE 3PAR, Nimble, Dell Compellent)
  • Experience with FC storage networking
Job Responsibility
Job Responsibility
  • Work to ensure conformity to public sector infrastructure requirements are met
  • Work in conjunction with our SoC team to develop and maintain platform security baselines
  • Monitor, diagnose and resolve significant problems within the ANS infrastructure
  • Be an escalation point for team members and the support teams offering technical expertise in virtualization, compute hardware and storage
  • Collaborate and work with other technical teams to provide industry leading support to our customers
  • Responsible for creating high quality documentation
  • Proactively work to identify areas of improvement in the platform
  • Effectively deliver project milestones
  • Responsible for the generation of LLD from HLD
  • Ensure our infrastructure is up to date by planning & performing patching and firmware upgrades
What we offer
What we offer
  • 25 days’ holiday, plus you can buy up to 5 more days
  • Birthday off
  • An extra celebration day
  • 5 days’ additional holiday in the year you get married
  • 5 volunteer days
  • Private health insurance
  • Pension contribution match and 4 x life assurance
  • Flexible working and work from anywhere for up to 30 days per year
  • Maternity: 16 weeks’ full pay
  • Paternity: 3 weeks’ full pay
  • Fulltime
Read More
Arrow Right

Senior Platform Engineer

As a Senior Platform Engineer at Aignostics, you work hand in hand with our team...
Location
Location
Germany , Berlin
Salary
Salary:
Not provided
aignostics.com Logo
Aignostics
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Proven experience: 5+ years in platform engineering, with a proven track record in managing complex, cloud-native infrastructure
  • Technical expert: work with Kubernetes, Docker, Terraform, and Cloud Environments, with a deep understanding of CI/CD processes and tools
  • Excellent coder: automate with strong programming and scripting abilities, preferably in Go/Python and bash
  • Network architect: get our services flying by leveraging technologies like Virtual Private Clouds, DNS, Reverse Proxies and Firewalls
  • Outstanding communicator: ability to explain complex technical concepts, drive decisions, and collaborate across teams (fluent in English, German is a plus)
  • Self-driven learner: stays current with technology trends, evaluates new solutions (including AI), and grows with our challenges
  • Live the Devops Culture: Educating our developers on core principles, implementing collaborative processes, and providing the necessary tools and frameworks that enable them to do so
Job Responsibility
Job Responsibility
  • Introduce, implement and own architectural solutions of our Kubernetes clusters, internal services, and cloud-based infrastructure to improve our developer’s experience
  • Identify & Automate workflows and processes leveraging from event driven pipelines like Gitlab CICD or Argo Workflows
  • Introduce and drive the security of our applications and data by implementing state of the art security concepts in Kubernetes and cloud environments
  • Work closely with the engineering and data science teams to bring our products and AI model training to its excellence
  • Propose and drive the adoption of best practices in infrastructure management, scalability, and security
What we offer
What we offer
  • Cutting-edge AI research and development, with involvement of Charité, TU Berlin and our other partners
  • Work with a welcoming, diverse and highly international team of colleagues
  • Opportunity to take responsibility and grow your role within the startup
  • Expand your skills by benefitting from our Learning & Development yearly budget of 1,000€ (plus 2 L&D days), language classes and internal development programs
  • Mentoring program, you’ll learn from great experts
  • Flexible working hours and teleworking policy
  • Enjoy your well-deserved time off within our 30 paid vacations days per year
  • We are family & pet friendly and support flexible parental leave options
  • Pick a subsidized membership of your choice among public transport, sports and well-being
  • Enjoy our social gatherings, lunches, and off-site events for a fun and inclusive work environment
Read More
Arrow Right

Senior Site Reliability Engineer Cloud Platform

Zilliz is a fast-growing startup developing the industry’s leading vector databa...
Location
Location
Salary
Salary:
175000.00 - 225000.00 USD / Year
zilliz.com Logo
Zilliz
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems
  • Proficiency in scripting languages such as Python, Go, or Java
  • Strong knowledge of container orchestration technologies like Kubernetes and Docker
  • Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools
  • Experience with infrastructure as code tools such as Terraform or Ansible
  • Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo
  • Proven ability to troubleshoot complex distributed systems and resolve issues promptly
  • Bachelor’s degree or above in computer science, software engineering, or other relevant disciplines
  • Ability to thrive in a fast-paced, startup environment and handle multiple projects simultaneously
Job Responsibility
Job Responsibility
  • Work at the intersection of development and site reliability. Creating SRE tools and systems, as well as supporting existing infrastructure and platforms
  • Ensure the reliability, availability, and performance of Zilliz’s distributed database systems
  • Develop and implement strategies for monitoring, incident management, and disaster recovery
  • Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention
  • Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness
  • Collaborate with software engineers to enhance system reliability, scalability, and performance
  • Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes
  • Actively contribute to the Milvus Vector Database open-source community, focusing on improving reliability and operational efficiency
  • Fulltime
Read More
Arrow Right

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location
Location
United States , Boston
Salary
Salary:
150000.00 - 210000.00 USD / Year
whoop.com Logo
Whoop
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s Degree in Computer Science, Engineering, or a related field
  • or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility
Job Responsibility
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer
What we offer
  • equity
  • benefits
  • Fulltime
Read More
Arrow Right