Machine Learning Engineering Team Lead

Aignostics

Location:
Germany, Berlin

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Lead a high-performing team focused on building large-scale distributed training infrastructure and workflows using cutting-edge technologies for digital pathology, powering our state-of-the-art Foundational Model development. This is a hands-on leadership role where you'll spend approximately 50% of your time on technical contributions while guiding your team to push the boundaries of machine learning for cancer research and diagnostics.

Job Responsibility:

  • Build and scale a high-performing team capable of tackling complex distributed ML challenges
  • Own the full employee lifecycle: recruiting, onboarding, performance management, career development, and retention
  • Empower your team members and help them grow in autonomy and technical expertise
  • Mentor engineers at all levels, fostering a culture of continuous learning and psychological safety
  • Create an inclusive environment where diverse perspectives drive innovation
  • Define and execute technical roadmaps aligned with company objectives and product needs
  • Lead resource allocation and capacity planning to balance team workload and business priorities
  • Own FinOps responsibilities: optimize cloud costs, track spending, and ensure efficient resource utilization
  • Ensure operational readiness through monitoring, incident response protocols, and system reliability practices
  • Establish and track KPIs for team performance, system efficiency and health
  • Design, develop, and maintain robust large-scale distributed training pipelines and ML infrastructure using cutting-edge technologies
  • Lead architecture decisions for distributed systems that enable efficient model development at scale
  • Contribute hands-on to critical technical challenges, including optimization of training pipelines and infrastructure
  • Drive technical excellence through code reviews and architectural guidance
  • Stay at the forefront of distributed training technologies and bring innovation to the team
  • Partner closely with Product teams to translate business requirements into technical solutions
  • Collaborate with (senior) Research Scientists to enable scalable model development and experimentation
  • Work with Platform Engineering to ensure robust infrastructure and tooling
  • Build strong relationships across engineering teams to drive alignment and knowledge sharing
  • Communicate technical concepts effectively to both technical and non-technical stakeholders

Requirements:

  • Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field
  • 6+ years of software engineering or ML engineering experience, with at least 2 years in a technical leadership or team lead role
  • Proven track record of building and leading high-performing engineering teams
  • Experience guiding projects across the whole Software Development Life Cycle
  • Deep understanding of fundamental Machine Learning concepts and principles, and familiarity with advanced model optimization techniques
  • Significant experience with large-scale distributed training systems and frameworks (especially PyTorch and NCCL)
  • Familiarity with GPUs, distributed systems, parallel computing and scaling laws
  • Advanced programming skills in Python; experience in performance-critical languages (C/C++ or CUDA) is a plus
  • Familiarity with MLOps/DevOps best practices, including CI/CD, Docker, Kubernetes, and observability, as well as cloud platforms (GCP, AWS or Azure) and infrastructure-as-code
  • Experience with Linux, version control, and container technologies
  • Demonstrated ability in resource allocation, capacity planning, and FinOps principles
  • Excellent problem-solving and data-driven decision-making skills in ambiguous situations
  • Effective communication and stakeholder management skills
  • Ability to give constructive feedback and navigate difficult conversations
  • Proven people leadership skills with experience managing the full employee lifecycle
  • Strategic thinking with ability to balance short-term execution and long-term vision
  • Experience with agile methodologies and iterative development processes
  • Proven ability to influence without authority and build consensus across teams
  • Track record of empowering team members and fostering autonomy

Nice to have:

  • Experience with production systems in regulated or healthcare environments, and familiarity with medical device standards (ISO 13485)
  • Experience working with biomedical or image data
  • Hands-on experience with Google Kubernetes Engine, SLURM and Ray distributed computing framework
  • Experience with an advanced ML stack (TorchDynamo, JAX, TensorRT)
  • Familiarity with Information Security standards (ISO 27001) in software development
  • Experience with FinOps tools and cloud cost optimization strategies
  • Demonstrated experience with leveraging LLM/Agentic systems to accelerate development

What we offer:

  • Learning & Development yearly budget of 1,000€ (plus 2 L&D days)
  • Language classes and internal development programs
  • Access to leadership development programs and executive coaching
  • Flexible working hours and teleworking policy
  • 30 paid vacation days per year
  • Family- and pet-friendly workplace with flexible parental leave options
  • Subsidized membership of your choice among public transport, sports, and well-being
  • Social gatherings, lunches, and off-site events for a fun and inclusive work environment
  • Optional company pension scheme

Additional Information:

Job Posted:
January 03, 2026

Work Type:
Hybrid work

Similar Jobs for Machine Learning Engineering Team Lead

Senior Machine Learning Engineering Manager, Gen AI

We're seeking a Senior Machine Learning Manager (M60) to lead a cross-functional...
Location:
United States
Salary:
193500.00 - 303150.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 8+ years in ML, search, or backend engineering roles, with 3+ years leading teams
  • Strong track record of shipping ML-powered or LLM-integrated user-facing products
  • Experience with RAG systems (vector search, hybrid retrieval, LLM orchestration)
  • Deep experience in either modeling (e.g., LLMs, search, NLP) or engineering (e.g., backend infra, full-stack), with the ability to lead end-to-end
  • Deep understanding of LLM ecosystems (OpenAI, Claude, Mistral, OSS), orchestration frameworks (LangChain, LlamaIndex), and vector databases (Weaviate, Pinecone, FAISS, etc.)
  • Strong product intuition and ability to translate complex tech into valuable user features
  • Familiarity with GenAI evaluation methods: hallucination detection, groundedness scoring, and human-in-the-loop feedback loops
  • Master’s or PhD in Computer Science, Machine Learning, or related field preferred—or equivalent practical experience
Job Responsibility:
  • Lead the vision, design, and execution of LLM-powered AI products, leveraging advanced AI modeling (e.g. SLM post-training/fine-tuning), RAG architectures and hybrid ranking systems
  • Define system architecture across retrievers, rankers, orchestration layers, prompt templates, and feedback mechanisms
  • Work closely with product and design teams to ensure delightful, fast, and grounded user experiences
  • Build and manage a cross-disciplinary team including ML engineers, backend/frontend engineers, and applied scientists
  • Foster a culture of E2E ownership — empowering the team to move from prototype to production quickly and iteratively
  • Mentor individuals to grow in both technical depth and product acumen
  • Shape the technical roadmap and long-term strategy for GenAI search across Atlassian’s product suite
  • Partner with platform and infra teams to scale inference, evaluate performance, and integrate usage signals for continuous improvement
  • Champion data quality, grounding, and responsible AI practices in all deployed features
What we offer:
  • health and wellbeing resources
  • paid volunteer days
  • Fulltime

Principal Machine Learning Systems Engineer

Search Platform powers the search functionality in Atlassian products. The team ...
Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 10+ years experience in multiple hands-on software/technology leadership roles, with end-to-end responsibility through the software development lifecycle
  • Worked on scaling ML use cases for 50+ TB of data
  • Good understanding of PySpark and Databricks jobs scaling challenges
  • Experience with ML workflows and observability at scale.
  • Bachelor's degree with a preference for Computer Science degree
  • Expertise with one or more prominent languages such as Java, Python, Kotlin, Go, or TypeScript is required.
  • Understanding of SaaS, PaaS, IaaS industry with hands-on experience with public cloud offerings (e.g., AWS, GCP, or Azure)
  • Java, Spring, REST, and NoSQL databases
  • Experience building event-driven systems based on SQS, SNS, Kafka or equivalent technologies
  • Knowledge to evaluate trade-offs between correctness, robustness, performance, space and time
Job Responsibility:
  • Handle complex problems in the team from technical design to launch
  • Determine plans-of-attack on large projects
  • Solve complex architecture challenges, apply architectural standards, and adopt them on new projects
  • Lead code reviews & documentation and take on complex bug fixes, especially on high-risk problems
  • Set the standard for meaningful code reviews
  • Partner across engineering teams to take on company-wide programmes in multiple projects
  • Transfer your depth of knowledge from your current language to excel as a Software Engineer
  • Mentor junior members of the team
What we offer:
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • health and wellbeing resources
  • paid volunteer days

Senior Principal Machine Learning Engineer

You’ll form a new team of passionate engineers dedicated to building and scaling...
Location:
United States
Salary:
222300.00 - 348975.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s, Master’s, or PhD in Computer Science, Statistics, Mathematics, or a related field, or equivalent practical experience
  • 12+ years of industry experience in machine learning, data science, or AI, with a proven track record of delivering production-grade ML systems
  • Deep expertise in Python, Go, or Java, with the ability to write performant, production-quality code
  • Familiarity with SQL, Spark, and cloud data environments (e.g., AWS, GCP, Databricks)
  • Experience building and scaling ML models for business-critical applications, ideally in security, privacy, anti-abuse, or compliance domains
  • Strong communication skills, able to explain complex ML concepts to diverse audiences and influence stakeholders
  • Demonstrated ability to solve ambiguous, complex problems and drive projects from ideation to production
  • Agile development mindset, with a focus on iterative improvement and business impact
Job Responsibility:
  • Lead AI/ML Strategy for Trust: Drive the development and implementation of advanced machine learning algorithms and AI systems for Trust, Security, Product Abuse, and Compliance use cases (e.g., threat detection, vulnerability management, privacy automation, AI safety)
  • Architect and Scale ML Platforms: Design and build scalable, secure, and reliable ML infrastructure and pipelines, ensuring compliance with privacy and regulatory requirements
  • AI Safety and Responsible AI: Develop and champion AI safety practices, including output moderation, explainability, and alignment with evolving regulatory frameworks
  • Cross-Functional Collaboration: Partner with product, engineering, security, privacy, and analytics teams to deliver transformative AI/ML solutions that enhance Atlassian’s trust posture
  • Mentorship and Leadership: Mentor and guide ML engineers and data scientists, fostering a culture of technical excellence, innovation, and continuous improvement
  • Innovation and Research: Stay at the forefront of AI/ML research, evaluating and applying the latest techniques (e.g., LLMs, anomaly detection, privacy-preserving ML) to real-world Trust challenges
  • Platform Enablement: Build reusable ML services and APIs that empower other teams to integrate AI/ML into their products and workflows
  • Operational Excellence: Ensure high availability, reliability, and security of all ML-powered Trust platforms and services
What we offer:
  • health and wellbeing resources
  • paid volunteer days
  • benefits, bonuses, commissions, and equity
  • Fulltime

Principal Machine Learning Engineer

Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin and Python)
  • Understanding of Machine Learning project lifecycle/tools along with prompt engineering
  • Experience in architecting and implementing high-performance RESTful microservices
  • Experience building and operating large-scale distributed systems using Amazon Web Services (S3, Kinesis, CloudFormation, EKS, AWS Security and Networking)
  • Experience with leveraging LLMs effectively and optimizing model usage on GPUs
  • Experience with Databricks or Apache Spark
  • Experience with Continuous Delivery and Continuous Integration
Job Responsibility:
  • Regularly tackle the largest and most complex problems in the team, from technical design to launch
  • Work closely with Product, Engineering and Design leads in Jira AI, and translate their requirements into solid engineering deliverables, delegating work to the teams
  • Deliver solutions that are used by other teams and products
  • Follow a Product Engineer mindset by building features that are data-driven and customer-centric, fostering that culture within the Jira AI group
  • Exceptional problem solving ability using ML, AI and core software engineering
  • Routinely tackle complex architecture challenges and define architectural standards
  • Actively contribute to the code delivery through leading code reviews & documentation, direct contribution and fixing complex bugs in high-risk surface areas
  • Expertise in data analysis, statistical methods, and logical reasoning to inform data-driven decision-making
  • Partner across engineering teams to take on company-wide initiatives spanning multiple projects
  • Mentor junior members on the team
What we offer:
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities

AI Machine Learning Principal Engineer

Our Regulatory Engineering team thrives on the challenge of operating in a compl...
Location:
Singapore, Singapore
Salary:
Not provided
Dell
Expiration Date:
March 31, 2026
Requirements:
  • 8+ years of related experience in a professional role with a Bachelor’s degree, or 6+ years with a Master’s degree, or 3+ years with a PhD, or equivalent experience
  • Knowledge of product compliance, safety, EMC, wireless, telecom, and environmental (e.g., energy, material, eco labels, accessibility, repairability, packaging, etc.) legislation programs is a plus
  • Track record of being an enthusiastic and effective team player with experience leading and influencing internal and external stakeholders to ensure successful outcomes
Job Responsibility:
  • Apply knowledge of AI machine learning, statistics, optimization, software engineering and data engineering to produce code for testing, operationalizing and governing software-integrated machine learning models
  • Works with stakeholders including business owners, software engineers, data scientists and data engineers to align and execute end-to-end solutions which start with data collection and management, extend through machine learning methodologies and efficiencies in software engineering, and end with governance of machine learning model performance
  • Optimizes code for existing data streams, machine learning models and APIs using best practices in software engineering and experimental design
  • Defines best practices for AI ML engineering, code optimization, model and system validation and model governance and educates prospects and customers on analytics and machine learning offerings
  • Leads the design and testing of analytical technologies, or the development/testing of new algorithms/functions to enhance capabilities within key market segments of interest
  • Leads definition of machine learning and data analytics vision, multiple use-case evaluation and selection criteria, as well as project-level scope definition for customers
  • Proactively identifies gaps and collaborates with engineering teams to find solutions, while solidifying commitments to execute these
  • Drives partnerships & relationships with third parties to develop vertical/horizontal analytical solutions and product integrations
What we offer:
  • Comprehensive Healthcare Programs
  • Award Winning Financial Wellness Tools and Resources
  • Generous Leave of Absence for New Parents and Caregivers
  • Industry Leading Wellness Platform
  • Employee Assistance Program

Senior Machine Learning Engineer (Infrastructure)

We are looking for an experienced MLOps Engineer to join our team as a Senior Ma...
Location:
United States, Boston
Salary:
152800.00 - 224100.00 USD / Year
SimpliSafe
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in software engineering, data engineering, or a related field, with at least 3 years focused on MLOps or ML infrastructure
  • Deep hands-on experience with AWS or similar public clouds, including compute, networking, container orchestration, and observability stacks
  • Hands-on experience with CI/CD pipelines, Docker, Kubernetes, and infrastructure-as-code tools (e.g., Terraform, CloudFormation)
  • Proficiency in programming languages like Python, and familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch)
  • Solid understanding of ML lifecycle management, including experiment tracking, versioning, and monitoring
  • LLM application development, including prompt engineering and evaluation
  • Strong communication skills for partnering with cross-functional technical and non-technical teams
Job Responsibility:
  • Lead the architecture, deployment, and optimization of scalable ML model serving systems for real-time and batch use cases
  • Collaborate with data scientists, engineers, and stakeholders to operationalize ML models
  • Develop CI/CD pipelines for ML models enabling rapid, safe, and consistent model releases
  • Design, implement, and own comprehensive production monitoring for ML models/systems
  • Manage cloud infrastructure, primarily in AWS or other major public clouds, to support ML workloads
  • Drive best practices in model versioning, observability, reproducibility, and deployment reliability
  • Serve in an on-call rotation as a first responder for software owned by your team
What we offer:
  • A mission- and values-driven culture and a safe, inclusive environment where you can build, grow and thrive
  • A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families
  • Free SimpliSafe system and professional monitoring for your home
  • Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change
  • Participation in our annual bonus program, equity, and other forms of compensation
  • A full range of medical, retirement, and lifestyle benefits
  • Fulltime

LLM - Senior Staff Engineer - Python + Machine Learning

AquSag is seeking a hands-on Machine Learning Senior Staff Engineer to lead cros...
Salary:
40.00 - 60.00 USD / Hour
AquSag Technologies
Expiration Date:
Until further notice
Requirements:
  • 9+ years of experience with a strong background in Machine Learning, NLP, and modern deep learning architectures (Transformers, LLMs)
  • Hands-on experience with frameworks such as PyTorch, TensorFlow, Hugging Face, or DeepSpeed
  • Hands-on experience in Docker for Production deployment
  • Proven experience managing teams delivering ML/LLM models in production environments
  • Knowledge of distributed training, GPU/TPU optimization, and cloud platforms (AWS, GCP, Azure)
  • Familiarity with MLOps tools like MLflow, Kubeflow, or Vertex AI for scalable ML pipelines
  • Excellent leadership, communication, and cross-functional collaboration skills
  • Bachelor’s or Master’s in Computer Science, Engineering, or related field (PhD preferred)
  • Overlap of 6 hours with PST time zone is mandatory
  • Commitments Required: 8 hours per day with overlap of 6 hours with PST
Job Responsibility:
  • Lead and mentor a cross-functional team of ML engineers, data scientists, and MLOps professionals
  • Oversee the full lifecycle of LLM and ML projects — from data collection to training, evaluation, and deployment
  • Collaborate with Research, Product, and Infrastructure teams to define goals, milestones, and success metrics
  • Provide technical direction on large-scale model training, fine-tuning, and distributed systems design
  • Implement best practices in MLOps, model governance, experiment tracking, and CI/CD for ML
  • Manage compute resources, budgets, and ensure compliance with data security and responsible AI standards
  • Communicate progress, risks, and results to stakeholders and executives effectively
  • Fulltime

Manager, Machine Learning - Community Support Engineering

The Community Support Platform (CSP) at Airbnb is a critical system that drives ...
Location:
United States
Salary:
204000.00 - 255000.00 USD / Year
Airbnb
Expiration Date:
Until further notice
Requirements:
  • Expertise in various machine learning and AI methodologies, including LLMs and non-LLMs, tailored for user-facing products
  • Proven experience in leading teams that develop large-scale ML models and systems to improve online user experiences
  • Strong leadership skills with a track record of nurturing an innovative and collaborative team environment
  • Exceptional verbal and written communication abilities, with a keen eye for detail
  • Demonstrated capability to work effectively with stakeholders at all organizational levels, both internally and externally
  • Skilled in navigating and resolving ambiguous challenges through proactive and strategic approaches
  • PhD, or Master's degree in Computer Science, Mathematics, Statistics, or related technical field
  • 10+ years of experience in building and shipping AI models and products, including 2+ years of experience with LLMs
  • 5+ years managing machine learning teams that deliver large impact
  • Expert knowledge of machine learning algorithms and techniques
Job Responsibility:
  • Lead and mentor a dynamic team of highly skilled applied scientists and machine learning engineers in the research, design and optimization of AI models and services
  • Develop and refine the overarching strategy for the ML and AI aspects of our community support products, focusing on scalability, quality, safety, performance, and reliability
  • Foster rapid development cycles without sacrificing quality, collaborating closely with platform, backend, and frontend engineers to engineer robust ML models and systems that enhance community support initiatives
  • Evaluate technical trade-offs in key decisions, ensuring optimal outcomes through data-backed strategies
  • Conduct thorough design and architecture reviews to continually elevate our standards of technical excellence
What we offer:
  • bonus
  • equity
  • benefits
  • Employee Travel Credits
  • Fulltime