
Machine Learning Platform / Backend Engineer

Serbia; Romania, Belgrade · Job Posted December 08, 2025

Job Description

We are seeking a Machine Learning Platform/Backend Engineer to design, build, and maintain scalable infrastructure that empowers our data scientists and machine learning engineers to develop, train, benchmark, and monitor machine learning models efficiently. You will be instrumental in shaping our internal Machine Learning Platform and driving automation, reproducibility, and performance across the machine learning lifecycle.

Job Responsibility

  • Design, build, and maintain scalable infrastructure that empowers data scientists and machine learning engineers
  • Own the design and implementation of the internal ML platform, enabling end-to-end workflow orchestration, resource management, and automation using cloud-native technologies (GCP/Azure)
  • Design and manage Kubernetes-based infrastructure for multi-tenant GPU and CPU workloads with strong isolation, quota control, and monitoring
  • Integrate and extend orchestration tools (Airflow, Kubeflow, Ray, Vertex AI, Azure ML or custom schedulers) to automate data processing, training, and deployment pipelines
  • Develop shared services for model behavior/performance tracking, data/datasets versioning, and artifact management (MLflow, DVC, or custom registries)
  • Build out documentation covering architecture, policies, and operations runbooks
  • Share skills, knowledge, and expertise with members of the data engineering team
  • Foster a culture of collaboration and continuous learning by organizing training sessions, workshops, and knowledge-sharing sessions
  • Collaborate and drive progress with cross-functional teams to design and develop new features and functionalities
  • Ensure that the developed solutions meet project objectives and enhance user experience
  • Have influence over the technology stack and internal technical improvements, contributing to strategic decision-making
  • Based on requirements and a longer-term product and feature strategy, design and implement reusable, testable, efficient, and elegant code
  • Ensure adherence to coding standards and best practices
  • Create, maintain, and run unit tests for new and existing applications and services
  • Aim to deliver defect-free and well-tested solutions
  • Analyze and collect data from various sources such as log files, application stack traces, and thread dumps
  • Utilize data analysis to identify trends, patterns, and potential areas for improvement
  • Begin to implement changes based on data analysis
  • Create and maintain CI/CD integration using various tools
  • Automate the build, test, and deployment processes to ensure efficiency and reliability
  • Research and propose third-party software solutions to optimize system performance
  • Expand product capabilities by integrating compatible third-party solutions
  • Monitor, update, and track third-party solutions' compatibility with the Everseen stack according to internal development guidelines
  • Monitor production logs to identify and troubleshoot issues promptly
  • Ensure seamless operation and timely resolution of any anomalies to maintain system reliability
  • Create, review, and maintain high-quality technical documentation to ensure clarity, consistency, and knowledge sharing within the development team

Requirements

  • 4-5+ years of work experience in ML infrastructure, MLOps, or Platform Engineering
  • Bachelor's degree or equivalent in computer science is preferred
  • Excellent communication and collaboration skills
  • Expert knowledge of Python
  • Experience with CI/CD tools (e.g., GitLab, Jenkins)
  • Hands-on experience with Kubernetes, Docker, and cloud services
  • Understanding of ML training pipelines, data lifecycle, and model serving concepts
  • Familiarity with workflow orchestration tools (e.g., Airflow, Kubeflow, Ray, Vertex AI, Azure ML)
  • A demonstrated understanding of the ML lifecycle, model versioning, and monitoring
  • Experience with ML frameworks (e.g., TensorFlow, PyTorch)
  • Experience with GPU orchestration (e.g., NVIDIA GPU Operator, MIG)
  • Experience with Infrastructure as Code (e.g., Terraform)
  • Experience with Data engineering tools (e.g., Snowflake, Databricks, BigQuery, Airbyte, Kafka)
  • Familiarity with feature stores and model registries
  • Exposure to large-scale distributed systems and performance optimisation
  • Ability to work with Linux systems, including troubleshooting skills such as log investigations, performance testing, and connectivity investigation
  • Possesses a deep understanding of technical concepts and terminology relevant to Everseen's products and services
  • Expert knowledge of advanced concepts like microservices and distributed systems
  • In-depth knowledge of Azure Kubernetes Services for container orchestration, Azure Blob Storage for data storage, and ElasticSearch for search and analytics
  • Ability to leverage cloud computing technologies and services for testing and validation purposes
  • In-depth knowledge of cloud security, scalability, and performance optimization principles
  • Excellent understanding of cloud computing technologies and services, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS)
  • Broad understanding of the software engineering and architecture space, including knowledge of various programming languages, frameworks, techniques, and industry trends in AI

Nice to have

  • Interest in Learning and Growth Mindset
  • Demonstrated interest in learning and a strong desire to expand knowledge in their respective field
  • Curiosity to explore new technologies, methodologies, and best practices to enhance skills and capabilities
  • Results-oriented attitude, with a drive to achieve objectives efficiently
  • Analytical and Problem-Solving Skills
  • Possesses strong analytical and problem-solving abilities, leveraging data to inform product decisions

Similar Jobs for

Machine Learning Platform / Backend Engineer

8 matching positions

Senior Machine Learning Platform Engineer

We are looking for a Senior Machine Learning Platform Engineer to join the growi...
Location
United States, San Francisco
Salary
180000.00 - 200000.00 USD / Year
Strava
Expiration Date
Until further notice
Requirements
  • Have worked on complex, ambiguous platform challenges and broken them down into manageable tasks with both strategies and tactical execution
  • Demonstrated technical leadership in leading projects and the ability to mentor and grow early-career team members
  • Have demonstrated strong interpersonal and communication skills, and a collaborative approach to drive business impact across teams
  • Have worked with a variety of MLOps tools that fulfill different ML needs (like FastAPI, LitServe, Metaflow, MLflow, Kubeflow, Feast)
  • Are experienced in production ML model operational excellence and best practices, like automated model retraining, performance monitoring, feature logging, A/B testing
  • Experience with generative AI technologies around LLM evaluation, vector stores, and agent frameworks
  • Have built backend production tools and services on cloud environments like (but not limited to) AWS, using languages Python, Terraform, and other similar technologies
  • Have built and worked on data pipelines using large scale data technologies (like Spark, SQL, Snowflake)
  • Have experience building, shipping, and supporting ML models in production at scale
  • Have experience with exploratory data analysis and model prototyping, using languages such as Python or R and tools like Scikit learn, Pandas, Numpy, Pytorch, Tensorflow, Sagemaker
Job Responsibility
  • Own End to End Systems: Drive key projects to power AI/ML at Strava end-to-end, from gathering stakeholder requirements to ground-up development to driving adoption and optimizing the experience
  • Interact with a Rich and Large Dataset: Explore and help leverage Strava’s extensive unique fitness and geo datasets from millions of users to extract actionable insights, inform product decisions, and optimize existing features
  • Contribute to a Well Loved Consumer Product: Work at the intersection of AI and fitness to help launch and maintain product experiences that will be used by tens of millions of active people worldwide
What we offer
  • Offers Equity
  • Fulltime

Senior Machine Learning Engineer, AI Platform

The AI Platform team is responsible for building the foundational infrastructure...
Location
United States; Canada
Salary
139000.00 - 218000.00 USD / Year
Mozilla
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or work experience equivalent
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
  • Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities
  • Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
  • Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
Job Responsibility
  • Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
  • Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
  • Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
  • Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
  • Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
  • Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
  • Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
  • Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews
What we offer
  • Generous performance-based bonus plans
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting
  • Quarterly all-company wellness days
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
  • Fulltime

Senior Machine Learning Engineer, ML Training Platform

Location
United States
Salary
216700.00 - 303400.00 USD / Year
Reddit
Expiration Date
Until further notice
Requirements
  • 5+ years of software engineering experience, with a focus on Platform Engineering, ML Infrastructure, or Backend Systems
  • Deep Kubernetes Expertise: You know K8s beyond just 'deploying pods.' You understand CRDs, Controllers and the Operator pattern
  • Jupyter Ecosystem Knowledge: Experience customizing JupyterHub, JupyterLab extensions, or building similar interactive computing platforms
  • Strong Coding Skills: Proficiency in Python (for the ML ecosystem) and Go (for Kubernetes controllers/infrastructure tooling)
  • GPU Experience: Hands-on practice with CUDA environments, GPU virtualization/containerization, and doing it all within Kubernetes
  • Cloud Provider Experience: Familiarity with both managed ML offerings (Vertex AI, Sagemaker, etc) and building custom ML components in AWS and/or GCP
  • Experience working with distributed training frameworks, including Ray and Kubernetes
  • Comfortable with distributed systems, big data (Petabyte scale) and data-intensive systems
  • Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle
  • Strong organizational & communication skills
Job Responsibility
  • Lead the building, testing, and maintenance of ML training infrastructure at Reddit
  • Play a pivotal role in designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows
  • Evolve the MLE experience, from provisioning interactive GPU environments through large-scale training, supporting on-demand and self-service workflows
  • Kubernetes Automation: Write custom Kubernetes Controllers and Operators to manage the lifecycle of interactive Jupyter workspaces and long-running ML training jobs, handle auto-idling, and ensure fault tolerance
  • GPU Orchestration: Work with the underlying compute team to ensure MLEs have efficient access to training hardware resources and handle resource contention gracefully
  • Developer Experience (DevX): Treat internal MLEs as your customers. Conduct user research, reduce friction in the 'Idea-to-Prototype' loop, and standardize software environments (Docker images, Python dependency management)
What we offer
  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k Match
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Reddit Global Days off
  • Generous paid Parental Leave
  • Paid Volunteer time off
  • Fulltime

Sr. Sw Engineer, Machine Learning

We are hiring a Senior Engineer to lead the architecture and implementation of a...
Location
India, Bengaluru
Salary
Not provided
Roku
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science or a related field
  • Experience building recommendation or decisioning systems, ideally in advertising, media, or revenue-platform environments
  • Strong understanding of modern LLMs and agentic systems, and the ability to evaluate latency, cost, and quality tradeoffs
  • Solid experience with LLM and multi-agent pipelines, including prompting, tool use, orchestration, tradeoff analysis, and error handling
  • Experience deploying ML systems in production, including model serving, containerization, CI/CD, and monitoring
  • Hands-on experience with modern ML frameworks and tooling such as PyTorch, Hugging Face Transformers, agent orchestration frameworks (e.g., LangGraph or similar), feature stores, and vector databases for RAG workflows
  • Experience designing evaluation approaches for recommendation and generative systems using human review, automated and offline metrics, and online A/B testing
  • Strong software engineering fundamentals and solid production experience in Java or Python
  • Ability to translate ambiguous business requirements into practical technical solutions and communicate tradeoffs clearly to cross-functional partners
Job Responsibility
  • Define the technical architecture and overall stack for an agent-native business applications platform spanning pricing guidance, booking and order intelligence, upsell recommendations, churn and retention prediction, media planning, deal scoring, revenue forecasting, and more
  • Evaluate LLMs, multimodal systems, multi-agent orchestration frameworks, and recommendation, ranking, and forecasting models for product use
  • Design and build the pipeline from business and customer signals to model inference, recommendation generation, output validation, and integration with internal revenue systems and APIs
  • Build production-grade systems with strong error handling, output validation, explainability, auditability, and human-in-the-loop guardrails for high-stakes pricing and financial decisions
  • Partner cross-functionally with ML, backend, frontend, data, and business teams to iterate quickly based on feedback and business needs
  • Drive technical decisions that directly influence revenue impact, product quality, scalability, and time-to-market
What we offer
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off in accordance with local leave policies
  • Fulltime

GenAI / Machine Learning Engineer

We are seeking a GenAI / ML Engineer with 4–5 years of overall professional expe...
Location
India, Pune
Salary
Not provided
Vodafone
Expiration Date
Until further notice
Requirements
  • 4–5 years of overall experience in software engineering, AI/ML engineering, data engineering, cloud engineering, or related technology roles
  • At least 2 years of relevant domain experience in Generative AI, LLM-based applications, NLP or conversational AI, data or analytics engineering, cloud-native development, or enterprise chatbot platforms
  • Strong hands-on experience with Python for application development, automation, AI/ML workflows, or backend services
  • Working knowledge of machine learning and Generative AI concepts, including LLMs, embeddings, prompts, and RAG-based patterns
  • Comfortable working with SQL and enterprise datasets
  • Good working knowledge of Google Cloud Platform or similar cloud environments
  • Experience developing or integrating APIs and backend or cloud-based applications
  • Ability to debug, test, and optimise AI/ML or data-driven solutions for accuracy, reliability, and performance
  • Effective communication with both technical and business stakeholders and collaborative work across teams
Job Responsibility
  • Design, develop, and enhance GenAI and ML-based solutions for enterprise business use cases
  • Build and improve Natural Language to SQL and Retrieval-Augmented Generation based chatbot capabilities
  • Develop and maintain Python-based backend services, APIs, and AI workflows supporting LLM-driven applications
  • Work with Google Cloud Platform services to build, deploy, and optimise scalable, cloud-native AI applications
  • Improve retrieval quality, prompt orchestration, SQL generation accuracy, chatbot response quality, and overall solution performance
  • Collaborate with internal stakeholders to translate business requirements into scalable AI/ML product capabilities
  • Contribute to testing, evaluation, monitoring, documentation, and production-readiness of GenAI solutions
  • Support continuous improvement of AI engineering practices, including prompt design, evaluation frameworks, observability, and responsible AI usage
What we offer
  • Opportunity to work on production-grade Generative AI solutions with direct business impact
  • Hands-on exposure to modern GCP services, cloud-native deployment patterns, and enterprise-scale AI architectures
  • Experience across LLMs, RAG, Natural Language to SQL, enterprise data platforms, APIs, and chatbot engineering
  • A high-visibility role with clear growth pathways into senior engineering, technical leadership, solution architecture, or AI product ownership roles
  • Fulltime

Staff Machine Learning Engineer

Applied AI is a horizontal AI team at Uber partnering with product and platform ...
Location
India, Bangalore
Salary
Not provided
Uber
Expiration Date
Until further notice
Requirements
  • 10+ years of industry experience in machine learning or software engineering, with a proven record of delivering ML solutions to production
  • Strong knowledge of machine learning, deep learning, and exposure to generative AI techniques (e.g., transformers, LLMs, diffusion)
  • Experience designing and scaling ML systems or platforms, including training pipelines, serving infrastructure, and model lifecycle tooling
  • Fluency in ML frameworks (e.g., PyTorch, TensorFlow, JAX) and development in Python and/or scalable backend languages (e.g., Java, Go)
  • Excellent collaboration and communication skills with the ability to work across teams and functions
Job Responsibility
  • Design and implement ML-driven systems that power core Uber experiences, with a focus on scalability, reliability, and performance
  • Lead the technical execution of key projects involving classical ML, deep learning, and generative AI technologies (e.g., LLMs, multimodal models)
  • Collaborate closely with product, data science, and infrastructure teams to develop AI solutions from ideation through production deployment
  • Contribute to and influence the technical direction for Applied AI, particularly around system design, model architecture, and infrastructure decisions
  • Champion engineering best practices in ML development — including experimentation workflows, model versioning, evaluation, monitoring, and responsible AI
  • Provide mentorship to engineers on the team and across partner orgs to help raise the technical bar
  • Fulltime

GenAI / Machine Learning Engineer - VOIS

We are seeking a GenAI / ML Engineer with 3-4 years of overall professional expe...
Location
India, Pune
Salary
Not provided
Vodafone
Expiration Date
Until further notice
Requirements
  • 3-4 years of overall experience in software engineering, AI/ML engineering, data engineering, cloud engineering, or related technology roles
  • At least 1 year of relevant domain experience in areas such as Generative AI, LLM-based applications, NLP or conversational AI, data or analytics engineering, cloud-native development, or enterprise chatbot platforms
  • Strong hands-on experience with Python for application development, automation, AI/ML workflows, or backend services
  • Working knowledge of machine learning and Generative AI concepts, including LLMs, embeddings, prompts, and RAG-based patterns
  • Comfortable working with SQL and enterprise datasets to support analytical and data-driven use cases
  • Good working knowledge of Google Cloud Platform or similar cloud environments
  • Experience developing or integrating APIs and backend or cloud-based applications
  • Ability to debug, test, and optimise AI/ML or data-driven solutions for accuracy, reliability, and performance
  • Communicate effectively with both technical and business stakeholders and work collaboratively across teams
Job Responsibility
  • Design, develop, and enhance GenAI and ML-based solutions for enterprise business use cases
  • Build and improve Natural Language to SQL and Retrieval-Augmented Generation based chatbot capabilities
  • Develop and maintain Python-based backend services, APIs, and AI workflows supporting LLM-driven applications
  • Work with Google Cloud Platform services to build, deploy, and optimise scalable, cloud-native AI applications
  • Improve retrieval quality, prompt orchestration, SQL generation accuracy, chatbot response quality, and overall solution performance
  • Collaborate with internal stakeholders to translate business requirements into scalable AI/ML product capabilities
  • Contribute to testing, evaluation, monitoring, documentation, and production-readiness of GenAI solutions
  • Support continuous improvement of AI engineering practices, including prompt design, evaluation frameworks, observability, and responsible AI usage
What we offer
  • Opportunity to work on production-grade Generative AI solutions with direct business impact
  • Hands-on exposure to modern GCP services, cloud-native deployment patterns, and enterprise-scale AI architectures
  • Experience across LLMs, RAG, Natural Language to SQL, enterprise data platforms, APIs, and chatbot engineering
  • A high-visibility role with clear growth pathways into senior engineering, technical leadership, solution architecture, or AI product ownership roles
  • Fulltime

Senior Machine Learning Engineer

We’re hiring a Senior Machine Learning Engineer to help build the core AI system...
Location
United States, San Francisco
Salary
230000.00 - 300000.00 USD / Year
Signify Technology
Expiration Date
Until further notice
Requirements
  • Strong experience building production AI systems around LLMs, OCR, and unstructured data workflows
  • Proven track record shipping applied AI products, not just prototyping models offline
  • Deep familiarity with modern LLM workflows including prompting, structured outputs, tool use, retries, fallbacks, guardrails, and model evaluation
  • Experience with document intelligence systems such as OCR pipelines, document extraction, classification, post-processing, and confidence-based review flows
  • Experience with voice or conversational AI, or adjacent systems involving transcripts, call automation, and conversational extraction
  • Strong proficiency in Python and comfort working in production codebases with APIs, queues, and backend services
  • Experience deploying and operating AI systems in AWS or similar cloud environments, including serverless or event-driven architectures
  • Strong instincts around evaluation, benchmarking, monitoring, and quality assurance for real-world AI systems
  • Ability to work across structured and unstructured data and design systems that are robust to noisy, incomplete, and ambiguous inputs
Job Responsibility
  • Build the core AI systems behind a next-generation healthcare platform that turns smartphone video into clinically accurate 3D models of human anatomy
  • Own the pipeline that bridges raw computer vision data and physical 3D-printed medical solutions, transforming noisy real-world scans into precise, CAD-compatible models used to improve patient outcomes
  • Work closely with engineers, researchers, and product leaders to design systems that translate cutting-edge ML research into reliable production technology used in healthcare
What we offer
  • Equity
  • Fulltime