We are seeking a Machine Learning Platform/Backend Engineer to design, build, and maintain scalable infrastructure that empowers our data scientists and machine learning engineers to develop, train, benchmark, and monitor machine learning models efficiently. You will be instrumental in shaping our internal Machine Learning Platform and driving automation, reproducibility, and performance across the machine learning lifecycle.
Job Responsibilities:
Design, build, and maintain scalable infrastructure that empowers data scientists and machine learning engineers
Own the design and implementation of the internal ML platform, enabling end-to-end workflow orchestration, resource management, and automation using cloud-native technologies (GCP/Azure)
Design and manage Kubernetes-based infrastructure for multi-tenant GPU and CPU workloads with strong isolation, quota control, and monitoring
Integrate and extend orchestration tools (Airflow, Kubeflow, Ray, Vertex AI, Azure ML or custom schedulers) to automate data processing, training, and deployment pipelines
Develop shared services for model behavior and performance tracking, data and dataset versioning, and artifact management (MLflow, DVC, or custom registries)
Build out documentation covering architecture, policies, and operations runbooks
Share skills, knowledge, and expertise with members of the data engineering team
Foster a culture of collaboration and continuous learning by organizing training sessions, workshops, and knowledge-sharing sessions
Collaborate and drive progress with cross-functional teams to design and develop new features and functionalities
Ensure that the developed solutions meet project objectives and enhance user experience
Have influence over the technology stack and internal technical improvements, contributing to strategic decision-making
Based on requirements and a longer-term product and feature strategy, design and implement reusable, testable, efficient, and elegant code
Ensure adherence to coding standards and best practices
Create, maintain, and run unit tests for new and existing applications and services
Aim to deliver defect-free and well-tested solutions
Analyze and collect data from various sources such as log files, application stack traces, and thread dumps
Utilize data analysis to identify trends, patterns, and potential areas for improvement
Implement changes based on the findings of data analysis
Create and maintain CI/CD pipelines using a range of tools
Automate the build, test, and deployment processes to ensure efficiency and reliability
Research and propose third-party software solutions to optimize system performance
Expand product capabilities by integrating compatible third-party solutions
Monitor updates to third-party solutions and track their compatibility with the Everseen stack, following internal development guidelines
Monitor production logs to identify and troubleshoot issues promptly
Ensure seamless operation and timely resolution of any anomalies to maintain system reliability
Create, review, and maintain high-quality technical documentation to ensure clarity, consistency, and knowledge sharing within the development team
Requirements:
4-5+ years of work experience in ML infrastructure, MLOps, or platform engineering
Bachelor's degree or equivalent in computer science or a related field is preferred
Excellent communication and collaboration skills
Expert knowledge of Python
Experience with CI/CD tools (e.g., GitLab, Jenkins)
Hands-on experience with Kubernetes, Docker, and cloud services
Understanding of ML training pipelines, data lifecycle, and model serving concepts
A demonstrated understanding of the ML lifecycle, model versioning, and monitoring
Experience with ML frameworks (e.g., TensorFlow, PyTorch)
Experience with GPU orchestration (e.g., NVIDIA GPU Operator, MIG)
Experience with Infrastructure as Code (e.g., Terraform)
Experience with data engineering tools (e.g., Snowflake, Databricks, BigQuery, Airbyte, Kafka)
Familiarity with feature stores and model registries
Exposure to large-scale distributed systems and performance optimization
Ability to work with Linux systems, including troubleshooting skills such as log investigations, performance testing, and connectivity investigation
Deep understanding of the technical concepts and terminology relevant to Everseen's products and services
Expert knowledge of advanced concepts such as microservices and distributed systems
In-depth knowledge of Azure Kubernetes Service (AKS) for container orchestration, Azure Blob Storage for data storage, and Elasticsearch for search and analytics
Ability to leverage cloud computing technologies and services for testing and validation purposes
In-depth knowledge of cloud security, scalability, and performance optimization principles
Excellent understanding of cloud computing technologies and services, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS)
Broad understanding of the software engineering and architecture space, including knowledge of various programming languages, frameworks, techniques, and industry trends in AI
Nice to have:
Interest in Learning and Growth Mindset
Demonstrated interest in learning and a strong desire to expand knowledge in the field
Curiosity to explore new technologies, methodologies, and best practices to enhance skills and capabilities
Results-oriented attitude, with a drive to achieve objectives efficiently
Analytical and Problem-Solving Skills
Strong analytical and problem-solving abilities, leveraging data to inform product decisions