Machine Learning Platform / Backend Engineer Job at Everseen (Belgrade)

Job Description

We are seeking a Machine Learning Platform/Backend Engineer to design, build, and maintain scalable infrastructure that empowers our data scientists and machine learning engineers to develop, train, benchmark, and monitor machine learning models efficiently. You will be instrumental in shaping our internal Machine Learning Platform and driving automation, reproducibility, and performance across the machine learning lifecycle.

Job Responsibilities

Design, build, and maintain scalable infrastructure that empowers data scientists and machine learning engineers
Own the design and implementation of the internal ML platform, enabling end-to-end workflow orchestration, resource management, and automation using cloud-native technologies (GCP/Azure)
Design and manage Kubernetes-based infrastructure for multi-tenant GPU and CPU workloads with strong isolation, quota control, and monitoring
Integrate and extend orchestration tools (Airflow, Kubeflow, Ray, Vertex AI, Azure ML or custom schedulers) to automate data processing, training, and deployment pipelines
Develop shared services for model behavior and performance tracking, data and dataset versioning, and artifact management (MLflow, DVC, or custom registries)
Write and maintain documentation covering architecture, policies, and operational runbooks
Share skills, knowledge, and expertise with members of the data engineering team
Foster a culture of collaboration and continuous learning by organizing training sessions, workshops, and knowledge-sharing events
Collaborate with cross-functional teams and drive progress on the design and development of new features and functionality
Ensure that the developed solutions meet project objectives and enhance user experience
Influence the technology stack and internal technical improvements, contributing to strategic decision-making
Design and implement reusable, testable, efficient, and elegant code based on requirements and the longer-term product and feature strategy
Ensure adherence to coding standards and best practices
Create, maintain, and run unit tests for new and existing applications and services
Aim to deliver defect-free and well-tested solutions
Analyze and collect data from various sources such as log files, application stack traces, and thread dumps
Utilize data analysis to identify trends, patterns, and potential areas for improvement
Implement changes based on the results of data analysis
Create and maintain CI/CD pipelines using a range of tools
Automate the build, test, and deployment processes to ensure efficiency and reliability
Research and propose third-party software solutions to optimize system performance
Expand product capabilities by integrating compatible third-party solutions
Track updates to third-party solutions and monitor their compatibility with the Everseen stack, following internal development guidelines
Monitor production logs to identify and troubleshoot issues promptly
Ensure seamless operation and timely resolution of any anomalies to maintain system reliability
Create, review, and maintain high-quality technical documentation to ensure clarity, consistency, and knowledge sharing within the development team

Requirements

4-5+ years of work experience in ML infrastructure, MLOps, or platform engineering
A bachelor's degree in computer science or a related field (or equivalent) is preferred
Excellent communication and collaboration skills
Expert knowledge of Python
Experience with CI/CD tools (e.g., GitLab, Jenkins)
Hands-on experience with Kubernetes, Docker, and cloud services
Understanding of ML training pipelines, data lifecycle, and model serving concepts
Familiarity with workflow orchestration tools (e.g., Airflow, Kubeflow, Ray, Vertex AI, Azure ML)
A demonstrated understanding of the ML lifecycle, model versioning, and monitoring
Experience with ML frameworks (e.g., TensorFlow, PyTorch)
Experience with GPU orchestration (e.g., NVIDIA GPU Operator, MIG)
Experience with Infrastructure as Code (e.g., Terraform)
Experience with data engineering tools (e.g., Snowflake, Databricks, BigQuery, Airbyte, Kafka)
Familiarity with feature stores and model registries
Exposure to large-scale distributed systems and performance optimization
Ability to work with Linux systems, including troubleshooting skills such as log investigations, performance testing, and connectivity investigation
Deep understanding of the technical concepts and terminology relevant to Everseen's products and services
Expert knowledge of advanced concepts like microservices and distributed systems
In-depth knowledge of Azure Kubernetes Service (AKS) for container orchestration, Azure Blob Storage for data storage, and Elasticsearch for search and analytics
Ability to leverage cloud computing technologies and services for testing and validation purposes
In-depth knowledge of cloud security, scalability, and performance optimization principles
Excellent understanding of cloud computing technologies and services, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS)
Broad understanding of the software engineering and architecture space, including knowledge of various programming languages, frameworks, techniques, and industry trends in AI

Nice to have

Interest in Learning and Growth Mindset
Demonstrated interest in learning and a strong desire to expand knowledge in their field
Curiosity to explore new technologies, methodologies, and best practices to enhance skills and capabilities
Results-oriented attitude, with a drive to achieve objectives efficiently
Analytical and Problem-Solving Skills
Strong analytical and problem-solving abilities, using data to inform product decisions
