The Model Development Lifecycle Team is building a cutting-edge platform to simplify the entire machine learning journey. From development and training to deployment and management, we empower teams to turn data into actionable insights. Our platform supports:
Seamless API Integration: Deploy models as APIs for consistent use across applications, whether on-premises, in enterprise infrastructures, or through third-party hosting
Collaboration and Discoverability: Use our model registry to version, store, and easily find models across the organization
Scalable Training Resources: Leverage advanced tools like GPUs, Ray, and Spark to meet the needs of diverse AI projects
Job Responsibilities:
Integrate model monitoring to provide a holistic view of deployment health and performance
Enhance tagging capabilities across Domino entities to improve discoverability and tracking
Expand LLM hosting capabilities to address customer needs for scale, performance, and logging
Innovate within our Domino Apps offering by incorporating feature requests from major customers
Requirements:
Hands-on experience developing and managing high-performance back-end systems in distributed computing environments
Experience working closely with cross-functional teams to integrate systems with front-end interfaces and third-party services
Experience designing and implementing secure, scalable APIs (e.g., RESTful APIs, gRPC)
Experience profiling and optimizing back-end performance, especially in cloud environments or with container technologies like Docker and Kubernetes
Experience using robust testing frameworks (unit, integration, end-to-end) and setting up CI/CD pipelines
Proficiency with cloud providers (AWS, Azure, GCP) and deploying services in these environments
Expertise in languages such as Python, Java, Scala, or Go
Nice to have:
Familiarity with model registries, versioning, and lifecycle management tools like MLflow or KubeFlow
Experience with frameworks like Apache Spark, Azure ML, or SageMaker