We are seeking a Senior Machine Learning Engineer to bridge the gap between advanced Vision-Language Model (VLM) research and high-performance production serving. Unlike standard data science or engineering roles, this position requires dual competency: you must be capable of designing novel VLM architectures (including dataset curation and multilingual alignment) AND of optimizing the inference stack (kernel optimization, distillation, and memory management) to run these models within specific hardware constraints (NVIDIA H100 and AMD MI300X). The successful candidate will own the entire vertical slice: from reading the latest arXiv papers and improving training sets, to writing the C++/CUDA kernels that serve the final model in production.
Job Responsibilities:
Continuously evaluate and implement the latest research trends in Vision-Language Models, specifically focusing on Referring Expression Comprehension (REC), Document Understanding (Pix2Struct), and Visual Question Answering (VQA)
Design and build massive-scale training and evaluation datasets, ensuring multilingual compatibility and broad visual understanding for European market requirements
Lead the model co-design process, creating architectures that are natively optimized for accelerator capabilities (compute-bound vs. memory-bound operations)
Architect high-throughput serving layers using SGLang and vLLM, optimizing for non-standard decoding strategies
Design and run controlled experiments to map the Pareto frontier between serving latency and generation quality
Apply Knowledge Distillation (KD), unstructured pruning, and quantization techniques to fit large-scale VLM architectures onto single-node GPU setups (specifically H100 or MI300X) without compromising model quality
Write and optimize custom kernels (CUDA/HIP) to accelerate serving latency, identifying bottlenecks at the operator level
Manage the full pre-training and post-training tech stack, ensuring seamless integration between model weights and inference engines
Take ownership of landing the serving-efficient model in a production environment, ensuring reliability and scalability
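As a rough illustration of the distillation work described above, the sketch below computes the classic soft-target KD loss (temperature-softened KL divergence between teacher and student output distributions, scaled by T²). This is a minimal pure-Python example for orientation only; the function names and the choice of temperature are illustrative, not part of this role's actual stack.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution so the student can learn from "dark knowledge"
    # in the teacher's non-argmax probabilities.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

In practice this term is blended with the hard-label cross-entropy loss during student training; the snippet only shows the soft-target component.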
Requirements:
Master’s or PhD in Computer Science, Artificial Intelligence, or High-Performance Computing
Minimum of 4 years' experience in Machine Learning, with demonstrated depth in both Model Architecture and Systems Optimization
Proven experience building and shipping Vision-Language Models (e.g., architectures similar to CLIP, Flamingo, Pix2Struct)
Must have experience creating custom evaluation sets for tasks like Document Understanding
Expert-level knowledge of SGLang and vLLM for optimized serving
Demonstrable experience optimizing models for both NVIDIA (H100) and AMD (MI300X) accelerators
Hands-on experience with Knowledge Distillation and Pruning to reduce model latency for target serving sizes
A track record of taking complex multi-modal models from research code to a deployed, user-facing production product