We are seeking a highly skilled Senior Research Engineer to collaborate closely with both Research and Engineering teams. The role involves diagnosing and resolving bottlenecks across large-scale distributed training, data processing, and inference systems, while also driving optimizations for existing high-performance pipelines. The ideal candidate combines a deep understanding of modern deep learning systems with strong engineering expertise in areas such as layer-level optimization, large-scale distributed training, streaming, low-latency and asynchronous inference, inference compilers, and advanced parallelization techniques.
This is a cross-functional role requiring strong technical rigor, attention to detail, intellectual curiosity, and excellent communication skills. The position is embedded within the Research team and is responsible for developing and refining the technical foundation that enables cutting-edge research and carries its outcomes into production.
Job Responsibilities:
Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
Translate research models and prototypes into highly optimized, production-ready inference systems
Explore and integrate inference compilers and runtimes such as TensorRT, ONNX Runtime, AWS Neuron (for Inferentia), or similar technologies
Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
Requirements:
Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
Experience with lower-level programming (C++ or Rust preferred)
Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
TPU experience is a strong plus
Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
Strong debugging, profiling, and optimization skills in large-scale distributed environments
Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions