Roku is changing how the world watches TV. Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.
The Advertising Performance group focuses on performance for all participants in the Advertising ecosystem: Advertisers, Publishers, and Roku. The systems and solutions span multiple disciplines and technologies to perform real-time multi-objective optimization across distributed systems at large scale and with low latency. We use Machine Learning, Reinforcement Learning, AI, Control and Optimization Systems, and Auction Dynamics to solve a large set of complex problems. At the core of this is our Machine Learning and Inference Platform that powers the entire landscape.
Job Responsibilities
Lead the design and development of a state-of-the-art (SOTA) Inference platform
Oversee the development of monitoring, observability, and other tooling to ensure the performance, reliability, and scalability of online inference services at both the system and model level
Identify and resolve system inefficiencies, performance bottlenecks, and reliability issues, ensuring optimized end-to-end performance
Stay at the forefront of advancements in inference frameworks, ML hardware acceleration, and distributed systems, and incorporate innovations where and when they are impactful
Requirements
M.S. or above in CS, ECE, or a related field
10+ years of experience in developing and deploying large-scale, distributed systems, with at least 5 years in a leadership or technical lead role
Strong programming skills in high-performance languages
Deep understanding of inference frameworks and ML system deployment
Proven experience optimizing performance for large-scale machine learning systems, including a deep knowledge of SOTA model optimizations, hardware-software co-design, GPU acceleration, and HPC techniques
Excellent communication and collaboration skills
Experience leading teams working on high-throughput, low-latency ML serving systems
Experience collaborating with and leading global, cross-functional teams
Contributions to open-source ML or systems projects
What we offer
Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision) where applicable
Life, accident, disability, commuter, and retirement options (401(k)/pension) where applicable