Meta is seeking a Research Scientist Intern to join our Meta PyTorch Distributed Team. Our team’s mission is to make PyTorch faster and easier to use in order to create and maintain a state-of-the-art machine learning framework that is used across Meta and the entire industry. The team's key challenges are composing multiple distributed training features to support growing model complexity, jointly optimizing computation and communication to maximize hardware utilization, and automating parallelization to boost usability. Our internships are twelve (12) to twenty-four (24) weeks long, and we have various start dates throughout the year.
Job Responsibilities:
Apply relevant AI and machine learning techniques to advance the state-of-the-art in machine learning frameworks
Collaborate with users of PyTorch to enable new use cases for the framework both inside and outside Meta
Develop novel, accurate AI algorithms and advanced systems for large-scale distributed training and inference
Leverage graph-based and compiler-based technologies to optimize distributed training and distributed inference use cases
Requirements:
Currently has, or is in the process of obtaining, a PhD degree in Computer Science or a related STEM field
Experience in one or more of the following machine learning/deep learning domains: large-scale training and inference ML systems research; ML theory (basic knowledge of ML models in different modalities, such as large language models (LLMs), vision (ViTs, MViTs), and multimodal, and of how scale impacts performance); ML systems (AI infrastructure, machine learning accelerators, high-performance computing, machine learning compilers, GPU architecture, machine learning frameworks, distributed systems, on-device optimization)
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Nice to have:
Experience with, or knowledge of, training models at scale using PyTorch/TensorFlow/JAX
Experience with, or knowledge of, working with a distributed GPU cluster
Intent to return to degree program after the completion of the internship/co-op
Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as first-authored publications at leading workshops or conferences such as NeurIPS, MLSys, ASPLOS, PLDI, CGO, PACT, ICML, or similar
Experience working and communicating cross-functionally in a team environment