Research Scientist Intern, PyTorch Framework Performance

Meta

Location:
United States, Bellevue

Contract Type:
Not provided

Salary:

7650.00 - 12134.00 USD / Month

Job Description:

Our team’s mission is to make PyTorch models high-performing, deterministic, and stable, via a robust foundational framework that supports the latest hardware, without sacrificing the flexibility and ease of use of PyTorch. We are seeking a PhD Research Intern to work on next-generation Mixture-of-Experts (MoE) systems for PyTorch, focused on substantially improving end-to-end training and inference throughput on modern accelerators (e.g., NVIDIA Hopper and beyond). This internship will explore novel combinations of communication-aware distributed training and kernel- and IO-aware execution optimizations (inspired by SonicMoE and related works) to unlock new performance regimes for large-scale sparse models. The project spans systems research, GPU kernel optimization, and framework optimization, with opportunities for open-source contributions and publication.

Job Responsibility:

  • Design and evaluate communication-aware, kernel-aware, and quantization-aware MoE execution strategies, combining ideas such as expert placement, routing, batching, scheduling, and precision selection
  • Develop and optimize GPU kernels and runtime components for MoE workloads, including fused kernels, grouped GEMMs, and memory-efficient forward and backward passes
  • Explore quantization techniques (e.g., MXFP8, FP8) in the context of MoE, balancing accuracy, performance, and hardware efficiency
  • Build performance models and benchmarks to analyze compute, memory, communication, and quantization overheads across different sparsity regimes
  • Run experiments on single-node and multi-node GPU systems
  • Collaborate with the open-source community to gather feedback and iterate on the project
  • Contribute to PyTorch (Core, Compile, Distributed) within the scope of the project
  • Improve PyTorch performance in general

Requirements:

  • Currently has, or is in the process of obtaining, a PhD degree in the field of Computer Science or a related STEM field
  • Deep knowledge of transformer architectures, including attention, feed-forward layers, and Mixture-of-Experts (MoE) models
  • Strong background in ML systems research, with domain knowledge in MoE efficiency, such as routing, expert parallelism, communication overheads, and kernel-level optimizations
  • Hands-on experience writing GPU kernels using CUDA and/or cuteDSL
  • Working knowledge of quantization techniques and their impact on performance and accuracy
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Nice to have:

  • Experience working on other ML compiler stacks, especially the PT2 stack
  • Familiarity with distributed training and inference, such as data parallelism and collective communication
  • Ability to independently design experiments, analyze complex performance tradeoffs, and clearly communicate technical findings in writing and presentations
  • Intent to return to degree program after the completion of the internship/co-op
  • Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as first-authored publications at leading workshops or conferences such as NeurIPS, MLSys, ASPLOS, PLDI, CGO, PACT, ICML, or similar
  • Experience working and communicating cross-functionally in a team environment

Additional Information:

Job Posted:
February 04, 2026

Similar Jobs for Research Scientist Intern, PyTorch Framework Performance

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
United States, Seattle
Salary:
49.00 - 75.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Degree completion date cannot be earlier than September 2026 - June 2027
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
  • Exhibit a solid grasp of algorithms and data structures
  • Demonstrate proficiency in Python programming and ability to write clean, efficient, and well-documented code
  • Experience working with large-scale datasets, including data preprocessing, augmentation, and scaling techniques
  • Has expertise in managing data using Python libraries such as NumPy, Pandas, and Matplotlib, experience leveraging models from Hugging Face, and practical knowledge of applied machine learning and deep learning frameworks such as PyTorch
  • Demonstrated exposure to natural language processing (NLP) and Computer Vision (CV)
  • Familiarity with state-of-the-art research in machine learning and AI, as evidenced by relevant coursework, publications, or projects
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Machine Learning Research Scientist

This role focuses on cutting-edge research and development in Artificial Intelli...
Location:
United States, Milpitas
Salary:
117500.00 - 270000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • PhD in Computer Science, Electrical Engineering, or related fields focusing on Machine Learning for the dissertation
  • extensive experience in deep learning research, preferably in Large Language Models or Reinforcement Learning
  • experience developing applications with deep learning frameworks like PyTorch with a high software proficiency
  • strong programming skills in Python, data structures, and algorithms are required
  • experience with ML model optimization, GPU acceleration, heterogeneous computation, system software, and performance optimization desired
  • experience with Python web frameworks (Django, Flask) is a plus but not required.
Job Responsibility:
  • conducting research, developing solutions, and creating intellectual property in emerging fields like reinforcement learning, LLMs, digital twins, clean energy, data center optimization, and sustainability
  • developing advanced technologies for analysis, optimization, time series forecasting, uncertainty quantification, and control
  • providing thought leadership, collaborating internally and externally, and contributing to HPE’s strategy by identifying emerging technologies
  • publishing in top conferences like NeurIPS, AAAI, and ACL
  • developing patent applications
  • software development, GPU acceleration, model optimization, and real-time data streaming to create robust AI solutions for real-world use cases.
What we offer:
  • a competitive salary and extensive social benefits
  • diverse and dynamic work environment
  • work-life balance and support for career development
  • health and wellbeing programs
  • personal and professional development programs
  • diversity, inclusion, and belonging initiatives.
  • Full-time

Research Scientist Intern, PyTorch Compiler

Our team makes PyTorch run faster and more resource-efficient without sacrificin...
Location:
United States, Menlo Park
Salary:
7650.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has or is in the process of obtaining a PhD degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience in ML compiler, Distributed Training, ML systems, or similar
  • Proficient in Python or CUDA programming
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility:
  • Develop new techniques in TorchDynamo, TorchInductor, PyTorch core, PyTorch Distributed
  • Explore the intersection of PyTorch compiler and PyTorch Distributed
  • Optimize Generative AI models across the stack (pre-training, fine-tuning, and inference)
  • Improve general PyTorch performance
  • Conduct cutting-edge research on ML compiler and ML distributed technologies
  • Collaborate with users of PyTorch to enable new use cases for the framework both inside and outside Meta

Research Scientist Intern, Smart Glasses in Wearables AI

The Wearables AI team at Meta works to advance the field of artificial intellige...
Location:
United States, Sunnyvale
Salary:
7650.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has or is in the process of obtaining a Ph.D. degree in Computer Science, Artificial Intelligence, Multi-modal Systems, Computer Vision, Natural Language Processing, Speech Recognition, Audio Processing, Conversational AI, or other relevant technical field
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • Experience with Python, C++, C, Java or other related languages
  • Experience building systems with deep learning frameworks such as PyTorch or TensorFlow
Job Responsibility:
  • Perform research to advance the science and technology of intelligent machines
  • Develop novel and accurate NLP algorithms and systems, leveraging Deep Learning and Machine Learning on big data resources
  • Collaborate with researchers and cross-functional partners including communicating research plans, progress, and results
  • Publish research results and contribute to research that can be applied to Meta product development

Research Scientist Intern, PyTorch Distributed

Meta is seeking a Research Scientist Intern to join our Meta PyTorch Distributed...
Location:
United States, Menlo Park
Salary:
7650.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has, or is in the process of obtaining, PhD degree in the field of Computer Science or a related STEM field
  • Experience in one or more of the following machine learning/deep learning domains: large-scale training and inference ML systems research; ML theory (basic knowledge of ML models in different modalities, such as large language models (LLMs), vision models (ViTs, MViTs), and multimodal models, and of how scale impacts performance); ML systems (AI infrastructure, machine learning accelerators, high performance computing, machine learning compilers, GPU architecture, machine learning frameworks, distributed systems, on-device optimization)
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility:
  • Apply relevant AI and machine learning techniques to advance the state-of-the-art in machine learning frameworks
  • Collaborate with users of PyTorch to enable new use cases for the framework both inside and outside Meta
  • Develop novel, accurate AI algorithms and advanced systems for large scale distributed training and inference
  • Leverage graph-based and compiler-based technologies to optimize distributed training and distributed inference use-cases

Research Scientist Intern, AI & Compute Foundation - MTIA Software

The MTIA (Meta Training & Inference Accelerator) Software team is part of the AI...
Location:
Canada, Toronto
Salary:
6240.00 - 10334.00 CAD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has, or is in the process of obtaining, PhD degree in the field of Computer Science or a related STEM field
  • C/C++ programming skills
  • Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment
  • Knowledge of Computer Architecture and Distributed systems with interest in one or more of High Performance Computing, Numerics, Performance and AI hardware including compute, networking and storage
Job Responsibility:
  • Develop the software stack with one of the following core focus areas: AI frameworks, compiler stack, high performance kernel development and acceleration onto next generation of hardware architectures
  • Contribute to the development of the industry-leading PyTorch AI framework core compilers to support new state of the art inference and training AI hardware accelerators and optimize their performance
  • Analyze deep learning networks, develop & implement compiler optimization algorithms
  • Collaborate with AI research scientists to accelerate the next generation of deep learning models, such as recommendation systems, generative AI, computer vision, and NLP
  • Performance tuning and optimizations of deep learning framework & software components

Research Engineering Manager, Evaluations, Meta Superintelligence Labs

Meta is seeking a Research Engineering Manager to lead the Evaluations team with...
Location:
United States, Menlo Park
Salary:
219000.00 - 301000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Machine Learning, or a related technical field
  • 4+ years of experience in machine learning engineering, machine learning research, or a related technical role
  • 3+ years of experience managing or leading technical teams, including hiring, mentoring, and performance management
  • Proficiency in Python and experience with ML frameworks such as PyTorch
  • Proven track record of leading medium to large-scale technical projects from conception to deployment
  • Demonstrated experience balancing hands-on technical work with people management and strategic planning
  • Clear communication and experience influencing cross-functional stakeholders
Job Responsibility:
  • Build, mentor, and grow a team of research engineers and scientists focused on evaluation infrastructure and benchmarking
  • Conduct performance reviews, career development conversations, and provide technical mentorship to team members
  • Foster a culture of engineering excellence, research rigor, and rapid iteration within the team
  • Partner with recruiting to hire world-class research engineering talent
  • Curate and integrate publicly available and internal benchmarks to direct the capabilities of frontier model development
  • Oversee the development and implementation of evaluation environments, including environments for novel model capabilities and modalities
  • Establish partnerships with external data vendors to source and prepare high-quality evaluation datasets
  • Influence the technical roadmap for evaluation infrastructure in collaboration with MSL Infra team
  • Translate the technical vision of research scientists into actionable engineering plans and execution strategies
  • Partner with research scientists, product teams, and other engineering teams to align evaluation priorities with organizational goals
What we offer:
  • bonus
  • equity
  • benefits
  • Full-time

Research Scientist Intern, Multimodal Foundations

Meta is seeking Research Interns to join Fundamental AI Research (FAIR) working ...
Location:
United States, Bellevue
Salary:
7650.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has or is in the process of obtaining a PhD degree in the field of Computer Vision, Natural Language Processing, Machine Learning, Artificial Intelligence, or relevant technical field
  • Research and/or work experience in Computer Vision
  • Research and/or work experience in Natural Language Processing
  • Research and/or work experience in Machine Learning or Deep Learning
  • Experience in Python, C++, or other related languages and with PyTorch framework
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility:
  • Brainstorm with research mentors, review literature and existing solutions of a challenging real-world research problem
  • Develop novel solutions, implement prototypes, and perform extensive experiments to test the proposed solutions in meaningful benchmarks and metrics, analyze the results and verify the conclusions
  • Draft and polish research reports and/or publications
  • Present research outcomes to internal and/or external audiences
  • Contribute research that can be applied to Meta product development