We are looking for a Machine Learning Engineer to join our Models and Applications team. If the challenge of distributed training of large models on large numbers of GPUs excites you, if you are passionate about improving training efficiency, and if you enjoy innovating and coming up with new ideas, then this role is for you. You will be part of a world-class team focused on addressing the challenges of training generative AI.
Job Responsibilities:
Train large models to convergence on AMD GPUs
Improve end-to-end training pipeline performance
Optimize distributed training pipelines and algorithms to scale out across large numbers of GPUs
Contribute your changes to open-source projects
Stay up to date with the latest training algorithms
Influence the direction of the AMD AI platform
Collaborate across teams with various groups and stakeholders
Requirements:
Experience with ML frameworks such as PyTorch, JAX, or TensorFlow
Experience with distributed training and distributed training frameworks such as DeepSpeed
Excellent Python programming skills, including debugging, profiling, and performance analysis
Experience with ML pipelines
Strong communication and problem-solving skills
A master’s degree in computer science, artificial intelligence, machine learning, or a related field
Nice to have:
Experience with LLMs or vision models, especially large models