You will be part of a small but dedicated team driving performance attainment solutions for discrete GPU products across hardware, software, and the platform. We are seeking a highly skilled engineer to join our Infrastructure team, focused on building scalable solutions for workload automation and performance analysis in support of advanced machine learning workloads.
Job Responsibilities:
Serve as technical lead for a team of 5-6 engineers
Assess and understand the current automation and performance analysis infrastructure, identifying strengths, gaps, and opportunities for improvement
Collaborate with internal teams to gather technical requirements and understand evolving needs
Develop a forward-looking plan that balances reusing existing systems with building new infrastructure where appropriate
Design, develop, and maintain automation and performance analysis tooling using Python, Bash, Make, and related technologies
Build and enhance workflow automation solutions using internally developed tools to orchestrate ML workloads
Develop new techniques and tooling to optimize ML workload execution, profiling, and analysis at scale
Requirements:
Strong development experience in Python and/or Bash (or equivalent scripting languages)
Experience with GitHub, Jenkins, or similar CI/CD and code review systems
Linux system administration experience preferred
Experience developing automated test infrastructure and orchestrating multisystem workflows is preferred
Strong analytical, problem-solving, and debugging skills
Excellent communication skills
Must be a critical thinker and self-starter
Ability to quickly learn and apply new tools, technologies, and frameworks
Motivating leader with good interpersonal skills
Bachelor’s degree in a Computer Engineering/Computer Science field with 9+ years of hands-on experience, or a Master’s degree with 7+ years of relevant experience
Nice to have:
Ansible experience is a bonus
Networking experience preferred, including common protocols and basic debugging
Experience with Docker/containers and/or virtualization technologies preferred