Senior AI Infrastructure Engineer Jobs

5 Job Offers

Senior ML Infrastructure Engineer - Embodied AI

Join GM's Embodied AI team as a Senior ML Infrastructure Engineer in Sunnyvale. Design and deploy scalable platforms for machine learning training and evaluation to advance autonomous driving. Leverage 3+ years of experience with large-scale distributed systems, cloud infrastructure, and producti...

Location

United States, Sunnyvale

Salary

153200.00 - 234100.00 USD / Year

General Motors

Expiration Date

Until further notice

Senior Software Engineer - AI Infrastructure (Scheduler) - CoreAI

Join Microsoft's CoreAI team in Redmond as a Senior Software Engineer specializing in AI Infrastructure. You will design and develop critical distributed services in C# for large-scale AI training and inference. This role requires a strong background in computer science and experience with cloud-...

Location

United States, Redmond

Salary

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Senior AI Infrastructure Engineer

Location

Netherlands, Amsterdam

Salary

Not provided

Together AI

Expiration Date

Until further notice

Senior DevOps Engineer (AI & Cloud Infrastructure)

Join Inflection AI as a Senior DevOps Engineer in Palo Alto. Design and operate cutting-edge, GPU-powered cloud infrastructure on Azure/AWS for large-scale LLM training and inference. Leverage your expertise in Kubernetes, Terraform, and AI platforms to build highly automated, resilient systems. ...

Location

United States, Palo Alto

Salary

175000.00 - 250000.00 USD / Year

Inflection AI

Expiration Date

Until further notice

Senior AI Infrastructure Engineer

Seeking a Senior AI Infrastructure Engineer to design and maintain high-performance computing environments for AI/ML workloads. You will build scalable on-premises infrastructure using NVIDIA DGX, Kubernetes, and advanced GPU technologies. This role requires expertise in Linux, automation, and AI...

Location

United States, Bothell; Overland Park; Bellevue

Salary

113600.00 - 205000.00 USD / Year

T-Mobile

Expiration Date

Until further notice

Senior AI Infrastructure Engineer jobs represent a critical and rapidly evolving frontier in technology, where professionals build the foundational platforms that power artificial intelligence and machine learning. These engineers are the architects and custodians of the specialized, high-performance computing environments required to train complex models and run AI applications at scale. Unlike traditional infrastructure roles, this position demands a deep synthesis of hardware expertise, distributed systems knowledge, and an understanding of the unique demands of AI/ML workloads. The core mission is to create scalable, reliable, and efficient infrastructure that enables data scientists and ML engineers to innovate without being bottlenecked by underlying system constraints.

Typically, professionals in these roles are responsible for designing, deploying, and maintaining large-scale GPU-accelerated computing clusters. This involves selecting and integrating cutting-edge hardware, such as advanced GPU servers and high-speed interconnects like InfiniBand, to construct robust on-premises or cloud-based AI platforms. A significant part of the job is ensuring optimal resource utilization through sophisticated workload management and orchestration, often using tools like Kubernetes with GPU-aware plugins. Engineers automate provisioning and management tasks using Infrastructure as Code (IaC) principles with tools like Terraform and Ansible, striving for operational excellence and self-service capabilities for research and development teams. They also focus on the entire data pipeline, implementing high-performance storage solutions and ensuring secure, efficient data flow to feed hungry AI models.

Common responsibilities include capacity planning for exponential compute growth, performance tuning of systems, monitoring cluster health, and implementing robust security and governance frameworks specific to AI infrastructure. Collaboration is key; these engineers work closely with AI researchers, software developers, and DevOps teams to understand workload requirements and translate them into stable, scalable infrastructure solutions. They are also tasked with staying ahead of the technological curve, evaluating new hardware accelerators, software stacks, and architectural patterns to continuously enhance platform capabilities.

The typical skill set for Senior AI Infrastructure Engineer jobs is broad and deep. A strong foundation in Linux/UNIX system administration is essential, coupled with proficiency in scripting and programming languages like Python and Bash. Expertise in containerization (Docker) and orchestration (Kubernetes) is standard, as is hands-on experience with GPU technologies and associated management software. A solid grasp of networking concepts, particularly around low-latency, high-throughput fabrics, and storage architectures optimized for large datasets is crucial. Importantly, candidates usually possess several years of experience in infrastructure or site reliability engineering, with a proven track record in large-scale, high-availability environments. Problem-solving skills, a passion for performance optimization, and the ability to navigate the intersection of cutting-edge hardware and complex software define success in this pivotal profession, making these jobs highly sought after in the modern tech landscape.