BlackRock is one of the world’s leading providers of investment, advisory, and risk management solutions, powered by Aladdin, our integrated investment and risk management technology platform. Aladdin unifies data, analytics, and workflows across public and private markets, enabling scale, insights, and transformation for BlackRock and our clients.
As part of Aladdin Engineering, you will join the AI Platform Engineering team, which is building the next-generation AI infrastructure and services that power Aladdin and other firm-wide applications. This team sits at the intersection of backend systems, AI engineering, AI infrastructure, and platform reliability, enabling advanced AI capabilities at scale.
We are looking for a senior leader who thrives on solving complex engineering challenges, shaping AI reliability and automation strategy, and building robust, scalable platforms. You will lead teams responsible for ensuring operational excellence, reliability, and automation across AI workloads, influencing the AI ecosystem across the firm.
Job Responsibilities:
Define and execute the SRE and DevOps strategy for AI platforms, ensuring high availability, scalability, and security
Architect and oversee cloud-native infrastructure across AWS, GCP, and Azure for AI workloads
Drive Kubernetes-based orchestration for AI models, including GPU scheduling and resource optimization
Establish CI/CD pipelines for AI platform and AI model lifecycle management (training, testing, deployment) with enterprise-grade security and compliance
Implement observability frameworks and reliability standards (SLIs, SLOs, SLAs) for distributed AI systems
Lead incident management, root cause analysis, and performance optimization across compute, storage, and network layers
Collaborate cross-functionally to translate business and functional requirements into resilient technical designs
Stay ahead of trends in SRE, DevOps, MLOps, and AI infrastructure to drive innovation and operational excellence
Requirements:
B.S./M.S. in Computer Science, Engineering, or related field
8+ years in platform engineering, SRE, DevOps, or AIOps roles
Proficiency in Python and Bash/Shell for automation, orchestration, and AI workflows
Familiarity with Rust build and dependency management frameworks