At BlackRock, technology underpins everything we do. AI is a core strategic priority for the firm, embedded across Aladdin and our investment, client, and operational platforms. We are seeking an AI Infrastructure Engineer to help build and operate the foundational infrastructure that enables AI systems to scale safely, securely, and reliably across the enterprise. This role sits within Aladdin Platform Engineering and focuses on the infrastructure and platform services required to support machine learning models, large language models (LLMs), and emerging AI capabilities in production. The successful candidate will work closely with AI Engineers, Data Scientists, Platform Engineers, Security, and Product partners to deliver resilient, cloud-native AI platforms in a highly regulated environment.
Job Responsibilities:
Design, build, and operate AI-focused infrastructure platforms supporting model development, training, evaluation, and inference
Engineer scalable, reliable, and secure cloud-native services to support AI workloads across AWS, Azure, and hybrid environments
Partner with AI Engineering and Data Science teams to improve developer experience, performance, and operational stability of AI systems
Enable production deployment of ML models and LLMs within governed enterprise environments, aligned with firmwide risk and compliance standards
Implement and maintain infrastructure as code and automation to ensure repeatable, auditable platform provisioning
Build and operate observability, monitoring, and alerting solutions for AI platforms, ensuring availability, performance, and cost transparency
Collaborate with Security and Risk partners to integrate identity, access controls, data protection, and governance into AI infrastructure
Contribute to architectural decisions and technical standards for AI platforms across Aladdin
Participate in on-call rotations and operational support as required for critical platforms
Continuously evaluate emerging AI infrastructure technologies and apply them pragmatically within BlackRock’s enterprise context
Requirements:
Strong experience in cloud infrastructure, platform engineering, or systems engineering roles
4+ years of hands-on expertise with AWS, Azure, and/or GCP, including Azure ML, Azure Foundry, AWS Bedrock, and Google Vertex, as well as cloud compute, networking, storage, and security services
Understanding of ML platform operations and governance concepts, including model deployment strategies, lifecycle management, monitoring/observability, and disaster recovery
Experience supporting LLMs, generative AI platforms, or model serving infrastructure
Experience supporting AI and machine learning workloads, with exposure to managed compute for model training and fine-tuning, experimentation over large datasets, and end-to-end MLOps pipeline flow, including data ingestion, training, validation, and deployment
Proficiency with Infrastructure as Code tools (e.g., Terraform, ARM/Bicep, CloudFormation)
Strong programming or scripting skills (e.g., Python, Bash, or similar)
Experience building and operating containerized and Kubernetes-based platforms
Solid understanding of reliability, scalability, observability, and operational best practices
Ability to work effectively in cross-functional teams and communicate complex technical concepts clearly
Nice to have:
Familiarity with GPU- or accelerator-based infrastructure
Experience working in financial services or other highly regulated industries
Familiarity with multicloud architectures and enterprise governance requirements
What we offer:
Retirement investment and tools designed to help you build a sound financial future
Access to education reimbursement
Comprehensive resources to support your physical health and emotional well-being