
Senior Principal Researcher - Cloud and AI Infrastructure

Location: Canada, Vancouver
Salary: 163000.00 - 296400.00 USD / Year
Job Posted: February 01, 2026

Job Description

The Microsoft Research Asia – Vancouver lab, located in the vibrant city of Vancouver, BC, Canada, represents Microsoft Research Asia’s expansion into the Asia-Pacific region. We’re on a mission to transform the future of artificial intelligence by bridging the gap between cutting-edge general AI and the specialized, real-world applications that drive meaningful impact. We are seeking a highly skilled Senior Principal Researcher - Cloud and AI Infrastructure with a keen interest in advancing cloud and artificial intelligence (AI) infrastructure architecture and chip design using AI technologies.

At the Vancouver Lab, we focus on deeply integrating intelligent systems across every layer of computing, from infrastructure to the physical environment. Our goal is to solve complex, real-world challenges with precision, scalability, and cost-efficiency. This means working at the intersection of AI, human interaction, and environmental context through a dynamic, co-evolutionary process. If you're passionate about pushing the boundaries of AI and want to be part of a team that’s shaping the future of intelligent systems, we invite you to explore opportunities with us. This is an opportunity to drive an ambitious research agenda while collaborating with diverse teams to push for novel applications in these areas.

Job Responsibility

  • Investigate and analyze emerging hardware technologies, trends, and advancements
  • Design and optimize hardware components, systems, and architectures to enhance performance, reliability, and efficiency
  • Conduct simulations, tests, and validations to ensure hardware designs meet required specifications and performance goals
  • Develop prototypes and proof-of-concept models to demonstrate new hardware technologies and applications
  • Identify opportunities for hardware improvements and cost reductions by staying informed about industry best practices and standards
  • Collaborate with cross-functional teams, including software researchers, designers, and engineers, to identify hardware requirements and develop innovative solutions
  • Partner with manufacturing vendors and production teams to transition innovative designs and concepts into deployable systems
  • Document research findings, design decisions, and technical specifications to facilitate knowledge sharing and collaboration within the organization

Requirements

  • Doctorate in relevant field AND 6+ years related research experience
  • OR Master's Degree in relevant field AND 7+ years related research experience
  • OR Bachelor's Degree in relevant field AND 9+ years related research experience
  • OR equivalent experience
  • 3+ years’ experience in research related to infrastructure design, computer architecture, or artificial intelligence
  • Experience publishing academic papers as a lead author or essential contributor
  • Experience participating in a top conference in relevant research domain
  • Experience in optimizing or designing hardware components and architectures to enhance performance, reliability, and efficiency


Similar Jobs for Senior Principal Researcher - Cloud and AI Infrastructure

8 matching positions

Senior Principal Machine Learning Engineer

You’ll form a new team of passionate engineers dedicated to building and scaling...
Location: United States
Salary: 222300.00 - 348975.00 USD / Year
Atlassian (https://www.atlassian.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s, Master’s, or PhD in Computer Science, Statistics, Mathematics, or a related field, or equivalent practical experience
  • 12+ years of industry experience in machine learning, data science, or AI, with a proven track record of delivering production-grade ML systems
  • Deep expertise in Python, Go, or Java, with the ability to write performant, production-quality code
  • Familiarity with SQL, Spark, and cloud data environments (e.g., AWS, GCP, Databricks)
  • Experience building and scaling ML models for business-critical applications, ideally in security, privacy, anti-abuse, or compliance domains
  • Strong communication skills, able to explain complex ML concepts to diverse audiences and influence stakeholders
  • Demonstrated ability to solve ambiguous, complex problems and drive projects from ideation to production
  • Agile development mindset, with a focus on iterative improvement and business impact
Job Responsibility
  • Lead AI/ML Strategy for Trust: Drive the development and implementation of advanced machine learning algorithms and AI systems for Trust, Security, Product Abuse, and Compliance use cases (e.g., threat detection, vulnerability management, privacy automation, AI safety)
  • Architect and Scale ML Platforms: Design and build scalable, secure, and reliable ML infrastructure and pipelines, ensuring compliance with privacy and regulatory requirements
  • AI Safety and Responsible AI: Develop and champion AI safety practices, including output moderation, explainability, and alignment with evolving regulatory frameworks
  • Cross-Functional Collaboration: Partner with product, engineering, security, privacy, and analytics teams to deliver transformative AI/ML solutions that enhance Atlassian’s trust posture
  • Mentorship and Leadership: Mentor and guide ML engineers and data scientists, fostering a culture of technical excellence, innovation, and continuous improvement
  • Innovation and Research: Stay at the forefront of AI/ML research, evaluating and applying the latest techniques (e.g., LLMs, anomaly detection, privacy-preserving ML) to real-world Trust challenges
  • Platform Enablement: Build reusable ML services and APIs that empower other teams to integrate AI/ML into their products and workflows
  • Operational Excellence: Ensure high availability, reliability, and security of all ML-powered Trust platforms and services
What we offer
  • health and wellbeing resources
  • paid volunteer days
  • benefits, bonuses, commissions, and equity
Employment Type: Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location: United States, Pleasanton, California
Salary: 251000.00 - 314500.00 USD / Year
BlackLine (blackline.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type: Full-time

Senior Principal Engineering Manager

Microsoft Research (MSR) is working to transform the future of artificial intell...
Location: United States, Redmond
Salary: 163000.00 - 296400.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of people management experience leading software engineering teams, including managing principal engineers
  • Experience building or operating infrastructure for large-scale distributed systems, cloud platforms, or artificial intelligence (AI)/machine learning (ML) workloads
  • Track record of driving execution on complex, multi-workstream infrastructure projects with clear milestones and accountability
  • Technical fluency in one or more of: large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), or High-Performance Compute (HPC) environments
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
  • Expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • A track record of strong cross-functional partnerships, including the ability to align on strategic direction, deliver joint accountabilities, and develop relationships with staff members with widely varied expertise
  • Experience scaling engineering teams through significant growth phases (hiring, onboarding, and integrating new engineers into a high-performing team)
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Job Responsibility
  • Lead, mentor, and grow the engineering team that builds MSR’s AI research infrastructure
  • Recruit and develop exceptional engineering talent, building a diverse team - including hiring, onboarding, career development, and performance management
  • Drive execution across the team by setting clear goals, tracking milestones, managing dependencies, and ensuring accountability for delivering complex infrastructure projects on time and at high quality
  • Lead team culture and process changes, cultivating an AI-first mentality that accelerates our progress through agentic coding, automation, and skills development
  • Provide technical vision and judgment on the team's architecture, strategy, and roadmap — spanning supercomputer GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows — while empowering engineers to own deep technical details
  • Collaborate closely cross-discipline with engineers, program managers, and research and science teams to align priorities, resolve dependencies, and build better solutions together
  • Foster a team culture of operational excellence, continuous improvement, and high psychological safety where engineers are empowered to take ownership and innovate
Employment Type: Full-time

Principal Product Manager - Foundry Inferencing & Training (CoreAI - multiple roles)

Microsoft Foundry sits at the center of Microsoft’s AI strategy, powering how mo...
Location: United States, Redmond
Salary: 139900.00 - 331200.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree and 8+ years of experience in product management, technical program management, software engineering, or related technical fields (or equivalent experience)
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Product Strategy & Ownership: Own product strategy and roadmap across AI model training, inference, experimentation, and platform enablement, balancing near-term delivery with long-term scale
  • Maintain end-to-end accountability from concept through launch, iteration, and measurable impact
  • Model Lifecycle & Platform Enablement: Drive initiatives across the AI model lifecycle, partnering with engineering and research to bring new capabilities from research into production
  • Enable internal teams and customers to access, integrate, and adopt models through high-quality platform experiences
  • Execution, Velocity & Operating Rigor: Lead complex, multi-quarter initiatives with high visibility, managing dependencies, risks, and tradeoffs across teams
  • Improve execution velocity by reducing friction in planning, experimentation, launches, and iteration cycles
  • Experimentation, Metrics & Continuous Improvement: Define and track metrics for efficiency, performance, reliability, and adoption, using experimentation and data to drive decisions
  • Identify opportunities for automation, simplification, and continuous improvement as systems scale
  • Cross-Functional Leadership & Communication: Act as a connective leader across engineering, data science, research, infrastructure, and go-to-market teams
  • Influence senior stakeholders through clear decision framing, executive-ready narratives, and data-backed recommendations
Employment Type: Full-time

Senior Principal Engineer- End-to-End AI Training Framework

As the Senior Principal Engineer, E2E AI Training Framework for Autonomous Drivi...
Location: United States, Sunnyvale
Salary: 240000.00 - 320000.00 USD / Year
Robert Bosch Sp. z o.o. (https://www.bosch.pl/)
Expiration Date: Until further notice
Requirements
  • Master’s degree or Ph.D. in Computer Science, Robotics, Electrical Engineering, AI, or a closely related field with a focus on autonomous systems
  • 10+ years of experience in software development and system engineering for autonomous driving or ADAS applications
  • Proven industry experience in releasing AI-based L2+ systems, with a strong track record of successful product deployments
  • Deep knowledge of E2E AI stack solutions and training algorithms, including reinforcement learning and imitation learning, as well as motion control and optimization techniques
  • Deep knowledge of AI frameworks such as TensorFlow and PyTorch
  • Deep knowledge in model optimization and embedded deployment of E2E AI stacks to embedded automotive hardware
  • Deep knowledge of cloud-based scalable training pipelines, MLOps, and CI/CD for training AI models with large-scale fleet datasets
  • Proven track record of leading the end-to-end development and successful deployment of complex AI-powered systems into production environments at scale
Job Responsibility
  • Define and drive execution of the technical roadmap and strategy for the E2E AI machinery, including training pipelines, optimization techniques, simulation and MLOps tooling
  • Oversee the design, development, and testing of the E2E AI machinery and its interaction with data sources, model repositories, and development targets
  • Collaborate closely with other functional tech leads (e.g. data engineering, infrastructure) to define and drive the overall architecture of the AI machinery ecosystem
  • Guide the set-up of a development framework that enables fast evaluation and integration of emerging E2E AI solutions
  • Guide the transition from research prototypes to production-ready solutions, ensuring performance optimization on automotive-grade hardware and scalability
  • Leverage your prior industry experience in launching AI-based L2+ systems to implement best practices in system validation, testing (SIL/HIL), and continuous improvement
  • Mentor and lead a high-caliber team of AI scientists and engineers, fostering a culture of innovation, collaboration, and technical excellence
What we offer
  • health, dental, and vision plans
  • health savings accounts (HSA)
  • flexible spending accounts
  • 401(K) retirement plan with an attractive employer match
  • wellness programs
  • life insurance
  • long term disability insurance
  • paid time off
  • parental leave
Employment Type: Full-time

Senior Principal, Machine Learning & Artificial Intelligence

Xometry is seeking a Senior Principal, Machine Learning & Artificial Intelligenc...
Location: United States, North Bethesda
Salary: 150000.00 - 196000.00 USD / Year
Cherry Ventures (cherry.vc)
Expiration Date: Until further notice
Requirements
  • Master’s or PhD in Computer Science, Machine Learning, Applied Mathematics, Electrical Engineering or related field (PhD preferred for deep generative/3D modeling emphasis)
  • 12+ years of professional experience in machine learning, artificial intelligence, or data science roles — with several years in senior or principal capacity leading major programs
  • Demonstrated experience architecting and delivering large scale ML/AI solutions - end-to-end from data ingestion, feature engineering, model training, evaluation, deployment, monitoring & operations
  • Deep expertise in machine learning frameworks (TensorFlow, PyTorch), data engineering, model infrastructure, MLOps, cloud platforms (AWS, GCP, Azure), and scalable production systems
  • Strong exposure to generative AI techniques (large language models, multimodal models, diffusion, GANs) and translating them into business use-cases
  • Excellent cross-functional collaboration skills: you can partner with product, engineering, ops, manufacturing, design, business leadership and translate technical concepts into business language
  • Proven ability to influence without direct authority and drive change across organizations
  • Strong communication and presentation skills: you can articulate technical vision, roadmap, trade-offs and outcomes to senior leadership
  • Track record of identifying and delivering measurable business impact via ML/AI - e.g., revenue growth, cost savings, improved efficiency
Job Responsibility
  • Serve as the technical leader of multiple large, cross-functional ML/AI solutions with significant, lasting impact across Xometry’s business
  • Define and drive the 18-24-month ML/AI technical roadmap, balancing breakthrough innovation (e.g., generative 3D, foundation models, large-scale vision/3D pipelines) with reliable business value delivery (e.g., quoting accuracy, lead-time reduction, defect detection, cost optimization)
  • Influence partner roadmaps across engineering, product, operations, and business teams: align priorities, advise on resourcing, champion ML/AI best practices
  • Proactively identify and remove roadblocks for teams and projects — whether technical, operational, data-related, or resource constraints
  • Mentorship of individuals and technical teams
  • Act as a trusted SME with strong cross-functional partnerships: your insights and guidance will shape ML/AI data, model, infrastructure, and tooling decisions
  • Play a leadership role in identifying areas of opportunity, such as using ML/AI to unlock new revenue streams (e.g., rapid quoting for new manufacturing modalities, generative design for customers), reduce cost (e.g., automated quality inspection), or optimize efficiency (e.g., 3D-geometry classification, defect detection, generating manufacturing-ready models)
  • Address problems adjacent to your sphere of immediate influence: proactively tackle challenges outside direct scope and champion holistic solutions
  • Stay ahead of industry developments in ML, AI, generative AI, 2D/3D modeling and manufacturing tech, and translate insights into the improvement of internal best practices, tooling, frameworks, model governance, data pipelines, and operationalization
What we offer
  • annual bonus
  • 401(k) match
  • medical, dental and vision insurance
  • life and disability insurance
  • generous paid time off including vacation, sick leave, floating and fixed holidays, maternity and bonding leave
  • EAP, other wellbeing resources
Employment Type: Full-time

Senior Principal, Machine Learning & Artificial Intelligence

Xometry is seeking a Senior Principal, Machine Learning & Artificial Intelligenc...
Location: United States, Waltham
Salary: 150000.00 - 196000.00 USD / Year
Cherry Ventures (cherry.vc)
Expiration Date: Until further notice
Requirements
  • Master’s or PhD in Computer Science, Machine Learning, Applied Mathematics, Electrical Engineering or related field (PhD preferred for deep generative/3D modeling emphasis)
  • 12+ years of professional experience in machine learning, artificial intelligence, or data science roles — with several years in senior or principal capacity leading major programs
  • Demonstrated experience architecting and delivering large scale ML/AI solutions - end-to-end from data ingestion, feature engineering, model training, evaluation, deployment, monitoring & operations
  • Deep expertise in machine learning frameworks (TensorFlow, PyTorch), data engineering, model infrastructure, MLOps, cloud platforms (AWS, GCP, Azure), and scalable production systems
  • Experience in 3D modeling / geometry / computer vision / generative models (e.g., point-cloud processing, mesh processing, text-to-3D, image-to-3D, CAD/CAM integration) is highly desirable
  • Strong exposure to generative AI techniques (large language models, multimodal models, diffusion, GANs) and translating them into business use-cases
  • Excellent cross-functional collaboration skills: you can partner with product, engineering, ops, manufacturing, design, business leadership and translate technical concepts into business language
  • Proven ability to influence without direct authority and drive change across organizations
  • Strong communication and presentation skills: you can articulate technical vision, roadmap, trade-offs and outcomes to senior leadership
Job Responsibility
  • Serve as the technical leader of multiple large, cross-functional ML/AI solutions with significant, lasting impact across Xometry’s business
  • Define and drive the 18-24-month ML/AI technical roadmap, balancing breakthrough innovation (e.g., generative 3D, foundation models, large-scale vision/3D pipelines) with reliable business value delivery (e.g., quoting accuracy, lead-time reduction, defect detection, cost optimization)
  • Influence partner roadmaps across engineering, product, operations, and business teams: align priorities, advise on resourcing, champion ML/AI best practices
  • Proactively identify and remove roadblocks for teams and projects — whether technical, operational, data-related, or resource constraints
  • Mentorship of individuals and technical teams
  • Act as a trusted SME with strong cross-functional partnerships: your insights and guidance will shape ML/AI data, model, infrastructure, and tooling decisions
  • Play a leadership role in identifying areas of opportunity, such as using ML/AI to unlock new revenue streams (e.g., rapid quoting for new manufacturing modalities, generative design for customers), reduce cost (e.g., automated quality inspection), or optimize efficiency (e.g., 3D-geometry classification, defect detection, generating manufacturing-ready models)
  • Address problems adjacent to your sphere of immediate influence: proactively tackle challenges outside direct scope and champion holistic solutions
  • Stay ahead of industry developments in ML, AI, generative AI, 2D/3D modeling and manufacturing tech, and translate insights into the improvement of internal best practices, tooling, frameworks, model governance, data pipelines, and operationalization
What we offer
  • 401(k) match
  • medical, dental and vision insurance
  • life and disability insurance
  • generous paid time off including vacation, sick leave, floating and fixed holidays, maternity and bonding leave
  • EAP, other wellbeing resources
Employment Type: Full-time

Principal Product Manager - Microsoft Foundry (CoreAI)

The Foundry Inference & Training team is responsible for advancing Microsoft’s m...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Strategic Execution and Operating Rhythm
  • Translate leadership priorities into clear execution plans, milestones, and success metrics across Foundry Inference & Training
  • Establish and run the operating cadence including planning cycles, reviews, executive readouts, and follow-ups
  • Track commitments and dependencies across engineering, research, infrastructure, and partner teams, ensuring risks and gaps are surfaced early
  • Cross-Team Alignment and Influence
  • Act as a connective layer across teams working on model training, data, infrastructure, and platform integration
  • Drive alignment on goals, timelines, and decision points across multiple senior stakeholders
  • Resolve ambiguity by framing tradeoffs, options, and recommendations grounded in technical and business context
  • Program Leadership and Delivery
  • Lead complex, multi-quarter programs with high visibility and executive attention
Employment Type: Full-time