Senior Principal Researcher - Cloud and AI Infrastructure

Microsoft Corporation

Location:
Canada, Vancouver

Contract Type:
Not provided

Salary:

163000.00 - 296400.00 USD / Year

Job Description:

Microsoft Research Asia – Vancouver lab is located in the vibrant city of Vancouver, BC, Canada, and represents Microsoft Research Asia’s exciting expansion into the Asia-Pacific region. We’re on a mission to transform the future of artificial intelligence by bridging the gap between cutting-edge general AI and the specialized, real-world applications that drive meaningful impact. We are seeking a highly skilled Senior Principal Researcher - Cloud and AI Infrastructure with a keen interest in advancing cloud and artificial intelligence (AI) infrastructure architecture and chip design using AI technologies. At the Vancouver Lab, we focus on deeply integrating intelligent systems across every layer of computing, from infrastructure to the physical environment. Our goal is to solve complex, real-world challenges with precision, scalability, and cost-efficiency. This means working at the intersection of AI, human interaction, and environmental context through a dynamic, co-evolutionary process. If you're passionate about pushing the boundaries of AI and want to be part of a team that’s shaping the future of intelligent systems, we invite you to explore opportunities with us. This is an opportunity to drive an ambitious research agenda while collaborating with diverse teams to push for novel applications in these areas.

Job Responsibility:

  • Investigate and analyze emerging hardware technologies, trends, and advancements
  • Design and optimize hardware components, systems, and architectures to enhance performance, reliability, and efficiency
  • Conduct simulations, tests, and validations to ensure hardware designs meet required specifications and performance goals
  • Develop prototypes and proof-of-concept models to demonstrate new hardware technologies and applications
  • Identify opportunities for hardware improvements and cost reductions by staying informed about industry best practices and standards
  • Collaborate with cross-functional teams, including software researchers, designers, and engineers, to identify hardware requirements and develop innovative solutions
  • Partner with manufacturing vendors and production teams to transition innovative designs and concepts into deployable systems
  • Document research findings, design decisions, and technical specifications to facilitate knowledge sharing and collaboration within the organization

Requirements:

  • Doctorate in relevant field AND 6+ years related research experience
  • OR Master's Degree in relevant field AND 7+ years related research experience
  • OR Bachelor's Degree in relevant field AND 9+ years related research experience
  • OR equivalent experience
  • 3+ years’ experience in research related to infrastructure design, computer architecture, or artificial intelligence
  • Experience publishing academic papers as a lead author or essential contributor
  • Experience participating in a top conference in relevant research domain
  • Experience in optimizing or designing hardware components and architectures to enhance performance, reliability, and efficiency

Additional Information:

Job Posted:
February 01, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Senior Principal Researcher - Cloud and AI Infrastructure

Senior Principal Machine Learning Engineer

You’ll form a new team of passionate engineers dedicated to building and scaling...
Location
United States
Salary:
222300.00 - 348975.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • Bachelor’s, Master’s, or PhD in Computer Science, Statistics, Mathematics, or a related field, or equivalent practical experience
  • 12+ years of industry experience in machine learning, data science, or AI, with a proven track record of delivering production-grade ML systems
  • Deep expertise in Python, Go, or Java, with the ability to write performant, production-quality code
  • Familiarity with SQL, Spark, and cloud data environments (e.g., AWS, GCP, Databricks)
  • Experience building and scaling ML models for business-critical applications, ideally in security, privacy, anti-abuse, or compliance domains
  • Strong communication skills, able to explain complex ML concepts to diverse audiences and influence stakeholders
  • Demonstrated ability to solve ambiguous, complex problems and drive projects from ideation to production
  • Agile development mindset, with a focus on iterative improvement and business impact
Job Responsibility
  • Lead AI/ML Strategy for Trust: Drive the development and implementation of advanced machine learning algorithms and AI systems for Trust, Security, Product Abuse, and Compliance use cases (e.g., threat detection, vulnerability management, privacy automation, AI safety)
  • Architect and Scale ML Platforms: Design and build scalable, secure, and reliable ML infrastructure and pipelines, ensuring compliance with privacy and regulatory requirements
  • AI Safety and Responsible AI: Develop and champion AI safety practices, including output moderation, explainability, and alignment with evolving regulatory frameworks
  • Cross-Functional Collaboration: Partner with product, engineering, security, privacy, and analytics teams to deliver transformative AI/ML solutions that enhance Atlassian’s trust posture
  • Mentorship and Leadership: Mentor and guide ML engineers and data scientists, fostering a culture of technical excellence, innovation, and continuous improvement
  • Innovation and Research: Stay at the forefront of AI/ML research, evaluating and applying the latest techniques (e.g., LLMs, anomaly detection, privacy-preserving ML) to real-world Trust challenges
  • Platform Enablement: Build reusable ML services and APIs that empower other teams to integrate AI/ML into their products and workflows
  • Operational Excellence: Ensure high availability, reliability, and security of all ML-powered Trust platforms and services
What we offer
  • health and wellbeing resources
  • paid volunteer days
  • benefits, bonuses, commissions, and equity
  • Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Senior Principal Engineer- End-to-End AI Training Framework

As the Senior Principal Engineer, E2E AI Training Framework for Autonomous Drivi...
Location
United States, Sunnyvale
Salary:
240000.00 - 320000.00 USD / Year
Robert Bosch Sp. z o.o.
Expiration Date
Until further notice
Requirements
  • Master’s degree or Ph.D. in Computer Science, Robotics, Electrical Engineering, AI, or a closely related field with a focus on autonomous systems
  • 10+ years of experience in software development and system engineering for autonomous driving or ADAS applications
  • Proven industry experience in releasing AI-based L2+ systems, with a strong track record of successful product deployments
  • Deep knowledge of E2E AI stack solutions and training algorithms, including reinforcement learning and imitation learning, as well as motion control and optimization techniques
  • Deep knowledge of AI frameworks such as TensorFlow and PyTorch
  • Deep knowledge in model optimization and embedded deployment of E2E AI stacks to embedded automotive hardware
  • Deep knowledge of cloud-based scalable training pipelines, MLOps, and CI/CD for training AI models with large-scale fleet datasets
  • Proven track record of leading the end-to-end development and successful deployment of complex AI-powered systems into production environments at scale
Job Responsibility
  • Define and drive execution of the technical roadmap and strategy for the E2E AI machinery, including training pipelines, optimization techniques, simulation and MLOps tooling
  • Oversee the design, development, and testing of the E2E AI machinery and its interaction with data sources, model repositories, and development targets
  • Collaborate closely with other functional tech leads (e.g. data engineering, infrastructure) to define and drive the overall architecture of the AI machinery ecosystem
  • Guide the set-up of a development framework that enables fast evaluation and integration of emerging E2E AI solutions
  • Guide the transition from research prototypes to production-ready solutions, ensuring performance optimization on automotive-grade hardware and scalability
  • Leverage your prior industry experience in launching AI-based L2+ systems to implement best practices in system validation, testing (SIL/HIL), and continuous improvement
  • Mentor and lead a high-caliber team of AI scientists and engineers, fostering a culture of innovation, collaboration, and technical excellence
What we offer
  • health, dental, and vision plans
  • health savings accounts (HSA)
  • flexible spending accounts
  • 401(K) retirement plan with an attractive employer match
  • wellness programs
  • life insurance
  • long term disability insurance
  • paid time off
  • parental leave
  • Fulltime

Senior Principal, Machine Learning & Artificial Intelligence

Xometry is seeking a Senior Principal, Machine Learning & Artificial Intelligenc...
Location
United States, North Bethesda
Salary:
150000.00 - 196000.00 USD / Year
Cherry Ventures
Expiration Date
Until further notice
Requirements
  • Master’s or PhD in Computer Science, Machine Learning, Applied Mathematics, Electrical Engineering or related field (PhD preferred for deep generative/3D modeling emphasis)
  • 12+ years of professional experience in machine learning, artificial intelligence, or data science roles — with several years in senior or principal capacity leading major programs
  • Demonstrated experience architecting and delivering large scale ML/AI solutions - end-to-end from data ingestion, feature engineering, model training, evaluation, deployment, monitoring & operations
  • Deep expertise in machine learning frameworks (TensorFlow, PyTorch), data engineering, model infrastructure, MLOps, cloud platforms (AWS, GCP, Azure), and scalable production systems
  • Strong exposure to generative AI techniques (large language models, multimodal models, diffusion, GANs) and translating them into business use-cases
  • Excellent cross-functional collaboration skills: you can partner with product, engineering, ops, manufacturing, design, business leadership and translate technical concepts into business language
  • Proven ability to influence without direct authority and drive change across organizations
  • Strong communication and presentation skills: you can articulate technical vision, roadmap, trade-offs, and outcomes to senior leadership
  • Track record of identifying and delivering measurable business impact via ML/AI - e.g., revenue growth, cost savings, improved efficiency
Job Responsibility
  • Serve as the technical leader of multiple large, cross-functional ML/AI solutions with significant, lasting impact across Xometry’s business
  • Define and drive the 18-24-month ML/AI technical roadmap, balancing breakthrough innovation (e.g., generative 3D, foundation models, large-scale vision/3D pipelines) with reliable business value delivery (e.g., quoting accuracy, lead-time reduction, defect detection, cost optimization)
  • Influence partner roadmaps across engineering, product, operations, and business teams: align priorities, advise on resourcing, champion ML/AI best practices
  • Proactively identify and remove roadblocks for teams and projects — whether technical, operational, data-related, or resource constraints
  • Mentorship of individuals and technical teams
  • Act as a trusted SME with strong cross-functional partnerships: your insights and guidance will shape ML/AI data, model, infrastructure, and tooling decisions
  • Play a leadership role in identifying areas of opportunity — e.g., using ML/AI to unlock new revenue streams (e.g., rapid quoting for new manufacturing modalities, generative design for customers), reduce cost (e.g., automated quality inspection), or optimize efficiency (e.g., 3D-geometry classification, defect detection, generating manufacturing ready models)
  • Address problems adjacent to your sphere of immediate influence: proactively tackle challenges outside direct scope and champion holistic solutions
  • Stay ahead of industry developments in ML, AI, generative AI, 2D/3D modeling, and manufacturing tech, and translate insights into the improvement of internal best practices, tooling, frameworks, model governance, data pipelines, and operationalization
What we offer
  • annual bonus
  • 401(k) match
  • medical, dental and vision insurance
  • life and disability insurance
  • generous paid time off including vacation, sick leave, floating and fixed holidays, maternity and bonding leave
  • EAP, other wellbeing resources
  • Fulltime

Senior Principal, Machine Learning & Artificial Intelligence

Xometry is seeking a Senior Principal, Machine Learning & Artificial Intelligenc...
Location
United States, Waltham
Salary:
150000.00 - 196000.00 USD / Year
Cherry Ventures
Expiration Date
Until further notice
Requirements
  • Master’s or PhD in Computer Science, Machine Learning, Applied Mathematics, Electrical Engineering or related field (PhD preferred for deep generative/3D modeling emphasis)
  • 12+ years of professional experience in machine learning, artificial intelligence, or data science roles — with several years in senior or principal capacity leading major programs
  • Demonstrated experience architecting and delivering large scale ML/AI solutions - end-to-end from data ingestion, feature engineering, model training, evaluation, deployment, monitoring & operations
  • Deep expertise in machine learning frameworks (TensorFlow, PyTorch), data engineering, model infrastructure, MLOps, cloud platforms (AWS, GCP, Azure), and scalable production systems
  • Experience in 3D modeling / geometry / computer vision / generative models (e.g., point-cloud processing, mesh processing, text-to-3D, image-to-3D, CAD/CAM integration) is highly desirable
  • Strong exposure to generative AI techniques (large language models, multimodal models, diffusion, GANs) and translating them into business use-cases
  • Excellent cross-functional collaboration skills: you can partner with product, engineering, ops, manufacturing, design, business leadership and translate technical concepts into business language
  • Proven ability to influence without direct authority and drive change across organizations
  • Strong communication and presentation skills: you can articulate technical vision, roadmap, trade-offs, and outcomes to senior leadership
Job Responsibility
  • Serve as the technical leader of multiple large, cross-functional ML/AI solutions with significant, lasting impact across Xometry’s business
  • Define and drive the 18-24-month ML/AI technical roadmap, balancing breakthrough innovation (e.g., generative 3D, foundation models, large-scale vision/3D pipelines) with reliable business value delivery (e.g., quoting accuracy, lead-time reduction, defect detection, cost optimization)
  • Influence partner roadmaps across engineering, product, operations, and business teams: align priorities, advise on resourcing, champion ML/AI best practices
  • Proactively identify and remove roadblocks for teams and projects — whether technical, operational, data-related, or resource constraints
  • Mentorship of individuals and technical teams
  • Act as a trusted SME with strong cross-functional partnerships: your insights and guidance will shape ML/AI data, model, infrastructure, and tooling decisions
  • Play a leadership role in identifying areas of opportunity — e.g., using ML/AI to unlock new revenue streams (e.g., rapid quoting for new manufacturing modalities, generative design for customers), reduce cost (e.g., automated quality inspection), or optimize efficiency (e.g., 3D-geometry classification, defect detection, generating manufacturing ready models)
  • Address problems adjacent to your sphere of immediate influence: proactively tackle challenges outside direct scope and champion holistic solutions
  • Stay ahead of industry developments in ML, AI, generative AI, 2D/3D modeling, and manufacturing tech, and translate insights into the improvement of internal best practices, tooling, frameworks, model governance, data pipelines, and operationalization
What we offer
  • 401(k) match
  • medical, dental and vision insurance
  • life and disability insurance
  • generous paid time off including vacation, sick leave, floating and fixed holidays, maternity and bonding leave
  • EAP, other wellbeing resources
  • Fulltime

Principal Product Manager - Microsoft Foundry (CoreAI)

The Foundry Inference & Training team is responsible for advancing Microsoft’s m...
Location
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Strategic Execution and Operating Rhythm
  • Translate leadership priorities into clear execution plans, milestones, and success metrics across Foundry Inference & Training
  • Establish and run the operating cadence including planning cycles, reviews, executive readouts, and follow-ups
  • Track commitments and dependencies across engineering, research, infrastructure, and partner teams, ensuring risks and gaps are surfaced early
  • Cross-Team Alignment and Influence
  • Act as a connective layer across teams working on model training, data, infrastructure, and platform integration
  • Drive alignment on goals, timelines, and decision points across multiple senior stakeholders
  • Resolve ambiguity by framing tradeoffs, options, and recommendations grounded in technical and business context
  • Program Leadership and Delivery
  • Lead complex, multi-quarter programs with high visibility and executive attention
  • Fulltime

Principal Product Manager – Network as a Service, NaaS - Platforms

The Network as a Service (NaaS) team at T-Mobile is an emerging business unit fo...
Location
United States, Bellevue
Salary:
133800.00 - 241400.00 USD / Year
T-Mobile
Expiration Date
Until further notice
Requirements
  • Deep expertise in platform product management, including API-driven and cloud-based platforms
  • Strong business acumen, aligning platform vision with enterprise strategy and customer needs
  • Experience managing cross-product platform capabilities, ensuring scalability, security, and efficiency
  • System thinking, with an ability to break down complex technical systems into modular and reusable components
  • Proven leadership, coaching, and mentoring skills, fostering a growth mindset across teams
  • Exceptional communication and influence, engaging senior executives and cross-functional stakeholders
  • Strong technical program management expertise, with experience managing end-to-end delivery of complex platform solutions
  • Bachelor's Degree in Engineering, Business, or related field
  • 10+ years of Product Management experience, ideally in network services, platforms, or enterprise SaaS
  • Proven track record of owning and scaling enterprise platform products
Job Responsibility
  • Define and own NaaS platform vision, strategy, and roadmap, ensuring alignment with business, technical, and product needs
  • Partner with product managers to translate individual product requirements into scalable, reusable platform capabilities
  • Apply system thinking to design robust, scalable, and modular platform solutions that serve multiple use cases
  • Conduct competitive analysis and industry research to ensure platform innovation and technical differentiation
  • Secure funding for platform initiatives, building strong business cases and conducting ROI analysis
  • Lead end-to-end execution, including defining platform scope, managing roadmaps, and ensuring seamless integration across NaaS products
  • Work closely with engineering, architecture, and DevOps teams to deliver secure, scalable, and high-performing platform solutions
  • Own and prioritize the platform product backlog, balancing feature development, technical debt, and business priorities
  • Ensure compliance with security, data privacy, and regulatory standards across NaaS platform solutions
  • Drive technical program management efforts, ensuring efficient execution, risk mitigation, and dependency management
What we offer
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off
  • Up to 12 paid holidays
  • Paid parental and family leave
  • Fulltime

Principal Software Engineer 6 - (Backend & Agentic Execution Platform)

FreeWheel’s Programmatic Demand team is looking for a Principal Engineer to help...
Location
United States, Chicago; Englewood; Denver
Salary:
180337.97 - 277420.95 USD / Year
Comcast Advertising
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent practical experience
  • 15+ years building and operating scalable, distributed systems; ad tech / media / programmatic experience strongly preferred
  • Deep expertise in system design and technical architecture, including ownership of complex systems in production
  • Strong coding ability in one or more of: Python, Java, Go, C++ (performance, concurrency, and distributed computing)
  • Experience building and operating cloud-native services on AWS (or equivalent) including infrastructure-as-code
  • Demonstrated ability to own and manage technical backlogs, influence prioritization, and drive execution across multiple teams
  • AI / Agentic Systems Experience (Strongly Preferred)
  • Experience designing “tooling” APIs intended for automation (human and machine clients)
  • Familiarity with modern AI/agent patterns (tool calling, RAG, evaluation/monitoring)
Job Responsibility
  • Collaborates with project stakeholders to identify product and technical requirements
  • Designs and oversees new software and web applications
  • Trains and mentors software engineers
  • Oversees the researching, writing, and editing of documentation and technical requirements
  • Keeps current with technological developments within the industry
  • Provides technical leadership throughout the design process
  • Assists in tracking and provides performance metrics
  • Works with Quality Assurance team
  • Leads project planning, resourcing, and requirements analysis and definition
  • Presents and defends architectural, design and technical choices
What we offer
  • Paid Time off
  • Physical Wellbeing benefits
  • Financial Wellbeing benefits
  • Emotional Wellbeing benefits
  • Life Events + Family Support benefits
  • Fulltime