Principal Researcher - Cloud and AI Infrastructure

Microsoft Corporation

Location:
Canada, Vancouver

Contract Type:
Not provided

Salary:

142400.00 - 257500.00 CAD / Year

Job Description:

The Microsoft Research Asia – Vancouver lab, located in the vibrant city of Vancouver, BC, Canada, represents Microsoft Research Asia’s expansion into the Asia-Pacific region. We’re on a mission to transform the future of artificial intelligence by bridging the gap between cutting-edge general AI and the specialized, real-world applications that drive meaningful impact.

We are seeking a highly skilled Principal Researcher - Cloud and AI Infrastructure with a keen interest in advancing cloud and artificial intelligence (AI) infrastructure architecture, as well as chip design using AI technologies. At the Vancouver lab, we focus on deeply integrating intelligent systems across every layer of computing, from infrastructure to the physical environment. Our goal is to solve complex, real-world challenges with precision, scalability, and cost-efficiency. This means working at the intersection of AI, human interaction, and environmental context through a dynamic, co-evolutionary process.

If you’re passionate about pushing the boundaries of AI and want to be part of a team that’s shaping the future of intelligent systems, we invite you to explore opportunities with us. This is an opportunity to drive an ambitious research agenda while collaborating with diverse teams on novel applications in those areas.

Job Responsibility:

  • Investigate and analyze emerging hardware technologies, trends, and advancements to stay ahead of the industry
  • Design and optimize hardware components, systems, and architectures to enhance performance, reliability, and efficiency
  • Conduct simulations, tests, and validations to ensure hardware designs meet required specifications and performance goals
  • Develop prototypes and proof-of-concept models to demonstrate new hardware technologies and applications
  • Identify opportunities for hardware improvements and cost reductions by staying informed about industry best practices and standards
  • Collaborate with cross-functional teams, including software researchers, designers, and engineers, to identify hardware requirements and develop innovative solutions
  • Partner with manufacturing vendors and production teams to transition innovative designs and concepts into deployable systems
  • Document research findings, design decisions, and technical specifications to facilitate knowledge sharing and collaboration within the organization
  • Embody our culture and values

Requirements:

  • Doctorate in relevant field AND 3+ years related research experience OR equivalent experience
  • 3+ years experience in research related to infrastructure design, computer architecture, or artificial intelligence
  • Doctorate in relevant field AND 5+ years related research experience OR equivalent experience
  • Experience publishing academic papers as a lead author or essential contributor
  • Experience participating in a top conference in relevant research domain
  • Experience in optimizing or designing hardware components and architectures to enhance performance, reliability, efficiency, etc.

Additional Information:

Job Posted:
March 01, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Principal Researcher - Cloud and AI Infrastructure

Principal Applied Researcher AI/NLP

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location:
United States
Salary:
195800.00 - 217500.00 USD / Year
PointClickCare
Expiration Date:
Until further notice
Requirements:
  • PhD or comparable level of experience in Computer Science, Math, Physics, Engineering or a related field
  • 4-10+ years of industry experience building solutions in commercial SaaS, including at least 4 years working in applications of NLP, Search or AI/ML technologies for healthcare
  • Strong interest in applying AI/ML/NLP to healthcare related problems and data
  • Expert-level practical, hands-on experience developing and applying a wide range of techniques in Natural Language Processing, including fine tuning of LLMs and other Transformer models, plus one or more additional AI/ML or Search related areas of expertise to solve real-world problems at scale
  • Demonstrated ability to lead and perform research and experimentation to select appropriate approaches, algorithms, evaluation methods, and frameworks, as well as tasks such as feature selection, language modeling, evaluation and fine tuning or training models, applying standard approaches or developing new tools or workflows as needed to meet project requirements
  • Significant experience building and deploying AI/machine learning and NLP models for large-scale SaaS products, including familiarity with industry standard software development concepts such as scaling issues, version control, CI/CD pipelines, and security
  • Solid understanding and experience with transformer models and multiple kinds of NLP and ML models and approaches including logistic regression, random forest, ensemble methods, SVM, KNN, reinforcement learning, and other ML techniques
  • Proficiency in Python and Java required. Proficiency in JavaScript or TypeScript and modern UI frameworks for building prototype or tool front ends desired
  • Proficiency doing data engineering for ML and NLP applications, including exposure to database systems and proficiency with SQL
  • Proficiency building models from big data using modern packages, models and data analysis stacks such as NumPy, SciPy, Pandas, Scikit-learn, PyTorch, Keras, LightGBM, fastText, NLTK, and spaCy. Proficiency fine tuning Hugging Face Transformers required
Job Responsibility:
  • You will be applying NLP including GenAI and other AI/ML techniques to develop model systems and solutions, collaborating across functions to scale and integrate advanced solutions into successful end user experiences in large-scale cloud based SaaS production environments for healthcare
  • You will be working with product leaders, clinical informaticists, data scientists, UI/UX researchers and designers, other AI and machine learning and domain experts, engineering teams and others, including work with customers and users who are healthcare professionals
  • Design, build and evaluate solutions that may involve structured or unstructured data including speech or natural language for healthcare use cases, delivering capabilities such as summarization, predictive models, recommenders, semantic search, extraction, classification or other NLP, AI or machine learning based techniques
  • You will be performing research and experimentation to select appropriate approaches, algorithms, evaluation methods and frameworks and doing the R&D to deliver model systems
  • You will perform, oversee and assist in data collection, data cleaning, data analysis, algorithm selection or design, prompt tuning, parameter fine tuning, training, development and evaluation of systems that deliver responsible AI solutions at scale, using existing or developing new tools or workflows as needed
  • As a principal applied researcher, you will bring deep technical expertise and also provide mentorship on advanced AI, NLP, data science, statistical and machine learning methods and technologies, helping the organization develop new capabilities for innovative solutions
  • You will have substantial independence and responsibility from day one
What we offer:
  • Benefits starting from Day 1
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more
Employment Type:
Full-time

Principal AI/ML & Innovation Engineer

We are seeking Principal AI/ML & Innovation Engineer who will be leading initiat...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or closely related quantitative discipline
  • Typically, 10-15 years’ experience
  • Solid understanding of fundamental AI and machine learning concepts, including supervised and unsupervised learning, deep learning, reinforcement learning, natural language processing, computer vision, and statistical modeling
  • Proficient in implementing and deploying various machine learning algorithms, such as decision trees, random forests, support vector machines, and neural networks
  • Knowledge of popular machine learning frameworks and libraries like TensorFlow, PyTorch, or scikit-learn
  • Strong understanding of GitHub Copilot, Cursor, n8n, vibe coding, Windsurf, and similar technologies
  • Experience in Cloud Infrastructure (AWS, Azure, etc)
  • Knowledge of Open Source, Linux, etc
  • Understanding of DevOps, SRE
  • Expertise in deep learning techniques, architectures, and frameworks (e.g., convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), etc.)
Job Responsibility:
  • Designing, developing, and deploying advanced machine learning models and algorithms
  • Leading research initiatives to explore novel approaches and technologies
  • Designing the architecture of AI systems and ensuring scalability, performance, and reliability
  • Collaborating with other teams, such as data scientists, software engineers, and product managers
  • Providing technical leadership and mentorship to junior engineers
  • Overseeing and guiding multiple design review sessions across different projects
  • Partnering with the engineering manager and team lead to establish long-term design and implementation strategies
  • Leading efforts to incorporate feedback loops and continuous improvement processes
  • Leading meetings, ensuring efficient progress tracking, issue resolution, and team coordination
  • Creating and delivering high-level presentations and reports to executive stakeholders
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Full-time

Senior Principal Researcher - Cloud and AI Infrastructure

Microsoft Research Asia – Vancouver lab, located in the vibrant city of Vancouve...
Location:
Canada, Vancouver
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate in relevant field AND 6+ years related research experience
  • OR Master's Degree in relevant field AND 7+ years related research experience
  • OR Bachelor's Degree in relevant field AND 9+ years related research experience
  • OR equivalent experience
  • 3+ years’ experience in research related to infrastructure design, computer architecture, or artificial intelligence
  • Experience publishing academic papers as a lead author or essential contributor
  • Experience participating in a top conference in relevant research domain
  • Experience in optimizing or designing hardware components and architectures to enhance performance, reliability, efficiency
Job Responsibility:
  • Investigate and analyze emerging hardware technologies, trends, and advancements
  • Design and optimize hardware components, systems, and architectures to enhance performance, reliability, and efficiency
  • Conduct simulations, tests, and validations to ensure hardware designs meet required specifications and performance goals
  • Develop prototypes and proof-of-concept models to demonstrate new hardware technologies and applications
  • Identify opportunities for hardware improvements and cost reductions by staying informed about industry best practices and standards
  • Collaborate with cross-functional teams, including software researchers, designers, and engineers, to identify hardware requirements and develop innovative solutions
  • Partner with manufacturing vendors and production teams to transition innovative designs and concepts into deployable systems
  • Document research findings, design decisions, and technical specifications to facilitate knowledge sharing and collaboration within the organization
Employment Type:
Full-time

Principal Product Manager - Core AI

We are part of the CoreAI – Platform and Tools division at Microsoft. Our missio...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND 8+ years experience in product management or related fields OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 5+ years of experience designing and shipping complex products for developers, ML professionals, or similar
  • 3+ years of experience with distributed platform ecosystem (multi-tenant / real-time processing, batch computing etc.)
  • Proven experience in product management/technical program management, preferably in AI infrastructure or cloud services
  • Experience with and a deep understanding of GPUs, VM, OS and cloud infrastructure fundamentals
  • Ability to navigate ambiguity and drive clarity across complex, cross-functional initiatives
  • Experience creating solutions using Azure, AWS, or Google Cloud
  • Experience writing Python, particularly for machine learning
Job Responsibility:
  • Join the CoreAI Infrastructure team at Microsoft, where we accelerate the transition and scaling of Generative AI models
  • Deliver a world-class AI Infrastructure stack that will host AI Foundry Services
  • Be accountable for the platform roadmap to scale AI workloads from research to production
  • Own a product area and be responsible for understanding developer needs and behaviors, defining product requirements, managing end-to-end product development, launches and iterations
  • Find a path to get things done despite roadblocks to get your work into the hands of customers quickly and iteratively
  • Enjoy working in a fast-paced, customer-first, product development cycle
  • Translate business goals into strategy, user experience and technical requirements in close collaboration with UX, Data Science, Engineering, AI research, and Product Marketing teams
  • Define goals and performance indicators, set up and oversee experiments, measure success with data and research
  • Collaborate effectively and communicate clearly with cross-functional teams, including product managers, designers, and other engineers, to build exceptional consumer-grade applications
  • Embody our culture and values
Employment Type:
Full-time

Associate Vice President for Research Computing

The Associate Vice President (AVP) for Research Computing serves as the senior e...
Location:
United States of America, Rochester
Salary:
205245.00 - 328392.00 USD / Year
University of Rochester
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in a relevant field of study.
  • Doctoral or other advanced degree in a relevant field of study strongly preferred.
  • Minimum of 10 years of experience in informatics and computational research, with at least 5 years in a leadership role preferred.
  • Experience as a faculty member in higher education preferred.
  • Experience working in research-intensive higher education environment preferred.
  • Deep understanding of the research lifecycle and advanced knowledge of HPC architectures, scientific software, cloud-based research environments, and large-scale data storage.
  • Working knowledge of scientific concepts in fields such as biology, biochemistry, genomics, imaging, chemistry, physics, and data science / AI.
  • Proven ability to lead large technical teams, foster faculty partnerships, and manage multimillion-dollar research computing portfolios.
  • Strong communication skills and ability to interact effectively at all organization levels.
  • Broad IT experience including solutions architecture, application development, engineering, business analysis, and project management.
Job Responsibility:
  • Lead the development and execution of a long-range strategic plan for research computing that supports the university’s R1 research mission, including investments in HPC, research storage, AI/ML environments, cloud platforms, secure data enclaves and staffing.
  • Collaborate with the Vice President for Research and IT, Deans, and faculty leaders to define institutional priorities, align resources, and support cutting-edge, interdisciplinary research initiatives.
  • Represent the University in national and international consortia focused on research computing infrastructure, research data governance, and secure research computing.
  • Serve as a strategic advisor to executive leadership on research policy, funding, and risk management related to advanced research computing.
  • Oversee operations, performance, and lifecycle management of the University’s research computing environment, including HPC clusters and cloud platforms.
  • Lead cross-functional technical teams responsible for system design, user support, research application integration, and compliance with research security standards (e.g., NIST 800-171, FISMA).
  • Oversee service-level agreements, uptime metrics, downtime and maintenance procedures and communications, and annual investment planning to ensure the environment remains resilient, scalable, and aligned with faculty needs.
  • Act as a campus-wide leader and trusted advisor to faculty and research teams across disciplines, proactively identifying research needs and aligning computational services accordingly.
  • Lead outreach, onboarding, and education programs that expand awareness of research computing services and improve access and usability for all research teams, especially those in emerging or underserved disciplines.
  • Oversee consultation and proposal development services that support grant applications, including efforts related to compute budgeting, data management planning, and infrastructure letters of support.
Employment Type:
Full-time

Principal Software Engineering Infra Microsoft Copilot

As Microsoft continues to push the boundaries of AI, we are on the lookout for p...
Location:
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years’ experience building scalable platforms on public cloud infrastructure like Azure, AWS, or GCP with extensive use of technologies like Docker, Kubernetes, nginx, RDBMS, key-value stores, etc.
  • 6+ years’ experience in building and releasing production software at the platform level
  • Solid knowledge of APIs, data flows, systems, and services
Job Responsibility:
  • Design, develop, and maintain performant and secure AI Platform services that power Copilot
  • Work collaboratively with platform, infrastructure, application engineers, and AI researchers to build next generation AI products and services
  • Ship high-quality and maintainable code, and ensure the reliability, scalability, and performance of platform components
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values
Employment Type:
Full-time

Senior Principal Machine Learning Engineer

You’ll form a new team of passionate engineers dedicated to building and scaling...
Location:
United States
Salary:
222300.00 - 348975.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s, Master’s, or PhD in Computer Science, Statistics, Mathematics, or a related field, or equivalent practical experience
  • 12+ years of industry experience in machine learning, data science, or AI, with a proven track record of delivering production-grade ML systems
  • Deep expertise in Python, Go, or Java, with the ability to write performant, production-quality code
  • Familiarity with SQL, Spark, and cloud data environments (e.g., AWS, GCP, Databricks)
  • Experience building and scaling ML models for business-critical applications, ideally in security, privacy, anti-abuse, or compliance domains
  • Strong communication skills, able to explain complex ML concepts to diverse audiences and influence stakeholders
  • Demonstrated ability to solve ambiguous, complex problems and drive projects from ideation to production
  • Agile development mindset, with a focus on iterative improvement and business impact
Job Responsibility:
  • Lead AI/ML Strategy for Trust: Drive the development and implementation of advanced machine learning algorithms and AI systems for Trust, Security, Product Abuse, and Compliance use cases (e.g., threat detection, vulnerability management, privacy automation, AI safety)
  • Architect and Scale ML Platforms: Design and build scalable, secure, and reliable ML infrastructure and pipelines, ensuring compliance with privacy and regulatory requirements
  • AI Safety and Responsible AI: Develop and champion AI safety practices, including output moderation, explainability, and alignment with evolving regulatory frameworks
  • Cross-Functional Collaboration: Partner with product, engineering, security, privacy, and analytics teams to deliver transformative AI/ML solutions that enhance Atlassian’s trust posture
  • Mentorship and Leadership: Mentor and guide ML engineers and data scientists, fostering a culture of technical excellence, innovation, and continuous improvement
  • Innovation and Research: Stay at the forefront of AI/ML research, evaluating and applying the latest techniques (e.g., LLMs, anomaly detection, privacy-preserving ML) to real-world Trust challenges
  • Platform Enablement: Build reusable ML services and APIs that empower other teams to integrate AI/ML into their products and workflows
  • Operational Excellence: Ensure high availability, reliability, and security of all ML-powered Trust platforms and services
What we offer:
  • health and wellbeing resources
  • paid volunteer days
  • benefits, bonuses, commissions, and equity
Employment Type:
Full-time