Principal Researcher - Cloud and AI Infrastructure

Microsoft Corporation

Location:
Canada, Vancouver

Contract Type:
Not provided

Salary:

142400.00 - 257500.00 CAD / Year

Job Description:

Microsoft Research Asia – Vancouver lab is located in the vibrant city of Vancouver, BC, Canada, and represents Microsoft Research Asia’s exciting expansion into the Asia-Pacific region. We’re on a mission to transform the future of artificial intelligence by bridging the gap between cutting-edge general AI and the specialized, real-world applications that drive meaningful impact.

We are seeking a highly skilled Principal Researcher - Cloud and AI Infrastructure with a keen interest in advancing cloud and artificial intelligence (AI) infrastructure architecture, and chip design using AI technologies.

At the Vancouver lab, we focus on deeply integrating intelligent systems across every layer of computing, from infrastructure to the physical environment. Our goal is to solve complex, real-world challenges with precision, scalability, and cost-efficiency. This means working at the intersection of AI, human interaction, and environmental context through a dynamic, co-evolutionary process.

If you're passionate about pushing the boundaries of AI and want to be part of a team that’s shaping the future of intelligent systems, we invite you to explore opportunities with us. This is an opportunity to drive an ambitious research agenda while collaborating with diverse teams to pursue novel applications in these areas.

Job Responsibility:

  • Investigate and analyze emerging hardware technologies, trends, and advancements to stay ahead of the industry
  • Design and optimize hardware components, systems, and architectures to enhance performance, reliability, and efficiency
  • Conduct simulations, tests, and validations to ensure hardware designs meet required specifications and performance goals
  • Develop prototypes and proof-of-concept models to demonstrate new hardware technologies and applications
  • Identify opportunities for hardware improvements and cost reductions by staying informed about industry best practices and standards
  • Collaborate with cross-functional teams, including software researchers, designers, and engineers, to identify hardware requirements and develop innovative solutions
  • Partner with manufacturing vendors and production teams to transition innovative designs and concepts into deployable systems
  • Document research findings, design decisions, and technical specifications to facilitate knowledge sharing and collaboration within the organization
  • Embody our culture and values

Requirements:

  • Doctorate in relevant field AND 3+ years related research experience OR equivalent experience
  • 3+ years of experience in research related to infrastructure design, computer architecture, or artificial intelligence
  • Doctorate in relevant field AND 5+ years related research experience OR equivalent experience
  • Experience publishing academic papers as a lead author or essential contributor
  • Experience participating in a top conference in relevant research domain
  • Experience in optimizing or designing hardware components and architectures to enhance performance, reliability, efficiency, etc.

Additional Information:

Job Posted:
March 01, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Principal Researcher - Cloud and AI Infrastructure

Principal Applied Researcher AI/NLP

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location:
United States
Salary:
195800.00 - 217500.00 USD / Year
PointClickCare
Expiration Date
Until further notice
Requirements
  • PhD or comparable level of experience in Computer Science, Math, Physics, Engineering or a related field
  • 4-10+ years of industry experience building solutions in commercial SaaS, including at least 4 years working in applications of NLP, Search or AI/ML technologies for healthcare
  • Strong interest in applying AI/ML/NLP to healthcare related problems and data
  • Expert-level practical, hands-on experience developing and applying a wide range of techniques in Natural Language Processing, including fine tuning of LLMs and other Transformer models, plus one or more additional AI/ML or Search related areas of expertise to solve real-world problems at scale
  • Demonstrated ability to lead and perform research and experimentation to select appropriate approaches, algorithms, evaluation methods, and frameworks, as well as tasks such as feature selection, language modeling, evaluation and fine tuning or training models, applying standard approaches or developing new tools or workflows as needed to meet project requirements
  • Significant experience building and deploying AI/machine learning and NLP models for large-scale SaaS products, including familiarity with industry standard software development concepts such as scaling issues, version control, CI/CD pipelines, and security
  • Solid understanding and experience with transformer models and multiple kinds of NLP and ML models and approaches including logistic regression, random forest, ensemble methods, SVM, KNN, reinforcement learning, and other ML techniques
  • Proficiency in Python and Java required. Proficiency in JavaScript or TypeScript and modern UI frameworks for building prototype or tool front ends desired
  • Proficiency doing data engineering for ML and NLP applications, including exposure to database systems and proficiency with SQL
  • Proficiency building models from big data using modern packages, models and data analysis stacks such as NumPy, SciPy, Pandas, Scikit-learn, PyTorch, Keras, LightGBM, fastText, NLTK, and spaCy. Proficiency fine tuning Hugging Face Transformers required
Job Responsibility
  • You will be applying NLP including GenAI and other AI/ML techniques to develop model systems and solutions, collaborating across functions to scale and integrate advanced solutions into successful end user experiences in large-scale cloud based SaaS production environments for healthcare
  • You will be working with product leaders, clinical informaticists, data scientists, UI/UX researchers and designers, other AI and machine learning and domain experts, engineering teams and others, including work with customers and users who are healthcare professionals
  • Design, build and evaluate solutions that may involve structured or unstructured data including speech or natural language for healthcare use cases, delivering capabilities such as summarization, predictive models, recommenders, semantic search, extraction, classification or other NLP, AI or machine learning based techniques
  • You will be performing research and experimentation to select appropriate approaches, algorithms, evaluation methods and frameworks and doing the R&D to deliver model systems
  • You will perform, oversee and assist in data collection, data cleaning, data analysis, algorithm selection or design, prompt tuning, parameter fine tuning, training, development and evaluation of systems that deliver responsible AI solutions at scale, using existing or developing new tools or workflows as needed
  • As a principal applied researcher, you will bring deep technical expertise and also provide mentorship on advanced AI, NLP, data science, statistical and machine learning methods and technologies, helping the organization develop new capabilities for innovative solutions
  • You will have substantial independence and responsibility from day one
What we offer
  • Benefits starting from Day 1
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more
Employment Type:
Full-time

Principal AI/ML & Innovation Engineer

We are seeking Principal AI/ML & Innovation Engineer who will be leading initiat...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in computer science, engineering, data science, machine learning, artificial intelligence, or a closely related quantitative discipline
  • Typically, 10-15 years’ experience
  • Solid understanding of fundamental AI and machine learning concepts, including supervised and unsupervised learning, deep learning, reinforcement learning, natural language processing, computer vision, and statistical modeling
  • Proficient in implementing and deploying various machine learning algorithms, such as decision trees, random forests, support vector machines, and neural networks
  • Knowledge of popular machine learning frameworks and libraries like TensorFlow, PyTorch, or scikit-learn
  • Strong understanding of GitHub Copilot, Cursor, n8n, vibe coding, Windsurf, and similar technologies
  • Experience in Cloud Infrastructure (AWS, Azure, etc.)
  • Knowledge of Open Source, Linux, etc.
  • Understanding of DevOps, SRE
  • Expertise in deep learning techniques, architectures, and frameworks (e.g., convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), etc.)
Job Responsibility
  • Designing, developing, and deploying advanced machine learning models and algorithms
  • Leading research initiatives to explore novel approaches and technologies
  • Designing the architecture of AI systems and ensuring scalability, performance, and reliability
  • Collaborating with other teams, such as data scientists, software engineers, and product managers
  • Providing technical leadership and mentorship to junior engineers
  • Overseeing and guiding multiple design review sessions across different projects
  • Partnering with the engineering manager and team lead to establish long-term design and implementation strategies
  • Leading efforts to incorporate feedback loops and continuous improvement processes
  • Leading meetings, ensuring efficient progress tracking, issue resolution, and team coordination
  • Creating and delivering high-level presentations and reports to executive stakeholders
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years leading MLOps or AIOps platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Full-time

Senior Principal Researcher - Cloud and AI Infrastructure

Microsoft Research Asia – Vancouver lab, located in the vibrant city of Vancouve...
Location:
Canada, Vancouver
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Doctorate in relevant field AND 6+ years related research experience
  • OR Master's Degree in relevant field AND 7+ years related research experience
  • OR Bachelor's Degree in relevant field AND 9+ years related research experience
  • OR equivalent experience
  • 3+ years’ experience in research related to infrastructure design, computer architecture, or artificial intelligence
  • Experience publishing academic papers as a lead author or essential contributor
  • Experience participating in a top conference in relevant research domain
  • Experience in optimizing or designing hardware components and architectures to enhance performance, reliability, efficiency
Job Responsibility
  • Investigate and analyze emerging hardware technologies, trends, and advancements
  • Design and optimize hardware components, systems, and architectures to enhance performance, reliability, and efficiency
  • Conduct simulations, tests, and validations to ensure hardware designs meet required specifications and performance goals
  • Develop prototypes and proof-of-concept models to demonstrate new hardware technologies and applications
  • Identify opportunities for hardware improvements and cost reductions by staying informed about industry best practices and standards
  • Collaborate with cross-functional teams, including software researchers, designers, and engineers, to identify hardware requirements and develop innovative solutions
  • Partner with manufacturing vendors and production teams to transition innovative designs and concepts into deployable systems
  • Document research findings, design decisions, and technical specifications to facilitate knowledge sharing and collaboration within the organization
Employment Type:
Full-time

Associate Vice President for Research Computing

The Associate Vice President (AVP) for Research Computing serves as the senior e...
Location:
United States of America, Rochester
Salary:
205245.00 - 328392.00 USD / Year
University of Rochester
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in a relevant field of study.
  • Doctoral or advanced degree in a relevant field of study strongly preferred.
  • Minimum of 10 years of experience in informatics and computational research, with at least 5 years in a leadership role preferred.
  • Experience as a faculty member in higher education preferred.
  • Experience working in research-intensive higher education environment preferred.
  • Deep understanding of the research lifecycle and advanced knowledge of HPC architectures, scientific software, cloud-based research environments, and large-scale data storage.
  • Working knowledge of scientific concepts in fields such as biology, biochemistry, genomics, imaging, chemistry, physics, and data science / AI.
  • Proven ability to lead large technical teams, foster faculty partnerships, and manage multimillion-dollar research computing portfolios.
  • Strong communication skills and ability to interact effectively at all organization levels.
  • Broad IT experience including solutions architecture, application development, engineering, business analysis, and project management.
Job Responsibility
  • Lead the development and execution of a long-range strategic plan for research computing that supports the university’s R1 research mission, including investments in HPC, research storage, AI/ML environments, cloud platforms, secure data enclaves and staffing.
  • Collaborate with the Vice President for Research and IT, Deans, and faculty leaders to define institutional priorities, align resources, and support cutting-edge, interdisciplinary research initiatives.
  • Represent the University in national and international consortia focused on research computing infrastructure, research data governance, and secure research computing.
  • Serve as a strategic advisor to executive leadership on research policy, funding, and risk management related to advanced research computing.
  • Oversee operations, performance, and lifecycle management of the University’s research computing environment, including HPC clusters and cloud platforms.
  • Lead cross-functional technical teams responsible for system design, user support, research application integration, and compliance with research security standards (e.g., NIST 800-171, FISMA).
  • Oversee service-level agreements, uptime metrics, downtime and maintenance procedures and communications, and annual investment planning to ensure the environment remains resilient, scalable, and aligned with faculty needs.
  • Act as a campus-wide leader and trusted advisor to faculty and research teams across disciplines, proactively identifying research needs and aligning computational services accordingly.
  • Lead outreach, onboarding, and education programs that expand awareness of research computing services and improve access and usability for all research teams, especially those in emerging or underserved disciplines.
  • Oversee consultation and proposal development services that support grant applications, including effort related to compute budgeting, data management planning, and infrastructure letters of support.
Employment Type:
Full-time

Senior Principal Machine Learning Engineer

You’ll form a new team of passionate engineers dedicated to building and scaling...
Location:
United States
Salary:
222300.00 - 348975.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • Bachelor’s, Master’s, or PhD in Computer Science, Statistics, Mathematics, or a related field, or equivalent practical experience
  • 12+ years of industry experience in machine learning, data science, or AI, with a proven track record of delivering production-grade ML systems
  • Deep expertise in Python, Go, or Java, with the ability to write performant, production-quality code
  • Familiarity with SQL, Spark, and cloud data environments (e.g., AWS, GCP, Databricks)
  • Experience building and scaling ML models for business-critical applications, ideally in security, privacy, anti-abuse, or compliance domains
  • Strong communication skills, able to explain complex ML concepts to diverse audiences and influence stakeholders
  • Demonstrated ability to solve ambiguous, complex problems and drive projects from ideation to production
  • Agile development mindset, with a focus on iterative improvement and business impact
Job Responsibility
  • Lead AI/ML Strategy for Trust: Drive the development and implementation of advanced machine learning algorithms and AI systems for Trust, Security, Product Abuse, and Compliance use cases (e.g., threat detection, vulnerability management, privacy automation, AI safety)
  • Architect and Scale ML Platforms: Design and build scalable, secure, and reliable ML infrastructure and pipelines, ensuring compliance with privacy and regulatory requirements
  • AI Safety and Responsible AI: Develop and champion AI safety practices, including output moderation, explainability, and alignment with evolving regulatory frameworks
  • Cross-Functional Collaboration: Partner with product, engineering, security, privacy, and analytics teams to deliver transformative AI/ML solutions that enhance Atlassian’s trust posture
  • Mentorship and Leadership: Mentor and guide ML engineers and data scientists, fostering a culture of technical excellence, innovation, and continuous improvement
  • Innovation and Research: Stay at the forefront of AI/ML research, evaluating and applying the latest techniques (e.g., LLMs, anomaly detection, privacy-preserving ML) to real-world Trust challenges
  • Platform Enablement: Build reusable ML services and APIs that empower other teams to integrate AI/ML into their products and workflows
  • Operational Excellence: Ensure high availability, reliability, and security of all ML-powered Trust platforms and services
What we offer
  • health and wellbeing resources
  • paid volunteer days
  • benefits, bonuses, commissions, and equity
Employment Type:
Full-time

Principal PM Manager

Microsoft’s Cloud business is expanding, and the Cloud Supply Chain (CSCP) organ...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • 1+ year(s) people management experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Lead a team of product managers to drive key platforms that underpin the success of our cloud infrastructure: AI-based Data Platform that provides curated, high-quality, governed data in an industry-standard format to enable agentic AI capabilities to power the growing Microsoft AI infrastructure
  • AI-native Collaboration Platform: This leader will be pivotal in driving the strategy and execution of transforming the way we collaborate with our external partners, suppliers and SIs in an agentic world
  • Strategic Leadership: Set the strategic direction for the data and collaboration group by helping teams identify and prioritize unmet customer needs, emerging industry opportunities, and cloud- and AI-native trends
  • Guide product managers in synthesizing customer and market signals (telemetry, qualitative research and competitive analysis) to shape problem definitions and determine the most impactful path to pursue
  • Establish clear business goals, success metrics and measurement criteria
  • Execution & Cross Functional Alignment: Lead regular reviews (cross org/executive), ensuring decision-makers have clarity on roadmaps, program status, progress, and risks
  • Build strong alignment with engineering, design, research, field, and business partners to ensure platforms progress effectively toward production readiness
  • Oversee customer engagements, including proofs of concept and early production use, ensuring customer insights drive prioritization and iteration
  • People Leadership (Model, Coach, Care): Model: Demonstrate Microsoft’s leadership principles, fostering clarity, trust, inclusion, and resilient innovation in ambiguous problem spaces
  • Coach: Develop product managers’ strategic judgment, customer insight skills, and ability to drive deeply technical early-stage work across organizational boundaries
Employment Type:
Full-time

Principal AI Security Researcher

Microsoft Sentinel Platform NEXT R&D labs is the strategic incubation engine beh...
Location:
United States, Multiple Locations
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Doctorate in Statistics, Mathematics, Computer Science, Computer Security, or related field AND 3+ years experience in software development lifecycle, large-scale computing, threat analysis or modeling, cybersecurity, vulnerability research, and/or anomaly detection
  • OR Master's Degree in Statistics, Mathematics, Computer Science, Computer Security, or related field AND 4+ years experience in software development lifecycle, large-scale computing, threat analysis or modeling, cybersecurity, vulnerability research, and/or anomaly detection
  • OR Bachelor's Degree in Statistics, Mathematics, Computer Science, Computer Security, or related field AND 6+ years experience in software development lifecycle, large-scale computing, threat analysis or modeling, cybersecurity, vulnerability research, and/or anomaly detection
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 5+ years of experience in cybersecurity, AI, software development lifecycle, large-scale computing, modeling, and/or anomaly detection
  • 5+ years of professional experience in security operations, pen-testing, researching cyber threats, understanding attacker methodology, tools, and infrastructure
  • Demonstrated autonomy and success driving zero-to-one (0→1) initiatives
  • ML background and hands-on experience
Job Responsibility
  • Security AI Research: Be the security expert for our AI-focused team, helping evaluate our systems on real data, improve system inputs, triage and investigate AI-based findings, leverage AI and security experience to incubate and transform our products, and educate applied scientists in cybersecurity
  • Collaboration: Partner with engineering, product, and research teams to translate scientific advances into robust, scalable, and production-ready solutions
  • AI/ML Research: Design, develop, and analyze novel AI and machine learning models and algorithms for security and enterprise-scale applications
  • Experimentation & Evaluation: Design and execute AI experiments, simulations, and evaluations to validate models and system performance, ensuring measurable improvements
  • Customer Impact: Engage with enterprise customers and field teams to co-design solutions, gather feedback, and iterate quickly based on real-world telemetry and outcomes
Employment Type:
Full-time