
Principal Software Engineer - Performance Tooling

United States, Redmond · 139900.00 - 274800.00 USD / Year · Job Posted April 11, 2026

Job Description

The Artificial Intelligence (AI) Frameworks team at Microsoft develops AI software that enables running AI models everywhere, from the world’s fastest AI supercomputers to servers, desktops, mobile phones, internet of things (IoT) devices, and internet browsers. We collaborate with our hardware teams and partners, both internal and external, and operate at the intersection of AI algorithmic innovation, purpose-built AI hardware, systems, and software. We are a team of highly capable and motivated people who pride themselves on a collaborative and inclusive culture.

We own inference performance of OpenAI and other state-of-the-art large language models (LLMs) and work directly with OpenAI on the models hosted on the Azure OpenAI service, which serves some of the largest workloads on the planet, with trillions of inferences per day in major Microsoft products, including Office, Windows, Bing, SQL Server, and Dynamics.

As a Principal Software Engineer - Performance Tooling on the team, you will have the opportunity to work on multiple levels of the AI software stack, including the fundamental abstractions, programming models, compilers, runtimes, libraries, and application programming interfaces (APIs), to enable large-scale training and inferencing of models. You will benchmark OpenAI and other LLMs for performance on graphics processing units (GPUs) and Microsoft hardware, debug and optimize performance, monitor performance, and enable these models to be deployed in the shortest amount of time and on the least amount of hardware possible, helping achieve Microsoft Azure's capex goals.

Job Responsibility

  • Work across multiple layers of the AI software stack (abstractions, programming models, compilers, runtimes, libraries, and APIs) to enable large-scale model training and inference
  • Benchmark OpenAI and other LLMs for performance on graphics processing units (GPUs) and Microsoft hardware
  • Debug, profile, and optimize performance for training/inference workloads on central processing units (CPUs) and GPUs
  • Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint
  • Collaborate across teams of researchers and engineers to deliver scalable, production-ready AI performance improvements

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python, OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. This includes passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python, OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python, OR equivalent experience

Nice to have

  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience
  • 4+ years’ practical experience working on high performance applications and performance debugging and optimization on CPUs/GPUs
  • Experience in DNN/LLM inference, experience in one or more DL frameworks such as PyTorch, TensorFlow, or ONNX Runtime, and familiarity with CUDA, ROCm, and Triton
  • Technical background and solid foundation in software engineering principles, computer architecture, GPU architecture, and hardware neural net acceleration
  • Experience in end-to-end performance analysis and optimization of state-of-the-art LLMs and HPC applications, including proficiency using GPU profiling tools
  • Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
  • Ability to independently lead projects

Similar Jobs for Principal Software Engineer - Performance Tooling

8 matching positions

Principal Software Engineer, Trusted Data Platform

As a Principal Software Engineer, you will be a technical leader and hands-on co...
Location: India, Bangalore
Salary: Not provided
Company: Atlassian (https://www.atlassian.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
  • 10+ years of experience in backend software development, focusing on distributed systems and storage solutions
  • 5+ years of experience working with AWS storage services (S3, DynamoDB, EBS, EFS, FSx, Glacier)
  • Strong expertise in system design, architecture, and scalability for large-scale storage solutions
  • Proficiency in at least one major backend programming language (Kotlin, Java, Go, Rust, or Python)
  • Experience designing and implementing highly available, fault-tolerant, and cost-efficient storage architectures
  • Deep understanding of distributed systems, replication strategies, sharding, and caching
  • Knowledge of data security, encryption best practices, and compliance requirements (SOC2, GDPR, HIPAA)
  • Experience leading engineering teams, mentoring senior engineers, and driving technical roadmaps
  • Proficiency with observability tools, performance monitoring, and troubleshooting at scale
Job Responsibility
  • Designing and optimizing high-scale, distributed storage systems built on AWS storage technologies
  • Shaping the architecture, performance, and reliability of backend storage solutions that power critical applications at scale
  • Designing, implementing, and optimizing backend storage services that support high throughput, low latency, and fault tolerance
  • Working closely with senior engineers, architects, and cross-functional teams to drive scalability, availability, and efficiency improvements in large-scale storage solutions
  • Leading technical deep dives, architecture reviews, and root cause analyses to resolve complex production issues related to storage performance, consistency, and durability
  • Driving best practices in distributed system design, security, and cloud cost optimization
  • Mentoring senior engineers, contributing to technical roadmaps, and helping shape the long-term storage strategy
  • Collaborating with Site Reliability Engineers (SREs) to implement observability, monitoring, and disaster recovery strategies, ensuring high availability and compliance with industry standards
  • Advocating for automation, Infrastructure-as-Code (IaC), and DevOps best practices, leveraging tools like Terraform, AWS CloudFormation, Kubernetes (EKS), and CI/CD pipelines to enable scalable deployments and operational excellence
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities
  • We can hire people in any country where we have a legal entity
  • Interviews and onboarding are conducted virtually
  • Whatever your preference - working from home, an office, or in between - you can choose the place that's best for your work and your lifestyle

Principal Software QA Engineer

Principal Software QA Engineer to lead test architecture and automation strategy...
Location: Puerto Rico, Aguadilla
Salary: Not provided
Company: Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • 10+ years of hands-on QA experience
  • Designing and building test automation frameworks from scratch
  • Non-functional testing (scale, reliability, performance, security)
  • Strong coding skills in Python, Java, or Go
  • Experience with Pytest, TestNG, JUnit, Playwright or similar tools
  • Deep understanding of Cloud platforms (AWS, Azure, GCP)
  • Microservices, Containers (Docker, Kubernetes)
  • Infrastructure & Data Center management
  • Linux/VM environments, Storage, Compute, Networking
  • REST APIs, JSON, SQL/NoSQL
Job Responsibility
  • Design, automate, and execute system-level test cases focused on scale, reliability, security, and performance
  • Lead the test automation strategy; evaluate and integrate new tools to improve efficiency and coverage
  • Collaborate closely with product, development, support, and platform engineering teams to ensure full lifecycle quality coverage
  • Provide technical leadership and mentorship to QA engineers and partners across teams
  • Contribute to design reviews with a QA lens to ensure testability and risk mitigation
  • Maintain and manage multiple product test configurations aligned with diverse deployment environments
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Full-time

Principal Software Engineer

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location: Canada, Mississauga
Salary: 156000.00 - 174000.00 CAD / Year
Company: PointClickCare (pointclickcare.com)
Expiration Date: Until further notice
Requirements
  • Experience writing clean code that performs well at scale using Java
  • Experience with UI development and React frameworks
  • Experience with Spring Boot
  • In-depth knowledge of relational databases (e.g. Microsoft SQL Server, MySQL)
  • Solid experience writing RESTful API endpoints
  • Absolutely love TDD and have working knowledge of it
  • Proficient in Git
  • Experience using system and performance monitoring tools (e.g. New Relic, DataDog)
  • Experience with automated testing frameworks (e.g. Selenium, Cypress, RestAssured)
  • Excellent organization, critical-thinking and personal leadership skills
Job Responsibility
  • Identify, prioritize and execute tasks in the software development life cycle
  • Work with business to iterate over software requirements
  • Develop tools and applications by producing clean, efficient code
  • Automate tasks through appropriate tools and scripting
  • Analyze and debug systems
  • Perform validation and verification testing in a test-driven manner
  • Review the work of others, and invite others to review your work
  • Collaborate with internal teams and vendors to fix and improve products
  • Ensure software is up-to-date with latest technologies
What we offer
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
  • Full-time

Lead / Principal Software Engineer

We’re hiring Lead and Principal Software Engineers to build the next generation ...
Location: Australia, Sydney
Salary: Not provided
Company: Blume Global (blumeglobal.com)
Expiration Date: Until further notice
Requirements
  • 10+ years building scalable, fault-tolerant systems and enterprise software
  • Strong experience with backend architecture, platform modernization, and CI/CD
  • Proficiency in C#, Java, Python, SQL, and JavaScript
  • Experience with cloud infrastructure (AWS, Kinesis, Lambda) and DevOps tools (Docker, Kubernetes, Jenkins)
  • Proven ability to lead technical decisions, mentor engineers, and improve team productivity
  • Strong experience integrating and evaluating AI tools like GitHub Copilot and AIOps in real-world engineering workflows
  • Strong communication across product, compliance, and engineering teams
  • Track record of aligning technical work with business outcomes and customer value
Job Responsibility
  • Build the next generation of our platforms
  • Work on high-scale systems that process billions of transactions
  • Modernize core infrastructure
  • Drive AI initiatives to improve performance and reliability
  • Set technical direction
  • Mentor senior engineers
  • Shape architecture across multiple domains
What we offer
  • Competitive Package + Equity
  • Find the team/project that fits you best
  • Hybrid and Flexible Work
  • Continuous Learning and Growth
  • Access learning platforms (Coursera, Pluralsight, LinkedIn Learning, WiseTech Academy), mentorship, and development opportunities
  • Top-Tier Hardware
  • Onsite Meals and Snacks

Principal Software Engineer

As a Principal Software Engineer at Global-e, you will design and deliver the co...
Location: United States, Hoboken, NJ
Salary: Not provided
Company: Global-e (global-e.com)
Expiration Date: Until further notice
Requirements
  • 7+ Years of Experience: A proven track record building large-scale, customer-facing applications in a fast-paced environment (e-commerce, fintech, tech startups a plus)
  • Distributed Systems Expertise: Familiarity with designing, deploying, and operating resilient, fault-tolerant systems that handle high traffic
  • Engineering Practices Proficiency: Hands-on experience with Agile methodologies, CI/CD pipelines, and rapid release cycles
  • Strong Database Skills: Ability to optimize and scale applications involving complex data interactions
Job Responsibility
  • Deliver High-Impact Features: Lead the design, development, and deployment of new capabilities across logistics (fulfillment, labels, tracking) and order management workflows
  • Shape Technical Architecture: Define, communicate, and guide architectural decisions to ensure scalability and reliability
  • Elevate Standards: Champion clean code, best practices, and robust testing frameworks, pushing the team to achieve technical excellence
  • Scale the Product: Propose and implement features, tooling, and infrastructure that support exponential growth and operational efficiency
  • Ensure Quality & Reliability: Employ a rigorous approach to verification, focusing on stable, high-performing systems that meet critical metrics and SLAs
  • Move Fast with Confidence: Embrace a rapid, iterative release cycle, balancing speed and safety through CI/CD pipelines, effective monitoring, and efficient processes
  • Collaborate & Share Knowledge: Work closely with other engineering teams, product managers, and stakeholders to ensure alignment and share expertise
  • Write Code in Scala: Contribute high-quality Scala code (no prior Scala experience required, just a passion for learning and an interest in functional programming)
What we offer
  • Impact at Global Scale: Build features used by millions, simplifying global commerce and transforming the e-commerce landscape
  • Modern Technology Stack: Work on an advanced microservices platform, leveraging cloud-native tools and best-in-class engineering practices
  • Growth & Development: Expand your expertise through challenging projects, mentorship opportunities, and professional development programs

Lead / Principal Software Engineer

Amtrak will be hiring experienced Software Engineers to support our Digital Tech...
Location: United States, Washington; Philadelphia; Wilmington
Salary: 103700.00 - 161352.00 USD / Year
Company: AMTRAK (amtrak.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree or equivalent combination of education, training and/or relevant experience, plus 6 years of relevant work experience
  • Proficient in Java, Spring Core, Spring Boot, Spring MVC, Spring Batch, and Spring Integration
  • Strong front-end development skills with Angular (latest versions), JavaScript, TypeScript, HTML5, CSS3, Bootstrap, and Material UI
  • Deep understanding of AWS cloud services and cloud-native application architecture
  • Solid experience with SQL/PostgreSQL and relational database design
  • Hands-on experience with Agile methodologies, CI/CD pipelines, and DevOps tools (Jenkins, Git, Docker, Kubernetes)
  • Familiarity with Jira and Confluence for project tracking and documentation
  • Strong knowledge of TDD and BDD principles
  • Excellent problem-solving and analytical skills
Job Responsibility
  • Lead the design, development, and deployment of enterprise-grade applications using Java, Spring Frameworks, and Angular
  • Architect and implement cloud-native solutions leveraging AWS services and container orchestration with Kubernetes
  • Drive best practices in Agile development, CI/CD pipelines, and DevOps tooling (Jenkins, Git, Docker)
  • Collaborate with cross-functional teams to ensure high-quality deliverables aligned with business objectives
  • Implement Test-Driven Development (TDD) and Behavior-Driven Development (BDD) methodologies to maintain robust and reliable code
  • Optimize application performance and scalability through effective database design and query tuning in PostgreSQL or other relational databases
  • Provide technical leadership, mentorship, and guidance to junior engineers and peers
  • Ensure compliance with security standards and industry best practices throughout the software development lifecycle
What we offer
  • Health, dental, and vision plans
  • Health savings accounts
  • Wellness programs
  • Flexible spending accounts
  • 401K retirement plan with employer match
  • Life insurance
  • Short and long term disability insurance
  • Paid time off
  • Back-up care
  • Adoption assistance
  • Full-time

Principal Fullstack Software Engineer

We're looking for a Principal Fullstack Software Engineer to join our team, pass...
Location: United States, San Francisco
Salary: Not provided
Company: Atlassian (https://www.atlassian.com)
Expiration Date: Until further notice
Requirements
  • 10+ years of experience designing/building enterprise-grade solutions using microservices
  • Background in Java, Kotlin, Observability tools, and service operations
  • In-depth knowledge of AWS offerings
  • Experience building distributed systems for a SaaS product
  • Passion and experience with recognizing, raising, and reconciling gaps and redundant efforts across organizations
  • Success with cross-company collaboration
  • Experience influencing and performance coaching engineers
Job Responsibility
  • Understand the user journey and user funnel
  • Collaborate with product, design and engineering to influence product strategy and direction
  • Guide the technical direction and implementation of large-scale product features
  • Evaluate trade-offs between correctness, robustness, performance and customer impact to ensure we build the right solution
  • Debug inefficiencies on the team and fix them
  • Ship well-tested, secure, reliable, and maintainable code while keeping our customers' best interests in mind
  • Contribute to code reviews, documentation, and complex bug fixes with security, performance and reliability in mind
  • Mentor and level up the skills of your teammates by sharing your expertise
  • Improve the growth engineering team through mentoring
  • Identify blockers to ensure software engineering excellence (design principles and patterns, unit testing, performance engineering, best practices for security and privacy)
What we offer
  • health coverage
  • paid volunteer days
  • wellness resources
  • Full-time

Principal Engineer

Principal Engineers at Intercom have the opportunity to lead the definition and ...
Location: United Kingdom, London
Salary: Not provided
Company: Intercom (intercom.com)
Expiration Date: Until further notice
Requirements
  • Mastery of domain knowledge and work as a leader within the R&D org to drive key strategic projects
  • Significant, demonstrated impact that your work has had on the product and/or the teams
  • Deep knowledge of a high-level programming language (for example, Ruby, Python, Perl etc.)
  • Experience with Distributed systems
  • 2+ years of experience as the primary technical leader for a team
  • Experience collaborating directly with technical leaders, product teams and designers, and a proven track record of delivering value to customers or users
  • 7+ years of experience working as a fullstack software engineer
Job Responsibility
  • Lead the definition and execution of key strategic initiatives
  • Work autonomously and be accountable for strategic execution in part of the engineering organization
  • Build both back-end and front-end systems, and work closely with designers, product managers, researchers, and data analysts
  • Coach and mentor other engineers and partner closely with the Group Engineering Managers on technical strategy and leadership
  • Provide assessments of project progress, risks and challenges to engineering leadership to help guide resource allocation and prioritisation
  • Contribute to our technical architecture as we grow
  • Care about agility as much as you care about scalability and availability
  • Contribute to all phases of software development including ideation, prototyping, design and implementation
  • Build using the best tools in the industry
  • Play an active role in hiring, mentoring and career development of other engineers
What we offer
  • Competitive salary and equity in a fast-growing start-up
  • We serve lunch every weekday, plus a variety of snack foods and a fully stocked kitchen
  • Regular compensation reviews
  • Pension scheme & match up to 4%
  • Life assurance, as well as comprehensive health and dental insurance for you and your dependents
  • Flexible paid time off policy
  • Paid maternity leave, as well as 6 weeks paternity leave for fathers
  • MacBooks are our standard, but we’re happy to get you whatever equipment helps you get your job done
  • Full-time