Principal Software Engineering Manager - AI Frameworks

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Employment contract

Salary:
139,900 - 304,200 USD per year

Job Description:

As a Principal Software Engineering Manager - AI Frameworks on the team, you will lead and grow a group of engineers working across multiple layers of the AI software serving stack, including fundamental abstractions, runtimes, libraries, and application programming interfaces (APIs). You will be responsible for setting technical direction, prioritizing investments, and ensuring the team delivers high-impact performance improvements that enable large-scale model training and inference. In this role, you will guide the team’s work on benchmarking OpenAI and other large language models (LLMs) across GPUs and Microsoft hardware, driving performance optimization, monitoring regressions, and accelerating time-to-deployment. You will partner closely with researchers, product teams, and platform owners to translate performance insights into production-ready improvements that reduce hardware footprint and support Microsoft Azure’s capex efficiency goals.

Job Responsibility:

  • Lead and develop a team of engineers working across multiple layers of the AI software stack to enable large-scale training and inference
  • Set technical vision and execution strategy for model performance benchmarking, optimization, and deployment across GPUs and Microsoft hardware
  • Drive performance outcomes by prioritizing and overseeing efforts to benchmark, profile, debug, and optimize training and inference workloads
  • Own performance health by establishing mechanisms to monitor regressions, measure impact, and continuously improve time-to-deploy and hardware efficiency
  • Partner cross-functionally with research, product, infrastructure, and hardware teams to deliver scalable, production-ready AI performance improvements
  • Balance short-term delivery and long-term investments, ensuring the team’s work aligns with organizational goals, platform roadmaps, and Azure capex objectives
  • Build a strong engineering culture through coaching, feedback, hiring, and career development, enabling the team to operate with increasing autonomy and impact

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master’s Degree in Computer Science or related technical field AND 10+ years of software engineering experience, including 6+ years in engineering management, OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years of software engineering experience, including 6+ years in engineering management, or equivalent experience
  • Strong technical foundation in software engineering principles, computer architecture, GPU architecture, and hardware acceleration for neural networks, with the ability to guide teams working in these areas
  • Experience leading teams responsible for end-to-end performance analysis and optimization of LLMs, AI systems, or HPC workloads, including use of GPU profiling and performance analysis tools
  • Demonstrated ability to lead cross-team initiatives, align stakeholders, and translate research or platform capabilities into scalable, production-ready solutions
  • Proven people leadership skills, including hiring, coaching, performance management, and career development, with a track record of building high-performing, inclusive teams
  • Exposure to AI / ML infrastructure, including DNN or LLM training and/or inference systems, and experience with at least one modern deep learning framework (e.g., PyTorch, TensorFlow, ONNX Runtime)
  • Familiarity with GPU software stacks and acceleration technologies such as CUDA, ROCm, Triton, or equivalent, sufficient to guide technical direction and evaluate tradeoffs

Additional Information:

Job Posted:
April 24, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Principal Software Engineering Manager - AI Frameworks

Principal AI Engineer

We are looking for a Principal AI Engineer to lead the design and deployment of ...
Location:
United States
Salary:
200,000 - 300,000 USD per year
Apollo.io
Expiration Date
Until further notice
Requirements
  • 10+ years of software engineering experience
  • at least 3 years in applied LLM or agentic AI systems (2023–present)
  • proven success in deploying LLM-powered products used by real users at scale
  • deep backend & systems engineering expertise with Python, distributed systems, and scalable APIs
  • familiarity with LangChain, LlamaIndex, or similar orchestration frameworks
  • experience with RAG pipelines, vector DBs, embedding models, and semantic search tuning
  • experience managing performance across cloud and model providers (e.g., AWS Bedrock, OpenAI, Anthropic)
  • demonstrated experience building multi-step agents, planning workflows, chaining reasoning steps, and integrating APIs with agent memory/state
  • comfort with advanced prompting strategies, few-shot and chain-of-thought reasoning, and embedding retrieval setups
  • strong understanding of AI system evaluation, human ratings, A/B experimentation, and feedback loop pipelines
Job Responsibility
  • Architect and lead the development of multi-agent systems capable of long-horizon planning, reasoning, and API orchestration
  • build reusable agentic components that integrate deeply into sales and marketing processes
  • own and evolve our in-house platform for scalable, low-latency, and cost-efficient LLM and agent deployments
  • lead design of interfaces powered by natural language understanding and retrieval-augmented generation (RAG)
  • build embedding-based, intent-aware search and personalization systems tuned to business user needs
  • drive innovation in personalized outreach generation using context-aware generation pipelines
  • tune inference pipelines, caching layers, and model selection logic for high-scale, cost-aware performance
  • define and drive robust offline and online testing methodologies (A/B, sandboxing, human evals) across agents and LLM flows
  • architect human-in-the-loop systems and telemetry to improve accuracy, UX, and explainability over time
What we offer
  • equity
  • company bonus or sales commissions/bonuses
  • 401(k) plan
  • at least 10 paid holidays per year
  • flex PTO
  • parental leave
  • employee assistance program
  • wellbeing benefits
  • global travel coverage
  • life/AD&D/STD/LTD insurance
  • Fulltime

Principal Engineer

As a Principal Engineer at Aignostics, you will play a crucial role in shaping t...
Location:
Germany, Berlin
Salary:
Not provided
Aignostics
Expiration Date
Until further notice
Requirements
  • Advanced degree in Computer Science, Software Engineering, or a related field
  • 10+ years of software development experience, with at least 5 years in senior technical leadership roles
  • Proven track record of driving technical excellence and innovation in organizations with 50+ engineers
  • Excellent communication skills, able to articulate complex technical concepts to both technical and non-technical stakeholders
  • Solid background in large scale systems and software architecture, design patterns, and clean coding
  • Extensive experience in designing and implementing large-scale, distributed and event-driven systems
  • Extensive experience with data processing at scale
  • Extensive expertise in multiple programming languages and frameworks
  • Deep understanding of cloud technologies (GCP, AWS), containerization and orchestration (Kubernetes)
  • Familiarity with DevSecOps and MLOps practices, complex CI/CD pipelines, and infrastructure as code
Job Responsibility
  • Own the technical direction and architectural integrity of our platform
  • Advise our CTO and Sr. Vice President of Engineering on the technical vision of Aignostics
  • Align our technical strategy with business objectives to provide a competitive advantage
  • Resolve technical conflicts across teams and harmonize technologies to unlock synergies
  • Advise product management on technical feasibility, cost, and risks of complex product features
  • Drive technical design, planning, and integration of our platform across systems
  • Provide technical guidance in system design reviews for all teams
  • Educate senior and mid-level engineers to bring them up to the next level
  • Demonstrate long-term thinking and utmost technical excellence in your individual contributions
  • Lead the technical strategic planning and execution across the TechOrg's quarterly roadmap
What we offer
  • Cutting-edge AI research and development, with involvement of Charité, TU Berlin and our other partners
  • Work with a welcoming, diverse and highly international team of colleagues
  • Opportunity to take responsibility and grow your role within the startup
  • Expand your skills by benefitting from our Learning & Development yearly budget of 1,000 € (plus 2 L&D days), language classes and internal development programs
  • Mentoring program where you'll learn from great experts
  • Flexible working hours and teleworking policy
  • Enjoy your well-deserved time off with 30 paid vacation days per year
  • We are family & pet friendly and support flexible parental leave options
  • Pick a subsidized membership of your choice among public transport, sports and well-being
  • Enjoy our social gatherings, lunches, and off-site events for a fun and inclusive work environment
  • Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251,000 - 314,500 USD per year
BlackLine
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Principal AI/ML & Innovation Engineer

We are seeking a Principal AI/ML & Innovation Engineer who will be leading initiat...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's or master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or closely related quantitative discipline
  • Typically, 10-15 years’ experience
  • Solid understanding of fundamental AI and machine learning concepts, including supervised and unsupervised learning, deep learning, reinforcement learning, natural language processing, computer vision, and statistical modeling
  • Proficient in implementing and deploying various machine learning algorithms, such as decision trees, random forests, support vector machines, and neural networks
  • Knowledge of popular machine learning frameworks and libraries like TensorFlow, PyTorch, or scikit-learn
  • Strong understanding of GitHub Copilot, Cursor, n8n, vibe coding, Windsurf, and similar technologies
  • Experience in cloud infrastructure (AWS, Azure, etc.)
  • Knowledge of open source, Linux, etc.
  • Understanding of DevOps and SRE
  • Expertise in deep learning techniques, architectures, and frameworks (e.g., convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), etc.)
Job Responsibility
  • Designing, developing, and deploying advanced machine learning models and algorithms
  • Leading research initiatives to explore novel approaches and technologies
  • Designing the architecture of AI systems and ensuring scalability, performance, and reliability
  • Collaborating with other teams, such as data scientists, software engineers, and product managers
  • Providing technical leadership and mentorship to junior engineers
  • Overseeing and guiding multiple design review sessions across different projects
  • Partnering with the engineering manager and team lead to establish long-term design and implementation strategies
  • Leading efforts to incorporate feedback loops and continuous improvement processes
  • Leading meetings, ensuring efficient progress tracking, issue resolution, and team coordination
  • Creating and delivering high-level presentations and reports to executive stakeholders
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Principal Data Engineer

PointClickCare is searching for a Principal Data Engineer who will contribute to...
Location:
United States
Salary:
183,200 - 203,500 USD per year
PointClickCare
Expiration Date
Until further notice
Requirements
  • Principal Data Engineer with at least 10 years of professional experience in software or data engineering, including a minimum of 4 years focused on streaming and real-time data systems
  • Proven experience driving technical direction and mentoring engineers while delivering complex, high-scale solutions as a hands-on contributor
  • Deep expertise in streaming and real-time data technologies, including frameworks such as Apache Kafka, Flink, and Spark Streaming
  • Strong understanding of event-driven architectures and distributed systems, with hands-on experience implementing resilient, low-latency pipelines
  • Practical experience with cloud platforms (AWS, Azure, or GCP) and containerized deployments for data workloads
  • Fluency in data quality practices and CI/CD integration, including schema management, automated testing, and validation frameworks (e.g., dbt, Great Expectations)
  • Operational excellence in observability, with experience implementing metrics, logging, tracing, and alerting for data pipelines using modern tools
  • Solid foundation in data governance and performance optimization, ensuring reliability and scalability across batch and streaming environments
  • Experience with Lakehouse architectures and related technologies, including Databricks, Azure ADLS Gen2, and Apache Hudi
  • Strong collaboration and communication skills, with the ability to influence stakeholders and evangelize modern data practices within your team and across the organization
Job Responsibility
  • Lead and guide the design and implementation of scalable streaming data pipelines
  • Engineer and optimize real-time data solutions using frameworks like Apache Kafka, Flink, Spark Streaming
  • Collaborate cross-functionally with product, analytics, and AI teams to ensure data is a strategic asset
  • Advance ongoing modernization efforts, deepening adoption of event-driven architectures and cloud-native technologies
  • Drive adoption of best practices in data governance, observability, and performance tuning for streaming workloads
  • Embed data quality in processing pipelines by defining schema contracts, implementing transformation tests and data assertions, enforcing backward-compatible schema evolution, and automating checks for freshness, completeness, and accuracy across batch and streaming paths before production deployment
  • Establish robust observability for data pipelines by implementing metrics, logging, and distributed tracing for streaming jobs, defining SLAs and SLOs for latency and throughput, and integrating alerting and dashboards to enable proactive monitoring and rapid incident response
  • Foster a culture of quality through peer reviews, providing constructive feedback and seeking input on your own work
What we offer
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
  • Fulltime

Principal Full Stack Engineer

As a Principal Full Stack Engineer, you will be responsible for architecting, de...
Location:
United States, San Francisco
Salary:
170,800 - 274,300 USD per year
Atlassian
Expiration Date
Until further notice
Requirements
  • Business Applications Experience with Oracle Fusion Cloud, Zuora Revenue, Coupa, Anaplan, Avalara, and prior QTC architecture experience
  • Strong proficiency in modern programming languages (e.g., Java, Python) and frameworks (e.g., React, Node.js)
  • Exposure to integration platforms such as Workato and RPA platforms such as UiPath
  • Experience with AI technologies and machine learning frameworks, with a focus on integrating these into business applications
  • Familiarity with cloud environments such as AWS or GCP, and experience with deploying and managing applications in the cloud
  • Ability to tackle complex technical challenges and provide innovative solutions
  • Excellent communication skills to collaborate effectively with cross-functional and leadership teams both across Engineering and Finance organizations
Job Responsibility
  • Design and implement scalable and robust full-stack solutions that integrate with finance systems or business applications
  • Collaborate with data scientists and machine learning engineers to incorporate AI features into products, enhancing functionality and user experience
  • Lead technical design and architecture discussions, ensuring best practices are followed in software development and AI integration
  • Work closely with technical product managers (TPM), designers, and other engineers and business teams to deliver high-quality products that meet business needs
  • Drive innovation by exploring new technologies and methodologies to improve product offerings and development processes
What we offer
  • health coverage
  • paid volunteer days
  • wellness resources
  • Fulltime

Principal Software Engineer, AI Developer Tools

At Docker, we make app development easier so developers can focus on what matter...
Location:
United States, Seattle
Salary:
232,000 - 319,000 USD per year
Docker
Expiration Date
Until further notice
Requirements
  • 10+ years software engineering experience with 3+ years in Staff or Principal Engineer roles
  • Deep expertise in AI/ML technologies with hands-on production experience building LLM-powered applications, AI agents, or AI-assisted developer tools
  • Strong understanding of LLM APIs (OpenAI, Anthropic, etc.), prompt engineering, agent orchestration frameworks, and practical applications of AI in software development workflows
  • Proven track record of architecting and building highly scalable distributed systems and developer-facing platforms
  • Production experience with modern cloud-native infrastructure including Kubernetes, GitOps deployment patterns, observability systems, and CI/CD pipelines
  • Proficiency in Go (preferred), Rust, Java, or Python with strong software engineering fundamentals
  • Experience designing developer tools, platform engineering systems, or internal tools that enable other teams
  • Exceptional product and platform mindset considering business outcomes, developer experience, and technical trade-offs
  • Strong communication skills with ability to influence technical and non-technical stakeholders across the organization
  • Track record of technical mentorship and elevating engineering teams' capabilities
Job Responsibility
  • Define the long-term technical vision and architecture for AI-powered developer tools and the self-service platform that enables teams to build their own AI agents
  • Establish architectural patterns, technical standards, and best practices for LLM integration, AI agent development, and production AI systems serving developers
  • Lead technical strategy for platform capabilities including deployment frameworks (ArgoCD/GitOps), observability integration (Grafana), security controls, and operational tooling for AI developer tools
  • Design highly available, scalable infrastructure for hosting AI agents and developer tools with predictable performance and intelligent resource management
  • Drive technical decisions on AI technology choices, LLM provider strategies, prompt engineering approaches, and agent orchestration frameworks
  • Partner with Senior Manager and product leadership to align technical architecture with business objectives and productization opportunities
  • Architect and build production-ready AI agents for developer productivity including code review assistants, test generators, deployment diagnostics, and incident response automation
  • Design and implement the self-service platform infrastructure that reduces time-to-production for new AI tools from weeks to days
  • Build systems that accelerate adoption of AI-native development tools (Claude Code, Cursor, Warp) across Docker's engineering organization
  • Establish reliability, security, and performance standards for AI systems including SLOs, monitoring, incident response, and cost management
What we offer
  • Freedom & flexibility: fit your work around your life
  • Designated quarterly Whaleness Days plus end of year Whaleness break
  • Home office setup: we want you comfortable while you work
  • 16 weeks of paid Parental leave
  • Technology stipend equivalent to $100 net/month
  • PTO plan that encourages you to take time to do the things you enjoy
  • Training stipend for conferences, courses and classes
  • Equity
  • Fulltime

Principal, Developer Relations, AI Developer Ecosystem

We are seeking a Principal, Developer Relations leader to shape and grow Arm’s A...
Location:
China, Shanghai
Salary:
Not provided
ARM
Expiration Date
Until further notice
Requirements
  • Proven experience in Developer Relations, technical alliances, or partner management within the AI, software, or semiconductor industry in China, ideally including engineering team leadership
  • Strong understanding of AI frameworks, model optimization, and hardware–software interactions, with specific knowledge of the China AI ecosystem
  • Familiarity with Arm architectures and their advantages for AI performance and efficiency in cloud, edge, or mobile scenarios
  • Demonstrated experience leading and managing DevRel engineers or developer advocates (or similar technical advocacy roles), including setting goals, coaching, and hiring
  • Excellent communication and relationship management skills, able to engage technical and non-technical stakeholders, from developer communities to senior executives
  • Ability to analyze market and developer trends and influence strategic decisions based on AI software readiness and developer needs unique to China
  • Experience working with AI ISVs, open-source communities, or AI software ecosystem enablement initiatives in China
  • Fluency in Mandarin and English, with strong cross-cultural communication skills
  • Self-motivated, comfortable operating at a principal level, and able to manage multiple strategic initiatives in a fast-paced, matrixed environment
Job Responsibility
  • Lead, mentor, and grow a team of Developer Relations engineers/advocates, providing clear goals, coaching, and performance feedback
  • Define the operating model, priorities, and success metrics for the DevRel team in China, aligning with global Arm strategy and regional business goals
  • Recruit, onboard, and develop DevRel talent, building a high-performing team with complementary strengths in advocacy, content, and ecosystem enablement
  • Foster a culture of collaboration, data-driven decision-making, and “learn in public” within the DevRel team
  • Define and lead the Developer Relations strategy for AI in China, ensuring key AI frameworks, models, and applications are optimized for Arm platforms across cloud, edge, and device
  • Identify gaps and opportunities in the China AI ecosystem and translate them into clear DevRel initiatives, programs, and technical investments
  • Influence internal product and engineering roadmaps with developer and partner feedback, market signals, and AI software readiness needs specific to China
  • Act as a principal-level technical advocate for AI on Arm in China, representing Arm at conferences, meetups, webinars, and online communities
  • Guide your team to create high-impact technical content—sample code, reference implementations, tutorials, blog posts, and talks—that demonstrates best practices for AI workloads on Arm
  • Work with engineering and documentation teams to improve SDKs, tools, and documentation, and drive benchmarks and competitive analysis, making it easy for Chinese developers to evaluate, adopt, and optimize AI on Arm
What we offer
  • Health and Wellness
  • Work and Life Success
  • Financial Rewards
  • Development and Support