Senior Principal Storage Systems Engineer

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:

163000.00 - 296400.00 USD / Year

Job Description:

Join the Strategic Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. As part of the Systems Planning and Architecture (SPARC) group, you will help with pathfinding and architecture for future compute platforms, storage, and related technologies that create advantages for Azure and Microsoft. You will collaborate across the Azure organization to evaluate next-generation datacenter technologies and influence Azure product roadmaps for both Microsoft and third-party silicon and systems.

Job Responsibility:

  • Drive pathfinding initiatives to identify and quantify optimization opportunities across distributed and disaggregated storage and memory architectures for computing and AI inferencing systems
  • Conduct in-depth architectural analysis of next-generation storage technologies, leveraging a strong understanding of workloads across key Azure segments
  • Collaborate cross-functionally to influence technology direction and contribute to long-term strategic planning for evolving datacenter architectures
  • Lead architectural exploration of emerging technologies through robust proof-of-concepts (PoCs) and end-to-end prototyping, aligned with product segments and real-world usage scenarios
  • Partner with hardware design and software enablement teams to mature innovations from concept validation to production-ready solutions
  • Engage with Azure operators and customers to identify current challenges and anticipate emerging needs across the platform
  • Evaluate and identify promising technologies aligned with Azure business priorities, engage ecosystem partners, and de-risk productization through capable PoCs
  • Work across organizational boundaries with roadmap planners, product architects, and hardware and software engineering teams to successfully integrate innovative solutions into Azure datacenters

Requirements:

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 9+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 11+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • 10+ years of experience with system performance evaluation using industry standard benchmarks and/or common cloud workloads
  • 10+ years of experience with significant hardware/software co-design projects involving CPU and/or systems architecture and influencing technical direction
  • Prior experience developing and driving cloud technologies for improved TCO/$ and TCO/$/performance
  • Intellectual curiosity and passion about learning and deploying new technologies
  • Verbal and written communication skills and ability to engage technical & non-technical peers
  • Experience contributing to complex projects with respect and integrity, including those with multiple workstreams spanning different business and technical disciplines
  • Understanding of compute and storage systems in the cloud, including storage and network technologies, experience with software-defined storage and distributed file systems
  • Deep understanding of AI inference systems and associated software, and emerging approaches to orchestrate tiered memory and storage capabilities for distributed serving and KV caching for agentic systems
  • Deep expertise in AI scale-up and scale-out networking/interconnect architectures, along with a good understanding of memory/storage technologies spanning HBM, LPDDR, HBF, etc.
  • Skilled in partnering and influencing architects, hardware engineers, and software leads
  • Experience with gathering and analyzing system telemetry and low-level performance counters to identify and root-cause performance bottlenecks
  • Problem-solving skills, analytical capabilities, and attention to detail
  • Ability to manage through ambiguity, teamwork, and sense of presumed responsibility

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Senior Principal Storage Systems Engineer

Senior Principal Engineer Core Data Platform

As an engineer well into your career, we know you're an expert at what you do an...
Location
United States, Seattle; San Francisco; Mountain View
Salary:
198300.00 - 318600.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
  • 12+ years of experience in backend software development, with a focus on distributed systems and large-scale storage solutions
  • 8+ years of experience designing and managing highly available, large-scale storage architectures in cloud environments
  • 5+ years of hands-on experience working with AWS storage services (S3, EBS, EFS, FSx, Glacier, DynamoDB)
  • Proficiency in system design, performance optimization, and cost-efficient architecture for exabyte-scale storage
  • Expertise in at least one major backend programming language (Kotlin, Java, Go, Rust, or Python)
  • Experience leading technical strategy and architectural decisions in large, multi-team engineering organizations
  • Strong understanding of distributed systems principles, including consistency models, replication, sharding, and consensus algorithms (Raft, Paxos)
  • Deep knowledge of security best practices, including encryption, access control (IAM), and compliance standards (SOC2, GDPR, HIPAA)
  • Experience mentoring senior engineers and driving high-impact engineering initiatives
Job Responsibility
  • Collaborate with partner teams and internal customers to help define technical direction and OKRs for the Core Data platform organization
  • Regularly tackle the largest and most complex problems on the team, from technical design to implementation and launch
  • Partner across engineering teams to take on company-wide initiatives spanning multiple projects
  • Routinely tackle complex architecture challenges, apply architectural standards, and adopt them on new projects
  • Work with senior engineering and product leaders to build strategy and design solutions that earn customers' trust and business
  • Own key OKRs and end-to-end outcomes of critical projects in a microservices environment
  • Champion best practices and innovative techniques for scalability, reliability, and performance optimizations
  • Own engineering and operational excellence for the health of our systems and processes
  • Proactively drive opportunities for continuous improvements and own key operational metrics
  • Continually drive developer productivity initiatives to ensure that we unleash the potential of our own teams
What we offer
  • health coverage
  • paid volunteer days
  • wellness resources
  • Fulltime

Principal Software Engineer, Trusted Data Platform

As a Principal Software Engineer, you will be a technical leader and hands-on co...
Location
India, Bangalore
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
  • 10+ years of experience in backend software development, focusing on distributed systems and storage solutions
  • 5+ years of experience working with AWS storage services (S3, DynamoDB, EBS, EFS, FSx, Glacier)
  • Strong expertise in system design, architecture, and scalability for large-scale storage solutions
  • Proficiency in at least one major backend programming language (Kotlin, Java, Go, Rust, or Python)
  • Experience designing and implementing highly available, fault-tolerant, and cost-efficient storage architectures
  • Deep understanding of distributed systems, replication strategies, sharding, and caching
  • Knowledge of data security, encryption best practices, and compliance requirements (SOC2, GDPR, HIPAA)
  • Experience leading engineering teams, mentoring senior engineers, and driving technical roadmaps
  • Proficiency with observability tools, performance monitoring, and troubleshooting at scale
Job Responsibility
  • Designing and optimizing high-scale, distributed storage systems built on AWS storage technologies
  • Shaping the architecture, performance, and reliability of backend storage solutions that power critical applications at scale
  • Designing, implementing, and optimizing backend storage services that support high throughput, low latency, and fault tolerance
  • Working closely with senior engineers, architects, and cross-functional teams to drive scalability, availability, and efficiency improvements in large-scale storage solutions
  • Leading technical deep dives, architecture reviews, and root cause analyses to resolve complex production issues related to storage performance, consistency, and durability
  • Driving best practices in distributed system design, security, and cloud cost optimization
  • Mentoring senior engineers, contributing to technical roadmaps, and helping shape the long-term storage strategy
  • Collaborating with Site Reliability Engineers (SREs) to implement observability, monitoring, and disaster recovery strategies, ensuring high availability and compliance with industry standards
  • Advocating for automation, Infrastructure-as-Code (IaC), and DevOps best practices, leveraging tools like Terraform, AWS CloudFormation, Kubernetes (EKS), and CI/CD pipelines to enable scalable deployments and operational excellence
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities
  • We can hire people in any country where we have a legal entity
  • Interviews and onboarding are conducted virtually
  • Whatever your preference - working from home, an office, or in between - you can choose the place that's best for your work and your lifestyle

Senior Principal Data Platform Software Engineer

We’re looking for a Sr Principal Data Platform Software Engineer (P70) to be a k...
Salary:
239400.00 - 312550.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • 15+ years in Data Engineering, Software Engineering, or related roles, with substantial exposure to big data ecosystems
  • Demonstrated experience building and operating data platforms or large‑scale data services in production
  • Proven track record of building services from the ground up (requirements → design → implementation → deployment → ongoing ownership)
  • Hands‑on experience with AWS, GCP (e.g., compute, storage, data, and streaming services) and cloud‑native architectures
  • Practical experience with big data technologies, such as Databricks, Apache Spark, AWS EMR, Apache Flink, or StarRocks
  • Strong programming skills in one or more of: Kotlin, Scala, Java, Python
  • Experience leading cross‑team technical initiatives and influencing senior stakeholders
  • Experience mentoring Staff/Principal engineers and lifting the technical bar for a team or org
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
Job Responsibility
  • Design, develop and own delivery of high quality big data and analytical platform solutions aiming to solve Atlassian’s needs to support millions of users with optimal cost, minimal latency and maximum reliability
  • Improve and operate large‑scale distributed data systems in the cloud (primarily AWS, with increasing integration with GCP and Kubernetes‑based microservices)
  • Drive the evolution of our high-performance analytical databases and their integrations with products, cloud infrastructures (AWS and GCP), and isolated cloud environments
  • Help define and uplift engineering and operational standards for petabyte scale data platforms, with sub‑second analytic queries and multi‑region availability (coding guidelines, code review practices, observability, incident response, SLIs/SLOs)
  • Partner across multiple product and platform teams (including Analytics, Marketplace/Ecosystem, Core Data Platform, ML Platform, Search, and Oasis/FedRAMP) to deliver company‑wide initiatives that depend on reliable, high‑quality data
  • Act as a technical mentor and multiplier, raising the bar on design quality, code quality, and operational excellence across the broader team
  • Design and implement self‑healing, resilient data platforms with strong observability, fault tolerance, and recovery characteristics
  • Own the long‑term architecture and technical direction of Atlassian’s product data platform with projects that are directly tied to Atlassian’s company-level OKRs
  • Be accountable for the reliability, cost efficiency, and strategic direction of Atlassian’s product analytical data platform
  • Partner with executives and influence senior leaders to align engineering efforts with Atlassian’s long-term business objectives
What we offer
  • health and wellbeing resources
  • paid volunteer days
  • Fulltime

Senior Software Engineer - Transactional Data Platform

As a Senior Software Engineer, you will play a critical role in designing, build...
Location
Australia, Sydney
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
  • 5+ years of experience in backend software development
  • 3+ years of hands-on experience working with AWS cloud services, particularly AWS storage technologies (S3, DynamoDB, EBS, EFS, FSx, or Glacier)
  • 3+ years of experience in designing and developing distributed systems or high-scale backend services
  • Strong programming skills in Kotlin
  • Experience working in agile environments following DevOps and CI/CD best practices
  • Strong Backend Development Skills
  • Proficiency in Kotlin, Java for backend development
  • Experience building high-performance, scalable microservices and APIs
  • Strong understanding of RESTful APIs, gRPC, and event-driven architectures
Job Responsibility
  • Designing, building, and optimizing high-performance, scalable, and resilient backend storage solutions on AWS cloud infrastructure
  • Developing distributed storage systems, APIs, and backend services that power mission-critical applications, ensuring low-latency, high-throughput, and fault-tolerant data storage
  • Collaborating closely with principal engineers, architects, SREs, and product teams to define technical roadmaps, improve storage efficiency, and optimize access patterns
  • Driving performance tuning, data modeling, caching strategies, and cost optimization across AWS storage services like S3, DynamoDB, EBS, EFS, FSx, and Glacier
  • Contributing to infrastructure automation, security best practices, and monitoring strategies using tools like Terraform, CloudWatch, Prometheus, and OpenTelemetry
  • Troubleshooting and resolving production incidents related to data integrity, latency spikes, and storage failures, ensuring high availability and disaster recovery preparedness
  • Mentoring junior engineers, participating in design reviews and architectural discussions, and advocating for engineering best practices such as CI/CD automation, infrastructure as code, and observability-driven development
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Flexibility for eligible candidates to work remotely across the West US
  • Fulltime

Lead Principal Engineer

Atlassian Corporate Engineering is seeking a Lead Principal Engineer to drive ou...
Location
United States, Washington DC; San Francisco; Austin; Mountain View; New York
Salary:
223100.00 - 358400.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • BS in Computer Science or related technical field or equivalent experience
  • 15+ years of experience developing large-scale distributed systems
  • 4+ years of experience providing architectural oversight and technical leadership
  • A track record of engineering in successful products
  • Excellent communication skills and a track record of cross-group/cross-discipline collaboration
  • Broad experience architecting, designing, and building large-scale distributed systems supporting high-usage applications
  • Broad knowledge and understanding of SaaS, PaaS, and IaaS with hands-on experience with one or more public cloud offerings (ideally AWS)
  • Fluency in any modern object-oriented programming language (e.g., Java, Kotlin, Python, JavaScript, Go, etc.) and in architecture patterns for distributed systems
  • Deep experience with storage, both relational and non-relational
  • Experience with applied Machine Learning
Job Responsibility
  • Create a vision for a connected ecosystem, and lead the engineering organization to deliver on a cohesive experience
  • Develop engineering excellence standards and drive adoption of those across the entire organization
  • Support the teams' transformation to data-driven decision making, defining metrics and implementing their measurement
  • Work with dependency teams to develop a plan and roadmap, integrating our products into the broader Atlassian ecosystem
  • Help with hiring and mentoring other engineers on the team
  • Act as Technical Lead for a large area, providing leadership to other senior technical leads
  • Build relationships with key stakeholders and Engineering leaders across Atlassian.
What we offer
  • benefits
  • bonuses
  • commissions
  • equity
  • Fulltime

Senior Principal Technical Program Manager

Atlassians can choose where they work – whether in an office, from home, or a co...
Location
United States, Mountain View
Salary:
191600.00 - 307800.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • 8+ years of experience on software teams as Development Manager or TPM
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs
  • Technical understanding of systems involved
  • Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment
  • Define specific systems programs and create a plan of action for realizing those programs
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems
  • Responsible for taking a systematic approach to engineering problems
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer
  • health coverage
  • paid volunteer days
  • wellness resources
  • Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Senior Principal Technical Program Manager - ML Platform

Salary:
231300.00 - 301975.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • 8+ years of experience on software teams as Development Manager, Technical Product Manager or TPM leading technical platforms areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Partner with Engineering in defining direction, strategy and execution at Platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of systems involved. Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer
  • health and wellbeing resources
  • paid volunteer days