Senior Principal Storage Systems Engineer

United States, Redmond · 163000.00 - 296400.00 USD / Year · Job Posted February 21, 2026

Job Description

Join the Strategic Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. As part of SPARC, you will help with pathfinding and architecture for future compute platforms, storage, and related technologies that create advantages for Azure and Microsoft. You will collaborate across the Azure organization to evaluate next-generation datacenter technologies and influence Azure product roadmaps for both Microsoft and third-party silicon and systems.

Job Responsibility

  • Drive pathfinding initiatives to identify and quantify optimization opportunities across distributed and disaggregated storage and memory architectures for computing and AI inferencing systems
  • Conduct in-depth architectural analysis of next-generation storage technologies, leveraging a strong understanding of workloads across key Azure segments
  • Collaborate cross-functionally to influence technology direction and contribute to long-term strategic planning for evolving datacenter architectures
  • Lead architectural exploration of emerging technologies through robust proof-of-concepts (PoCs) and end-to-end prototyping, aligned with product segments and real-world usage scenarios
  • Partner with hardware design and software enablement teams to mature innovations from concept validation to production-ready solutions
  • Engage with Azure operators and customers to identify current challenges and anticipate emerging needs across the platform
  • Evaluate and identify promising technologies aligned with Azure business priorities, engage ecosystem partners, and de-risk productization through capable PoCs
  • Work across organizational boundaries with roadmap planners, product architects, and hardware and software engineering teams to successfully integrate innovative solutions into Azure datacenters

Requirements

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 9+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 11+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • 10+ years of experience with system performance evaluation using industry standard benchmarks and/or common cloud workloads
  • 10+ years of experience with significant hardware/software co-design projects involving CPU and/or systems architecture and influencing technical direction
  • Prior experience developing and driving cloud technologies for improved TCO/$ and TCO/$/performance
  • Intellectual curiosity and passion about learning and deploying new technologies
  • Verbal and written communication skills and ability to engage technical & non-technical peers
  • Experience contributing to complex projects with respect and integrity, including those with multiple workstreams spanning different business and technical disciplines
  • Understanding of compute and storage systems in the cloud, including storage and network technologies, experience with software-defined storage and distributed file systems
  • Deep understanding of AI inference systems and associated software, and emerging approaches to orchestrate tiered memory and storage capabilities for distributed serving and KV caching for agentic systems
  • Deep expertise in AI scale-up and scale-out networking/interconnect architectures, along with a good understanding of memory/storage technologies spanning HBM, LPDDR, HBF, etc.
  • Skilled in partnering and influencing architects, hardware engineers, and software leads
  • Experience with gathering and analyzing system telemetry and low-level performance counters to identify and root-cause performance bottlenecks
  • Problem-solving skills, analytical capabilities, and attention to detail
  • Ability to work through ambiguity, collaborate as part of a team, and take ownership of outcomes

Similar Jobs for Senior Principal Storage Systems Engineer

8 matching positions

Principal Software Engineer, Trusted Data Platform

As a Principal Software Engineer, you will be a technical leader and hands-on co...
Location: India, Bangalore
Salary: Not provided
Atlassian
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
  • 10+ years of experience in backend software development, focusing on distributed systems and storage solutions
  • 5+ years of experience working with AWS storage services (S3, DynamoDB, EBS, EFS, FSx, Glacier)
  • Strong expertise in system design, architecture, and scalability for large-scale storage solutions
  • Proficiency in at least one major backend programming language (Kotlin, Java, Go, Rust, or Python)
  • Experience designing and implementing highly available, fault-tolerant, and cost-efficient storage architectures
  • Deep understanding of distributed systems, replication strategies, sharding, and caching
  • Knowledge of data security, encryption best practices, and compliance requirements (SOC2, GDPR, HIPAA)
  • Experience leading engineering teams, mentoring senior engineers, and driving technical roadmaps
  • Proficiency with observability tools, performance monitoring, and troubleshooting at scale
Job Responsibility
  • Designing and optimizing high-scale, distributed storage systems built on AWS storage technologies
  • Shaping the architecture, performance, and reliability of backend storage solutions that power critical applications at scale
  • Designing, implementing, and optimizing backend storage services that support high throughput, low latency, and fault tolerance
  • Working closely with senior engineers, architects, and cross-functional teams to drive scalability, availability, and efficiency improvements in large-scale storage solutions
  • Leading technical deep dives, architecture reviews, and root cause analyses to resolve complex production issues related to storage performance, consistency, and durability
  • Driving best practices in distributed system design, security, and cloud cost optimization
  • Mentoring senior engineers, contributing to technical roadmaps, and helping shape the long-term storage strategy
  • Collaborating with Site Reliability Engineers (SREs) to implement observability, monitoring, and disaster recovery strategies, ensuring high availability and compliance with industry standards
  • Advocating for automation, Infrastructure-as-Code (IaC), and DevOps best practices, leveraging tools like Terraform, AWS CloudFormation, Kubernetes (EKS), and CI/CD pipelines to enable scalable deployments and operational excellence
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities
  • We can hire people in any country where we have a legal entity
  • Interviews and onboarding are conducted virtually
  • Whatever your preference - working from home, an office, or in between - you can choose the place that's best for your work and your lifestyle

Senior Principal Data Platform Software Engineer

We’re looking for a Sr Principal Data Platform Software Engineer (P70) to be a k...
Salary: 239400.00 - 312550.00 USD / Year
Atlassian
Expiration Date: Until further notice
Requirements
  • 15+ years in Data Engineering, Software Engineering, or related roles, with substantial exposure to big data ecosystems
  • Demonstrated experience building and operating data platforms or large‑scale data services in production
  • Proven track record of building services from the ground up (requirements → design → implementation → deployment → ongoing ownership)
  • Hands‑on experience with AWS, GCP (e.g., compute, storage, data, and streaming services) and cloud‑native architectures
  • Practical experience with big data technologies, such as Databricks, Apache Spark, AWS EMR, Apache Flink, or StarRocks
  • Strong programming skills in one or more of: Kotlin, Scala, Java, Python
  • Experience leading cross‑team technical initiatives and influencing senior stakeholders
  • Experience mentoring Staff/Principal engineers and lifting the technical bar for a team or org
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
Job Responsibility
  • Design, develop and own delivery of high quality big data and analytical platform solutions aiming to solve Atlassian’s needs to support millions of users with optimal cost, minimal latency and maximum reliability
  • Improve and operate large‑scale distributed data systems in the cloud (primarily AWS, with increasing integration with GCP and Kubernetes‑based microservices)
  • Drive the evolution of our high-performance analytical databases and their integrations with products, cloud infrastructures (AWS and GCP), and isolated cloud environments
  • Help define and uplift engineering and operational standards for petabyte scale data platforms, with sub‑second analytic queries and multi‑region availability (coding guidelines, code review practices, observability, incident response, SLIs/SLOs)
  • Partner across multiple product and platform teams (including Analytics, Marketplace/Ecosystem, Core Data Platform, ML Platform, Search, and Oasis/FedRAMP) to deliver company‑wide initiatives that depend on reliable, high‑quality data
  • Act as a technical mentor and multiplier, raising the bar on design quality, code quality, and operational excellence across the broader team
  • Design and implement self‑healing, resilient data platforms with strong observability, fault tolerance, and recovery characteristics
  • Own the long‑term architecture and technical direction of Atlassian’s product data platform with projects that are directly tied to Atlassian’s company-level OKRs
  • Be accountable for the reliability, cost efficiency, and strategic direction of Atlassian’s product analytical data platform
  • Partner with executives and influence senior leaders to align engineering efforts with Atlassian’s long-term business objectives
What we offer
  • health and wellbeing resources
  • paid volunteer days
  • Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location: United States, Pleasanton, California
Salary: 251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years leading MLOps or AIOps platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Senior Principal Technical Program Manager - ML Platform

Salary: 231300.00 - 301975.00 USD / Year
Atlassian
Expiration Date: Until further notice
Requirements
  • 8+ years of experience on software teams as a Development Manager, Technical Product Manager, or TPM leading technical platform areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Partner with Engineering in defining direction, strategy and execution at Platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of systems involved. Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer
  • health and wellbeing resources
  • paid volunteer days

Senior ML Data Engineer

As a Senior Data Engineer, you will play a pivotal role in our AI/ML workstream,...
Location: Poland, Warsaw
Salary: Not provided
Awin Global
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in data science, data engineering, or computer science with a focus on math and statistics (Master’s degree preferred)
  • At least 5 years of experience as an AI/ML data engineer undertaking the above tasks and accountabilities
  • Strong foundation in computer science principles and statistical methods
  • Strong experience with cloud technology (AWS or Azure)
  • Strong experience creating data ingestion pipelines and ETL processes
  • Strong knowledge of big data tools such as Spark, Databricks, and Python
  • Strong understanding of common machine learning techniques and frameworks (e.g. mlflow)
  • Strong knowledge of natural language processing (NLP) concepts
  • Strong knowledge of scrum practices and an agile mindset
  • Strong analytical and problem-solving skills with attention to data quality and accuracy
Job Responsibility
  • Design and maintain scalable data pipelines and storage systems for both agentic and traditional ML workloads
  • Productionise LLM- and agent-based workflows, ensuring reliability, observability, and performance
  • Build and maintain feature stores, vector/embedding stores, and core data assets for ML
  • Develop and manage end-to-end traditional ML pipelines: data prep, training, validation, deployment, and monitoring
  • Implement data quality checks, drift detection, and automated retraining processes
  • Optimise cost, latency, and performance across all AI/ML infrastructure
  • Collaborate with data scientists and engineers to deliver production-ready ML and AI systems
  • Ensure AI/ML systems meet governance, security, and compliance requirements
  • Mentor teams and drive innovation across both agentic and classical ML engineering practices
  • Participate in team meetings and contribute to project planning and strategy discussions
What we offer
  • Flexi-Week and Work-Life Balance: We prioritise your mental health and well-being, offering you a flexible four-day Flexi-Week at full pay and with no reduction to your annual holiday allowance. We also offer a variety of different paid special leaves as well as volunteer days
  • Remote Working Allowance: You will receive a monthly allowance to cover part of your running costs. In addition, we will support you in setting up your remote workspace appropriately
  • Pension: Awin offers access to an additional pension insurance to all employees in Germany
  • Flexi-Office: We offer an international culture and flexibility through our Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Development: We’ve built our extensive training suite Awin Academy to cover a wide range of skills that nurture you professionally and personally, with trainings conveniently packaged together to support your overall development
  • Appreciation: Thank and reward colleagues by sending them a voucher through our peer-to-peer program

Senior ML Data Engineer

As a Senior Data Engineer, you will play a pivotal role in our AI/ML workstream,...
Salary: Not provided
Awin Global
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in data science, data engineering, or computer science with a focus on math and statistics (Master’s degree preferred)
  • At least 5 years of experience as an AI/ML data engineer undertaking the above tasks and accountabilities
  • Strong foundation in computer science principles and statistical methods
  • Strong experience with cloud technology (AWS or Azure)
  • Strong experience creating data ingestion pipelines and ETL processes
  • Strong knowledge of big data tools such as Spark, Databricks, and Python
  • Strong understanding of common machine learning techniques and frameworks (e.g. mlflow)
  • Strong knowledge of natural language processing (NLP) concepts
  • Strong knowledge of scrum practices and an agile mindset
Job Responsibility
  • Design and maintain scalable data pipelines and storage systems for both agentic and traditional ML workloads
  • Productionise LLM- and agent-based workflows, ensuring reliability, observability, and performance
  • Build and maintain feature stores, vector/embedding stores, and core data assets for ML
  • Develop and manage end-to-end traditional ML pipelines: data prep, training, validation, deployment, and monitoring
  • Implement data quality checks, drift detection, and automated retraining processes
  • Optimise cost, latency, and performance across all AI/ML infrastructure
  • Collaborate with data scientists and engineers to deliver production-ready ML and AI systems
  • Ensure AI/ML systems meet governance, security, and compliance requirements
  • Mentor teams and drive innovation across both agentic and classical ML engineering practices
  • Participate in team meetings and contribute to project planning and strategy discussions
What we offer
  • Flexi-Week and Work-Life Balance: We prioritise your mental health and well-being, offering you a flexible four-day Flexi-Week at full pay and with no reduction to your annual holiday allowance. We also offer a variety of different paid special leaves as well as volunteer days
  • Remote Working Allowance: You will receive a monthly allowance to cover part of your running costs. In addition, we will support you in setting up your remote workspace appropriately
  • Pension: Awin offers access to an additional pension insurance to all employees in Germany
  • Flexi-Office: We offer an international culture and flexibility through our Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Development: We’ve built our extensive training suite Awin Academy to cover a wide range of skills that nurture you professionally and personally, with trainings conveniently packaged together to support your overall development
  • Appreciation: Thank and reward colleagues by sending them a voucher through our peer-to-peer program

Sr. Engineering Manager - Storage Engineering

At Cloudera, we empower people to transform complex data into clear and actionab...
Location: United States, San Jose
Salary: 203000.00 - 254000.00 USD / Year
Cloudera
Expiration Date: Until further notice
Requirements
  • 8+ years of experience in software engineering
  • 2+ years in an engineering management role
  • Demonstrable experience with the design, implementation, and operation of large-scale distributed systems, particularly in storage, file systems, databases, or cloud infrastructure
  • Strong understanding of fundamental storage concepts (e.g., consistency, replication, erasure coding, caching)
  • Proven track record of leading and managing high-performing engineering teams, demonstrating excellent communication and organizational skills
  • Excellent written and verbal communication skills
  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field
Job Responsibility
  • Manage and lead a team of talented engineers and senior individual contributors based in North America
  • Develop and execute on a technical roadmap and strategy for your team, aligning with the department's vision and the company's business goals
  • Lead and mentor a team of software engineers, including senior and principal-level contributors, fostering a culture of technical excellence and innovation
  • Partner with Engineering leaders, product managers, and partner teams to understand requirements, develop solid designs and implementations, and facilitate integration and adoption
  • Drive and enforce best practices for the software development lifecycle, including coding standards, testing, deployment, system scalability, reliability, and security, tracking key performance indicators for engineering quality and efficiency
  • Communicate team progress, successes, challenges, and strategic plans clearly and transparently to engineering leadership and other business stakeholders
  • Oversee team resources, staffing, and mentoring, and continue to build a best-in-class engineering team
  • Work closely with customers in various geographies and partner teams (like PS and support) to ensure successful adoption of Ozone and provide technical guidance for enterprise customers running big data analytics and ML/AI pipelines at the scale of hundreds of petabytes
  • Guide the team in contributing to the Apache open-source community
What we offer
  • Generous PTO Policy
  • Support work life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
  • Fulltime

Sr. Engineering Manager - Storage Engineering

At Cloudera, we empower people to transform complex data into clear and actionab...
Location: United States, San Jose
Salary: 203000.00 - 254000.00 USD / Year
Cloudera
Expiration Date: Until further notice
Requirements
  • 8+ years of experience in software engineering
  • 2+ years in an engineering management role
  • Demonstrable experience with the design, implementation, and operation of large-scale distributed systems, particularly in storage, file systems, databases, or cloud infrastructure
  • Strong understanding of fundamental storage concepts (e.g., consistency, replication, erasure coding, caching)
  • Proven track record of leading and managing high-performing engineering teams, demonstrating excellent communication and organizational skills
  • Excellent written and verbal communication skills
  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field
Job Responsibility
  • Manage and lead a team of talented engineers and senior individual contributors based in North America
  • Develop and execute on a technical roadmap and strategy for your team, aligning with the department's vision and the company's business goals
  • Lead and mentor a team of software engineers, including senior and principal-level contributors, fostering a culture of technical excellence and innovation
  • Partner with Engineering leaders, product managers, and partner teams to understand requirements, develop solid designs and implementations, and facilitate integration and adoption
  • Drive and enforce best practices for the software development lifecycle, including coding standards, testing, deployment, system scalability, reliability, and security, tracking key performance indicators for engineering quality and efficiency
  • Communicate team progress, successes, challenges, and strategic plans clearly and transparently to engineering leadership and other business stakeholders
  • Oversee team resources, staffing, and mentoring, and continue to build a best-in-class engineering team
  • Work closely with customers in various geographies and partner teams (like PS and support) to ensure successful adoption of Ozone and provide technical guidance for enterprise customers running big data analytics and ML/AI pipelines at the scale of hundreds of petabytes
  • Guide the team in contributing to the Apache open-source community
What we offer
  • Generous PTO Policy
  • Support work life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
  • Fulltime