
Critical Infrastructure Platform Engineer

India, Hyderabad · Job Posted May 10, 2026

Job Description

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day, and we need you as a Critical Environment Platform Engineer.

Microsoft's Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. As a CIPS Platform Engineer, you will perform a key role in delivering the core infrastructure and foundational technologies for Microsoft's online services including Bing, Office 365, Xbox, OneDrive, and the Microsoft Azure platform. You will be part of a team that is responsible for operating the Critical Infrastructure Platform across our unified global datacenters; managing demand planning and capacity utilization; and running the physical infrastructure (including supply chain, hardware, power, security, and workflow teams). We emphasize automation, data-driven engineering, cost-effectiveness, and environmental sustainability.

You will join a team of Platform Engineers who are passionate about designing, building, and operating the world's most advanced cloud infrastructures. You will work on cutting-edge technologies and projects that enable Microsoft to deliver innovative solutions and services to our customers while collaborating with other teams across the company and learning from the best in the industry. If you are looking for a challenging and rewarding career, this is the role for you. This is a flexible work opportunity offering the option of partial remote work from home.

As a group, CO+I is focused on the personal and professional development of all employees and offers training and growth opportunities including Career Rotation Programs, Diversity & Inclusion trainings and events, and professional certifications.

Our infrastructure comprises a large global portfolio of more than 150 datacenters and 1 million servers. Our foundation is built upon and managed by a team of subject matter experts working to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide. With environmental sustainability and optimization at the forefront of our datacenter design and operations, we continue to grow and evolve as we meet the ever-changing business demands that hold Microsoft as a world-class cloud provider. Do you want to empower billions across the world? Come and join us in CO+I and be at the forefront of the action!

Job Responsibility

  • Support a hybrid Continuous Integration (CI)/Continuous Delivery/Deployment (CD) DevOps virtualized infrastructure consisting of Windows & Linux Server operating systems, Hyper-V, Active Directory, Domain Name System (DNS), and PowerShell scripting, with a focus on data protection technologies, service metrics, and Key Performance Indicator (KPI) reporting; this calls for documentation skills, interpersonal awareness, proactivity, and a proven ability to manage and drive delivery of multiple simultaneous cross-group dependent server and service-based projects at scale
  • Perform, maintain, and continuously improve automated operating system installation and configuration
  • Configure and maintain hands-on bare metal enterprise-class server systems, including automated periodic firmware and driver updates at scale, Redundant Array of Independent Disks (RAID) and Intelligent Platform Management Interface (IPMI) configurations and hardware troubleshooting
  • Monitor servers, ensure service levels and KPIs are met and maintained, provide security-conscious outcomes that maintain compliance alignment, and ensure 24/7/365 service and infrastructure operations support with continuous optimizations
  • Transition manual operational processes to automation while leveraging CI/CD DevOps principles
  • Partner with a global team while helping bring projects to successful outcomes and delivering rigorous documentation artifacts
  • Share responsibility for automating, securing, configuring, and delivering support of the Critical Environments infrastructure & related programs and projects in existing and future datacenters
  • Embody our culture and values

Requirements

  • Bachelor's Degree or Trade Certification in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field OR equivalent experience of 1 to 3 years
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check

Nice to have

  • Bachelor's Degree in Computer Science, Information Technology, or related field AND an internship in software engineering, network engineering, service engineering, or systems engineering OR equivalent experience
  • Familiarity with enterprise large-scale cloud or distributed systems
  • Basic understanding of Windows Server and UNIX-based operating systems, including enterprise-level Linux distributions, covering automated system installation and configuration, file system concepts, resource monitoring, user administration, package management, and process control & management
  • Demonstrated basic understanding of configuration management with PowerShell DSC, Puppet, Chef, or similar
  • Project or lab work with server & tooling platforms for multiple infrastructure services such as Windows Server 2022/2019, Linux Server distributions, Hyper-V, Azure Arc, MIT Kerberos, Active Directory, DNS, PowerShell/Python infrastructure-as-code scripting, data protection technologies, service monitoring, incident alarming, compliance, and other services to aid in fulfilling operational monitoring, audit logging of events, compliance requirements, outages, or security-related issues


Similar Jobs for Critical Infrastructure Platform Engineer

8 matching positions

Platform Engineer - Infrastructure Runtime

Location: Spain, Barcelona
Salary: Not provided
Company: Delivery Hero
Expiration Date: Until further notice
Requirements
  • 2+ years of relevant infrastructure experience, ideally in a platform team
  • Experience working with Kubernetes internals, observability, and cloud-native architecture (EKS a plus)
  • Familiarity with Golang and/or Python programming
  • Hands-on experience with AWS services (EC2, S3, IAM, VPC, etc.) and Terraform
  • Understanding GitOps principles and tooling (Argo CD, Argo Rollouts, or similar)
  • Experience with CI/CD systems (GitHub Actions or similar)
  • Basic networking and security knowledge (VPCs, DNS, ingress, TLS, etc.)
  • Comfortable working on technical projects and collaborating across teams
  • Analytical mindset for troubleshooting and improving performance of distributed systems
  • Strong written and verbal communication skills in English
Job Responsibility
  • Support technical projects that evolve our Kubernetes-based compute platform on AWS, with a focus on reliability, scalability, and developer productivity
  • Help iterate on our GitOps workflows using Argo CD and Argo Rollouts for safe, automated, and progressive deployments
  • Maintain and improve CI/CD best practices by building and maintaining scalable GitHub Actions pipelines
  • Work with secure, multi-tenant infrastructure using Infrastructure as Code (Terraform)
  • Troubleshoot and help resolve challenging problems in distributed systems, service discovery, container orchestration, and platform observability
  • Take ownership of critical compute and networking infrastructure to ensure high performance, availability, and cost efficiency
  • Build and maintain internal tooling and automation scripts using Go and Python
  • Ensure our systems remain robust, reliable, and support smooth business operations
  • Collaborate with platform and product engineers to propagate best practices and platform knowledge
  • Share learnings with the team through documentation, pairing, and technical discussions
What we offer
  • An enticing equity plan that lets you own a piece of the action
  • Top-notch private health insurance to keep you at your peak
  • Monthly Glovo credit to satisfy your cravings
  • Cobee discounts on transportation, food, and even kindergarten expenses or office-based nursery
  • Discounted gym memberships to keep you energized
  • The freedom to work from home two days a week, and the opportunity to work from anywhere for up to three weeks a year and personal days off
  • Enhanced parental leave
  • Online therapy and wellbeing benefits to ensure your mental well-being
  • Fulltime

Staff Software Engineer, Platform Infrastructure

We are seeking an experienced and highly motivated Staff Software Engineer to le...
Location: United States, Pittsburgh
Salary: 171000.00 - 273000.00 USD / Year
Company: Aurora Innovation
Expiration Date: Until further notice
Requirements
  • Senior or Staff-level experience (P7 equivalent) as a Software Engineer, ideally in infrastructure, developer tooling, or critical shared services
  • Proven experience leading technical projects and mentoring/directing other engineers
  • Familiarity with distributed compute technologies, cloud services (e.g., AWS), and large-scale workflow management systems
  • Demonstrated ability to triage, debug, and perform on-call and incident management for complex, cross-cutting infrastructure issues
  • Strong communication skills to manage stakeholder alignment and drive cross-team standardization efforts
Job Responsibility
  • Lead the OTI Team: Serve as the technical lead (TL) for the OTI team within PIE-Compute, driving the strategic vision, execution, and long-term stability of the core infrastructure
  • Help Define and Optimize the Testing Ecosystem: Lead the design of the next-generation offline testing architecture to meet diverse team needs, reducing redundancy and siloing across the organization
  • Partner with Test Creation and Test Drive teams to standardize end-to-end test execution and reporting (Creation -> Execution -> Reporting)
  • Refine the full test lifecycle to ensure performance and scalability, and maintain clear attribution of failures to enhance reliability and efficient debugging
  • Own Critical OTI Components and Migrations: Take ownership of the shared OTI components, including maintenance and on-call support
  • Own various offline test modalities, including step code, workflow code, and general health
  • Lead the maintenance and development of common OTI tooling, including launching test evaluations, polling APIs, communicating results, and providing recommended pipeline templates
  • Establish Architecture and Best Practices: Define and enforce data management policies for the testing ecosystem (storage, lifecycling, write strategies, data integrity, and lineage)
  • Define use cases and feature design for new test modalities, including single versus cross-modality testing strategies
  • Manage incidents related to offline tests and maintain Standard Operating Procedures (SOPs) for PRs, local workflows, V&V, and releases
What we offer
  • annual bonus
  • equity compensation
  • benefits
  • Fulltime

Staff Software Engineer, Platform Infrastructure

We are seeking an experienced and highly motivated Staff Software Engineer to le...
Location: United States, Mountain View
Salary: 189000.00 - 303000.00 USD / Year
Company: Aurora Innovation
Expiration Date: Until further notice
Requirements
  • Senior or Staff-level experience (P7 equivalent) as a Software Engineer, ideally in infrastructure, developer tooling, or critical shared services
  • Proven experience leading technical projects and mentoring/directing other engineers
  • Familiarity with distributed compute technologies, cloud services (e.g., AWS), and large-scale workflow management systems
  • Demonstrated ability to triage, debug, and perform on-call and incident management for complex, cross-cutting infrastructure issues
  • Strong communication skills to manage stakeholder alignment and drive cross-team standardization efforts
Job Responsibility
  • Lead the OTI Team: Serve as the technical lead (TL) for the OTI team within PIE-Compute, driving the strategic vision, execution, and long-term stability of the core infrastructure
  • Help Define and Optimize the Testing Ecosystem: Lead the design of the next-generation offline testing architecture to meet diverse team needs, reducing redundancy and siloing across the organization
  • Partner with Test Creation and Test Drive teams to standardize end-to-end test execution and reporting (Creation -> Execution -> Reporting)
  • Refine the full test lifecycle to ensure performance and scalability, and maintain clear attribution of failures to enhance reliability and efficient debugging
  • Own Critical OTI Components and Migrations: Take ownership of the shared OTI components, including maintenance and on-call support
  • Own various offline test modalities, including step code, workflow code, and general health
  • Lead the maintenance and development of common OTI tooling, including launching test evaluations, polling APIs, communicating results, and providing recommended pipeline templates
  • Establish Architecture and Best Practices: Define and enforce data management policies for the testing ecosystem (storage, lifecycling, write strategies, data integrity, and lineage)
  • Define use cases and feature design for new test modalities, including single versus cross-modality testing strategies
  • Manage incidents related to offline tests and maintain Standard Operating Procedures (SOPs) for PRs, local workflows, V&V, and releases
What we offer
  • annual bonus
  • equity compensation
  • benefits
  • Fulltime

Senior Systems Engineer - Infrastructure & Platform Reliability

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...
Location: United States, San Francisco; San Jose
Salary: 206000.00 - 310000.00 USD / Year
Company: Lambda
Expiration Date: Until further notice
Requirements
  • Have a keen interest in system design, architecting for performance, scalability, and experience with multiple cloud infrastructure platforms (AWS, GCP, Azure, etc.)
  • Think carefully about systems: edge cases, failure modes, behaviors, and specific implementations
  • Know and prefer configuration management systems and toolchains (Chef, Ansible, Terraform, GitHub Actions, etc.)
  • Have solid programming skills: Python, Go, etc.
  • Have an urge to collaborate and communicate asynchronously, combined with a desire to record and document issues and solutions
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it
  • Have an urge for delivering quickly and effectively, and iterating fast
Job Responsibility
  • Design, write, and deliver software and services to improve the availability, scalability, reliability, and efficiency of Lambda’s internal IT systems and platforms
  • Solve problems relating to mission critical services and build automation to prevent problem recurrence with the goal of automating response to all non-exceptional events
  • Work with Lambda Engineering and internal teams to influence and create new designs, architectures, standards, and methods for large-scale distributed systems
  • Engage in service capacity planning and demand forecasting, software performance analysis, and system tuning
  • Be an excellent communicator, producing documentation and related artifacts for the systems you are responsible for
What we offer
  • Generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan that we all actually use
  • Fulltime

Senior Software Engineer - Platform Infrastructure

We are seeking a Senior Software Engineer II to architect, build, and operate se...
Location: United States
Salary: 192200.00 - 225810.00 USD / Year
Company: Confluent
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in software engineering, SRE, or security engineering roles, with significant experience operating security platform services
  • Strong backend software development experience (Go, Java, Rust, Python)
  • Expertise with distributed systems, cloud infrastructure (AWS, GCP, Azure), Kubernetes, service mesh, and container orchestration
  • Strong understanding of security domains: IAM, OAuth2, OIDC, PKI, secrets management, policy engines, audit pipelines, zero trust architecture
  • Experience building highly reliable, observable, and resilient production systems
  • Operational expertise: SLOs, SLIs, error budgets, on-call leadership, incident management
  • Strong collaboration skills to drive alignment across engineering, security, and compliance stakeholders
  • Excellent communication skills with ability to influence technical and business leaders
  • BS, MS, or PhD in computer science or a related field, or equivalent work experience
Job Responsibility
  • Architect, design, and develop platform services with a strong focus on scalability, security, and developer experience
  • Lead operational design for reliability: build comprehensive observability, monitoring, and incident response automation into security-critical services
  • Build automation and tooling to drive self-healing systems, proactive risk detection, failure recovery, and continuous resilience testing
  • Collaborate with compliance, governance, and risk teams to translate regulatory and policy requirements into scalable technical controls
  • Lead technical design reviews, security architecture reviews, and incident postmortems for platform-level incidents
  • Mentor engineers across multiple disciplines on both security and operational best practices
  • Own end-to-end delivery of services: from initial design and development through deployment, production hardening, and lifecycle maintenance
What we offer
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Offers Equity
  • Fulltime

Senior-Staff Software Engineer, Platform Infrastructure

As a Senior Software Engineer on this team, you will help architect, design and ...
Location: United States, San Mateo
Salary: 130000.00 - 280000.00 USD / Year
Company: Verkada
Expiration Date: Until further notice
Requirements
  • Must have a BS, MS, or PhD in Computer Science, or similar technical field of study
  • Experience and enthusiasm for learning about new infrastructure products, features, and strategies
  • Comfortable with working at the frontier of infrastructure and software development
  • Experience in Python and/or Go
  • Experience with one of the major cloud platforms (preferably AWS)
  • Strong written and verbal communications
Job Responsibility
  • Identify and lead critical efforts related to scalability, reliability and efficiency
  • Influence the features and direction of our platform with your own ideas
  • Provide technical support for engineers on the team
  • Align with product and org objectives, and coordinate with cross-functional teams on delivering key results
What we offer
  • Healthcare programs that can be tailored to meet the personal health and financial well-being needs - Premiums are 100% covered for the employee under at least one plan and 80% for family premiums under all plans
  • Nationwide medical, vision and dental coverage
  • Health Saving Account (HSA) with annual employer contributions and Flexible Spending Account (FSA) with tax saving options
  • Expanded mental health support
  • Paid parental leave policy & fertility benefits
  • Time off to relax and recharge through our paid holidays, firmwide extended holidays, flexible PTO and personal sick time
  • Professional development stipend
  • Fertility stipend
  • Wellness/fitness benefits
  • Healthy lunches provided daily
  • Fulltime

Senior ML Infrastructure Engineer, Inference Platform

About the Team: The ML Inference Platform is part of the AV ML Infrastructure or...
Location: United States, Austin, Texas; Mountain View, California; Sunnyvale, California
Salary: 155420.00 USD / Year
Company: General Motors
Expiration Date: Until further notice
Requirements
  • 5+ years of industry experience, with focus on machine learning systems or high performance backend services
  • Expertise in either Python, C++ or other relevant coding languages
  • Expertise in ML inference and model serving frameworks (Triton, Ray Serve, vLLM, etc.)
  • Strong communication skills and a proven ability to drive cross-functional initiatives
  • Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities
Job Responsibility
  • Design and implement core platform backend software components
  • Collaborate with ML engineers and researchers to understand critical workflows, parse them to platform requirements, and deliver incremental value
  • Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms for highly optimized use of accelerators
  • Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services
  • Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques
  • Lead technical initiatives across GM’s ML ecosystem
  • Raise the engineering bar through technical leadership, establishing best practices
  • Contribute to open source projects
  • Represent GM in relevant communities
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • Fulltime

Critical Environment Platform Engineer

As a Critical Environment Platform Engineer, you will perform a key role in deli...
Location: United States, Redmond
Salary: 84200.00 - 165200.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree or Trade Certification in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • Bachelor's Degree in Computer Science, Information Technology, or related field AND an internship in software engineering, network engineering, service engineering, or systems engineering OR equivalent experience
  • Familiarity with enterprise large-scale cloud or distributed systems
  • Demonstrated basic understanding of configuration management with PowerShell DSC, Puppet, Chef, or similar
  • Project or lab work with server & tooling platforms for multiple infrastructure services such as Windows Server 2022/2019, Linux Server distributions, Hyper-V, Azure Arc, MIT Kerberos, Active Directory, DNS, PowerShell/Python infrastructure-as-code scripting, data protection technologies, service monitoring, incident alarming, compliance, and other services
  • Basic understanding of Windows Server operating systems, covering automated system installation and configuration, file system concepts, resource monitoring, user administration, package management, and process control & management
Job Responsibility
  • Support a hybrid Continuous Integration (CI)/Continuous Delivery/Deployment (CD) DevOps virtualized infrastructure consisting of Windows & Linux Server Operating System, Hyper-V, Active Directory, Domain Name System, PowerShell scripting, with a focus on data protection technologies, service metrics and Key Performance Indicator (KPI) reporting, documentation skills
  • Perform, maintain, and continuously improve automated operating system installation and configuration
  • Configure and maintain hands-on bare metal enterprise-class server systems, including automated periodic firmware and driver updates at scale, Redundant Array of Independent Disks (RAID) and Intelligent Platform Management Interface (IPMI) configurations and hardware troubleshooting
  • Monitor servers, ensure service levels and KPIs are met and maintained, provide security-conscious outcomes that maintain compliance alignment, and ensure 24/7/365 service and infrastructure operations support with continuous optimizations
  • Transition manual operational processes to automation while leveraging CI/CD DevOps principles
  • Partner with a global team while helping bring projects to successful outcomes and delivering rigorous documentation artifacts
  • Share responsibility for automating, securing, configuring, and delivering support of the Critical Environments infrastructure & related programs and projects in existing and future datacenters
  • Embody our culture and values
What we offer
  • Career Rotation Programs
  • Diversity & Inclusion trainings and events
  • professional certifications
  • Fulltime