Cloud Solution Architecture - SRE

Microsoft Corporation

Location:
India, Multiple Locations

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Do you have a passion for partnering with fast‑growing Software Development Companies (SDCs) and acting as their trusted technical advocate to ensure they receive the highest‑level experience from Microsoft? Support for Mission Critical is seeking deep technical architects aligned to SDC customers who are undergoing—or anticipating—hyper‑growth and increasing operational complexity. In this role, you will have visibility across Microsoft to help customers achieve maximum value not only from Azure, but across the broader Microsoft ecosystem, including AI, Security, M365, and Data platforms. Target customers operate at enterprise scale and require advanced capabilities in resiliency, AI‑driven systems, observability, workload expansion, and operational excellence. You will guide customers toward enhanced reliability, security, performance, capacity, and intelligent operations, correlating customer signals, telemetry, and platform events into actionable, outcome‑driven recommendations. This role emphasizes rapid response, customer advocacy, deep technical engagement, and the adoption of modern practices such as AI‑assisted operations, proactive monitoring, and resilience engineering.

Job Responsibility:

  • Customer Advocacy & Technical Leadership: Actively listen and empathize with customers to anticipate their technical and business needs, advocate for them within Microsoft, and measure success through customer satisfaction, system reliability, and operational excellence
  • Serve as a senior technical leader, driving vision for customers and internal teams; pilot new operating models, AI‑enabled capabilities, and data‑driven practices; scale proven architectures and patterns; and mentor others to elevate technical depth across the organization
  • Resiliency, Reliability & Operational Excellence: Apply a reliability‑first mindset, designing and validating highly available, fault‑tolerant systems through proactive testing, failure simulations, chaos engineering, and resilience reviews
  • Guide customers in defining and achieving SLOs, SLIs, and error budgets, with clear accountability and measurable outcomes
  • Drive continuous improvement by going beyond traditional root‑cause analysis to understand systemic, architectural, and organizational contributors to incidents
  • Monitoring, Observability & Intelligent Operations: Lead adoption of modern monitoring and observability practices, including distributed tracing, metrics, logs, and end‑to‑end service health visibility across complex, distributed systems
  • Correlate telemetry, customer signals, and platform events to produce actionable insights, risk identification, and proactive recommendations
  • Promote automation and AI‑assisted approaches for incident detection, triage, and remediation, reducing MTTR and escalation frequency
  • AI‑Enabled Architectures & Co‑Innovation: Advise customers on designing and operating AI‑enabled and data‑driven workloads, integrating Azure AI services and platform capabilities into mission‑critical architectures
  • Partner with customers and Microsoft Engineering to enable co‑innovation, applying AI to improve product reliability, operational efficiency, and customer experience
  • Cross‑Team Collaboration & Executive Engagement: Build strong bridges across Microsoft, working seamlessly with Engineering, Customer Success, Sales, and Support teams to align technical strategy with business outcomes
  • Communicate complex technical concepts in clear, actionable terms, fostering trusted relationships with customer senior decision‑makers (CTOs, product owners, engineering leaders) and Microsoft stakeholders
  • Define and execute account strategies for enterprise‑scale customers experiencing hyper‑growth, aligning priorities with organizational KPIs
  • Outcome Measurement & Impact: Measure and demonstrate success through resiliency targets, observability maturity, AI adoption, escalation prevention, customer satisfaction, impact avoidance, and business outcomes
  • Highlight outcome‑based results that clearly differentiate the value delivered by the team

Requirements:

  • Bachelor's Degree in Computer Science, Information Technology, Engineering, Business, Liberal Arts, or related field AND 7+ years of experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or consulting OR equivalent experience
  • Deep proficiency in cloud, software, ISV, or consulting ecosystems
  • Strong technical depth, including level‑500 expertise in at least one Azure domain, with broad familiarity across the Azure platform
  • Experience with AI services and the Microsoft ecosystem, including Security, M365, Data, and AI platforms
  • Proven ability to design, operate, and troubleshoot complex, highly available, mission‑critical systems and to lead customer escalations effectively
  • Demonstrated experience with monitoring, observability, and reliability engineering practices in large‑scale distributed systems
  • Software development experience, including AI‑enabled solutions, and strong understanding of DevOps and CI/CD practices
  • Exceptional communication, stakeholder management, and relationship‑building skills

Nice to have:

  • Advanced degree and/or certifications such as PMP, SRE, or equivalent are a plus
  • Experience launching products, platforms, or support offers at enterprise scale

Additional Information:

Job Posted:
March 18, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Cloud Solution Architecture - SRE

Cloud Solution Architect

Sopra Steria offers tailored technology solutions and seeks a Cloud Solution Arc...
Location:
Netherlands, Nieuwegein
Salary:
Not provided
Sopra Steria
Expiration Date:
Until further notice
Requirements:
  • Completed HBO education in the direction of IT
  • Knowledge of Kubernetes, OpenShift, cloud native (lambdas, functions)
  • Knowledge of Amazon AWS, Microsoft Azure, Red Hat
  • Experience with Terraform, Bicep, Ansible
  • Expertise in Grafana, Prometheus, OpenTelemetry, Dynatrace
  • Software delivery experience with Azure DevOps, GitHub, GitLab
  • Application deployment using GitOps, Helm, ArgoCD
  • Security tools such as SonarCloud, Snyk, Checkmarx, Dependency Track, GitLab Ultimate
  • Familiarity with Dev(Sec)Ops and SRE practices
  • Fluent in Dutch and English
Job Responsibility:
  • Design, deploy, and manage cloud-based solutions using virtualisation, containerisation, and orchestration tools like Kubernetes and Docker
  • Develop secure, compliant, efficient, and optimised cloud solutions
  • Work with cloud-native development, serverless architectures, and DevOps practices
  • Collaborate across teams and departments
  • Enhance system performance, scalability, and security
  • Drive innovation and efficiency in cloud environments
What we offer:
  • Career development opportunities through the Sopra Steria Academy
  • Mobility options including a company car
  • Insurance coverage
  • Meal vouchers
  • Eco-cheques
  • Team events
Employment Type:
Fulltime

Senior Cloud Architect

Strategic and hands-on role focused on evolving CHUB’s architecture to support m...
Location:
United States, Bothell; Bellevue
Salary:
102000.00 - 184000.00 USD / Year
T-Mobile
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in software engineering and architecture roles
  • 3+ years passionate about AWS cloud solutions
  • Strong experience designing scalable distributed systems, microservices, and event-driven architectures
  • Hands-on experience with AWS services such as RDS, Lambda/Step Functions, S3, DynamoDB, ElastiCache, Elasticsearch, and Neptune
  • Proficiency with Infrastructure as Code tools (Terraform, CloudFormation), CI/CD pipelines, and Git-based workflows
  • Proven understanding of cloud security, scalability, high availability, and cost optimization principles
  • Background in Java/Spring Boot or similar frameworks
  • Sophisticated user of observability tools (e.g., Splunk, SignalFx, CloudWatch, Prometheus, Grafana)
  • Strong communication and leadership skills; able to influence technical direction across teams
Job Responsibility:
  • Provide architectural leadership during the product intake phase—evaluating options, identifying risks, and helping define scalable solution designs
  • Own solution architecture across multiple parallel initiatives, ensuring design consistency and quality across CHUB components
  • Own the modernization of CHUB systems by driving cloud-native design and migration strategies using AWS services
  • Establish reusable cloud reference architectures aligned to AWS Well-Architected Framework and CHUB's technical vision
  • Partner with engineering, SRE, and platform teams to implement cloud infrastructure and services using Infrastructure as Code (IaC) and CI/CD automation
  • Coach and mentor engineers across CHUB in cloud architecture, security, resiliency, and observability practices
  • Participate in and lead design reviews, documentation efforts, and architectural governance processes
  • Promote technical excellence through mentorship, sharing knowledge and best practices, and documentation of architectural decisions
  • Support cloud cost optimization and recommend improvements to increase operational efficiency
  • Participate in incident reviews and root cause analysis for cloud-based systems
What we offer:
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Annual bonus or periodic sales incentive or bonus
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off and up to 12 paid holidays
  • Paid parental and family leave
Employment Type:
Fulltime

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...
Location:
France, Paris
Salary:
Not provided
Ledger
Expiration Date:
Until further notice
Requirements:
  • 8+ years of cloud engineering at scale, in organizations operating SaaS solutions
  • Proficiency in working in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, ArgoCD, Ansible, configuration management, etc.
  • Strong knowledge of observability practices, with experience implementing and managing logging, monitoring, and alerting frameworks with solutions such as Datadog or Prometheus/Grafana/Loki
  • Experience of cross-functional work and the ability to demonstrate a collaborative approach to building key relationships across the organization and defining project scope, goals, plans, and deliverables
  • Customer focused, with the ability to identify and understand both internal and external customers' needs
  • Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
  • Excellent presentation and written communication
  • Ability to deal with ambiguity, high level of pressure and rapidly changing environments
  • Engineering degree
Job Responsibility:
  • Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
  • Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines
  • Perform integration of platform software components
  • Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems
  • Influence and create standards & best practices in support of service level objectives
  • Automate key SRE metrics including SLOs/SLAs and error budgets
  • Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
  • Apply analytics on past incidents and usage patterns to predict issues and take proactive actions
  • Ensure control of technical debt and promote quality practices
  • Follow SRE and chaos engineering approaches across all strategic systems to predict and prevent outages, in coordination with Service Design, and improve solution availability
What we offer:
  • Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
  • Flexibility: A hybrid work policy
  • Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
  • Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
  • Well-being: Personal development, coaching & fitness with our dedicated partners
  • Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
  • High tech: Access to high performance office equipment and gadgets, including Apple products
  • Transport: Ledger reimburses part of your preferred means of transportation
  • Discounts: Employee discount on all our products.
Employment Type:
Fulltime

SRE DevOps Automation Engineer

SRE DevOps Automation Engineer role at Hewlett Packard Enterprise focusing on bu...
Location:
India, Hyderabad
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, engineering, information systems, or closely related quantitative discipline
  • Typically 3-6 years' experience
  • Strong experience with DevOps practices like CI/CD, infrastructure as code, containerization, and orchestration using Kubernetes
  • Strong programming skills in Python, Java, Golang, or JavaScript
  • Experience with cloud-native applications, developer tools, managed services, and next-generation databases
  • Experience in managing public cloud Infrastructure
  • Good written and verbal communication skills, and agility in a changing environment
Job Responsibility:
  • Designs and develops moderate to complex cloud application modules per feature specifications, adhering to security policies
  • Deploys cloud-based systems and application code using continuous integration/deployment (CI/CD) pipelines to automate cloud applications' management, scaling, and deployment
  • Identifies, debugs, and creates solutions for issues with code and integration into application architecture
  • Develops and executes comprehensive test plans for features adhering to performance, scale, usability, and security requirements
  • Contributes towards innovation and integration of new technologies into projects
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment Type:
Fulltime

DevOps Engineer

As a DevOps Engineer, you will be responsible for designing, implementing, and m...
Location:
Chile, Santiago
Salary:
Not provided
Topsort
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • 3+ years of experience in DevOps / Platform / SRE roles, designing and maintaining cloud architectures
  • Strong experience with AWS and infrastructure as code (Terraform, Pulumi, CloudFormation, etc.)
  • Proficiency in containers and orchestration (Docker, Kubernetes)
  • Solid experience in cloud networking and security (VPCs, subnets, ACLs, VPNs)
  • Experience with service mesh (Istio, Consul, Linkerd) and API platforms (Kong, KrakenD, Tyk, Apigee)
  • Experience implementing and maintaining CI/CD pipelines (GitHub Actions, Tekton, ArgoCD, etc.)
  • Deep knowledge of Linux/Unix and scripting (Bash)
  • Strong programming skills in Python and Golang
  • Experience with monitoring and observability tools (OpenTelemetry, Prometheus, Grafana, ELK)
Job Responsibility:
  • Design and maintain scalable cloud architectures, implementing solutions that ensure high availability and optimal performance
  • Develop and enhance our CI/CD pipelines, automating processes to increase the efficiency of the development team
  • Implement and manage monitoring and observability systems to proactively identify and resolve issues
  • Lead continuous improvement initiatives in our DevOps practices and system architecture
  • Collaborate with cross-functional teams to translate business needs into robust technical solutions
  • Mentor other team members and promote best engineering practices

SRE DevOps Automation Engineer

SRE DevOps Automation Engineer role at Hewlett Packard Enterprise focusing on cl...
Location:
India
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, engineering, information systems, or closely related quantitative discipline
  • Master's desirable
  • Typically 3-6 years' experience
  • Strong experience with DevOps practices like CI/CD, infrastructure as code, containerization, and orchestration using Kubernetes
  • Strong programming skills in Python, Java, Golang, or JavaScript
  • Experience with cloud-native applications, developer tools, managed services, and next-generation databases
  • Experience in managing public cloud Infrastructure
  • Good written and verbal communication skills, and agility in a changing environment
Job Responsibility:
  • Designs and develops moderate to complex cloud application modules per feature specifications, adhering to security policies
  • Deploys cloud-based systems and application code using continuous integration/deployment (CI/CD) pipelines to automate cloud applications' management, scaling, and deployment
  • Identifies, debugs, and creates solutions for issues with code and integration into application architecture
  • Develops and executes comprehensive test plans for features adhering to performance, scale, usability, and security requirements
  • Contributes towards innovation and integration of new technologies into projects
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
Employment Type:
Fulltime

SRE Design & Support Engineer

We are looking for a self-driven, software engineering mindset SRE engineer to •...
Location:
India, Hyderabad
Salary:
Not provided
PepsiCo
Expiration Date:
Until further notice
Requirements:
  • 8-11 years of work experience, evolving into an SRE engineer role
  • 3-5 years of experience in continuously improving and transforming IT operations ways of working
  • Bachelor’s degree in Computer Science, Information Technology or a related field
  • Proven experience as an SRE in designing the events diagnostics, performance measures and alert solutions to meet the SLA/SLO/SLIs
  • The ideal engineer will be highly quantitative, have great judgment, be able to connect dots across ecosystems, and work efficiently across teams to ensure SRE orchestration solutions are meeting customer/end-user expectations
  • The candidate will take a pragmatic approach resolving incidents, including the ability to systemically triangulate root causes and work effectively with external and internal teams to meet objectives
  • Strong expertise in SRE (Software Reliability Engineering) and IT Service Management (ITSM) processes, with a track record of improving service offerings: proactively resolving incidents, providing a seamless customer/end-user experience, and proactively identifying and mitigating areas of risk
  • Hands-on experience in Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka, and any SRE Ops toolsets
  • A firm understanding of cloud architecture for distributed environments
  • Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js
Job Responsibility:
  • Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications
  • Ensure non-functional requirements (NFRs), including SLA/SLO/SLI and error budgets, are embedded early into the product’s offerings as part of the engineering solution
  • Act as a proactive SRE support engineer, preventing P1, P2, and potential P3 incidents, diagnosing anomalies before any user is affected, and driving the necessary remediations across the teams involved in the end-to-end availability, performance, and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions
  • Collaborate with engineering and support teams, including participation in escalations and blameless postmortems
  • Work closely with customer-facing support teams to empower them with SRE insights and tooling
  • Observe, diagnose, and improve the end-to-end ecosystem performance of the modern-architected application portfolio, i.e., a technical "understanding of interactions" of a full-stack application, alongside peer SRE team members
  • Continuously optimize the L2/support operations work via SRE workflow automation
  • Shape the SRE orchestration platform design with inputs from Production Operations, Business usage & Product and engineering teams
  • Actively engage and drive AI Ops adoption across teams

Cloud Solution Architect

The Senior Cloud Solution Architect (CSA) with POD Lead responsibilities is acco...
Location:
Italy, Milano, Milan
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Proven experience in IT consulting, cloud architecture, or technical delivery leadership
  • Proven technical expertise across Microsoft Cloud services (Azure, M365, or Security solutions)
  • Ability to understand and apply AI concepts within customer and operational contexts
  • Strong project management background, including governance, reporting, and delivery lifecycle management
  • Demonstrated experience in leading distributed teams and coordinating cross-functional delivery with partners and vendors
  • Solid understanding of ITIL, DevOps, and operational frameworks (change management, incident response, SLAs, KPIs)
  • Ability to communicate effectively with executive, technical, and operational stakeholders
  • Excellent written communication and documentation skills; able to produce high-quality customer deliverables
  • Experience leading or managing large-scale delivery teams under managed services or customer success models
Job Responsibility:
  • Partner with Secured CSU leaders, delivery partners, and global/national CSA teams enabling Success Program Deliveries across our Secured customer base in your assigned markets
  • Serve as a technical authority within the POD, providing architectural guidance, best practice validation, and delivery assurance across multiple Microsoft solution areas (Azure, Security, AI, or Modern Work)
  • Conduct random delivery audits, reviewing customer deliverables for accuracy, alignment to frameworks, and technical depth
  • Lead key customer engagements to set delivery standards, ensuring technical excellence and knowledge transfer
  • Partner with SMEs and IP Leads to pilot new content and improve delivery methodologies based on field insights
  • Maintain the Rhythm of Business (ROB) cadence by facilitating bi-weekly POD syncs, monthly dashboards, and quarterly business reviews to ensure visibility, accountability, and performance tracking
  • Monitor vendor CSA utilization, engagement compliance, and CSAT performance, applying short-term corrective actions and operational optimizations based on data insights
  • Drive delivery lifecycle governance, ensuring timely kickoffs, milestone tracking, deliverables submission, and closeouts fully aligned with Success Program SLAs and documentation standards
  • Oversee workload planning and capacity management in collaboration with Resource Coordinators (RCs) and Service Delivery Managers (SDMs) to balance priorities and ensure effective allocation
  • Track and report key performance indicators (KPIs)—including utilization, CSAT, audit coverage, compliance, and billable contribution—to assess delivery health and inform decision-making
Employment Type:
Fulltime