Staff Site Reliability Engineer

India, Chennai · Job Posted June 09, 2026

Job Description

Trimble is seeking a Staff Site Reliability Engineer (P4) to join our Corporate Business Systems team in Chennai. In this role, you will be a vital part of the team building the platform fueling Trimble’s digital transformation. We take a cloud-first approach to deliver customer-centric experiences and platform web services used by Trimble product teams and external partners. As a senior leader on this central team, you will provide technical and project leadership, interface with diverse software engineering groups, and support a variety of internal products and engineering divisions. This role requires quick thinking, a high degree of adaptability across different systems, and the ability to collaborate with multiple levels of staff globally.

Job Responsibility

  • Cloud Architecture & Enhancement: Develop new and enhance current shared public cloud services with a strict focus on Availability, Operations, Performance, Capacity, Security, and User Experience
  • Technical Leadership: Provide input and expertise relating to cloud hosting solutions (full infrastructure design and management). Transform business requirements into scalable operational designs
  • Collaboration & Planning: Attend and provide input on product planning sessions with internal development teams. Act as an expert on Business System services to communicate the value of our platform
  • Automation & Documentation: Identify and implement automation solutions. Develop and maintain critical documentation, including architecture diagrams, service descriptions, build/deploy processes, and operations run books
  • Mentorship & Support: Provide technical escalation and mentoring to other team members. Train operations teams to provide Level 1/2 support for shared public cloud services, acting as the ultimate Level 3 escalation point
  • Standards & Governance: Manage AWS/Azure best practice expectations and ensure alignment with corporate standards
  • Global Collaboration: Work effectively within a global team framework. Strike a balance between Indian and US time zones to attend business stakeholder meetings, address production issues, and serve as a reliable escalation point (including off-hours tasks when necessary)

Requirements

  • Bachelor’s Degree or equivalent in Computer Science, Engineering, Information Systems, or a related field
  • OR equivalent practical experience
  • Minimum of 10 years of experience in IT operations, including deep knowledge of networking, computing, and storage
  • Minimum of 5 years of experience with AWS and/or Azure cloud computing environments, with at least 2 years in an architect/design role
  • Windows and Linux deployment experience, including common services for each platform
  • Proficiency in at least one scripting language (preferably Python or PowerShell/.NET) and proficiency using Git as a source control system
  • Strong background in application operations, including Incident Management, Change Management, and Capacity Management
  • Excellent troubleshooting and problem-solving skills, knowledge of security best practices, a strong desire to learn independently, and exceptional written/verbal communication skills with a customer-service mindset

Nice to have

  • 7+ years of experience in IT with 3+ years of dedicated experience in both AWS and Microsoft Azure
  • Experience working within a regulatory change control environment (highly preferred)
  • Experience developing and deploying automation tooling
  • Experience supporting and integrating development tools, including Jira, Git, Jenkins, and Azure DevOps
  • Experience with Kubernetes deployment and operation
  • Experience with Operations tools, including Jira Service Management and various Observability & Monitoring platforms
  • Solid understanding of Software Development Life Cycle (SDLC) and Agile processes
  • Experience with AI/ML analysis tools is considered a strong asset

Similar Jobs for Staff Site Reliability Engineer

8 matching positions

Staff Site Reliability Engineer

Fivetran is building data pipelines to power the modern data stack for thousands...
Location: United States, Oakland
Salary: 196033.00 - 245041.50 USD / Year
Company: Fivetran
Expiration Date: Until further notice
Requirements
  • Expertise in managed Kubernetes (EKS, AKS, and GKE)
  • Expertise in cloud platforms and related tooling: AWS, Azure, GCP, Terraform, Ansible, Buildkite, Pulumi, and ArgoCD
  • Expertise in Python/Shell scripting
  • Expertise with Linux operating systems, internals, and administration
  • Expertise with cloud networking, such as VPNs, PrivateLink, and Private Service Connect (GCP)
  • Experience with databases such as PostgreSQL
Job Responsibility
  • Responsible for ongoing reliability and robustness of Fivetran's production infrastructure by monitoring availability, capacity, and throughput
  • Evolve systems by adding reliability into our product roadmap
  • Coordinate the reprioritization or fixing of critical bugs for support or sales requirements as needed
  • Make recommendations for production infrastructure by interfacing with engineering to ensure 100% availability
  • Ensure scalable artifact deployment to all environments through automation scripts
  • Constantly monitor infrastructure vulnerabilities and remedy them by working with the security team
What we offer
  • 100% employer-paid medical insurance
  • Generous paid time-off policy (PTO), plus paid sick time, inclusive parental leave policy, holidays, and volunteer days off
  • RSU stock grants
  • Professional development and training opportunities
  • Company virtual happy hours, free food, and fun team-building activities
  • Monthly cell phone stipend
  • Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents
Employment Type: Full-time

Staff Site Reliability Engineer

Fivetran is looking for a high-performance engineer to join a team of Site Relia...
Location: Serbia, Novi Sad
Salary: Not provided
Company: Fivetran
Expiration Date: Until further notice
Requirements
  • 7+ years of experience working with SaaS platforms at scale
  • Expertise in managed Kubernetes (EKS, AKS, and GKE)
  • Knowledge of Cloud Platforms and related tooling: AWS, Azure, GCP, Terraform, Ansible, Buildkite, Pulumi, and ArgoCD
  • Experience in Python, Shell scripting, and Go
  • Experience with Linux operating systems, internals, and administration
  • Experience with cloud networking, such as Managed NAT Gateways, VPNs, PrivateLink, and Private Service Connect (GCP)
  • Experience with databases such as PostgreSQL
Job Responsibility
  • Responsible for the ongoing reliability and robustness of Fivetran’s production infrastructure by monitoring availability, capacity, and throughput
  • Collaborate with engineering teams to integrate reliability best practices into the product roadmap
  • Support the prioritization and resolution of critical bugs identified by support or sales
  • Contribute to maintaining the high reliability and availability of production infrastructure by collaborating with engineering to implement automation for scalable deployments
  • Ensure scalable artifact deployment to all environments through automation scripts
  • Proactively monitor infrastructure vulnerabilities and collaborate with the security team to promptly address them
What we offer
  • 100% employer-paid medical insurance
  • Generous paid time-off policy (PTO), plus paid sick time, inclusive parental leave policy, holidays, and volunteer days off
  • RSU stock grants
  • Professional development and training opportunities
  • Company virtual happy hours, free food, and fun team-building activities
  • Monthly cell phone stipend
  • Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents

Staff Site Reliability Engineer

As a Staff Site Reliability Engineer, you will be a technical leader and strateg...
Location: Singapore; Australia, Singapore; Melbourne
Salary: Not provided
Company: Airwallex
Expiration Date: Until further notice
Requirements
  • 10+ years of experience in SRE, DevOps, or infrastructure engineering roles, with progressive responsibility
  • Proven ability to lead SRE strategy and execution for large-scale, complex, cross-functional projects
  • Deep expertise with cloud platforms (AWS/GCP), Kubernetes, container orchestration, observability, and incident response frameworks
  • Strong experience supporting production systems with stringent high availability, compliance, and security requirements
  • Demonstrated leadership in mentoring and growing technical teams
  • Excellent collaboration and communication skills, able to influence stakeholders at all levels
  • Degree in Computer Science or related field
Job Responsibility
  • Drive the strategic vision and roadmap for Site Reliability Engineering at Airwallex, aligned with business objectives and product goals
  • Architect and oversee the implementation of highly scalable, secure, and resilient cloud infrastructure for new services and platform-wide initiatives
  • Lead and mentor senior engineers and cross-functional teams in reliability engineering best practices, automation, and incident management
  • Champion and evolve operational excellence through advanced observability, SLO management, runbooks, and proactive risk mitigation
  • Lead incident response for high-severity incidents, facilitating post-mortems and driving continuous improvements
  • Collaborate closely with Product, Engineering, Security, and DevOps leadership to ensure compliance, resilience, and alignment across functions
  • Influence and shape engineering culture around reliability, scalability, and DevOps principles across multiple teams
  • Advocate for innovation in tooling, automation, and infrastructure to improve developer productivity and service uptime
Employment Type: Full-time

Staff Site Reliability Engineer

Ever since we started in 2007, Sunrun has been at the forefront of connecting pe...
Location: United States, Lehi
Salary: 242050.00 USD / Year
Company: Sunrun
Expiration Date: Until further notice
Requirements
  • Bachelor’s in Computer Information Systems, Software Engineering, or a closely related field
  • 5 years of experience as a Software Developer using Microservices hosted in Azure
  • 5 years of experience with Virtualization and cloud computing
  • 5 years of experience with Object-Oriented Design (OOD) and Object-Oriented Programming (OOP)
  • 5 years of experience building software solutions in an engineering environment using Python & Shell scripting
  • 5 years of experience with Network analysis, debugging and troubleshooting with Wireshark & Git
Job Responsibility
  • Provide strategic leadership in designing, implementing, and managing the overall infrastructure strategy for our organization
  • Leverage cloud platforms (e.g., AWS, Azure) to design, deploy, and manage scalable infrastructure solutions
  • Spearhead the definition of advanced monitoring requirements and elevate SLAs
  • Collaborate with the engineering team and TPM to implement and enhance monitoring practices
  • Expertly convey intricate technical information to diverse stakeholders with clarity and precision
  • Provide leadership in integrating advanced SRE principles into applications and services
  • Lead the implementation of sophisticated system design measures for heightened security, performance, and resiliency
  • Develop notification strategies for production outages
  • Leverage SLOs and SLIs to measure and optimize availability, latency, and response time
  • Lead and strategize emergency response efforts, conduct retrospectives with RCA, and manage on-call workloads effectively
What we offer
  • Medical/Dental/Vision Insurance
  • Life Insurance
  • Disability Insurance
  • 401k Plan + Company Match
  • Stock Purchase Plan
  • Paid Vacations/Holidays
  • Paid Baby Bonding Leave
  • Employee Discounts
  • PowerU - 100% Funded Education Programs
  • Employee Donation Matching
Employment Type: Full-time

Staff Site Reliability Engineer

Join our Site Reliability Engineering (SRE) team and help ensure the reliability...
Location: United States
Salary: 220000.00 - 325000.00 USD / Year
Company: Replit
Expiration Date: Until further notice
Requirements
  • 8-10 years of experience in Site Reliability Engineering or similar roles (e.g., DevOps, Systems Engineering, Infrastructure Engineering)
  • Strong programming skills in languages like Python or Go
  • Deep understanding of distributed systems
  • Deep experience with container orchestration platforms, specifically Kubernetes, and cloud-native technologies
  • Proven track record of designing, implementing, and maintaining sophisticated monitoring and observability solutions
  • Strong incident management skills with extensive experience leading incident response for complex systems
  • Experience with infrastructure as code (e.g., Terraform, Pulumi) and configuration management tools
  • Excellent written and verbal communication skills
  • Strong interpersonal skills, with experience working with and mentoring engineers
  • A willingness to dive into understanding, debugging, and improving any layer of the stack
Job Responsibility
  • Architect and Implement Observability
  • Define and Drive Reliability Standards
  • Lead Incident Management and Response
  • Drive Automation and Infrastructure as Code
  • Optimize Performance on Kubernetes
  • Debug and Harden Distributed Systems
  • Provide Staff-Level Guidance
  • Educate and Mentor
  • Build and Integrate
What we offer
  • Competitive Salary & Equity
  • 401(k) Program with a 4% match
  • Health, Dental, Vision and Life Insurance
  • Short Term and Long Term Disability
  • Paid Parental, Medical, Caregiver Leave
  • Commuter Benefits
  • Monthly Wellness Stipend
  • Autonomous Work Environment
  • In Office Set-Up Reimbursement
  • Flexible Time Off (FTO) + Holidays
Employment Type: Full-time

Staff Site Reliability Engineer

We are looking for a Site Reliability Engineer to own our internal systems infra...
Location: United States, Sunnyvale
Salary: 175000.00 - 250000.00 USD / Year
Company: Figure
Expiration Date: Until further notice
Requirements
  • Strong experience with Linux/Unix systems administration
  • Proficiency in programming/scripting
  • Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
  • Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
  • Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
  • Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
  • Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
  • Experience defining Service Level Objectives (SLO), developing runbooks/incident response plans, facilitating post-mortems and managing systems assets
  • Ability to work in cross-functional teams with developers, infra, and product teams
  • Excellent verbal and written communication skills
Job Responsibility
  • Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing, and more
  • Migrate SaaS to self-hosted solutions to enhance security and reliability
  • Implement monitoring and alerting systems, and define incident response plans and runbooks
  • Reduce human workload by automating deployment and scaling
  • Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives
  • Use a data-driven approach to demonstrate service robustness and track optimization work
  • Partner with the security team to ensure that security remediations and updates are applied in a timely manner
Employment Type: Full-time

Staff Site Reliability Engineer

Affirm is reinventing credit to make it more honest and friendly, giving consume...
Location: Spain
Salary: 101000.00 - 131000.00 EUR / Year
Company: Affirm
Expiration Date: Until further notice
Requirements
  • 8+ years of experience designing, developing, and launching backend systems at scale, and serving as a subject-matter point of reference for them, using scripting and development languages like Bash, Python, or Kotlin
  • Extensive track record of developing highly available distributed systems using technologies like AWS, MySQL, Spark and Kubernetes
  • Track record of managing, driving, and improving the Incident Lifecycle process, from live incident management through retrospective and post-incident analysis, to provide actionable insights that enhance overall system reliability, resilience, and performance
  • 7+ years of experience in Site Reliability or Production Engineering teams
  • Experience delivering major features, system components or deprecating existing functionality in a system through the definition of a technical and execution plan
  • Ability to write high quality code that is easily understood and used by others
  • Strong verbal and written communication skills that support effective collaboration with our global engineering team and key stakeholders of an organization
  • Equivalent practical experience or a Bachelor’s degree in a related field
  • Based in Spain for the role
Job Responsibility
  • Set the technical strategy and vision for your team on a multi-year time scale, and help your team tie it together with critical, business-impacting projects
  • Collaborate across teams in the product development lifecycle by working with infrastructure, product management, developer experience & analytics to ensure technical sustainability, risks, and trade-offs are well understood and managed
  • Act as a force-multiplier for your team through your definition and advocacy of technical solutions and operational processes
  • Take ownership of your team’s operations and availability by ensuring you have the right monitoring, triage rotations, playbooks, policies, testing and alerting in place to support “keep the lights on” & on-call efforts
  • Foster a culture of quality and ownership on your team by setting code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks
  • Help develop talent on your team by providing feedback and guidance, and leading by example
  • Participate in an on-call rotation
What we offer
  • Flexible Spending Wallets for tech, food and lifestyle
  • Away Days - wellness days to take off work and recharge
  • Learning & Development programs
  • Parental benefit
  • Employee Resource & Community Groups
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Visa sponsorship
Employment Type: Full-time

Staff Site Reliability Engineer

Site Reliability Engineering at Affirm is a small, yet crucial, team that helps ...
Location: Poland
Salary: 358000.00 - 458000.00 PLN / Year
Company: Affirm
Expiration Date: Until further notice
Requirements
  • 8+ years of experience designing, developing, and launching backend systems at scale, and serving as a subject-matter point of reference for them, using scripting and development languages like Bash, Python, or Kotlin
  • Extensive track record of developing highly available distributed systems using technologies like AWS, MySQL, Spark and Kubernetes
  • Track record of managing, driving, and improving the Incident Lifecycle process, from live incident management through retrospective and post-incident analysis, to provide actionable insights that enhance overall system reliability, resilience, and performance
  • 7+ years of experience in Site Reliability or Production Engineering teams
  • Demonstrate curiosity with empathy, and strong opinions loosely held
  • Experience delivering major features, system components or deprecating existing functionality in a system through the definition of a technical and execution plan
  • Write high quality code that is easily understood and used by others
  • Thrive in ambiguity, and are comfortable moving from low level language idioms all the way to the architecture of large systems to understand how they work
  • Growth and impact trajectory demonstrates that you have mastered gathering and iterating on feedback from your engineering and cross-functional peers
  • Strong verbal and written communication skills that support effective collaboration with our global engineering team and key stakeholders of an organization
Job Responsibility
  • Set the technical strategy and vision for your team on a multi-year time scale, and help your team tie it together with critical, business-impacting projects
  • Collaborate across teams in the product development lifecycle by working with infrastructure, product management, developer experience & analytics to ensure technical sustainability, risks, and trade-offs are well understood and managed
  • Act as a force-multiplier for your team through your definition and advocacy of technical solutions and operational processes
  • Take ownership of your team’s operations and availability by ensuring you have the right monitoring, triage rotations, playbooks, policies, testing and alerting in place to support “keep the lights on” & on-call efforts
  • Foster a culture of quality and ownership on your team by setting code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks
  • Help develop talent on your team by providing feedback and guidance, and leading by example
What we offer
  • Flexible Spending Wallets for tech, food and lifestyle
  • Away Days - wellness days to take off work and recharge
  • Learning & Development programs
  • Parental leave
  • Employee Resource & Community Groups
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
Employment Type: Full-time