Site Reliability Engineer 3

Trimble Inc.

Location:
India, Chennai

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a skilled and motivated Site Reliability Engineer to join our team in Trimble’s Core Cloud Platform. The ideal candidate will have a strong background in cloud platforms, infrastructure as code, and automation via programming/scripting languages. You will embed with a product delivery team to drive the reliability, scalability, and security of the team’s services and infrastructure. The Core Cloud Platform group builds the foundational common services used by dozens of Trimble products and millions of users.

Job Responsibility:

  • Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments
  • Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting
  • Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes
  • Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention
  • Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed
  • Lead incident response efforts, conduct root cause analysis, and implement long-term solutions for complex issues
  • Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks
  • Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles
  • Participate in on-call rotations and handle critical incidents with confidence and expertise
  • Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team

Requirements:

  • Bachelor's degree in a relevant field of study (e.g., Computer Science, Computer Engineering, Software Engineering, Information Technology, Information Systems)
  • 7-10 years of relevant work experience
  • Hands-on experience automating and improving processes in a software development & production environment
  • Working knowledge of one or more programming languages, such as Python, Go, JavaScript, or similar
  • Ability to evaluate and troubleshoot technical issues with an attention to detail in problem-solving
  • Interest in automation and optimization of workflows for improved efficiency
  • Effective verbal and written communication skills
  • Ability to take on new challenges, with a willingness to receive both general and detailed instructions
  • Flexibility to adapt to evolving project requirements and timelines
  • This position requires a flexible schedule, which may include early mornings, late nights, and/or weekends to meet business needs
  • This position participates in an on-call rotation for 24x7 support

Additional Information:

Job Posted:
March 13, 2026

Similar Jobs for Site Reliability Engineer 3

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location:
United States, Birmingham
Salary:
Not provided
Genuine Parts Company
Expiration Date
Until further notice
Requirements
  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer
  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay
  • Fulltime

Site Reliability Engineer 2

Join us. At PagerDuty, you'll tackle complex problems, collaborate with kind and...
Location:
Portugal, Lisbon
Salary:
Not provided
PagerDuty
Expiration Date
Until further notice
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Experience with Kubernetes and container orchestration
  • Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
  • Proficiency in at least one programming language (e.g. Python, Ruby, Go, etc.)
  • Experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
Job Responsibility
  • Deploy, configure, monitor and optimize highly available Kubernetes clusters on AWS/EKS
  • Help maintain the overall health of the platform, including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
  • Continuously strive to improve the internal developer experience and the software development lifecycle
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Participate in a 24/7 on-call rotation
What we offer
  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime

Site Reliability Engineer

We’re looking for a passionate Site Reliability Engineer to pioneer our SRE stra...
Location:
Spain, Barcelona
Salary:
45000.00 - 59000.00 EUR / Year
Edpuzzle
Expiration Date
Until further notice
Requirements
  • At least 3 years of experience in Site Reliability Engineering, DevOps Engineering, System Administration or Cloud Infrastructure Engineering for a web-based product with a focus on observability and reliability
  • Good knowledge of Amazon Web Services (AWS), CloudWatch and Datadog
  • Experience with software release management and deployment pipelines (Git, CI/CD)
  • Experience with Infrastructure as Code using AWS CDK
  • Experience writing JavaScript, TypeScript or Node.js code
  • Pragmatic with technologies: you understand tech is a tool to solve a product problem, tech is never the end goal
  • Excellent ability to communicate your ideas, regardless of the audience
  • Product-oriented: You make all your technology decisions with the final user in mind
  • You are naturally drawn towards understanding the bigger picture and recognize when there's a need for improvement, applying your intentional and rational thought process to address complex issues
  • You are able to work independently, plan and exercise conscious control of time spent on specific goals to reach deadlines effectively, and you don’t hesitate to pursue a goal despite the difficulties, all while maintaining a flexible mindset
Job Responsibility
  • Work with the Product, Infrastructure and Engineering teams to find the best technical solutions by participating in discussions and sharing your opinions
  • Take ownership of the problems that are being worked on, understanding why they are needed by the users, carrying out your own research, making your own proposals and working on the implementation while relying on your teammates for help when needed
  • Communicate effectively in a team in order to maximize productivity, ownership, and focus to help projects reach the finish line with the best possible outcome and by the project deadline
  • Design a cloud infrastructure that is secure, scalable, and highly available on AWS
  • Engage in proactive monitoring and observability with comprehensive tools and practices that not only detect and warn, but also predict potential system issues before they affect our users
  • Lead the charge in root cause analysis for production and infrastructure issues, transforming challenges into learning opportunities
  • Provision, configure and maintain cloud infrastructure as code
  • Take part in a rotating on-call service, ensuring reliability and uptime for our users
  • Write technical documentation, contributing to our technical knowledge base and empowering your peers
  • Perform other exciting duties as opportunities and needs arise.
What we offer
  • On-call compensation
  • 24 days’ paid holidays plus December 24th and 31st
  • Flexible working hours and reduced working time on Fridays to support work-life balance
  • €2000 annual allowance for meals with Cobee
  • Private health insurance policy with AXA
  • Access to Wellhub to support physical and emotional well-being
  • Flexible remuneration for childcare
  • Flexible remuneration for public transport
  • Flexible remuneration for health insurance of immediate family members (spouse and/or children)
  • Training and development (CodelyTV, Cloud Academy, etc.)
  • Fulltime

Site Reliability Engineering Manager

The Wikimedia Foundation is looking for an Engineering Manager to join our SRE t...
Location:
United States of America
Salary:
132439.00 - 208378.00 USD / Year
Wikimedia Foundation
Expiration Date
Until further notice
Requirements
  • Prior experience managing teams
  • Prior hands-on experience with software or reliability engineering (within the last 3 years preferred)
  • Ability to analyze complex systems, troubleshoot issues, and devise effective solutions under pressure
  • Proficiency in project management methodologies to effectively plan, execute, and track new and existing initiatives
  • Strong understanding of cloud computing, networking, Linux systems administration, containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible) to be able to provide technical support to the team
  • Aptitude for automation and streamlining of tasks
  • Communicate effectively in both spoken and written English
  • Ability to work independently, as an effective part of a globally distributed team
  • Ability to travel several times a year for occasional in-person meetings
  • B.S. or M.S. in Computer Science or the equivalent in related work experience
Job Responsibility
  • Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization
  • Providing guidance, mentorship, and support to ensure the team's effectiveness and growth
  • Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path
  • Recruiting, hiring, and helping onboard new team members
  • Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members
  • Coordinating and communicating with other members of the Wikimedia product & engineering teams on relevant projects, executing complex projects and contributing to the organizational strategy
  • Continuously developing the roadmap of the team in alignment with other SRE and Product & Technology teams, and helping to draft and execute the team’s annual and quarterly plans
  • Project managing new and existing initiatives
  • Leading the definition, refinement, and execution of the processes through which the team manages and performs work
  • Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure
  • Fulltime

Site Reliability Engineer

We are looking for an engineer who is passionate about scaling Cloud services to...
Location:
Australia, Sydney
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • 3+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production
  • 3+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Familiarity with Unix / Linux operating systems
  • Strong emphasis on debugging, improving code, and automating routine tasks
  • Backend engineering experience in one or more prominent languages such as Java, Go or Python
  • Strong communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
Job Responsibility
  • Scaling Cloud services
  • Owning the database infrastructure, tooling and automation that Jira Cloud runs on
  • Analyzing and improving services and processes to achieve higher levels of reliability, performance, scalability, and cost efficiency
What we offer
  • Health coverage
  • Paid volunteer days
  • Wellness resources
  • Fulltime

Site Reliability Engineer

Atlassian is looking for an engineer passionate about scaling Cloud services to ...
Location:
Australia, Sydney
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • 3+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
  • 3+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Familiarity with Unix / Linux operating systems
  • Strong emphasis on debugging, improving code, and automating routine tasks
  • Backend engineering experience in one or more prominent languages such as Java, Go or Python
  • Strong communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
Job Responsibility
  • Scaling Cloud services
  • Owning the database infrastructure, tooling and automation that Jira Cloud runs on
  • Analyzing and improving services and processes to achieve higher reliability, performance, scalability, and cost efficiency
What we offer
  • Health coverage
  • Paid volunteer days
  • Wellness resources
  • Fulltime

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...
Location:
Singapore, Singapore
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree or equivalent work experience
  • 3+ years of relevant work experience
  • Highly motivated self-starter with good interpersonal and communication skills
  • Certification or formal training in site reliability engineering concepts and practices would be beneficial
  • Prior experience working towards SLIs, SLOs and observability capabilities
  • 2+ years experience in Python alongside Linux based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience with k8s and container technologies such as Docker, OpenShift and EKS
  • Experience with Secrets products such as HashiCorp Vault or CyberArk beneficial but not essential
  • Experience with CI/CD tools such as Terraform, Jenkins, Ansible
Job Responsibility
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production level incidents till resolution.
  • Fulltime

Manager, Site Reliability Engineering and Incident Management

Planet DDS is seeking a Manager, Site Reliability Engineering and Incident Manag...
Location:
United States, Atlanta
Salary:
118000.00 - 160000.00 USD / Year
Planet DDS
Expiration Date
Until further notice
Requirements
  • 7+ years in SRE, DevOps, or Infrastructure roles
  • 3+ years in Incident Management leadership
  • Deep understanding of reliability, scalability, and performance optimization
  • Multi-cloud expertise in AWS, Azure, or GCP
  • Understanding of DNS, load balancing, firewalls, and compliance frameworks
  • Knowledge of fundamental cloud security (e.g., identity and access management, firewalls)
  • Deep understanding of logging and monitoring and security best practices
  • Strong collaboration and communication skills
  • Bachelor’s Degree in a relevant major or equivalent years of experience is a plus
Job Responsibility
  • Lead and mentor a team of SREs and Incident Managers
  • Foster a culture of reliability, accountability, and continuous improvement
  • Collaborate with engineering teams to design resilient platform architectures
  • Oversee the incident response process for outages and service disruptions
  • Ensure timely detection, escalation, and resolution of incidents
  • Drive post-incident reviews (PIRs) and root cause analysis
  • Implement improvements based on lessons learned to prevent recurrence
  • Mature and enforce best practices for incident response and runbooks
  • Automate operational tasks to reduce toil and improve efficiency
  • Maintain observability tools (monitoring, alerting, logging)
  • Fulltime