SRE Developer Job at Wissen (Bangalore South)

SRE (Developer Relations)

Location

Japan, Tokyo (23 wards)

Salary:

7000000.00 - 10000000.00 JPY / Year

Randstad

Expiration Date

Until further notice

Requirements

Fluent in English
Minimum 4 years of experience as an SRE engineer or Infrastructure Engineer
Experience in consulting or forward deployed engineering (FDE)
Experience with Kubernetes
Experience with debugging, problem solving, and resolving incidents
Experience with application development
Experience in multiple widely-used programming languages
Experience in AWS, GitHub, JIRA/Confluence, Slack, Linux (bash, CLI)

What we offer

Health insurance
Employees' pension insurance
Employment insurance

Fulltime

SRE Ansible developer

Location

Canada, Toronto

Salary:

155000.00 USD / Year

Realign

Expiration Date

Until further notice

Requirements

Design and implement automation scripts using Ansible for infrastructure provisioning and configuration management
Develop and maintain monitoring solutions leveraging Dynatrace for application and system performance
Configure and optimize ITRS monitoring tools to ensure proactive alerting and incident management
Collaborate with development and operations teams to improve system reliability and scalability
Automate deployment pipelines and integrate with CI/CD processes for faster releases
Troubleshoot performance issues and implement solutions to enhance system resilience
Ensure compliance with security and operational standards across environments
Document automation workflows, monitoring configurations, and best practices for knowledge sharing
Total Experience: 6-8 years

Fulltime

Python Developer - Site Reliability Engineering (SRE)

We are seeking a skilled Python Developer with experience in the Site Reliabilit...

Location

Canada, Montreal

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

3+ years of experience with Python development
6 years of experience working with Infrastructure as Code (Terraform and Ansible)
Experience with CI/CD pipelines, preferably GitHub Actions and Jenkins
Strong understanding of object-oriented design and development principles
Proficiency in Linux/Unix environments
Experience working with database technologies (preferably NoSQL), including data modeling, testing, and performance tuning
Ability to write reusable, optimized, maintainable, and well‑documented code following industry best practices
Experience implementing open-source monitoring and observability tools such as Prometheus, Grafana, Splunk, or OpenTelemetry
Strong problem‑solving skills and ability to take ownership of tasks and drive them independently to closure
Understanding of networking concepts (TCP/IP, DNS, Load Balancing)

Job Responsibility

Develop quality software working with public cloud service provider (CSP) infrastructure across different Public Cloud areas
Develop, enhance, and integrate automation workflows for Public Cloud Service Providers (CSP), initially focused on Azure, and integrate with in-house tooling
Integrate automation workflows into CI/CD pipelines using GitHub Actions and Jenkins
Build proof-of-concept solutions in new areas of cloud and automation development
Provide technical support and debugging for application failures in both on-premises and cloud environments
Participate in all phases of the Software Development Life Cycle (SDLC), including analysis, design, coding, testing, and deployment
Evaluate, onboard, and implement emerging DevOps and automation tools to improve efficiency
Build and integrate observability into cloud platforms and solutions using open-source tools (Prometheus, Grafana, OpenTelemetry)
Identify, highlight, and reduce operational toil through automation, architectural improvements, and process optimization
Collaborate with global teams to understand requirements, develop high‑quality code, and deliver cloud-focused projects

Principal Site Reliability Engineer

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest ...

Location

United States, Santa Clara

Salary:

151600.00 - 245300.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

BS or MS in Computer Science or a related field, or equivalent professional or military experience
Expertise in configuration management with a framework such as Ansible, Terraform, Helm, or Kubernetes
Proficient in Python and/or Go
Expertise in managing applications in the Kubernetes cluster with autoscaling enabled
Experience in Production Engineering, DevOps, or Site Reliability
Expertise in the public cloud (GCP or AWS), especially in GCP
Strong Linux administration, internals, and network troubleshooting
Proficiency with programming languages like Python, Golang, and shell scripting to automate tasks
Experience with CI/CD pipelines, GitLab, and GitHub preferred
Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions

Job Responsibility

Contribute to the success of SRE and DevOps
Develop expertise in new technologies
Work with developers, researchers, data scientists, and security experts
Design, build, and operate reliable, secure Cloud infrastructure
Ensure that applications are production-ready, scalable, and reliable
Develop tools and automation frameworks
Automate the deployment of robust services
Orchestrate end-to-end monitoring and alerting
Participate with SRE and Dev teams in the on-call rotation
Lead root cause analysis of critical business and production issues

Fulltime

Lead Database Reliability Engineer

As our Lead Database Reliability Engineer, you'll support our products, Timely a...

Location

New Zealand

Salary:

Not provided

EverCommerce

Expiration Date

Until further notice

Requirements

Strong ability to work autonomously, prioritise effectively, and make sound technical decisions while knowing when to seek support or collaboration
Deep expertise in database reliability, performance, and management, with the ability to mentor and coach others
Strong experience with relational databases, ideally SQL Server and Azure SQL, with exposure to Oracle environments
Advanced T-SQL skills and a passion for solving complex data challenges
Experience with, or willingness to learn, scripting, object-oriented programming, and infrastructure-as-code technologies such as Python, PowerShell, and Terraform
Focus on delivering scalable, reliable solutions with strong practices across monitoring, alerting, automation, documentation, and knowledge sharing
Experience working in agile environments using SCRUM and/or Kanban methodologies
Confident communicator who enjoys collaborating, contributing ideas, and engaging in healthy discussions around data practices and engineering improvements

Job Responsibility

Drive data strategy and data practices for how our databases work, scale and are used
Capacity planning and performance tuning of the database platforms
Carry out database-related project work (e.g., writing migration scripts, stored procedures, DDL, etc.)
Manage high risk data deployments and post-deploy monitoring
Assist development teams in a consultancy role for identifying risks and proposing solutions around database reliability
Work with the data team and product teams to constantly improve data operational engineering practices
Maintain awareness of trends and emerging technologies in relevant fields and propose them to Wellness Solutions where they fit
Advocate for and apply DevOps and SRE principles across Wellness engineering teams
Grow and mentor other Data professionals

What we offer

Work-life balance
Additional annual leave
Flexibility to work from home or the office
High-spec home office setup
Professional development budget
Annual wellness allowance

Fulltime

Senior Software Engineer - Kubernetes & ServiceMesh

Join us in building Roku’s next-generation cloud-agnostic platform that powers K...

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

Strong hands-on experience with cloud technologies (AWS preferred; GCP or Azure is a plus), specifically in architecting and managing performant, large-scale systems handling significant traffic/data
Deep knowledge of Kubernetes (EKS, GKE, AKS, or similar) and service mesh technologies
Proficiency in Go or another programming language, and in Python or another scripting language
Experience designing infrastructure and building automation tools, while collaborating with internal team members and external stakeholders
Experience building CI/CD pipelines and following modern deployment practices
Familiarity with observability tools (Prometheus, Thanos, Loki, Grafana, etc.)
Ability to work independently and communicate effectively with technical and non-technical stakeholders
Passion for learning and solving complex infrastructure challenges
Experience integrating AI tools to improve processes and reduce operational toil (a plus)

Job Responsibility

Architect, design, and deploy Roku’s next-generation cloud platform and service mesh
Build and own solutions to Roku's compute problems using Docker, Kubernetes, Istio/Envoy, Terraform and scripting to evolve our tech stack and deployments
Proactively drive the research and implementation of new technologies to enhance scalability, reliability, and developer experience
Integrate security best practices into infrastructure design and automation
Build tooling to visualize inefficiencies and optimize costs across shared-tenancy clusters, including network traffic insights, cross-cluster communication efficiency, and cost attribution
Collaborate with internal teams to migrate workloads to Kubernetes + Istio, leveraging open-source observability tools
Work closely with the Observability team to scale monitoring and logging solutions for a holistic view of the platform
Leverage SRE principles to maintain high availability and streamline onboarding workflows
Mentor team members and help define best practices for infrastructure and automation

What we offer

global access to mental health and financial wellness support and resources
healthcare (medical, dental, and vision)
life insurance
accident insurance
disability insurance
commuter benefits
retirement options (401(k)/pension)
time off

Fulltime

Senior Software Engineer - SRE

Roku is changing how the world watches TV. Roku is the #1 TV streaming platform ...

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

Preferably 8+ years of experience in DevOps/SRE roles, with demonstrated expertise in implementing SRE principles, SLO/SLI frameworks, and error budget policies in production environments
Deep experience with observability and monitoring platforms such as Prometheus, Grafana, Datadog, New Relic, or equivalent, including experience building custom dashboards, alerts, and SLO-based monitoring
Strong background in incident management, including experience as an Incident Commander, conducting blameless postmortems, and implementing systematic reliability improvements based on incident learnings
Strong understanding of distributed systems and reliability engineering, including failure modes, fault tolerance patterns, circuit breakers, bulkheads, rate limiting, and graceful degradation strategies
Experience with a number of the following: Kubernetes, Docker, Service Mesh such as Istio, Envoy, Linkerd, Solo & ECS
Experience in cloud-focused software development, preferably in Go, Python, or other object-oriented programming languages
Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation
Experience with CI/CD automation, including GitLab pipelines and other related tools
Strong hands-on experience with cloud platforms such as AWS, GCP or Azure
Proven track record of implementing scalable, high-performance infrastructure solutions in fast-paced, dynamic environments

Job Responsibility

Design & Infrastructure
Contribute to postmortem culture by facilitating comprehensive, blameless post-incident reviews that identify root causes, contributing factors, and actionable remediation items. Track incident trends to identify systemic issues and prioritize reliability improvements
Implement chaos engineering practices to proactively identify failure modes, validate system resilience, and build confidence in recovery procedures. Conduct game days and disaster recovery exercises
SRE Process & Principles Implementation
Deploy and evolve SRE practices across the organization by establishing core SRE principles, frameworks, and methodologies. Define and implement service reliability practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to balance innovation velocity with system reliability
Manage Error Budgets as a mechanism for making data-driven decisions about feature velocity vs. reliability. Track, report, and enforce error budget policies, facilitating conversations between engineering and product teams about risk tolerance and release decisions
Reliability Engineering & Infrastructure
Reduce toil through automation by identifying repetitive operational work and systematically eliminating it through infrastructure-as-code, automation frameworks, and intelligent tooling. Measure and track toil reduction efforts, aiming to keep toil below 50% of team time
Implement capacity planning processes that ensure systems have adequate headroom to meet SLOs during peak traffic, unexpected load spikes, and degraded states. Develop predictive models and automated scaling mechanisms
Observability, Monitoring & Reporting

What we offer

global access to mental health and financial wellness support and resources
healthcare (medical, dental, and vision)
life, accident, disability, commuter, and retirement options (401(k)/pension)
time off in accordance with local leave policies

Fulltime

Applications Support Tech Lead Analyst - Vice President

The Apps Sup Tech Lead Analyst is a strategic professional who stays abreast of ...

Location

India, Chennai (Tamil Nadu) and Pune (Maharashtra)

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

10+ years of proven, practical experience in a production support role managing enterprise-level applications, demonstrating strong problem-solving and strategic thinking skills
Demonstrated experience with SRE practices, including advanced monitoring, alerting, incident response, post-mortems, and driving automation for operational efficiency and system reliability
Operating Systems & Scripting: Deep expertise in Unix/Linux environments and advanced Shell scripting
Monitoring & Logging Tools: Proficiency with enterprise monitoring tools (e.g., ITRS Geneos, AppDynamics) and log aggregation platforms (e.g., Splunk, ELK)
Practical experience with containerization platforms (OpenShift, Kubernetes)
Hands-on experience with relational (Oracle, MSSQL) and NoSQL (MongoDB) databases
Strong knowledge of messaging solutions (e.g., Tibco EMS, MQ, Kafka)
Experience working with REST APIs
Infrastructure Fundamentals: Solid understanding of distributed application architecture, including networks, load balancers, storage, and authentication (AD/LDAP)
Working knowledge and practical application of Object-Oriented Programming (OOP) concepts and principles

Job Responsibility

Partner with multiple technology teams to ensure appropriate integration of functions to meet goals; identify and define necessary system enhancements; analyze existing system logic, identify problems, and recommend and implement solutions
Provides expertise in area and an advanced level of understanding of the principles of apps support
Formulates and defines systems scope and objectives for complex, high-impact application enhancements and problem resolution through in-depth analysis and evaluation of complex business processes, systems, and industry standards; documents requirements
Partners with multiple technology areas and management teams to ensure appropriate integration of functions to meet goals
Works closely with Product Owners, Business Analysts and Systems Analysts to determine and document Systems impacts and support requirements
Considers the implications of the application of technology to the current environment

Fulltime
