CrawlJobs Logo

Site Reliability Engineer (SRE)

India, Bangalore South · Job Posted June 15, 2026
Apply Position
Job Link Share

Job Description

Wissen Technology is hiring for Site Reliability Engineer (SRE). At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset—ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia. Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don't just meet expectations—we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more. Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact—the first time, every time.

Job Responsibility

  • Ensure reliability, scalability, and performance of mission-critical systems
  • Provide Java application support and troubleshoot production issues
  • Implement and maintain Infrastructure as Code (IaC) using Terraform
  • Manage and optimize cloud infrastructure across AWS, Azure, or GCP
  • Automate CI/CD pipelines using GitHub Actions
  • Administer and support MongoDB and Kafka clusters
  • Drive incident response, root cause analysis, and postmortem documentation
  • Collaborate with cross-functional teams to enhance observability, monitoring, and alerting capabilities

Requirements

  • Strong experience in Java Application Support
  • Proven expertise with Terraform
  • Solid Cloud knowledge (AWS, Azure, or GCP)
  • 9+ years of professional experience in SRE or related roles
  • Hands-on experience with MongoDB and Kafka
  • Experience with GitHub Actions for CI/CD automation
  • Strong problem-solving skills and ability to work independently during critical incidents
  • Excellent communication and stakeholder management skills

Nice to have

  • Experience with monitoring tools such as Prometheus, Grafana, and ELK Stack
  • Exposure to containerization technologies including Docker and Kubernetes

Looking for more opportunities?

Search for other job offers that match your skills and interests.

Similar Jobs for

Site Reliability Engineer (SRE)

8 matching positions

Cloud Engineer / Site Reliability Engineer (SRE)

Location
Location
United States , Orlando
Salary
Salary:
75.00 USD / Hour
bhsg.com Logo
Beacon Hill
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong hands-on AWS experience with solid understanding of core AWS services
  • Experience supporting and troubleshooting AWS and Azure cloud environments
  • Terraform experience for Infrastructure as Code
  • Docker/containerization experience
  • Strong troubleshooting and problem-solving skills
  • Ability to translate requirements into technical execution
  • Experience performing cloud architecture and diagramming
  • Experience supporting deployments, environments, and site standups
  • Strong communication and collaboration skills
Job Responsibility
Job Responsibility
  • Support cloud infrastructure and deployments across AWS and Azure
  • Troubleshoot infrastructure and application-related cloud issues
  • Build and maintain Terraform-based infrastructure
  • Support Docker/containerized environments
  • Create architecture diagrams and technical documentation
  • Work closely with engineering and project teams to execute cloud initiatives
  • Assist with automation and operational improvement efforts
  • Fulltime
Read More
Arrow Right

Intermediate Site Reliability Engineer SRE – AI Reliability & Automation

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location
Location
Canada , Mississauga
Salary
Salary:
115000.00 - 128000.00 CAD / Year
pointclickcare.com Logo
PointClickCare
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years' experience in software engineering
  • Experience with SRE principles
  • Experience with AI/ML in production environments
  • A passion for automation, intelligent systems, and operational excellence
  • Strong debugging, problem-solving, and system design skills
  • Languages: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
  • CI/CD: Jenkins, ArgoCD, Spinnaker
Job Responsibility
Job Responsibility
  • Build ML-based anomaly detection and pattern recognition systems
  • Enhance telemetry with smart tagging and metadata for better AI insights
  • Develop event-driven workflows and self-healing systems using AI triggers
  • Automate incident response with generative AI and custom AI agent orchestration
  • Use time-series forecasting and predictive modelling to anticipate failures
  • Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
  • Build scalable, fault-tolerant systems in a cloud-native environment
  • Participate in on-call rotations and lead incident response for critical systems
  • Skilled in API integration for streamlined data exchange and system connectivity
  • Run internal AIOps workshops and help teams adopt AI maturity models
What we offer
What we offer
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
  • Fulltime
Read More
Arrow Right

Senior Site Reliability Engineer (SRE)

The Senior SRE is responsible for deployment, updates, and operational support f...
Location
Location
India , Chennai
Salary
Salary:
Not provided
dalet.com Logo
Dalet
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Cloud platforms: AWS, Azure
  • Containerisation & Orchestration: Kubernetes
  • Infrastructure as Code: Terraform
  • Configuration Management: Ansible
  • Packaging & Deployment: Helm
  • Databases: MariaDB, MongoDB
  • Monitoring, observability, networking, and cloud security.
Job Responsibility
Job Responsibility
  • Act as a senior technical authority for APAC Site Reliability Engineering activities
  • Drive best practices in reliability, operations, and engineering standards
  • Promote technical excellence, collaboration, and accountability across stakeholders
  • Make infrastructure complexity transparent to both internal teams and customers, ensuring a consistently excellent client experience
  • Implement, track, and evolve service performance measures such as SLAs, SLOs, and SLIs
  • Anticipate risks related to service availability, capacity, performance regressions, and security vulnerabilities
  • Drive continuous improvement, including leading and facilitating Root Cause Analysis (RCA) activities
  • Ensure timely execution of deployments, upgrades, maintenance activities, and change requests
  • Anticipate workload, plan deliverables, and ensure qualification/validation of upcoming tasks
  • Collaborate closely with engineering to improve platform components, automation, and operational processes
What we offer
What we offer
  • Great career opportunities around the world
  • Truly collaborative environment with supportive leadership
  • Cutting edge technologies (AI, Cloud, Cybersecurity...)
  • Talented and passionate team members
  • Fun working environment
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer (SRE) - Identity Access Management IAM

Join us as a Site Reliability Engineer (SRE) - Identity Access Management. You w...
Location
Location
India , Pune
Salary
Salary:
Not provided
barclays.co.uk Logo
Barclays
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Experience in designing, implementing, deploying, and running highly available, fault-tolerant, auto-scaling and auto-healing systems
  • Strong expertise in AWS (essential), (Azure, and GCP (Google cloud platform) is a plus), including Kubernetes (ECS is essential, Fargate and GCE is a plus) and server-less architectures
  • Strong experience in running disaster recovery, zero downtime solutions and in designing and implementing continuous delivery across large-scale, distributed, cloud-based micro service and API service solutions with 99.9%+ uptime
  • Hands-on experience coding in Python, Bash and JSON/Yaml (Configuration as Code)
  • The ability to drive reliability best practices across engineering teams, embed SRE principles into the DevSecOps lifecycle and partner with engineering, security and product teams, to balance reliability and feature velocity
  • Experience in hands-on configuration, deployment and operation of ForgeRock COTS based IAM (Identity Access management) solutions (PingGateway, PingAM, PingIDM, PingDS) with embedded security gates, HTTP header signing, access token and data at rest encryption, PKI based self-sovereign identity, or open source
Job Responsibility
Job Responsibility
  • Applying software engineering techniques, automation, and best practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them
  • Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning
  • Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring
  • Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience
  • Monitoring and optimisation of system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning
  • Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and work closely with other teams to ensure smooth and efficient operations
  • Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities to foster a culture of technical excellence and growth
What we offer
What we offer
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
  • Fulltime
Read More
Arrow Right

Senior Site Reliability Engineer (SRE) – Cloud & Distributed Systems

We are seeking an experienced Senior Site Reliability Engineer (SRE) to design, ...
Location
Location
United States , Austin
Salary
Salary:
Not provided
dutechsystems.com Logo
Dutech Systems
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years of experience in SRE, DevOps, or Systems Engineering
  • Strong expertise in Linux/Unix systems and system internals
  • Proficiency in at least one programming/scripting language (Python, Go, Java, Bash)
  • Experience designing and operating distributed systems
  • Hands-on experience with cloud platforms (AWS or GCP)
  • Experience with Docker and Kubernetes
  • Strong understanding of monitoring, alerting, and logging concepts
  • Experience managing SLIs, SLOs, and error budgets
  • Experience with incident management and RCA processes
Job Responsibility
Job Responsibility
  • Design, implement, and manage highly available, distributed systems
  • Maintain and optimize cloud infrastructure (AWS/GCP)
  • Develop automation scripts using Python, Go, Java, or Bash
  • Manage containerized environments using Docker and Kubernetes
  • Define and monitor SLIs, SLOs, and error budgets
  • Implement monitoring, logging, and alerting solutions
  • Lead incident management, root cause analysis (RCA), and postmortems
  • Ensure system security and compliance within operational workflows
  • Improve system reliability through performance tuning and optimization
  • Collaborate with engineering teams to enhance deployment and release processes
Read More
Arrow Right

Site Reliability Engineering (SRE) / Lead Engineer

We are currently seeking a Site Reliability Engineering (SRE) / Lead Engineer to...
Location
Location
Mexico , Guadalajara
Salary
Salary:
Not provided
nttdata.com Logo
NTT DATA
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
  • Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation
  • Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
  • Strong proficiency in Infrastructure as Code (IaC) using Terraform
  • Solid understanding of cloud platforms including AWS, GCP, or Azure
  • Experience with automation/configuration management tools like Ansible, Chef, or Puppet
  • Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
  • Experience managing Kubernetes and containerized environments (Docker, Helm)
  • Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
  • Excellent leadership, communication, and collaboration skills
Job Responsibility
Job Responsibility
  • Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
  • Design and implementation of monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
  • Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
  • Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
  • Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
  • Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
  • Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
  • Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
  • Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence
  • Fulltime
Read More
Arrow Right

Principal Site Reliability Engineer (AI-first SRE)

Groupon is modernizing its global platform — and reliability is at the center of...
Location
Location
Peru
Salary
Salary:
Not provided
groupon.com Logo
Groupon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years in software/systems engineering, including 5+ years in SRE or platform reliability
  • Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform
  • Proficiency in Python or Go for automation and tooling
  • Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy)
  • Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations
  • Strong communication and influencing skills — data over hierarchy
Job Responsibility
Job Responsibility
  • Architect and maintain self-healing systems with 99.9%+ availability targets
  • Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns
  • Implement adaptive SLIs/SLOs that evolve automatically from real-time data
  • Build AIOps-based observability and auto-remediation pipelines
  • Apply predictive modeling to forecast failures before they impact users
  • Lead chaos, performance, and resilience testing programs
  • Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance
  • Mentor engineers and drive reliability standards across teams
  • Partner with platform, data, and product teams to ensure stability aligns with business goals
  • Support major incident response, incident review, and participate in on-call rotations
What we offer
What we offer
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • Professional growth and leadership development pathways tailored to your aspirations
  • A chance to leave a lasting impact by shaping the future of reliable and scalable systems
Read More
Arrow Right

Principal Site Reliability Engineer (AI-first SRE)

Groupon is modernizing its global platform — and reliability is at the center of...
Location
Location
Salary
Salary:
Not provided
groupon.com Logo
Groupon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years in software/systems engineering, including 5+ years in SRE or platform reliability
  • Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform
  • Proficiency in Python or Go for automation and tooling
  • Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy)
  • Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations
  • Strong communication and influencing skills — data over hierarchy
Job Responsibility
Job Responsibility
  • Architect and maintain self-healing systems with 99.9%+ availability targets
  • Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns
  • Implement adaptive SLIs/SLOs that evolve automatically from real-time data
  • Build AIOps-based observability and auto-remediation pipelines
  • Apply predictive modeling to forecast failures before they impact users
  • Lead chaos, performance, and resilience testing programs
  • Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance
  • Mentor engineers and drive reliability standards across teams
  • Partner with platform, data, and product teams to ensure stability aligns with business goals
  • Support major incident response, incident review, and participate in on-call rotations
What we offer
What we offer
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • Professional growth and leadership development pathways tailored to your aspirations
  • A chance to leave a lasting impact by shaping the future of reliable and scalable systems
Read More
Arrow Right