At Padran Information Technologies, we are looking for teammates who are focused on growth and success! In this position, you will work with the clients to whom we provide consultancy and take part in projects at Turkey's leading companies. We are looking for an SRE who meets the following qualifications for our consulting business partner.
Job Responsibilities:
Defining and driving reliability goals (SLIs/SLOs/SLAs) for services and leading efforts to achieve them
Designing scalable, fault-tolerant systems, and leading disaster recovery, backup, and failover planning
Owning incident management processes: leading major incident response, root cause analysis, and postmortems
Implementing chaos engineering practices to proactively identify weaknesses and strengthen system resilience
Building and maintaining observability stacks (metrics, logging, tracing) to enable proactive detection and troubleshooting
Partnering with development teams to embed reliability-focused design patterns into software architecture
Developing automation tools and self-healing systems to reduce toil and improve operational efficiency
Documenting runbooks, playbooks, and operational best practices to standardize processes across the organization
Requirements:
A minimum of a Bachelor’s degree in Computer Science, Engineering, or a related field
5+ years of experience in SRE, Reliability Engineering, or large-scale systems operations
Strong expertise in designing and maintaining highly available, fault-tolerant, and distributed systems
Deep understanding of SLIs, SLOs, and SLAs, with a proven track record of driving reliability metrics
Hands-on experience with performance tuning, capacity planning, and incident response strategies
Proficiency in monitoring, logging, and tracing tools such as New Relic, Datadog, Prometheus, Grafana, OpenTelemetry, and ELK
Strong programming or scripting experience (Go, Python, Bash, or similar) for building automation and internal tools
Experience with Kubernetes, container orchestration, and hybrid/multi-cloud infrastructure
Solid networking fundamentals, troubleshooting, and production-level debugging expertise
Strong experience implementing chaos engineering, disaster recovery, and failover strategies
Familiarity with DevSecOps practices, security audits, and compliance frameworks
Expertise in infrastructure automation and Infrastructure as Code (Terraform, Ansible, etc.)
Experience leading postmortems, blameless culture adoption, and root cause analysis
Strong technical communication and documentation skills for distributed, global teams
Excellent command of English
What we offer:
Opportunity to work with leading companies in Turkey
Opportunity to use industry-leading technologies with our business partners Microsoft, IBM, AWS, and OpenText
Career development and certification opportunities as an ISTQB-accredited training center