Site Reliability Engineer Job at SESTEK (İstanbul & Ankara)

Job Description

We are looking for a Site Reliability Engineer to take ownership of the health and reliability of our production environments. This role requires a proactive mindset, attention to detail, and a strong sense of responsibility in real-time operations. You will be the primary owner of our cloud infrastructure’s monitoring, resource management, and incident response processes, and your work will directly contribute to the stability and performance of mission-critical services. If you are passionate about building reliable systems and thrive in dynamic environments, this opportunity may be just right for you.

Job Responsibilities

Maintain stable and highly available production environments
Take responsibility for SBC (Session Border Controller) and network configurations, including troubleshooting and tuning
Be the primary point of accountability for cloud monitoring, resource usage tracking, and cost optimization
Regularly analyze system resources (CPU, memory, disk) and implement request/limit optimizations, particularly in Kubernetes environments
Design and execute reliability and resilience tests to improve system robustness
Manage operational metrics and alerting systems, ensuring timely responses to incidents
Act as the go-to person for real-time operational issues, helping to reduce response times and increase system resilience

Requirements

3+ years of experience in SRE, Cloud Engineering, or DevOps roles
Strong understanding of cloud-native architectures and Kubernetes (including request/limit tuning, autoscaling, and Helm deployments)
Experience with infrastructure and application monitoring tools (Prometheus, Grafana, ELK, OpenTelemetry)
Familiarity with incident response, on-call support, and SLA/SLO practices
Proficiency in analyzing resource usage (CPU, memory, disk) and performing system-level troubleshooting
Experience applying cloud best practices (preferably AWS) in real-world environments
Experience in designing and implementing Business Continuity and Disaster Recovery strategies
Good understanding of cloud security, preferably with experience in a PCI DSS compliance program
Strong proficiency in Linux systems administration
Solid understanding of network fundamentals, including TCP/IP, DNS, NAT, routing, firewall rules, and load balancing
Ability to document operational procedures through clear and actionable SOPs or runbooks, enabling faster incident resolution, knowledge sharing, and improved on-call efficiency
Experience with network troubleshooting tools
Excellent communication and cross-team collaboration skills

Nice to have

Scripting ability (e.g., Bash, Python) is a plus
SBC configuration experience (e.g., Kamailio, AudioCodes) is a plus

What we offer

Private health insurance, meal card, transportation allowance
Monthly budget for external activities with your colleagues
Incentives for graduate and postgraduate studies
Training opportunities for technical and personal development, as well as support for certification programs related to your field
Birthday celebrations, parties, happy hours, and “Welcome to Spring/Fall” events
Breakfast and healthy snacks at the office all day long
