You will join an SRE-aligned operations team responsible for keeping a mission-critical, global cloud platform reliable, performant, and secure. The project focuses on 24/7 cloud operations, proactive monitoring, incident response, and continuous improvement of observability coverage across multi-region GCP environments. You will work closely with SRE, Cloud Engineering, and development teams to maintain high availability, support business continuity, and drive operational excellence.
Job Responsibilities:
Monitor cloud infrastructure across multiple regions using advanced observability and monitoring tools
Respond to alerts and incidents in real time, provide supporting data for root cause analysis, and escalate issues when required
Troubleshoot issues related to cloud networking, containers, storage, and APIs
Maintain and continuously improve troubleshooting guides (TSGs), incident response procedures, and operational documentation
Collaborate with SRE, Cloud Engineering, and development teams to resolve infrastructure and reliability issues
Perform routine health checks across the cloud environment
Perform routine patching and upgrades of observability and monitoring agents across the platform
Ensure compliance with SLAs, security policies, and operational standards
Participate in a 24/7 on-call rotation and support disaster recovery and business continuity activities
Analyze performance metrics and provide recommendations for optimization and automation
Requirements:
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent professional experience)
1–2 years of experience in a NOC, operations, or cloud infrastructure support role
Strong understanding of cloud platforms and services (AWS, Azure, GCP)
Familiarity with container orchestration technologies (Kubernetes, GKE) and CI/CD pipelines
Experience with monitoring and logging tools such as Datadog, Dynatrace, Prometheus, Grafana, ELK, CloudWatch, Splunk, Sumo Logic, New Relic, or similar solutions
Proficiency in Linux/Unix environments
Basic scripting or automation skills (Python, Bash, PowerShell) and/or Infrastructure as Code exposure (Terraform)
Strong communication skills, with the ability to document incidents and collaborate effectively
Must possess a legal work permit in Poland
What we offer:
Hybrid work model combining office & remote work
Attractively located office with collaboration spaces
Onsite parking space for employees
Referral program with financial bonus
Life Insurance
Budget for development (including language courses and others), clear career path with the possibility to gain experience in an international environment
Access to an internal Learning Platform with a range of training courses geared toward professional growth
Access to MyBenefit platform (Multisport included)
Team-building activities
Charity initiatives
Working environment promoting diversity and inclusion