We are looking for an experienced engineer to build and enhance observability capabilities for agent-based solutions on AWS. This role focuses on providing deep visibility into distributed systems, improving reliability, and enabling proactive issue detection and resolution.
What project we have for you
Our customer is a multinational corporation with more than a century of history and offices in over 180 countries. Their most ambitious goal at the moment is to introduce a range of Reduced-Risk Products (RRPs). The target audience is more than 1 billion consumers around the globe. The IT platform hosts 700+ applications.
Intellia's mission is to help the client engineer a comprehensive software ecosystem for a game-changing IoT product at the intersection of innovative consumer experience and cutting-edge technology. Our teams are involved in the engineering of core platform components for best-in-class eCommerce, Digital Marketing and IoT solutions.
As a DevOps engineer, you will join the Core Architecture Team and be responsible for the architecture and implementation of best practices in our Digital Engineering Enterprise Platform. The Platform is a set of services and internet applications that accelerate the development and delivery of software applications by taking care of common SDLC challenges. It gives engineering teams access to a set of services, technologies, and practices for developing and operating their applications, while ensuring compliance and best practices. The project has been in production for 2+ years and is supported by multiple teams.
Our technical domains are:
– AWS cloud, partially Azure
– SSO, Organizations, Service control policies, access models
– IaC: Terraform Enterprise, Terratest, Chalice
– Serverless: Lambda, Step Functions, a wide range of miscellaneous automations, Fargate
– System, application, network, and security architectures
– Orchestration: Kubernetes (EKS)
– SRE activities (logging, tracing, monitoring), OpsGenie, Splunk
– HashiCorp Vault
– Hybrid networking
Job Responsibilities
Design and implement observability frameworks for agent-based and distributed systems
Build and maintain monitoring, logging, and tracing pipelines
Develop dashboards and alerts to ensure system health and performance visibility
Analyze system behavior and identify performance bottlenecks and anomalies
Ensure high availability and reliability of runtime components
Integrate observability tools with AWS infrastructure and CI/CD pipelines
Support incident response, troubleshooting, and root cause analysis
Collaborate with platform and AI teams to improve system transparency and operability
Requirements
5+ years of experience working as a DevOps / Platform Engineer
Strong experience with AWS (EKS, EC2, VPC, RDS, Route 53, API Gateway, Lambda)
Hands-on experience with Terraform (AWS, Kubernetes/Helm, HashiCorp Vault)
Hands-on experience with observability tools: New Relic, OpenTelemetry
Strong knowledge of Kubernetes
Strong programming skills in Python (scripting, FastAPI, Swagger) and Bash / PowerShell
Solid understanding of monitoring, logging, and distributed tracing concepts
Experience with containerization (Docker, Kubernetes)
Experience with CI/CD tools (Jenkins, GitLab)
Experience with configuration management tools (Ansible, Chef, Puppet)
Strong analytical thinking and troubleshooting skills
Attention to detail and ability to detect patterns and anomalies
Effective communication and collaboration skills
Ability to work under pressure during incident resolution
Proactive mindset toward system reliability and continuous improvement
Nice to have
Experience with AWS-native observability tools (CloudWatch, X-Ray)
Familiarity with agent-based or AI-driven systems
Experience setting up SLOs, SLIs, and alerting strategies
Understanding of performance tuning in distributed environments