We are seeking a highly skilled Staff/Senior Software Consultant with deep expertise in observability, distributed systems, and cloud-native environments. In this role, you will work closely with engineering teams and stakeholders to design, implement, and optimize monitoring, logging, and tracing solutions that enhance system reliability, performance, and scalability.
Job Responsibilities:
Lead the design and implementation of observability solutions across complex distributed systems
Architect and optimize logging, monitoring, and tracing frameworks for cloud-native applications
Diagnose and resolve performance bottlenecks and system failures in large-scale distributed environments
Collaborate with engineering, DevOps, and SRE teams to improve system visibility and reliability
Implement best practices for metrics collection, alerting, and incident response
Work with tools such as Datadog, Prometheus, Grafana, and OpenTelemetry, and with APM platforms such as New Relic, Dynatrace, AWS X-Ray, and Azure Application Insights
Provide consulting guidance on cloud architectures (AWS, Azure, GCP) and observability strategies
Implement and manage Application Performance Monitoring (APM) solutions to gain deep visibility into application behavior and performance
Correlate APM data with logs, metrics, and traces for faster root cause analysis (RCA)
Develop dashboards, alerts, and automated workflows to ensure proactive system monitoring
Mentor junior engineers and contribute to technical strategy and decision-making
Stay up to date with industry trends and emerging technologies in observability and distributed systems
Requirements:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field
3+ years of experience in software engineering, DevOps, or SRE roles
Strong expertise in observability concepts: logging, monitoring, tracing
Hands-on experience with tools like Datadog, Prometheus, Grafana, and the ELK stack, and with APM tools such as New Relic, AWS X-Ray, and Azure Application Insights
Deep understanding of distributed systems architecture and debugging techniques
Experience working in cloud environments (AWS, Azure, or GCP)
Proficiency in at least one programming language (e.g., Python, Java, Go)
Experience with containerization and orchestration (Docker, Kubernetes)
Strong understanding of application performance tuning, transaction tracing, and service dependency mapping