CrawlJobs Logo

Cloud and Observability Engineer

India, Gurugram · Job Posted April 16, 2026
Apply Position
Job Link Share

Job Description

As a Cloud and Observability Engineer you will play a critical role in ensuring a smooth transition of customers’ monitoring and observability infrastructure. Your expertise in various other observability tools, coupled with a strong understanding of DevOps, will be essential in successfully migrating alerts and dashboards through creating extension packages and enhancing the customer's monitoring capabilities. You will collaborate with cross-functional teams, understand their requirements, design migration & extension strategies, execute the migration process, and provide training and support throughout the engagement.

Job Responsibility

  • Extension Delivery: Build & enhance quality extension packages for alerts, dashboards and parsing rules in Coralogix Platform to improve monitoring experience for key services using our platform
  • Migration Delivery: Help migrate customer alerts, dashboards and parsing rules from leading competitive observability and security platforms to Coralogix
  • Knowledge Management: Build, maintain and evolve documentation with respect to all aspects of extensions and migration
  • Conduct training sessions for internal stakeholders and customer on all aspects of the platform functionality (alerts, dashboards, parsing, querying, etc.), migrations process & techniques and extensions content
  • Collaborate closely with internal stakeholders and customers to understand their specific monitoring needs, gather requirements, and ensure alignment during the extension building process

Requirements

  • Minimum 2+ years of experience as a Systems Engineer, DevOps Engineer, or similar roles, with a focus on monitoring, alerting, and observability solutions
  • 2+ yrs of hands-on experience with and understanding of Cloud and Container technologies (GCP/Azure/AWS + K8/EKS/GKE/AKS)
  • Good knowledge and hands-on experience with 2 or more Observability platforms, including alert creation, dashboard creation, and infrastructure monitoring
  • Good understanding of CI/CD with at least one deployment and version control tool
  • Basic understanding and practical experience with PromQL, Prometheus's query language, for querying metrics and creating custom dashboards
  • Excellent problem-solving and debugging skills
  • Strong English verbal and written communication skills
  • Ability to analyze complex systems, identify inefficiencies or gaps, and propose optimized monitoring solutions
  • Ability to also work across US and European timezones

Nice to have

Cloud Service Provider DevOps certifications would be a plus

Looking for more opportunities?

Search for other job offers that match your skills and interests.

Similar Jobs for

Cloud and Observability Engineer

8 matching positions

Cloud Software Engineer - Observability Platform

ClickHouse is looking for an experienced engineer to join our Observability team...
Location
Location
United States
Salary
Salary:
141000.00 - 208000.00 USD / Year
clickhouse.com Logo
ClickHouse
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years building and running production systems at scale
  • Proficiency in Golang
  • Experience with Kubernetes, Helm, ArgoCD, and Terraform or similar IaC tools
  • Comfortable working with at least one major cloud provider (AWS, GCP, Azure)
  • Experience with OpenTelemetry, Prometheus, Grafana, or similar tools
  • Experience with ClickHouse preferred
Job Responsibility
Job Responsibility
  • Design, build, and operate distributed systems that power observability across ClickHouse Cloud
  • Own reliability, performance, and cost-efficiency of our telemetry pipeline and storage systems
  • Take part in the on-call rotation and help drive root-cause resolution and long-term fixes
  • Build tooling and automation to eliminate repetitive operational work
  • Help shape the roadmap for observability by identifying bottlenecks and scaling challenges
  • Collaborate with other engineering teams to improve their observability posture
  • Contribute to design discussions, architecture reviews, and mentor teammates
What we offer
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Fulltime
Read More
Arrow Right

Cloud Software Engineer - Observability Platform

ClickHouse is looking for an experienced engineer to join our Observability team...
Location
Location
Canada
Salary
Salary:
Not provided
clickhouse.com Logo
ClickHouse
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years building and running production systems at scale
  • Proficiency in Golang
  • Experience with Kubernetes, Helm, ArgoCD, and Terraform or similar IaC tools
  • Comfortable working with at least one major cloud provider (AWS, GCP, Azure)
  • Experience with OpenTelemetry, Prometheus, Grafana, or similar tools
  • Experience with ClickHouse preferred
Job Responsibility
Job Responsibility
  • Design, build, and operate distributed systems that power observability across ClickHouse Cloud
  • Own reliability, performance, and cost-efficiency of our telemetry pipeline and storage systems
  • Take part in the on-call rotation and help drive root-cause resolution and long-term fixes
  • Build tooling and automation to eliminate repetitive operational work
  • Help shape the roadmap for observability by identifying bottlenecks and scaling challenges
  • Collaborate with other engineering teams to improve their observability posture
  • Contribute to design discussions, architecture reviews, and mentor teammates
What we offer
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Read More
Arrow Right

Federal Observability Engineer

You will be part of a larger technical team, working as an Observability Enginee...
Location
Location
United States , HILL AFB
Salary
Salary:
105500.00 - 243000.00 USD / Year
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • US Citizenship Required
  • Secret Clearance Required
  • DD8750 - Security Plus or higher Security Certification (CISSP, CASP, etc)
  • Bachelor's degree preferred or Associate degree holder (technical field) with 6-8 years working experience in related fields
  • Strong understanding of cloud computing platforms (AWS, Azure, GCP)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Proficiency in scripting languages (Python, Go, Bash)
  • Experience with SQL and NoSQL databases
  • Knowledge of networking protocols (TCP/IP, HTTP)
  • Proven experience with the OpsRamp platform is a strong plus
Job Responsibility
Job Responsibility
  • Designing, implementing, and maintaining observability infrastructure in an OpsRamp environment
  • Working as part of a larger technical team supporting HPE's PCE environment and Cloud infrastructure for a Federal Customer
  • Configuring and managing data sources, defining and monitoring key performance indicators (KPIs), and analyzing performance trends
  • Configuring log collection, aggregation, and analysis within the OpsRamp platform
  • Creating and managing alerts, defining escalation paths, and integrating with incident management systems
  • Developing and implementing automated workflows and remediation actions within the OpsRamp platform
  • Designing and building custom dashboards and reports to provide key insights into system health and performance
  • Integrating OpsRamp with other monitoring and observability tools as needed
  • Ensuring data quality and integrity within the OpsRamp platform
  • Troubleshooting and resolving performance issues, application errors, and other operational problems
What we offer
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime
Read More
Arrow Right

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location
Location
United States , Hartford
Salary
Salary:
130295.00 - 260590.00 USD / Year
https://www.cvshealth.com/ Logo
CVS Health
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ Years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ Years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ Years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting languages such as Ansible, PowerShell, Bash for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility
Job Responsibility
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer
What we offer
  • Affordable medical plan options
  • a 401(k) plan (including matching company contributions)
  • an employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs
  • confidential counseling and financial coaching
  • Paid time off
  • flexible work schedules
  • family leave
  • dependent care resources
  • colleague assistance programs
  • Fulltime
Read More
Arrow Right

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team responsible for Private and Public...
Location
Location
Singapore , Singapore
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree or equivalent work experience
  • 6+ years of relevant work experience
  • Highly motivated self-starter with excellent interpersonal and communication skills
  • Certification or formal training in site reliability engineering concepts and practices
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
  • 4+ years experience in Python (preferable) or Java, on large scale systems alongside Linux based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience of k8s and container technologies such as Docker, Openshift and EKS
  • Experience with public cloud technologies such as AWS, GCP or Azure
  • Experience with Secrets products such as HashiCorp Vault or CyberArk
Job Responsibility
Job Responsibility
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production level incidents till resolution.
What we offer
What we offer
  • Equal opportunity employer
  • Accessibility support for persons with disabilities.
  • Fulltime
Read More
Arrow Right

Senior Observability Engineer

Coralogix is a modern, full-stack observability platform transforming how busine...
Location
Location
Germany , Berlin
Salary
Salary:
Not provided
coralogix.com Logo
Coralogix
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of experience in Site Reliability, DevOps, or Platform Engineering with a focus on observability
  • Proven expertise with at least one major observability platform (e.g., Prometheus, Victoria Metrics, OpenSearch)
  • Hands-on experience with Kubernetes, including deep knowledge of controllers, operators, and Helm
  • Experience writing Kubernetes controllers (controller-runtime, KubeBuilder)
  • Strong programming skills in Go or Python (Rust is a plus)
  • Experience designing, scaling, and operating observability systems at enterprise scale
  • Familiarity with at least one major cloud provider (AWS, Azure, or GCP)
  • Strong understanding of distributed systems, telemetry pipelines, and instrumentation standards (e.g., OpenTelemetry)
  • Excellent communication skills with the ability to explain complex topics to diverse stakeholders
Job Responsibility
Job Responsibility
  • Design, implement, and maintain observability features such as Alerting, SLOs, Reporting, and Synthetic Tests
  • Manage and scale OpenTelemetry Collectors and other observability agents across Kubernetes environments
  • Write and maintain Kubernetes Controllers using frameworks like controller-runtime and KubeBuilder
  • Operate and optimize the internal Coralogix account, ensuring proper usage, cost efficiency, and best practices adoption
  • Define and enforce observability guidelines and standards across the organization
  • Partner with engineering teams to embed observability by default into products and services
  • Control observability-related costs while maximizing performance, visibility, and value
  • Contribute to upstream projects such as OpenTelemetry, helping shape industry standards
  • Explore and implement cutting-edge observability technologies, including eBPF-based approaches
  • Fulltime
Read More
Arrow Right

Cloud Scale Test Engineer

As a Cloud Scale Test Engineer, you will ensure the delivery of high performance...
Location
Location
United States , San Jose
Salary
Salary:
90400.00 - 208500.00 USD / Year
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's or master's degree in computer science, engineering, information systems, or related field
  • 1-4 years of experience
  • Strong Python Coding Skills
  • Passion for AI-driven automation and process optimization
  • Experience working on cloud platforms like AWS/Azure/Google Cloud
  • Experience with Observability platforms like Prometheus, Grafana. Open Search, New Relic etc
  • Knowledge of distributed tracing and debugging in cloud-native environments
  • Proficiency in GIT, Jira, Jenkins, and CI/CD tool
  • Basic networking knowledge
  • Experience in testing containerized applications and Kubernetes-based environments
Job Responsibility
Job Responsibility
  • Design, develop, and maintain robust test automation frameworks for cloud-scale distributed systems
  • Architect performance, load, and stress tests to validate system resilience under high traffic conditions
  • Build fault-injection and chaos engineering strategies to assess the reliability of distributed services
  • Develop and execute end-to-end integration, API, and system-level tests across microservices-based architectures
  • Implement continuous testing pipelines within CI/CD workflows to accelerate deployment cycles
  • Collaborate closely with development, SRE, and infrastructure teams to ensure quality best practices are embedded within the SDLC
  • Analyze system logs, telemetry data, and observability metrics to identify and mitigate potential failures before they impact production
  • Drive automation of security testing, API contract validation, and infrastructure testing
  • Participate in diagnosing critical production issues related to system reliability and performance
What we offer
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime
Read More
Arrow Right

Senior Cloud Engineer - Tech Lead

We're looking for an experienced and motivated Senior Cloud Engineer - Tech Lead...
Location
Location
Israel , Herzliya
Salary
Salary:
Not provided
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years of experience in backend software development
  • Strong background in SaaS architecture and cloud-native technologies
  • Proficiency in programming languages: Go, .NET, TypeScript and Node.js
  • Experience with CI/CD pipelines, microservices, and containerization (Docker, Kubernetes)
  • Ability to take ownership of features from design through implementation and delivery
  • Excellent communication skills and the ability to work collaboratively across functions
  • Experience with monitoring, observability, and operational excellence in production environments
  • Experience mentoring others will be considered an advantage
Job Responsibility
Job Responsibility
  • Drive the design, development, and delivery of scalable SaaS solutions with high availability and performance
  • Develop and maintain backend services, fostering a culture of ownership, collaboration, and continuous improvement
  • Ensure adherence to software development best practices, including code reviews, testing, documentation and knowledge sharing
  • Contribute to architectural decisions and long-term technical vision
  • Execute technical tasks for key initiatives, ensuring alignment with product goals
  • Collaborate with POs, DevOps, QA, Security and other cross-functional teams to deliver features on time and with high quality
  • Mentor junior developers by guiding their technical growth, CRs and supporting them in conducting research and implementing high-quality solutions
  • Analyze and troubleshoot customer escalations, identify root causes, and drive timely, effective solutions to ensure product reliability and customer satisfaction
What we offer
What we offer
  • Comprehensive suite supporting physical, financial and emotional wellbeing
  • Career development programs
  • Inclusive work environment
  • Fulltime
Read More
Arrow Right