Lead Observability Engineer

Job Posted: May 04, 2026

Job Description

We are seeking a Lead Observability Engineer to join the team. You must be able to work Eastern Time (ET) hours or a 14:00–22:00 or 15:00–23:00 CET shift. The ideal candidate has proven expertise in designing, operating, and scaling analytical data systems, specifically ClickHouse or similar distributed databases. In this role, you will take a hands-on leadership position in architecting the migration of our existing custom Cosmos DB telemetry storage system to a robust, high-performance ClickHouse-based solution. You will also play a key role in building the foundation for alerting, notification, and telemetry workflows, enabling full visibility into production systems and improving observability at scale.

Job Responsibility

  • Lead the migration and transformation of telemetry storage from custom Cosmos DB solutions to ClickHouse, building a scalable and reliable end-to-end observability platform
  • Architect, implement, and maintain alerting and notification systems integrated with ClickHouse for critical services and applications
  • Develop, deploy, and operate high-throughput telemetry pipelines, ensuring accurate and actionable monitoring across cloud environments
  • Collaborate with engineering and product teams to define and champion observability best practices
  • Work with DevOps and development teams to automate collection, ingestion, and retention policies for logs, metrics, and traces
  • Drive continuous improvement in system performance, stability, and reliability through effective observability
  • Participate in on-call rotations, incident response, and root cause analysis to enhance monitoring and alerting capabilities

Requirements

  • 5+ years of engineering experience in cloud observability platforms, infrastructure, and telemetry systems
  • Deep experience in alerting, notifications, and monitoring at scale
  • Advanced expertise with ClickHouse, or similar high-performance analytical databases, for telemetry storage and querying
  • Hands-on experience migrating telemetry/storage solutions (preferably from Cosmos DB to ClickHouse or equivalent)
  • Solid understanding of telemetry pipelines, cloud-native monitoring, and best practices
  • Experience with dashboarding and visualization tools (Grafana, Kibana, or similar)
  • Strong scripting and automation skills (Python, Bash, Terraform or equivalent)
  • Proven collaboration and communication skills across cross-functional teams

What we offer

  • Flexible working format: remote, office-based, or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Similar Jobs for Lead Observability Engineer

8 matching positions

Lead Observability Engineer

Lead Observability Engineer role focusing on the Elastic Observability Platform,...
Location: India (Hyderabad)
Salary: Not provided
Company: Blue Yonder
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, MIS, or equivalent experience
  • 7–10+ years of experience in observability engineering, SRE, monitoring platform ownership, or infrastructure operations
  • Deep, hands-on expertise with Elastic Stack (Elasticsearch, Kibana, Logstash, Beats/Elastic Agent, APM)
  • Strong architectural knowledge of cloud (Azure/AWS) and hybrid observability patterns
  • Experience leading observability for infrastructure, cloud platforms, network systems, Kubernetes, and Microsoft 365
  • Proven experience designing monitoring for SaaS platforms (Workday, Salesforce, ServiceNow)
  • Advanced scripting/automation experience (Python, PowerShell, Bash)
  • Strong knowledge of API integrations, data pipelines, and log-flow engineering
  • Experience leading incident diagnostics and delivering visibility for RCA and operational improvement
  • Strong analytical, architectural, and troubleshooting skills with a platform-owner mindset
Job Responsibility
  • Receives work assignments through the ticketing system or from senior leadership
  • Provides Tier-4 engineering expertise, platform ownership, and technical leadership for all observability capabilities across hybrid cloud, on-premises, and SaaS environments
  • Leads the design, architecture, and maturity of the enterprise observability ecosystem with a primary focus on the Elastic Observability Platform
  • Drives the enterprise strategy for logging, metrics, traces, synthetics, and alerting—including governance, standardization, and performance optimization
  • Partners closely with Cloud, Infrastructure, Security, Enterprise Applications, and SRE leadership to define observability frameworks
  • Ensures observability platforms meet enterprise requirements for security, performance, availability, compliance, and scalability
  • Oversees monitoring implementations for key SaaS applications including Workday, Salesforce, ServiceNow, and Microsoft 365
  • Provides guidance, mentorship, and direction to observability engineers, SREs, and operational teams
  • Acts as a strategic advisor during major incidents by providing real-time diagnostics, correlation insights, and driving RCA improvements
  • Required to provide on-call support during off-hours on weekdays, weekends, and holidays on a rotating basis
Employment Type: Full-time

Lead Observability Platform Engineer

Capital One is looking for an Observability Platform Engineer to join our Associ...
Location: United States (Plano, McLean, Richmond)
Salary: USD 149,800–188,100 per year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • High School Diploma, GED, or equivalent certification
  • At least 3 years of experience creating reports and building alert monitors
  • At least 3 years working with macOS and Windows platforms
  • Strong analytical and technical skills
  • Ability to foster collaborative, open, working relationships with technology groups and other stakeholders, including vendor relationships
  • Demonstrated clear communication skills and ability to interact effectively at all levels of an organization, and to influence senior management and executives
  • Strong knowledge of syntax structures for reporting languages, such as SQL or Opal, and good familiarity with parsing data.
Job Responsibility
  • Work with partner teams to update configurations for our log collectors on our Windows and Mac endpoints
  • Work with stakeholders to identify, discuss and prioritize log ingestion strategies
  • Build complex dashboards that tell stories about the health of our endpoints, and identify opportunities for improvements
  • Create monitors that alert platform teams when changes to the environment may be impacting the health of devices and user experiences
  • Create reports that detail the performance of applications on our endpoints, and applications being considered for future deployment
  • Assist platform teams with issue triage by providing complex data and log analysis where needed
  • Use data to tell stories to our senior leaders and help drive vendor and product roadmaps
  • Help create processes and strategies that can validate changes in performance across operating system and product version updates
What we offer
  • Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • A comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
Employment Type: Full-time

Lead Engineer – Capital Markets Technology - FICC Strategic Trade Management

Wells Fargo is seeking a Lead Engineer to drive the modernization of our Fixed I...
Location: United States (Iselin)
Salary: USD 159,000–305,000 per year
Company: Wells Fargo
Expiration Date: June 23, 2026
Requirements
  • 5+ years of Specialty Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of experience with Java, Python/FastAPI, gRPC, service mesh, and microservices with a track record of building scalable, resilient, high-performance systems using modern engineering patterns
  • 5+ years of engineering experience in technology with strong Agentic Engineering and Data expertise
  • 5+ years of experience with, and advanced proficiency in, distributed systems design, data modeling, and high-volume environments
  • 5+ years of experience with major GenAI platforms (e.g., GitHub Copilot, Claude, Codex)
Job Responsibility
  • Own and execute the architecture roadmap for a Strategic Trade Management platform aligned to FINOS CDM-based schemas, enabling robust modeling of trade events, risk states, and valuation flows
  • Design and build distributed microservices using Java (Spring Boot, reactive), gRPC, Apache Ignite/Flink/Kafka, Python/FastAPI, and service mesh (Istio/Linkerd), applying FINOS CDM-based data structures to optimize for low latency and high throughput
  • Embed agentic AI solutions within platform workflows where they add measurable value—using FINOS CDM-aligned context to automate trade-state transitions, exception management, and operational decision support
  • Lead key workstreams in the transformation from monolithic systems to cloud-native architectures, applying domain-driven design and event-driven patterns to deliver scalable, resilient, and auditable services
  • Drive non-functional requirements (latency, scalability, resilience, observability, security) for the platform and help operationalize them through modern CI/CD, automated testing, and production readiness standards
  • Partner with stakeholders across trading, quant, risk, operations, and compliance to translate business requirements into pragmatic technical designs that meet regulatory and risk standards
  • Mentor engineers and tech leads through design reviews, code reviews, and hands-on guidance
  • Establish best practices for secure, observable, and testable services
  • Drive adoption of FINOS CDM-driven architectures and modern distributed-system patterns
What we offer
  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
Employment Type: Full-time

Lead Engineer - Infrastructure Solutions

The Commercial Corporate & Investment Bank Technology (CCIBT) Infrastructure Sol...
Location: United States (Charlotte, Irving, Iselin)
Salary: USD 119,000–224,000 per year
Company: Wells Fargo
Expiration Date: July 05, 2026
Requirements
  • 5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of experience defining, designing, or delivering infrastructure, platform, middleware, application, or enterprise technology solutions
  • 5+ years of experience working across large-scale enterprise environments, including on-premises, private cloud, or hybrid technology platforms
Job Responsibility
  • Partner with business, technology, and platform stakeholders to understand functional requirements, technical constraints, risk considerations, dependencies, and success criteria
  • Translate business needs into practical technical designs, implementation approaches, and engineering work packages that can be delivered in a large-scale enterprise environment
  • Evaluate solution options and communicate trade-offs across performance, resiliency, scalability, security, compliance, cost, maintainability, and delivery timelines
  • Help convert ambiguous problem statements into actionable technical direction for developers, engineers, and platform teams
  • Team with developers, engineers, architects, and infrastructure partners to design, build, test, validate, and deliver technical solutions
  • Provide engineering-level triage and solution direction for complex issues spanning application, middleware, compute, network, automation, DevSecOps, and operational readiness domains
  • Review designs, identify gaps, challenge assumptions, clarify dependencies, and help remove technical ambiguity that blocks delivery
  • Produce and maintain engineering design artifacts, implementation notes, operational runbooks, decision records, readiness documentation, and knowledge-transfer materials
  • Investigate and help resolve a wide array of technical problems, including low-latency application concerns, batch-processing challenges, environment issues, platform integration gaps, and production-readiness risks
  • Analyze complex technical symptoms, identify likely root causes, engage the right technical partners, and drive toward practical remediation options
What we offer
  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
Employment Type: Full-time

Sr Data Quality & Observability Engineer (Snowflake)

Lamb Weston is continuing to modernize its enterprise data ecosystem to support ...
Location: United States (Eagle)
Salary: USD 117,060–175,600 per year
Company: Lamb Weston
Expiration Date: July 27, 2026
Requirements
  • Bachelor’s degree in Computer Science, Information Systems, Data Analytics, or a related field, or equivalent experience
  • 5+ years of experience in data analysis, data quality, or analytics engineering roles
  • Strong SQL skills and experience working with large, complex datasets
  • Hands-on data quality experience, including implementing data quality logic using SQL and data functions (e.g., window functions, conditional logic, string/date functions, aggregations, table functions/CTEs)
  • Demonstrated experience with data profiling, data validation, and data quality frameworks
  • Experience with Git-based version control, code review practices, and deploying changes through SDLC/CI-CD processes
  • Experience working in SAP data environments (ECC, S/4HANA, BW, or HANA)
  • Business Analyst skills, including requirements gathering, documentation, and stakeholder facilitation
  • Familiarity with cloud data platforms such as Snowflake and AWS preferred
  • Understanding of data governance, metadata, and lineage concepts
Job Responsibility
  • Design, implement, and maintain data quality rules, checks, and controls across enterprise data assets
  • Perform data profiling, root cause analysis, and anomaly detection across SAP and non-SAP data sources
  • Partner with business stakeholders to understand data quality issues, business impacts, and remediation priorities
  • Translate business requirements into measurable data quality rules and thresholds
  • Develop and maintain data quality frameworks, including reusable SQL patterns, UDFs, and stored procedures
  • Implement automated scheduling and orchestration of data quality checks using Snowflake-native capabilities (e.g., tasks, streams) and/or pipeline orchestration tools (e.g., Informatica)
  • Implement data quality monitoring, observability scorecards, and reporting for key metadata domains
  • Own and evolve enterprise data quality KPIs/scorecards, including standardized definitions, thresholds, and executive-ready reporting across domains
  • Analyze data discrepancies and ensure reconciliation back to systems of record
  • Lead issue management workflows, including defect triage, prioritization, root cause documentation, corrective action validation, and prevention recommendations
What we offer
  • Health Insurance Benefits - Medical, Dental, Vision
  • Flexible Spending Accounts for Health and Dependent Care, and Health Reimbursement Accounts
  • Well-being programs including companywide events and a wellness incentive program
  • Paid Time Off
  • Financial Wellness – Industry-leading 401(k) plan with generous company contributions, Financial Planning Services, Employee Stock Purchase Program, Health Savings Accounts, and Life and Accident insurance
  • Family-Friendly Employee events
  • Employee Assistance Program services – mental health and other concierge-type services
Employment Type: Full-time

Lead Engineer

As a Mobile Squad Lead Engineer, you’ll be the technical backbone of your mobile...
Location: India (Mumbai)
Salary: Not provided
Company: Collinson
Expiration Date: Until further notice
Requirements
  • 7+ years of professional software engineering experience, with a strong focus on mobile application development
  • Strong experience building native Android and native iOS applications, with a deep understanding of modern mobile architectures and best practices
  • Previous experience as a Lead Engineer, Tech Lead, or Senior Mobile Engineer leading a small team
  • Proven experience integrating AI-assisted development tools and techniques into everyday workflows (e.g. code generation, testing, debugging, performance analysis, or automation)
  • Hands-on experience integrating mobile apps with backend APIs (REST and/or GraphQL) and cloud-based services
  • Strong focus on testing and quality, with experience using mobile testing frameworks and automated testing approaches
  • Experience owning mobile CI/CD pipelines, build automation, and app store release processes
  • An observability mindset, with experience using crash reporting, performance monitoring, and analytics tools to improve app reliability
  • Comfortable working with Git, Jira, Confluence, and modern agile engineering workflows
  • Proven ability to mentor engineers, review code and designs, and balance pragmatism with mobile engineering excellence
Job Responsibility
  • Lead the design and development of native Android and iOS applications, ensuring they are scalable, performant, secure, and well-architected
  • Provide technical leadership to the squad, guiding decisions on mobile architecture, frameworks, libraries, and best practices in code quality, performance, accessibility, and security
  • Remain highly hands-on, spending most of your time writing and reviewing mobile code, and ensuring high-quality, maintainable applications
  • Collaborate closely with backend engineers to design and consume APIs that support robust and efficient mobile experiences
  • Mentor and coach mobile engineers, fostering a culture of continuous learning, strong ownership, and technical excellence
  • Own mobile build, release, and deployment processes, including CI/CD pipelines and app store submissions for Google Play and Apple App Store
  • Champion automated testing, app stability, observability, and the responsible use of AI to improve developer productivity and software quality
Employment Type: Full-time

Lead Engineer

As a Lead Engineer at DreamFi, you’ll be a hands-on technical leader responsible...
Location: United States
Salary: USD 120,000–150,000 per year
Company: MojoTech
Expiration Date: Until further notice
Requirements
  • 5+ years of professional experience in software engineering, with 1–2+ years in a leadership role
  • Deep backend experience with Golang in production environments
  • Solid experience building mobile apps with React Native
  • Strong understanding of AWS services (Lambda, ECS, RDS, S3, IAM, etc.)
  • Experience integrating BaaS or third-party fintech APIs
  • Proficiency in PostgreSQL, including schema design, query optimization, and migrations
  • Working experience with TypeScript in production environments
  • Familiarity with secure API design, authentication (OAuth2, JWT), and data protection
  • Strong understanding of engineering best practices around testing, version control, CI/CD, and code maintainability
  • A collaborative mindset, strong communication skills, and the ability to work across disciplines
Job Responsibility
  • Partner with the VP of Engineering to set the technical direction, team processes, and development roadmap
  • Lead the architecture, development, and deployment of core platform features
  • Work across the stack: React Native (mobile), Go (backend), and AWS (infrastructure)
  • Own integrations with BaaS providers (e.g., Synapse, Unit, Galileo) and other fintech APIs
  • Collaborate closely with product, design, and leadership to translate business needs into technical execution
  • Mentor and support engineers through code reviews, guidance, and career growth
  • Scale and maintain infrastructure using modern DevOps practices (IaC, CI/CD, monitoring)
  • Help define and support hiring efforts to grow a world-class engineering team
What we offer
  • Equity and growth opportunities at a fast-growing fintech startup
  • Remote-first culture
  • Flexible hours
  • Strong autonomy
Employment Type: Full-time

Senior Data Engineer Lead / Architect - Senior Vice President

At Citi Services - Global Trade Technology Organization, we are on a mission to ...
Location: India (Pune, Maharashtra; Chennai, Tamil Nadu)
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • 10+ years of professional experience in data engineering, with a proven track record of designing and building large-scale data systems
  • 3+ years in a technical leadership or architect role, with experience mentoring junior and senior engineers
  • Expert-level proficiency in at least one programming language (Python or Scala preferred) and exceptional SQL skills
  • Proven hands-on experience with Python or Scala for data manipulation, scripting, machine learning, and backend development
  • Deep, hands-on experience with a major cloud platform (AWS, GCP, or Azure) and its data ecosystem (e.g., S3/GCS, Redshift/BigQuery, EMR/Dataproc, Kinesis/Dataflow)
  • Extensive hands-on experience with modern big data and data streaming technologies (e.g., Hadoop, Hive, Impala, Apache Spark, Kafka, or Flink)
  • Proficiency with workflow orchestration tools such as Airflow, Dagster, or Prefect
  • Proficiency in designing and implementing microservices architectures, RESTful APIs, and event-driven systems following the "Data as a Product" principle
  • Solid understanding of data modeling concepts and database design for both analytical (OLAP) and transactional (OLTP) workloads
  • Deep understanding and hands-on experience with relational databases (e.g., PostgreSQL, Oracle), NoSQL databases (e.g., MongoDB, Cassandra), data warehousing, and big data technologies (e.g., Spark, Kafka)
Job Responsibility
  • Architect & Design: Design, architect, and oversee the development of robust, scalable, and reliable data infrastructure, including data lakes, data warehouses, and real-time streaming platforms on the cloud
  • Build & Code: Act as a senior individual contributor and hands-on technical leader. Write clean, maintainable, and high-performance code for data ingestion, transformation, and serving layers (e.g., using Python, Scala, SQL, and Spark)
  • Lead & Mentor: Lead a team of data engineers, providing technical guidance, mentorship, and career development support. Foster a collaborative and inclusive team environment
  • Champion Culture: Define, document, and champion data engineering best practices across the organization, including CI/CD, data quality, testing frameworks, observability, and code review standards
  • Drive Strategy: Partner with leadership, product managers, data scientists, and analysts to understand data needs and develop a long-term data strategy and roadmap
  • Innovate & Evaluate: Stay at the forefront of data engineering technologies. Evaluate, prototype, and recommend new tools and frameworks to continuously improve our data platform
  • Ensure Governance: Implement and enforce robust data governance, security, and privacy policies in partnership with our security and compliance teams
Employment Type: Full-time