Observability Lead

Chicago Trading Company

Location:
United States, Chicago

Contract Type:
Not provided

Salary:
175000.00 - 250000.00 USD / Year

Job Description:

We are seeking an Observability Lead to own the strategy, execution, and technical direction of CTC's observability platform. In this role, you will lead a small, high-impact team responsible for the tools and systems that give our engineers, quants, and traders visibility into the health and performance of critical infrastructure and applications.

Job Responsibility:

  • Define and drive the observability roadmap
  • Lead the design, implementation, and continuous improvement of monitoring, alerting, logging, tracing, and metrics infrastructure at scale
  • Own the end-to-end developer experience of observability tooling
  • Manage and grow a small team of engineers
  • Partner with infrastructure, platform, and application teams
  • Establish and enforce best practices for instrumentation, SLOs, alert quality, and operational readiness
  • Evaluate emerging tools, frameworks, and approaches in the observability space

Requirements:

  • 8+ years of technical engineering experience
  • At least 3 years focused on observability, monitoring, or site reliability engineering
  • Demonstrated expertise designing, building, and operating observability platforms at scale
  • Deep experience with Datadog and OpenTelemetry strongly preferred
  • Proven experience leading or managing a small engineering team
  • Strong understanding of distributed systems and microservices architectures
  • Hands-on experience with Kubernetes and bare-metal infrastructure
  • Advanced programming proficiency in at least one of Python, Go, or Java
  • Familiarity with C++ or low-latency systems is a strong plus
  • A product-oriented mindset
  • Exceptional communication skills
  • Financial sector experience (trading, prop trading, hedge funds) and familiarity with low-latency, high-reliability systems are strongly preferred
  • Advanced degree (MS, PhD) in Computer Science, Engineering, or related field is a plus

Nice to have:

  • Familiarity with C++ or low-latency systems
  • Financial sector experience (trading, prop trading, hedge funds)
  • Advanced degree (MS, PhD) in Computer Science, Engineering, or related field

What we offer:

  • Generous medical coverage
  • Paid parental leave
  • Free breakfast and lunch
  • Healthy snacks
  • Wellness reimbursement
  • Quarterly recharge days

Additional Information:

Job Posted:
March 13, 2026

Employment Type:
Full-time

Work Type:
Hybrid work

Similar Jobs for Observability Lead

Solutions Engineering Lead

Coralogix is a modern full-stack observability platform that transforms how busi...
Location:
United States, Dallas
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency crafting PromQL, Lucene and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS (EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics), Azure Monitor, and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
  • Familiarity with structured sales methodologies such as MEDDPIC or Command of the Message (plus)
Job Responsibility:
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead white-board deep-dive sessions on ingestion pipelines, index-free querying and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs and traces, craft PromQL, Lucene or DataPrime queries, and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer:
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
  • A 401(k) plan and match
  • Paid sick time
  • Paid time off
Employment Type:
Full-time

Solutions Engineering Lead

We are hiring a Solutions Engineering Team Lead for the East region to scale and...
Location:
United States, Boston
Salary:
220000.00 - 300000.00 USD / Year
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency crafting PromQL, Lucene and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS (EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics), Azure Monitor, and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
Job Responsibility:
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead white-board deep-dive sessions on ingestion pipelines, index-free querying and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs and traces, craft PromQL, Lucene or DataPrime queries, and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer:
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
  • 401(k) plan and match
  • Paid sick time and paid time off
Employment Type:
Full-time

Team Lead, Technical Account Manager

Coralogix is a modern, full-stack observability platform transforming how busine...
Location:
United Kingdom, London
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • Background knowledge and hands-on practice in Cloud DevOps, specifically experience with AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and the equivalent with Azure and GCP
  • Background knowledge and hands-on practice in Observability, specifically experience working with one or more of the following tools: Kibana, OpenSearch, Grafana, Datadog, Sumo Logic, New Relic, AppDynamics, Dynatrace, Prometheus, Logz.io, SignalFx, Instana, Splunk, Honeycomb, Jaeger
  • Proven experience leading technical teams, especially focused on delivering observability solutions, logging infrastructure, and successful customer onboarding
  • Ability to define and track onboarding KPIs, focusing on technical adoption and customer satisfaction
  • Strong analytical skills to interpret customer data and usage trends, ensuring continuous improvements in observability practices
  • Ability to communicate complex technical information to both technical and non-technical stakeholders
  • Excellent communication skills in English
  • Strong presentation skills with the ability to establish credibility with executives
Job Responsibility:
  • Lead, mentor, and manage a team of TAMs to ensure successful customer onboarding and long-term success
  • Develop KPIs for the team and track performance related to the onboarding experience, ensuring customer satisfaction
  • Provide technical guidance and foster team collaboration on observability tools and log analytics
  • Oversee the implementation of observability tools, guiding customers through Logs, Metrics, and Traces monitoring, and real-time analysis
  • Ensure that your team delivers expert-level onboarding and ongoing work for our observability and logging solutions
  • Provide deep technical insights on cloud observability and integration of Coralogix into customer infrastructures
  • Be the primary escalation point for customer technical challenges
  • Proactively work with customers to enhance their logging and observability practices, integrating them seamlessly with Coralogix’s platform
  • Engage with Coralogix stakeholders to provide tailored technical solutions that align with customer business goals
  • Leverage customer feedback and usage data to enhance the onboarding process and overall TAM team performance
Employment Type:
Full-time

Cyber Security Engineering Lead

Join Citi's Cloud Technology Services team to lead and execute critical cyber se...
Location:
Hungary, Budapest
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 10+ years of relevant cybersecurity and/or IT experience
  • Leadership roles across technology or cybersecurity leading large programs or transformational activities
  • Proven track record of delivering security observability platforms such as telemetry data for performance and/or user experience.
  • Thorough understanding of industry and corporate technology standards for Cyber Security services
  • Demonstrated ability to take ownership and work with cross functional teams to manage multiple projects simultaneously under pressure
  • Advanced analytical and problem-solving skills
  • Consistently demonstrates clear and concise written and oral communication as well as strong presentation skills to both technical and non-technical audiences.
  • Bachelor’s degree in relevant subject or equivalent work experience
Job Responsibility:
  • Lead a virtual team of Infrastructure Defense professionals.
  • Lead CTB transformational and RTB activities across NDCS and act as focal point managing cyber security platforms
  • Lead, design, own and deliver Security Observability Enablement on a global scale focusing on all related perimeter technologies – such as Firewall Telemetry.
  • Deliver end-to-end dashboards of critical security service based data (such as firewall performance)
  • Working with Transformation Program Directors, Senior Architects, Steering Committees on execution of perimeter security and edge security programs
  • Work with global cyber security industry partners on influencing next generation cyber technology, take part in related R&D efforts.
  • Responsible for inventory, accuracy and engineering excellence activities for assigned services and products.
What we offer:
  • Cafeteria Program
  • Home Office Allowance (for colleagues working in hybrid work models)
  • Paid Parental Leave Program (maternity and paternity leave)
  • Private Medical Care Program and onsite medical rooms at our offices
  • Pension Plan Contribution to voluntary pension fund
  • Group Life Insurance
  • Employee Assistance Program
  • Access to a wide variety of learning and development programs, online course libraries and upskilling platforms, such as Udemy and Degreed
  • Flexible work arrangements to support you in managing work-life balance
  • Career progression opportunities across geographies and business lines
Employment Type:
Full-time

Lead Observability Engineer

Lead Observability Engineer role focusing on the Elastic Observability Platform,...
Location:
India, Hyderabad
Salary:
Not provided
Blue Yonder
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, MIS, or equivalent experience
  • 7–10+ years of experience in observability engineering, SRE, monitoring platform ownership, or infrastructure operations
  • Deep, hands-on expertise with Elastic Stack (Elasticsearch, Kibana, Logstash, Beats/Elastic Agent, APM)
  • Strong architectural knowledge of cloud (Azure/AWS) and hybrid observability patterns
  • Experience leading observability for infrastructure, cloud platforms, network systems, Kubernetes, and Microsoft 365
  • Proven experience designing monitoring for SaaS platforms (Workday, Salesforce, ServiceNow)
  • Advanced scripting/automation experience (Python, PowerShell, Bash)
  • Strong knowledge of API integrations, data pipelines, and log-flow engineering
  • Experience leading incident diagnostics and delivering visibility for RCA and operational improvement
  • Strong analytical, architectural, and troubleshooting skills with a platform-owner mindset
Job Responsibility:
  • Receives work assignments through the ticketing system or from senior leadership
  • Provides Tier-4 engineering expertise, platform ownership, and technical leadership for all observability capabilities across hybrid cloud, on-premises, and SaaS environments
  • Leads the design, architecture, and maturity of the enterprise observability ecosystem with a primary focus on the Elastic Observability Platform
  • Drives the enterprise strategy for logging, metrics, traces, synthetics, and alerting—including governance, standardization, and performance optimization
  • Partners closely with Cloud, Infrastructure, Security, Enterprise Applications, and SRE leadership to define observability frameworks
  • Ensures observability platforms meet enterprise requirements for security, performance, availability, compliance, and scalability
  • Oversees monitoring implementations for key SaaS applications including Workday, Salesforce, ServiceNow, and Microsoft 365
  • Provides guidance, mentorship, and direction to observability engineers, SREs, and operational teams
  • Acts as a strategic advisor during major incidents by providing real-time diagnostics, correlation insights, and driving RCA improvements
  • Required to provide on-call support during off-hours on weekdays, weekends, and holidays on a rotating basis
Employment Type:
Full-time

SRE Observability Lead Engineer

The SRE Observability Lead Engineer is a hands-on leader responsible for shaping...
Location:
United Kingdom, London
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Relevant experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including several years in senior leadership roles
  • Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
  • Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, GCP, Azure), and container platforms (ECS, Kubernetes)
  • Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems
  • Experience leading teams and managing people across geographically distributed locations
  • Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale
  • Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations
  • Strong collaboration skills and experience working across federated teams, building consensus and delivering change
  • Ability to stay up to date with industry trends and apply them to improve internal tooling and design decisions
  • Excellent written and verbal communication skills
Job Responsibility:
  • Define and own the strategic vision and multi-year roadmap for Observability across Services Technology, aligned with enterprise reliability and production goals
  • Translate strategy into an actionable delivery plan in partnership with Services Architecture & Engineering function, delivering incremental, high-value milestones toward a unified, scalable observability architecture
  • Lead and mentor SREs across Services, fostering technical growth and an SRE mindset
  • Build and offer a suite of central observability services across LoBs – including standardized telemetry libraries, onboarding templates, dashboard packs, and alerting standards
  • Drive reusability and efficiency by creating common patterns and golden paths for observability adoption across critical client flows and platforms
  • Partner with infrastructure, CTO and other SMBF tooling teams, to ensure observability tooling is scalable, resilient, and avoids duplication (“cottage industries”)
  • Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments
  • Collaborate closely with the architecture function to support implementation of observability NFRs in the SDLC, ensuring new apps go live with sufficient coverage and insight
  • Support SRE Communities of Practice (CoP) and foster strong relationships with SREs, developers, and platform leads across Services and beyond to accelerate adoption & promote SRE best practices like SLO adoption, Capacity Planning
  • Use Jira/Agile workflows to track and report on observability maturity across Services LoBs – coverage, adoption, and contribution to improved client experience
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type:
Full-time

Engineering Manager

We are looking for a skilled Engineering Manager to lead our Billing and Interna...
Location:
Finland, Helsinki
Salary:
Not provided
Aiven Deutschland GmbH
Expiration Date:
Until further notice
Requirements:
  • Proven track record of leading diverse teams from junior to senior engineers to successfully deliver software products
  • Strong product sense, enabling the team to design innovative, cloud-native products
  • Excellent communication skills in English
  • Ability to bring order and clarity to a dynamic, ambiguous environment
  • Experience in recruiting engineering talent and building high-performing teams
  • Experience with agile software development methodologies
  • Experience in building and designing distributed systems in a cloud environment
  • Strong knowledge of database fundamentals including OLAP vs OLTP, persistence, replication, and clustering
  • Good grasp of monitoring and observability tools like Prometheus, Grafana, and OpenTelemetry
  • Ability to work with SQL to interact with the platform's master database
Job Responsibility:
  • Strategic Planning: Partner with Product Manager and Domain Head to create team's roadmap
  • Project Management: Oversee team's backlog and projects
  • Leadership & Delivery: Champion culture of urgency and ownership to deliver impactful results
  • Team Enablement: Empower team to act as product custodians
  • Performance & Development: Provide clear goals and feedback
  • Team Facilitation: Lead team meetings such as planning sessions and retrospectives
  • Culture Building: Foster psychologically safe, high-trust environment
  • Collaboration: Ensure effective communication and collaboration within team and across organization
What we offer:
  • Participate in Aiven's equity plan
  • Hybrid work policy
  • Equipment provided
  • Employer support for career development including learning platforms and annual learning budget
  • Global Employee Assistance Program
  • Professional massage at office
  • Health and fitness benefits through Urban Sports Club membership
  • Monthly team breakfast
  • Referral bonus programme

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location:
United States, Hartford
Salary:
130295.00 - 260590.00 USD / Year
CVS Health
Expiration Date:
Until further notice
Requirements:
  • 7+ Years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ Years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ Years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting and automation tools such as Ansible, PowerShell, and Bash for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility:
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer:
  • Affordable medical plan options
  • A 401(k) plan (including matching company contributions)
  • An employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs
  • Confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
Employment Type:
Full-time