Observability Engineer / Architect

Whitehall Resources Ltd

Location:
United Kingdom, Shropshire

Contract Type:
Not provided

Salary:
Not provided

Job Description:

You will play a pivotal role in designing and governing observability architecture across enterprise platforms, ensuring alignment with strategic objectives and technical standards. This role combines discovery (assessing current state, defining observability requirements, and shaping architectural blueprints) and delivery (implementing scalable solutions, integrating with enterprise ecosystems, and optimising technical design) to enhance service reliability, performance, and business insights. You will act as a key influencer in architectural decisions, ensuring observability capabilities are embedded into the organisation’s technology roadmap.

Job Responsibility:

  • Conduct current-state assessments of observability architecture across applications, infrastructure, and services
  • Define target-state architecture for observability, ensuring alignment with enterprise principles and technology standards
  • Identify monitoring gaps and prioritise remediation based on technical risk and business-critical outcomes
  • Collaborate with Enterprise Architects, Service Owners, and the Observability Centre of Excellence to shape observability strategy and backlog
  • Design and implement Dynatrace-based observability solutions with architectural considerations for scalability, resilience, and integration
  • Develop reference architectures and patterns for observability, embedding them into CI/CD pipelines and strategic tooling
  • Optimise data ingestion and technical architecture for Dynatrace deployments, ensuring compliance with vendor best practices and enterprise governance
  • Provide architectural oversight during incident analysis and troubleshooting to reduce MTTR and improve system reliability
  • Act as a technical authority for Dynatrace SaaS deployments, guiding engineering teams on architectural decisions
  • Drive adoption of observability standards, frameworks, and architectural principles across teams
  • Contribute to the enterprise observability roadmap, ensuring alignment with broader IT strategy and digital transformation goals
  • Mentor and upskill teams on architectural thinking, Dynatrace capabilities, and observability best practices
  • Participate in architecture review boards and provide input on cross-domain integration and interoperability

Requirements:

  • Must not have been outside the UK for more than 6 months in the last 5 years
  • Proven experience with Dynatrace SaaS and observability architecture, including dashboarding, alerting, and DQL
  • Strong understanding of observability principles (metrics, logs, traces) and their role in enterprise architecture
  • Familiarity with cloud platforms (AWS, Azure), container technologies (Kubernetes), and architectural frameworks
  • Experience designing solutions for enterprise systems (WebLogic, Apache, Oracle, SQL) and infrastructure (Windows, Linux, Unix)
  • Ability to produce architectural artefacts (diagrams, standards, patterns) and communicate complex designs effectively
  • Excellent stakeholder engagement and collaboration skills
  • Dynatrace Associate Certification
  • TOGAF or equivalent enterprise architecture certification (advantageous)

Nice to have:

  • TOGAF or equivalent enterprise architecture certification

Additional Information:

Job Posted:
January 07, 2026

Work Type:
Hybrid work

Similar Jobs for Observability Engineer / Architect

Senior Software Engineer, Observability

The Observability team at Airtable ensures that engineers have the tools they ne...
Location:
United States, San Francisco; New York; Seattle
Salary:
196000.00 - 270000.00 USD / Year
Airtable
Expiration Date
Until further notice
Requirements
  • 6+ years of software engineering experience
  • 3+ years focused on observability or infrastructure at scale
  • Demonstrated success implementing and running production-grade logging, metrics, or tracing systems
  • Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes)
  • Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse
  • Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling
  • Experience mentoring engineers and collaborating across multiple teams
  • Strong communication skills
  • Eagerness to own high-impact initiatives
  • Proven ability to balance short-term fixes with long-term strategic vision
Job Responsibility
  • Architect and scale core observability systems
  • Lead the design and evolution of logging, metrics, and tracing pipelines
  • Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK stack)
  • Guide and mentor a growing team of infrastructure engineers
  • Define and uphold coding standards and operational excellence
  • Partner with Deploy Infrastructure, Service Orchestration, and Product teams
  • Align infrastructure decisions with business goals
  • Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
  • Optimize performance and cost of large-scale data pipelines
  • Shape the observability roadmap
What we offer
  • Opportunity to receive benefits
  • Restricted stock units
  • May include incentive compensation
  • Comprehensive benefit offerings
  • Fulltime

Monitoring & Observability Engineer

The Monitoring & Observability Engineer is a senior level position responsible f...
Location:
India, Chennai; Pune
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • 3-7 years of relevant experience in an Engineering & IT role
  • 2+ years of hands-on working experience, with a strong understanding of UI/UX principles and best practices
  • Proficient in JavaScript, TypeScript, HTML, CSS, React, and Node.js
  • Experience with backend technologies and databases (e.g., MongoDB)
  • Experience with Python Programming
  • Experience with version control systems (e.g., Git)
  • Strong problem-solving and analytical skills
  • Excellent communication and collaboration skills
  • Create modular and reusable React components to streamline development and maintain consistency across the application
  • Continuously improve existing applications, addressing bugs, and implementing new features
Job Responsibility
  • Drive the best-in-class monitoring using a range of tools across all regions of Global Consumer bank
  • Drive POCs and incubate new features and capabilities
  • Be forward looking and ensure long term strategic success
  • Work closely with the monitoring operations teams, production support, performance test teams, operations, and application owners to deliver best-in-class monitoring
  • Explain complicated performance bottlenecks to stakeholders
  • Understand complicated application architecture, including Java app servers, Web Servers, Cloud (PCF, AWS, Google), Kubernetes, TIBCO, mainframe
  • Build advanced dashboards and queries
  • Be a subject matter expert for the Global Consumer Bank, including conducting brown bags and office hours
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Fulltime

Cloud Technical Architect / Data DevOps Engineer

The role involves designing, implementing, and optimizing scalable Big Data and ...
Location:
United Kingdom, Bristol
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • An organised and methodical approach
  • Excellent time keeping and task prioritisation skills
  • An ability to provide clear and concise updates
  • An ability to convey technical concepts to all levels of audience
  • Data engineering skills – ETL/ELT
  • Technical implementation skills – application of industry best practices & design patterns
  • Technical advisory skills – experience in researching technological products / services with the intent to provide advice on system improvements
  • Experience of working in hybrid environments with both classical and DevOps
  • Excellent written & spoken English skills
  • Excellent knowledge of Linux operating system administration and implementation
Job Responsibility
  • Detailed development and implementation of scalable clustered Big Data solutions, with a specific focus on automated dynamic scaling, self-healing systems
  • Participating in the full lifecycle of data solution development, from requirements engineering through to continuous optimisation engineering and all the typical activities in between
  • Providing technical thought-leadership and advisory on technologies and processes at the core of the data domain, as well as data domain adjacent technologies
  • Engaging and collaborating with both internal and external teams, as a confident participant as well as a leader
  • Assisting with solution improvement activities driven either by the project or service
  • Support the design and development of new capabilities, preparing solution options, investigating technology, designing and running proof of concepts, providing assessments, advice and solution options, providing high level and low level design documentation
  • Cloud Engineering capability to leverage Public Cloud platform using automated build processes deployed using Infrastructure as Code
  • Provide technical challenge and assurance throughout development and delivery of work
  • Develop re-useable common solutions and patterns to reduce development lead times, improve commonality, and lower Total Cost of Ownership
  • Work independently and/or within a team using a DevOps way of working
What we offer
  • Extensive social benefits
  • Flexible working hours
  • Competitive salary
  • Shared values
  • Equal opportunities
  • Work-life balance
  • Evolving career opportunities
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Fulltime

Senior Integration Architect & iPaaS Engineer

Archer is an aerospace company based in San Jose, California building an all-ele...
Location:
United States, San Jose
Salary:
163200.00 - 204000.00 USD / Year
Archer Aviation
Expiration Date
Until further notice
Requirements
  • 10–12+ years of progressive experience in integration engineering, architecture, or iPaaS development
  • Substantial hands-on experience with Workato or an equivalent iPaaS platform (e.g., Boomi, Mulesoft, SAP Integration Suite, SnapLogic)
  • Deep understanding of various integration mechanisms: REST APIs, SOAP, event-driven architecture, EDI, and file-based methods
  • Demonstrated experience integrating major Enterprise systems (SAP, Workday etc.) with cloud-based SaaS applications
  • Proven ability to create professional integration diagrams, decision frameworks, sequence diagrams, and detailed architectural documentation
  • Solid foundational understanding of data flow design, payload modeling, and data quality principles
Job Responsibility
  • Define and govern enterprise integration patterns and decision frameworks (e.g., API, EDI, event-driven, file/SFTP, Workato orchestrations)
  • Architect complex end-to-end integrations spanning ERP (e.g., SAP S/4), SaaS, data platforms, and MES
  • Establish rigorous standards for API contracts, event schemas, cataloging, error handling, observability, and security
  • Develop reusable integration components and enforce best practices across development teams
  • Build, test, and deploy integrations using Workato or comparable iPaaS platforms
  • Create reusable recipes, connectors, and accelerators
  • Troubleshoot and optimize existing flows for superior performance, reliability, and maintainability
  • Provide immediate, hands-on support for critical production incidents
  • Design comprehensive source-to-target data mappings and integration payload structures
  • Ensure integrations deliver clean, complete, and high-quality data to consuming analytical systems (e.g., Snowflake, Foundry)
  • Fulltime

Solutions Engineering Lead

We are hiring a Solutions Engineering Team Lead for the East region to scale and...
Location:
United States, Boston
Salary:
220000.00 - 300000.00 USD / Year
Coralogix
Expiration Date
Until further notice
Requirements
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency crafting PromQL, Lucene and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS (EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics), Azure Monitor, and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
Job Responsibility
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead white-board deep-dive sessions on ingestion pipelines, index-free querying and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs, traces, craft PromQL, Lucene or Dataprime queries and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
  • 401(k) plan and match
  • Paid sick time and paid time off
  • Fulltime

Staff Engineer, Site Reliability

LearnUpon is looking for a Staff Site Reliability Engineer to join our team in I...
Location:
Ireland, Dublin
Salary:
Not provided
LearnUpon
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in a software or Ops role
  • 5+ years of cloud engineering experience, with at least 2 years of experience with AWS
  • Experience deploying Microservice environments, using containerisation technologies such as Kubernetes and Docker
  • Experience in designing and implementing Observability tech stacks
  • Have championed the benefits of Observability to Engineering teams
  • Can architect the design of SLO/SLI implementation that balances the needs of different teams
  • Familiar with cost analysis of Observability metrics gathering, Engineering effort, and tooling
  • Experience building and supporting large-scale distributed systems that back a consumer app or website with associated requirements of performance, security and disaster recovery
  • Experience implementing IaC (e.g. CloudFormation, Terraform), automation tooling (e.g. Puppet, Ansible), and CI/CD (e.g. Jenkins, Travis CI, GitLab)
  • Able to effectively communicate technical ideas to and collaborate with both technical and non-technical peers
Job Responsibility
  • Identifying opportunities to improve and scale our infrastructure for performance, observability, maintainability, and cost, by creating innovative solutions
  • Leading our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management
  • Driving the processes to maintain resilient, scalable and cost-effective infrastructure
  • Working with other Engineering teams to provide infrastructure solutions that meet their ongoing requirements
  • Building tools focused on measuring, monitoring and alerting, with an eye towards self-service in order to promote Engineers’ ownership of observability
  • Reacting quickly to changing customer and business needs
  • Participating in the on-call rota
  • Mentoring junior talent
What we offer
  • Work in a fun and supportive environment with regular team events
  • Excellent career progression
  • Structured learning environment
  • Competitive salary and company ESOP
  • Private health insurance
  • 26 days annual leave
  • Fulltime

Solutions Engineering Lead

Coralogix is a modern full-stack observability platform that transforms how busi...
Location:
United States, Dallas
Salary:
Not provided
Coralogix
Expiration Date
Until further notice
Requirements
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency crafting PromQL, Lucene and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS (EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics), Azure Monitor, and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
  • Familiarity with structured sales methodologies such as MEDDPICC or Command of the Message (a plus)
Job Responsibility
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead white-board deep-dive sessions on ingestion pipelines, index-free querying and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs, traces, craft PromQL, Lucene or Dataprime queries and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
  • A 401(k) plan and match
  • Paid sick time
  • Paid time off
  • Fulltime

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team responsible for Private and Public...
Location:
Singapore, Singapore
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree or equivalent work experience
  • 6+ years of relevant work experience
  • Highly motivated self-starter with excellent interpersonal and communication skills
  • Certification or formal training in site reliability engineering concepts and practices
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
  • 4+ years of experience in Python (preferred) or Java on large-scale systems, alongside Linux-based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience of k8s and container technologies such as Docker, Openshift and EKS
  • Experience with public cloud technologies such as AWS, GCP or Azure
  • Experience with Secrets products such as HashiCorp Vault or CyberArk
Job Responsibility
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production level incidents till resolution.
What we offer
  • Equal opportunity employer
  • Accessibility support for persons with disabilities.
  • Fulltime