Senior Infrastructure Engineer - Monitoring

Coventry Building Society

Location:
United Kingdom, Coventry

Contract Type:
Not provided

Salary:
44140.00 - 55000.00 GBP / Year

Job offer has expired

Job Description:

Coventry Building Society have an exciting new role for a Senior Infrastructure Engineer – Infrastructure Monitoring to join the CIDO department. The role holder will be a Subject Matter Expert (SME) responsible for providing strategic expertise to ensure effective management of the technical and business services in their specialist area. The role sits in the Infrastructure Monitoring and Technical Observability team.

Job Responsibility:

  • Lead the administration, configuration, and maintenance of the Dynatrace platform (SaaS or Managed) across Hybrid Cloud and On-Premise environments
  • Manage the rollout of OneAgents and ActiveGates across complex environments
  • Create custom dashboards for different stakeholders
  • Act as the Subject Matter Expert (SME) during critical incidents (P1/P2)
  • Integrate Dynatrace with our ITSM and CI/CD ecosystems
  • Design complex synthetic transaction tests to simulate customer journeys
  • Train application teams on how to interpret APM data

Requirements:

  • Dynatrace Mastery: hands-on experience with Dynatrace
  • Comfortable with Management Zones, Request Attributes, Tagging Rules, and Alert Profiles
  • Core Architecture: Deep understanding of multi-tier architectures (Web, App, DB) and microservices
  • Containerization: Strong experience monitoring containerized environments (Kubernetes, OpenShift, or Docker)
  • Cloud Fluency: Experience with at least one major cloud provider (AWS, Azure, or GCP)
  • Protocol Knowledge: Strong understanding of HTTP/S, TCP/IP, DNS, and SQL
  • Must have strong experience of business-critical and regulated Financial Services environments
  • Security & Compliance: Understanding of PII masking and GDPR/regulatory constraints regarding data capture
  • High-Volume Environments: Experience working in environments processing high TPS (Transactions Per Second)

Nice to have:

  • Strong knowledge of other monitoring tools such as Cisco AppDynamics, ThousandEyes, SolarWinds SHO, Datadog
  • Scripting: Proficiency in Python, Bash, or PowerShell for automation tasks

What we offer:

  • 28 days holiday a year plus bank holidays and a holiday buy/sell scheme
  • Annual discretionary bonus scheme (up to 20% based on company performance)
  • Personal pension with matched contributions
  • Life assurance (6 times annual salary)

Additional Information:

Job Posted:
January 05, 2026

Expiration:
January 09, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Senior Infrastructure Engineer - Monitoring

Senior Infrastructure Engineer

We are seeking a skilled and proactive individual to play a key role in supporti...
Location:
United Kingdom, Manchester
Salary:
Not provided
Company:
ANS Group
Expiration Date:
Until further notice
Requirements:
  • Exposure to secure architecture design and implementation
  • Experience with the deployment and management of Carbon Black or other EDR solutions across cloud infrastructure
  • Significant previous experience as an infrastructure engineer working on a large scale enterprise or multi-tenant environment
  • VMware 7.0+
  • Significant experience troubleshooting and analysing complex failures
  • Operational experience of NSX 3.0+
  • Scripting abilities in PowerShell and PowerCLI
  • Experience with Cisco UCS or other enterprise blade systems
  • Significant Experience with Storage Technologies (HPE 3PAR, Nimble, Dell Compellent)
  • Experience with FC storage networking
Job Responsibility:
  • Ensure conformity with public sector infrastructure requirements
  • Work in conjunction with our SoC team to develop and maintain platform security baselines
  • Monitor, diagnose and resolve significant problems within the ANS infrastructure
  • Be an escalation point for team members and the support teams offering technical expertise in virtualization, compute hardware and storage
  • Collaborate and work with other technical teams to provide industry leading support to our customers
  • Responsible for creating high quality documentation
  • Proactively work to identify areas of improvement in the platform
  • Effectively deliver project milestones
  • Responsible for the generation of LLD from HLD
  • Ensure our infrastructure is up to date by planning & performing patching and firmware upgrades
What we offer:
  • 25 days’ holiday, plus you can buy up to 5 more days
  • Birthday off
  • An extra celebration day
  • 5 days’ additional holiday in the year you get married
  • 5 volunteer days
  • Private health insurance
  • Pension contribution match and 4 x life assurance
  • Flexible working and work from anywhere for up to 30 days per year
  • Maternity: 16 weeks’ full pay
  • Paternity: 3 weeks’ full pay
Employment Type:
Fulltime

Senior Infrastructure Engineer - Postgres

ClickHouse is expanding its cloud data platform across AWS, GCP, and Azure—addin...
Location:
United States
Salary:
140000.00 - 208000.00 USD / Year
Company:
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 7+ years in SRE, DevOps, or infrastructure engineering, with a track record of running distributed, production-grade systems
  • Solid understanding of Postgres operations, scaling, and performance tuning
  • Deep hands-on experience across AWS, with exposure to GCP and Azure; comfortable navigating multi-cloud topologies
  • Proficient with Terraform, Kubernetes, and container-based infrastructure
  • Strong Go development skills (or willingness to write and own production Go code)
  • Familiar with tools like Prometheus, Grafana, Loki, OpenTelemetry, or equivalents
  • Deep understanding of SLOs, incident response, and continuous improvement in service reliability
  • You operate with a founder’s mentality — hands-on, resourceful, and willing to dive deep to get things done. You take pride in hard work, autonomy, and shipping impactful systems
Job Responsibility:
  • Lead reliability and operations for ClickHouse’s Postgres integration — upgrades, patching, maintenance, and scaling
  • Design and implement automation for provisioning, deployments, and service lifecycle management across AWS, GCP, and Azure
  • Develop infrastructure-as-code using Terraform and modern CI/CD tooling to ensure consistent, repeatable deployments
  • Contribute Go-based tooling and services that improve automation, observability, and developer experience
  • Own observability and monitoring, ensuring robust alerting, metrics, and tracing across environments
  • Drive incident management and postmortem practices that strengthen reliability and learning loops
  • Collaborate cross-functionally with platform, networking, and product teams to improve service operability
  • Mentor and enable engineers, helping the team scale effectively as customer adoption grows
What we offer:
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
Employment Type:
Fulltime

Senior Infrastructure Engineer

Senior Infrastructure Engineer! Permanent Position / Direct Hire!
Location:
United States, Orlando, FL
Salary:
44.00 USD / Hour
Company:
SAR Tech
Expiration Date:
Until further notice
Requirements:
  • Ability to work in a project driven environment
  • Experience or desire to work in an agile environment
Job Responsibility:
  • Manage the Wintel & Cloud compute teams and guide the team members on the new design plans and layouts for server infrastructure
  • Project management for Server Infrastructure Projects to ensure that compute program is on track and deliverables are on schedule
  • Upgrade/refresh the legacy OS/application servers to the latest OS platform either on-prem or cloud
  • Client hardware/OS requirements study, design, and implement solutions at the client’s datacenters/plants/offices across the globe
  • Managing Windows servers for 200+ medium and small Plants and offices to ensure zero impact on production
  • Installing, configuring, and maintaining IT compute services like physical servers, Virtual servers, Microsoft operating systems, VMware, Hyper-V & AD
  • Develop solutions architecture and evaluate architectural alternatives for private, public and hybrid cloud models, including IaaS, PaaS, and other cloud services
  • Act as the Single Point of Contact (SPOC) for technical escalations from global and regional leadership ensuring that all escalated technical issues are resolved
  • Troubleshooting the client’s day-to-day IT operational issues onsite. All issues are registered in ServiceNow (ticket management system) and prioritized with a category as Major Incident, Priority 1, Priority 2, Priority 3 and Priority 4
  • Provision, configure, and maintain AWS services such as EC2, S3, RDS, VPC, CloudWatch
Employment Type:
Fulltime

Senior Infrastructure Engineer – Hosting

As a Senior Infrastructure Engineer – Hosting you will be responsible for the de...
Location:
United States
Salary:
150000.00 USD / Year
Company:
Corporate Tools
Expiration Date:
Until further notice
Requirements:
  • 3-5 years of experience in Linux system administration, virtualization, and cloud infrastructure
  • Experience with Proxmox or other hypervisors (VMware, KVM, Xen, Hyper-V)
  • Experience with Ceph or SAN storage solutions for virtualization
  • Ability to manage kernel tuning, system performance, and process optimization
  • Hands-on experience with Ceph storage, ZFS, iSCSI, NFS, RAID, and SAN architectures
  • Understanding of storage performance metrics (IOPS, throughput, latency)
  • Ability to work on projects solo or with a team
  • Love for learning and improving code
  • Strong communication and collaboration skills
  • Experience with WordPress hosting, database replication, and caching techniques
Job Responsibility:
  • Develop and design robust and scalable hardware solutions
  • Take ownership of projects from conception to deployment, ensuring timely delivery and meeting the specified requirements
  • Work closely with cross-functional teams, including IT, product management, and other software teams, to ensure seamless integration and alignment with business objectives
  • Deploy, configure, and maintain Proxmox VE clusters for virtualization or other hypervisors
  • Implement high-availability (HA) and failover solutions for virtual machines
  • Manage resource allocation (CPU, memory, disk, network) to optimize performance for hosted applications
  • Automate VM deployment and configuration using Ansible, Terraform, or SaltStack
  • Maintain backups and disaster recovery plans for virtualized environments
  • Design and manage Ceph clusters or SAN storage (iSCSI, NFS, ZFS, etc.) for high-performance, redundant storage
  • Monitor and optimize storage performance, including IOPS, latency, and throughput
What we offer:
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days Paid Time Off accrued annually, and 4 holidays
  • After 3 years, PTO increases to 29 days. Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
  • The 4 holidays are: New Year’s Day, Fourth of July, Thanksgiving, and Christmas Day
  • Paid Parental Leave
  • Up to 6% company matching 401(k) with no vesting period
  • Quarterly allowance: use it to make your remote work setup more comfortable, for continuing education classes, a plant for your desk, coffee for your coworker, a massage for yourself... really, whatever
  • Open concept office with friendly coworkers
Employment Type:
Fulltime

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Company:
Plaid
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility:
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer:
  • Medical, dental, vision, and 401(k)
Employment Type:
Fulltime

Senior Cloud Infrastructure Engineer

Taskrabbit is looking for an experienced Senior Cloud Infrastructure Engineer to...
Location:
United States, New York; San Francisco
Salary:
147000.00 - 196000.00 USD / Year
Company:
Taskrabbit
Expiration Date:
Until further notice
Requirements:
  • At least 5 years of experience in the Infrastructure and DevOps space
  • Experience with build automation and configuration management tools (e.g. Ansible, Puppet, Chef)
  • Strong knowledge of the Amazon Web Services (AWS) ecosystem and related core technologies such as Elasticsearch Service, RDS, WAF, CloudFront, and Kubernetes
  • You have worked with common infrastructure tools like Docker, Terraform, Helm, GitHub Actions, and ArgoCD
  • Experience with a microservice architecture running in containers (Docker or other containerisation technology)
  • Experience supporting 24x7, high-availability internet application environments that include web, application, and database servers and load balancing systems
  • Experience working with a product that has end-users
  • Bachelor's degree or higher in Computer Science, or equivalent experience
  • Excellent written and verbal communication skills
  • A strong ownership attitude and a track record of taking responsibility for problems and pushing through to resolution
Job Responsibility:
  • Build and maintain modern infrastructure such as Kubernetes and new CI/CD tools, and assist application teams in adopting them
  • Build and maintain CI/CD pipelines from scratch for testing and releasing configuration and software
  • Monitor and resolve issues in all environments using tools such as DataDog, PagerDuty, and AWS logs
  • Engage in capacity planning and demand forecasting, anticipating performance bottlenecks, and scaling the environment as needed using DataDog and other tools
  • Design and implement a zero-downtime solution to accomplish a highly available service (99.9%)
  • Ensure systems are secure against cyber threats and implement fixes for Security vulnerabilities
  • Automate tasks and develop tools to increase engineering efficiency and visibility
  • Design and implement disaster recovery (DR) between different regions in cloud providers such as AWS
  • Manage web domain and certificates
  • Troubleshoot production and testing environment issues, including performance and function issues
What we offer:
  • employer-paid health insurance
  • 401k match with immediate vesting
  • generous and flexible time off with 2 company-wide closure weeks
  • Taskrabbit product stipends
  • wellness + productivity + education stipends
  • IKEA discounts
  • reproductive health support
Employment Type:
Fulltime

Network Admin Senior Engineer-Infrastructure Management

Sopra Steria, a major Tech player in Europe, is hiring for the position of Netwo...
Location:
India, Noida
Salary:
Not provided
Company:
Sopra Steria
Expiration Date:
Until further notice
Requirements:
  • Proven hands-on network engineering experience
  • Solid understanding of the OSI or TCP/IP model
  • Deep hands-on experience in understanding incidents and troubleshooting within SLA
  • Deep understanding of networking routing protocols (e.g., Static, BGP, OSPF)
  • Deep understanding of VLAN, HSRP, VRRP, STP, VTP, VSS, VLS, VPC, 802.11, QoS, and EtherChannel
  • Deep hands-on experience with Cisco, Nexus, and HPE network devices (e.g., routers, switches)
  • Hands-on experience with monitoring, network diagnostic and network analytics tools
  • Deep hands-on experience with Cisco and Aruba wireless controllers and access points
  • Deep understanding of ClearPass, 802.1X, RADIUS, TACACS
  • Deep understanding of and hands-on experience with IPsec and remote VPN
What we offer:
  • Inclusive and respectful work environment
  • Open positions for people with disabilities
  • Commitment to fighting against all forms of discrimination

Senior Software Engineer (Infrastructure) - HyperDX

Join us in revolutionizing Observability for Developers! We’re on a mission to r...
Location:
United States
Salary:
133450.00 - 197200.00 USD / Year
Company:
ClickHouse
Expiration Date:
Until further notice
Requirements:
  • 5+ years of backend engineering experience
  • Strong TypeScript and Node.js skills
  • Deep understanding of APIs, event-driven systems, and high-throughput data pipelines
  • Proficiency in SQL and experience working with analytical databases
  • Experience with Docker and Kubernetes, plus Helm for managing production deployments
  • Experience with infrastructure-as-code (Terraform, Pulumi, or similar)
  • Familiarity with CI/CD pipelines, monitoring systems, and production-grade alerting practices
  • A passion for building reliable, maintainable, cloud-native systems
Job Responsibility:
  • Build the core platform: Design and implement backend systems and APIs that power HyperDX
  • Scale deployments and infrastructure: Architect, deploy, and maintain cloud-native systems
  • Ensure maintainability and operational excellence: Define best practices for CI/CD, monitoring, logging, and alerting
  • Engineer for scale: Design and operate ingestion and data processing pipelines
  • Engage with the community: Collaborate with open-source contributors and customers
What we offer:
  • Flexible work environment - remote-friendly
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - stock options
  • Time off - Flexible time off in the US
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – company-wide offsites
Employment Type:
Fulltime