Monitoring Engineer / Infrastructure Engineer

United Kingdom, Hemel Hempstead · Job Posted April 23, 2026

Job Description

We are seeking a Monitoring Engineer to lead day-to-day technical operations within a Windows infrastructure environment. Reporting into the Head of Operating Systems, this role combines hands-on technical delivery with leadership responsibilities across a specialist infrastructure team. You will play a key part in shaping monitoring capability, improving operational resilience, and supporting both project delivery and live service within a highly governed environment. The role requires strong expertise in enterprise monitoring tools and infrastructure architecture, with the ability to influence technical direction, support decision making, and ensure operational standards are consistently met.

Job Responsibility

  • Lead and mentor the infrastructure monitoring team, supporting development of SME capability and operational maturity
  • Own and contribute to solution design, estimation, high- and low-level design, and implementation activities under Project Manager guidance
  • Ensure adherence to SLAs, responding, resolving, or escalating issues appropriately within defined thresholds
  • Develop and maintain operational and end-user documentation, ensuring consistency and compliance with standards
  • Support pre-sales and solution scoping activities where required
  • Work closely with Architects and Solution Designers to assess options and provide technical recommendations
  • Accurately estimate effort, cost, and delivery timelines for implementation tasks
  • Ensure all team activity is fully documented in line with governance and operational standards
  • Provide regular progress updates to Project Management to support delivery tracking and planning

Requirements

  • Strong enterprise infrastructure background with extensive operational experience
  • Proven experience leading infrastructure or technical teams within structured delivery environments
  • Deep technical expertise in monitoring and infrastructure tooling, including Microsoft System Center Operations Manager (SCOM) and PRTG Network Monitor
  • Experience in network device monitoring and dashboard configuration
  • Strong fault finding, diagnosis, and resolution skills across complex infrastructure environments
  • Experience with virtualised environments, enterprise storage, file/print services, and hardware evaluation
  • Strong understanding of service management and working within SLA-driven environments
  • Experience working within governed frameworks and structured delivery methodologies
  • Project leadership experience within structured methodologies such as PRINCE2 or Project Management Institute (PMI) approaches
  • Diploma or equivalent in Computer Science or related discipline

Nice to have

  • Experience working in customer-facing environments and understanding business impact of technical issues
  • Strong documentation skills for both end-user and operational audiences
  • Accreditation at Microsoft Certified Systems Engineer (MCSE) level or equivalent
  • Knowledge of ITIL Foundation principles and service management best practice
  • Experience with enterprise messaging, thin client environments, or virtualisation platforms

Similar Jobs for Monitoring Engineer / Infrastructure Engineer

8 matching positions

Full Stack Software Engineer - Monitoring Infrastructure

As a Full Stack Software Engineer focused on Monitoring Infrastructure, you will...
Location
United States, San Francisco; Chicago (Woodridge); Oakland
Salary
Not provided
Formic
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science or equivalent practical experience
  • 5+ years of experience building complex distributed systems in domains such as IoT, robotics, automotive, or similar
  • Strong proficiency in Python and Django
  • Deep understanding of Linux-based systems
  • Experience with AWS, ideally AWS IoT
  • Familiarity with observability tools such as Grafana, Datadog, or similar
  • Experience working with video streaming systems
  • Experience with React or other front-end technologies strongly preferred
  • Located in or willing to relocate to the Chicago, IL (Woodridge) or San Francisco, CA (Oakland) areas and able to work in a hybrid environment (3+ days per week)
Job Responsibility
  • Contribute to the design and development of Formic’s end-to-end monitoring stack
  • Build systems that support provisioning, data collection, and remote troubleshooting
  • Work closely with the Robotics team to understand and design monitoring interfaces
  • Develop software across edge and cloud environments to enable robust data collection and processing
  • Troubleshoot monitoring and data collection issues on deployed systems
  • Provision and maintain cloud infrastructure as needed
  • Write unit and integration tests to ensure reliability and maintainability
  • Participate in Scrum ceremonies and code reviews
  • Fulltime

Full Stack Software Engineer - Monitoring Infrastructure

As part of the Engineering Team you will be working on building and improving mo...
Location
United States, Chicago; San Francisco
Salary
Not provided
Formic
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in CS or equivalent experience/training
  • 5+ years of relevant experience working on complex distributed systems such as IoT, Robotics, Automotive or equivalent
  • Knowledge of Python and Django
  • Deep understanding of Linux
  • Experience with AWS, ideally AWS IoT
  • Experience with Grafana, Datadog or similar dashboarding tools
  • Experience with Video Streaming
  • Located in - or willing to relocate to - the Chicago, IL or San Francisco, CA areas and willing to work on a hybrid basis (3+ days/week) in Woodridge, IL or San Francisco
Job Responsibility
  • Contribute to the design and development of a complete monitoring stack that enables system provisioning, data collection, and remote troubleshooting
  • Work closely with Robotics team to understand and help design monitoring interfaces
  • Develop software for both edge and cloud to allow robust data collection and processing
  • Help troubleshoot data collection issues on deployed systems
  • Provision cloud infrastructure as needed
  • Write unit and integration tests as needed
  • Participate in Scrum ceremonies
  • Fulltime

O&M Infrastructure Engineer - Facility Monitoring

Location
Saudi Arabia, Riyadh
Salary
Not provided
Giza Systems
Expiration Date
Until further notice
Requirements
  • Strong incident response and troubleshooting abilities
  • High attention to detail with adherence to strict procedures
  • Effective communication skills for coordination and access management
  • Ability to work under pressure and make sound decisions in a fast-paced 24x7 environment
  • Understanding of electrical systems including power distribution, UPS, generators, and electrical safety
  • Expertise in HVAC systems used for data center cooling
  • Familiarity with fire detection and suppression technologies
  • Knowledge of physical security systems such as access control and CCTV surveillance
  • 1 to 3 years of experience
  • Bachelor's degree
Job Responsibility
  • Continuously monitor and maintain optimal environmental conditions including temperature, humidity, and airflow
  • Monitor power distribution systems such as UPS, PDUs, and generators
  • Monitor physical security systems including access control, surveillance cameras, and intrusion detection systems
  • Monitor fire detection and suppression systems
  • Respond promptly to alerts and incidents related to facility infrastructure
  • Troubleshoot and resolve L1 issues and coordinate with relevant teams to minimize downtime
  • Perform full data center facilities operations and L1 support for IT systems
  • Ensure only authorized personnel access the data center facility
  • Escort authorized personnel to their racks and allocated spaces
  • Prepare gate passes and manage access permissions
  • Fulltime

IT Systems Engineer | Infrastructure Engineer

We are seeking an Adelaide-based Systems Engineer to take ownership of our core ...
Location
Australia, Adelaide
Salary
Not provided
DyFlex Solutions
Expiration Date
Until further notice
Requirements
  • 4+ years of experience in systems engineering / systems administration or infrastructure engineering
  • Deep expertise in the Microsoft ecosystem, including Windows Server 2022, Entra ID (hybrid), Azure, and Microsoft 365
  • Proven ability to automate processes using PowerShell (advanced scripting) and/or Power Automate
  • Strong background in cybersecurity uplift: patching, hardening, vulnerability remediation, and identity/endpoint security
  • Hands‑on experience with ASD Essential Eight, with exposure to ISO 27001 or SOC 2 considered highly advantageous
  • Experience in firewall administration (e.g., Sophos), routing/switching fundamentals, and secure remote access design
  • Experience supporting or administering Linux (SUSE preferred) within a predominantly Windows environment
  • Demonstrated ability to deliver technical upgrades end‑to‑end with high‑quality documentation and handover
  • Experience producing clear technical diagrams and architectural documentation
  • Strong communication, collaboration, and coaching skills, with the ability to guide junior team members
Job Responsibility
  • Manage and optimise our Microsoft ecosystem, including Windows Server, Active Directory, and Microsoft 365
  • Administer and enhance Microsoft Entra ID in a hybrid environment, including Conditional Access, SSO integrations, and identity security controls
  • Lead our cybersecurity uplift, driving vulnerability remediation, system hardening, Essential Eight maturity, and Microsoft Defender improvements
  • Contribute to the implementation and operationalisation of Microsoft Sentinel, including onboarding data sources and alert tuning
  • Architect, manage, and scale our Azure environment (IaaS/PaaS) to support a rapidly growing national team
  • Act as the final Level 3 escalation point for complex server, identity, networking, and endpoint issues
  • Oversee network integrity and security, including firewall management, site‑to‑site VPNs, remote access VPNs, and uplift of network segmentation
  • Drive infrastructure automation and consistency by developing and maintaining advanced PowerShell scripts and automations
  • Support and enhance our SOE, server build patterns, platform standards, and operational processes
  • Maintain and monitor our mixed environment, including SUSE Linux servers used for internal projects
What we offer
  • A flexible and supportive work environment
  • Competitive remuneration and benefits including novated lease, birthday leave, salary packaging, wellbeing programme, additional purchased leave, and company-provided laptop
  • Comprehensive SAP training and certifications
  • Fulltime

AI Infrastructure Engineer, Core Infrastructure

As a Software Engineer on the ML Infrastructure team, you will design and build ...
Location
United States, San Francisco; Seattle; New York
Salary
179400.00 - 310500.00 USD / Year
Scale
Expiration Date
Until further notice
Requirements
  • 4+ years of experience building large-scale backend or distributed systems
  • Strong programming skills in Python, Go, or Rust, and familiarity with modern cloud-native architecture
  • Experience with containers and orchestration tools (Kubernetes, Docker) and Infrastructure as Code (Terraform)
  • Familiarity with schedulers or workload management systems (e.g., Kubernetes controllers, Slurm, Ray, internal job queues)
  • Understanding of observability and reliability practices (metrics, tracing, alerting, SLOs)
  • A track record of improving system efficiency, reliability, or developer velocity in production environments
Job Responsibility
  • Design and maintain fault-tolerant, cost-efficient systems that manage compute allocation, scheduling, and autoscaling across clusters and clouds
  • Build common abstractions and APIs that unify job submission, telemetry, and observability across serving and training workloads
  • Develop systems for usage metering, cost attribution, and quota management, enabling transparency and control over compute budgets
  • Improve reliability and efficiency of large-scale GPU workloads through better scheduling, bin-packing, preemption, and resource sharing
  • Partner with ML engineers and API teams to identify bottlenecks and define long-term architectural standards
  • Lead projects end-to-end — from requirements gathering and design to rollout and monitoring — in a cross-functional environment
What we offer
  • Comprehensive health, dental, and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Equity-based compensation
  • Fulltime

Research Engineer / Software Engineer (platform/core infrastructure)

Build the future of offensive security with XBOW. Attackers are already using AI...
Location
United States
Salary
150000.00 - 350000.00 USD / Year
XBOW
Expiration Date
Until further notice
Requirements
  • Strong experience building and operating scalable, distributed systems on cloud infrastructure such as AWS or similar
  • Comfortable working with infrastructure as code (e.g., Terraform, CDK)
  • A track record of performance tuning across cloud services, databases, and compute layers
  • Eager to learn new tools, languages, and technologies as needed
  • A thoughtful communicator who values clarity and simplicity and is comfortable working in a fast-paced startup and navigating ambiguity
  • Strong problem-solving skills and the ability to work with incomplete information
  • Curious, practical, and eager to work across layers of the stack when needed
  • You think proactively about failure modes and bring experience implementing disaster recovery and business continuity plans that keep critical systems running
Job Responsibility
  • Design and implement infrastructure systems that scale reliably and securely and can be deployed across multiple cloud environments (AWS, Azure, OCI, etc.) and contexts (SaaS, on-premises)
  • Tune and optimize cloud services across compute, storage, networking, and observability to drive performance, reliability and maintainability of core services
  • Develop our core services, written in TypeScript, Kotlin and Go
  • Support large-scale systems with event driven architectures
  • Own problems end-to-end—from design through deployment to production support
  • Navigate ambiguity and help define how we build as much as what we build
  • Partner closely with other engineers, AI researchers and Security researchers to enable high-quality, high-velocity product development
  • Design for resilience by implementing disaster recovery and business continuity strategies that ensure uptime, even when things break
  • Improve how we build, deploy, and monitor services at scale
What we offer
  • Competitive salary and a generous equity package
  • Career Growth: Shape your role, lead the function, and grow with the company
  • Meaningful Work: You will tackle technically complex challenges and play a pivotal role in the growth of our business
  • Fulltime
New

Senior Azure Infrastructure Engineer

We are seeking a Senior Azure Infrastructure Engineer to design, implement, and ...
Location
United States, Mechanicsville
Salary
Not provided
Robert Half
Expiration Date
Until further notice
Requirements
  • 8+ years of experience implementing and managing Microsoft Azure infrastructure
  • 8+ years of experience with cloud migrations from on-premises to Azure
  • 8+ years of experience in enterprise, mission-critical environments
  • 8+ years of experience designing and implementing backup and disaster recovery solutions
  • 8+ years of experience with Azure networking and security policies
  • 8+ years of experience using Azure Monitor for observability, alerting, and dashboards
  • 8+ years of experience with vulnerability remediation and security initiatives
  • Strong understanding of networking concepts, including OSI model and network protocols
Job Responsibility
  • Design, implement, and maintain Azure infrastructure, including compute, networking, and security configurations
  • Provide technical guidance on Azure architecture, best practices, and governance
  • Monitor cloud environments and ensure high availability, performance, and security
  • Plan and execute cloud migrations from on-premises environments to Azure
  • Support and troubleshoot production applications across enterprise systems
  • Develop and maintain backup and disaster recovery strategies to ensure business continuity
  • Implement and manage security tools and vulnerability remediation processes
  • Build monitoring dashboards and alerting using Azure Monitor, Workbooks, and Alerts
  • Collaborate with cross-functional teams supporting technologies such as .NET, Java, SQL, Oracle, and Microsoft Dynamics
What we offer
  • Medical, vision, dental, and life and disability insurance
  • 401(k) plan
  • Fulltime
New

Infrastructure Engineer Manager

Deploy, configure, and maintain servers, networks, storage systems, and infrastr...
Location
Egypt, New Cairo
Salary
Not provided
Ethics HR
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Information Technology, or related field from a reputable university
  • Minimum of 8 years of experience in IT infrastructure, preferably in banking or financial services
  • Strong knowledge of server, network, storage, and cloud technologies (AWS, Azure, or equivalent)
  • Must have knowledge and experience of data center management, unified communications suites, LANs, WANs, wireless networks, virtualization, routing, switching, MPLS networks, firewalls, SASE networks, VPN technologies, and system monitoring and management, with a strong understanding of the TCP/IP protocol stack and the network layer
  • Familiarity with IT governance, compliance, and CBE regulations
  • Excellent oral and written communication skills
  • Strong problem-solving, troubleshooting, and analytical skills
  • Ability to work in a fast-paced, start-up environment and manage multiple infrastructure projects simultaneously
Job Responsibility
  • Deploy, configure, and maintain servers, networks, storage systems, and infrastructure
  • Monitor infrastructure performance and implement improvements for high availability, scalability, and security
  • Use APIs to enable automation, integration, and monitoring of infrastructure
  • Collaborate with technology teams to ensure integration of core banking, digital, and enterprise applications
  • Implement and manage security controls, firewalls, intrusion detection/prevention systems, and encryption mechanisms
  • Perform regular backups, disaster recovery tests, and failover readiness
  • Troubleshoot and resolve infrastructure-related incidents, minimizing downtime and operational impact
  • Ensure infrastructure complies with CBE IT regulations, cybersecurity requirements, and internal IT policies
  • Document infrastructure designs, configurations, and operational procedures
  • Support deployment of new technology initiatives and digital banking solutions
  • Fulltime