Site Reliability Engineer Platform Engineer Job at Tier4 Group (Reston)

Job Description

Join a mission-driven, national financial services organization at the heart of the U.S. housing finance ecosystem. This is a mid-sized, highly regulated enterprise operating at market scale, supporting platforms and analytics that enable trillions of dollars in annual economic activity. You’ll work in a modern tech environment with strong engineering partners, clear business impact, and a mandate for reliability, security, and continuous improvement.

Our client is hiring a hands-on SRE / Platform Engineer to operate, tune, and scale its OpenShift/Kubernetes platforms while bridging on-prem to Azure to power its analytics ecosystem. You’ll own reliability, automation, and observability across a hybrid estate, partnering closely with developers, data engineers, infrastructure operations, and security to deliver secure, performant platform services using modern DevSecOps practices.

Job Responsibilities

Operate, tune, and optimize OpenShift/Kubernetes clusters (scheduling, ingress, upgrades, quotas, policies)
Stand up and/or refine observability (Datadog, Prometheus, Grafana)—dashboards, alerts, SLOs, runbooks
Map current hybrid topology and critical delivery pipelines; identify toil and prioritize automation (Terraform/Ansible)
Begin supporting Azure environments (compute, networking, storage, data services) used by analytics teams
Drive GitOps-first workflows; harden CI/CD with ArgoCD/Jenkins/GitHub Actions and policy-as-code guardrails
Implement or enhance platform services (Vault, Kafka/AMQ, ingress, service mesh) for dev and data teams
Lead incident response and postmortems; institutionalize RCA, blameless learning, and continuous improvement
Advance the hybrid service model—migrations, integrations, reliability/latency tuning, cost and performance optimization
Operate and optimize OpenShift/Kubernetes clusters, ingress (e.g., Nginx), and container networking/service mesh
Manage Azure services (compute, VNet, storage, data services) supporting analytics workloads
Build and maintain automated infrastructure with Terraform, Ansible, and GitOps workflows
Implement and evolve observability (Datadog, Prometheus, Grafana): metrics, traces, logs, alerting, SLOs, runbooks
Design, harden, and support delivery pipelines with ArgoCD/Jenkins/GitHub Actions
Provide platform tooling and enablement for application developers, data engineers, and operations teams
Ensure security and access management (HashiCorp Vault, secrets management, least privilege)
Lead incident response, coordinate cross-functional resolution, and drive corrective actions and platform improvements
Script or develop tools in Bash, Python, or Go to eliminate toil and improve developer experience

Requirements

5+ years of hands-on experience operating and managing Kubernetes and OpenShift clusters
Strong experience with Microsoft Azure (compute, networking, storage, and data services)
Proven skills in automation and Infrastructure-as-Code (Terraform, Ansible, GitOps)
Proficiency with observability tooling (Datadog, Prometheus, Grafana)
Scripting/coding ability in Bash, Python, or Go

Nice to have

Experience bridging on-prem and cloud in a hybrid service model (migration, integration, optimization)
Expertise with Kafka/AMQ, HashiCorp Vault, and ArgoCD/Jenkins/GitHub Actions
Background leading incident response and postmortems with strong RCA and continuous improvement practices
