Senior Site Reliability Engineer Job at Onebrief (Arlington)

Job Description

We are hiring a Site Reliability Engineer to join our Infrastructure & Security team. You’ll work closely with fellow SREs, security, and customer success. You will be the first line of support for our mission-critical deployments and responsible for ensuring best-in-class service quality and issue resolution. You will work in both on-premises DoD environments and AWS cloud environments. Your lessons from the field will shape how our team works, from policy to implementation. In addition to working on-site with customers, you will contribute directly to solutions that increase the stability, performance, and security of our deployments and improve the overall experience of deploying and managing Onebrief on premises.

Job Responsibilities

Implementing a World-Class Observability Platform
Defining and Upholding Reliability
Leading Incident Response
Automating for Scale and Security
Eliminating Toil and Scaling the Team

Requirements

An active Top Secret clearance
5+ years in Platform, DevOps, or Site Reliability Engineering with an infrastructure and operations focus
Proven partner to DevOps/Platform and application teams
A deep understanding of incident response processes
Technical expertise in Infrastructure as Code (Terraform, Ansible)
Technical expertise in Containers and orchestration (Kubernetes)
Technical expertise in CI/CD (GitLab CI/CD, Jenkins, GitHub Actions)
Technical expertise in Scripting (Python, Go, or Bash)
Technical expertise in Cloud (AWS or AWS GovCloud)
Technical expertise in Observability (Grafana stack, ELK stack, or Datadog)
Technical expertise in Networking fundamentals

Nice to have

Experience in DoD environments and compliance frameworks (RMF, STIGs, ICD 503)
GitOps practices and toolchains
Security‑minded design for sensitive environments
Experience designing and implementing meaningful SLIs/SLOs (including error budgets) for complex, distributed systems
Familiarity with on‑prem virtualization (VMware, Proxmox, Nutanix, Hyper-V, etc.)
Service mesh exposure (Istio, Linkerd)
Relevant certifications (e.g., AWS DevOps Engineer, CKA/CKAD)
Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain a valid credential within 3 months of employment

What we offer

Equity: Share in the company's success
Flexible Work Environment: Remote-first organization* with flexible work hours and unlimited PTO
Comprehensive Health Coverage: Health, dental, vision, and life insurance
Retirement Plan: 401(k) plan with company match
Parental Leave: 8 weeks at 100% regardless of state
Company Retreats: Annual company summit trips
Home Office Budget: $1,000 per year for home office improvements
Relocation assistance
