Staff Engineer – Reliability Engineering Job at Geico (Bethesda, MD)

Job Description

At GEICO, we offer a rewarding career where your ambitions are met with endless possibilities. Every day we honor our iconic brand by offering quality coverage to millions of customers and being there when they need us most. We thrive through relentless innovation to exceed our customers’ expectations while making a real impact for our company through our shared purpose. When you join our company, we want you to feel valued, supported and proud to work here. That’s why we offer The GEICO Pledge: Great Company, Great Culture, Great Rewards and Great Careers.

As a Staff Engineer, you will work with our Distinguished Engineers and Sr. Engineers to innovate and build new systems, improve and enhance existing systems, and identify new opportunities to apply your knowledge to solve critical problems. As a Site Reliability Engineer (SRE) at GEICO, you will tackle the unique challenges of operating at scale, leveraging expertise in coding and large-scale system design. You will also participate in on-call rotations, troubleshooting and post-mortem analysis to improve system reliability and minimize operational impact.

Job Responsibility

Focus on multiple areas and provide strategic and technical guidance
Use programming languages such as Go, Python, Java, .NET, or other object-oriented languages, along with SQL and NoSQL databases
Work with container and orchestration tools such as Docker and Kubernetes (K8s), as well as OpenStack and a variety of Azure tools and services
Architect and develop cloud-native applications using Azure Services
Collaborate with product managers, team members, customers, and other engineering teams to solve our toughest problems
Ensure the quality, performance and usability of the engineering solutions
Serve as a mentor and thought leader, coaching engineers and influencing and educating executives
Drive best practices for platform reliability, disaster recovery, monitoring, alerting, and incident management
Collaborate with cross-functional teams (Platform engineering, DevOps, SREs) to integrate, test, and improve platform reliability and performance
Determine and support resource requirements, evaluate operational processes, measure outcomes to ensure desired results, demonstrate adaptability and sponsor continuous learning
Take part in on-call rotations and operational support

Requirements

Experience in at least two modern programming languages (Go, Python, Java, .NET) and object-oriented design
Advanced knowledge of web technologies such as HTML, CSS, and JavaScript is preferred
Understanding of open-source databases such as MySQL and PostgreSQL, and familiarity with NoSQL databases such as ONgDB, Cassandra, MongoDB, and Elasticsearch
Deep hands-on experience in complex system design, data pipelines and architectures, and scale and performance tuning, with good knowledge of Docker and Kubernetes
Hands-on experience with major cloud platforms (Azure, AWS, GCP) or large-scale private data center environments
Experience managing distributed systems in public, private or hybrid cloud environments
Experience with monitoring, logging and observability tools (Prometheus, Grafana, OpenTelemetry)
Passion for automation and reducing manual operations using tools like Terraform and Ansible
Familiarity with configuration management and orchestration tools like Helm, Puppet, Spinnaker
Experience with CI/CD pipelines, Infrastructure as Code (IaC), and cloud-based deployments
Knowledge of developer tooling across the software development life cycle (task management, source code, building, deployment, test automation and related tools, operations, real-time communication)
Ability to operate in a fast-paced, high-scale environment with a problem-solving mindset
6+ years of professional experience in software development, platform architecture, administration, governance, infrastructure management, and the installation and maintenance of hardware, software, and network systems
4+ years of experience in open-source frameworks
3+ years of experience with architecture and design
3+ years of experience with AWS, GCP, Azure, or hybrid data center
Bachelor's degree in Computer Science, Information Systems, or equivalent education or work experience

Nice to have

Knowledge in ML and AI technologies is a plus

What we offer

Market-competitive compensation
401(k) savings plan vested from day one with 6% match
Performance and recognition-based incentives
Tuition assistance
Mental healthcare
Fertility and adoption assistance
Workplace flexibility
GEICO Flex program (ability to work from anywhere in the US for up to four weeks per year)
