Cloud Platform Engineer (Site Reliability) Job at Amentum (Houston)

Job Description

We have an exciting opportunity for a Cloud Platform Engineer (Site Reliability) to join our team at Nexus, a teammate company. As a Cloud Platform Engineer (Site Reliability), you will be part of a team working with NASA at Johnson Space Center (JSC) on deep space exploration projects such as Orion, Lunar Gateway, and Artemis.

Job Responsibilities

Developing new cloud-native platform services spanning all three major cloud environments
Developing best practices for cloud-native application development and promoting them within the organization
Administering NASA cloud networks and managing requests to deploy COTS and cloud-native applications into cloud environments
Writing high-quality code and providing thorough, engaged code reviews for peers
Working with the managed Kubernetes offerings of all three major cloud providers
Integrating cloud-managed AI and data services with bespoke and open-source Kubernetes applications
Identifying opportunities to abstract prospective project requirements into enterprise-grade, multi-tenant platform services
Collaborating with NASA security and compliance teams to ensure adherence to industry best practices and regulatory requirements
Working directly with NASA human spaceflight missions such as Orion, Lunar Gateway, and Artemis
Performing other duties as required

Requirements

Typically requires a bachelor’s degree or equivalent certification in a related area and 10 years of experience in the field or a related area
Strong experience with Kubernetes in production
Ability to manage and use GitLab (high proficiency preferred)
Hands-on experience with CI/CD pipeline tools
Experience with observability and monitoring tools such as Grafana and Superset
Proficiency with Infrastructure as Code, using Terraform and/or open-source alternatives such as OpenTofu for infrastructure automation
Extensive Linux experience (familiarity with Windows also preferred, but not required)
Expert in at least one programming language (Go and Python preferred)
Experience with Python and SQL (R preferred)
Working understanding of machine learning model lifecycle management (preferred)
Proof of U.S. Citizenship or US Permanent Residency may be a requirement for this position
Must be able to complete a U.S. government background investigation

Nice to have

Solution-oriented mindset, accepts feedback with enthusiasm
Communicates effectively with team members: provides clear status updates to their direct manager and regularly builds consensus with peers
Self-motivated and self-managing, with strong organizational skills
Experience architecting and building cloud-native applications
Demonstrable success and confidence in owning the full project lifecycle
Good instincts for identifying opportunities for abstraction
Balances individual project development tasks at hand with overall platform and company goals

What we offer

Excellent personal and professional career growth
9/80 work schedule (every other Friday off), when applicable
Onsite cafeteria (breakfast & lunch)
Health, dental, and vision insurance
Paid time off and holidays
Retirement benefits (including 401(k) matching)
Educational reimbursement
Parental leave
Employee stock purchase plan
Tax-saving options
Disability and life insurance
Pet insurance
