Senior Platform Engineer Job at T-Mobile (Bellevue)

Job Description

We are seeking a foundational Site Reliability Engineer to join our Device Insurance Technology team as we build a new internal engineering capability. This role is a unique opportunity to help establish our DevOps and SRE practices from the ground up within a modern, cloud-native environment. As the first dedicated SRE on the team, you will play a critical role in designing, building, and owning CI/CD pipelines, deployment processes, and production observability systems. You will work closely with development teams, architects, and external partners to transition operational ownership from legacy systems and enable scalable, reliable service delivery. This is a hands-on, high-impact engineering role where you will balance building foundational systems with supporting live production environments. You will help shape the future operating model, drive automation, and influence how reliability is implemented across the platform.

Job Responsibilities

Develop, configure, and support CI/CD pipelines
Automate build, test, and deployment workflows to enable safe and repeatable releases
Integrate automated quality checks, code scanning, and deployment validations into pipelines
Support containerized deployments using Docker and Kubernetes
Use Infrastructure-as-Code (IaC) tools like Helm to manage cloud infrastructure
Participate in automated provisioning of environments and system configurations
Embed monitoring and alerting into delivery pipelines
Support debugging of build, deployment, and environment issues across Dev/Test/Prod systems
Automate processes to enhance system reliability and resilience
Minimize operational incidents through proactive monitoring and maintenance
Develop scripts, tools and automation to reduce manual efforts in operational tasks
Manage incident response to ensure rapid recovery and minimal disruption
Help build and maintain dashboards, alerts, and logs that provide visibility into system health and application behavior
Use tools such as Prometheus, Grafana, Splunk, or OpenTelemetry to monitor services and infrastructure
Analyze system performance data to guide optimizations and proactively detect issues
Adapt to new technologies to maintain and enhance system robustness
Contribute to documentation, runbooks, playbooks, and operational readiness reviews

Requirements

4+ years of experience in DevOps and SRE roles
Experience in developing and maintaining CI/CD pipelines for software deployment
Experience with GitLab pipelines and Helm
4+ years of experience implementing and managing cloud-native platforms and solutions
Hands-on experience with containerization (Docker, Kubernetes)
4+ years of hands-on experience with monitoring and logging tools such as Splunk, Grafana, and OpenTelemetry, and with incident management
4+ years of experience guiding and mentoring teams in reliability engineering practices
Understanding of web protocols, how full-stack applications operate, and how data flows through them
Basic knowledge of at least one major cloud platform (AWS preferred)
Strong communication skills and ability to work under pressure
Bachelor's Degree plus 3 years of related work experience OR advanced degree with 1 year of related work experience OR combination of education and experience deemed equivalent
Acceptable areas of study include Computer Science, Engineering or related field
At least 18 years of age
Legally authorized to work in the United States

Nice to have

Experience integrating DevSecOps tools like code scanning, policy enforcement or container image validation
Understanding of blue/green, canary or rolling deployment strategies
Exposure to artifact management, secrets management or GitOps workflows
Exposure to incident management frameworks including alerting, escalation and postmortem practices
Understanding of Agile methodologies to improve and streamline processes
Ability to analyze system performance data to identify trends and improvement opportunities
Capability to drive innovation in system management and operations through new technologies and approaches
Ability to adapt to new technologies and changes in the digital landscape to maintain system robustness
Experience using generative AI tools (e.g., Claude, GitHub Copilot) for development support and task acceleration
AWS Certified DevOps Engineer
Certified Kubernetes Administrator
Google Cloud Certified - Professional DevOps Engineer

What we offer

Competitive base salary and compensation package
Annual stock grant
Employee stock purchase plan
401(k)
Access to free, year-round money coaches
Annual bonus or periodic sales incentive or bonus
Medical, dental and vision insurance
Flexible spending account
Paid time off
Up to 12 paid holidays
Paid parental and family leave
Family building benefits
Back-up care
Enhanced family support
Childcare subsidy
Tuition assistance
College coaching
Short- and long-term disability
Voluntary AD&D coverage
Voluntary accident coverage
Voluntary life insurance
Voluntary disability insurance
Voluntary long-term care insurance
Mobile service & home internet discounts
Pet insurance
Access to commuter and transit programs
