Engineer, SRE GenAI

T-Mobile

Location:
United States, Bellevue
Overland Park
Frisco

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

92500.00 - 166800.00 USD / Year

Job Description:

As an Engineer in Site Reliability Engineering (SRE) for AI Systems, you will help ensure the reliability, scalability, and performance of AI platforms. This role includes participating in on-call rotations, improving system observability, and supporting operations across cloud-native infrastructure. This is a hands-on role ideal for someone with foundational SRE skills and a growth mindset to expand in GenAI and LLM infrastructure operations.

Job Responsibility:

Participate in on-call rotations to support AI platforms and respond to production incidents with urgency and precision
Monitor system health and performance using tools like Grafana, Splunk, and Power BI
Support cloud-native infrastructure deployments, with a focus on Azure (primary), and exposure to AWS or GCP
Implement runbooks and automate repetitive operational tasks to reduce toil
Support CI/CD pipelines and IaC deployments using GitLab pipelines and Databricks
Assist in the development and enforcement of Service Level Objectives (SLOs) and real-time alerts for AI APIs and services
Collaborate with senior engineers to improve platform reliability and scale LLM-based applications

Requirements:

Bachelor's degree in Computer Science, Engineering, or a related field
2–4 years of experience in DevOps, SRE, or cloud platform engineering
Hands-on experience with monitoring/logging systems such as Prometheus, Grafana, Splunk, or OpenSearch
Familiarity with cloud environments (preferably Azure; AWS/GCP a plus)
Experience in scripting or automation using Python, Bash, or PowerShell
Basic understanding of containerization (Docker, Kubernetes) and CI/CD concepts
Willingness to participate in an on-call schedule and incident resolution
Strong problem-solving and root cause analysis skills
Communication
Customer Service
Analytics
Technical Writing
At least 18 years of age
Legally authorized to work in the United States

Nice to have:

Exposure to AI/ML infrastructure or LLM-based systems (e.g., OpenAI, ChatGPT, Azure OpenAI)
Experience with infrastructure-as-code tools like Terraform or ARM templates
Familiarity with LLM observability or API token usage metrics
Passion for learning AI reliability practices and collaborating with cross-functional teams

What we offer:

Competitive base salary and compensation package
Annual stock grant
Employee stock purchase plan
401(k)
Access to free, year-round money coaches
Annual bonus or periodic sales incentive or bonus
Medical, dental and vision insurance
Flexible spending account
Paid time off and up to 12 paid holidays
Paid parental and family leave
Family building benefits
Back-up care
Enhanced family support
Childcare subsidy
Tuition assistance
College coaching
Short- and long-term disability
Voluntary AD&D coverage
Voluntary accident coverage
Voluntary life insurance
Voluntary disability insurance
Voluntary long-term care insurance
Mobile service & home internet discounts
Pet insurance
Access to commuter and transit programs

Additional Information:

Job Posted:
December 27, 2025

Employment Type:

Full-time

Work Type:

On-site work
