Senior AIOps Engineer (Platform & Infrastructure)

Groupon

Location:

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Groupon is moving beyond "experimenting" with AI to running it at massive scale. As we transition to an AI-First organization, we are building a centralized AIOps team to solve a critical challenge: moving AI features from fragmented prototypes to high-performing, cost-efficient production reality. As a Senior AIOps Engineer, you won't just be managing servers; you will be the architect of the "Golden Paths"—the reusable, automated infrastructure that enables our product teams to ship LLMs, Vector Search, and AI Agents faster than ever before.

Job Responsibility:

  • Architect the AI Stack: Design and operate core infrastructure on Kubernetes, including Vector Databases, LLM Gateways (LiteLLM), and workflow automation tools (n8n)
  • Enable at Scale: Drive AI adoption by creating self-service "Golden Paths" using Terraform and Helm, allowing engineering teams to deploy RAG pipelines with one click
  • Operational Excellence: Implement centralized observability, tracing (Langfuse), and governance to ensure our AI systems are reliable, auditable, and secure
  • Fiscal Discipline: Own the "AI Bill"—monitoring token usage and latency to optimize spend while maintaining high performance

Requirements:

  • 5+ years in Platform Engineering, SRE, or DevOps within a cloud-native environment
  • Deep experience managing stateful and stateless workloads (Helm, Istio, Docker)
  • Hands-on experience deploying and operating AI/ML tools or data-intensive systems in production
  • Strong skills in Python or Go to build custom API wrappers and automate operational tasks
  • Expertise in Prometheus, Grafana, and the ELK stack to ensure end-to-end observability of complex AI requests

What we offer:

  • End-to-end Ownership: Real authority to standardize how a global company builds with AI
  • Career Growth: This is a high-visibility role within a new, strategic team with potential for leadership progression

Additional Information:

Job Posted:
January 31, 2026

Work Type:
Hybrid work

Similar Jobs for Senior AIOps Engineer (Platform & Infrastructure)

Technology Outbound Product Manager

Join the innovators of OpsRamp as its technology product management leader, resp...
Location
India , Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in marketing, engineering, computer science, or a related field
  • MBA or advanced technical degree preferred
  • 4+ years of experience in technical marketing, product marketing, product management, or pre-sales in the observability, ITOM, log management, SaaS and enterprise software, or IT infrastructure industries
  • Knowledge/experience with SaaS software preferred
  • Public cloud experience is a plus
  • Knowledge of application modernization (e.g., Kubernetes) and automation (Python, pipelines, PowerShell, etc.) is a plus
  • Proven track record of developing and executing successful GTM strategies and campaigns that drive awareness, demand generation, and market leadership
  • Excellent written and verbal communication skills, with the ability to distill complex technical concepts into clear, concise, and compelling messaging and content
  • Strong analytical skills and experience conducting market and competitive analysis to identify key trends, insights, and opportunities
  • Ability to work effectively in a fast-paced, dynamic environment with cross-functional teams and multiple stakeholders
Job Responsibility
  • Develop and execute technical evangelism strategies to drive awareness, demand generation, and market leadership for OpsRamp solutions
  • Collaborate with product management and engineering teams to deeply understand product features, capabilities, and roadmaps, and translate them into compelling value propositions, messaging, and content
  • Create and maintain a wide range of technical collateral, including whitepapers, solution briefs, presentations, videos, demos, and blog posts
  • Drive the creation and delivery of technical enablement materials to support technical sales, partners, and customers, including training presentations, FAQs, and technical guides
  • Conduct market and competitive analysis to identify key trends, insights, and opportunities to differentiate OpsRamp in the ITOM market
  • Serve as a technical evangelist and spokesperson for OpsRamp at industry events, conferences, webinars, and customer meetings
  • Collaborate with product marketing and corporate marketing teams to develop technical content that drives engagement, leads, and pipeline
  • Gather key customer and target audience insights to inform product positioning and messaging as well as the product roadmap
  • Contribute to GTM strategy and messaging, and help maintain the technical accuracy of marketing messages
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Executive Director, Digital SRE & Operations

We’re building a world of health around every individual — shaping a more connec...
Location
United States , Austin, Texas
Salary:
175,100 - 334,750 USD / year
CVS Health
Expiration Date
March 31, 2026
Requirements
  • 18+ years of experience in software engineering, platform operations, or site reliability engineering
  • 8+ years leading large-scale SRE, DevOps, or platform reliability organizations
  • Experience leveraging AI/ML for operations, including anomaly detection, predictive alerts, log analysis, or automated remediation
  • Familiarity with AIOps tools such as Datadog Watchdog, Dynatrace Davis, Splunk AI, Elastic AIOps, or custom ML/LLM solutions
  • Understanding of how to safely operate and monitor AI-enabled production systems
  • Deep expertise in distributed systems, cloud infrastructure, and high-availability architectures
  • Strong knowledge of SRE principles, DevOps, and reliability engineering at scale
  • Experience implementing AIOps or AI-driven operational tooling
  • Executive-level communication skills with the ability to influence senior leaders and business stakeholders
  • Experience operating mission-critical digital platforms serving millions of users
Job Responsibility
  • Define and own the enterprise SRE strategy, including SLOs, SLIs, error budgets, and reliability roadmaps
  • Establish reliability standards and practices across web, mobile, backend services, APIs, data platforms, and AI workloads
  • Drive a culture of reliability-by-design and operational excellence across engineering teams
  • Lead adoption of AIOps capabilities for proactive issue detection, alert noise reduction, and predictive failure prevention
  • Implement AI-assisted incident triage, automated runbooks, root-cause analysis, and self-healing systems
  • Partner with the AI Platform team to integrate LLMs and ML models into operational workflows (log summarization, anomaly detection, remediation)
  • Own enterprise observability strategy across metrics, logs, traces, and user experience monitoring
  • Standardize tooling and practices using platforms such as Datadog, Splunk, Prometheus, Grafana, OpenTelemetry
  • Deliver real-time dashboards and executive reporting on uptime, performance, latency, and error budgets
  • Partner with DevOps and Platform teams to ensure safe, automated, and scalable CI/CD pipelines
What we offer
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
  • Fulltime

DevOps Engineer (Platform Specialist)

Your goal is to make software delivery invisible. You will design and maintain t...
Location
United States , Baltimore
Salary:
Not provided
Robert Half
Expiration Date
Until further notice
Requirements
  • Mid-to-senior-level experience (4+ years)
  • Infrastructure as Code (IaC) using Terraform or Pulumi
  • Multi-cloud environment management (AWS/Azure)
  • Building and optimizing CI/CD workflows
  • AIOps for predicting deployment failures
  • Production Kubernetes cluster management
  • Knowledge of daemonless runtimes like Podman
  • Automated security scanning (SAST/DAST) and compliance checks
  • High-cardinality monitoring using OpenTelemetry, Grafana, and Prometheus
Job Responsibility
  • Design and maintain the Internal Developer Platform (IDP)
  • Transition from traditional CI/CD to autonomous, self-healing pipelines
  • Use Terraform or Pulumi to manage multi-cloud environments (AWS/Azure)
  • Build and optimize CI/CD workflows that use AIOps to predict deployment failures
  • Manage production Kubernetes clusters and explore daemonless runtimes like Podman
  • Inject automated security scanning (SAST/DAST) and compliance checks directly into the build process
  • Implement high-cardinality monitoring using OpenTelemetry, Grafana, and Prometheus
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Company 401(k) plan

DevOps Engineer (Platform Specialist)

Your goal is to make software delivery invisible. You will design and maintain t...
Location
United States , Washington, DC
Salary:
Not provided
Robert Half
Expiration Date
Until further notice
Requirements
  • Mid-to-senior-level experience (4+ years)
  • Infrastructure as Code (IaC) using Terraform or Pulumi
  • Multi-cloud environment management (AWS/Azure)
  • Building and optimizing CI/CD workflows
  • AIOps for predicting deployment failures
  • Production Kubernetes cluster management
  • Knowledge of daemonless runtimes like Podman
  • Automated security scanning (SAST/DAST) and compliance checks
  • High-cardinality monitoring using OpenTelemetry, Grafana, and Prometheus
  • Must be legally authorized to work in the United States
Job Responsibility
  • Design and maintain the Internal Developer Platform (IDP)
  • Transition from traditional CI/CD to autonomous, self-healing pipelines
  • Use Terraform or Pulumi to manage multi-cloud environments (AWS/Azure)
  • Build and optimize CI/CD workflows that use AIOps to predict deployment failures
  • Manage production Kubernetes clusters and explore daemonless runtimes like Podman
  • Inject automated security scanning (SAST/DAST) and compliance checks directly into the build process
  • Implement high-cardinality monitoring using OpenTelemetry, Grafana, and Prometheus
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Company 401(k) plan
  • Free online training

Lead / Principal Software Engineer

We’re hiring Lead and Principal Software Engineers to build the next generation ...
Location
Australia , Sydney
Salary:
Not provided
Blume Global
Expiration Date
Until further notice
Requirements
  • 10+ years building scalable, fault-tolerant systems and enterprise software
  • Strong experience with backend architecture, platform modernization, and CI/CD
  • Proficiency in C#, Java, Python, SQL, and JavaScript
  • Experience with cloud infrastructure (AWS, Kinesis, Lambda) and DevOps tools (Docker, Kubernetes, Jenkins)
  • Proven ability to lead technical decisions, mentor engineers, and improve team productivity
  • Strong experience integrating and evaluating AI tools like GitHub Copilot and AIOps in real-world engineering workflows
  • Strong communication across product, compliance, and engineering teams
  • Track record of aligning technical work with business outcomes and customer value
Job Responsibility
  • Build the next generation of our platforms
  • Work on high-scale systems that process billions of transactions
  • Modernize core infrastructure
  • Drive AI initiatives to improve performance and reliability
  • Set technical direction
  • Mentor senior engineers
  • Shape architecture across multiple domains
What we offer
  • Competitive Package + Equity
  • Find the team/project that fits you best
  • Hybrid and Flexible Work
  • Continuous Learning and Growth
  • Access learning platforms (Coursera, Pluralsight, LinkedIn Learning, WiseTech Academy), mentorship, and development opportunities
  • Top-Tier Hardware
  • Onsite Meals and Snacks

Senior Product Marketing Manager

Are you passionate about cloud computing and the future of intelligent cloud ope...
Location
United States , Redmond
Salary:
106,400 - 203,600 USD / year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in Marketing, Computer Science, Business or related field AND 3+ years experience in business OR Bachelor's Degree in Marketing, Computer Science, Business or related field AND 5+ years experience in business OR equivalent experience
  • Strong background in B2B audience marketing, cloud infrastructure, AIOps platform, or adjacent technical domains
  • Proven experience launching complex, technical products and shaping new or emerging categories
  • Deep comfort with technical concepts (cloud architecture, AI systems, automation, APIs)
  • Exceptional positioning, messaging, and storytelling skills
  • Strategic thinker who can also execute with speed and precision
  • Customer-obsessed and insight-driven
Job Responsibility
  • Develop and lead the outbound marketing strategy for agentic cloud operations, from early-category definition to scale
  • Develop differentiated positioning, messaging frameworks, and value propositions for technical and business audiences
  • Define customer personas and use cases across platform, infrastructure, and AI-driven operations teams
  • Partner closely with Integrated Marketing and Audience Marketing to execute outbound marketing campaigns, track results and optimize campaigns or programs
  • Lead go-to-market planning and execution for major product launches and feature releases for the agentic cloud ops portfolio
  • Craft the core narrative around agentic systems across key cloud operations domains and the lifecycle, i.e., deployment/configuration, observability, resiliency, optimization, and security
  • Translate complex technical concepts into clear, compelling stories without oversimplifying
  • Partner with Go-To-Market managers to build enablement assets (pitch decks, demos, battlecards, case studies)
  • Equip Go-To-Market and field teams to sell a new category with confidence and consistency
  • Support enterprise, mid-market, and developer-led motions as needed
  • Fulltime

Senior Python Engineer

A Senior Engineer opportunity within our Enterprise AI team. Working with a grou...
Location
United Kingdom , Fleet Place Office
Salary:
Not provided
Just Eat Takeaway.com
Expiration Date
Until further notice
Requirements
  • Experience working with cloud platforms like AWS (EC2, ECS, S3, Lambda, Fargate, DynamoDB/RDS) or GCP (Compute Engine, Cloud Storage, Cloud Functions, BigQuery)
  • Strong experience in Python and fluency in another language
  • Knowledge of Infrastructure as Code tools (e.g., CloudFormation, Terraform, Ansible, Serverless Framework)
  • Enjoy automating processes
  • Knowledge of containers (Docker, Container Orchestration like Kubernetes/ECS/GKE)
  • A genuine interest in and at least foundational experience with AI/ML concepts and technologies, demonstrating an eagerness to grow into a specialised AI Engineering role
  • Proven track record of delivering high-quality work and driving forward best practices in software engineering
  • Stays up to date with new technology in the AI space
Job Responsibility
  • Design, develop, and deploy high-quality, scalable software solutions, focusing on AI-enabled applications and infrastructure
  • Lead and participate in technical projects and deployments of AI systems
  • Provide guidance and mentoring to other team members on best practices in AI engineering
  • Use best practices (e.g., MLOps, AIOps) to improve products/services and processes related to AI
  • Optimise existing model serving and data pipelines to meet changing performance and security requirements
  • Hold requirements gathering sessions with business stakeholders and data science teams
  • Lead functional projects or work streams focused on AI infrastructure and tooling
  • Fulltime

Software Engineering Director

We are seeking an experienced Software Engineering Director to lead the company’...
Location
United Kingdom , London
Salary:
Not provided
AWTG
Expiration Date
Until further notice
Requirements
  • Proven experience (10+ years) in software engineering, technical leadership, or similar roles, with at least 3 years in a senior management capacity
  • Strong background in software development, architecture, and systems design
  • Extensive experience in implementing AI-first software
  • Proven experience in AI development and AIOps implementation
  • Experience with various cloud platforms (GCP, AWS, Azure, etc.) and DevOps tools
  • Demonstrated ability to scale technical teams and deliver complex software projects on time and on budget
  • Experience creating solutions that have cloud, web, and mobile app components
  • In-depth knowledge of cybersecurity, data privacy regulations, and compliance standards
  • In-depth knowledge of various AI methodologies and learning algorithms
  • Proven experience with various programming languages and frameworks, such as Python, Java, React, C#, domain-specific languages, and native and cross-platform development
Job Responsibility
  • Define and oversee the company’s technical vision, strategy, software development, and product roadmap
  • Align technology initiatives with the company’s vision, business objectives and growth strategies
  • Evaluate and implement emerging technologies to maintain a competitive edge
  • Implement an AI-first software vision on products, platforms and solutions
  • Secure internal and external funding for development of new technologies and innovations
  • Manage P&L for the entire Software Division
  • Develop products and platforms that are ready to accelerate and sustain growth
  • Lead revenue generation activities, including ensuring that bids and proposals are of top quality
  • Build, lead, and mentor a high-performing team of developers, engineers, and IT professionals
  • Foster a culture of innovation, collaboration, and continuous improvement within software engineering and product teams
  • Fulltime