Senior AIOps Engineer (Platform & Infrastructure)

Groupon

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Groupon is moving beyond "experimenting" with AI to running it at massive scale. As we transition to an AI-First organization, we are building a centralized AIOps team to solve a critical challenge: moving AI features from fragmented prototypes to high-performing, cost-efficient production reality. As a Senior AIOps Engineer, you won't just be managing servers; you will be the architect of the "Golden Paths"—the reusable, automated infrastructure that enables our product teams to ship LLMs, Vector Search, and AI Agents faster than ever before.

Job Responsibility:

  • Architect the AI Stack: Design and operate core infrastructure on Kubernetes, including Vector Databases, LLM Gateways (LiteLLM), and workflow automation tools (n8n)
  • Enable at Scale: Drive AI adoption by creating self-service "Golden Paths" using Terraform and Helm, allowing engineering teams to deploy RAG pipelines with one click
  • Operational Excellence: Implement centralized observability, tracing (Langfuse), and governance to ensure our AI systems are reliable, auditable, and secure
  • Fiscal Discipline: Own the "AI Bill"—monitoring token usage and latency to optimize spend while maintaining high performance

Requirements:

  • 5+ years in Platform Engineering, SRE, or DevOps within a cloud-native environment
  • Deep experience managing stateful and stateless workloads (Helm, Istio, Docker)
  • Hands-on experience deploying and operating AI/ML tools or data-intensive systems in production
  • Strong skills in Python or Go to build custom API wrappers and automate operational tasks
  • Expertise in Prometheus, Grafana, and the ELK stack to ensure end-to-end observability of complex AI requests

What we offer:

  • End-to-end Ownership: Real authority to standardize how a global company builds with AI
  • Career Growth: This is a high-visibility role within a new, strategic team with potential for leadership progression

Additional Information:

Job Posted:
January 31, 2026

Work Type:
Hybrid work

Similar Jobs for Senior AIOps Engineer (Platform & Infrastructure)

Managing Vice President - Infrastructure Platforms & Operations

The Managing Vice President, Infrastructure Platforms & Operations is a senior t...
Location:
United States, Bethesda
Salary:
215700.00 - 389700.00 USD / Year
Marriott Bonvoy
Expiration Date
May 20, 2026
Requirements
  • Bachelor's degree in Computer Science, Information Systems, Engineering, Business Administration, or related technical field
  • 15+ years of senior leadership experience across cloud engineering, infrastructure platforms, network services, and/or enterprise workplace technologies, preferably in a large global Fortune 500 organization
  • 10+ years of prior hands-on technical engineering or development experience (cloud, infrastructure, networking, automation, or enterprise platforms)
  • Demonstrated success leading large, multi-disciplinary global engineering and operations organizations
  • Deep expertise in multi-cloud platforms, network architecture, DevSecOps, automation, and reliability engineering
  • Strong experience partnering with cybersecurity teams to deliver secure-by-design platforms
  • Proven ability to influence senior executives and lead transformation in complex, matrixed enterprises
  • Strong financial acumen with experience managing large technology budgets and vendor portfolios
Job Responsibility
  • Lead global teams responsible for cloud foundations, DevOps and CI/CD platforms, automation, container platforms, service mesh, and self-service engineering capabilities
  • Oversee enterprise cloud landing zones across all regions, ensuring secure, scalable, and cost-efficient architecture
  • Drive modernization of hybrid platforms, including datacenter, edge compute, and infrastructure engineering capabilities
  • Oversee SRE, observability, resiliency, and disaster recovery governance
  • Lead global network architecture and operations across datacenter networks, property connectivity, enterprise networks, and cloud network integration
  • Drive transformation of Marriott's global connectivity ecosystem, including SD-WAN, wireless, secure network edge, voice, and network automation
  • Ensure network performance, reliability, compliance, and resiliency at global scale
  • Lead workplace technology platforms supporting collaboration, productivity, endpoint, and digital employee experience solutions
  • Partner with business, HR, and IT leaders to deliver intuitive, reliable, and secure workplace tools that enable associate productivity
  • Drive standardization, modernization, and lifecycle management of workplace platforms and services
What we offer
  • 401(k) plan
  • Stock purchase plan
  • Discounts at Marriott properties
  • Commuter benefits
  • Employee assistance plan
  • Childcare discounts
  • Medical
  • Dental
  • Vision
  • Health care flexible spending account
Job Type:
Full-time

Technology Outbound Product Manager

Join the innovators of OpsRamp as its technology product management leader, resp...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in marketing, engineering, computer science, or a related field
  • MBA or advanced technical degree preferred
  • 4+ years of experience in technical marketing, product marketing, product management, or pre-sales in the observability, ITOM, log management, SaaS and enterprise software, or IT infrastructure industries
  • Knowledge/experience with SaaS software preferred
  • Public cloud experience is a plus
  • Knowledge of application modernization (e.g., Kubernetes) and automation (Python, pipelines, PowerShell, etc.) is a plus
  • Proven track record of developing and executing successful GTM strategies and campaigns that drive awareness, demand generation, and market leadership
  • Excellent written and verbal communication skills, with the ability to distill complex technical concepts into clear, concise, and compelling messaging and content
  • Strong analytical skills and experience conducting market and competitive analysis to identify key trends, insights, and opportunities
  • Ability to work effectively in a fast-paced, dynamic environment with cross-functional teams and multiple stakeholders
Job Responsibility
  • Develop and execute technical evangelism strategies to drive awareness, demand generation, and market leadership for OpsRamp solutions
  • Collaborate with product management and engineering teams to deeply understand product features, capabilities, and roadmaps, and translate them into compelling value propositions, messaging, and content
  • Create and maintain a wide range of technical collateral, including whitepapers, solution briefs, presentations, videos, demos, and blog posts
  • Drive the creation and delivery of technical enablement materials to support technical sales, partners, and customers, including training presentations, FAQs, and technical guides
  • Conduct market and competitive analysis to identify key trends, insights, and opportunities to differentiate OpsRamp in the ITOM market
  • Serve as a technical evangelist and spokesperson for OpsRamp at industry events, conferences, webinars, and customer meetings
  • Collaborate with product marketing and corporate marketing teams to develop technical content that drives engagement, leads, and pipeline
  • Gather key customer and target audience insights to inform product positioning and messaging as well as the product roadmap
  • Contribute to GTM strategy and messaging, and help maintain the technical accuracy of marketing messages
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Job Type:
Full-time

Principal Engineer-Site Reliability Engineering and AIOps

We are looking for a Principal Engineer to set the enterprise technical directio...
Location:
India, Hyderabad
Salary:
Not provided
Wells Fargo
Expiration Date
May 10, 2026
Requirements
  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 7+ years of engineering experience, including principal-level technical leadership on large-scale reliability, production operations, or platform programs across complex environments
  • 7+ years of software engineering experience (e.g., Java, C#, Python) with demonstrated expertise in system design and distributed systems
  • Track record of delivering reusable automation and platform capabilities adopted by multiple teams
  • 5+ years operating Linux/Unix and Windows platforms in production, including performance tuning, capacity planning, and reliability hardening for mission-critical services
  • 5+ years designing and operating cloud solutions (public and/or private cloud), including reliability and security architecture, infrastructure-as-code, and cost-aware engineering at scale
  • 5+ years leading reliability and operations practices for enterprise-scale, highly available services, including major incident leadership, problem management, and establishing operational readiness mechanisms
  • 5+ years architecting and scaling full-stack observability solutions, including instrumentation standards, alert strategy, service dashboards, and governance that improves signal quality and reduces noise
  • 5+ years with automation and observability toolsets (e.g., Ansible, Grafana, Elastic, Splunk, Prometheus) and experience building reusable components, templates, and paved paths integrated with CI/CD
  • Exceptional communication and influence skills, including the ability to align senior stakeholders, drive technical decisions across organizations, and clearly articulate risk, tradeoffs, and recommended paths forward
Job Responsibility
  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organizational structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies, and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Set and evangelize the SRE and AIOps technical strategy for EFT, establishing reference architectures, standards, and guardrails (service tiering, onboarding criteria, SLO/error budget governance) and holding teams accountable through transparent executive-level reporting
  • Act as a principal-level technical advisor and multiplier: mentor senior engineers, contribute to hiring and technical bar-raising, and define reliability patterns and guardrails across applications, networks, databases, operating systems, and web technologies
  • Own the reliability and observability architecture across hybrid/multi-cloud, driving standardization of monitoring, logging, tracing, synthetics, and resilience/chaos testing
  • Define platform patterns that teams can adopt with minimal friction
Job Type:
Full-time

Lead / Principal Software Engineer

We’re hiring Lead and Principal Software Engineers to build the next generation ...
Location:
Australia, Sydney
Salary:
Not provided
Blume Global
Expiration Date
Until further notice
Requirements
  • 10+ years building scalable, fault-tolerant systems and enterprise software
  • Strong experience with backend architecture, platform modernization, and CI/CD
  • Proficiency in C#, Java, Python, SQL, and JavaScript
  • Experience with cloud infrastructure (AWS, Kinesis, Lambda) and DevOps tools (Docker, Kubernetes, Jenkins)
  • Proven ability to lead technical decisions, mentor engineers, and improve team productivity
  • Strong experience integrating and evaluating AI tools like GitHub Copilot and AIOps in real-world engineering workflows
  • Strong communication across product, compliance, and engineering teams
  • Track record of aligning technical work with business outcomes and customer value
Job Responsibility
  • Build the next generation of our platforms
  • Work on high-scale systems that process billions of transactions
  • Modernize core infrastructure
  • Drive AI initiatives to improve performance and reliability
  • Set technical direction
  • Mentor senior engineers
  • Shape architecture across multiple domains
What we offer
  • Competitive Package + Equity
  • Find the team/project that fits you best
  • Hybrid and Flexible Work
  • Continuous Learning and Growth
  • Access learning platforms (Coursera, Pluralsight, LinkedIn Learning, WiseTech Academy), mentorship, and development opportunities
  • Top-Tier Hardware
  • Onsite Meals and Snacks

Senior Product Marketing Manager

Are you passionate about cloud computing and the future of intelligent cloud ope...
Location:
United States, Redmond
Salary:
106400.00 - 203600.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in Marketing, Computer Science, Business or related field AND 3+ years experience in business OR Bachelor's Degree in Marketing, Computer Science, Business or related field AND 5+ years experience in business OR equivalent experience
  • Strong background in B2B audience marketing, cloud infrastructure, AIOps platform, or adjacent technical domains
  • Proven experience launching complex, technical products and shaping new or emerging categories
  • Deep comfort with technical concepts (cloud architecture, AI systems, automation, APIs)
  • Exceptional positioning, messaging, and storytelling skills
  • Strategic thinker who can also execute with speed and precision
  • Customer-obsessed and insight-driven
Job Responsibility
  • Develop and lead the outbound marketing strategy for agentic cloud operations, from early-category definition to scale
  • Develop differentiated positioning, messaging frameworks, and value propositions for technical and business audiences
  • Define customer personas and use cases across platform, infrastructure, and AI-driven operations teams
  • Partner closely with Integrated Marketing and Audience Marketing to execute outbound marketing campaigns, track results and optimize campaigns or programs
  • Lead go-to-market planning and execution for major product launches and feature releases for the agentic cloud ops portfolio
  • Craft the core narrative around agentic systems across key cloud operations domains and the lifecycle, i.e., deployment/configuration, observability, resiliency, optimization, and security
  • Translate complex technical concepts into clear, compelling stories without oversimplifying
  • Partner with Go-To-Market managers to build enablement assets (pitch decks, demos, battlecards, case studies)
  • Equip Go-To-Market and field teams to sell a new category with confidence and consistency
  • Support enterprise, mid-market, and developer-led motions as needed
Job Type:
Full-time

Senior Python Engineer

A Senior Engineer opportunity within our Enterprise AI team. Working with a grou...
Location:
United Kingdom, Fleet Place Office
Salary:
Not provided
Just Eat Takeaway.com
Expiration Date
Until further notice
Requirements
  • Experience working with cloud platforms like AWS (EC2, ECS, S3, Lambda, Fargate, DynamoDB/RDS) or GCP (Compute Engine, Cloud Storage, Cloud Functions, BigQuery)
  • Strong experience in Python and fluency in another language
  • Knowledge of Infrastructure as Code tools (e.g., CloudFormation, Terraform, Ansible, Serverless Framework)
  • Enjoy automating processes
  • Knowledge of containers (Docker, Container Orchestration like Kubernetes/ECS/GKE)
  • A genuine interest in and at least foundational experience with AI/ML concepts and technologies, demonstrating an eagerness to grow into a specialised AI Engineering role
  • Proven track record of delivering high-quality work and driving forward best practices in software engineering
  • Stays up to date with new technology in the AI space
Job Responsibility
  • Design, develop, and deploy high-quality, scalable software solutions, focusing on AI-enabled applications and infrastructure
  • Lead and participate in technical projects and deployments of AI systems
  • Provide guidance and mentoring to other team members on best practices in AI engineering
  • Use best practices (e.g., MLOps, AIOps) to improve products/services and processes related to AI
  • Optimise existing model serving and data pipelines to meet changing performance and security requirements
  • Hold requirements gathering sessions with business stakeholders and data science teams
  • Lead functional projects or work streams focused on AI infrastructure and tooling
Job Type:
Full-time

Software Engineering Director

We are seeking an experienced Software Engineering Director to lead the company’...
Location:
United Kingdom, London
Salary:
Not provided
AWTG
Expiration Date
Until further notice
Requirements
  • Proven experience (10+ years) in software engineering, technical leadership, or similar roles, with at least 3 years in a senior management capacity
  • Strong background in software development, architecture, and systems design
  • Extensive experience in implementing AI-first software
  • Proven experience in AI development and AIOps implementation
  • Experience with various cloud platforms (GCP, AWS, Azure, etc.) and DevOps tools
  • Demonstrated ability to scale technical teams and deliver complex software projects on time and on budget
  • Experience in creating solutions that have cloud, web, and mobile app components
  • In-depth knowledge of cybersecurity, data privacy regulations, and compliance standards
  • In-depth knowledge of various AI methodologies and learning algorithms
  • Proven experience with various programming languages and frameworks such as Python, Java, React, C#, domain-specific languages, and native and cross-platform development
Job Responsibility
  • Define and oversee the company’s technical vision, strategy, software development, and product roadmap
  • Align technology initiatives with the company’s vision, business objectives and growth strategies
  • Evaluate and implement emerging technologies to maintain a competitive edge
  • Implement an AI-first software vision on products, platforms and solutions
  • Secure internal and external funding for development of new technologies and innovations
  • Manage P&L for the entire Software Division
  • Develop products and platforms that are ready to accelerate and sustain growth
  • Lead revenue generation activities, including ensuring that bids and proposals are of top quality
  • Build, lead, and mentor a high-performing team of developers, engineers, and IT professionals
  • Foster a culture of innovation, collaboration, and continuous improvement within software engineering and product teams
Job Type:
Full-time

Senior Incident Optimization & Reliability Specialist - End-User Technology

The Senior Incident Optimization & Reliability Specialist serves as a critical b...
Location:
India, Chennai
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field
  • At least 8 years of hands-on experience in IT operations, end-user computing, or a related field, with proven experience in incident reduction and operational excellence
  • Demonstrated success in leading event management and incident reduction initiatives with quantifiable results
  • Direct, hands-on experience with modern AIOps and enterprise event management platforms (e.g., BigPanda)
  • Deep understanding of end-user technology ecosystems, including VMware-hosted cloud desktop infrastructure, Microsoft 365 suite (Teams, Outlook, Office), SharePoint, and collaboration platforms
  • Expertise with a broad range of domain-specific monitoring and observability tools
  • Hands-on experience developing robust automation solutions using scripting languages (e.g., Python, PowerShell) and modern automation frameworks
  • Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms
  • Excellent analytical abilities with a systematic approach to troubleshooting complex issues
  • Exceptional communication skills with the ability to influence and collaborate effectively across diverse, cross-functional teams
Job Responsibility
  • Conduct comprehensive analysis of alert and incident patterns to identify top sources of operational noise, determine root causes, and develop data-driven strategies for reduction
  • Design, implement, and optimize rules for event correlation, de-duplication, and suppression on AIOps and event management platforms
  • Architect and develop automation playbooks for incident data enrichment and create self-healing capabilities to reduce manual intervention (toil)
  • Assess the current observability footprint across all end-user technology domains
  • Champion and apply core SRE practices to systematically improve service reliability
  • Partner closely with end-user services, engineering, and platform teams to understand incident drivers, validate correlation logic, and provide expert guidance
  • Continuously validate the effectiveness of implemented rules and automation to ensure no business-impacting alerts are missed
Job Type:
Full-time