Senior AIOps Engineer (Platform & Infrastructure)

Job Posted January 31, 2026

Job Description

Groupon is moving beyond "experimenting" with AI to running it at massive scale. As we transition to an AI-First organization, we are building a centralized AIOps team to solve a critical challenge: moving AI features from fragmented prototypes to high-performing, cost-efficient production reality. As a Senior AIOps Engineer, you won't just be managing servers; you will be the architect of the "Golden Paths"—the reusable, automated infrastructure that enables our product teams to ship LLMs, Vector Search, and AI Agents faster than ever before.

Job Responsibility

  • Architect the AI Stack: Design and operate core infrastructure on Kubernetes, including Vector Databases, LLM Gateways (LiteLLM), and workflow automation tools (n8n)
  • Enable at Scale: Drive AI adoption by creating self-service "Golden Paths" using Terraform and Helm, allowing engineering teams to deploy RAG pipelines with one click
  • Operational Excellence: Implement centralized observability, tracing (Langfuse), and governance to ensure our AI systems are reliable, auditable, and secure
  • Fiscal Discipline: Own the "AI Bill"—monitoring token usage and latency to optimize spend while maintaining high performance

Requirements

  • 5+ years in Platform Engineering, SRE, or DevOps within a cloud-native environment
  • Deep experience managing stateful and stateless workloads (Helm, Istio, Docker)
  • Hands-on experience deploying and operating AI/ML tools or data-intensive systems in production
  • Strong skills in Python or Go to build custom API wrappers and automate operational tasks
  • Expertise in Prometheus, Grafana, and ELK stack to ensure end-to-end observability of complex AI requests

What we offer

  • End-to-end Ownership: Real authority to standardize how a global company builds with AI
  • Career Growth: This is a high-visibility role within a new, strategic team with potential for leadership progression

Similar Jobs for Senior AIOps Engineer (Platform & Infrastructure)

8 matching positions

GenAI Senior Platform Engineer - Python, VP

Citi's global Innovation Labs is seeking a versatile Senior GenAI Platform Engin...
Location
Canada, Mississauga
Salary
120800.00 - 170800.00 USD / Year
Citi
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in the software industry, with a strong emphasis on building enterprise software
  • 6+ years of relevant experience developing and implementing scalable and robust platforms, applications, and services using modern libraries and frameworks (e.g., Python: FastAPI, Flask, Pandas, Scikit-learn, Hugging Face; Node.js: Express, NestJS; TypeScript)
  • 5+ years of experience delivering complex backend solutions and services (e.g., APIs, microservices) into production
  • Demonstrated experience in managing and implementing successful projects of varying sizes and complexities
  • Proven understanding of Generative AI systems, AIOps, and application monitoring/evaluation
  • Experience with cloud architectures, with specific experience in public cloud offerings
  • Strong passion and proven hands-on experience integrating with AI/ML technologies
  • Experience with software development agents, agile development, CI/CD pipelines, software testing, and code reviews
Job Responsibility
  • Lead the design, development, and maintenance of highly complex GenAI platforms, applications, and services using Python, Node.js, and TypeScript
  • Ensure the seamless operation, scalability, and integration of AI capabilities across various Citi business units
  • Engage with data science, technical, and business stakeholders to define and design the overall architecture for key use-cases
  • Drive the deployment of new GenAI products and process improvements, working with internal and external partners to design, validate, and deliver solutions
  • Resolve high-impact technical and business problems, leading projects through in-depth evaluation of complex business processes, system architecture, and industry standards
  • Provide expert guidance and advanced knowledge in modern programming, ensuring platform design adheres to architectural blueprints and best practices for generative models
  • Develop and enforce robust coding standards, testing methodologies, debugging practices, and implementation strategies for enterprise-grade solutions across Python, Node.js, and TypeScript
  • Manage multiple concurrent initiatives and projects of varying sizes and complexity
  • Engage with external vendors and startups for joint initiatives and exploration of new technologies
  • Cultivate a comprehensive understanding of how business, architecture, and infrastructure integrate within the GenAI ecosystem at Citi
What we offer
  • Discover the top benefits offered to our global workforce, designed to support your well-being, growth and work-life balance
Employment Type
Full-time

Senior Ansible Automation & Platform Engineer

The Senior Ansible Automation & Platform Engineer is a strategic member of the o...
Location
United States, Austin; Mountain View; Warren
Salary
Not provided
General Motors
Expiration Date
Until further notice
Requirements
  • 7–12+ years in Architecture, DevOps, SRE, Platform Engineering, or Infrastructure Engineering
  • Expert-level proficiency with Ansible (playbooks, roles, collections, Jinja2, modules)
  • Hands-on experience designing and operating Ansible Automation Platform (AAP)
  • Strong experience with Terraform, Chef, or other IaC tools
  • Deep Linux engineering background and configuration management expertise
  • Expert in integrating automation with ServiceNow (CMDB, ITSM, workflows)
  • Exceptional scripting skills (Python, Bash, PowerShell)
  • Experience with AWS/Azure/GCP automation
  • Experience with Kubernetes, containerization, and orchestration
  • Experience with CI/CD pipelines (GitHub Actions, GitLab, Jenkins, Azure DevOps)
Job Responsibility
  • Architect, design, and operate the Ansible Automation Platform (AAP) including controller, execution environments, mesh architecture, and collections strategy
  • Define and maintain the Ansible Platform roadmap, including feature evolution, lifecycle management, scalability planning, and enterprise adoption milestones
  • Establish platform governance: coding standards, role/playbook patterns, collections, testing frameworks, and security guardrails
  • Build and maintain Execution Environments (EEs) optimized for performance, security, and dependency management
  • Lead platform upgrades, migrations, and cross-environment standardization
  • Design enterprise-grade Ansible automation frameworks with reusable roles, collections, and modular playbooks
  • Build automation for provisioning, configuration management, patching, compliance, and cloud infrastructure
  • Integrate Ansible with Terraform, CI/CD pipelines, GitOps workflows, and event-driven automation systems
  • Implement self-service automation capabilities for developers, operations, and business teams
  • Integrate Agentic AI systems to enhance automation workflows, including AI-driven playbook generation and validation, automated remediation recommendations, intelligent change-impact analysis, and AI-assisted troubleshooting and root-cause analysis
What we offer
  • Relocation benefits (may be eligible)
Employment Type
Full-time

Senior DevOps Engineer

Ford is embarking on an electrifying digital transformation, and our cutting-edg...
Location
United States, Remote
Salary
85400.00 - 192900.00 USD / Year
Ford Motor Company
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of proven experience with modern CI/CD pipelines (e.g., GitHub Actions, Tekton, Jenkins), Infrastructure as Code (e.g., Terraform), and advanced deployment techniques (blue/green, canary releases)
  • 5+ years of experience with distributed architectures running on modern platforms like Cloud Run, Kubernetes (GKE), or OpenShift, with a deep understanding of REST API design
  • Proficiency in languages such as GoLang, Python, or Java to build highly effective automation, custom tooling, and integrations
  • Demonstrable experience working within Agile methodologies, coupled with a baseline understanding of how to utilize AI tools to enhance software engineering productivity
Job Responsibility
  • Spearhead DevOps & GitOps Evolution: Lead the modernization of DevOps tooling and CI/CD pipelines for our mission-critical Apigee API Gateway, embracing GitOps methodologies to ensure declarative, automated, and secure deployments
  • Pioneer AIOps & Intelligent SRE: Design and evolve production operations by embedding SRE principles and leveraging AIOps tools. Utilize AI-driven observability for anomaly detection, predictive alerting, and automated incident remediation to ensure exceptional availability
  • Enable AI & Next-Gen Workloads: Architect gateway solutions that securely and efficiently route high-volume traffic for Ford’s Generative AI, LLM, and Machine Learning APIs (handling intelligent rate-limiting, caching, and payload security)
  • Innovate with AI-Assisted Development: Utilize GenAI coding assistants (e.g., GitHub Copilot) to accelerate the creation of Infrastructure as Code (IaC), automation scripts, and test-driven development (TDD) frameworks
  • Global Collaboration & On-Call: Actively participate in a global on-call rotation (currently 1 week every 10 weeks, 'follow the sun' model), collaborating with an international team to ensure 24/7 operational excellence
  • Drive Strategic Alignment: Partner seamlessly across engineering, product, and security domains to champion the enterprise-wide API Gateway strategy and integrate security-as-code (DevSecOps) from day one
What we offer
  • Immediate medical, dental, vision and prescription drug coverage
  • Flexible family care days, paid parental leave, new parent ramp-up programs, subsidized back-up child care and more
  • Family building benefits including adoption and surrogacy expense reimbursement, fertility treatments, and more
  • Vehicle discount program for employees and family members and management leases
  • Tuition assistance
  • Established and active employee resource groups
  • Paid time off for individual and team community service
  • A generous schedule of paid holidays, including the week between Christmas and New Year's Day
  • Paid time off and the option to purchase additional vacation time
Employment Type
Full-time

Senior Python Engineer

A Senior Engineer opportunity within our Enterprise AI team. Working with a grou...
Location
United Kingdom, Fleet Place Office
Salary
Not provided
Just Eat Takeaway.com
Expiration Date
Until further notice
Requirements
  • Experience working with cloud platforms like AWS (EC2, ECS, S3, Lambda, Fargate, DynamoDB/RDS) or GCP (Compute Engine, Cloud Storage, Cloud Functions, BigQuery)
  • Strong experience in Python and fluency in another language
  • Knowledge of Infrastructure as Code tools (e.g., CloudFormation, Terraform, Ansible, Serverless Framework)
  • Enjoy automating processes
  • Knowledge of containers (Docker, Container Orchestration like Kubernetes/ECS/GKE)
  • A genuine interest in and at least foundational experience with AI/ML concepts and technologies, demonstrating an eagerness to grow into a specialised AI Engineering role
  • Proven track record of delivering high-quality work and driving forward best practices in software engineering
  • Stays up to date with new technology in the AI space
Job Responsibility
  • Design, develop, and deploy high-quality, scalable software solutions, focusing on AI-enabled applications and infrastructure
  • Lead and participate in technical projects and deployments of AI systems
  • Provide guidance and mentoring to other team members on best practices in AI engineering
  • Use best practices (e.g., MLOps, AIOps) to improve products/services and processes related to AI
  • Optimise existing model serving and data pipelines to meet changing performance and security requirements
  • Hold requirements gathering sessions with business stakeholders and data science teams
  • Lead functional projects or work streams focused on AI infrastructure and tooling
Employment Type
Full-time

Senior Software Engineer, AI

Credit Genie is a mobile-first financial wellness platform designed to help indi...
Location
United States, Pittsburgh
Salary
150000.00 - 250000.00 USD / Year
Credit Genie
Expiration Date
Until further notice
Requirements
  • A Software Engineer with 5+ years of industry experience
  • Strong foundations in multiple programming languages (Python, Java, TypeScript, etc.)
  • Hands-on experience with cloud platforms (AWS, GCP, or Azure)
  • Experienced at designing and implementing distributed, production-grade systems
  • Comfortable with system design, APIs, version control, Infrastructure as Code, and testing
  • Collaborative and excited by fast-moving, problem-solving environments
  • Prior exposure to Machine Learning and AI concepts, tools, or frameworks (e.g., LLMs, vector databases, specialized model serving)
Job Responsibility
  • Lead the design and implementation of highly available, scalable backend services and APIs that serve and integrate our AI models and applications into production systems
  • Architect and optimize the services and data pipelines essential for deploying, monitoring, and maintaining real-time AI inferencing and retrieval at scale
  • Collaborate with AI and ML Engineers to improve model deployment, monitoring, and experimentation workflows (AIOps)
  • Drive technical excellence, setting high standards for code quality, system reliability, and performance
  • Mentor and guide other engineers on best practices for building robust backend systems in an AI-focused environment
  • Have fun working on hard and highly impactful problems
What we offer
  • Offers Equity
  • Offers Bonus
  • Comprehensive medical, vision, and dental coverage
  • 401(k) retirement plan with company match
  • Short & long term disability insurance
  • Life insurance
  • Flexible PTO
  • 100% company-paid medical, dental, and vision coverage for you and your dependents on your first day of employment
  • Receive up to $100 per month in fitness reimbursement or enjoy a complimentary full membership to LifeTime Fitness or Equinox
  • 401(k) with a 3.5% match and immediate vesting
Employment Type
Full-time

Senior Software Engineer, AI

LogicMonitor is advancing observability through AI‑driven data intelligence, con...
Location
India, Pune
Salary
Not provided
LogicMonitor
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Data Engineering, or a related field
  • 4-5 years of experience in backend or data systems engineering
  • Experience building streaming data pipelines (Kafka / Spark or any similar technology)
  • Strong programming background in Java and Python, including microservice design
  • Experience with ETL, data modeling, and distributed storage systems
  • Familiarity with LLM pipelines, embeddings, and vector retrieval
  • Understanding of Kubernetes, containerization, and CI/CD workflows
  • Awareness of data governance, validation, and lineage best practices
  • Strong communication and collaboration across AI, Data, and Platform teams
Job Responsibility
  • Design and build streaming and batch data pipelines that process metrics, logs, and events for AI workflows
  • Develop ETL and feature‑extraction pipelines using Python and Java microservices
  • Integrate data ingestion and enrichment from multiple observability sources into AI‑ready formats
  • Build resilient data orchestration using Kafka, Airflow, and Redis Streams
  • Develop data indexing and semantic search for large‑scale observability and operational data
  • Work with structured and unstructured data lakes and warehouses (Delta Lake, Iceberg, ClickHouse)
  • Collaborate with the AI Platform team to manage embeddings, metadata, and model context storage
  • Optimize latency and throughput for retrieval, query expansion, and AI response generation
  • Build and maintain Java microservices (Spring Boot) that serve AI and analytics data to Edwin and AIOps applications
  • Develop Python APIs (FastAPI / LangGraph) for LLM orchestration, summarization, and correlation reasoning

Principal, Managed Services Portfolio and Technical Delivery Leader

Turnberry Solutions is seeking a Principal, Managed Services Portfolio and Techn...
Location
United States, Minneapolis
Salary
170000.00 - 190000.00 USD / Year
Turnberry Solutions
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in consulting, managed services, or enterprise IT operations leadership
  • Proven experience operating in a hybrid consulting and MSP leadership environment
  • Demonstrated success leading large-scale operational delivery teams and client-facing managed services engagements
  • Experience managing Service Delivery Managers, operational leads, or multi-layer delivery organizations
  • Strong executive communication and stakeholder management capabilities, comfortable in CIO, CTO, and Head-of-Engineering conversations
  • Hands-on technical leader at some prior point in career
  • Personally architected, built, or led modernization of production systems and can still hold a deep technical conversation today
  • Strong working knowledge of cloud platforms (AWS, Azure, or GCP) across architecture, operations, cost, security, and reliability
  • Cloud certifications are a plus
  • Demonstrated experience leading application modernization initiatives: legacy-to-cloud migrations, monolith decomposition, replatforming, refactoring, API/integration modernization, and observability uplift
Job Responsibility
  • Oversee delivery quality, operational performance, and client satisfaction across managed services engagements
  • Lead and mentor Service Delivery Managers, Support Leads, and operational leadership teams
  • Drive service stabilization, operational maturity, and scalable governance models across accounts
  • Serve as executive escalation point for critical delivery, operational, or client issues
  • Ensure SLA/KPI adherence, operational transparency, and continuous improvement across engagements
  • Establish scalable governance, reporting, knowledge management, and service review frameworks
  • Drive operational rigor across Incident Management, Problem Management, Change Management, Service Delivery Governance, Capacity and Demand Management, and Continuous Service Improvement
  • Serve as the senior technical voice across the managed services portfolio, credible with client architects, CTOs, platform engineering leaders, and security leadership
  • Own the technical narrative within AMS engagements: absorb customer technical directives, evaluate impact across the supported application estate, and translate them into actionable delivery, modernization, and operational plans
  • Shape and steer the long-term technical roadmap conversations inside managed services engagements, moving accounts beyond pure run-the-lights operations into modernization, platform evolution, and value-generating engineering work
What we offer
  • comprehensive healthcare package (medical, dental, vision)
  • disability and group term life insurance
  • health and flexible spending accounts
  • utilization bonus
  • 401(k) with match
  • flexible time off for salaried employees
  • parental leave for salaried employees
  • flexible work arrangements
Employment Type
Full-time

Manager, Service Strategy

Location
United Kingdom, London
Salary
Not provided
Mastercard
Expiration Date
Until further notice
Requirements
  • Experience in technology, consulting or service management roles, ideally within Financial services or Real-time payments ecosystems
  • Proven background in top-tier consulting firms with direct experience leading complex transformation programs
  • Deep understanding of real-time payment systems, regulatory requirements, and high-availability architectures
  • Exceptional stakeholder engagement and communication skills; comfortable influencing C-level stakeholders and regulators
  • Strong analytical and strategic thinking skills with a bias for execution and measurable results
  • Proven track record of leading high-stakes strategic projects with significant impact, demonstrating an ability to navigate and influence at the highest levels of the organization
  • A problem-solver with an analytical mindset, capable of cutting through noise and complexity to deliver clear, actionable insights
  • Demonstrated ability to lead and motivate cross-functional and cross-regional teams without direct reporting lines, showcasing strong leadership and collaboration skills
  • Proven ability to influence various senior stakeholders and drive substantial change, requiring excellent communication and negotiation skills
Job Responsibility
  • Support in developing and owning the service management strategy aligned with Mastercard’s long-term business and technology goals for RTP International
  • Collaborate with product, engineering, and infrastructure teams to align operational readiness with product roadmaps and launches
  • Support strategic initiatives to modernize service platforms, introducing automation, AIOps, observability, and intelligent alerting
  • Drive growth and profitability by socializing issues and potential solutions that improve service resilience and stability
  • Leverage consulting experience to create scalable service models that align with diverse client needs across geographies
  • Support strategic engagements with clients, participant banks and regulators to ensure service expectations and technical capabilities are fully aligned
  • Act as an advisor internally and externally, translating complex technical and operational concepts into clear, actionable strategies
  • Support the service operations team leadership in identifying, structuring, and prioritizing key issues, defining problem statements, and developing solutions that elevate the customer experience
  • Drive progress on team goals and internal strategic projects focused on operational improvements, customer satisfaction, and communications
  • Monitor market trends and organizational performance, and use customer data to identify risks and implement action plans that protect and enhance customer trust
Employment Type
Full-time