AIOps Automation Engineering Lead

Citi (https://www.citi.com/)

Location:
India, Chennai

Contract Type:
Employment contract

Salary:

Not provided

Job Description:

The Engineering Lead Analyst is a senior-level position responsible for leading a variety of engineering activities, including the design, acquisition and deployment of hardware, software and network infrastructure, in coordination with the Technology team. The position sits within the Production Management AIOps organization, which is at the forefront of transforming production management and operations through cutting-edge technologies. The incumbent will lead efforts to automate routine production tasks, enhance predictive capabilities, reduce manual intervention and ensure AI is integrated into existing operational workflows.

Job Responsibility:

  • Serve as a technology subject matter expert for internal and external stakeholders and provide direction for all firm mandated controls and compliance initiatives, all projects within the group and in creating a technology domain roadmap
  • ensure that all integration of functions meets business goals
  • define necessary system enhancements to deploy new products and process enhancements
  • recommend product customization for system integration
  • identify problem causality, business impact and root causes
  • exhibit knowledge of how one's own specialty area contributes to the business, and apply knowledge of competitors, products and services
  • advise or mentor junior team members
  • impact the engineering function by influencing decisions through advice, counsel or facilitating services
  • drive and implement rigorous quality standards for all aspects of the automation delivery from initial concept to final implementation
  • continually evolve the working practices within and services provided by Production Management (regionally and globally) to improve efficiency and productivity
  • maintain forward compatibility and build competency in automation, Artificial Intelligence, Robotic Process Automation, predictive analytics, etc.
  • apply decision analytics and technology platforms to deliver immediate results and long-term business impact
  • develop predictive models that will form the basis of information-driven strategies executed with respect to services provided by Production Management

Requirements:

  • 10+ years of relevant experience in an Engineering role
  • experience working in Financial Services or a large complex and/or global environment
  • project management experience
  • J2EE/microservices development experience, including running applications in cloud-native environments (Google Cloud, AWS, API Gateway technologies)
  • strong proficiency in JavaScript, including experience with ReactJS and NodeJS
  • experience with MongoDB or other NoSQL databases
  • solid understanding of Python and experience with relevant libraries
  • experience with version control systems like Git
  • knowledge of CI/CD pipelines and DevOps practices is a plus
  • consistently demonstrates clear and concise written and verbal communication
  • comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • demonstrated analytic/diagnostic skills
  • ability to work in a matrix environment and partner with virtual teams
  • ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • ability to work under pressure and manage tight deadlines or unexpected changes in expectations or requirements
  • proven track record of operational process change and improvement

Nice to have:

  • knowledge of CI/CD pipelines and DevOps practices
  • project management experience

What we offer:

  • Equal opportunity employer
  • consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law

Additional Information:

Job Posted:
May 03, 2025

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for AIOps Automation Engineering Lead

Principal Engineer-Site Reliability Engineering and AIOps

We are looking for a Principal Engineer to set the enterprise technical directio...
Location:
India, Hyderabad
Salary:
Not provided
Company:
Wells Fargo (https://www.wellsfargo.com/)
Expiration Date:
May 10, 2026
Requirements:
  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 7+ years of engineering experience, including principal-level technical leadership on large-scale reliability, production operations, or platform programs across complex environments
  • 7+ years of software engineering experience (e.g., Java, C#, Python) with demonstrated expertise in system design and distributed systems
  • track record of delivering reusable automation and platform capabilities adopted by multiple teams
  • 5+ years operating Linux/Unix and Windows platforms in production, including performance tuning, capacity planning, and reliability hardening for mission-critical services
  • 5+ years designing and operating cloud solutions (public and/or private cloud), including reliability and security architecture, infrastructure-as-code, and cost-aware engineering at scale
  • 5+ years leading reliability and operations practices for enterprise-scale, highly available services, including major incident leadership, problem management, and establishing operational readiness mechanisms
  • 5+ years architecting and scaling full-stack observability solutions, including instrumentation standards, alert strategy, service dashboards, and governance that improves signal quality and reduces noise
  • 5+ years with automation and observability toolsets (e.g., Ansible, Grafana, Elastic, Splunk, Prometheus) and experience building reusable components, templates, and paved paths integrated with CI/CD
  • Exceptional communication and influence skills, including the ability to align senior stakeholders, drive technical decisions across organizations, and clearly articulate risk, tradeoffs, and recommended paths forward
Job Responsibility:
  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Set and evangelize the SRE and AIOps technical strategy for EFT, establishing reference architectures, standards, and guardrails (service tiering, onboarding criteria, SLO/error budget governance) and holding teams accountable through transparent executive-level reporting
  • Act as a principal-level technical advisor and multiplier: mentor senior engineers, contribute to hiring and technical bar-raising, and define reliability patterns and guardrails across applications, networks, databases, operating systems, and web technologies
  • Own the reliability and observability architecture across hybrid/multi-cloud, driving standardization of monitoring, logging, tracing, synthetics, and resilience/chaos testing
  • define platform patterns that teams can adopt with minimal friction
Employment Type:
Full-time

Managing Vice President - Infrastructure Platforms & Operations

The Managing Vice President, Infrastructure Platforms & Operations is a senior t...
Location:
United States, Bethesda
Salary:
215700.00 - 389700.00 USD / Year
Company:
Marriott Bonvoy (https://www.marriott.com)
Expiration Date:
May 20, 2026
Requirements:
  • Bachelor's degree in Computer Science, Information Systems, Engineering, Business Administration, or related technical field
  • 15+ years of senior leadership experience across cloud engineering, infrastructure platforms, network services, and/or enterprise workplace technologies, preferably in a large global Fortune 500 organization
  • 10+ years of prior hands-on technical engineering or development experience (cloud, infrastructure, networking, automation, or enterprise platforms)
  • Demonstrated success leading large, multi-disciplinary global engineering and operations organizations
  • Deep expertise in multi-cloud platforms, network architecture, DevSecOps, automation, and reliability engineering
  • Strong experience partnering with cybersecurity teams to deliver secure-by-design platforms
  • Proven ability to influence senior executives and lead transformation in complex, matrixed enterprises
  • Strong financial acumen with experience managing large technology budgets and vendor portfolios
Job Responsibility:
  • Lead global teams responsible for cloud foundations, DevOps and CI/CD platforms, automation, container platforms, service mesh, and self-service engineering capabilities
  • Oversee enterprise cloud landing zones across all regions, ensuring secure, scalable, and cost-efficient architecture
  • Drive modernization of hybrid platforms, including datacenter, edge compute, and infrastructure engineering capabilities
  • Oversee SRE, observability, resiliency, and disaster recovery governance
  • Lead global network architecture and operations across datacenter networks, property connectivity, enterprise networks, and cloud network integration
  • Drive transformation of Marriott's global connectivity ecosystem, including SD-WAN, wireless, secure network edge, voice, and network automation
  • Ensure network performance, reliability, compliance, and resiliency at global scale
  • Lead workplace technology platforms supporting collaboration, productivity, endpoint, and digital employee experience solutions
  • Partner with business, HR, and IT leaders to deliver intuitive, reliable, and secure workplace tools that enable associate productivity
  • Drive standardization, modernization, and lifecycle management of workplace platforms and services
What we offer:
  • 401(k) plan
  • stock purchase plan
  • discounts at Marriott properties
  • commuter benefits
  • employee assistance plan
  • childcare discounts
  • medical
  • dental
  • vision
  • health care flexible spending account
Employment Type:
Full-time

Executive Director, Digital Engineering- Aetna Member Services

We’re building a world of health around every individual — shaping a more connec...
Location:
United States, Hartford
Salary:
175100.00 - 334750.00 USD / Year
Company:
CVS Health (https://www.cvshealth.com/)
Expiration Date:
Until further notice
Requirements:
  • 15+ years of software engineering experience with deep expertise in backend systems, distributed services, and API platforms
  • Proven experience leading large engineering organizations delivering mission‑critical services
  • Strong background in AWS cloud platform, microservices architecture, CI/CD pipelines, and DevOps/SRE practices
  • Demonstrated success driving stability, resiliency, and observability improvements at scale
  • Experience leveraging AI, ML, or LLM-based engineering and operational tooling
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
Job Responsibility:
  • Lead the design, development, and delivery of scalable backend systems, APIs, and microservices powering member-facing capabilities
  • Define API contract standards and integration patterns used across Member Services platforms
  • Drive service modernization by adopting cloud‑native architectures, containerization, service mesh, and event-driven patterns
  • Establish standards for availability, resiliency, performance, and disaster recovery across all services
  • Implement SLO/SLI/error budget frameworks, health checks, and high‑availability architectures
  • Institutionalize strong observability practices using metrics, logs, traces, and distributed monitoring
  • Drive continuous reliability improvements through chaos engineering, automated fault injection, and proactive root‑cause analysis
  • Integrate AI and LLM-based tooling into software development, QA, and operational processes (e.g., test automation, code generation, anomaly detection, intelligent incident triage)
  • Promote AIOps capabilities to reduce manual toil and amplify engineering productivity
  • Introduce AI-enhanced workflows across Member Services to improve personalization, routing, and intelligent decisioning
What we offer:
  • medical
  • dental
  • vision coverage
  • paid time off
  • retirement savings options
  • wellness programs
  • bonus
  • commission or short-term incentive program
  • equity award program
Employment Type:
Full-time

Executive Director, Digital Engineering- Aetna Member Services

The Executive Director, Digital Engineering- Aetna Member Services is a senior t...
Location:
United States, Work at Home
Salary:
175100.00 - 334750.00 USD / Year
Company:
CVS Health (https://www.cvshealth.com/)
Expiration Date:
May 31, 2026
Requirements:
  • 15+ years of software engineering experience with deep expertise in backend systems, distributed services, and API platforms
  • Proven experience leading large engineering organizations delivering mission‑critical services
  • Strong background in AWS cloud platform, microservices architecture, CI/CD pipelines, and DevOps/SRE practices
  • Demonstrated success driving stability, resiliency, and observability improvements at scale
  • Experience leveraging AI, ML, or LLM-based engineering and operational tooling
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
Job Responsibility:
  • Lead the design, development, and delivery of scalable backend systems, APIs, and microservices powering member-facing capabilities
  • Define API contract standards and integration patterns used across Member Services platforms
  • Drive service modernization by adopting cloud‑native architectures, containerization, service mesh, and event-driven patterns
  • Establish standards for availability, resiliency, performance, and disaster recovery across all services
  • Implement SLO/SLI/error budget frameworks, health checks, and high‑availability architectures
  • Institutionalize strong observability practices using metrics, logs, traces, and distributed monitoring
  • Drive continuous reliability improvements through chaos engineering, automated fault injection, and proactive root‑cause analysis
  • Integrate AI and LLM-based tooling into software development, QA, and operational processes
  • Promote AIOps capabilities to reduce manual toil and amplify engineering productivity
  • Introduce AI-enhanced workflows across Member Services to improve personalization, routing, and intelligent decisioning
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
Employment Type:
Full-time

AI Operations Tech Leader

We are looking for an experienced AI Ops Tech Leader - Operations Support to lea...
Salary:
Not provided
Company:
Lingaro (lingarogroup.com)
Expiration Date:
Until further notice
Requirements:
  • 10+ years in data engineering, AI/ML engineering, or operations support technology roles
  • 4-6+ years in technical leadership positions within operations support / IT operations / service operations environments
  • Proven track record delivering production AI/ML/data solutions that measurably improved operations support KPIs
  • Strong hands-on expertise with modern data/AI stacks (Python, Spark, Kafka, Airflow, cloud data platforms, PyTorch/TensorFlow, LLM frameworks) and integration into operations support ecosystems
  • Deep practical experience with AIOps patterns in live operations support settings: event correlation, anomaly detection, automated actions, predictive analytics, GenAI for ops
  • Experience leading development or significant enhancement of AIOps/internal tooling platforms specifically for operations support teams
  • Ability to stay deeply technical while leading people and strategy in a high-velocity operations support context
  • Excellent communication: can explain complex AI concepts to operations support practitioners and translate operational pain into technical roadmaps for executives
  • Strong bias for action, production impact, and reducing operational toil through intelligent automation
Job Responsibility:
  • Actively lead and contribute to high-impact data/AI projects that directly improve operations support outcomes
  • Design and deliver scalable features embedded into operations support workflows and platforms
  • Ensure solutions meet strict operations support SLAs for reliability, low latency, auditability, explainability, and zero-downtime deployment
  • Stay up to date with innovations and research in AIOps tools
  • Lead the architecture, development, and continuous enhancement of internal AIOps platforms and reusable components that power operations support teams
  • Serve as the lead AI technical authority and trusted advisor for all operations support programs, automation movements, and AI transformation efforts
  • Lead technical discussions, architecture reviews, PoCs, vendor evaluations, and solution selection
  • Identify, prioritize, and drive the highest-ROI AI use cases in operations support
  • Build, mentor, and lead a high-performing squad of AIOps specialists focused on operations support outcomes
  • Foster a culture of rapid experimentation, production-first mindset, and relentless focus on operational impact
Employment Type:
Full-time

Network Engineer, Foundation & Support

Meta is seeking a Network Engineer to join the Foundation and Support team, focu...
Location:
United States, Denver
Salary:
135000.00 - 191000.00 USD / Year
Company:
Meta (meta.com)
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 6+ years of experience in network engineering, with a focus on enterprise network deployment, operations, and site services infrastructure
  • 6+ years of experience designing, deploying, and operating LAN, WAN, and wireless network environments at scale across multiple physical sites
  • Experience leading incident response and troubleshooting for network outages in enterprise environments
  • Experience with network configuration and provisioning automation using scripting languages such as Python or Ansible
  • Experience coordinating with vendors and cross-functional teams including facilities, site operations, and infrastructure engineering
Job Responsibility:
  • Plan and execute end-to-end deployment of enterprise network services across Meta's physical sites, including LAN, WAN, wireless, and access layer infrastructure
  • Evaluate site-specific network requirements and design deployment architectures that align with foundation infrastructure standards and capacity targets
  • Lead incident response for network outages, performing root cause analysis, coordinating remediation efforts, and implementing corrective actions to prevent recurrence
  • Drive change management processes for network infrastructure updates, ensuring proper review, testing, and rollout procedures are followed
  • Collaborate with vendors, site operations, facilities, and cross-functional infrastructure teams to coordinate network initiatives and resolve complex issues
  • Develop and maintain deployment runbooks, network topology documentation, and configuration standards for enterprise site services
  • Leverage AIOps tools and automation to accelerate network provisioning, anomaly detection, configuration validation, and operational reporting
  • Analyze network performance data and incident trends to inform proactive improvements to site network reliability and deployment quality
  • Advise cross-functional partners on complex site network matters, including capacity planning, technology refresh cycles, and deployment sequencing
  • Track deployment milestones and operational metrics, surfacing risks early and communicating progress to stakeholders and leadership
What we offer:
  • bonus
  • equity
Employment Type:
Full-time

Lead Engineer – Platform Engineering

We are looking for a Lead DevOps Engineer to join the Platform Engineering team ...
Location:
United States, St Petersburg, Florida
Salary:
Not provided
Company:
Raymond James (raymondjames.com)
Expiration Date:
Until further notice
Requirements:
  • Deep experience with virtualization platforms (e.g., VMware vSphere/ESXi, Hyper‑V, KVM/Nutanix)
  • Hands‑on experience with configuration management tools such as Ansible
  • Experience implementing and supporting enterprise load balancer solutions (e.g., F5 BIG-IP, NGINX, Azure/AWS load balancers), including configuration, automation, and traffic‑routing policies
  • Familiarity with AI‑assisted operations tools (AIOps), or how they can fit into the workflow
  • Solid understanding of CI/CD systems (GitHub Actions, Azure DevOps, Jenkins, GitLab CI)
  • Advanced scripting skills in Python, PowerShell, and/or Bash
  • Experience with provisioned workflow development in ServiceNow
  • Strong knowledge of monitoring and logging platforms (Prometheus/Grafana, Splunk, Elastic, Datadog, etc.)
  • Understanding of security best practices, IAM/RBAC, secrets management, and compliance frameworks
  • Strong networking and systems fundamentals (TCP/IP, DNS, load balancing, storage)
Job Responsibility:
  • Design, build, and maintain automation for VM provisioning, configuration, and lifecycle management
  • Enhance and support CI/CD pipelines for infrastructure and platform services
  • Provide technical leadership and mentorship to engineers across the platform engineering team
  • Use AI‑assisted tooling when beneficial for anomaly detection, event correlation, and operational insights
  • Work on standardized VM images, templates, and OS baselines to ensure consistency and security
  • Improve platform reliability through monitoring, alerting, and SRE‑aligned practices
  • Develop and maintain observability tooling, dashboards, and automated remediation workflows
  • Ensure security best practices across VM platforms, including RBAC, secrets management, and patching
  • Optimize VM capacity, performance, and resource utilization across environments
  • Collaborate with development, cloud, and security teams to deliver stable, self‑service platform capabilities
Employment Type:
Full-time

Intermediate Site Reliability Engineer SRE – AI Reliability & Automation

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location:
Canada, Mississauga
Salary:
115000.00 - 128000.00 CAD / Year
Company:
PointClickCare (pointclickcare.com)
Expiration Date:
Until further notice
Requirements:
  • 5+ years' experience in software engineering
  • Experience with SRE principles
  • Experience with AI/ML in production environments
  • A passion for automation, intelligent systems, and operational excellence
  • Strong debugging, problem-solving, and system design skills
  • Languages: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
  • CI/CD: Jenkins, ArgoCD, Spinnaker
Job Responsibility:
  • Build ML-based anomaly detection and pattern recognition systems
  • Enhance telemetry with smart tagging and metadata for better AI insights
  • Develop event-driven workflows and self-healing systems using AI triggers
  • Automate incident response with generative AI and custom AI agent orchestration
  • Use time-series forecasting and predictive modelling to anticipate failures
  • Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
  • Build scalable, fault-tolerant systems in a cloud-native environment
  • Participate in on-call rotations and lead incident response for critical systems
  • Integrate APIs for streamlined data exchange and system connectivity
  • Run internal AIOps workshops and help teams adopt AI maturity models
What we offer:
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
Employment Type:
Full-time