AI DevOps Platform Support Lead

Citi

Location:
India, Pune

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Engineer the future of global finance. At Citi, our Tech team doesn’t just support finance – we are helping to redefine it. Every day, $5 trillion crosses through our network. We do business in 180+ countries, operating at a scale few can match. From deploying advanced AI to helping shape global markets, we build systems that matter. Look to join a team where your work helps influence economies, your ideas can drive innovation and outcomes, and your growth is backed by mentorship, continuous learning, and flexibility with potential hybrid work opportunities. Help solve real-world challenges that touch millions and get the opportunity to build the future of finance with Citi Tech.

We are seeking an experienced and motivated leader for our AI and DevOps Platform Support team in North America. This role is responsible for ensuring the stability, reliability, and performance of our critical AI and DevOps platforms. The team supports a wide range of services, including multiple AI applications, developer tools, and CI/CD pipeline technologies used by teams across the organization. The ideal candidate will lead a team of SRE and Support engineers, manage incident and problem resolution, and collaborate with engineering and development teams to improve platform services and supportability. Involvement includes short- to medium‑term planning of actions and resources for the area.

Job Responsibility:

  • Demonstrate an in-depth understanding of how application support integrates within the overall technology function to achieve objectives, along with a good understanding of the industry
  • Vendor relationship management, including oversight for all offshore managed services
  • Improve the service level the team provides to our end users, including maximizing operational efficiencies and strengthening incident management, problem management, and knowledge‑sharing practices
  • Guide development teams on application stability and supportability improvements
  • Formulate and implement a framework for managing capacity, throughput, and latency
  • Define and implement application onboarding guidelines and standards
  • Work with various team members, coaching them on how to maximize their potential, work better in a highly integrated team environment, and focus on bringing out their strengths
  • Drive continued cost reductions and efficiencies across the portfolios supported through Root Cause Analysis reviews, knowledge management, performance tuning, and user training
  • Participate in business review meetings, relating technology tools and strategies to business requirements
  • Assure adherence to all support processes and tool standards, and work with management to create new and/or enhance existing processes to ensure consistency and quality in “best practices” across the overall support program
  • Perform other duties and functions as assigned
  • Act as the primary point of contact for platform matters, defining the vision and roadmap in partnership with engineering leaders and business stakeholders
  • Champion the platform's resilience strategy by planning and executing wargaming scenarios, chaos engineering tests, and disaster recovery drills
  • Drive a comprehensive automation strategy to reduce manual toil, improve deployment velocity, and identify opportunities to leverage AI for operational intelligence
  • Define and drive the enterprise-wide observability strategy, ensuring the team has the tools and insights needed to guarantee platform health, performance, and cost‑effectiveness. This includes overseeing monitoring, logging, tracing, and alerting
  • Remain hands‑on and maintain a deep technical understanding of the platform architecture and services
  • Oversee the operational health of all production platforms (including OpenShift, ECS, CI/CD), ensuring SLAs are met and a robust incident management process is in place
  • Implement and manage comprehensive monitoring and observability strategies to ensure proactive issue detection, performance analysis, and system health checks across all supported platforms

Requirements:

  • 10 years of relevant experience in a hands‑on technical leadership role
  • Experience leading architecture decision‑making for platform services, ensuring alignment with enterprise standards, long‑term scalability, and operational resilience
  • Experience with senior stakeholder management
  • Project management experience with demonstrable results in improving IT services
  • Exceptional communication and presentation skills, with the ability to articulate a technical vision and report on key metrics to senior leadership
  • A strong track record of developing and executing a strategic roadmap for a technical platform, balancing new features with a dedicated “book of work” for stability
  • Demonstrable experience leading resilience initiatives such as wargaming, disaster recovery planning, and incident response simulations
  • Ability to effectively share information with other support team members and other technology teams
  • Ability to plan and organize workload
  • Consistently demonstrates clear and concise written and verbal communication skills
  • Ability to communicate appropriately with relevant stakeholders
  • Bachelor’s/University degree
  • Master’s degree preferred

Nice to have:

  • Working knowledge of Generative AI with LLMs preferred
  • Experience with CI/CD and configuration management preferred
  • Experience with Red Hat OpenShift or similar Kubernetes technologies preferred
  • Experience working with databases such as Postgres, Oracle, MongoDB, and Redis preferred
  • Experience writing code in Java, Python, Go, or similar, and desire to build on these skills preferred
  • Hands‑on experience with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK)

Additional Information:

Job Posted:
March 21, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for AI DevOps Platform Support Lead

AI Lead Engineer

Lead Engineer role in HPE Hybrid Cloud focusing on AI innovation and technology ...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements:
  • Experience designing and developing software systems, design tools, and languages in the storage/server/networking area
  • Two or more years of experience in applying AI to practical and comprehensive technology solutions
  • Experience with ML, deep learning, TensorFlow, Python, NLP
  • Experience in program leadership, governance, and change enablement
  • Knowledge of basic algorithms, object-oriented and functional design principles, and best-practice patterns
  • Experience in REST API development, NoSQL database design, and RDBMS design and optimizations
  • Experience with innovation accelerators
  • Cloud Architectures
  • Cross Domain Knowledge
  • Design Thinking
Job Responsibility:
  • Lead cross-functional teams in identifying and prioritizing key areas of a partner's business where AI solutions can drive significant business benefit
  • Design and develop solutions that leverage patterns in the data and metadata of petabytes of objects and files distributed across the enterprise storage platform
  • Design, develop, and deploy hybrid RAG architectures integrating LLMs with retrieval-based systems for improved relevance and contextual responses
  • Work on functional design, process design (including scenario design, flow mapping), prototyping, testing, training, and defining support procedures
  • Translate technical AI findings into clear, business-oriented language for non-technical stakeholders
  • Implement and manage pipelines that effectively combine retrieval mechanisms with generative capabilities
  • Develop custom plugins, adapters, or APIs to integrate retrieval systems with generative models
  • Fine-tune and optimize large language models
  • Monitor and troubleshoot issues within pipelines
  • Evaluate and benchmark the performance of vector databases
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
Employment Type:
Full-time

Application Production Support Engineer Generative AI

Engineer the future of global finance. At Citi, our Tech team doesn’t just suppo...
Location:
India, Pune
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements:
  • 6-8 years of relevant experience in technical support, platform operations, or engineering
  • Exposure to architecture concepts with the ability to contribute to technical discussions and understand design decisions
  • Experience working with business partners, engineering teams, or technology stakeholders
  • Demonstrated experience supporting IT services, platform operations, or infrastructure components
  • Strong verbal and written communication skills, with the ability to document technical issues clearly
  • Experience supporting operational workstreams or participating in platform improvement initiatives
  • Participation in resilience‑related or stability‑focused activities preferred
  • Ability to collaborate effectively with cross‑functional teams
  • Strong organizational skills and ability to manage daily workload and task priorities
  • Working knowledge of Generative AI concepts preferred
Job Responsibility:
  • Understand how application support functions within the broader technology organization and contributes to business objectives
  • Assist with vendor coordination and day‑to‑day interactions with offshore managed services
  • Support efforts to improve service levels, including participating in incident management, problem management, and knowledge‑sharing initiatives
  • Partner with development and engineering teams to support application stability and operational readiness
  • Assist in collecting capacity, performance, and latency data to support platform planning efforts
  • Support application onboarding activities using established guidelines and standards
  • Contribute to fostering a collaborative and supportive team environment that encourages skill development
  • Participate in cost‑efficiency initiatives such as Root Cause Analysis reviews, knowledge management, and performance tuning
  • Assist in preparing materials for business review meetings and help align technology activities with business needs
  • Follow established support processes and tool standards and provide input on improvement opportunities
Employment Type:
Full-time

AI and DevOps Platform Support Manager

Engineer the future of global finance. At Citi, our Tech team doesn’t just suppo...
Location:
United Kingdom, Belfast
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements:
  • Relevant experience in a technical leadership or management role with demonstrated success in building and scaling a high-performing support team
  • Experience with senior stakeholder management
  • Project management experience with demonstrable results in improving IT services
  • Exceptional communication and presentation skills, with the ability to articulate a technical vision and report on key metrics to senior leadership
  • A strong track record of developing and executing a strategic roadmap for a technical platform, balancing new features with a dedicated 'book of work' for stability
  • Demonstrable experience leading resilience initiatives such as wargaming, disaster recovery planning, and incident response simulation
  • Ability to effectively share information with other support team members and with other technology teams
  • Ability to plan and organize workload
  • Consistently demonstrates clear and concise written and verbal communication skills
  • Ability to communicate appropriately with relevant stakeholders
Job Responsibility:
  • Demonstrate an in-depth understanding of how application support integrates within the overall technology function to achieve objectives, along with a good understanding of the industry
  • Vendor relationship management, including oversight for all offshore managed services
  • Improve the service level the team provides to our end users, including maximizing operational efficiencies and strengthening incident management, problem management, and knowledge-sharing practices
  • Guide development teams on application stability and supportability improvements
  • Formulate and implement a framework for managing capacity, throughput, and latency
  • Define and implement application onboarding guidelines and standards
  • Work with various team members, coaching them on how to maximize their potential, work better in a highly integrated team environment, and focus on bringing out their strengths
  • Drive continued cost reductions and efficiencies across the supported portfolios through Root Cause Analysis reviews, knowledge management, performance tuning, and user training
  • Evaluate subordinates' performance and make decisions on pay increases, hiring, terminations, and other personnel actions
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type:
Full-time

Senior Director of Platform Engineering

Lead the Future of Platform Engineering at Modus Create. As Senior Director of P...
Location:
United States of America
Salary:
Not provided
Modus Create
Expiration Date
Until further notice
Requirements:
  • 15+ years in Platform Engineering/DevOps
  • 7+ years in senior engineering leadership, ideally in consulting or high-growth tech environments
  • A clear point of view on modern architecture, engineering best practices, and agile delivery
  • Proven experience scaling distributed global teams and platform engineering operations
  • Strong pre-sales and delivery experience, with the ability to shape winning proposals and roadmaps
  • A customer-first mindset and a passion for solving complex problems with elegant, scalable solutions
  • Excellent communication and collaboration skills in cross-functional and cross-cultural environments
  • A history of growing leaders and fostering high-trust, high-performance teams
Job Responsibility:
  • Lead and scale a high-performing, distributed platform engineering team through strong mentorship and inclusive leadership
  • Define what great looks like by creating reusable runbooks and technical standards and by nurturing a culture grounded in quality, belonging, and continuous learning
  • Help clients modernize platforms, launch new infrastructure, and make better innovation investment decisions
  • Ensure every solution is aligned with client goals and drives measurable value
  • Own and evolve our delivery frameworks, platform engineering standards, and team operations
  • Champion cloud-native development, DevOps and SRE best practices, and scalable architecture
  • Partner with Sales, Partnerships, and Client Executives to shape and win new opportunities
  • Translate client needs into technical solutions, delivery plans, and estimates
  • Lead proposal development, estimation, and pre-sales architecture discussions
  • Develop reusable solution assets, infrastructure templates, and case studies for future engagements
What we offer:
  • Remote work with flexible working hours
  • Modus Global Office Programme: on-demand access to private offices, meeting rooms, coworking spaces, and business lounges in over 120 countries
  • Employee Referral Program
  • Client Referral Program
  • Travel according to client or team needs
  • The chance to work side-by-side with thought leaders in emerging tech
  • Access to more than 12,000 courses with a licensed Coursera account
  • Opportunity to obtain paid certifications and courses if they align with company goals and are relevant to the employee's role
Employment Type:
Full-time

IT Development Manager

At Bosch, we shape the future by inventing high-quality technologies and service...
Location:
Poland, Warszawa
Salary:
Not provided
Robert Bosch Sp. z o.o.
Expiration Date
Until further notice
Requirements:
  • 5+ years of experience in developing Data Management and Analytics applications
  • Extensive knowledge of (meta)data management capabilities: Data Governance, Data Lineage, Data Assets, Data Products, Data Catalog, Data Marketplace, Data Policy, Ontologies
  • Considerable experience in designing, developing, and integrating data and analytics applications using modern architectures and frameworks, with both structured and unstructured data
  • Broad and up-to-date technical knowledge: databases (e.g., Oracle, Databricks), middleware/integration (e.g., Solace, Kafka), API management, cloud providers (Azure, AWS, Google), AI technologies (LLMs, agents)
  • Experimental mindset, self-motivation to search for solutions, and enjoyment of learning new things
  • Strong communication skills and a proactive approach to contacting people
  • Fluent spoken and written English; German is a plus
Job Responsibility:
  • Lead technical developments of a Data Intelligence Platform
  • Partner with Business, Solution and IT Architects on the strategy and delivery of the Platform functionalities
  • Define consistent system-specific guidelines for the software development and configuration environment in alignment with central guidelines
  • Assess customer requirements from a technical perspective, including effort estimates, and assist in the design and development of proofs of concept and prototypes
  • Document specifications and support the creation of operational support manuals during the technical implementation
  • Take responsibility for interface implementation and documentation
  • Steer external and internal developers
  • Support DevOps with sizing and scalability concepts for specific use cases
What we offer:
  • Competitive salary + annual bonus
  • Hybrid work with flexible working hours
  • Referral Bonus Program
  • Copyright costs for IT employees
  • A complex working environment with professional support and the opportunity to share knowledge and best practices
  • Ongoing development opportunities in a multinational environment
  • Broad access to professional trainings (incl. language courses), conferences and webinars
  • Private medical care and life insurance
  • Cafeteria System with multiple benefits (incl. MultiSport, shopping vouchers, cinema tickets, etc.)
  • Prepaid Lunch Card
Employment Type:
Full-time

Director of Data, ML & AI Engineering

As Director of Data, ML & AI Engineering, you will lead the design, delivery, an...
Location:
United Kingdom, London
Salary:
Not provided
Collinson
Expiration Date
Until further notice
Requirements:
  • Senior leadership experience across data, platform, ML, and/or AI engineering in enterprise or federated environments
  • Deep understanding of modern cloud-native data platforms, large-scale distributed systems, and emerging data technologies
  • Proven experience delivering and evolving enterprise-scale data and AI platforms from inception to production
  • Hands-on knowledge of ML/AI operationalisation, including pipelines, lifecycle management, and experimentation frameworks
  • Demonstrated capability managing cost, risk, security, and compliance at scale
  • Strong people leadership and team development experience, promoting inclusion, clarity, and accountability
  • Ability to translate complex technical concepts into business impact with senior stakeholders
  • A collaborative, adaptive leadership style that encourages openness, trust, and curiosity
Job Responsibility:
  • Lead the design and evolution of enterprise-grade data, ML, and AI engineering platforms, covering ingestion, transformation, feature management, model pipelines, and deployment
  • Ensure platforms are resilient, scalable, and production-ready to support both analytics and AI workloads
  • Balance continuous innovation with operational reliability, service continuity, and business value
  • Lead multiple engineering squads across data, platform, ML, and AI engineering disciplines
  • Establish clear engineering standards, ownership models, and accountability frameworks
  • Embed modern delivery practices such as DevOps, DataOps, MLOps, and AIOps to improve reliability and speed
  • Champion operational excellence, predictable delivery, and effective incident management
  • Partner with the VP of Analytics and Head of Innovation & AI to align platform capabilities with insight delivery, experimentation, and AI productisation
  • Provide high-quality, governed, production-ready data products and shared tools that empower analytics and AI teams
  • Accelerate time to value through automation, reusable patterns, and scalable platform abstractions

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251,000–314,500 USD / year
BlackLine
Expiration Date
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years leading MLOps or AIOps platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with runtime management for LangChain, LangGraph, ADK, or similar agentic systems
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • Short-term and long-term incentive programs
  • A robust offering of benefit and wellness plans
Employment Type:
Full-time