AI Ops Principal Engineer

United States, Charlotte / Iselin 159000.00 - 305000.00 USD / Year · Job Posted June 14, 2026

Job offer has expired

Job Description

Wells Fargo is seeking a Principal Engineer – AIOps to join Platform Strategy & Transformation, part of the Commercial & Corporate and Investment Management Technology (CCIBT) group. Learn more about the career areas and business divisions at wellsfargojobs.com.

This role sits at the core of CCIBT's Zero Touch Production (ZTP) transformation agenda, driving the strategy, architecture, and execution of next-generation AIOps capabilities across the enterprise. You will define and deliver intelligent, autonomous operations by leveraging AI/ML, observability, automation, and event-driven architectures to minimize manual intervention, improve resilience, and enable self-healing systems. You will partner closely with senior engineering, platform, SRE, and business leaders to accelerate AIOps adoption, embed intelligence into production ecosystems, and deliver measurable improvements in availability, efficiency, and operational risk reduction.

This is a hands-on senior developer role requiring strong development skills and the ability to build advanced automations using technologies such as Robotic Process Automation (RPA), artificial intelligence, and low-code tools, including UiPath, Microsoft Power Platform, Google ADK, LangChain, LangGraph, and Alteryx.

Job Responsibility

  • Lead the strategy, design, and execution of AIOps platforms and capabilities to enable Zero Touch Production across CCIBT
  • Define and drive the enterprise-wide AIOps roadmap, including observability, event correlation, anomaly detection, predictive insights, and automated remediation
  • Architect and implement self-healing systems leveraging AI/ML, event-driven automation, and closed-loop workflows
  • Drive adoption of intelligent incident management, root cause analysis (RCA), noise reduction, and auto-resolution techniques
  • Establish target-state architecture and engineering standards for AIOps platforms, tooling, and integrations
  • Influence enterprise technology strategy by evaluating emerging AIOps trends, tools, and frameworks
  • Partner with SRE, infrastructure, cloud, and application teams to embed AIOps into SDLC, CI/CD, and production operations
  • Lead large-scale engineering initiatives with cross-functional and enterprise impact
  • Provide thought leadership on resilience engineering, reliability, automation, and production excellence
  • Mentor and guide senior engineers and teams on AIOps best practices, architecture, and implementation
  • Collaborate with risk, compliance, and governance teams to ensure secure, compliant, and auditable automation

Requirements

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of experience in AIOps, SRE, production engineering, or large-scale distributed systems operations
  • 4+ years of experience with Python, programming, or scripting languages
  • 2+ years of experience working with Generative AI, large language models (LLM), or foundation models

Nice to have

  • 2+ years of agentic AI and agent-building experience
  • Experience with AI-powered development or GitHub Copilot
  • Proven experience designing and implementing observability, monitoring, and automation platforms at scale
  • Deep expertise in AIOps platforms and tools (e.g., Prometheus, AppDynamics, Splunk, ITRS Geneos, BigPanda, OpenTelemetry ecosystems)
  • Strong experience with AI/ML for IT operations, including anomaly detection, event correlation, forecasting, and intelligent alerting
  • Hands-on experience with automation frameworks (e.g., Ansible, Terraform, or similar) and event-driven architectures
  • Strong understanding of SRE principles, SLIs/SLOs, error budgets, and reliability engineering practices
  • Experience building self-healing systems and closed-loop remediation workflows
  • Proficiency in cloud platforms and cloud-native architectures (Kubernetes, microservices)
  • Knowledge of data pipelines, streaming platforms (Kafka), and telemetry ingestion/processing
  • Familiarity with GenAI/LLM-assisted operations, including incident summarization, knowledge mining, and automated runbook generation
  • Ability to operate across complex organizational structures with strong stakeholder management and communication skills
  • Proven ability to define target-state architecture, operating models, and actionable roadmaps
  • Ability to manage multiple high-complexity engineering initiatives with significant enterprise impact
  • Strong analytical, problem-solving, and architectural design skills
  • Excellent communication and documentation skills (e.g., Confluence, Git, architecture diagrams)
  • Comfortable driving transformation and influencing senior leadership in a fast-paced, evolving environment

What we offer

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

Similar Jobs for AI Ops Principal Engineer

8 matching positions

Principal AI Ops Architect

Scale’s rapidly growing Global Public Sector team is focused on using AI to addr...
Location: Qatar; United Kingdom, Doha; London
Salary: Not provided
Company: Scale
Expiration Date: Until further notice
Requirements
  • 6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector
  • Familiarity with international government security standards and the complexities of deploying sovereign AI
  • Proven experience maintaining production-grade applications, with a deep understanding of the full request lifecycle, connecting frontend/API layers to the backend and AI core
  • Proficiency in coding and the modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools
  • Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them
  • Reliability: You understand that in the public sector, a model failure may be a risk to public safety or privacy
  • Customer communication: The ability to explain to a high-ranking official why the performance of the system has degraded and how we are fixing it
Job Responsibility
  • Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies
  • Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment
  • Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability
  • Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks
  • Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again
  • Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials
  • Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases

Principal Engineer, Computer Vision & AI /3D Data (Team Lead)

Cesium is the leading open platform for streaming and visualizing huge 3D geospa...
Salary: Not provided
Company: Bentley Systems
Expiration Date: Until further notice
Requirements
  • PhD or equivalent in Computer Vision, AI, or Machine Learning
  • 10+ years' experience as a Software Engineer
  • At least 5 years leading and mentoring technical teams within a product-driven environment
  • 5+ years of experience in AI/3D vision development including industry experience of deploying AI and working with 3D data, including ML ops and practical, user-focused product development
  • Expertise in deep learning, computer vision, 3D geometry, and multimodal AI, including experience with large language models
  • Strong programming skills in Python and/or C++ with ML frameworks (PyTorch, TensorFlow), GPU programming (CUDA)
  • Excellent communication and leadership skills
  • Fluent in English
Job Responsibility
  • Lead and mentor a team of 5 engineers, providing technical direction, project coordination, coaching, and guidance in computer vision and AI/3D data modeling projects
  • Design and deploy advanced AI/ML and 3D vision algorithms, generated by our modelling team, for large-scale datasets (including point clouds, meshes, sensor data, and Gaussian splatting) in support of practical, user-focused product development
  • Define the AI strategy and contribute to product roadmap decisions
  • Implement ML Ops practices for scalable, automated training and inference pipelines
  • Conduct research on emerging AI techniques for 3D understanding and integrate findings into production
  • Ensure quality through rigorous evaluation, optimization, and code reviews
What we offer
  • A great team and culture
  • An exciting career as an integral part of a world-leading software company providing solutions for architecture, engineering, and construction
  • An attractive salary and benefits package
  • A commitment to inclusion, belonging and colleague wellbeing through global initiatives and resource groups
  • A company committed to making a real difference by advancing the world’s infrastructure for better quality of life, where your contributions help build a more sustainable, connected, and resilient world

Principal Engineer, Computer Vision & AI /3D Data

Cesium is the leading open platform for streaming and visualizing huge 3D geospa...
Salary: Not provided
Company: Bentley Systems
Expiration Date: Until further notice
Requirements
  • PhD or equivalent in Computer Vision, AI, or Machine Learning
  • 5+ years of experience in AI/3D vision development including industry experience of deploying AI and working with 3D data, including ML ops and practical, user-focused product development
  • Expertise in deep learning, computer vision, 3D geometry, and multimodal AI, including experience with large language models
  • At least 5 years of proven experience leading and mentoring technical teams
  • Strong programming skills in Python and/or C++ with ML frameworks (PyTorch, TensorFlow), GPU programming (CUDA)
  • Excellent communication and leadership skills
  • Fluent in English
Job Responsibility
  • Lead and mentor a team of 5 engineers, providing technical direction, project coordination, coaching, and guidance in computer vision and AI/3D data modeling projects
  • Design and deploy advanced AI/ML and 3D vision algorithms, generated by our modelling team, for large-scale datasets (including point clouds, meshes, sensor data, and Gaussian splatting) in support of practical, user-focused product development
  • Define the AI strategy and contribute to product roadmap decisions
  • Implement ML Ops practices for scalable, automated training and inference pipelines
  • Conduct research on emerging AI techniques for 3D understanding and integrate findings into production
  • Ensure quality through rigorous evaluation, optimization, and code reviews
What we offer
  • A great team and culture
  • An exciting career as an integral part of a world-leading software company providing solutions for architecture, engineering, and construction
  • An attractive salary and benefits package
  • A commitment to inclusion, belonging and colleague wellbeing through global initiatives and resource groups
  • A company committed to making a real difference by advancing the world’s infrastructure for better quality of life, where your contributions help build a more sustainable, connected, and resilient world

Principal AI Demand Planner

Microsoft’s Cloud business is expanding, and the Cloud Supply Chain (CSCP) organ...
Location: United States, Redmond
Salary: 130900.00 - 277200.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science, Engineering, Supply Chain, Information Technology (IT), Business, Operations, Finance, Accounting, Data Science, or related field AND 8+ years of supply chain, inventory management, or sales operations experience, preferably in planning (demand/supply/forecasting), cloud industry experience, infrastructure, data science, and/or channel management experience, OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • Ability to think, understand, and process information quickly
  • Cloud industry experience, data science, and/or channel management experience
  • End-to-end supply chain experience (e.g., plan, source, make, deliver) and experience working across those capabilities
  • Mindset of continuous improvement and proven capability to synthesize information and draw conclusions
  • Experience building Power BI reports and/or AI agents and adopting them into key tasks and processes
Job Responsibility
  • Define and drive the 5-year long-range forecast for AI infrastructure
  • Translate strategic business priorities and capacity signals into actionable, data-driven demand plans
  • Partner cross-functionally with Finance, Engineering, Cloud Operations, and Supply Chain leadership
  • Create the AI-related long-range demand plan and keep it up to date
  • Work with sourcing on what commodities require inputs to support LTAs
  • Identify highly complex opportunities and gaps in long range planning and work with engineering, datacenter, supply chain teams and finance to develop scenarios
  • Prepare executive presentations and briefings to communicate status, shifts and decisions required
  • Support S&OP transformation efforts across the Cloud Supply Chain
  • Define data driven metrics and use data to make decisions
Employment type: Full-time

Full Stack AI Engineer

We are seeking a highly skilled Full Stack AI Engineer to join our team and work...
Location: India, Vadodara; Ahmedabad; Indore
Salary: Not provided
Company: Prakash Software Solutions Pvt. Ltd
Expiration Date: Until further notice
Requirements
  • 8-10 years of experience in software engineering with a strong focus on AI/ML
  • Proficiency in frontend frameworks like React, Angular, or Vue.js
  • Strong hands-on experience with backend technologies like Node.js, Python (with frameworks like Flask, Django, or FastAPI), or Java
  • Experience with cloud platforms such as AWS, Azure, or GCP
  • Proven ability to design and implement complex, scalable, and maintainable architectures
  • Excellent problem-solving and analytical skills
  • Strong communication and collaboration skills
  • Passion for continuous learning and staying up to date with the latest advancements in AI/ML
  • End-to-end experience with at least one full AI stack on Azure, AWS, or GCP, including components such as Azure Machine Learning, AWS SageMaker, or Google AI Platform
  • Hands-on experience with agent frameworks such as AutoGen, AWS Agent Framework, and LangGraph
Job Responsibility
  • Collaborate with the Principal Architect to design and implement AI agents and multi-agent frameworks
  • Develop and maintain robust, scalable, and maintainable microservices architectures
  • Ensure seamless integration of AI agents and MCP servers with core systems and databases
  • Develop APIs and SDKs for internal and external consumption
  • Work closely with data scientists to fine-tune and optimize LLMs for specific tasks and domains
  • Implement ML Ops practices, including CI/CD pipelines, model versioning, and experiment tracking
  • Design and implement comprehensive monitoring and observability solutions to track model performance, identify anomalies, and ensure system stability
  • Utilize containerization technologies such as Docker and Kubernetes for efficient deployment and scaling of applications
  • Leverage cloud platforms such as AWS, Azure, or GCP for infrastructure and services
  • Design and implement data pipelines for efficient data ingestion, transformation, and storage
Employment type: Full-time

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location: United States, Pleasanton, California
Salary: 251000.00 - 314500.00 USD / Year
Company: BlackLine
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • Short-term and long-term incentive programs
  • Robust offering of benefit and wellness plans
Employment type: Full-time

Principal Engineer, Model Dev Platform

As the Principal Engineer for the Model Development Platform at Wayve, you will ...
Location: United States, Sunnyvale
Salary: Not provided
Company: Wayve
Expiration Date: Until further notice
Requirements
  • Technical Leadership at Scale – 10+ years of experience designing and building large-scale distributed systems, ML/AI infrastructure, full-stack web applications, or developer platforms, including at least 3 years as a staff or principal-level engineer
  • Architectural Depth & Breadth – Proven ability to design systems spanning web platforms, ML pipelines, and large-scale compute orchestration (e.g., Spark, Ray, Kubernetes, Airflow, MLflow)
  • Reliability & Performance Mindset – Experience driving platform reliability improvements, defining SLAs/SLOs, and building self-healing and observable systems that operate at “four nines” availability or better
  • Hands-On Systems Design – Deep understanding of distributed computing, workflow orchestration, data modeling, and API design, with the ability to write and review production-quality code
  • Collaborative Influence – Excellent communication and cross-functional collaboration skills; ability to guide engineers, managers, and researchers toward unified technical direction
  • Mentorship & Culture – Demonstrated success in mentoring engineers across levels and cultivating a culture of engineering excellence
  • Education – Bachelor’s degree in Computer Science, Software Engineering, or related field (advanced degree preferred, or equivalent experience)
Job Responsibility
  • Design and evolve the overarching architecture of the model development platform, ensuring system-wide reliability, observability, and scalability
  • Work across disciplines—from front-end web UIs to large-scale distributed training, from Spark-based data pipelines to experiment scheduling algorithms using linear optimization—to unify the platform’s architecture and ensure smooth interoperability between systems
  • Dive deep into the thorniest technical challenges faced by individual subteams, bringing your expertise in distributed systems, large-scale compute, and system design to bear
  • Develop and refine systems that optimize how models are tested—whether in simulation or on-road—balancing constraints like hardware availability, safety requirements, and research priorities
  • Architect data processing pipelines capable of ingesting, transforming, and enriching petabytes of sensor data from the global fleet
  • Serve as a mentor and coach for engineers across the organization—developing technical talent, improving design practices, and fostering a culture of learning and technical excellence
  • Partner with Product Management, Research, and Operations to align technical architecture with user needs and product vision

Principal Software Engineer

Do you want to build AI-powered developer services that enable a billion builder...
Location: India, Hyderabad
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, or equivalent practical experience
  • 10+ years of industry experience building and shipping software using modern programming languages such as C#, C++, Java, Go or Python
  • Track record of successfully leading end-to-end engineering projects from conception to delivery across multiple ship cycles
  • Excellent technical design, problem solving and debugging skills
  • Ability to learn new technologies quickly and adapt to deliver customer and business impact
  • Customer obsession and passion for shipping high quality products
  • Track record of collaborating effectively with multiple cross-functional teams across geographies
Job Responsibility
  • Design, implement, test, instrument, document and run PaaS Services in Azure
  • Partner with product management, OSS community, ISV partners, customers, and other stakeholders to define requirements, scope projects and ship products in rapid, iterative cycles
  • Stay up to date on industry trends around AI Advancements, Cloud Native technologies, open source development and dev ops processes, leading efforts on innovation, modern design, and reliability engineering
  • Champion engineering practices of safe and fast paced releases – e.g. flight code changes and drive telemetry and analytics to take a data-driven approach to understanding customer impact
  • Support and influence a team culture of customer obsession, continuous improvement, reflection, and growth: mentor others, and initiate and participate in design and code sharing
  • Be an avid customer advocate – meet with customers and product support to learn about their experience, analyze how features are performing in production, and make the product better
  • Build for security, privacy, scalability, reliability, and compliance
Employment type: Full-time