AI Operations Engineer II

Microsoft Corporation

Location:
United States, Multiple Locations

Contract Type:
Not provided

Salary:

100600.00 - 199000.00 USD / Year

Job Description:

The Security AI Platform team builds and operates production infrastructure that powers AI-native security capabilities at Microsoft scale. The AI Operations group is responsible for deployments, CI/CD pipelines, production reliability, and first-level on-call. We work closely with the Platform + Apps team, which develops the core product features. We are seeking an AI Operations Engineer II to build and maintain operational infrastructure for the platform. In this role, you will work on CI/CD pipelines, Kubernetes deployments, monitoring systems, and incident response. You will develop expertise in production operations while contributing to reliability improvements with guidance from senior engineers.

Job Responsibility:

  • Maintain and extend CI/CD pipelines within established patterns: Azure DevOps/GitHub Actions workflows, build automation, test integration, and deployment scripts
  • Manage Kubernetes deployments: Helm chart updates, deployment execution, pod troubleshooting, and resource configuration
  • Develop and maintain observability: Prometheus metrics collection, Grafana dashboard creation, alerting rules, and log queries (Kusto/KQL)
  • Participate in on-call rotation: respond to alerts, triage incidents, escalate appropriately, and document findings
  • Debug and diagnose issues: analyze logs, traces, and metrics to identify problems, and work with senior engineers on complex issues
  • Maintain Infrastructure as Code: update Bicep templates, Helm values, and environment configurations
  • Ensure branch health: monitor PR pipelines, address build failures, maintain security scanning, and enforce merge policies
  • Execute production deployments: run canary rollouts, monitor deployment health, and perform rollbacks when needed
  • Write and maintain runbooks: document operational procedures, troubleshooting guides, and deployment checklists
  • Collaborate with Platform team: understand service requirements, validate deployment readiness, and provide operational feedback
  • Participate in post-incident reviews: contribute to root cause analysis and implement reliability improvements

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft background and Microsoft Cloud background check upon hire/transfer and every two years thereafter

Nice to have:

  • Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 2+ years technical engineering experience in DevOps, SRE, or platform operations
  • 2+ years experience with CI/CD pipelines
  • 1+ years hands-on experience with Kubernetes
  • Experience with Azure services: AKS, Event Hub, Storage, Key Vault, or Managed Identity
  • Infrastructure as Code experience: Bicep, Terraform, or Helm chart development
  • Scripting proficiency: PowerShell, Bash, or Python for automation
  • Experience with log analysis and querying: KQL/Kusto, Loki, or ELK stack
  • Exposure to incident response and on-call responsibilities
  • Understanding of distributed systems concepts and troubleshooting approaches
  • Experience with container technologies: Docker, container registries, and image management
  • Familiarity with ML infrastructure: GPU nodes, inference servers, or model deployment

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for AI Operations Engineer II

Systems Engineer II (Automation/AI)

PagerDuty is seeking a Systems Engineer II (Automation/AI) to join our diverse, ...
Location:
Canada, Toronto
Salary:
83000.00 - 125000.00 CAD / Year
PagerDuty
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in an IT support environment
  • 3-5+ years of experience in a systems administration/engineering role with a focus on automation
  • Solid understanding of Agentic and Generative AI concepts and platforms
  • Experience working with IAM tools (Entra, AzureAD, Okta) and processes (account lifecycle, permissions, privileged account management)
  • Experience working in a cloud environment (Azure, GCP, and/or AWS)
  • Provide support to employees across the globe using ITSM solutions such as Jira as Tier 2 support
  • Experience with Workato or similar automation tools (Zapier, etc)
  • Advanced Automation skills: Implementing more complex workflows and approval processes
  • Proficiency in Windows and MacOS operating systems
  • Experience with configuration management tools (e.g., Intune, JAMF)
Job Responsibility:
  • Develop and maintain processes and automations that streamline business/productivity tasks for PagerDuty employees
  • Administrate and iterate on our GenAI/AgenticAI programs
  • Look for areas of improvement to accelerate AI adoption throughout the company
  • Work with other IT members to advance our IAM programs, applying a solid understanding of authentication protocols and processes (SCIM/SAML/etc.)
  • Provide technical support and troubleshooting for desktop-related issues, ensuring minimal downtime and disruption for employees
  • Work closely with other IT team members, including network and security engineers, to ensure seamless integration and operation of desktop environments
  • Maintain comprehensive documentation of desktop configurations, procedures, and troubleshooting steps
  • Assist in the creation and delivery of training materials to help employees effectively use their laptops and related software
What we offer:
  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime

IT Engineer II

Affirm is reinventing credit to make it more honest and friendly, giving consume...
Location:
United States
Salary:
102000.00 - 155000.00 USD / Year
Affirm
Expiration Date:
Until further notice
Requirements:
  • 3–5 years of experience in IT Engineering, Systems Administration, or Solutions Engineering within a SaaS or enterprise environment
  • Deep understanding of modern SaaS ecosystems, SSO/SCIM integrations, and identity/access management frameworks (e.g., Okta, Azure AD)
  • Hands-on experience coding and automating with APIs and scripting languages (Python, JavaScript/TypeScript)
  • Strong communication and stakeholder management skills—capable of bridging business goals with technical implementation
  • Demonstrated ability to manage multiple priorities and deliver measurable results in a dynamic, fast-paced environment
Job Responsibility:
  • Technical Strategy & Systems Design: Partner with business and technical stakeholders to translate requirements into scalable, secure, and user-centric solutions. Design architectures that integrate SaaS platforms, identity systems, and automations across the enterprise
  • Automation & Integration Development: Build and maintain automations using REST APIs, scripting languages (Python, JavaScript/TypeScript), and workflow platforms (e.g., Okta Workflows, Workato, Zapier). Streamline repetitive processes and strengthen cross-system interoperability
  • AI & Emerging Technology Enablement: Leverage AI-assisted tools to accelerate development, incident response, and documentation. Experiment with generative AI and workflow intelligence to improve IT operations and knowledge management
  • Operational Reliability & Security: Enhance the reliability and resilience of critical IT services. Implement monitoring, alerting, and governance practices that protect data and maintain compliance with security standards
  • Incident Resolution & Root Cause Analysis: Troubleshoot technical issues efficiently, conduct root cause analyses, and deliver durable long-term fixes that reduce operational noise and support business continuity
  • Knowledge Sharing & Collaboration: Create and maintain technical documentation, playbooks, and process diagrams. Mentor teammates and contribute to a culture of transparency, automation, and continuous improvement
What we offer:
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Fulltime

Engineering Manager II, Java Backend - Marketing Org

Groupon is on a radical journey to transform our business with relentless pursui...
Location:
India
Salary:
Not provided
Groupon
Expiration Date:
Until further notice
Requirements:
  • BE/BTech or ME/MTech degree in Computer Science, Software Engineering from a recognised institute
  • 10+ years of overall industry experience, including 4 years in a leadership (hands-on engineering management) role developing and supporting web applications and microservices in a distributed environment
  • Experience with managing platform services, delivering projects in a dynamic environment
  • Experience with managing high-performance individuals in different time zones
  • Good communication and collaboration skills, and the ability to deal with ambiguity
  • Strong programming experience in Java, and competence in designing and building enterprise-scale applications, common frameworks, etc.
  • Experience building and scaling large-scale distributed systems and development experience with service-oriented architectures/microservices
  • Prior AWS/GCP/Azure/Pivotal Cloud Foundry experience is a must
  • Proven success managing and scaling platform services in fast-moving environments
  • Experience leading cross-regional teams and driving platform-level strategies
Job Responsibility:
  • Customer Communication Excellence: Drive reliability, performance, and personalization across our Dispatch and Subscription systems
  • Achieve a 99.99% delivery success rate and under 1s latency for real-time communications
  • Subscription Platform Growth: Lead engineering teams focused on the evolution and scalability of our subscription systems
  • Enable 2x growth in subscriber engagement and conversion over the next 12 months
  • Platform Scalability & Operational Efficiency: Reduce technical debt and optimize resource usage across core services by modernizing legacy infrastructure and adopting cloud-native patterns
  • Achieve a 30% reduction in platform operational costs while improving performance benchmarks
  • Engineering Team Performance & Growth: Build high-performing, autonomous, and globally distributed teams that execute with agility and ownership
  • Maintain 90%+ team engagement and less than 10% regrettable attrition, with measurable velocity improvements
  • AI-First Transformation Enablement: Champion the integration of AI/ML where it enhances business outcomes, from system automation to intelligent routing and customer experience
  • Deliver at least 2 AI-driven features or systems into production per year

Senior Risk Consultant II - Liability

At Allianz Commercial (AzC), we are the global leader for insuring corporate and...
Location:
United States, Remote
Salary:
110000.00 - 130000.00 USD / Year
Allianz
Expiration Date:
Until further notice
Requirements:
  • Minimum 7 years’ professional engineering or consulting experience in industrial or project development environments within Manufacturing, Automotive, or Food industries
  • Through education or experience have demonstrable knowledge and skills in relevant disciplines including engineering in environmental management, hydrogeology, environmental compliance, health and safety systems, waste management, chemical analysis, and occupational hygiene
  • Experience working in risk management and loss prevention processes in industrial operations or in project driven environments
  • Knowledge of Standardization Systems (ISO, API, ASME, etc) and auditing processes
  • Basic data analytic experience in analyzing datasets related to loss prevention in the corresponding engineering discipline
  • Master’s Degree in either Mechanical, Industrial, Electrical, Chemical, Materials, Manufacturing, Systems, Automation and Robotics, Quality, or Process Engineering (other engineering disciplines will be considered if they align with the rest of the requirements)
  • Or Bachelor’s degree or equivalent in other area will be considered in conjunction with extensive technical experience
  • A demonstrable engineering/technical expertise
  • Commitment to maintaining knowledge of developments and new technologies in field
  • Knowledge of the business-related liability legal environment, regulations, and guidelines as they apply to North American (USA and Canada) businesses and projects
Job Responsibility:
  • Complete desktop reviews and risk assessments (pre & post loss) for underwriting, evaluating and summarizing exposures and controls on General and Product Liability
  • Work with underwriting, claims, the client and the broker where appropriate to develop customized loss control programs including locations to be surveyed (or included as part of continuing service), scope and cost of risk consulting services
  • Generate and review required reports and recommendations
  • Meet reporting deadlines in accordance with departmental performance objectives
  • Support Allianz Commercial functions (Underwriting, Claims, Marketing, etc.) and clients by providing as required scientific and technical insights in current and emerging risks about various industries, products, and technical applications
  • Survey client operations, obtain appropriate data, identify hazards, and discuss findings with appropriate parties
  • Generate loss control reports, recommendations, etc.
  • Actively contribute to ARC Liability projects that support Allianz Commercial transformation and digitalization
  • Contribute to the development and implementation of all available digital tools, including AI and data analytics, to enhance risk assessments and improve efficiency
What we offer:
  • Hybrid working model
  • Great compensation and benefits package
  • Generous bonus scheme and pension
  • Career development and digital learning programs
  • International career mobility
  • Support for flexible working, health, and wellbeing
  • Private healthcare
  • Generous parental leave benefits
  • Opportunities to be engaged in shaping a future that is safe, inclusive, and sustainable
  • Fulltime

Senior Risk Consultant II - Marine

At Allianz Commercial (AzC), we are the global leader for insuring corporate and...
Location:
Canada, Toronto
Salary:
113495.00 - 176673.00 USD / Year
Allianz
Expiration Date:
Until further notice
Requirements:
  • A minimum 7 years’ loss prevention engineering experience in an area relevant to his/her LoB in the insurance industry
  • Good knowledge and professional experience in business sectors
  • Insurance risk consulting experience and a professional engineering license is a definite plus
  • Experience working in a Global Organization with the ability to adapt to change
  • A bachelor’s Degree in Engineering discipline (preferably in a line associated with his/her line of business)
  • Other discipline will be considered in conjunction with extensive technical experience
  • A demonstrable engineering/technical expertise
  • Commitment to maintaining knowledge of developments and new technologies in field
  • Able to work on technical loss control topics
  • Knowledge of specific business sectors
Job Responsibility:
  • Complete desktop reviews and risk assessments (pre & post loss) for underwriting, evaluating and summarizing exposures and controls
  • Work with underwriting, claims, the client and the broker where appropriate to develop customized loss control programs including locations to be surveyed (or included as part of continuing service), scope and cost of risk consulting services
  • Generate and review required reports and recommendations
  • Meet reporting deadlines in accordance with departmental performance objectives
  • Survey client operations, obtain appropriate data, identify hazards, and discuss findings with appropriate parties
  • Generate loss control reports, recommendations, etc.
  • Appoint/monitor 3rd parties ensuring adequate contract/controls in place with budgets agreed up front
  • Check and ensure quality of work provided by 3rd parties
  • Assist AzC and clients with respect to ongoing or specific projects
  • Support technical training presentations for internal and external clients as required/appropriate
What we offer:
  • Hybrid working model
  • Great compensation and benefits package
  • Generous bonus scheme and pension
  • Career development and digital learning programs
  • International career mobility
  • Support for flexible working, health, and wellbeing
  • Private healthcare
  • Generous parental leave benefits
  • Opportunities to be engaged in shaping a future that is safe, inclusive, and sustainable
  • Fulltime

AI Engineer II

Guidepoint is seeking an AI Engineer II to join our Toronto-based AI team. The T...
Location:
Canada, Toronto
Salary:
Not provided
Modoras Accounting Syd
Expiration Date:
Until further notice
Requirements:
  • 3–5 years of professional experience designing, building, and operating production-grade backend systems
  • 2+ years of hands-on experience building and operating Generative AI and agent systems in production
  • Strong Python engineering skills, with experience building scalable REST APIs using frameworks such as FastAPI
  • Working knowledge of JavaScript and Node.js
  • Hands-on experience building and maintaining agent-based AI systems in production using frameworks such as LangChain or LangGraph
  • Experience working with large language models from providers such as OpenAI, Anthropic, or Google Gemini, including prompt engineering and tool integration
  • Practical experience building and operating RAG systems using embeddings and retrieval systems such as Elasticsearch
  • Experience evaluating AI or agent systems beyond manual testing, including automated or programmatic evaluation using tools such as MLflow
  • Familiarity with asynchronous processing and workers using technologies such as RabbitMQ or Redis
  • Experience with monitoring, alerting, deployment, and CI/CD pipelines in cloud-native environments using Kubernetes
Job Responsibility:
  • Design and implement AI systems and agents that automate compliance and editorial workflows in support of research enablement
  • Design and implement agent workflows using frameworks such as LangGraph and LangChain
  • Build agents that perform intent interpretation, task decomposition, tool use, web search, and human-in-the-loop escalation
  • Build and operate production-grade REST APIs to serve AI and agent capabilities
  • Develop and maintain scalable data pipelines and background workers that support AI workloads and agent execution
  • Design and maintain retrieval-augmented generation (RAG) pipelines using embeddings, Elasticsearch, structured data, and web-based sources
  • Develop and maintain automated evaluation pipelines for agent behavior and outputs using MLflow and related tooling
  • Improve agent reliability, latency, and cost through prompt engineering, prompt management techniques, and workflow optimization
  • Integrate agents with internal services, APIs, data stores, and asynchronous workers using queues such as RabbitMQ or Redis
  • Monitor and operate agent systems using observability platforms such as Datadog, including alerting and incident response
What we offer:
  • Paid Time Off
  • Comprehensive benefits plan
  • Company RRSP Match
  • Development opportunities through the LinkedIn Learning platform
  • Fulltime

AI Engineer II

In the 1980s, Microsoft Office revolutionized global industries and empowered th...
Location:
United States
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or related technical field
  • 2+ years technical engineering experience, including some production experience building applications with LLMs and/or deploying ML/LLM/agent systems
  • Coding in languages including, but not limited to, Python, TypeScript/JavaScript, C, C++, or C#
Job Responsibility:
  • Prototype and iterate on AI agent experiences that reimagine Office workflows, rapidly validating feasibility, UX, and technical approaches while pushing the frontier of agent-driven productivity
  • Ship features by operating at the true intersection of AS and SWE, where a single AI engineer can unlock disproportionate impact by translating model capability and product intent into intelligent, production-ready systems
  • Design and develop tools and evaluation frameworks to observe and monitor real-time agent health, investigate customer feedback, and build continuous learning systems for end-to-end AI workflows
  • Invent and lead industry-defining multimodal AI experiences across Microsoft 365 by exploring new interaction paradigms and system architectures that unlock and define a natural user interface between humans and AI
  • Learn continuously and stay current with advances in generative AI and software engineering. Invest time in learning new tools and frameworks, propose improvements to build processes, and share knowledge with colleagues
  • Embody Microsoft culture and values
  • Fulltime