Principal AI Ops Architect

Scale

Location:
Qatar; United Kingdom, Doha

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Scale’s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of creating custom AI applications that will impact millions of citizens, generating high-quality training data for national LLMs, and providing upskilling and advisory services to spread the impact of AI. As a Principal AI Ops Architect, you will design and develop the production lifecycle of full-stack AI applications while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for our international government partners.

Job Responsibility:

  • Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies
  • Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment
  • Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability
  • Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks
  • Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again
  • Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials
  • Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases

Requirements:

  • 6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector
  • Familiarity with international government security standards and the complexities of deploying sovereign AI
  • Proven experience maintaining production-grade applications with a deep understanding of the full request lifecycle, connecting frontend/API layers to the backend and AI core
  • Proficiency in coding and modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools
  • Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them
  • Reliability: You understand that in the public sector, a model failure may be a risk to public safety or privacy
  • Customer communication: The ability to explain to a high-ranking official why the performance of the system has degraded and how we are fixing it

Additional Information:

Job Posted:
March 20, 2026

Similar Jobs for Principal AI Ops Architect

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Senior Principal Technical Program Manager - ML Platform

Location:
Salary:
231300.00 - 301975.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience on software teams as Development Manager, Technical Product Manager or TPM leading technical platforms areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Partner with Engineering in defining direction, strategy and execution at Platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of systems involved. Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility:
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Business Systems Architect Principal

The Business Systems Architect Principal is central to BT International’s transf...
Location:
Hungary, Budapest
Salary:
Not provided
Plusnet
Expiration Date:
Until further notice
Requirements:
  • Strategic Architecture Leadership – Proven ability to define and communicate architectural vision for complex business systems landscapes, with track record driving large-scale transformation programmes in regulated telco environments
  • Business Systems Domain Expertise – Deep understanding of BAU systems (sales, service, enterprise applications) and service operations platforms (service desk, AI Ops, lead-to-cash, billing, inventory) with knowledge of how these capabilities support business operations and cost optimisation
  • Systems Integration – Extensive experience linking together various IT systems, services and software across BAU applications, service operations platforms and NaaS capabilities to enable functional operation and support business processes
  • Vendor and Stakeholder Management – Strong ability to work with SaaS platforms and enterprise vendors as strategic partners, negotiate technical implementations and manage complex stakeholder relationships across business and operational teams
  • Modernisation Patterns – Expert knowledge of incremental modernisation approaches including service operations automation, AI-driven process optimisation and cloud-native patterns that balance operational continuity with transformation
  • Technical Depth – Hands-on background in business systems integration and service operations with coding capability, enabling credibility with engineering teams and active participation in technical spikes when needed
  • Operational Excellence – Understanding of comprehensive observability, service resilience and operational automation approaches including metrics, logging, distributed tracing and telemetry pipelines for business systems
  • NaaS Integration Understanding – Knowledge of how BAU systems integrate with network-as-a-service models, enabling sales and service operations to leverage aaS capabilities whilst maintaining operational stability
  • Leadership and Influence – Ability to lead blended IT teams (support, maintenance, BAU systems, service operations) through transformation, build consensus across organisational boundaries and develop technical leadership capability in operational functions
  • Extensive experience leading IT systems and BSS architecture in telecommunications or complex B2B environments, with demonstrated success modernizing legacy landscapes across multiple system domains
Job Responsibility:
  • Define and lead the architectural strategy for business systems across BAU applications (sales, service, enterprise) and service operations (AI Ops, Service Desk, L2C including Pricing/Design/Quoting/SRM, billing, inventory), establishing target state architecture that optimises legacy systems whilst designing future-ready capabilities
  • Own the business systems portfolio optimisation roadmap, making strategic keep/modernise/retire decisions for BAU systems and service operations platforms based on technical fitness, cost-effectiveness and alignment with asset-light, NaaS-based operating model
  • Establish systems integration approaches that link together BAU applications, service operations platforms and NaaS capabilities, enabling functional operation across sales, service and enterprise systems whilst supporting business processes
  • Lead vendor strategy for business systems platforms including SaaS applications, enterprise systems and service operations tools, negotiating strategic partnerships that simplify IT landscape whilst aligning with aaS business needs and reducing vendor dependencies
  • Champion modern architecture patterns including service operations automation, AI-driven process optimisation, billing automation, inventory accuracy improvement and self-service capabilities that reduce manual propensity and improve cost-to-serve metrics
  • Design for operational excellence by establishing comprehensive observability across business systems including metrics, logging, distributed tracing and telemetry that enable proactive issue detection and support continuous improvement in service delivery
  • Collaborate with Data and AI architects to leverage data platforms and AI capabilities for business intelligence, service automation and customer insights, ensuring business systems generate valuable data and support AI-driven process improvements
  • Drive architectural governance through design reviews and architecture conformance processes, ensuring business systems initiatives align with enterprise standards, security requirements and support transformation to asset-light operating model
  • Build and mentor Business Systems architects who work with BAU operations and service delivery teams, establishing technical leadership capability and fostering architectural thinking across support, maintenance and enterprise systems functions
  • Work with engineering leadership to establish integration patterns that connect business systems with NaaS capabilities, enabling sales and service operations to leverage network-as-a-service whilst maintaining operational stability and cost-effectiveness
What we offer:
  • Cafeteria package - HUF 600,000/ year
  • Performance-based bonus
  • Comprehensive private health care package for all the employees, which can be extended to family members
  • Nursery support for mothers returning from maternity leave
  • Extended paternity leave: 10+10 fully paid days
  • Commuting allowance
  • Home office allowance
  • Employee discount opportunities
  • Highly affordable mobile packages for the family as well
  • New high-class offices both in Budapest and Debrecen
  • Fulltime

Full Stack AI Engineer

We are seeking a highly skilled Full Stack AI Engineer to join our team and work...
Location:
India, Vadodara; Ahmedabad; Indore
Salary:
Not provided
Prakash Software Solutions Pvt. Ltd
Expiration Date:
Until further notice
Requirements:
  • 8-10 years of experience in software engineering with a strong focus on AI/ML
  • Proficiency in frontend frameworks like React, Angular, or Vue.js
  • Strong hands-on experience with backend technologies like Node.js, Python (with frameworks like Flask, Django, or FastAPI), or Java
  • Experience with cloud platforms such as AWS, Azure, or GCP
  • Proven ability to design and implement complex, scalable, and maintainable architectures
  • Excellent problem-solving and analytical skills
  • Strong communication and collaboration skills
  • Passion for continuous learning and staying up to date with the latest advancements in AI/ML
  • End-to-end experience with at least one full AI stack on Azure, AWS, or GCP, including components such as Azure Machine Learning, AWS SageMaker, or Google AI Platform
  • Hands-on experience with agent frameworks such as Autogen, AWS Agent Framework, LangGraph, etc.
Job Responsibility:
  • Collaborate with the Principal Architect to design and implement AI agents and multi-agent frameworks
  • Develop and maintain robust, scalable, and maintainable microservices architectures
  • Ensure seamless integration of AI agents and MCP servers with core systems and databases
  • Develop APIs and SDKs for internal and external consumption
  • Work closely with data scientists to fine-tune and optimize LLMs for specific tasks and domains
  • Implement ML Ops practices, including CI/CD pipelines, model versioning, and experiment tracking
  • Design and implement comprehensive monitoring and observability solutions to track model performance, identify anomalies, and ensure system stability
  • Utilize containerization technologies such as Docker and Kubernetes for efficient deployment and scaling of applications
  • Leverage cloud platforms such as AWS, Azure, or GCP for infrastructure and services
  • Design and implement data pipelines for efficient data ingestion, transformation, and storage
  • Fulltime

Principal ML Ops Engineer

Principal ML Ops Engineer who will lead the design and operationalization of ML ...
Location:
United States, Charlotte; Phoenix; Johnston; Westwood; Iselin; Boston
Salary:
175000.00 - 230000.00 USD / Year
Citizens Bank
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience with Python for scripting ML workflows
  • 5+ years of experience deploying ML pipelines and systems using AWS SageMaker
  • 3+ years of experience developing APIs with Flask, Django, or FastAPI
  • 2+ years of experience with ML frameworks and tools such as scikit-learn, PyTorch, XGBoost, LightGBM, MLflow
  • Solid understanding of the ML lifecycle: model development, training, validation, deployment, and monitoring
  • Solid understanding of CI/CD pipelines for ML workflows using Bitbucket, Jenkins, Nexus
  • Experience with ETL processes for ML pipelines using Spark and Kafka
  • Bachelor’s Degree or equivalent combination of education, training, and experience required
Job Responsibility:
  • Lead and mentor engineering teams, including GCC talent development and potential onshore leadership
  • Architect, design, and build ML engineering systems on the CFG ML Platform to accelerate ML pipeline delivery
  • Develop and enhance platform capabilities and frameworks to standardize and automate ML pipeline deployment
  • Implement capabilities such as feature stores, feature tracking, feature serving (real-time and batch), model performance monitoring, model lineage tracking, model health, and model serving and consumption (real-time, batch, event-triggered, near real-time using Kafka)
  • Define processes, research market trends, and implement best practices for ML pipeline development and deployment
  • Collaborate with business teams, data science teams, enterprise architects, and security to uphold ML engineering standards
  • Develop CI/CD pipelines for continuous integration and delivery of ML models
  • Identify and automate ML pipeline and model deployment patterns to streamline workflows
  • Troubleshoot and resolve issues related to ML system performance and deployment
  • Contribute to GenAI initiatives, including building intelligent agents and integrating them into ML Ops workflows
What we offer:
  • comprehensive medical, dental and vision coverage
  • retirement benefits
  • maternity/paternity leave
  • flexible work arrangements
  • education reimbursement
  • wellness programs
  • competitive pay
  • opportunity to earn an annual discretionary bonus
  • Fulltime

Principal Engineer, Model Dev Platform

As the Principal Engineer for the Model Development Platform at Wayve, you will ...
Location:
United States, Sunnyvale
Salary:
Not provided
Wayve
Expiration Date:
Until further notice
Requirements:
  • Technical Leadership at Scale – 10+ years of experience designing and building large-scale distributed systems, ML/AI infrastructure, full-stack web applications, or developer platforms, including at least 3 years as a staff or principal-level engineer
  • Architectural Depth & Breadth – Proven ability to design systems spanning web platforms, ML pipelines, and large-scale compute orchestration (e.g., Spark, Ray, Kubernetes, Airflow, MLflow)
  • Reliability & Performance Mindset – Experience driving platform reliability improvements, defining SLAs/SLOs, and building self-healing and observable systems that operate at “four nines” availability or better
  • Hands-On Systems Design – Deep understanding of distributed computing, workflow orchestration, data modeling, and API design, with the ability to write and review production-quality code
  • Collaborative Influence – Excellent communication and cross-functional collaboration skills; ability to guide engineers, managers, and researchers toward unified technical direction
  • Mentorship & Culture – Demonstrated success in mentoring engineers across levels and cultivating a culture of engineering excellence
  • Education – Bachelor’s degree in Computer Science, Software Engineering, or related field (advanced degree preferred, or equivalent experience)
Job Responsibility:
  • Design and evolve the overarching architecture of the model development platform, ensuring system-wide reliability, observability, and scalability
  • Work across disciplines—from front-end web UIs to large-scale distributed training, from Spark-based data pipelines to experiment scheduling algorithms using linear optimization—to unify the platform’s architecture and ensure smooth interoperability between systems
  • Dive deep into the thorniest technical challenges faced by individual subteams, bringing your expertise in distributed systems, large-scale compute, and system design to bear
  • Develop and refine systems that optimize how models are tested—whether in simulation or on-road—balancing constraints like hardware availability, safety requirements, and research priorities
  • Architect data processing pipelines capable of ingesting, transforming, and enriching petabytes of sensor data from the global fleet
  • Serve as a mentor and coach for engineers across the organization—developing technical talent, improving design practices, and fostering a culture of learning and technical excellence
  • Partner with Product Management, Research, and Operations to align technical architecture with user needs and product vision

Senior Principal, Machine Learning & Artificial Intelligence

Xometry is seeking a Senior Principal, Machine Learning & Artificial Intelligenc...
Location:
United States, North Bethesda
Salary:
150000.00 - 196000.00 USD / Year
Cherry Ventures
Expiration Date:
Until further notice
Requirements:
  • Master’s or PhD in Computer Science, Machine Learning, Applied Mathematics, Electrical Engineering or related field (PhD preferred for deep generative/3D modeling emphasis)
  • 12+ years of professional experience in machine learning, artificial intelligence, or data science roles — with several years in senior or principal capacity leading major programs
  • Demonstrated experience architecting and delivering large scale ML/AI solutions - end-to-end from data ingestion, feature engineering, model training, evaluation, deployment, monitoring & operations
  • Deep expertise in machine learning frameworks (TensorFlow, PyTorch), data engineering, model infrastructure, MLOps, cloud platforms (AWS, GCP, Azure), and scalable production systems
  • Strong exposure to generative AI techniques (large language models, multimodal models, diffusion, GANs) and translating them into business use-cases
  • Excellent cross-functional collaboration skills: you can partner with product, engineering, ops, manufacturing, design, business leadership and translate technical concepts into business language
  • Proven ability to influence without direct authority and drive change across organizations
  • Strong communication and presentation skills; you can articulate technical vision, roadmap, trade-offs and outcomes to senior leadership
  • Track record of identifying and delivering measurable business impact via ML/AI - e.g., revenue growth, cost savings, improved efficiency
Job Responsibility:
  • Serve as the technical leader of multiple large, cross-functional ML/AI solutions with significant, lasting impact across Xometry’s business
  • Define and drive the 18-24-month ML/AI technical roadmap, balancing breakthrough innovation (e.g., generative 3D, foundation models, large-scale vision/3D pipelines) with reliable business value delivery (e.g., quoting accuracy, lead-time reduction, defect detection, cost optimization)
  • Influence partner roadmaps across engineering, product, operations, and business teams: align priorities, advise on resourcing, champion ML/AI best practices
  • Proactively identify and remove roadblocks for teams and projects — whether technical, operational, data-related, or resource constraints
  • Mentorship of individuals and technical teams
  • Act as a trusted SME with strong cross-functional partnerships: your insights and guidance will shape ML/AI infrastructure, data, model, and tooling decisions
  • Play a leadership role in identifying areas of opportunity — e.g., using ML/AI to unlock new revenue streams (e.g., rapid quoting for new manufacturing modalities, generative design for customers), reduce cost (e.g., automated quality inspection), or optimize efficiency (e.g., 3D-geometry classification, defect detection, generating manufacturing ready models)
  • Address problems adjacent to your sphere of immediate influence: proactively tackle challenges outside direct scope and champion holistic solutions
  • Stay ahead of industry developments in ML, AI, generative AI, 2D/3D modeling and manufacturing tech; translate insights into the improvement of internal best practices, tooling, frameworks, model governance, data pipelines, and operationalization
What we offer:
  • annual bonus
  • 401(k) match
  • medical, dental and vision insurance
  • life and disability insurance
  • generous paid time off including vacation, sick leave, floating and fixed holidays, maternity and bonding leave
  • EAP, other wellbeing resources
  • Fulltime

Senior Principal, Machine Learning & Artificial Intelligence

Xometry is seeking a Senior Principal, Machine Learning & Artificial Intelligenc...
Location:
United States, Waltham
Salary:
150000.00 - 196000.00 USD / Year
Cherry Ventures
Expiration Date:
Until further notice
Requirements:
  • Master’s or PhD in Computer Science, Machine Learning, Applied Mathematics, Electrical Engineering or related field (PhD preferred for deep generative/3D modeling emphasis)
  • 12+ years of professional experience in machine learning, artificial intelligence, or data science roles — with several years in senior or principal capacity leading major programs
  • Demonstrated experience architecting and delivering large scale ML/AI solutions - end-to-end from data ingestion, feature engineering, model training, evaluation, deployment, monitoring & operations
  • Deep expertise in machine learning frameworks (TensorFlow, PyTorch), data engineering, model infrastructure, MLOps, cloud platforms (AWS, GCP, Azure), and scalable production systems
  • Experience in 3D modeling / geometry / computer vision / generative models (e.g., point-cloud processing, mesh processing, text-to-3D, image-to-3D, CAD/CAM integration) is highly desirable
  • Strong exposure to generative AI techniques (large language models, multimodal models, diffusion, GANs) and translating them into business use-cases
  • Excellent cross-functional collaboration skills: you can partner with product, engineering, ops, manufacturing, design, business leadership and translate technical concepts into business language
  • Proven ability to influence without direct authority and drive change across organizations
  • Strong communication and presentation skills; you can articulate technical vision, roadmap, trade-offs and outcomes to senior leadership
Job Responsibility:
  • Serve as the technical leader of multiple large, cross-functional ML/AI solutions with significant, lasting impact across Xometry’s business
  • Define and drive the 18-24-month ML/AI technical roadmap, balancing breakthrough innovation (e.g., generative 3D, foundation models, large-scale vision/3D pipelines) with reliable business value delivery (e.g., quoting accuracy, lead-time reduction, defect detection, cost optimization)
  • Influence partner roadmaps across engineering, product, operations, and business teams: align priorities, advise on resourcing, champion ML/AI best practices
  • Proactively identify and remove roadblocks for teams and projects — whether technical, operational, data-related, or resource constraints
  • Mentorship of individuals and technical teams
  • Act as a trusted SME with strong cross-functional partnerships: your insights and guidance will shape ML/AI infrastructure, data, model, and tooling decisions
  • Play a leadership role in identifying areas of opportunity — e.g., using ML/AI to unlock new revenue streams (e.g., rapid quoting for new manufacturing modalities, generative design for customers), reduce cost (e.g., automated quality inspection), or optimize efficiency (e.g., 3D-geometry classification, defect detection, generating manufacturing ready models)
  • Address problems adjacent to your sphere of immediate influence: proactively tackle challenges outside direct scope and champion holistic solutions
  • Stay ahead of industry developments in ML, AI, generative AI, 2D/3D modeling and manufacturing tech; translate insights into the improvement of internal best practices, tooling, frameworks, model governance, data pipelines, and operationalization
What we offer:
  • 401(k) match
  • medical, dental and vision insurance
  • life and disability insurance
  • generous paid time off including vacation, sick leave, floating and fixed holidays, maternity and bonding leave
  • EAP, other wellbeing resources
  • Fulltime