Scale’s rapidly growing International Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of creating custom AI applications that will impact millions of citizens, generating high-quality training data for national LLMs, and providing upskilling and advisory services to spread the impact of AI. As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for our international government partners.
Job Responsibilities:
Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies
Ensure full-stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment
Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability
Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks
Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again
Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials
Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases
Requirements:
6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector
Familiarity with international government security standards and the complexities of deploying sovereign AI
Proven experience maintaining production-grade applications, with a deep understanding of the full request lifecycle, connecting frontend/API layers to the backend and AI core
Proficiency in coding and modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools
Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them
Reliability: You understand that in the public sector, a model failure may be a risk to public safety or privacy
Customer communication: You can explain to a high-ranking official why the performance of the system has degraded and how we are fixing it