This list contains only the countries for which job offers have been published in the selected language (e.g., in the French version, only job offers written in French are displayed, and in the English version, only those in English).
We are seeking a highly progressive Platform Engineer specializing in AI infrastructure and agentic execution environments to join our core cloud enablement team in Vancouver. In this role, you will bridge the gap between traditional Site Reliability Engineering (SRE) and cutting-edge Agentic AI operations. You will design, build, and operate secure multi-cloud landing zones, developer "golden paths," and reusable automated frameworks that empower application teams to safely deploy AI agents at scale.
Job Responsibility
Build integration patterns, API mediation layers, and approval workflows supporting autonomous AI agent tool execution and runtime function calling
Integrate advanced distributed telemetry for agent runs (execution traces, evaluation metrics, latency logs, and token cost analytics)
Establish runtime safety controls for AI applications, embedding automated rollback scripts, cost control ceilings, and master kill-switches
Build and scale highly secure, automated multi-cloud landing zones (AWS and Azure) utilizing reusable Terraform modules
Construct and maintain robust GitLab CI/CD pipelines, package registries, and automated infrastructure release strategies
Implement strict automated infrastructure guardrails using Open Policy Agent (OPA), Conftest, or Azure Policies to guarantee security without breaking developer velocity
Champion Site Reliability Engineering standards by managing Service Level Objectives (SLOs), calculating error budgets, configuring autoscaling matrices, and leading chaos engineering simulations