Dialpad's AI Engineering organization builds and maintains customer-facing AI features across all of our cloud-native products and services. Within it, the Agentic Runtime team owns the infrastructure and execution engine that runs AI agents at scale across Dialpad's core product modalities, including voice, messaging, video, and digital engagement.
Job Responsibilities:
Contribute to the design, development, and maintenance of agentic runtime systems, including agent orchestration, tool execution pipelines, and multi-step reasoning loops
Build and optimize core runtime components, including task planners, action dispatchers, memory managers, and context window management systems
Work on agent coordination techniques, including dynamic tool selection, parallel agent execution, state management, and result aggregation across multi-agent workflows
Maintain and enhance highly scalable agentic platforms with a focus on low-latency execution, cost efficiency, and deterministic behavior
Ensure high availability, reliability, and fault tolerance in agent runtime services, including graceful degradation when LLM or tool calls fail
Collaborate with cross-functional teams — including ML researchers, product, and platform engineers — to translate agentic product requirements into robust runtime infrastructure
Develop and optimize real-time distributed systems, microservices, and event-driven architectures powering agentic task execution
Design and implement sandboxed execution environments for safe agent use of tools, code execution, and external API calls
Implement and maintain monitoring, alerting, and performance metrics covering agent run success rates, token consumption, latency, and cost attribution
Evaluate and integrate emerging agentic frameworks, LLM APIs, and tooling ecosystems to continuously improve platform capabilities
Write clean, modular, and well-tested code while following best engineering practices in a rapidly evolving problem space
Participate in code reviews to ensure the quality, maintainability, and scalability of runtime components
Provide mentorship and technical guidance to junior engineers navigating the unique challenges of agentic systems
Requirements:
3–6 years of experience in distributed systems, platform engineering, or ML infrastructure, with exposure to LLM-based or agentic systems strongly preferred
Strong understanding of agent architectures, including ReAct, plan-and-execute, and multi-agent coordination patterns
Deep knowledge of context management, prompt lifecycle, tool-call protocols (e.g., function calling, MCP), and agent memory strategies (short-term, episodic, and long-term)
Experience integrating and managing external tool ecosystems, including web search, code interpreters, databases, and third-party APIs
Familiarity with retrieval-augmented generation (RAG) and how retrieval fits into broader agentic pipelines
Understanding of LLM output reliability challenges — hallucination, non-determinism, and retry/fallback strategies at runtime
Proficiency in Go and Python 3 (experience with Rust or TypeScript is a plus)
Strong understanding of distributed systems, microservices, and event-driven architectures suited to long-running agent tasks
Passion for real-time performance optimization, including streaming responses, async execution, and parallel tool invocation
Experience with API design using OpenAPI, Swagger, or equivalent, with an eye toward agentic interaction patterns
Knowledge of gRPC or equivalent RPC protocols for inter-service communication within agent runtimes
Experience with Docker and Kubernetes, including managing long-running or stateful agent workloads in containerized environments
Familiarity with cloud platforms (GCP preferred, AWS/Azure optional), including managed services relevant to agentic workloads such as queuing, secrets management, and compute autoscaling
Hands-on experience with Infrastructure as Code tools like Terraform or Ansible
Knowledge of CI/CD frameworks and continuous delivery practices, with comfort shipping infrastructure in a fast-moving research-adjacent environment
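The reliability themes above (retries, fallbacks, and graceful degradation when LLM or tool calls fail) can be sketched as follows. This is a minimal illustration only; `ModelCallError`, `call_with_retries`, and the backoff parameters are hypothetical names chosen for the example, not part of Dialpad's actual runtime.

```python
import random
import time

class ModelCallError(Exception):
    """Raised when an LLM or tool call fails."""

def call_with_retries(call, max_attempts=3, base_delay=0.5, fallback=None):
    """Retry a flaky LLM/tool call with exponential backoff and jitter.

    Returns the call's result, or `fallback` once all attempts are
    exhausted (graceful degradation instead of surfacing an error).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ModelCallError:
            if attempt == max_attempts:
                return fallback
            # Exponential backoff with jitter to avoid retry storms.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))

# Usage: a simulated call that fails twice, then succeeds on the third try.
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ModelCallError("upstream timeout")
    return "ok"

result = call_with_retries(flaky_call, max_attempts=3, base_delay=0.01)
```

In a production agent runtime, the fallback path would typically return a degraded response or reroute to an alternate model rather than a static value.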