AI Systems Engineer

Canada, Kitchener · Employment contract · 132000.00 - 156750.00 CAD / Year · Job Posted May 04, 2026

Job Description

We are hiring AI Systems Engineers to help build that machinery. This role is for engineers who like consequential junctions: between training outputs and deployable artifacts, between runtime systems and safe release, between quality claims and evidence, and between ambitious AI plans and systems that can actually carry them.

This is not a research role, and it is not a generic support role. It is an implementation-heavy, building-focused engineering role on a small team responsible for making in-house AI capabilities easier to package, evaluate, deploy, promote, operate, and improve.

AI Platform Engineering exists to shorten the path from emerging AI capability to reliable production impact. We build the shared systems, standards, and delivery pathways that let in-house models and AI capability packages move from candidate state into observable, rollback-safe production operation. Our work sits at the junction between model development, runtime systems, evaluation, and delivery. We enable the broader AI Platform division by making it faster and safer to ship new capabilities, improve existing ones, and learn from production behavior.

This is a new team. The systems, interfaces, and standards are still being shaped. The work is highly consequential, highly practical, and closely tied to the company's broader AI strategy. We are not building one-off demos or isolated launches. We are building the machinery by which a growing AI organization can repeatedly deliver real capability into production.

Job Responsibility

  • Help design, build, and improve the systems that connect AI capability development to production reality
  • Improve how model and capability artifacts are packaged, versioned, promoted, and rolled back
  • Build or improve deployment and release pathways for AI-backed services
  • Enable shadow-serving, staged rollout, and candidate-versus-incumbent comparison
  • Strengthen runtime behavior, observability, and debugging for model-backed systems
  • Build or automate evaluation systems that make release decisions evidence-based
  • Reduce bespoke coordination and strengthen the shared rails used by multiple AI teams

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent related experience
  • 2 to 6 years of professional software engineering experience, with a proven track record of shipping production infrastructure or real systems that matter
  • Experience in writing solid, maintainable production code and applying strong software engineering fundamentals to solve complex debugging challenges
  • Experience in operating within ambiguous, cross-functional environments where requirements evolve and interfaces are real
  • Expertise in building for reproducibility, operability, and rollout safety, focusing on the quality of change rather than just local implementation

Nice to have

  • Experience with cloud infrastructure, containerized environments, managed ML platforms, or service orchestration systems
  • Experience with model serving, deployment systems, experiment tracking, artifact/version management, or ML lifecycle tooling
  • Experience with distributed systems, service platforms, search/relevance systems, internal enablement tooling, or production AI platforms
  • Experience with testing, benchmarking, experimentation systems, or evaluation frameworks that informed release decisions
  • Exposure to applied AI, speech, conversational systems, customer-facing workflows, or other production ML domains

What we offer

  • Competitive salary
  • Comprehensive benefits
  • Real opportunities for growth
  • Cutting-edge AI tools
  • Robust training program
  • Inclusive offices
  • Great Place to Work culture

Similar Jobs for AI Systems Engineer

AI Systems Engineer

We are hiring founding AI Systems Engineers to help build that machinery. This r...
Location: Canada, Kitchener
Salary: 111000.00 - 133500.00 CAD / Year
Company: Dialpad
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or equivalent related experience
  • 2 to 6 years of professional software engineering experience, with a proven track record of shipping production infrastructure or real systems that matter
  • Experience in writing solid, maintainable production code and applying strong software engineering fundamentals to solve complex debugging challenges
  • Experience in operating within ambiguous, cross-functional environments where requirements evolve and interfaces are real
  • Expertise in building for reproducibility, operability, and rollout safety, focusing on the quality of change rather than just local implementation
Job Responsibility
  • Help design, build, and improve the systems that connect AI capability development to production reality
  • Improve how model and capability artifacts are packaged, versioned, promoted, and rolled back
  • Build or improve deployment and release pathways for AI-backed services
  • Enable shadow-serving, staged rollout, and candidate-versus-incumbent comparison
  • Strengthen runtime behavior, observability, and debugging for model-backed systems
  • Build or automate evaluation systems that make release decisions evidence-based
  • Reduce bespoke coordination and strengthen the shared rails used by multiple AI teams
What we offer
  • Competitive salary
  • Comprehensive benefits
  • Real opportunities for growth
  • Cutting-edge AI tools
  • Robust training program
  • Inclusive office environment
  • Great Place to Work culture
Employment type: Full-time

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...
Location: Canada, Markham
Salary: 106400.00 - 159600.00 CAD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
  • Bachelor’s, Master’s, or Ph.D. degree in Computer/Software Engineering, Computer Science, or a related technical discipline
Job Responsibility
  • Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
  • Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
  • Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
  • Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
  • Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
  • Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
  • Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
  • Drive end-to-end inference enablement on new AMD GPU silicon: be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
  • Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
  • Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.
Employment type: Full-time

Staff AI Engineer - Agentic AI Systems

As a Staff AI Engineer, you will play a key role in designing and delivering hig...
Location: India, Bengaluru, Karnataka, India | Hyderabad, Telangana, India | Pune, Maharashtra, India
Salary: Not provided
Company: Teradata
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science or equivalent from a recognized institution
  • 8+ years of experience in backend services, distributed systems, or data platform development
  • Strong proficiency in Java, Go, or Python for service development
  • Deep understanding of design principles, distributed system patterns, and service architecture
  • Hands-on experience designing and developing RESTful APIs
  • Experience with SQL and NoSQL databases and data modelling
  • Strong debugging, problem solving, and troubleshooting skills
  • Experience with modern containerization and orchestration tools such as Kubernetes
  • Knowledge of public cloud platforms
  • Experience with AI productivity tools (e.g., GitHub Copilot)
Job Responsibility
  • Design, architect, develop, and maintain high quality systems, services, and applications with an emphasis on scalability, reliability, and performance
  • Collaborate with cross-functional engineers and product partners to shape architecture and consistently deliver end to end features
  • Build and integrate robust RESTful APIs, ensuring security, data consistency, and maintainability
  • Work with SQL and NoSQL databases to implement efficient data models and service access patterns
  • Apply and experiment with AI/ML technologies, including agentic AI and large language models (LLMs)
  • Use AI powered engineering tools to improve development quality, speed, and productivity
  • Mentor engineers, supporting them in technical planning, implementation, and best practices
  • Identify and resolve system performance bottlenecks, optimizing code, architecture, and infrastructure
  • Write unit and integration tests and participate in code reviews to uphold engineering excellence
  • Investigate production issues, ensuring timely and effective solutions
Employment type: Full-time

Full Stack Engineer (AI & Agentic AI Systems)

The Full Stack Engineer (AI & Agentic AI Systems) is a strategic professional wh...
Location: India, Pune; Chennai
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • 2-5 years in an Apps Development role
  • Demonstrated execution capabilities
  • Strong analytical and quantitative skills
  • Data driven and results-oriented
  • Experience in running high traffic, distributed, cloud based services
  • Experience in effecting large culture change
  • Experience leading infrastructure programs
  • Skilled at working with third party service providers
  • Excellent written and oral communication skills
  • Bachelor’s/University degree or equivalent experience
Job Responsibility
  • Design and deliver end‑to‑end solutions spanning architecture, system design, low‑level design, and high‑quality coding across modern full‑stack environments
  • Build responsive, modular UI applications using React, integrating complex AI-driven workflows and real‑time interactions
  • Develop scalable, high‑performance backend services in Java / Python, implementing resilient APIs, event‑driven patterns, and microservices architectures
  • Engineer AI‑powered features leveraging Google Gemini LLM, Vertex AI, ADK, vector databases (A2A), RAG pipelines, MCP, context engineering, and advanced prompt engineering techniques
  • Implement secure, well‑structured REST and GraphQL APIs, ensuring reliability, versioning discipline, and clean integration patterns across platforms
  • Optimize system performance and scalability, applying profiling, load‑testing insights, caching strategies, and distributed system tuning
  • Drive robust CI/CD practices, integrating automated testing, code quality gates, containerization, and cloud‑native deployment pipelines
  • Partner with QE to build and maintain automated test suites (UI, API, integration, and performance), improving release quality and reducing regression risk
  • Identify, diagnose, and remediate performance bottlenecks, penetration testing vulnerabilities, and production issues with precision and root‑cause clarity
  • Collaborate cross‑functionally with AI scientists, architects, and product teams to translate business challenges into production‑ready, intelligent agentic systems
Employment type: Full-time

Full Stack Engineer (AI & Agentic AI Systems)

The Full Stack Engineer (AI & Agentic AI Systems) is a strategic professional wh...
Location: India, Pune; Chennai
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • 8+ years in a product development/product management environment
  • Strong analytical and quantitative skills
  • Data driven and results-oriented
  • Experience delivering with an agile methodology
  • Experience in effecting large culture change
  • Experience leading infrastructure programs
  • Skilled at working with third party service providers
  • Excellent written and oral communication skills
  • Bachelor’s/University degree or equivalent experience
  • Strong expertise in SQL (Oracle, PostgreSQL)
Job Responsibility
  • Design and deliver end‑to‑end solutions spanning architecture, system design, low‑level design, and high‑quality coding across modern full‑stack environments
  • Build responsive, modular UI applications using React, integrating complex AI-driven workflows and real‑time interactions
  • Develop scalable, high‑performance backend services in Java / Python, implementing resilient APIs, event‑driven patterns, and microservices architectures
  • Engineer AI‑powered features leveraging Google Gemini LLM, Vertex AI, ADK, vector databases (A2A), RAG pipelines, MCP, context engineering, and advanced prompt engineering techniques
  • Implement secure, well‑structured REST and GraphQL APIs, ensuring reliability, versioning discipline, and clean integration patterns across platforms
  • Optimize system performance and scalability, applying profiling, load‑testing insights, caching strategies, and distributed system tuning
  • Drive robust CI/CD practices, integrating automated testing, code quality gates, containerization, and cloud‑native deployment pipelines
  • Partner with QE to build and maintain automated test suites (UI, API, integration, and performance), improving release quality and reducing regression risk
  • Identify, diagnose, and remediate performance bottlenecks, penetration testing vulnerabilities, and production issues with precision and root‑cause clarity
  • Collaborate cross‑functionally with AI scientists, architects, and product teams to translate business challenges into production‑ready, intelligent agentic systems
Employment type: Full-time

HPC & AI Systems Engineer for Integrated Systems Test

HPC & AI Systems Engineer for Integrated Systems Test role at Hewlett Packard En...
Location: Puerto Rico, Aguadilla
Salary: Not provided
Company: Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • Bachelor's or master's degree in Computer Engineering, Computer Science, Electrical Engineering, Information Systems, or equivalent
  • Minimum 4 years of experience
  • Experience with certification and submission to OS vendors for Linux (Red Hat, SLES, Ubuntu, etc.), Windows Server, Windows Client, and VMware (ESXi) operating systems
  • Experience installing and working with Linux, Windows, and VMware operating systems
  • Experience with programming or scripting languages and databases (Python, PowerShell, Perl, Linux shell, Java, MySQL, MS SQL Server)
  • Understanding of Redfish commands, RESTful API, and JSON format
  • Knowledge of creating and using Docker containers and VMs
  • Experience in configuring storage (internal/external storage, file systems, and RAID/non-RAID settings) and networking devices (iSCSI, FCoE, IPs, VLANs, bonding, jumbo frames, LAGs)
  • Knowledge of networking concepts such as NIC teaming, VLANs, IPv4, IPv6
  • Excellent written and verbal communication skills in English
Job Responsibility
  • Work with Program & Product Management, technical leads, and product development teams to obtain product feature requirements
  • Design and implement new test features in existing and new test cases
  • Analyze, debug and provide feedback/resolution on issues uncovered by test team prior to submission of results to OS vendors for approval
  • Implement software solutions for multiple test programs/projects with internal and outsourced development partners
  • Review and evaluate the implementation and use of test automation and test tools
  • Plan, develop, and implement software tools for the testing and evaluation of current and next-generation HPE HPC products
  • Debug and analyze issues to a successful resolution
  • Perform testing in local and remote labs
  • Drive appropriate automated test execution to test engineers at various global locations
  • Provide training and guidance to test teams both onshore and offshore
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
Employment type: Full-time

AI / Generative AI Engineer - Agentic Systems & Platforms

We are seeking an AI / Generative AI Engineer to design and build production‑gra...
Location: India, Pune
Salary: Not provided
Company: Vodafone
Expiration Date: Until further notice
Requirements
  • Strong Python and backend engineering expertise
  • Experienced in building and deploying LLM‑based applications, RAG systems, and agentic workflows
  • Comfortable working with orchestration frameworks such as LangChain, LangGraph, or Semantic Kernel
  • Knowledgeable in data engineering concepts, including data pipelines and vector databases
  • Experienced with at least one major cloud platform (Azure, AWS, or GCP)
  • Familiar with containerisation and deployment using Docker and Kubernetes
  • Able to collaborate effectively in cross‑functional teams and communicate technical concepts clearly
  • Committed to secure, ethical, and responsible use of AI technologies
Job Responsibility
  • Build LLM‑based copilots and intelligent assistants for enterprise use cases
  • Design and implement Retrieval‑Augmented Generation (RAG) solutions using structured and unstructured data
  • Develop and orchestrate multi‑agent workflows using modern frameworks and orchestration logic
  • Create and maintain backend APIs using Python frameworks such as FastAPI or Flask
  • Integrate enterprise platforms and external APIs into scalable AI solutions
  • Build data pipelines for ingestion, transformation, and embedding workflows
  • Implement and manage vector databases, ensuring performance and reliability
  • Establish evaluation frameworks to assess LLM output quality and relevance
  • Implement observability practices including tracing, logging, and monitoring
  • Apply software engineering best practices including modular code design, testing, documentation, and CI/CD pipelines
What we offer
  • Opportunities to work on cutting‑edge, enterprise‑scale AI systems
  • Exposure to complex, real‑world use cases with measurable business impact
  • A collaborative environment that supports continuous learning and growth
  • Competitive salary and access to Vodafone’s global career development ecosystem
Employment type: Full-time
