AI Engineer – Intelligent Operations (Infrastructure) Job at Realign (Toronto)

Software Engineer - AI Infrastructure

We’re looking for a software engineer to join our Infrastructure team—building a...

Location

United States, New York City

Salary:

135000.00 - 280000.00 USD / Year

Assembled

Expiration Date

Until further notice

Requirements

Have 6+ years of engineering experience, with past ownership of high-scale, production-critical infrastructure
Have experience with distributed systems and container orchestration (especially Kubernetes)
Have worked with AI/ML platforms or are excited to build foundational infrastructure for LLM-based applications
Thrive in fast-paced environments with shifting requirements and ambiguous problem spaces
Are motivated by impact, enjoy deep technical challenges, and want to work cross-functionally across security, AI, and product
Have strong familiarity with one or more parts of our tech stack:
Cloud provider: AWS
Orchestration: Kubernetes + Karpenter
LLM integration: Experience with OpenAI, Anthropic, or open-source model serving (e.g., vLLM, HuggingFace TGI, Ray Serve)
Prompt & embedding infrastructure: Vector databases (e.g., Pinecone, Weaviate, PGVector), semantic search, prompt templating systems
Datastores: Postgres + PgBouncer, Snowflake, Redis

Job Responsibility

Agent service reliability and scaling: We manage and scale the infrastructure that serves LLM-powered agents across chat, email, and voice. This includes selecting inference strategies, integrating with model providers (e.g. OpenAI, Anthropic), and dynamically routing traffic for performance and cost efficiency
Prompt and embedding storage systems: Assist relies heavily on dynamically generated prompts and semantic search across support content. The team owns highly-available, fast-access storage and indexing layers optimized for real-time AI interactions
Privacy and security: Enterprises expect strict guardrails around AI use. We’re building systems like network-level intrusion detection (IDS/IPS), audit logging, and LLM usage policy enforcement to meet these expectations and unlock new sales channels
Observability and usage analytics: We operate systems that surface key metrics—token usage, latency, cost per response, and quality signals—so the Assist team can continuously improve Assist’s performance and accuracy
AI-powered developer tools: We are beginning to explore and evangelize the use of AI to accelerate internal engineering workflows—through internal chat agents, pair programming tools, and intelligent automation for deployment, debugging, and on-call. Our goal is to empower engineers across the company to build faster and more confidently with AI

What we offer

Generous medical, dental, and vision benefits
Paid company holidays, sick time, and unlimited time off
Monthly credits to spend on each: professional development, general wellness, Assembled customers, and commuting
Paid parental leave
Hybrid work model with catered lunches every day (M-F), snacks, and beverages in our SF & NY offices
401(k) plan enrollment
Stock options are provided as part of the compensation package

Fulltime

Senior Software Engineer - AI & Intelligent Tooling

We build simple yet innovative consumer products and developer APIs that shape h...

Location

United States, San Francisco

Salary:

180000.00 - 270000.00 USD / Year

Plaid

Expiration Date

Until further notice

Requirements

5+ years of professional software engineering experience
Experience building and operating production systems
Hands-on experience using AI-powered developer tools
Strong problem-solving and collaboration skills

Job Responsibility

Design, build, and operate internal tools and services used by Plaid’s engineers
Integrate AI-powered tools and workflows into core engineering processes
Improve reliability, usability, and performance of existing internal platforms
Own systems end-to-end, including production support and iterative improvement
Collaborate with teammates and stakeholders to deliver practical, high-impact solutions

What we offer

Medical, dental, vision, and 401(k)

Fulltime

AI Research Infrastructure Engineer

Block is scaling Customer Insights into an AI-powered insights accelerator that ...

Location

United States, Bay Area

Salary:

168300.00 - 297000.00 USD / Year

Cash App

Expiration Date

Until further notice

Requirements

7+ years of experience in research, automation implementation, analytics, or related technical fields with hands-on workflow optimization experience
3+ years implementing AI/ML solutions, with experience in automation, LLM integration, or applied AI/analytics workflows
Hands-on technical skills in programming languages (Python, R, SQL) for automation development, API/MCP integrations, cloud platforms, and research data pipeline creation
Experience with research and analytic platforms and tools (Qualtrics, Snowflake, etc.) or transferable experience with analytics and automation platforms
Strong technical communication and translation skills with ability to make complex AI/ML concepts, data architecture decisions, and automation workflows accessible and actionable for researchers, product managers, and business stakeholders
Proven ability to build stakeholder confidence and alignment during technology transformation
Strong project management skills with ability to coordinate multiple complex automation initiatives, manage competing priorities, and deliver measurable operational efficiency gains (reduced cycle times, improved quality outcomes, increased research capacity)
Familiarity with financial services, fintech, or payments industry research contexts and regulatory requirements preferred

Job Responsibility

Design, build, and deploy AI agents and agentic workflows that automate research operations from study design through insights delivery, using LLMs, prompt engineering, MCP (Model Context Protocol) integrations, and workflow orchestration integrated with the existing research and analytics tech stack
Design, build, and maintain automated data pipelines that ingest, transform, and unify research data from diverse sources (surveys, transcripts, analytics, behavioral logs) into AI-ready repositories with RAG capabilities for instant insight access via tools like Goose
Architect ETL/ELT frameworks using Python, SQL or equivalent tools to ensure data consistency, traceability, and scalability
Develop data models and schemas for research metadata, participant data, and AI-generated insights to support efficient querying and analysis
Design and prototype research automation systems using AI/ML techniques, partnering with design & engineering teams to productionize solutions
Partner with engineering, design, and platform teams to integrate research automation systems with Block's tech stack (e.g., Goose, GitHub) and establish governance frameworks for quality, ethics, and compliance
Mentor team members on AI agent development, agentic system design, and research automation best practices to build organizational capabilities in intelligent automation

What we offer

Remote work
Medical insurance
Flexible time off
Retirement savings plans
Modern family planning

Fulltime

AI Application Operations & Maintenance Engineer (Azure)

The organization is seeking a professional specialized in Application Maintenanc...

Location

Albania, Tirana

Salary:

Not provided

Business Integration Partners

Expiration Date

Until further notice

Requirements

Experience in Application Maintenance and Operations for enterprise applications
Solid knowledge of Python in an application context focused on AI functionalities
Operational knowledge of Microsoft Azure and its main PaaS services
Experience with Azure Kubernetes Service (AKS) and containerized workloads
Strong troubleshooting skills based on logs, metrics, and alerts
Knowledge of monitoring, logging, and observability principles
Familiarity with microservices architectures and multi-layer environments
Understanding of IAM concepts, Managed Identities, and secret management
Experience operating AI / Generative AI solutions in production
Knowledge of Azure OpenAI, embedding services, and vector search

Job Responsibility

Manage corrective and adaptive maintenance activities for AI applications in production
Analyze and resolve application incidents and anomalies across front-end, back-end, and service layers
Support application release activities and configuration management across different environments (Dev/Test/Prod)
Collaborate with development teams to analyze application issues and improve overall software quality
Provide operational support for solutions based on Azure Kubernetes Service (AKS), including management of containerized workloads
Continuously monitor application and infrastructure services using Azure Monitor, Log Analytics, and Application Insights
Analyze application logs, metrics, and alerts to ensure appropriate levels of reliability and performance
Perform advanced troubleshooting on data ingestion pipelines, AI services, search services, and databases
Provide operational support for data persistence services, including Azure SQL Database for structured data, Azure Cosmos DB for unstructured data and conversational history, and Azure Storage Accounts (Blob Storage) for document repositories
Verify and support correct content indexing and retrieval through Azure AI Search, including vector search and similarity search

Fulltime

AI Research Engineer, Data Infrastructure

As a Research Engineer in Infrastructure, you will design and implement a robust...

Location

United States, Palo Alto

Salary:

180000.00 - 250000.00 USD / Year

1X Technologies

Expiration Date

Until further notice

Requirements

Strong experience in building data pipelines and ETL systems
Ability to design and implement systems for data collection and management from robotic fleets
Familiarity with architectures that span on-robot components, on-premise clusters, and cloud infrastructure
Experience with data labeling tools or building dataset visualization and annotation tooling
Proficiency in creating or applying machine learning models for dataset organization and automated labeling

Job Responsibility

Optimize operational efficiency of data collection across the NEO robot fleet
Design intelligent triggers to determine when and what data should be uploaded from the robots
Automate ETL pipelines to make fleet-wide data easily queryable and training-ready
Collaborate with external dataset providers to prepare diverse multi-modal pre-training datasets
Build frontend tools for visualizing and automating the labeling of large datasets
Develop machine learning models for automatic dataset labeling and organization

What we offer

Equity
Health, dental, and vision insurance
401(k) with company match
Paid time off and holidays

Fulltime

Full Stack Engineer (AI & Agentic AI Systems)

The Full Stack Engineer (AI & Agentic AI Systems) is a strategic professional wh...

Location

India, Pune; Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

8+ years in a product development/product management environment
Strong analytical and quantitative skills
Data driven and results-oriented
Experience delivering with an agile methodology
Experience in effecting large-scale culture change
Experience leading infrastructure programs
Skilled at working with third party service providers
Excellent written and oral communication skills
Bachelor’s/University degree or equivalent experience
Strong expertise in SQL (Oracle, PostgreSQL)

Job Responsibility

Design and deliver end‑to‑end solutions spanning architecture, system design, low‑level design, and high‑quality coding across modern full‑stack environments
Build responsive, modular UI applications using React, integrating complex AI-driven workflows and real‑time interactions
Develop scalable, high‑performance backend services in Java / Python, implementing resilient APIs, event‑driven patterns, and microservices architectures
Engineer AI‑powered features leveraging Google Gemini LLM, Vertex AI, ADK, vector databases (A2A), RAG pipelines, MCP, context engineering, and advanced prompt engineering techniques
Implement secure, well‑structured REST and GraphQL APIs, ensuring reliability, versioning discipline, and clean integration patterns across platforms
Optimize system performance and scalability, applying profiling, load‑testing insights, caching strategies, and distributed system tuning
Drive robust CI/CD practices, integrating automated testing, code quality gates, containerization, and cloud‑native deployment pipelines
Partner with QE to build and maintain automated test suites (UI, API, integration, and performance), improving release quality and reducing regression risk
Identify, diagnose, and remediate performance bottlenecks, penetration testing vulnerabilities, and production issues with precision and root‑cause clarity
Collaborate cross‑functionally with AI scientists, architects, and product teams to translate business challenges into production‑ready, intelligent agentic systems

Fulltime

Artificial Intelligence (AI) Engineer

We are looking for an Artificial Intelligence (AI) Engineer to support the desig...

Location

United States, Albuquerque

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

Bachelor’s degree in computer science, software engineering, information technology, or a related technical field, or equivalent practical experience
Must have experience deploying Kubernetes and MCP servers integrated with AI data sources
At least 2 years of hands-on experience supporting AI or machine learning platforms, model deployment, MLOps processes, or AI-focused infrastructure
Demonstrated experience deploying and managing server-based workloads in Kubernetes environments
Strong programming and automation capabilities using Python, Bash, or similar scripting languages
Solid understanding of DevOps and MLOps practices, including Git-based development, CI/CD pipelines, containers, and Kubernetes orchestration
Experience working with AI and machine learning frameworks such as PyTorch, Hugging Face, or related ecosystems
Familiarity with enterprise security and compliance requirements, including authentication approaches such as OAuth and regulated operating environments
Ability to communicate effectively with both technical and non-technical teams and collaborate across multiple functions
Secret security clearance (active or inactive), or the ability to obtain one

Job Responsibility

Direct the rollout and integration of AI platforms and services, ensuring they work effectively with existing enterprise technologies and operational standards
Architect, implement, and refine AI infrastructure in partnership with cloud, server, and platform engineering teams to support dependable system performance
Move machine learning solutions from development into production by establishing repeatable processes for deployment, maintenance, and long-term support
Create and manage CI/CD and MLOps workflows that cover model validation, packaging, release, rollback, and lifecycle oversight
Automate infrastructure and platform operations through scripting, infrastructure-as-code methods, and configuration management tools
Troubleshoot platform and service issues, perform root cause analysis, and produce clear technical documentation for support and maintenance activities
Strengthen system visibility by implementing logging, monitoring, alerting, and incident response practices across AI environments
Uphold security and compliance expectations by contributing to audits, remediation efforts, vulnerability management, and secure design reviews
Identify and deliver improvements that increase performance, scalability, reliability, and cost efficiency across AI-enabled systems
Work with technical and business stakeholders to align AI implementations with organizational priorities and evaluate emerging tools for long-term operational value

What we offer

Medical
Vision
Dental
Life and disability insurance
401(k) plan
