AI Platform Engineer, Backend

United States, San Francisco Bay Area · Job Posted February 18, 2026

Job Description

As an AI Platform Backend Engineer, you will own and build the core backend systems that power Brain Co.’s AI platform. This is a foundational, high-leverage role: your work enables product teams, ML engineers, and customer deployments to move faster and more safely. You will design and operate backend services that support complex AI workflows (multi-step reasoning, tool use, human-in-the-loop processes), with a strong focus on scalability, reliability, and developer experience. You’ll take systems from early technical exploration through production deployment, often in ambiguous problem spaces where you help define the right solution. This role offers significant autonomy and the opportunity to grow into a broader technical and organizational leadership role over time.

Job Responsibility

  • Design, build, and operate foundational backend services and data pipelines that power Brain Co.’s AI platform
  • Take end-to-end ownership: architecture, implementation, deployment, and long-term maintenance
  • Build systems with strong guarantees around correctness, fault tolerance, and observability
  • Design for real uptime expectations in enterprise and government environments
  • Design modular, reusable backend architectures and APIs (REST, gRPC, event-driven)
  • Make sound architectural tradeoffs with a long-term platform mindset
  • Break down open-ended, complex problems into clear technical designs
  • Move from first principles to production-ready systems with speed and rigor
  • Improve latency, throughput, and cost efficiency through profiling, thoughtful system design, and iteration
  • Work closely with Product, ML, Infrastructure, and customer-facing teams
  • Enable others by building systems that are intuitive, reliable, and developer-friendly

Requirements

  • 5+ years of experience building backend systems or platforms in production
  • Strong fundamentals in distributed systems (consistency, availability, failure modes, retries, idempotency)
  • Deep proficiency in at least one backend language (Go, TypeScript, Rust, Python, or similar)
  • Experience designing, operating, and evolving APIs and services at scale
  • Proven ability to design systems from first principles
  • Experience building shared infrastructure, internal platforms, or developer-facing services
  • Strong intuition for developer experience and long-term maintainability
  • Experience owning services with real uptime and operational responsibility
  • Familiarity with observability tooling (metrics, logging, tracing) and incident response

Nice to have

  • Experience with AI/ML platforms, inference systems, or data-intensive pipelines
  • Familiarity with Kubernetes and cloud-native service deployment
  • Exposure to multi-tenant, regulated, or government environments
  • Experience with real-time or streaming data systems

What we offer

  • Competitive salary plus equity
  • Medical, Dental, and Vision coverage
  • 401(k)
  • Unlimited PTO
  • Daily lunches
  • Commuter benefits

Similar Jobs for AI Platform Engineer, Backend

8 matching positions

GCP AI Platform Architect / Lead AI Platform Engineer

Our client is an innovative technology company specializing in the development o...
Location: Poland, Kraków
Salary: Not provided
TeamQuest Sp. z o. o.
Expiration Date: Until further notice
Requirements
  • GCP Expertise (verifiable - ask for production examples): GCP is the candidate's primary cloud, not secondary experience alongside AWS/Azure. Production deployments across most of: Vertex AI, Cloud Run or GKE, Pub/Sub, BigQuery, Secret Manager, VPC Service Controls, IAM + Workload Identity. Has designed for GCP from scratch (not migrated from another cloud), with end-to-end ownership.
  • AI / Backend Engineering: Python is the primary language - production-grade service/API development, not scripting or data science only. Strong track record building distributed systems and integrating LLMs.
  • Agentic Architecture (must be production, not PoC): Hands-on production experience with at least one of: LangGraph, Google ADK, CrewAI, or a custom multi-agent orchestration layer. RAG pipelines shipped to production. Google ADK: candidate must be able to explain what it is, when to use it, and how it compares to LangGraph and custom orchestration. Experience with AI agent workflows, ReAct prompting, and function calling in production environments.
  • Multi-Tenant Architecture: Has designed a multi-tenant SaaS platform end-to-end - not just contributed. Can articulate tenant isolation strategies: IAM boundary design, data isolation per tenant, VPC controls.
  • API Design & Integrations: Proven ability to create secure, high-performance APIs capable of asynchronously managing traffic and communication between multiple decoupled services.
  • Enterprise Security: Practical knowledge of data isolation in multi-tenant SaaS architectures, IAM, and securing cloud-based environments.
  • Vector Databases: Hands-on experience with semantic search and at least one of: Pinecone, Weaviate, pgvector, or Vertex Matching Engine.
Job Responsibility
  • System Architecture: Design and develop a scalable, cloud-native architecture on Google Cloud Platform (GCP) that meets enterprise security and multi-tenant data isolation requirements for a SaaS environment
  • AI Agent Orchestration: Architect and implement autonomous, multi-step AI workflows with a clear separation of agent responsibilities (retrieval, analysis, reasoning, response generation)
  • Hands-on Core Development: Actively contribute to core system development, including coding orchestration logic, designing services, optimizing performance, and building secure API integrations for routing queries across internal and external agents
  • Frontend Enablement: Design the backend layer, streaming protocols, and APIs to seamlessly support and integrate with advanced conversational UIs
  • Data Management & Extensibility: Build a robust backend capable of processing qualitative and social data, ensuring the platform is easily extensible to incorporate new data sources
What we offer
  • Attractive salary
  • Full remote work
  • Social benefits: sport card, healthcare insurance
  • Full-time

GCP AI Platform Architect / Lead AI Platform Engineer

Our client is an innovative technology company specializing in the development o...
Location: Poland, Katowice
Salary: Not provided
TeamQuest Sp. z o. o.
Expiration Date: Until further notice
Requirements
  • GCP Expertise (verifiable - ask for production examples): production deployments across most of: Vertex AI, Cloud Run or GKE, Pub/Sub, BigQuery, Secret Manager, VPC Service Controls, IAM + Workload Identity
  • Has designed for GCP from scratch (not migrated from another cloud), with end-to-end ownership
  • AI / Backend Engineering: Python is the primary language - production-grade service/API development, not scripting or data science only
  • Strong track record building distributed systems and integrating LLMs
  • Agentic Architecture (must be production, not PoC): Hands-on production experience with at least one of: LangGraph, Google ADK, CrewAI, or a custom multi-agent orchestration layer
  • RAG pipelines shipped to production
  • Google ADK: candidate must be able to explain what it is, when to use it, and how it compares to LangGraph and custom orchestration
  • AI agent workflows, ReAct prompting, and Function Calling in production environments
  • Multi-Tenant Architecture: Has designed a multi-tenant SaaS platform end-to-end - not just contributed
  • Can articulate tenant isolation strategies: IAM boundary design, data isolation per tenant, VPC controls
Job Responsibility
  • System Architecture: Design and develop a scalable, cloud-native architecture on Google Cloud Platform (GCP) that meets enterprise security and multi-tenant data isolation requirements for a SaaS environment
  • AI Agent Orchestration: Architect and implement autonomous, multi-step AI workflows with a clear separation of agent responsibilities (retrieval, analysis, reasoning, response generation)
  • Hands-on Core Development: Actively contribute to core system development, including coding orchestration logic, designing services, optimizing performance, and building secure API integrations for routing queries across internal and external agents
  • Frontend Enablement: Design the backend layer, streaming protocols, and APIs to seamlessly support and integrate with advanced conversational UIs
  • Data Management & Extensibility: Build a robust backend capable of processing qualitative and social data, ensuring the platform is easily extensible to incorporate new data sources
What we offer
  • Attractive salary
  • Full remote work
  • Social benefits: sport card, healthcare insurance
  • Full-time

Backend Engineer (AI Platform)

Plaud is building the world's most trusted AI work companion for professionals t...
Location: Singapore, Singapore
Salary: Not provided
Plaud
Expiration Date: Until further notice
Requirements
  • Minimum 3 years of backend or AI engineering experience
  • At least 1 year specifically in LLM application architecture
  • Deep practical knowledge of advanced agent patterns (e.g., Plan-Act-Reflection)
  • Proven ability to design complex distributed systems
  • Experience defining API standards and data protocols for cross-team usage
Job Responsibility
  • Agent Architecture Design: Design the AI agent architecture and implement the "Plan-Act-Reflection" agentic flow
  • Skill Design: Develop agent skills including Function Calling, MCP Server integration, and Streaming APIs
  • Design DAG (Directed Acyclic Graph) reasoning flows to break down ambiguous user requests into executable steps
  • Solve critical runtime challenges like "Context Rot" (context overflow) by designing strategies for context offloading, isolation, and intelligent compression
  • RFT: Architect the Automated Data Flywheel system. Design Reward Functions and LLM-as-a-judge pipelines to programmatically evaluate agent performance and drive reinforcement learning
What we offer
  • Market-competitive compensation
  • Global exposure
  • Vibrant, creativity-fueled work atmosphere
  • Full-time

Staff Software Engineer, Backend (AI Platform)

Cresta is on a mission to turn every customer conversation into a competitive ad...
Location: United States
Salary: Not provided
Cresta
Expiration Date: Until further notice
Requirements
  • 5+ years writing production software
  • 2+ years focused on ML platform or infra
  • Expert Python (async, typing, packaging, performance)
  • Working Golang knowledge for systems components
  • Proven experience with one or more serving frameworks (e.g., vLLM, Triton, TorchServe)
  • Kubernetes and cloud-native ops
  • Solid grasp of distributed systems, networking, and container security
  • Culture of rigorous testing, code review, and continuous delivery
Job Responsibility
  • Own model serving: Design, build, and maintain low-latency, highly available serving stacks for in-house ML models and for integration with LLM serving partners
  • Automate training pipelines: Orchestrate data prep, training, evaluation, and registry workflows on Kubernetes with solid MLOps practices
  • Optimize at scale: Profile and tune throughput, memory, and cost; introduce caching, sharding, batching, and GPU/CPU autoscaling where it pays off
  • Build platform primitives: Create reusable SDKs, templates, and CLI tools that let research and product teams ship models independently and safely
  • Raise the bar: Instrument deep observability (tracing, metrics, alerts), drive blameless post-mortems, and mentor engineers on production ML best practices
What we offer
  • Comprehensive medical, dental, and vision coverage with plans to fit you and your family
  • Flexible PTO to take the time you need, when you need it
  • Paid parental leave for all new parents welcoming a new child
  • Retirement savings plan to help you plan for the future
  • Remote work setup budget to help you create a productive home office
  • Monthly wellness and communication stipend to keep you connected and balanced
  • In-office meal program and commuter benefits provided for onsite employees

Backend Engineer - AI Developer Platform

At N26, we are building the internal AI platform that will power the next genera...
Location: Germany, Berlin
Salary: Not provided
N26
Expiration Date: Until further notice
Requirements
  • Backend engineer who enjoys building platforms and developer-facing systems
  • Solid experience building software products written in languages such as Kotlin, Go, Python, or TypeScript
  • Experience working with APIs and distributed systems
  • Interest in developer platforms, tooling, or internal products
  • Curiosity about AI and how it can improve software development workflows
  • Strong collaboration skills and the ability to work within a highly technical team
  • Curiosity and willingness to learn new things
  • Data-driven mindset
Job Responsibility
  • Help build the internal AI platform used by engineering teams at N26
  • Developing the core services that connect internal tools to AI providers
  • Implementing routing, cost controls, security policies, and observability
  • Building tools that make AI capabilities easy and safe to consume
  • Contributing to a platform where teams can publish and reuse AI skills
  • Supporting discovery, versioning, and governance of AI capabilities
  • Enabling composability across different AI-powered tools
  • Building services that enable AI-assisted workflows for engineers
  • Integrating AI capabilities with internal developer platforms
  • Supporting experimentation and iteration on new AI-enabled developer experiences
What we offer
  • Accelerate your career growth by joining one of Europe’s most talked-about disruptors
  • Employee benefits including a competitive personal development budget, a work-from-home budget, and discounts on fitness and wellness memberships, language apps, and public transportation
  • Access to a Premium subscription on your personal N26 bank account
  • Subscriptions for friends and family members
  • Additional day of annual leave for each year of service
  • A high degree of autonomy and access to cutting-edge technologies
  • A relocation package with visa support for those who need it

Senior Backend Python Engineer - AI Platform

Are you looking for a career move that will put you at the heart of a global fin...
Location: United Kingdom, London
Salary: Not provided
Citi
Expiration Date: Until further notice
Requirements
  • Proficiency in core Python and FastAPI framework
  • Profound understanding of software design principles, architectural patterns, and an unwavering commitment to writing clean, maintainable, and production-grade code
  • Experience of the full lifecycle of design, implementation and running of enterprise software solutions involving cross functional team collaboration
  • Experience contributing to the architecture and design (architecture, design patterns, reliability, scaling) of new and current systems
  • Experience with containerized deployment (Kubernetes, OpenShift, etc.)
  • Experience with DevOps, CI/CD and agile methodology
Job Responsibility
  • You will design, implement, build, and deploy backend systems to automate the analysis of data, code, and documentation, and structure the extracted knowledge in a Credit Risk domain-aware knowledge graph
What we offer
  • Generous holiday allowance starting at 27 days plus bank holidays, increasing with tenure
  • A discretionary annual performance-related bonus
  • Private medical insurance packages to suit your personal circumstances
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
  • Full-time

Senior ML Platform Engineer, AI Platform

We are seeking a skilled and passionate ML Platform Engineer to join our team an...
Location: Singapore, Singapore
Salary: Not provided
Airwallex
Expiration Date: Until further notice
Requirements
  • 5+ years in backend software development
  • At least 2 years focused on AI/ML platform or MLOps infrastructure
  • Deep expertise in MLOps practices, including automated deployment pipelines, model optimization, and production lifecycle management
  • Proven experience designing and implementing low-latency model serving solutions
  • Proficiency in Python
  • Skill in writing high-quality, maintainable code
  • Experience designing and developing large-scale distributed systems with high concurrency, low-latency inference, and high availability
  • Excellent communication and mentoring abilities
  • A relevant degree in Computer Science, Mathematics, or a related field
Job Responsibility
  • Platform Development: Design, build, and maintain the end-to-end MLOps platform using Kubernetes and Cloud Services
  • Infrastructure as Code (IaC): Use Terraform or similar tools to manage, provision, and scale all ML-related infrastructure securely and efficiently
  • Pipeline Automation: Implement and optimize CI/CD/CT (Continuous Integration, Delivery, Training) pipelines to automate model training, testing, packaging, and deployment using tools like Argo and Kubeflow Pipelines
  • Serving Infrastructure: Build highly available, low-latency, and high-throughput model serving infrastructure
  • Observability: Implement robust monitoring, alerting, and logging solutions to track infrastructure health, model performance, and data/model drift
  • Tooling & Support: Evaluate, integrate, and support ML tools such as Feature Stores and distributed model training pipelines
  • Security & Compliance: Ensure platform security, implement RBAC (Role-Based Access Control), and manage secrets for sensitive data and production environments
  • Collaboration: Work closely with Data Scientists and ML Engineers to understand their needs and provide technical guidance on best practices for scaling their models
  • Full-time

Senior Machine Learning Engineer, AI Platform

The AI Platform team is responsible for building the foundational infrastructure...
Location: United States; Canada
Salary: 139000.00 - 218000.00 USD / Year
Mozilla
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
  • Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities
  • Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
  • Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
Job Responsibility
  • Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
  • Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
  • Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
  • Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
  • Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
  • Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
  • Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
  • Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews
What we offer
  • Generous performance-based bonus plans
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting
  • Quarterly all-company wellness days
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
  • Full-time