Staff Infrastructure Software Engineer - AI Platform

United Kingdom, Edinburgh · Job Posted January 03, 2026

Job Description

We are seeking a Staff Software Engineer to join the AI Platform team and drive the design, architecture, and production posture of Addepar’s AI Platform and our products and solutions. This team is at the center of Addepar's mission to integrate AI across our product suite and is growing quickly. This role focuses on building a scalable platform and infrastructure to power deep, rich AI capabilities and products. As the AI platform expands, you will be a central architect of the "Core Platform" layer, building the managed services, serving infrastructure, observability, and cross-platform integrations that turn experimental AI capabilities into scalable production software. This includes productionizing the latest AI developments, such as agents, MCP, and computer use, alongside designing and automating the operational backbone of Addepar's AI stack.

Job Responsibility

  • Design and build the production runtime for LLM-based agents and products, creating the services and infrastructure that serve autonomous agents
  • Develop deep application-level knowledge to proactively inform and influence requirements, constraints and best practices for implementing composable, complex AI systems
  • Lead the design, implementation, and automation of production infrastructure on a variety of cloud environments (Kubernetes/Databricks), to enable us to ship and scale AI features instantly
  • Evangelize and promote disciplined engineering best practices to enforce strong production hygiene and culture
  • Initiate and lead collaborations with cross-functional teams to identify and resolve complex application or infrastructure issues, serving as a technical subject matter expert
  • Architect, build, and maintain advanced, automated CI/CD pipelines (e.g., using Jenkins, ArgoCD, AWS CodeBuild/Pipeline, GitHub Actions, or similar), establishing best practices for deployment strategies (e.g., blue/green, canary)
  • Develop systems and best practices for monitoring, alerting, and troubleshooting of our probabilistic and AI-driven systems and broader software stack

Requirements

  • Extensive experience as a Software/Backend Engineer, with a track record of taking on increasing responsibility
  • Experience across the full product lifecycle: designing, implementing, shipping, scaling, operationalizing, and maintaining technology/SaaS products
  • Exceptional programming skills and fundamentals in Python/Go/Java, with a proven track record of building large-scale production systems
  • Proficiency with diverse compute environments, including microservices (K8s), Databricks, and serverless architectures (e.g., AWS Lambda)
  • Demonstrable experience leading initiatives with infrastructure-as-code tools such as Terraform in complex, multi-account environments
  • Proficiency with comprehensive monitoring and alerting stacks (e.g., Prometheus/Grafana/Sentry/cloud-native tools), with a focus on observability strategy
  • Excellent interpersonal and communication skills to effectively collaborate with multi-functional teams, articulate complex technical concepts, and influence outcomes

Nice to have

  • Extensive experience with Databricks (Unity Catalog, Model Serving, Jobs)
  • Demonstrable experience writing and contributing to significant systems automation tooling or open-source projects is a strong plus
  • Specific experience with LLMs and agentic systems and associated technologies such as LangChain, vector DBs, or MLflow
  • Exposure to industry practices in financial services or other highly regulated data environments is a plus

Similar Jobs for

Staff Infrastructure Software Engineer - AI Platform

8 matching positions

Staff Software Engineer, Managed AI - AI Platform

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'...
Location
United States, San Francisco, CA; Sunnyvale, CA
Salary:
208725.00 - 253000.00 USD / Year
Crusoe
Expiration Date
Until further notice
Requirements
  • Advanced degree in Computer Science/Engineering
  • 8-10+ years of industry experience with demonstrated history of consistent success leading a varied portfolio of initiatives across your function
  • Experience with distributed systems, cloud services (compute, storage, networking, database), and delivering early-stage projects quickly
  • Experience with Generative AI (LLMs, Multimodal) and familiar with AI infrastructure (training, inference, ETL pipelines)
  • Proficient with container runtimes (e.g., Kubernetes), microservices, REST APIs, gRPC, and the full software development lifecycle including CI/CD
Job Responsibility
  • Lead the design and implementation of core AI services, including resilient fault-tolerant queues for efficient task distribution, model catalogs for managing and versioning AI models, and scheduling mechanisms optimized for cost and performance
  • Architect and scale infrastructure to handle millions of API requests per second
  • Implement robust monitoring and alerting to ensure system health and 24/7 availability
  • Collaborate closely with product management, business strategy, and other engineering teams to define the AI platform roadmap
  • Influence the long-term vision and architectural decisions of the platform
  • Contribute to open-source AI frameworks and actively participate in the AI community
  • Prototype and rapidly iterate on emerging technologies and new features
What we offer
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Fulltime

Staff Software Engineer, AI Agent Platform

The Geico AI Agent Platform team is seeking an exceptional Staff Software Engine...
Location
United States, Chevy Chase; New York City
Salary:
115000.00 - 260000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field; an advanced degree (master’s or Ph.D.) is highly desirable
  • 6+ years of hands-on experience in designing, implementing, and maintaining multi-tenant AIML systems and platforms in production environments
  • 6+ years of experience working with cloud platforms such as Azure and AWS
  • Extensive expertise in designing and deploying large-scale data pipelines and real-time inference systems and managing the end-to-end AI Agent and/or AIML system development lifecycles, including configuration, evaluation, monitoring, observability and AuthN/AuthR considerations
  • 6+ years of experience working with common backend systems & tools (e.g., Kubernetes, Temporal, OpenSearch, PostgreSQL, Redis, Neo4j, etc.)
  • Deep understanding of Docker, container optimization, and multi-stage builds
  • Experience with Prometheus, Grafana, OpenTelemetry and distributed tracing
  • 3+ years of experience building front-end web applications using frameworks such as React and/or Next.js
  • Deep proficiency in programming languages such as Python, Java, Go, etc., with a strong emphasis on coding excellence
Job Responsibility
  • Architect and implement scalable multi-tenant backend systems for building AI agent workflows (including agent configuration, offline evaluation, synthetic data generation, workflow simulation, agent marketplace, etc.) using Azure Kubernetes Service (AKS), FastAPI, etc., ensuring economies of scale and controlling the cost of maintenance
  • Collaborate with Design team to architect and implement frontend experiences and workflows for onboarding both technical and non-technical stakeholders, maximizing user adoption and successful AI agent development
  • Develop observability frameworks to ensure 99.9%+ uptime for AI agent platforms through robust monitoring, alerting, and incident response procedures
  • Evaluate and (if desirable) integrate cutting-edge GenAI frameworks, libraries and vendors to maintain a state-of-the-art technology stack, including hybrid cloud solutions with AWS/GCP as backup or specialized use cases
  • Architect and implement scalable, high-performance machine learning platforms and systems capable of processing large data volumes and supporting real-time decision making and workflows
  • Oversee the end-to-end lifecycle of AI agent applications, ensuring robust testing, deployment, and ongoing monitoring
  • Ensure adherence to company production readiness standards, security protocols, and regulatory compliance throughout the development lifecycle
  • Continuously optimize platform performance, reducing latency and improving throughput for AI agent workloads
  • Design and implement backup, recovery, and business continuity plans for hosted platform applications & services
  • Design and maintain robust CI/CD pipelines for ML model deployment using Azure DevOps, GitHub Actions, and MLOps tools
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Support for flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime

Staff Infrastructure Software Engineer, Enterprise AI

Scale GP is building the next generation of enterprise-grade Generative AI produ...
Location
United States, New York; San Francisco
Salary:
216200.00 - 270250.00 USD / Year
Scale
Expiration Date
Until further notice
Requirements
  • Proven experience in a senior role
  • 5+ years of full-time software engineering experience
  • Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes) and observability platforms (e.g., Datadog, Prometheus, Grafana)
  • Extensive experience with at least one major cloud provider (AWS, Azure, or GCP)
  • Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups
  • Proficiency in Python or JavaScript/TypeScript, and SQL
Job Responsibility
  • Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers
  • Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies
  • Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response
  • Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization
  • Solve the toughest engineering problems related to multi-tenancy, data isolation, and high-performance inference at a massive scale, taking end-to-end ownership across the full product lifecycle
What we offer
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • equity based compensation
  • additional benefits such as a commuter stipend
  • Fulltime

Sr Staff AI Software Engineer (CORA AI)

Idira is looking for a Senior AI Software Engineer to join our Generative AI fou...
Location
Israel, Petah Tikva
Salary:
Not provided
Palo Alto Networks
Expiration Date
Until further notice
Requirements
  • Bachelor's degree with 15 years of experience, or Master's degree with 12 years of experience, or PhD with 8 years of experience
  • Extensive experience in software engineering, with a focus on building and deploying production-grade systems
  • Proven experience with cloud platforms, particularly AWS services
  • Strong programming skills in languages such as Python
  • Hands-on experience with machine learning, deep learning, and Large Language Models (LLMs)
  • Experience with the full software development lifecycle (design, development, testing, deployment, maintenance)
Job Responsibility
  • Design, develop, and deploy scalable AI agents and shared infrastructure for GenAI capabilities
  • Manage the full software development lifecycle, from prototyping to production-grade, observable systems
  • Collaborate with product managers, researchers, and engineers to translate ideas into robust and scalable services
  • Work hands-on with large language models (LLMs), agentic frameworks, and a broad range of AWS services
  • Implement strong telemetry, evaluation metrics, and feedback loops to continuously improve AI solutions
  • Champion operational excellence through pragmatic experimentation, iteration, and solid design

Senior Software Engineer, AI Platform

GoodLeap is a technology company delivering best-in-class financing and software...
Location
United States, Austin; San Francisco; Irvine; Roseville
Salary:
173000.00 - 200000.00 USD / Year
GoodLeap
Expiration Date
Until further notice
Requirements
  • 5+ years of experience building and shipping scalable, robust backend services and APIs
  • Strong proficiency in Python and/or TypeScript
  • Solid understanding of distributed systems, service-oriented architecture, and event-driven patterns (e.g. Kafka, RabbitMQ, SQS)
  • Passion for software development, emerging technologies and culture of innovation
  • A collaborative mindset and interest in mentoring teammates and elevating team practices
  • Excellent communication and interpersonal skills
Job Responsibility
  • Build features and extensions to our agentic AI platform using scalable, robust, and AI-first software engineering practices
  • Design tools and infrastructure to enable teams at GoodLeap to easily build and enhance AI agents that empower homeowners, contractors, and operations staff
  • Work alongside a team of AI engineers, product managers, and data scientists to evaluate and improve our agent ecosystem
  • Collaborate with Staff engineers, product, architecture, and design leads to deliver highly-available, fault-tolerant products and services
  • Work on significant and unique technical challenges, evaluate and recommend solutions, and guide decision making by considering technical tradeoffs
  • Grasp both the technical and business perspective so you can help drive innovation
  • Work autonomously and be self-disciplined, requiring minimal supervision or guidance
  • Collaborate with other team members and coach more junior team members to grow both their technical skills and soft skills
What we offer
  • May be eligible for a bonus and equity
  • Fulltime

Member of Technical Staff, AI Platform Engineer

As Microsoft continues to push the boundaries of AI, we are on the lookout for p...
Location
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in computer science or a related technical discipline AND 4+ years of technical engineering experience building customer-facing applications/products with coding in languages including, but not limited to, C#, Python, Java, Golang; OR equivalent experience
  • Experience leveraging generative AI technologies to develop innovative and user-focused product features
  • 4+ years' experience building APIs and creating pipelines for large-scale products
  • 4+ years' experience building scalable services on top of public cloud infrastructure like Azure, AWS, or GCP, with extensive use of datastores such as RDBMS, key-value stores, etc.
Job Responsibility
  • Work on building new AI features that enhance Copilot
  • Build secure and performant AI Platform services that power Copilot
  • Work collaboratively with other AI Researchers, Platform, infrastructure, application engineers to build next generation AI products and services
  • Ship high-quality, well-tested, secure, and maintainable code
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values.
  • Fulltime

Senior Staff Software Engineer - AI

GEICO is seeking an experienced Engineer with a passion for building high-perfor...
Location
United States, Seattle, WA; Austin, TX; Palo Alto, CA; Chicago, IL; Dallas, TX
Salary:
110000.00 - 230000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements
  • Experience building and deploying ML systems in production with cross-functional engineering teams
  • Fluency in at least two modern languages such as Python, Go, Java, C++, or C# including object-oriented design
  • Experience architecting multi-component ML platforms using open-source/cloud-agnostic components, including datastores (PostgreSQL; NoSQL such as MongoDB, Cassandra, CosmosDB) and streaming (Kafka, Flink, or Spark Streaming)
  • Experience with end-to-end ML lifecycle: version control, CI/CD, Kubernetes, testing, monitoring, and production support
  • Experience with cloud providers (Azure, AWS or GCP) in production ML environments
  • Experience with observability tools and distributed systems monitoring, logging, tracing, and root cause analysis
  • Experience building multi-agent systems using LLMs and agentic frameworks (e.g., LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI)
  • Hands-on experience with RAG, semantic search, and vector databases (e.g., Milvus, pgvector, Qdrant, Elasticsearch)
  • Experience designing human-in-the-loop workflows and safety controls for autonomous systems
  • Strong architecture and design skills with ability to influence technical direction and roadmap
Job Responsibility
  • Design and build a multi-agent AI platform where specialized agents autonomously detect, diagnose, and resolve issues through agent-to-agent (A2A) collaboration
  • Develop intelligent agents using LLMs and agentic frameworks that coordinate detection, diagnostic, remediation, and knowledge tasks with minimal human intervention
  • Define agent interaction protocols, A2A communication standards, and evaluation frameworks for agent decision quality and autonomous action safety
  • Architect vector database solutions (Milvus, pgvector, Qdrant) for semantic search and RAG to enable context-aware agent decision-making
  • Build end-to-end ML pipelines for severity classification, anomaly detection, failure pattern recognition, and impact forecasting using observability data
  • Establish scalable orchestration infrastructure for multi-agent workflows with CI/CD, automated evaluation, canary releases, and rollback strategies
  • Implement monitoring for agent interactions, A2A communication patterns, decision quality, data drift, and system reliability
  • Lead technical architecture ensuring scalability, observability, and integration with existing alerting, logging, and monitoring systems
  • Define standards for agent safety, explainability, governance, and human-in-the-loop controls for high-impact automated actions
  • Partner with SRE, Product, and Engineering teams to translate reliability goals into measurable ML objectives and maintain pragmatic technical roadmaps
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Support for flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime

Staff Software Engineer, Platform

At Scale, our products include the Generative AI Data Engine, SGP, Donovan, and ...
Location
United States, San Francisco; Seattle; New York
Salary:
248400.00 - 310500.00 USD / Year
Scale
Expiration Date
Until further notice
Requirements
  • 8+ years of full-time, post-graduation engineering experience, with a specialty in back-end systems
  • Extensive experience in software development and a deep understanding of distributed systems and public cloud platforms (AWS preferred)
  • A demonstrated track record of independent ownership and leadership across successful multi-team engineering projects
  • Excellent communication and collaboration skills, and the ability to translate complex technical concepts for non-technical stakeholders
  • Experience working fluently with standard containerization & deployment technologies like Kubernetes, Terraform, Docker, etc.
  • Experience with orchestration platforms, such as Temporal and AWS Step Functions
  • Experience with NoSQL document databases (MongoDB) and structured databases (Postgres)
  • Strong knowledge of software engineering best practices and CI/CD tooling (CircleCI, ArgoCD)
Job Responsibility
  • Architectural Vision: You will drive the design and implementation of foundational systems, acting as a bridge between high-level business goals and technical goals
  • Cross-Functional Leadership: You will collaborate with cross-functional teams to define and drive adoption of the next generation of features for our AI data infrastructure
  • Technical Ownership: You are responsible for proactively identifying and driving opportunities for organizational growth, driving improvements in programming practices, and upgrading the tools that define our development lifecycle
  • Technical Mentorship: You will serve as a subject matter expert, presenting technical information to stakeholders and providing the guidance to elevate the engineering culture across the company
What we offer
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • additional benefits such as a commuter stipend
  • equity based compensation
  • Fulltime