Senior Staff Machine Learning Engineer (AI Agent)

Cresta

Location:
United States; Canada

Contract Type:
Not provided

Salary:
Not provided

Job Description:

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agents that solve practical problems for our customers. We are focused on leveraging the latest technologies in Large Language Models (LLMs) and AI Agent systems, while ensuring that the solutions we develop are cost-effective, secure, and reliable. This role will involve hands-on work on cutting-edge projects, requiring innovative and passionate machine learning engineers who can bring research into practical, scalable applications.

As a Staff Machine Learning Engineer, your goal will be to take AI Agents from the realm of research and bring them into practical, real-world use cases. This includes developing and deploying proprietary LLMs, scaling AI solutions, and addressing key challenges such as evaluation and reliability. While we’re focused on real-world application rather than pure research, you’ll be working with some of the most advanced technologies in the GenAI space. This is a unique opportunity to shape the future of AI at Cresta by solving complex problems and bringing breakthrough AI advancements into production environments.

Job Responsibility:

  • Design, develop, and deploy Cresta’s AI Agent solutions and proprietary models
  • Focus on practical AI challenges such as improving reasoning, planning capabilities, and evaluation in real-world scenarios
  • Collaborate with cross-functional teams including front-end and back-end software engineers to integrate AI Agents into Cresta’s customer solutions
  • Lead initiatives to scale AI systems for production environments, ensuring performance and reliability across use cases
  • Contribute to solving cutting-edge problems in AI and help define the future roadmap for Cresta’s AI Agents
  • Innovate and research ways to improve security, cost-efficiency, and reliability of AI systems

Requirements:

  • Bachelor’s Degree in Computer Science, Mathematics, or a related field
  • Master’s or Ph.D. preferred, or equivalent professional experience
  • 7+ years of hands-on industry experience with AI and machine learning
  • 3+ years of experience working with LLMs in large-scale production environments
  • Expert knowledge of machine learning concepts and methods, especially those related to NLP, Generative AI, and working with LLMs
  • Proven leadership in designing and deploying AI solutions at scale
  • Extensive practical knowledge of modern machine learning frameworks and technologies (e.g., PyTorch, TensorFlow, Hugging Face, NumPy)
  • Experience with distributed systems and cloud-based AI infrastructure
  • Strong problem-solving and strategic thinking abilities
  • Proven ability to lead cross-functional teams and work collaboratively to deliver innovative AI solutions in production
  • A passion for driving AI adoption and pushing the boundaries of AI technology into real-world applications
  • Ability to mentor junior engineers and influence strategic decisions across the organization

What we offer:

  • Variety of medical, dental, and vision plans
  • Paid parental leave
  • Monthly Health & Wellness allowance
  • Work from home office stipend
  • Lunch reimbursement for in-office employees
  • PTO: 3 weeks in Canada
  • Base salary, equity, and a variety of benefits

Additional Information:

Job Posted:
December 07, 2025

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Senior Staff Machine Learning Engineer (AI Agent)

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...
Location:
United States, San Francisco
Salary:
216500.00 - 324500.00 USD / Year
GoFundMe
Expiration Date
Until further notice
Requirements
  • 9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
  • Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
  • Extensive experience designing, developing, and operating scalable backend systems
  • Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
  • Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
  • Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
  • Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
  • Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
  • Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)
Job Responsibility
  • Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
  • Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
  • Design, build, and optimize scalable model training, fine tuning, and inference pipelines, ensuring robust integration with production systems
  • Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
  • Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
  • Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
  • Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
  • Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
  • Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
  • Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure
What we offer
  • Competitive pay
  • Comprehensive healthcare benefits
  • Financial assistance for things like hybrid work and family planning
  • Generous parental leave
  • Flexible time-off policies
  • Mental health and wellness resources
  • Learning, development, and recognition programs
  • Fulltime

Senior Staff Engineer, Applied AI

GEICO is seeking a Senior Staff Engineer, Applied AI to provide technical archit...
Location:
United States, Chevy Chase, MD; Palo Alto, CA
Salary:
130000.00 - 260000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements
  • 8 or more years of professional software engineering or applied machine learning experience
  • 2 or more years working with Generative AI or LLM-based systems in production
  • Proven track record of architecting and delivering complex AI/ML capabilities that span multiple teams and have measurable business impact
  • Deep hands-on expertise with Python and modern AI frameworks including LangChain, LangGraph, LangSmith, LlamaIndex, Hugging Face, OpenAI/Anthropic APIs, and emerging agentic frameworks
  • Demonstrated experience building and deploying production RAG (Retrieval-Augmented Generation) systems including document ingestion, chunking strategies, vector search, and context retrieval
  • Demonstrated experience designing and operating production AI systems including multi-agent architectures, intelligent automation, and workflow orchestration
  • Strong understanding of agent architectures, workflow orchestration, retrieval-augmented generation (RAG), vector databases, knowledge graphs, and semantic reasoning
  • Familiarity with Agent-to-Agent (A2A) communication protocols and Model Context Protocol (MCP) for building interoperable AI systems
  • Experience ensuring platform scalability, cross-domain coherence, and alignment with AI platform capabilities and strategy
  • Strong expertise in distributed systems, microservices architecture, service design, performance optimization, and reliability engineering
Job Responsibility
  • Specify architectures and system decompositions for AI/ML capabilities that involve significant integrations and cross-team collaboration across multiple product areas
  • Provide technical architecture and leadership for medium to large, complex, cross-functional AI initiatives with visibility at the tech VP level
  • Architect and lead implementation of advanced Generative AI solutions including agent-based systems, intelligent automation, document intelligence, and decision support systems that span multiple business domains
  • Design and implement sophisticated agentic workflows that orchestrate multiple AI agents, tools, APIs, reasoning steps, and business logic to automate complex enterprise processes at scale
  • Question the status quo with an eye for simpler designs and more secure approaches, influencing tech VPs to set direction for multiple teams
  • Build systems and platforms that meet the highest standards for scalability, resilience, performance, availability, security, and compliance
  • Identify and scope opportunities for automating business processes using AI across multiple product areas and business domains
  • Advance the state-of-the-art in applied AI by integrating knowledge graphs, vector reasoning, retrieval architectures, and multi-agent systems to solve complex business problems
  • Drive innovation by exploring new models, frameworks, reasoning techniques, and AI architectures and applying them strategically to high-impact business challenges
  • Run rigorous experimentation programs including hypothesis definition, A/B testing, measurement frameworks, and iterative improvement across production AI systems
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime

Senior Staff Machine Learning Engineer – Agent Engineering

GEICO is seeking an experienced Sr Staff Machine Learning Engineer – Agent Engin...
Location:
United States, New York City; Palo Alto; Chevy Chase
Salary:
130000.00 - 300000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements
  • 10+ years of professional software development experience with at least two general-purpose programming languages such as Java, C++, Python, TypeScript, etc.
  • 7+ years of experience architecting, building & deploying end-to-end AI solutions utilizing open-source/cloud-agnostic components such as search engine (e.g. Elasticsearch, Qdrant), data warehouse (e.g. Snowflake), streaming platform (e.g. Kafka), relational database (e.g. PostgreSQL), NoSQL (e.g. Cassandra), distributed processing (e.g. Spark, Ray), workflow orchestration (e.g. Airflow, Temporal), etc.
  • 5+ years’ experience managing the end-to-end solution development life cycle, especially measurement and monitoring of operations metrics, analytical insights and business outcomes via dashboards and other tools
  • Bachelor’s degree or above in Computer Science, Engineering, Statistics or a related field
Job Responsibility
  • Own design, development and maintenance of high-performance AI solutions that utilize agentic workflows to deliver concrete business value for internal stakeholders
  • Collaborate with cross-functional teams, including data scientists, ML engineers, software engineers, product managers, and designers, to gather requirements, define project scope and prioritize feature backlogs
  • Contribute to the selection, evaluation, and implementation of software technologies, tools, and frameworks
  • Take ownership in project planning and stakeholder management
  • Mentor and guide junior engineers via code reviews and design sessions
What we offer
  • Comprehensive Total Rewards program
  • 401K savings plan with 6% match
  • performance and recognition-based incentives
  • tuition assistance
  • mental healthcare
  • fertility and adoption assistance
  • workplace flexibility
  • GEICO Flex program (work from anywhere in the US for up to four weeks per year)
  • Fulltime

Senior Staff Software Engineer - AI

GEICO is seeking an experienced Engineer with a passion for building high-perfor...
Location:
United States, Seattle, WA; Austin, TX; Palo Alto, CA; Chicago, IL; Dallas, TX
Salary:
110000.00 - 230000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements
  • Experience building and deploying ML systems in production with cross-functional engineering teams
  • Fluency in at least two modern languages such as Python, Go, Java, C++, or C# including object-oriented design
  • Experience architecting multi-component ML platforms using open-source/cloud-agnostic components: Datastores: PostgreSQL, NoSQL (MongoDB, Cassandra, CosmosDB); Streaming: Kafka, Flink, or Spark Streaming
  • Experience with end-to-end ML lifecycle: version control, CI/CD, Kubernetes, testing, monitoring, and production support
  • Experience with cloud providers (Azure, AWS or GCP) in production ML environments
  • Experience with observability tools and distributed systems monitoring, logging, tracing, and root cause analysis
  • Experience building multi-agent systems using LLMs and agentic frameworks (e.g., LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI)
  • Hands-on experience with RAG, semantic search, and vector databases (e.g., Milvus, pgvector, Qdrant, Elasticsearch)
  • Experience designing human-in-the-loop workflows and safety controls for autonomous systems
  • Strong architecture and design skills with ability to influence technical direction and roadmap
Job Responsibility
  • Design and build a multi-agent AI platform where specialized agents autonomously detect, diagnose, and resolve issues through agent-to-agent (A2A) collaboration
  • Develop intelligent agents using LLMs and agentic frameworks that coordinate detection, diagnostic, remediation, and knowledge tasks with minimal human intervention
  • Define agent interaction protocols, A2A communication standards, and evaluation frameworks for agent decision quality and autonomous action safety
  • Architect vector database solutions (Milvus, pgvector, Qdrant) for semantic search and RAG to enable context-aware agent decision-making
  • Build end-to-end ML pipelines for severity classification, anomaly detection, failure pattern recognition, and impact forecasting using observability data
  • Establish scalable orchestration infrastructure for multi-agent workflows with CI/CD, automated evaluation, canary releases, and rollback strategies
  • Implement monitoring for agent interactions, A2A communication patterns, decision quality, data drift, and system reliability
  • Lead technical architecture ensuring scalability, observability, and integration with existing alerting, logging, and monitoring systems
  • Define standards for agent safety, explainability, governance, and human-in-the-loop controls for high-impact automated actions
  • Partner with SRE, Product, and Engineering teams to translate reliability goals into measurable ML objectives and maintain pragmatic technical roadmaps
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime

Member of technical staff - Research - Agent

About H: H exists to push the boundaries of superintelligence with agentic AI. B...
Location:
France; United Kingdom, Paris; London
Salary:
Not provided
H Company
Expiration Date
Until further notice
Requirements
  • Senior Experience: Previous demonstrable role(s) as a Staff, Principal, or Senior Engineer (or equivalent Research Scientist) in a Frontier AI Lab with a proven track record of leading complex, end-to-end AI/ML projects from conception to production
  • Education / Publication: Preferably PhD (or equivalent research experience) in Machine Learning, Computer Science, or a related field, preferably with a strong publication record (e.g., NeurIPS, ICML, ICLR) in Computer Science
  • Core Expertise: Deep theoretical and practical expertise in Agentic AI and proven experience building, scaling, and shipping solutions involving foundation models (LLMs/VLMs)
  • Soft Skills: Collaborative: Enjoys collaboration and thrives in a teamwork-oriented, fast-paced research environment
  • High-Impact Communicator: Possesses impactful communication skills, with the ability to bridge the gap between research and engineering and articulate complex ideas clearly
  • Mission-Driven: Genuinely eager to explore and solve the new engineering and research challenges at the frontier of agentic AI
Job Responsibility
  • Research & Leadership: Design and develop new agents, proposing new research directions, e.g., combining state-of-the-art RL with foundation models (LLMs/VLMs)
  • Algorithm & Systems Design: Design, implement, and scale complex, high-performance systems for training large-scale agents. This includes both the foundational infrastructure and the novel algorithms, reward models, and sophisticated training environments
  • Research-to-Production: Collaborate closely with researchers and engineers to implement, test, and productionize new agent logics, learning algorithms, and system architectures
  • Evaluation & Reliability: Create, manage, and scale massive benchmarks and evaluation systems to rigorously track agent capabilities. You will own system reliability, scalability, and observability for our entire research infrastructure
  • Mentorship & Standards: Mentor and guide other engineers and researchers on the team, fostering technical excellence. You will establish and enforce engineering standards, tooling, and best practices for both code and research design
  • Innovation: Conduct thorough code and design reviews, champion technical innovation, and proactively address technical debt to accelerate the R&D lifecycle
What we offer
  • Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
  • Collaborate with a fun, dynamic, and multicultural team, working alongside world-class AI talent in a highly collaborative environment
  • Enjoy a competitive salary
  • Unlock opportunities for professional growth, continuous learning, and career development
  • Fulltime

Staff Software Engineer, Vehicle AI

GM is looking to hire highly skilled and experienced Staff Software Engineers to...
Location:
United States, Mountain View, California
Salary:
189300.00 - 290000.00 USD / Year
General Motors
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
  • 8+ years of professional software development experience, with a focus on large-scale distributed systems or AI/ML infrastructure
  • Expert proficiency in one or more programming languages such as Python, C++, Java, or Kotlin
  • Extensive experience designing, building, and deploying production-grade AI/ML models or intelligent agents
  • Demonstrated technical leadership in complex projects, including mentoring and driving cross-functional initiatives
Job Responsibility
  • Lead the architecture and implementation of next-generation AI agents, from conceptualization to production deployment
  • Drive technical direction and strategy for the AI agent platform, ensuring scalability, reliability, and performance
  • Mentor and guide junior and senior engineers, fostering a culture of technical excellence and best practices
  • Collaborate with Product Managers and other engineering teams to define requirements and deliver impactful solutions
  • Conduct complex code reviews, system design reviews, and provide constructive feedback
  • Identify and address technical debt, performance bottlenecks, and architectural challenges within the agent infrastructure
  • Stay current with the latest advancements in AI, machine learning, and software engineering to continually improve our technology stack
What we offer
  • Incentive pay program offers payouts based on company performance, job level, and individual performance
  • Company vehicle evaluation program
  • This job may be eligible for relocation benefits
  • Fulltime

Senior Staff Machine Learning Engineer - Clinical - AI Teams

We are looking for a Senior Staff Machine Learning Engineer to join the Clinical...
Location:
France, Paris
Salary:
Not provided
Doctolib
Expiration Date
Until further notice
Requirements
  • 10+ years in ML/AI with 3+ years at Staff+ or Principal level leading complex, multi-team technical initiatives
  • PhD in Computer Science, AI, Statistics, or related field (or equivalent research experience)
  • Deep expertise in at least two of: clinical NLP, LLM fine-tuning and evaluation, automatic speech recognition, RAG systems, or reinforcement learning
  • Expert in Python, PyTorch/Transformers for training, and vLLM or similar for inference, with a track record of deploying ML systems in production (AWS/GCP)
  • Exceptional communication skills, with the ability to align diverse stakeholders and explain complex technical decisions
Job Responsibility
  • Own the long-term technical roadmap for clinical ML systems, from model architecture selection to production deployment patterns
  • Drive strategic build vs. buy decisions, balancing custom development with foundation model APIs
  • Define standards for safe rollout in healthcare contexts: shadow testing, staged deployment, and human-in-the-loop workflows
  • Lead design and implementation of LLM-powered clinical agents (consultation summarization, clinical coding, evidence-grounded recommendations)
  • Establish rigorous evaluation frameworks using real-world clinical datasets, ensuring outputs meet clinical accuracy and safety standards
  • Build production infrastructure: model and prompt versioning, guardrails, uncertainty quantification, cost optimization, and observability
  • Define a bold strategy for online and offline performance objectives to reach new levels of healthcare professionals' satisfaction
  • Mentor Staff or Senior ML Engineers and Applied Scientists, elevating technical standards across the organization
  • Lead cross-functional initiatives spanning Product, Medical Affairs, Legal, and Compliance teams
  • Translate clinical needs into research questions and production systems that measurably improve patient outcomes
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from abroad for up to 10 days per year thanks to our flexibility days policy
  • Work Council subsidy to refund part of sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
  • Fulltime

Staff Software Engineer, AI Agent Platform

The Geico AI Agent Platform team is seeking an exceptional Staff Software Engine...
Location:
United States, Chevy Chase; New York City
Salary:
115000.00 - 260000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field; an advanced degree (master’s or Ph.D.) is highly desirable
  • 6+ years of hands-on experience in designing, implementing, and maintaining multi-tenant AI/ML systems and platforms in production environments
  • 6+ years of experience working with cloud platforms such as Azure and AWS
  • Extensive expertise in designing and deploying large-scale data pipelines and real-time inference systems and managing the end-to-end AI Agent and/or AI/ML system development lifecycles, including configuration, evaluation, monitoring, observability and AuthN/AuthR considerations
  • 6+ years of experience working with common backend systems & tools (e.g., Kubernetes, Temporal, OpenSearch, PostgreSQL, Redis, Neo4j, etc.)
  • Deep understanding of Docker, container optimization, and multi-stage builds
  • Experience with Prometheus, Grafana, OpenTelemetry, and distributed tracing
  • 3+ years of experience building front-end web applications using frameworks such as React and/or Next.js
  • Deep proficiency in programming languages such as Python, Java, Go, etc., with a strong emphasis on coding excellence
Job Responsibility
  • Architect and implement scalable multi-tenant backend systems for building AI agent workflows (including agent configuration, offline evaluation, synthetic data generation, workflow simulation, and an agent marketplace) using Azure Kubernetes Service (AKS), FastAPI, etc., ensuring economies of scale and controlling maintenance costs
  • Collaborate with Design team to architect and implement frontend experiences and workflows for onboarding both technical and non-technical stakeholders, maximizing user adoption and successful AI agent development
  • Develop observability frameworks to ensure 99.9%+ uptime for AI agent platforms through robust monitoring, alerting, and incident response procedures
  • Evaluate and (if desirable) integrate cutting-edge GenAI frameworks, libraries and vendors to maintain a state-of-the-art technology stack, including hybrid cloud solutions with AWS/GCP as backup or specialized use cases
  • Architect and implement scalable, high-performance machine learning platforms and systems capable of processing large data volumes and supporting real-time decision making and workflows
  • Oversee the end-to-end lifecycle of AI agent applications, ensuring robust testing, deployment, and ongoing monitoring
  • Ensure adherence to company production readiness standards, security protocols, and regulatory compliance throughout the development lifecycle
  • Continuously optimize platform performance, reducing latency and improving throughput for AI agent workloads
  • Design and implement backup, recovery, and business continuity plans for hosted platform applications & services
  • Design and maintain robust CI/CD pipelines for ML model deployment using Azure DevOps, GitHub Actions, and MLOps tools
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime