AI Engineer, Quality

United States, San Francisco · Employment contract · Job Posted May 04, 2026

Job Description

Fieldguide is building AI agents for the most complex audit and advisory workflows. We're a San Francisco-based Vertical AI company building in a $100B+ market undergoing rapid transformation. Over 50 of the top 100 accounting and consulting firms trust us to power their most mission-critical work. We're backed by Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, Elad Gil, and other top-tier investors.

As an AI Engineer, Quality, you will own the evaluation infrastructure that ensures our AI agents perform reliably at enterprise scale. This role is 100% focused on making evaluations a first-class engineering capability: building the unified platform, automated pipelines, and production feedback loops that let us evaluate any new model against all critical workflows within hours. You'll work at the intersection of ML engineering, observability, and quality assurance to ensure our agents meet the rigorous standards our customers demand.

Job Responsibility

  • Design and build a unified evaluation platform that serves as the single source of truth for all of our agentic systems and audit workflows
  • Build observability systems that surface agent behavior, trace execution, and failure modes in production, and feedback loops that turn production failures into first-class evaluation cases
  • Own the evaluation infrastructure stack including integration with LangSmith and LangGraph
  • Translate customer problems into concrete agent behaviors and workflows
  • Integrate and orchestrate LLMs, tools, retrieval systems, and logic into cohesive, reliable agent experiences
  • Build automated pipelines that evaluate new models against all critical workflows within hours of release
  • Design evaluation harnesses for our most complex agentic systems and workflows
  • Implement comparison frameworks that measure effectiveness, consistency, latency, and cost across model versions
  • Design guardrails and monitoring systems that catch quality regressions before they reach customers
  • Use AI as core leverage in how you design, build, test, and iterate
  • Prototype quickly to resolve uncertainty, then harden systems for enterprise-grade reliability
  • Build evaluations, feedback mechanisms, and guardrails so agents improve over time
  • Work with SMEs and ML Engineers to create evaluation datasets by curating production traces
  • Design prompts, retrieval pipelines, and agent orchestration systems that perform reliably at scale
  • Define and document evaluation standards, best practices, and processes for the engineering organization
  • Advocate for evaluation-driven development and make it easy for the team to write and run evals
  • Partner with product and ML engineers to integrate evaluation requirements into agent development from day one
  • Take full ownership of large product areas rather than executing on narrow tasks

Requirements

  • Multiple years of experience shipping production software in complex, real-world systems
  • Experience with TypeScript, React, Python, and Postgres
  • Built and deployed LLM-powered features serving production traffic
  • Implemented evaluation frameworks for model outputs and agent behaviors
  • Designed observability or tracing infrastructure for AI/ML systems
  • Worked with vector databases, embedding models, and RAG architectures
  • Experience with evaluation platforms (LangSmith, Langfuse, or similar)
  • Comfort operating in ambiguity and taking responsibility for outcomes

Nice to have

  • Experience with audit and accounting workflows

What we offer

  • Competitive compensation packages with meaningful ownership
  • Flexible PTO
  • 401k
  • Wellness benefits, including a bundle of free therapy sessions
  • Technology & Work from Home reimbursement
  • Flexible work schedules

Similar Jobs for AI Engineer, Quality

8 matching positions

Delivery Quality Engineer, AI Business

As a Delivery Quality Engineer within Prolific AI Data Services, you will be the...
Location: Mexico, Mexico City
Salary: Not provided
Company: Prolific (prolific.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in quality engineering, data or annotation quality, analytics engineering, trust and integrity, or ML/LLM evaluation operations
  • Strong proficiency in Python and SQL, with comfort applying statistical concepts such as sampling strategies, confidence levels, and agreement metrics
  • A proven track record of turning ambiguous or messy quality problems into clear metrics, automated checks, and durable process improvements
  • Strong quality systems thinking, with the ability to translate complex edge cases into clear rules, tests, rubrics, and governance mechanisms
  • Hands-on experience instrumenting workflows and implementing pragmatic automation that catches quality and integrity issues early
  • Demonstrated ability to influence cross-functional teams (Product, Engineering, Operations, Client teams) and drive change without direct authority
  • Strong customer empathy, with a clear understanding of what “useful, trustworthy data” means for research, AI training, and evaluation use cases
Job Responsibility
  • Own end-to-end quality design for Prolific managed service studies, including rubrics, acceptance criteria, defect taxonomies, severity models, and clear definitions of done
  • Define, implement, and maintain quality measurement systems, including sampling plans, golden sets, calibration protocols, agreement targets, adjudication workflows, and drift detection
  • Build and deploy automated quality checks and launch gates using Python and SQL, such as schema and format validation, completeness checks, anomaly detection, consistency testing, and label distribution monitoring
  • Design and run launch readiness processes, including pre-launch checks, pilot calibration, ramp criteria, full-launch thresholds, and pause/rollback mechanisms
  • Partner with Product and Engineering to embed in-study quality controls and authenticity checks into workflows, tooling, and escalation paths
  • Write and continuously improve guidelines and training materials to keep participants, reviewers, and internal teams aligned on evolving quality standards
  • Investigate quality and integrity issues end to end, running root-cause analysis across guidelines, UX, screening, training, and operations, and driving corrective and preventive actions (CAPAs)
  • Build dashboards and operating cadences to track defect rates, rework, throughput versus quality trade-offs, integrity events, and SLA adherence
  • Lead calibration sessions and coach QA leads and reviewers to improve decision consistency, rubric application, and overall quality judgement
  • Translate one-off quality fixes into repeatable, scalable playbooks across customers, programs, and study types
What we offer
  • competitive salary
  • benefits
  • remote working
  • impactful, mission-driven culture
  • equity
  • opportunity to earn a cash variable element, such as a bonus or commission
Employment type: Full-time

Data Quality Engineer, AI Business

As a Data Quality Engineer within Prolific AI Data Services, you will be the qua...
Salary: Not provided
Company: Prolific (prolific.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in quality engineering, data or annotation quality, analytics engineering, trust and integrity, or ML/LLM evaluation operations
  • Strong proficiency in Python and SQL, with comfort applying statistical concepts such as sampling strategies, confidence levels, and agreement metrics
  • A proven track record of turning ambiguous or messy quality problems into clear metrics, automated checks, and durable process improvements
  • Strong quality systems thinking, with the ability to translate complex edge cases into clear rules, tests, rubrics, and governance mechanisms
  • Hands-on experience instrumenting workflows and implementing pragmatic automation that catches quality and integrity issues early
  • Demonstrated ability to influence cross-functional teams (Product, Engineering, Operations, Client teams) and drive change without direct authority
  • Strong customer empathy, with a clear understanding of what “useful, trustworthy data” means for research, AI training, and evaluation use cases
Job Responsibility
  • Own end-to-end quality design for Prolific managed service studies, including rubrics, acceptance criteria, defect taxonomies, severity models, and clear definitions of done
  • Define, implement, and maintain quality measurement systems, including sampling plans, golden sets, calibration protocols, agreement targets, adjudication workflows, and drift detection
  • Build and deploy automated quality checks and launch gates using Python and SQL, such as schema and format validation, completeness checks, anomaly detection, consistency testing, and label distribution monitoring
  • Design and run launch readiness processes, including pre-launch checks, pilot calibration, ramp criteria, full-launch thresholds, and pause/rollback mechanisms
  • Partner with Product and Engineering to embed in-study quality controls and authenticity checks into workflows, tooling, and escalation paths
  • Write and continuously improve guidelines and training materials to keep participants, reviewers, and internal teams aligned on evolving quality standards
  • Investigate quality and integrity issues end to end, running root-cause analysis across guidelines, UX, screening, training, and operations, and driving corrective and preventive actions (CAPAs)
  • Build dashboards and operating cadences to track defect rates, rework, throughput versus quality trade-offs, integrity events, and SLA adherence
  • Lead calibration sessions and coach QA leads and reviewers to improve decision consistency, rubric application, and overall quality judgement
  • Translate one-off quality fixes into repeatable, scalable playbooks across customers, programs, and study types

Quality Engineer – AI & Data Platforms

The role is responsible for planning, executing, and automating functionality, i...
Location: India, Bangalore; Coimbatore
Salary: Not provided
Company: Soliton (solitontech.com)
Expiration Date: Until further notice
Requirements
  • Strong experience in both manual and automated testing methodologies
  • Hands-on experience with test automation tools such as Selenium and Pytest
  • Experience in testing APIs, backend services, and enterprise system integrations
  • Familiarity with cloud-native environments, including Kubernetes, AKS, and OpenShift
  • Solid understanding of defect lifecycle management using tools such as Jira or Azure DevOps
  • Ability to work effectively within cross-functional Agile teams
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • 2-5 years of experience in software quality assurance or testing roles
  • Minimum 1 year of experience in automation tools such as Selenium and Pytest
  • Experience testing complex, distributed, or AI-enabled systems
Job Responsibility
  • Develop and execute comprehensive test plans and test cases aligned with CX Reimagination objectives
  • Define and validate acceptance criteria for AI-driven workflows and multi-agent systems
  • Validate core application features and integrations with enterprise data sources
  • Perform end-to-end, integration, and regression testing to ensure system stability after enhancements and releases
  • Design, develop, and maintain automated test scripts using tools such as Selenium and Pytest
  • Conduct performance and load testing for Kubernetes-based deployments, including AKS and OpenShift environments
  • Identify, document, prioritize, and track defects using tools like Jira or Azure DevOps
  • Collaborate closely with AI engineers, developers, and DevOps teams to ensure timely defect resolution
  • Validate usability, consistency, and intuitiveness of UI/UX for internal and external users, coordinating with relevant teams where applicable
  • Validate compliance with AI governance policies, including fairness, transparency, and data privacy
What we offer
  • Flexible work hours
  • Special support for mothers
  • Profit sharing starting from the second year
  • Health insurance for employees and families
  • Gym and cycle allowance

Lead Quality Engineer Modern Test Automation & AI

The IT Quality Tech Lead Analyst is a mid-level position responsible for hands- ...
Location: Canada, Mississauga
Salary: 120800.00 - 170800.00 USD / Year
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 9-13 years of relevant testing experience with proficiency in UI, API & Responsive/Mobile testing
  • Highly experienced in one of the following: Cypress automation (JavaScript) or Selenium with BDD tools like Cucumber
  • Experience in creating test suites, test suite execution and test reporting
  • Understanding of financial market concepts is essential
  • Knowledge of the Software Development Lifecycle (SDLC), QA methodologies and quality processes, relevant operating systems, languages, and database tools, as well as defect tracking, change management, and automation tools
  • Bachelor's degree required; Master's degree preferred
Job Responsibility
  • Ensure essential procedures are followed and help define operating standards and processes
  • Has the ability to operate with a limited level of direct supervision
  • Experience in AI-driven test automation
  • Takes ownership of tasks assigned and reports to senior management appropriately at regular agreed intervals
  • Build and enhance scalable test automation frameworks that support efficient test execution and maintenance
  • Write, execute and run manual/automated test cases regularly and analyze test results, logging any defects and providing detailed reports
  • Integrate automated tests within the CI/CD pipeline ensuring that tests run continuously with each deployment
  • Experience in API/ database testing
  • Work closely with development, manual QA, and product teams to understand requirements, features, and testing needs in Agile environments
  • Utilize tools like JIRA to identify, log and prioritize defects
Employment type: Full-time

Lead AI Engineer (Generative AI)

Location: Australia, Melbourne
Salary: 160000.00 - 170000.00 AUD / Year
Company: Salt (welovesalt.com)
Expiration Date: Until further notice
Requirements
  • 2-4 years of experience applying modern Generative AI in real delivery environments
  • A strong track record of providing technical leadership
  • A strong background in software engineering before recently moving into a more AI-focused role
  • Deep expertise in Generative AI, large language models, retrieval‑augmented generation (RAG), and modern machine learning techniques
  • Strengths in prompt engineering, offline/online evaluation, safety guardrails, and telemetry‑driven improvement
  • Practical experience with RAG, embeddings/vector search, and tool‑use/function‑calling orchestration
  • The ability to define trustworthy AI metrics
  • A software engineering background using technologies such as React, TypeScript, and React Native
  • AWS experience / certifications
  • Strong knowledge of CI/CD
Job Responsibility
  • Accelerating the adoption of Generative AI for this company, leading the adoption of AI‑first engineering practices and an 'AI-first' mindset
  • Providing technical leadership and shaping AI-enabled solutions
  • Reporting into the Head of Technology and working closely with the Head of Enterprise AI
  • Collaborating with a variety of cross-functional teams, including Engineering, QA, BA, Product Management, Senior Business Stakeholders, Data Science, Cyber Security, and Enterprise Architecture
  • Coaching and training engineers and leaders in the organisation in safe, effective AI‑assisted development
  • Encouraging teams to embed AI‑driven thinking in solution design and delivery
  • Defining and helping to operationalise trustworthy metrics to measure the impact of AI features, and developer productivity uplifts
  • Producing AI-augmented solution designs when required
  • Driving engineering excellence by uplifting standards, modernising technical practices, and championing AI‑aligned SDLC patterns
What we offer
  • Super
  • Bonus
Employment type: Full-time

Senior AI Engineer (Agentic AI / LLM Engineering)

We’re partnering with a rapidly growing, innovation-focused organization that is...
Location: United States
Salary: Not provided
Company: Zeektek (zeektek.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in software engineering, data engineering, or AI/ML engineering
  • Hands-on experience building applications using LLMs (e.g., GPT, open-source models)
  • Experience with agentic AI frameworks (e.g., LangChain, AutoGen, CrewAI, or similar)
  • Strong programming skills in Python
  • Experience working with large datasets and distributed data platforms
  • Familiarity with Databricks or similar modern data platforms
  • Experience building production-grade AI systems (not just POCs)
  • Strong understanding of prompt engineering and LLM optimization techniques
Job Responsibility
  • Design and build agentic AI systems leveraging large language models (LLMs)
  • Develop scalable AI solutions using modern data and AI platforms (Databricks preferred)
  • Translate business problems into production-ready AI workflows and applications
  • Collaborate with product and architecture teams to define AI use cases and technical approaches
  • Implement and optimize prompt engineering strategies, token usage and cost efficiency, and model performance and response quality
  • Work with large-scale datasets to support training, fine-tuning, and inference workflows
  • Contribute to the development of AI engineering standards, frameworks, and best practices
  • Partner with data science teams to integrate models into production environments
What we offer
  • Weekly Direct Deposit
  • 401K Matching
  • Competitive medical, dental and vision insurance
  • Consistent communication throughout your project
  • ZeekTek Referral Program

Staff AI Engineer - Agentic AI Systems

As a Staff AI Engineer, you will play a key role in designing and delivering hig...
Location: India, Bengaluru, Karnataka, India | Hyderabad, Telangana, India | Pune, Maharashtra, India
Salary: Not provided
Company: Teradata (teradata.com)
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science or equivalent from a recognized institution
  • 8+ years of experience in backend services, distributed systems, or data platform development
  • Strong proficiency in Java, Go, or Python for service development
  • Deep understanding of design principles, distributed system patterns, and service architecture
  • Hands-on experience designing and developing RESTful APIs
  • Experience with SQL and NoSQL databases and data modelling
  • Strong debugging, problem solving, and troubleshooting skills
  • Experience with modern containerization and orchestration tools such as Kubernetes
  • Knowledge of public cloud platforms
  • Experience with AI productivity tools (e.g., GitHub Copilot)
Job Responsibility
  • Design, architect, develop, and maintain high quality systems, services, and applications with an emphasis on scalability, reliability, and performance
  • Collaborate with cross-functional engineers and product partners to shape architecture and consistently deliver end to end features
  • Build and integrate robust RESTful APIs, ensuring security, data consistency, and maintainability
  • Work with SQL and NoSQL databases to implement efficient data models and service access patterns
  • Apply and experiment with AI/ML technologies, including agentic AI and large language models (LLMs)
  • Use AI powered engineering tools to improve development quality, speed, and productivity
  • Mentor engineers, supporting them in technical planning, implementation, and best practices
  • Identify and resolve system performance bottlenecks, optimizing code, architecture, and infrastructure
  • Write unit and integration tests and participate in code reviews to uphold engineering excellence
  • Investigate production issues, ensuring timely and effective solutions
Employment type: Full-time

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...
Location: Canada, Markham
Salary: 106400.00 - 159600.00 CAD / Year
Company: AMD (amd.com)
Expiration Date: Until further notice
Requirements
  • Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
  • Bachelor’s, Master’s, or Ph.D. degree in Computer/Software Engineering, Computer Science, or a related technical discipline
Job Responsibility
  • Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
  • Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
  • Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
  • Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
  • Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
  • Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
  • Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
  • Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
  • Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
  • Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.
Employment type: Full-time