AI Research Engineer, Enterprise Evaluations

Scale

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:
179400.00 - 224250.00 USD / Year

Job Description:

Scale AI is seeking a technically rigorous and driven AI Research Engineer to join our Enterprise Evaluations team. This high-impact role is critical to our mission of delivering the industry's leading GenAI Evaluation Suite. You will be a hands-on contributor to the core systems that ensure the safety, reliability, and continuous improvement of LLM-powered workflows and agents for the enterprise.

Job Responsibility:

  • Partner with Scale’s Operations team and enterprise customers to translate ambiguity into structured evaluation data, guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems
  • Analyze feedback and collected data to identify patterns, refine evaluation frameworks, and establish iterative improvement loops that enhance the quality and relevance of human-curated assessments
  • Design, research, and develop LLM-as-a-Judge autorater frameworks and AI-assisted evaluation systems. This includes creating models that critique, grade, and explain agent outputs (e.g., RLAIF, model-judging-model setups), along with scalable evaluation pipelines and diagnostic tools
  • Pursue research initiatives that explore new methodologies for automatically analyzing, evaluating, and improving the behavior of enterprise agents, pushing the boundaries of how AI systems are assessed and optimized in real-world contexts

Requirements:

  • Bachelor’s degree in Computer Science, Electrical Engineering, a related field, or equivalent practical experience
  • 2+ years of experience in Machine Learning or Applied Research, focused on applied ML systems or evaluation infrastructure
  • Hands-on experience with Large Language Models (LLMs) and Generative AI in professional or research environments
  • Strong understanding of frontier model evaluation methodologies and the current research landscape
  • Proficiency in Python and major ML frameworks (e.g., PyTorch, TensorFlow)
  • Solid engineering and statistical analysis foundation, with experience developing data-driven methods for assessing model quality

Nice to have:

  • Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, or a related quantitative field
  • Published research in leading ML or AI conferences such as NeurIPS, ICML, ICLR, or KDD
  • Experience designing, building, or deploying LLM-as-a-Judge frameworks or other automated evaluation systems for complex models
  • Experience collaborating with operations or external teams to define high-quality human annotator guidelines
  • Expertise in ML research engineering, stochastic systems, observability, or LLM-powered applications for model evaluation and analysis
  • Experience contributing to scalable pipelines that automate the evaluation and monitoring of large-scale models and agents
  • Familiarity with distributed computing frameworks and modern cloud infrastructure

What we offer:

  • Comprehensive health, dental, and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity grant

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for AI Research Engineer, Enterprise Evaluations

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
Canada
Salary:
55.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • Health and wellbeing resources
  • Paid volunteer days

Senior Generative AI Engineer

The Citi Innovation Lab is a leader in creating new ideas, innovative technology...
Location:
Israel, Tel Aviv
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Hands-on experience with transformer-based models and their applications
  • Strong understanding of LLMs, including model selection, benchmarking, and optimization
  • Experience with RAG systems and vector databases
  • Proficiency in developing and deploying AI agents
  • Knowledge of open-source models and methods, including benchmarks for evaluating AI performance
  • Knowledge of security risks and mitigation strategies for autonomous AI agents, including OWASP guidelines
  • Proficiency in Python and experience with libraries such as Pandas, Tabula, and TensorFlow/PyTorch
  • Strong problem-solving skills and attention to detail
  • Excellent communication and documentation skills
Job Responsibility:
  • Develop and implement enterprise-scale, cutting-edge models such as visual document understanding and text2code
  • Implement and optimize vector-based retrieval systems for RAG, covering embedding models, ANN indexing, hybrid search, and re-ranking
  • Implement autonomous AI agents that perform adaptive, error-resistant data extraction and content validation tasks
  • Develop and deploy enterprise software applications using state-of-the-art practices such as microservices and modular code, and write unit and integration tests to ensure the accuracy and reliability of the AI applications
  • Ensure data privacy and security in all AI-driven processes, adhering to OWASP guidelines and Citi’s stringent authentication and authorization policies
  • Collaborate with cross-functional teams to integrate AI solutions into existing workflows
  • Document the development process and create comprehensive technical specifications
  • Manage and maintain AI applications, ensuring best practices in model management and versioning
  • Deploy the resulting AI applications using industrial-strength frameworks and processes, including Kubernetes and OpenShift, for scalable and efficient on-premises operations
  • Research, develop, and use transformer-based models for enhanced application performance
Employment Type:
Full-time

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
United States, Seattle
Salary:
49.00 - 75.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Degree completion date cannot be earlier than September 2026 - June 2027
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
  • Exhibit a solid grasp of algorithms and data structures
  • Demonstrate proficiency in Python programming and ability to write clean, efficient, and well-documented code
  • Experience working with large-scale datasets, including data preprocessing, augmentation, and scaling techniques
  • Expertise in managing data with Python libraries such as NumPy, Pandas, and Matplotlib, experience leveraging models from Hugging Face, and practical knowledge of applied machine learning and deep learning frameworks such as PyTorch
  • Demonstrated exposure to natural language processing (NLP) and Computer Vision (CV)
  • Familiarity with state-of-the-art research in machine learning and AI, as evidenced by relevant coursework, publications, or projects
Job Responsibility:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • Health and wellbeing resources
  • Paid volunteer days

Multimodal AI Engineer, Document Understanding

Join us and help shape the future of AI by redefining document workflows with AI...
Location:
United States, San Francisco
Salary:
Not provided
LlamaIndex
Expiration Date:
Until further notice
Requirements:
  • 3-7 years of experience in machine learning engineering or applied research
  • Strong software engineering fundamentals with production Python experience (modern tooling: uv, ruff, mypy, Pydantic)
  • Hands-on experience training, fine-tuning, or deploying ML models in production
  • Deep understanding of modern ML techniques, particularly in computer vision, NLP, or multimodal learning
  • Experience with at least one of: data pipeline development, model training/fine-tuning, or ML infrastructure
  • Ability to read and implement from research papers and technical specifications
  • Track record of executing with high intensity in fast-paced environments
  • Strong technical communication skills and comfort with open-source collaboration
Job Responsibility:
  • Develop, train, and optimize machine learning models for document structure understanding, table extraction, layout analysis, and multimodal content processing
  • Build robust data pipelines, evaluation frameworks, and experimentation infrastructure
  • Design and implement production ML systems that handle complex, real-world documents at scale
  • Stay current with latest advances in vision-language models, document AI, and multimodal learning
  • Collaborate with engineering teams to integrate ML innovations into production APIs
  • Contribute to both our open-source frameworks and enterprise offerings
  • Drive technical decisions while balancing research exploration with product delivery
What we offer:
  • Competitive base salary and equity compensation
  • Comprehensive medical/dental/vision coverage for you and your family
  • Unlimited paid time off policy
  • Daily catered lunch and snacks in the San Francisco office
  • Budget for conferences, research materials, and professional development
  • Access to cutting-edge compute resources and research tools
Employment Type:
Full-time

Staff Software Engineer – Forward Deployed

We are seeking a skilled Software Engineer who will design, build, and maintain ...
Location:
China, Shanghai; Dalian; Wuhan
Salary:
Not provided
Pfizer
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or related field with 8-12 years of relevant experience
  • AI-Augmented Development: optimize AI tool usage, train engineers on AI-augmented workflows, evaluate new AI development tools, establish practices that balance AI speed with verification rigor
  • Business Immersion: rapidly acquire domain expertise, translate between business and engineering, mentor engineers on immersion
  • Data Integration: navigate complex enterprise data landscapes, build relationships to gain data access, handle undocumented schemas, build robust integration solutions, mentor engineers on data integration
  • Full-Stack Development: build complete applications rapidly across any technology stack, select the right tools, balance technical debt with delivery speed, mentor engineers on full-stack development
  • Multi-Audience Communication: influence through communication at all levels, handle difficult conversations skillfully, train engineers on effective communication, represent teams across the function
  • Problem Discovery: seek out undefined problems, embed with users to discover latent needs, coach engineers on problem discovery techniques, turn ambiguity into clear problem statements
  • Rapid Prototyping & Validation: lead rapid delivery initiatives, coach on prototype-first approaches, establish trust through consistent fast delivery, define clear criteria for prototype-to-production transitions
  • Site Reliability Engineering: define reliability standards, drive post-incident improvements systematically, design capacity planning processes, mentor engineers on SRE practices
  • Stakeholder Management: influence senior stakeholders, manage complex stakeholder landscapes with competing agendas, build trust rapidly with new stakeholders, shield teams from organizational friction
Job Responsibility:
  • Delivery: Lead technical delivery of complex projects across multiple teams, unblock others through hands-on contributions, ensure engineering quality
  • AI: Design AI-augmented engineering workflows for your area, evaluate new AI tools, train engineers on effective AI usage, balance speed with verification
  • People: Coach multiple engineers on career growth, lead hiring for technical roles across your area, shape team technical culture
  • Business: Drive business outcomes through technical solutions across your area, influence product roadmaps, partner effectively with business stakeholders
  • Process: Drive process efficiency within your team, coordinate cross-functional technical work, lead retrospectives
  • Documentation: Design documentation strategies for your projects, ensure knowledge persists beyond individuals, write specifications that enable effective collaboration
Employment Type:
Full-time

Field Research Engineer

We're seeking a Field Research Engineer to serve as Prolific's technical researc...
Location:
United States
Salary:
Not provided
Prolific
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in AI/ML research, research engineering, or applied ML
  • Strong knowledge of LLM evaluation methodologies, data collection design, and human feedback approaches
  • Experience designing AI-assisted or model-in-the-loop workflows
  • Track record of executing research or applied AI/ML projects with clear outcomes
  • Able to build quickly when needed; comfortable prototyping solutions and building own tooling
  • Strong communicator across contexts
  • Based in US, Canada, or Mexico (or able to work US timezone hours)
Job Responsibility:
  • Serve as Prolific's technical research partner on customer engagements with AI labs and enterprise AI teams
  • Advise and prototype bespoke solutions like evaluation methodology, rubric design, data collection approaches, quality frameworks, verifiers and validation frameworks, as well as RL environments
  • Own methodology for projects - from scoping through delivery - working alongside our services team
  • Build trusted relationships with customer research and engineering teams
  • Surface patterns from customer work that should become platform capabilities, partnering with Product and Engineering to shape roadmap priorities
  • Work with Services and Account teams to qualify opportunities and shape proposals
  • Share learnings across the Research Engineering function to strengthen our collective expertise
  • Contribute to Prolific's research publications and external collaborations as opportunities arise
  • Collaborate with product teams to translate research insights into practical applications
  • Mentor team members on advanced AI concepts and emerging research
What we offer:
  • Competitive salary
  • Benefits
  • Remote working
  • Equity
  • Opportunity to earn a cash variable element, such as a bonus or commission
Employment Type:
Full-time

Product Manager III (Research)

As a Product Manager for Emerging Technologies with Everseen, you will act as a ...
Location:
Timișoara; Belgrade
Salary:
Not provided
Everseen
Expiration Date:
Until further notice
Requirements:
  • 6 to 10 years of experience as a Product Manager, ideally with exposure to technology-driven products or research collaborations
  • Bachelor’s degree; advanced certifications preferred
  • Experience developing positioning, packaging, and pricing strategies and plans
  • Proficiency in managing the full product lifecycle, from ideation and development to launch and post-market analysis
  • Ability to develop and articulate a compelling product vision aligned with cutting-edge technology trends
  • Commitment to continuous improvement, process optimization, and operational excellence in dynamic environments
  • Strong analytical and strategic thinking skills with the ability to evaluate complex markets and emerging opportunities
  • A passion for innovation, research, and applying new technologies to solve business challenges
  • Excellent communication and stakeholder management capabilities, with a track record of effective cross-functional leadership
Job Responsibility:
  • Serve as a subject matter expert on emerging technology trends and advise senior leadership on strategic opportunities
  • Product Strategy & Roadmap: Defines vision for initiatives leveraging emerging technologies; drives strategic alignment
  • Requirement Definition: Leads business requirements elicitation for complex initiatives
  • Documentation: Leads strategic product communications and documentation
  • Quality Assurance: Champion quality and process improvements across product development cycles
  • User Experience: Oversee user research and experience standards to ensure innovations meet customer needs and expectations
  • Research Activity: Leads comprehensive competitor and market analysis across products; advises on strategy and evaluates business opportunities
  • Collaborate closely with Research teams to identify, evaluate, and translate technological advancements into viable product opportunities
Employment Type:
Full-time

Senior AI Engineer

Guidepoint is seeking an experienced Senior AI Engineer to join our Toronto-base...
Location:
Canada, Toronto
Salary:
Not provided
Modoras Accounting Syd
Expiration Date:
Until further notice
Requirements:
  • 6+ years of professional experience (or 5+ with a Master’s degree) designing, building, and scaling distributed, production-grade backend systems
  • 2+ years building and operating Generative AI and agentic systems in production
  • Strong software engineering fundamentals in Python, including building and scaling REST APIs using frameworks such as FastAPI, with experience in asynchronous programming and microservices
  • Hands-on experience building enterprise AI agents and workflows using LLM platforms such as OpenAI, Anthropic (Claude), or Google Gemini, and frameworks like LangChain or agent SDKs
  • Experience building and operating within the enterprise AI ecosystem, including custom GPTs or agents, agent builders, connectors/apps, and application or agent SDKs (e.g., OpenAI Apps SDK, ChatKit, or equivalents)
  • Experience designing and operating agent integration layers (e.g., MCP servers or similar) that connect AI agents to internal APIs, tools, and services, with secure authentication and authorization using enterprise identity platforms such as Okta, Microsoft Entra ID, or OAuth-based systems
  • Strong understanding of AI governance, compliance, and responsible AI practices, including access control, auditability, data handling, and secure deployment of AI systems in enterprise environments
  • Direct experience with RAG, vector search using databases such as Elasticsearch, multi-agent AI systems, tool-calling agents, prompt engineering, and agent evaluation in production environments
  • Cloud-native experience deploying and operating containerized applications on Azure (preferred) or AWS/GCP using Docker and Kubernetes
  • Proven ability to lead complex technical initiatives, make sound architectural decisions, and mentor engineers building production-ready AI systems
Job Responsibility:
  • Design, build, and operate scalable, low-latency backend services and REST APIs that power Generative AI capabilities, including retrieval-augmented generation (RAG) pipelines, vector search, and enterprise-grade agentic systems
  • Own the full lifecycle of AI applications and agents, from system architecture and development to CI/CD, deployment, agent evaluation, monitoring, and ongoing optimization in production
  • Build production-grade research agents and enterprise AI workflows that integrate LLMs with proprietary knowledge, vector databases (e.g., Elasticsearch), internal tools, external APIs, and real-time data
  • Design and operate multi-agent AI systems, including tool-calling agents and agent orchestration patterns, to support complex research and enterprise workflows
  • Apply AIOps best practices for building, evaluating, deploying, and operating AI agents with strong observability, reliability, and quality controls
  • Continuously improve retrieval and generation quality using prompt engineering, retrieval tuning, re-ranking, advanced chunking strategies, and hallucination reduction techniques
  • Provide technical leadership through design discussions, code reviews, and mentorship, and partner closely with product and business stakeholders to influence the AI roadmap
What we offer:
  • Paid Time Off
  • Comprehensive benefits plan
  • Company RRSP Match
  • Development opportunities through the LinkedIn Learning platform