AI Research Engineer, Enterprise Evaluations

Scale

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

179400.00 - 224250.00 USD / Year

Job Description:

Scale AI is seeking a technically rigorous and driven AI Research Engineer to join our Enterprise Evaluations team. This high-impact role is critical to our mission of delivering the industry's leading GenAI Evaluation Suite. You will be a hands-on contributor to the core systems that ensure the safety, reliability, and continuous improvement of LLM-powered workflows and agents for the enterprise.

Job Responsibilities:

  • Partner with Scale’s Operations team and enterprise customers to translate ambiguity into structured evaluation data, guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems
  • Analyze feedback and collected data to identify patterns, refine evaluation frameworks, and establish iterative improvement loops that enhance the quality and relevance of human-curated assessments
  • Design, research, and develop LLM-as-a-Judge autorater frameworks and AI-assisted evaluation systems. This includes creating models that critique, grade, and explain agent outputs (e.g., RLAIF, model-judging-model setups), along with scalable evaluation pipelines and diagnostic tools
  • Pursue research initiatives that explore new methodologies for automatically analyzing, evaluating, and improving the behavior of enterprise agents, pushing the boundaries of how AI systems are assessed and optimized in real-world contexts

Requirements:

  • Bachelor’s degree in Computer Science, Electrical Engineering, a related field, or equivalent practical experience
  • 2+ years of experience in Machine Learning or Applied Research, focused on applied ML systems or evaluation infrastructure
  • Hands-on experience with Large Language Models (LLMs) and Generative AI in professional or research environments
  • Strong understanding of frontier model evaluation methodologies and the current research landscape
  • Proficiency in Python and major ML frameworks (e.g., PyTorch, TensorFlow)
  • Solid engineering and statistical analysis foundation, with experience developing data-driven methods for assessing model quality

Nice to have:

  • Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, or a related quantitative field
  • Published research in leading ML or AI conferences such as NeurIPS, ICML, ICLR, or KDD
  • Experience designing, building, or deploying LLM-as-a-Judge frameworks or other automated evaluation systems for complex models
  • Experience collaborating with operations or external teams to define high-quality human annotator guidelines
  • Expertise in ML research engineering, stochastic systems, observability, or LLM-powered applications for model evaluation and analysis
  • Experience contributing to scalable pipelines that automate the evaluation and monitoring of large-scale models and agents
  • Familiarity with distributed computing frameworks and modern cloud infrastructure

What we offer:

  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity grant

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for AI Research Engineer, Enterprise Evaluations

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location
Canada
Salary:
55.00 USD / Hour
Atlassian
Expiration Date
Until further notice
Requirements
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
Job Responsibility
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer
  • health and wellbeing resources
  • paid volunteer days

Senior Generative AI Engineer

The Citi Innovation Lab is a leader in creating new ideas, innovative technology...
Location
Israel, Tel Aviv
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Hands-on experience with transformer-based models and their applications
  • Strong understanding of LLMs, including model selection, benchmarking, and optimization
  • Experience with RAG systems and vector databases
  • Proficiency in developing and deploying AI agents
  • Knowledge of open-source models and methods, including benchmarks for evaluating AI performance
  • Knowledge of security risks and mitigation strategies for autonomous AI agents, including OWASP guidelines
  • Proficiency in Python and experience with libraries such as Pandas, Tabula, and TensorFlow/PyTorch
  • Strong problem-solving skills and attention to detail
  • Excellent communication and documentation skills
Job Responsibility
  • Develop and implement enterprise-scale, cutting-edge models such as visual document understanding and text2code
  • Implement and optimize vector-based retrieval systems for RAG, covering embedding models, ANN indexing, hybrid search, and re-ranking
  • Implement autonomous AI agents for adaptive, error-resistant data extraction and content validation tasks
  • Develop and deploy enterprise software applications using state-of-the-art practices such as microservices and modular code, and write unit and integration tests to ensure the accuracy and reliability of the AI applications
  • Ensure data privacy and security in all AI-driven processes, adhering to OWASP guidelines and Citi’s stringent authentication and authorization policies
  • Collaborate with cross-functional teams to integrate AI solutions into existing workflows
  • Document the development process and create comprehensive technical specifications
  • Manage and maintain AI applications, ensuring best practices in model management and versioning
  • Deploy the resulting AI applications using industrial-strength frameworks and processes, including Kubernetes and OpenShift, for scalable and efficient on-premises operations
  • Research, develop, and use transformer-based models for enhanced application performance
Employment Type:
Full-time

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location
United States, Seattle
Salary:
49.00 - 75.00 USD / Hour
Atlassian
Expiration Date
Until further notice
Requirements
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Degree completion date cannot be earlier than September 2026 - June 2027
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
  • Exhibit a solid grasp of algorithms and data structures
  • Demonstrate proficiency in Python programming and ability to write clean, efficient, and well-documented code
  • Experience working with large-scale datasets, including data preprocessing, augmentation, and scaling techniques
  • Expertise in managing data with Python libraries such as NumPy, Pandas, and Matplotlib, experience leveraging models from Hugging Face, and practical knowledge of applied machine learning and deep learning frameworks such as PyTorch
  • Demonstrated exposure to natural language processing (NLP) and Computer Vision (CV)
  • Familiarity with state-of-the-art research in machine learning and AI, as evidenced by relevant coursework, publications, or projects
Job Responsibility
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer
  • health and wellbeing resources
  • paid volunteer days

Multimodal AI Engineer, Document Understanding

Join us and help shape the future of AI by redefining document workflows with AI...
Location
United States, San Francisco
Salary:
Not provided
LlamaIndex
Expiration Date
Until further notice
Requirements
  • 3-7 years of experience in machine learning engineering or applied research
  • Strong software engineering fundamentals with production Python experience (modern tooling: uv, ruff, mypy, Pydantic)
  • Hands-on experience training, fine-tuning, or deploying ML models in production
  • Deep understanding of modern ML techniques, particularly in computer vision, NLP, or multimodal learning
  • Experience with at least one of: data pipeline development, model training/fine-tuning, or ML infrastructure
  • Ability to read and implement from research papers and technical specifications
  • Track record of executing with high intensity in fast-paced environments
  • Strong technical communication skills and comfort with open-source collaboration
Job Responsibility
  • Develop, train, and optimize machine learning models for document structure understanding, table extraction, layout analysis, and multimodal content processing
  • Build robust data pipelines, evaluation frameworks, and experimentation infrastructure
  • Design and implement production ML systems that handle complex, real-world documents at scale
  • Stay current with latest advances in vision-language models, document AI, and multimodal learning
  • Collaborate with engineering teams to integrate ML innovations into production APIs
  • Contribute to both our open-source frameworks and enterprise offerings
  • Drive technical decisions while balancing research exploration with product delivery
What we offer
  • Competitive base salary and equity compensation
  • Comprehensive medical/dental/vision coverage for you and your family
  • Unlimited paid time off policy
  • Daily catered lunch and snacks in the San Francisco office
  • Budget for conferences, research materials, and professional development
  • Access to cutting-edge compute resources and research tools
Employment Type:
Full-time

Staff Software Engineer – Forward Deployed

We are seeking a skilled Software Engineer who will design, build, and maintain ...
Location
China, Shanghai; Dalian; Wuhan
Salary:
Not provided
Pfizer
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field with 8-12 years of relevant experience
  • AI-Augmented Development: optimize AI tool usage, train engineers on AI-augmented workflows, evaluate new AI development tools, establish practices that balance AI speed with verification rigor
  • Business Immersion: rapidly acquire domain expertise, translate between business and engineering, mentor engineers on immersion
  • Data Integration: navigate complex enterprise data landscapes, build relationships to gain data access, handle undocumented schemas, build robust integration solutions, mentor engineers on data integration
  • Full-Stack Development: build complete applications rapidly across any technology stack, select the right tools, balance technical debt with delivery speed, mentor engineers on full-stack development
  • Multi-Audience Communication: influence through communication at all levels, handle difficult conversations skillfully, train engineers on effective communication, represent teams across the function
  • Problem Discovery: seek out undefined problems, embed with users to discover latent needs, coach engineers on problem discovery techniques, turn ambiguity into clear problem statements
  • Rapid Prototyping & Validation: lead rapid delivery initiatives, coach on prototype-first approaches, establish trust through consistent fast delivery, define clear criteria for prototype-to-production transitions
  • Site Reliability Engineering: define reliability standards, drive post-incident improvements systematically, design capacity planning processes, mentor engineers on SRE practices
  • Stakeholder Management: influence senior stakeholders, manage complex stakeholder landscapes with competing agendas, build trust rapidly with new stakeholders, shield teams from organizational friction
Job Responsibility
  • Delivery: Lead technical delivery of complex projects across multiple teams, unblock others through hands-on contributions, ensure engineering quality
  • AI: Design AI-augmented engineering workflows for your area, evaluate new AI tools, train engineers on effective AI usage, balance speed with verification
  • People: Coach multiple engineers on career growth, lead hiring for technical roles across your area, shape team technical culture
  • Business: Drive business outcomes through technical solutions across your area, influence product roadmaps, partner effectively with business stakeholders
  • Process: Drive process efficiency within your team, coordinate cross-functional technical work, lead retrospectives
  • Documentation: Design documentation strategies for your projects, ensure knowledge persists beyond individuals, write specifications that enable effective collaboration
Employment Type:
Full-time

Field Research Engineer

We're seeking a Field Research Engineer to serve as Prolific's technical researc...
Location
United States
Salary:
Not provided
Prolific
Expiration Date
Until further notice
Requirements
  • 3+ years of experience in AI/ML research, research engineering, or applied ML
  • Strong knowledge of LLM evaluation methodologies, data collection design, and human feedback approaches
  • Experience designing AI-assisted or model-in-the-loop workflows
  • Track record of executing research or applied AI/ML projects with clear outcomes
  • Can build quickly when needed - comfortable prototyping solutions and building own tooling
  • Strong communicator across contexts
  • Based in US, Canada, or Mexico (or able to work US timezone hours)
Job Responsibility
  • Serve as Prolific's technical research partner on customer engagements with AI labs and enterprise AI teams
  • Advise and prototype bespoke solutions like evaluation methodology, rubric design, data collection approaches, quality frameworks, verifiers and validation frameworks, as well as RL environments
  • Own methodology for projects - from scoping through delivery - working alongside our services team
  • Build trusted relationships with customer research and engineering teams
  • Surface patterns from customer work that should become platform capabilities, partnering with Product and Engineering to shape roadmap priorities
  • Work with Services and Account teams to qualify opportunities and shape proposals
  • Share learnings across the Research Engineering function to strengthen our collective expertise
  • Contribute to Prolific's research publications and external collaborations as opportunities arise
  • Collaborate with product teams to translate research insights into practical applications
  • Mentor team members on advanced AI concepts and emerging research
What we offer
  • competitive salary
  • benefits
  • remote working
  • equity
  • opportunity to earn a cash variable element, such as a bonus or commission
Employment Type:
Full-time

Staff AI Engineer

We are looking for a visionary and technically deep AI Architect to lead the des...
Location
India, Pune, Maharashtra; Karnataka
Salary:
Not provided
Teradata
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in software engineering or technology architecture
  • At least 4 years focused on AI, ML, and Generative AI solutions at enterprise scale
  • Deep hands-on expertise with Large Language Models (LLMs) including model selection, prompt engineering, fine-tuning (LoRA, PEFT), and evaluation techniques
  • Proven experience designing and implementing MCP servers and structured context management strategies for LLM applications
  • Strong working knowledge of Generative AI frameworks and orchestration tools — LangChain, LlamaIndex, LangGraph, AutoGen, CrewAI, or equivalent agentic platforms
  • Hands-on experience with RAG architectures, vector databases (Pinecone, Weaviate, etc.), and embedding models
  • Proficiency in Python-based AI/ML development, including libraries such as HuggingFace Transformers, PyTorch, TensorFlow, scikit-learn, and OpenAI SDK
  • Experience with AI cloud platforms and services — Azure OpenAI, AWS Bedrock, Google Vertex AI, or equivalent enterprise AI infrastructure
  • Solid understanding of MLOps practices including model versioning, experiment tracking (MLflow, Weights & Biases), CI/CD for ML, and model monitoring in production
  • Familiarity with AI agent design patterns, multi-agent systems, tool-use frameworks, and autonomous workflow orchestration
Job Responsibility
  • Define and own the enterprise AI architecture strategy, establishing standards, reference architectures, and technology roadmaps for AI/ML adoption
  • Architect and oversee the design of LLM-powered applications including RAG pipelines, AI agents, and agentic workflow systems
  • Design and govern MCP server implementations, enabling structured, context-aware interactions between LLMs and enterprise data sources
  • Lead the evaluation and selection of AI/ML platforms, LLM providers (OpenAI, Anthropic, Azure OpenAI, Google Gemini, open-source models), and supporting infrastructure
  • Architect prompt engineering frameworks, fine-tuning pipelines, and model evaluation strategies to ensure LLM output quality, accuracy, and reliability
  • Design AI orchestration layers using frameworks such as LangChain, LlamaIndex, AutoGen, CrewAI, or custom agentic architectures for multi-step reasoning workflows
  • Establish AI data pipelines for model training, embeddings generation, vector database management, and real-time inference serving
  • Define and enforce responsible AI practices including bias detection, explainability, hallucination mitigation, content safety guardrails, and regulatory compliance
  • Collaborate with engineering leads, data scientists, and product teams to embed AI capabilities into existing platforms and new product initiatives
  • Mentor and upskill engineering teams on AI/ML technologies, architectural patterns, and emerging developments in the AI ecosystem
What we offer
  • People-first culture
  • Flexible work model
  • Focus on well-being
  • Inclusive environment
Employment Type:
Full-time

Product Manager III (Research)

As a Product Manager for Emerging Technologies with Everseen, you will act as a ...
Location
Timișoara; Belgrade
Salary:
Not provided
Everseen
Expiration Date
Until further notice
Requirements
  • 6 to 10 years of experience as a Product Manager, ideally with exposure to technology-driven products or research collaborations
  • Bachelor’s degree; advanced certifications preferred
  • Experience developing positioning, packaging, and pricing strategies and plans
  • Proficiency in managing the full product lifecycle, from ideation and development to launch and post-market analysis
  • Ability to develop and articulate a compelling product vision aligned with cutting-edge technology trends
  • Commitment to continuous improvement, process optimization, and operational excellence in dynamic environments
  • Strong analytical and strategic thinking skills with the ability to evaluate complex markets and emerging opportunities
  • A passion for innovation, research, and applying new technologies to solve business challenges
  • Excellent communication and stakeholder management capabilities, with a track record of effective cross-functional leadership
Job Responsibility
  • Serve as a subject matter expert on emerging technology trends and advise senior leadership on strategic opportunities
  • Product Strategy & Roadmap: Defines vision for initiatives leveraging emerging technologies; drives strategic alignment
  • Requirement Definition: Leads business requirements elicitation for complex initiatives
  • Documentation: Leads strategic product communications and documentation
  • Quality Assurance: Champion quality and process improvements across product development cycles
  • User Experience: Oversee user research and experience standards to ensure innovations meet customer needs and expectations
  • Research Activity: Leads comprehensive competitor and market analysis across products; advises on strategy and evaluates business opportunities
  • Collaborate closely with Research teams to identify, evaluate, and translate technological advancements into viable product opportunities
Employment Type:
Full-time