AI Research Engineer, Enterprise Evaluations

Scale

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

179400.00 - 224250.00 USD / Year

Job Description:

Scale AI is seeking a technically rigorous and driven AI Research Engineer to join our Enterprise Evaluations team. This high-impact role is critical to our mission of delivering the industry's leading GenAI Evaluation Suite. You will be a hands-on contributor to the core systems that ensure the safety, reliability, and continuous improvement of LLM-powered workflows and agents for the enterprise.

Job Responsibilities:

  • Partner with Scale’s Operations team and enterprise customers to translate ambiguity into structured evaluation data, guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems
  • Analyze feedback and collected data to identify patterns, refine evaluation frameworks, and establish iterative improvement loops that enhance the quality and relevance of human-curated assessments
  • Design, research, and develop LLM-as-a-Judge autorater frameworks and AI-assisted evaluation systems. This includes creating models that critique, grade, and explain agent outputs (e.g., RLAIF, model-judging-model setups), along with scalable evaluation pipelines and diagnostic tools
  • Pursue research initiatives that explore new methodologies for automatically analyzing, evaluating, and improving the behavior of enterprise agents, pushing the boundaries of how AI systems are assessed and optimized in real-world contexts

Requirements:

  • Bachelor’s degree in Computer Science, Electrical Engineering, a related field, or equivalent practical experience
  • 2+ years of experience in Machine Learning or Applied Research, focused on applied ML systems or evaluation infrastructure
  • Hands-on experience with Large Language Models (LLMs) and Generative AI in professional or research environments
  • Strong understanding of frontier model evaluation methodologies and the current research landscape
  • Proficiency in Python and major ML frameworks (e.g., PyTorch, TensorFlow)
  • Solid engineering and statistical analysis foundation, with experience developing data-driven methods for assessing model quality

Nice to have:

  • Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, or a related quantitative field
  • Published research in leading ML or AI conferences such as NeurIPS, ICML, ICLR, or KDD
  • Experience designing, building, or deploying LLM-as-a-Judge frameworks or other automated evaluation systems for complex models
  • Experience collaborating with operations or external teams to define high-quality human annotator guidelines
  • Expertise in ML research engineering, stochastic systems, observability, or LLM-powered applications for model evaluation and analysis
  • Experience contributing to scalable pipelines that automate the evaluation and monitoring of large-scale models and agents
  • Familiarity with distributed computing frameworks and modern cloud infrastructure

What we offer:

  • Comprehensive health, dental, and vision coverage
  • Retirement benefits
  • Learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity grant

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for AI Research Engineer, Enterprise Evaluations

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
Canada
Salary:
55.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
Job Responsibilities:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Senior Generative AI Engineer

The Citi Innovation Lab is a leader in creating new ideas, innovative technology...
Location:
Israel, Tel Aviv
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Hands-on experience with transformer-based models and their applications
  • Strong understanding of LLMs, including model selection, benchmarking, and optimization
  • Experience with RAG systems and vector databases
  • Proficiency in developing and deploying AI agents
  • Knowledge of open-source models and methods, including benchmarks for evaluating AI performance
  • Knowledge of security risks and mitigation strategies for autonomous AI agents, including OWASP guidelines
  • Proficiency in Python and experience with libraries such as Pandas, Tabula, and TensorFlow/PyTorch
  • Strong problem-solving skills and attention to detail
  • Excellent communication and documentation skills
Job Responsibilities:
  • Develop and implement enterprise-scale, cutting-edge models such as visual document understanding and text2code
  • Implement and optimize vector-based retrieval systems for RAG, covering embedding models, ANN indexing, hybrid search, and re-ranking
  • Implement autonomous AI agents to perform adaptive, error-resistant data extraction and content validation tasks
  • Develop and deploy enterprise software applications using state-of-the-art practices such as microservices and modular code, and write unit and integration tests to ensure the accuracy and reliability of the AI applications
  • Ensure data privacy and security in all AI-driven processes, adhering to OWASP guidelines and Citi’s stringent authentication and authorization policies
  • Collaborate with cross-functional teams to integrate AI solutions into existing workflows
  • Document the development process and create comprehensive technical specifications
  • Manage and maintain AI applications, ensuring best practices in model management and versioning
  • Deploy the resulting AI applications using industrial-strength frameworks and processes, including Kubernetes and OpenShift, for scalable and efficient on-premises operations
  • Research, develop, and utilize transformer-based models for enhanced application performance
Employment Type:
Full-time

PhD AI Research Intern

Join our cutting-edge Machine Learning Research team at Atlassian as a PhD Resea...
Location:
United States, Seattle
Salary:
49.00 - 75.00 USD / Hour
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Completed Bachelor's degree in Computer Science or a related field
  • Currently pursuing a PhD in Computer Science or a related field at any stage of your doctoral studies
  • Degree completion date cannot be earlier than September 2026 - June 2027
  • Strong foundation in AI/ML, LLMs, modeling and/or optimization techniques
  • Exhibit a solid grasp of algorithms and data structures
  • Demonstrate proficiency in Python programming and ability to write clean, efficient, and well-documented code
  • Experience working with large-scale datasets, including data preprocessing, augmentation, and scaling techniques
  • Expertise in managing data with Python libraries such as NumPy, Pandas, and Matplotlib, experience leveraging models from Hugging Face, and practical knowledge of applied machine learning and deep learning frameworks such as PyTorch
  • Demonstrated exposure to natural language processing (NLP) and Computer Vision (CV)
  • Familiarity with state-of-the-art research in machine learning and AI, as evidenced by relevant coursework, publications, or projects
Job Responsibilities:
  • Collaborate cross-functionally with Research Scientists and Machine Learning Engineers to design, implement, and evaluate experiments that advance the performance, efficiency, and scalability of modern ML and LLM systems for our AI products
  • Curate, preprocess, and manage large-scale datasets for training and evaluation, ensuring data quality, diversity, and reproducibility across experiments
  • Conduct continued training, fine-tuning, and alignment of large language models for specialized applications such as conversational AI, summarization, generative search, and multimodal agents
  • Evaluate cutting-edge ML algorithms through rigorous experimentation and provide detailed analyses highlighting performance insights, failure modes, and opportunities for improvement
  • Contribute to publications and presentations at internal workshops or top-tier academic venues, helping to drive innovation in Enterprise AI and large-scale ML systems
What we offer:
  • health and wellbeing resources
  • paid volunteer days

Multimodal AI Engineer, Document Understanding

Join us and help shape the future of AI by redefining document workflows with AI...
Location:
United States, San Francisco
Salary:
Not provided
LlamaIndex
Expiration Date:
Until further notice
Requirements:
  • 3-7 years of experience in machine learning engineering or applied research
  • Strong software engineering fundamentals with production Python experience (modern tooling: uv, ruff, mypy, Pydantic)
  • Hands-on experience training, fine-tuning, or deploying ML models in production
  • Deep understanding of modern ML techniques, particularly in computer vision, NLP, or multimodal learning
  • Experience with at least one of: data pipeline development, model training/fine-tuning, or ML infrastructure
  • Ability to read and implement from research papers and technical specifications
  • Track record of executing with high intensity in fast-paced environments
  • Strong technical communication skills and comfort with open-source collaboration
Job Responsibilities:
  • Develop, train, and optimize machine learning models for document structure understanding, table extraction, layout analysis, and multimodal content processing
  • Build robust data pipelines, evaluation frameworks, and experimentation infrastructure
  • Design and implement production ML systems that handle complex, real-world documents at scale
  • Stay current with latest advances in vision-language models, document AI, and multimodal learning
  • Collaborate with engineering teams to integrate ML innovations into production APIs
  • Contribute to both our open-source frameworks and enterprise offerings
  • Drive technical decisions while balancing research exploration with product delivery
What we offer:
  • Competitive base salary and equity compensation
  • Comprehensive medical/dental/vision coverage for you and your family
  • Unlimited paid time off policy
  • Daily catered lunch and snacks in the San Francisco office
  • Budget for conferences, research materials, and professional development
  • Access to cutting-edge compute resources and research tools
Employment Type:
Full-time

Staff Software Engineer – Forward Deployed

We are seeking a skilled Software Engineer who will design, build, and maintain ...
Location:
China, Shanghai; Dalian; Wuhan
Salary:
Not provided
Pfizer
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or related field with 8-12 years of relevant experience
  • AI-Augmented Development: optimize AI tool usage, train engineers on AI-augmented workflows, evaluate new AI development tools, establish practices that balance AI speed with verification rigor
  • Business Immersion: rapidly acquire domain expertise, translate between business and engineering, mentor engineers on immersion
  • Data Integration: navigate complex enterprise data landscapes, build relationships to gain data access, handle undocumented schemas, build robust integration solutions, mentor engineers on data integration
  • Full-Stack Development: build complete applications rapidly across any technology stack, select the right tools, balance technical debt with delivery speed, mentor engineers on full-stack development
  • Multi-Audience Communication: influence through communication at all levels, handle difficult conversations skillfully, train engineers on effective communication, represent teams across the function
  • Problem Discovery: seek out undefined problems, embed with users to discover latent needs, coach engineers on problem discovery techniques, turn ambiguity into clear problem statements
  • Rapid Prototyping & Validation: lead rapid delivery initiatives, coach on prototype-first approaches, establish trust through consistent fast delivery, define clear criteria for prototype-to-production transitions
  • Site Reliability Engineering: define reliability standards, drive post-incident improvements systematically, design capacity planning processes, mentor engineers on SRE practices
  • Stakeholder Management: influence senior stakeholders, manage complex stakeholder landscapes with competing agendas, build trust rapidly with new stakeholders, shield teams from organizational friction
Job Responsibilities:
  • Delivery: Lead technical delivery of complex projects across multiple teams, unblock others through hands-on contributions, ensure engineering quality
  • AI: Design AI-augmented engineering workflows for your area, evaluate new AI tools, train engineers on effective AI usage, balance speed with verification
  • People: Coach multiple engineers on career growth, lead hiring for technical roles across your area, shape team technical culture
  • Business: Drive business outcomes through technical solutions across your area, influence product roadmaps, partner effectively with business stakeholders
  • Process: Drive process efficiency within your team, coordinate cross-functional technical work, lead retrospectives
  • Documentation: Design documentation strategies for your projects, ensure knowledge persists beyond individuals, write specifications that enable effective collaboration
Employment Type:
Full-time

Distinguished Engineer – AI Security

We're building a world of health around every individual — shaping a more connec...
Location:
United States, Scottsdale
Salary:
175100.00 - 334750.00 USD / Year
CVS Health
Expiration Date:
June 30, 2026
Requirements:
  • 15+ years of AI experience, including significant depth in advanced technical or architectural roles
  • 5+ years of cybersecurity experience defining and integrating security standards and controls that align with established frameworks such as NIST CSF
  • Deep expertise in AI security concepts such as adversarial ML, secure model deployment, AI agent authorization, AI data loss protection, AI safety, and AI risk management
  • Strong background in Zero Trust architecture and hybrid infrastructure security
  • Demonstrated ability to lead and influence large-scale, cross-functional security initiatives
  • Hands-on experience building, deploying, and securing AI systems and platforms in enterprise environments
  • Practical experience applying AI security and risk management frameworks in real-world engineering contexts
  • AI Security Frameworks: MITRE ATLAS, NIST RMF, ISACA AI Audit Toolkit, and emerging ISO/IEC AI security standards
  • AI Technologies: Expert conceptual and hands-on implementation knowledge of core ML and generative AI technologies including transformer-based NLP, LLM-based generative AI and agentic AI
  • AI Risk Management & Model Security: Threat modeling, adversarial defenses, model lifecycle security, and vulnerability management
Job Responsibilities:
  • Define and help execute the enterprise AI security strategy, spanning secure model selection, development, and deployment criteria, adversarial threat mitigation, and alignment with emerging AI governance requirements
  • Design, build, and maintain reusable AI security frameworks, reference patterns, and technical standards for model integrity, secure data pipelines, and privacy-preserving machine learning
  • Perform hands-on security assessments of AI systems, identify risks, and provide mitigation guidance based on AI security posture management and detection findings
  • Drive innovation in AI security techniques, controls, and tooling through applied research and practical implementation
  • Apply and guide the application of AI security frameworks such as MITRE ATLAS, NIST RMF, and emerging ISO/IEC AI standards to secure the end-to-end AI lifecycle
  • Apply Zero Trust principles to hybrid and cloud infrastructure environments supporting AI workloads, including workload identity, segmentation, and continuous verification
  • Partner closely with Enterprise Architecture and Platform Engineering to integrate AI security controls into infrastructure design patterns and shared services
  • Guide and, where appropriate, directly implement security capabilities across on-premises and cloud platforms to ensure consistent protection for AI and traditional systems
  • Hands-on Engineering & Prototyping: Design and build proof-of-concept solutions, reference implementations, and reusable components to validate AI security and infrastructure security approaches
  • Framework and Pattern Development: Architect repeatable security patterns and guardrails that can be adopted by data science, engineering, and platform teams
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
Employment Type:
Full-time

Field Research Engineer

We're seeking a Field Research Engineer to serve as Prolific's technical researc...
Location:
United States
Salary:
Not provided
Prolific
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in AI/ML research, research engineering, or applied ML
  • Strong knowledge of LLM evaluation methodologies, data collection design, and human feedback approaches
  • Experience designing AI-assisted or model-in-the-loop workflows
  • Track record of executing research or applied AI/ML projects with clear outcomes
  • Able to build quickly when needed; comfortable prototyping solutions and building your own tooling
  • Strong communicator across contexts
  • Based in US, Canada, or Mexico (or able to work US timezone hours)
Job Responsibilities:
  • Serve as Prolific's technical research partner on customer engagements with AI labs and enterprise AI teams
  • Advise and prototype bespoke solutions like evaluation methodology, rubric design, data collection approaches, quality frameworks, verifiers and validation frameworks, as well as RL environments
  • Own methodology for projects - from scoping through delivery - working alongside our services team
  • Build trusted relationships with customer research and engineering teams
  • Surface patterns from customer work that should become platform capabilities, partnering with Product and Engineering to shape roadmap priorities
  • Work with Services and Account teams to qualify opportunities and shape proposals
  • Share learnings across the Research Engineering function to strengthen our collective expertise
  • Contribute to Prolific's research publications and external collaborations as opportunities arise
  • Collaborate with product teams to translate research insights into practical applications
  • Mentor team members on advanced AI concepts and emerging research
What we offer:
  • competitive salary
  • benefits
  • remote working
  • equity
  • opportunity to earn a cash variable element, such as a bonus or commission
Employment Type:
Full-time

Staff AI Engineer

We are looking for a visionary and technically deep AI Architect to lead the des...
Location:
India, Pune, Maharashtra; Karnataka
Salary:
Not provided
Teradata
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience in software engineering or technology architecture
  • At least 4 years focused on AI, ML, and Generative AI solutions at enterprise scale
  • Deep hands-on expertise with Large Language Models (LLMs) including model selection, prompt engineering, fine-tuning (LoRA, PEFT), and evaluation techniques
  • Proven experience designing and implementing MCP servers and structured context management strategies for LLM applications
  • Strong working knowledge of Generative AI frameworks and orchestration tools — LangChain, LlamaIndex, LangGraph, AutoGen, CrewAI, or equivalent agentic platforms
  • Hands-on experience with RAG architectures, vector databases (Pinecone, Weaviate, etc.), and embedding models
  • Proficiency in Python-based AI/ML development, including libraries such as HuggingFace Transformers, PyTorch, TensorFlow, scikit-learn, and OpenAI SDK
  • Experience with AI cloud platforms and services — Azure OpenAI, AWS Bedrock, Google Vertex AI, or equivalent enterprise AI infrastructure
  • Solid understanding of MLOps practices including model versioning, experiment tracking (MLflow, Weights & Biases), CI/CD for ML, and model monitoring in production
  • Familiarity with AI agent design patterns, multi-agent systems, tool-use frameworks, and autonomous workflow orchestration
Job Responsibilities:
  • Define and own the enterprise AI architecture strategy, establishing standards, reference architectures, and technology roadmaps for AI/ML adoption
  • Architect and oversee the design of LLM-powered applications including RAG pipelines, AI agents, and agentic workflow systems
  • Design and govern MCP server implementations, enabling structured, context-aware interactions between LLMs and enterprise data sources
  • Lead the evaluation and selection of AI/ML platforms, LLM providers (OpenAI, Anthropic, Azure OpenAI, Google Gemini, open-source models), and supporting infrastructure
  • Architect prompt engineering frameworks, fine-tuning pipelines, and model evaluation strategies to ensure LLM output quality, accuracy, and reliability
  • Design AI orchestration layers using frameworks such as LangChain, LlamaIndex, AutoGen, CrewAI, or custom agentic architectures for multi-step reasoning workflows
  • Establish AI data pipelines for model training, embeddings generation, vector database management, and real-time inference serving
  • Define and enforce responsible AI practices including bias detection, explainability, hallucination mitigation, content safety guardrails, and regulatory compliance
  • Collaborate with engineering leads, data scientists, and product teams to embed AI capabilities into existing platforms and new product initiatives
  • Mentor and upskill engineering teams on AI/ML technologies, architectural patterns, and emerging developments in the AI ecosystem
What we offer:
  • People-first culture
  • Flexible work model
  • Focus on well-being
  • Inclusive environment
Employment Type:
Full-time