Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

220000.00 - 270000.00 USD / Year

Job Description:

Together AI is building the fastest, most capable open-source-aligned LLMs and inference stack in the world. As part of the Turbo organization, you will be a critical bridge between cutting-edge model research and real-world behavioral reliability. This role focuses on deeply understanding model behavior — probing reasoning, tool use, function calling, multi-step interactions, and subtle failure modes — and building the evaluation systems that ensure models behave intelligently and consistently in production. You will develop robust evaluation pipelines, design high-quality behavioral test suites, and work closely with training, post-training, inference, and product teams to identify regressions, shape datasets, and influence model improvements. Your work will directly define how Together measures model quality and reliability across releases.

Job Responsibility:

  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
  • Develop specialized evaluation suites for:
    • Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
    • Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
    • Tool-augmented interactions: search, retrieval, code execution, API-driven actions
  • Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers

Requirements:

  • Strong engineering skills with Python, evaluation tooling, and distributed workflows
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals
  • Understanding of function calling and structured output formats
  • Familiarity with GPU or distributed compute environments
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
  • Experience with multi-turn or multi-step reasoning tasks
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Senior People Scientist

The Sr People Scientist is responsible for contributing to the development of an en...
Location:
United States, Bellevue
Salary:
127700.00 - 230300.00 USD / Year
T-Mobile
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in a quantitative subject area (math, statistics, economics, computer science, physics, engineering)
  • Master's/Advanced Degree in a quantitative subject area (I-O Psychology, Behavioral Economics, Applied Social Psychology with emphases on research science and advanced statistics)
  • Doctorate in a quantitative discipline (I-O Psychology, Behavioral Economics, Applied Social Psychology with emphases on research science and advanced statistics)
  • 7-10 years of research science or related experience
  • Proven experience with Gen AI for foundational models and LLM and demonstrating for analytics
  • 4-7 years Combination of deep technical skills and business savvy to interface and influence all levels and fields
Job Responsibility:
  • Support the vision and research science roadmap in collaboration with the HR leadership team and senior leadership partners
  • Collaborate in identifying and addressing large-scale, sophisticated business problems related to employee experience, talent, and organizational capability
  • Drive the development and integration of diverse and complex data sources for advanced and sophisticated qualitative and quantitative modeling
  • Contribute to maintaining high standards in research science, including supporting the mentoring and development of team members
  • Develop and implement network analytics, AI/ML, and Deep Learning models to analyze sophisticated datasets and support innovation in people science
  • Build and run true A/B and quasi-experimental designs to assess the impact of mechanisms, programs, and various tested solutions that align to the overall T-Mobile people strategy
  • Evaluate research initiatives to provide bottom-line value, return on investment, and improvements
  • Translate technical research findings into clear, concise, and engaging reports that support decisions and applications across the employee lifecycle
  • Collaborate with multiple teams and account teams to influence, build consensus, and drive significant T-Mobile wide changes related to applying research science proposals and recommendations, including changes to programs, engineering and system needs, and people strategy roadmaps
What we offer:
  • medical, dental and vision insurance
  • flexible spending account
  • 401(k)
  • employee stock grants
  • employee stock purchase plan
  • paid time off
  • up to 12 paid holidays
  • paid parental and family leave
  • family building benefits
  • back-up care
Employment Type:
Fulltime

Senior Security Researcher

The Intelligence Graph Research team within Microsoft CTO organization is respon...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Doctorate in Statistics, Mathematics, Computer Science, Computer Security, or related field OR Master's Degree in Statistics, Mathematics, Computer Science, Computer Security, or related field AND 3+ years experience in software development lifecycle, large-scale computing, threat analysis or modeling, cybersecurity, vulnerability research, and/or anomaly detection OR Bachelor's Degree in Statistics, Mathematics, Computer Science, Computer Security, or related field AND 4+ years experience in software development lifecycle, large-scale computing, threat analysis or modeling, cybersecurity, vulnerability research, and/or anomaly detection OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • 8+ years of experience in security research, detection engineering, threat hunting, incident response, or applied security data science (or equivalent depth of expertise)
  • 3+ years of experience in Azure and Entra security concepts: authentication flows, service principals/app registrations, permissions/consents, conditional access, role assignments, tokens, workload identities, and common abuse paths
  • 3+ years building anomaly detections over large-scale telemetry, including baselines, time-series aggregates, and behavioral modeling; high-volume log analytics and query optimization (e.g., KQL/ADX or equivalent); and designing alert funnels and triage logic to reduce noise
  • 3+ years of experience applying ML to security problems: feature engineering, model selection, evaluation design, and drift monitoring, plus experience shipping ML or statistical detection into production systems
  • 3+ years of experience in Python/C# (data pipelines, modeling, production code quality), distributed processing (e.g., Spark/Databricks/Flink), and large datasets (Parquet/data lakes)
  • 1+ years experience with graph analytics for security use cases (attack paths, entity resolution, graph embeddings, community detection, anomaly scoring) and/or graph databases (Neo4j or similar)
  • 1+ years experience building or operationalizing LLM-powered or agentic investigation systems: Tool-driven agents, retrieval, memory, prompt/eval harnesses, guardrails, and human-in-the-loop workflow
  • 1+ years with Microsoft cloud security telemetry sources such as: Entra sign-in/audit logs, app consent events, Azure activity logs, Key Vault diagnostics, storage access logs, Graph API activity, etc
Job Responsibility:
  • Build cloud-scale anomaly detections: Design and implement high-signal anomaly detectors across Azure/Entra and custom log sources (control plane, data plane, identity/auth, app activity, Graph API, Key Vault, storage, etc.)
  • Create detection funnels that reduce noise while preserving true positives, with measurable improvements in alert quality and investigation time
  • Develop baselines and “pattern-of-life” models for identities, service principals, applications, tenants, and infrastructure
  • Convert detections into ML models and scalable pipelines: Translate research detections into ML approaches (supervised, weakly-supervised, semi-supervised, anomaly detection) and deploy them into reliable pipelines
  • Engineer features at scale (time-series aggregates, behavior fingerprints, graph-derived features, sequence features) and evaluate performance with rigorous metrics (precision/recall, alert volume, time-to-triage, drift)
  • Own end-to-end lifecycle from hypothesis to productionization
Employment Type:
Fulltime

Senior Applied AI Engineer

We’re hiring a Senior Applied AI Engineer to join a fast‑moving, high‑ownership ...
Location:
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master’s Degree AND 3+ years of experience in engineering, problem solving, model building, evaluation, data analysis OR equivalent experience
  • 2+ years shipping production-level code, models, or data analysis
  • 1+ years using AI-assisted coding and analysis techniques
  • Experience working on small teams and mid-stage startup environments
  • Experience working on AI products
  • PhD in engineering, applied math, statistics, or related analytical field
  • 4+ years shipping production-level code, models, or data analysis
  • Deep experience building from zero-to-one
  • Hands-on work hillclimbing AI evaluations
Job Responsibility:
  • Design and ship LLM‑powered assistant features, including conversational flows, agentic behaviors, retrieval pipelines, and multimodal interactions
  • Build prompt architectures, system instructions, and orchestration logic that ensure reliability, grounding, and personality consistency
  • Prototype new capabilities rapidly and iterate based on user signals and evaluation data
  • Build and maintain evaluation frameworks for correctness, safety, grounding, and UX quality
  • Run hillclimbing loops across prompts, models, and tool‑use strategies to continuously improve assistant performance
  • Analyze failure modes, design mitigations, and drive systematic improvements across the stack
  • Develop internal tools for prompt experimentation, model comparison, telemetry, and debugging automated eval pipelines
  • Create reusable frameworks that accelerate the entire AI org’s ability to ship high‑quality assistant features
  • Integrate LLMs with product surfaces, APIs, and backend systems
  • Build lightweight ML components (ranking, classification, summarization, personalization) that enhance assistant intelligence
Employment Type:
Fulltime

Senior Applied AI Engineer

As a Senior Applied AI Engineer for CXA, you will play a pivotal role in advanc...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Statistics, Electrical/Computer Engineering, Physics, Mathematics or related field, OR Master’s degree OR PhD AND 1+ years of experience working with machine learning libraries to solve real-world AI/ML problems
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 7+ years of strong software engineering skills, including hands‑on development experience in C# and Python for building scalable, high‑performance, and production‑ready systems
  • Experience in working with Generative AI models and ML stacks
  • Experience across the product lifecycle from ideation to shipping
Job Responsibility:
  • Build collaborative relationships with product and business groups to deliver AI-driven impact
  • Research and implement state-of-the-art using foundation models, prompt engineering, RAG, graphs, multi-agent architectures, as well as classical machine learning techniques
  • Fine-tune foundation models using domain-specific datasets
  • Evaluate model behavior on relevance, bias, hallucination, and response quality via offline evaluations, shadow experiments, online experiments, and ROI analysis
  • Apply strong software engineering skills in languages such as C# and Python to design, develop, and optimize scalable, reliable, and maintainable AI‑driven systems
  • Develop LLM prompts, agents, and query execution workflows, often with tight latency constraints
  • Build rapid AI solution prototypes, contribute to production deployment of these solutions, debug production code, support MLOps/AIOps
  • Contribute to papers, patents, and conference presentations
  • Translate research into production-ready solutions and measure their impact through A/B testing and telemetry that address customer needs
  • Use data to identify gaps in AI quality, uncover insights, and implement proofs of concept (PoCs)
Employment Type:
Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Fulltime

Senior Applied Scientist

Conversational commerce introduces challenges that differ from traditional web s...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • 3+ years of hands-on experience developing machine learning or statistical models to solve real-world problems
  • Proficiency in programming for data science (e.g. using Python or R for data analysis and modeling)
  • Experience with data querying languages (e.g. SQL)
  • Hands-on experience with large-scale data processing using tools like Apache Spark or Azure Databricks
  • Skilled in time-series analysis and anomaly detection techniques
  • Practical experience with prompt engineering, fine-tuning GPT-like models, and applying LLMs in domain-heavy areas
Job Responsibility:
  • Design, build, and productionize machine learning models for product discovery, ranking, recommendation, and personalization using large-scale commerce and behavioral data
  • Develop LLM-based systems for conversational shopping, including prompt design, retrieval-augmented generation, tool orchestration, and grounding against structured commerce data
  • Address quality and trust challenges such as hallucination risk, stale data, and recommendation reliability
  • Define evaluation frameworks and experimentation strategies for conversational and proactive shopping scenarios, including offline metrics and online experiments
  • Partner closely with product, engineering, and design teams to translate models into low-latency, reliable Copilot experiences
  • Provide technical leadership for applied science within Copilot Shopping through design reviews, mentoring, and setting quality standards
  • Contribute to model governance and Responsible AI practices to ensure trustworthy and compliant systems
Employment Type:
Fulltime

Senior Applied Scientist

Join the Signals Modeling team, part of Microsoft AI Ads Engineering, to shape t...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Proven programming and data analysis skills
  • Proven expertise in Generative AI, deep learning, reinforcement learning, transformers, or LLMs
  • 5+ years of experience in developing and deploying large-scale machine learning models
Job Responsibility:
  • Develop and deploy cutting-edge machine learning models, including transformers, generative AI, and reinforcement learning, to optimize user interactions and ad relevance across Microsoft Ads and Copilot
  • Design scalable algorithms for online and offline systems, delivering innovative solutions for ads selection, ad generation and ad relevance
  • Drive experimentation through A/B testing and offline validation to evaluate model performance and refine user behavior predictions
  • Build robust data pipelines and frameworks for handling large-scale, high-dimensional datasets to support advanced AI applications
  • Stay at the forefront of AI research, incorporating the latest advancements to drive innovation and impact across Microsoft platforms
Employment Type:
Fulltime

Senior Applied Scientist

Join the Signals Modeling team, part of Microsoft AI Ads Engineering, to shape t...
Location:
Canada, Vancouver
Salary:
114400.00 - 203900.00 CAD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Proven programming and data analysis skills
  • Proven expertise in Generative AI, deep learning, reinforcement learning, transformers, or LLMs
  • 5+ years of experience in developing and deploying large-scale machine learning models
Job Responsibility:
  • Develop and deploy cutting-edge machine learning models, including transformers, generative AI, and reinforcement learning, to optimize user interactions and ad relevance across Microsoft Ads and Copilot
  • Design scalable algorithms for online and offline systems, delivering innovative solutions for ads selection, ad generation and ad relevance
  • Drive experimentation through A/B testing and offline validation to evaluate model performance and refine user behavior predictions
  • Build robust data pipelines and frameworks for handling large-scale, high-dimensional datasets to support advanced AI applications
  • Stay at the forefront of AI research, incorporating the latest advancements to drive innovation and impact across Microsoft platforms
Employment Type:
Fulltime