Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

220000.00 - 270000.00 USD / Year

Job Description:

Together AI is building the fastest, most capable open-source-aligned LLMs and inference stack in the world. As part of the Turbo organization, you will be a critical bridge between cutting-edge model research and real-world behavioral reliability. This role focuses on deeply understanding model behavior — probing reasoning, tool use, function calling, multi-step interactions, and subtle failure modes — and building the evaluation systems that ensure models behave intelligently and consistently in production. You will develop robust evaluation pipelines, design high-quality behavioral test suites, and work closely with training, post-training, inference, and product teams to identify regressions, shape datasets, and influence model improvements. Your work will directly define how Together measures model quality and reliability across releases.

Job Responsibility:

  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
  • Develop specialized evaluation suites for:
      • Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
      • Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
      • Tool-augmented interactions: search, retrieval, code execution, and API-driven actions
  • Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers

Requirements:

  • Strong engineering skills with Python, evaluation tooling, and distributed workflows
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals
  • Understanding of function calling and structured output formats
  • Familiarity with GPU or distributed compute environments
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
  • Experience with multi-turn or multi-step reasoning tasks
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Senior People Scientist

The Sr People Scientist is responsible for supplying to the development of an en...
Location:
United States, Bellevue
Salary:
127700.00 - 230300.00 USD / Year
T-Mobile
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree Quantitative Subject area (math, statistics, economics, computer science, physics, engineering)
  • Master's/Advanced Degree Quantitative Subject area (I-O Psychology, Behavioral Economics, Applied Social Psychology w/emphases on research science and advanced statistics)
  • Doctorate Quantitative Discipline (I-O Psychology, Behavioral Economics, Applied Social Psychology w/emphases on research science and advanced statistics)
  • 7-10 years Research science or related experience
  • Proven experience with Gen AI for foundational models and LLM and demonstrating for analytics
  • 4-7 years Combination of deep technical skills and business savvy to interface and influence all levels and fields
Job Responsibility:
  • Support the vision and research science roadmap in collaboration with the HR leadership team and senior leadership partners
  • Collaborate in identifying and addressing large-scale, sophisticated business problems related to employee experience, talent, and organizational capability
  • Drive the development and integration of diverse and complex data sources for advanced and sophisticated qualitative and quantitative modeling
  • Contribute to maintaining high standards in research science, including supporting the mentoring and development of team members
  • Develop and implement network analytics, AI/ML, and Deep Learning models to analyze sophisticated datasets and support innovation in people science
  • Build and run true A/B and quasi-experimental designs to assess the impact of mechanisms, programs, and various tested solutions that align to the overall T-Mobile people strategy
  • Evaluate research initiatives to provide bottom-line value, return on investment, and improvements
  • Translate technical research findings into clear, concise, and engaging reports that support decisions and applications across the employee lifecycle
  • Collaborate with multiple teams and account teams to influence, build consensus, and drive significant T-Mobile wide changes related to applying research science proposals and recommendations, including changes to programs, engineering and system needs, and people strategy roadmaps
What we offer:
  • medical, dental and vision insurance
  • flexible spending account
  • 401(k)
  • employee stock grants
  • employee stock purchase plan
  • paid time off
  • up to 12 paid holidays
  • paid parental and family leave
  • family building benefits
  • back-up care
Employment Type:
Fulltime

Senior Security Researcher

The Intelligence Graph Research team within Microsoft CTO organization is respon...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Doctorate in Statistics, Mathematics, Computer Science, Computer Security, or related field OR Master's Degree in Statistics, Mathematics, Computer Science, Computer Security, or related field AND 3+ years experience in software development lifecycle, large-scale computing, threat analysis or modeling, cybersecurity, vulnerability research, and/or anomaly detection OR Bachelor's Degree in Statistics, Mathematics, Computer Science, Computer Security, or related field AND 4+ years experience in software development lifecycle, large-scale computing, threat analysis or modeling, cybersecurity, vulnerability research, and/or anomaly detection OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • 8+ years of experience in security research, detection engineering, threat hunting, incident response, or applied security data science (or equivalent depth of expertise)
  • 3+ years of experience in Azure and Entra security concepts: authentication flows, service principals/app registrations, permissions/consents, conditional access, role assignments, tokens, workload identities, and common abuse paths
  • 3+ years building anomaly detections over large-scale telemetry, including baselines, time-series aggregates, and behavioral modeling; high-volume log analytics and query optimization (e.g., KQL/ADX or equivalent); and designing alert funnels and triage logic to reduce noise
  • 3+ years of experience in applied ML for security problems: feature engineering, model selection, evaluation design, and drift monitoring; experience shipping ML or statistical detection into production systems
  • 3+ years of experience in Python/C# (data pipelines, modeling, production code quality), distributed processing (e.g., Spark/Databricks/Flink), and large datasets (Parquet/data lakes)
  • 1+ years experience with graph analytics for security use cases (attack paths, entity resolution, graph embeddings, community detection, anomaly scoring) and/or graph databases (Neo4j or similar)
  • 1+ years experience building or operationalizing LLM-powered or agentic investigation systems: Tool-driven agents, retrieval, memory, prompt/eval harnesses, guardrails, and human-in-the-loop workflow
  • 1+ years with Microsoft cloud security telemetry sources such as: Entra sign-in/audit logs, app consent events, Azure activity logs, Key Vault diagnostics, storage access logs, Graph API activity, etc
Job Responsibility:
  • Build cloud-scale anomaly detections: Design and implement high-signal anomaly detectors across Azure/Entra and custom log sources (control plane, data plane, identity/auth, app activity, Graph API, Key Vault, storage, etc.)
  • Create detection funnels that reduce noise while preserving true positives, with measurable improvements in alert quality and investigation time
  • Develop baselines and “pattern-of-life” models for identities, service principals, applications, tenants, and infrastructure
  • Convert detections into ML models and scalable pipelines: Translate research detections into ML approaches (supervised, weakly-supervised, semi-supervised, anomaly detection) and deploy them into reliable pipelines
  • Engineer features at scale (time-series aggregates, behavior fingerprints, graph-derived features, sequence features) and evaluate performance with rigorous metrics (precision/recall, alert volume, time-to-triage, drift)
  • Own end-to-end lifecycle from hypothesis to productionization
Employment Type:
Fulltime

Senior Applied AI Engineer

We’re hiring a Senior Applied AI Engineer to join a fast‑moving, high‑ownership ...
Location:
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master’s Degree AND 3+ years of experience in engineering, problem solving, model building, evaluation, data analysis OR equivalent experience
  • 2+ years shipping production-level code, models, or data analysis
  • 1+ years using AI-assisted coding and analysis techniques
  • Experience working on small teams and mid-stage startup environments
  • Experience working on AI products
  • PhD in engineering, applied math, statistics, or related analytical field
  • 4+ years shipping production-level code, models, or data analysis
  • Deep experience building from zero-to-one
  • Hands-on work hillclimbing AI evaluations
Job Responsibility:
  • Design and ship LLM‑powered assistant features, including conversational flows, agentic behaviors, retrieval pipelines, and multimodal interactions
  • Build prompt architectures, system instructions, and orchestration logic that ensure reliability, grounding, and personality consistency
  • Prototype new capabilities rapidly and iterate based on user signals and evaluation data
  • Build and maintain evaluation frameworks for correctness, safety, grounding, and UX quality
  • Run hillclimbing loops across prompts, models, and tool‑use strategies to continuously improve assistant performance
  • Analyze failure modes, design mitigations, and drive systematic improvements across the stack
  • Develop internal tools for prompt experimentation, model comparison, telemetry, debugging, and automated eval pipelines
  • Create reusable frameworks that accelerate the entire AI org’s ability to ship high‑quality assistant features
  • Integrate LLMs with product surfaces, APIs, and backend systems
  • Build lightweight ML components (ranking, classification, summarization, personalization) that enhance assistant intelligence
Employment Type:
Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Fulltime

Senior Applied Scientist

Join the Signals Modeling team, part of Microsoft AI Ads Engineering, to shape t...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Proven experience in programming and data analysis skills
  • Proven expertise in the areas of Generative AI, deep learning, Reinforcement learning, transformers or LLM
  • 5+ years of experience in developing and deploying large-scale machine learning models
Job Responsibility:
  • Develop and deploy cutting-edge machine learning models, including transformers, generative AI, and reinforcement learning, to optimize user interactions and ad relevance across Microsoft Ads and Copilot
  • Design scalable algorithms for online and offline systems, delivering innovative solutions for ads selection, ad generation and ad relevance
  • Drive experimentation through A/B testing and offline validation to evaluate model performance and refine user behavior predictions
  • Build robust data pipelines and frameworks for handling large-scale, high-dimensional datasets to support advanced AI applications
  • Stay at the forefront of AI research, incorporating the latest advancements to drive innovation and impact across Microsoft platforms
Employment Type:
Fulltime

Senior Applied Scientist

Join the Signals Modeling team, part of Microsoft AI Ads Engineering, to shape t...
Location:
Canada, Vancouver
Salary:
114400.00 - 203900.00 CAD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Proven experience in programming and data analysis skills
  • Proven expertise in the areas of Generative AI, deep learning, Reinforcement learning, transformers or LLM
  • 5+ years of experience in developing and deploying large-scale machine learning models
Job Responsibility:
  • Develop and deploy cutting-edge machine learning models, including transformers, generative AI, and reinforcement learning, to optimize user interactions and ad relevance across Microsoft Ads and Copilot
  • Design scalable algorithms for online and offline systems, delivering innovative solutions for ads selection, ad generation and ad relevance
  • Drive experimentation through A/B testing and offline validation to evaluate model performance and refine user behavior predictions
  • Build robust data pipelines and frameworks for handling large-scale, high-dimensional datasets to support advanced AI applications
  • Stay at the forefront of AI research, incorporating the latest advancements to drive innovation and impact across Microsoft platforms
Employment Type:
Fulltime

Senior Data Scientist - Copilot Studio

Microsoft is a company where passionate innovators come to collaborate, envision...
Location:
Israel, Tel Aviv, Herzliya
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Physics, Operations Research, Computer Science, or related field AND 3+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Physics, Operations Research, Computer Science, or related field AND 5+ years data-science experience
  • Proficiency in one or more programming/scripting languages for working with data, such as Python, C++, or C#
  • Experience in building ML and LLM products, focusing on NLP and conversational AI
  • Analytical Mindset: Strong data analysis and problem-solving skills. Ability to use data to draw insights and make decisions, experiment design, and interpret model performance metrics
  • Excellent teamwork and communication skills. Comfortable working in a fast-paced, interdisciplinary environment and presenting complex findings in a clear, impactful way
  • Passion for learning and staying informed about state-of-the-art progress
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Formulate data-driven approaches to evaluate and improve AI agent performance, leveraging diverse algorithms and data sources
  • Apply state-of-the-art LLM and machine learning techniques to analyze and optimize agent behavior
  • Use data exploration to uncover patterns in agent interactions, identify new opportunities or issues, and assess data limitations within our problem space
  • Engage with product teams and collaborate with other data scientists, engineers, designers, and product managers to translate findings into clear, actionable insights that shape product features and improve our Copilot Studio platform
Employment Type:
Fulltime

PET-CT Radiographer/Technologist

We have an exciting opportunity for a skilled and motivated PET CT Radiographer/...
Location:
United Kingdom, Southampton
Salary:
31500.00 - 41000.00 GBP / Year
Alliance Medical Ltd
Expiration Date
Until further notice
Requirements:
  • HCPC or equivalent voluntary registration
  • Background in Nuclear Medicine
  • Experience in handling and dispensing radiopharmaceuticals
  • High level of competency in cannulation preferable
  • Significant experience working as a Radiographer considered
  • Patient-focused attitude
  • Flexible, adaptable team player
  • Excellent communication skills, both verbal and written
  • Patient care experience/skills
  • Excellent interpersonal and organisation skills including time management
Job Responsibility:
  • Undertake a competency programme in PET-CT to develop knowledge and gain a full understanding of PET-CT
  • IV cannulate patients as part of the role
  • Work across 2 sites, supporting the staffing of the Salisbury mobile service 1-2 days per week
What we offer:
  • Training will be delivered “On the Job” and in collaboration with the Christie School of Oncology via the PET-CT training academy
  • Full training provided for IV cannulation if no prior experience
Employment Type:
Fulltime