Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:
220,000 - 270,000 USD / Year

Job Description:

Together AI is building the fastest, most capable open-source-aligned LLMs and inference stack in the world. As part of the Turbo organization, you will be a critical bridge between cutting-edge model research and real-world behavioral reliability. This role focuses on deeply understanding model behavior — probing reasoning, tool use, function calling, multi-step interactions, and subtle failure modes — and building the evaluation systems that ensure models behave intelligently and consistently in production. You will develop robust evaluation pipelines, design high-quality behavioral test suites, and work closely with training, post-training, inference, and product teams to identify regressions, shape datasets, and influence model improvements. Your work will directly define how Together measures model quality and reliability across releases.

Job Responsibilities:

  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
  • Develop specialized evaluation suites for:
      • Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
      • Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
      • Tool-augmented interactions: search, retrieval, code execution, and API-driven actions
  • Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers

Requirements:

  • Strong engineering skills with Python, evaluation tooling, and distributed workflows
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals
  • Understanding of function calling and structured output formats
  • Familiarity with GPU or distributed compute environments
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
  • Experience with multi-turn or multi-step reasoning tasks
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures

What we offer:

  • competitive compensation
  • startup equity
  • health insurance
  • other benefits

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work
