Model Behavior Engineer

United States, New York · Employment contract · 98000.00 - 140000.00 USD / Year · Job Posted June 07, 2026

Job Description

You'll own the quality bar for Notion AI products. You'll work with product and engineering teams to build systems that define what “good” looks like, measure our progress, and drive the changes needed to deliver reliable, high-quality AI experiences. Your work directly shapes how Notion's AI products behave for millions of users.

This isn't a traditional software engineering role; it's an art and science role. You won't spend your days writing code. Instead, you'll focus on understanding and shaping how our AI products behave through context engineering, evaluation system design, and data analysis.

This team sits within AI engineering, working directly with engineering, product, design, and data. The role is a unique blend of ops, strategy, and product thinking. Day to day, you'll live in production data, ship prompt fixes, run evals, and, in effect, shape our quality strategy. As part of that, you'll shape Notion's model strategy and work directly with frontier AI labs (OpenAI, Anthropic, Google) to evaluate and launch new models.

Job Responsibility

  • Context engineering — Design, test, and iterate on system prompts, tool prompts, and context strategies that shape how Notion's AI products behave
  • Understand & debug — Live in production data: transcripts, logs, user feedback
  • Build evals & measurement — Design eval strategies, build datasets, run evaluations
  • Evaluate and launch new models with leading research labs
  • Drive quality priorities — Work embedded with eng and product teams to surface the most important issues
  • Build tooling & systems — Help manage AI observability and eval platforms

Requirements

  • Driver mentality — You treat problems as yours. If something's broken, it's your job to fix it, even if you didn't cause it. You have a bias to action.
  • Curiosity — You're excited about exploring the “jagged frontier” of LLM capabilities and how AI products meet reality.
  • Analytical instinct — Your first move is to look at data. You can find signal in noise.
  • Comfortable working with data — You can self-serve insights from large datasets, whether through SQL, coding agents, or other tools.
  • Clear communication — You can explain complex issues simply.
  • Experience with LLMs, prompting, or AI products

Nice to have

  • Background in engineering, product, data science, research, or consulting
  • You've built something on your own to solve a problem — side project, startup, tool, whatever

What we offer

  • Highly competitive cash compensation
  • Equity
  • Benefits

Similar Jobs for Model Behavior Engineer

8 matching positions

Product Manager, API Model Behavior

As a Model Behavior Product Manager for the API team, you'll be at the forefront...
Location: United States, San Francisco
Salary: 293000.00 - 385000.00 USD / Year
Company: OpenAI
Expiration Date: Until further notice
Requirements
  • 5+ years of product management or related industry experience
  • Experience collaborating directly and deeply with high-growth startups
  • Proven track record of building for developers, with strong intuition for designing clear, flexible APIs and primitives that scale from early experimentation to production use
  • Hands-on experience driving consensus and action in ambiguous spaces
  • Excel at collaborating across diverse teams and communicating complex ideas clearly
Job Responsibility
  • Define strategic priorities and roadmap for improving model behavior for API users, focusing on user outcomes, safety, reliability, and emerging capabilities
  • Partner with research and engineering teams at a technical level to translate those goals into model capability improvements
  • Partner with cross-functional teams to launch OpenAI’s frontier models in the API, and expose their full capabilities to users via flexible and powerful API primitives
  • Develop scalable methodologies, tools, and processes for evaluating, tuning, and iterating on model behavior
  • Synthesize user research, community feedback, and quantitative insights into targeted improvements in our AI models
  • Establish and iterate on clear, actionable metrics that accurately reflect model quality and user experience at scale
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment type: Full-time

Product Manager, Model Behavior

As a Product Manager for the Model Behavior team, you'll be at the forefront of ...
Location: United States, San Francisco
Salary: 230000.00 - 325000.00 USD / Year
Company: OpenAI
Expiration Date: Until further notice
Requirements
  • 6+ years of product management or related industry experience
  • Interest in fields such as human-computer interaction, psychology, philosophy, or other relevant fields
  • Excitement about building not just a product, but a new form of intelligence, with the aim to benefit humanity
  • Hands-on experience driving consensus and action in ambiguous spaces
  • Know how to ask questions that uncover underlying constraints and assumptions
  • Excel at collaborating across diverse teams and communicating complex ideas clearly
Job Responsibility
  • Define strategic priorities and roadmap for improving model behavior, focusing on user outcomes, safety, reliability, and emerging capabilities
  • Partner closely with research, engineering, product design, and policy teams to translate strategic goals into actionable product initiatives
  • Develop scalable methodologies, tools, and processes for evaluating, tuning, and iterating on model behavior
  • Synthesize user research, community feedback, and quantitative insights into targeted improvements in our AI models
  • Establish and iterate on clear, actionable metrics that accurately reflect model quality and user experience at scale
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment type: Full-time

Model Behavior Architect

We're looking for a Model Behavior Architect to help build Perplexity's AI produ...
Location: United States, San Francisco
Salary: 180000.00 - 260000.00 USD / Year
Company: Perplexity
Expiration Date: Until further notice
Requirements
  • Experience designing evaluations, benchmarks, or metrics for AI systems
  • Strong written and verbal communication skills, particularly in explaining complex concepts to diverse stakeholders
  • Ability to manage multiple concurrent projects in a fast-moving environment
  • Strong experience with Perplexity or other frontier AI models in production settings
  • Demonstrated experience with Python — you'll prototype, debug, automate, and build systems at scale
  • 3+ years of experience working with LLMs in a product or research setting
Job Responsibility
  • Context Engineering: Design, test, and optimize context strategies and system prompts that shape answer engine behavior across products, features, and use cases
  • Evaluation Systems: Build automated and semi-automated evaluation pipelines that measure model quality, catch regressions, and scale across product surfaces
  • Model Launch Support: Partner with research and engineering to validate model behavior before and during rollouts, ensuring smooth transitions with no degradation
  • Research & Analysis: Identify inconsistencies and failure modes in model outputs through well-designed research projects — for both internal and production-facing systems
  • Cross-functional Collaboration: Work closely with design, product, and research teams to translate product goals into concrete model behavior requirements
  • Knowledge Sharing: Help engineers across teams build intuition for prompt design, context engineering, and evaluation best practices
  • Staying Current: Track the latest alignment, evaluation, and prompting techniques from industry and academia, and bring the best ideas back to the team
What we offer
  • equity
  • health
  • dental
  • vision
  • retirement
  • fitness
  • commuter and dependent care accounts
Employment type: Full-time

Software Engineer II, Behavior Planning ML Platform

Aurora’s mission is to deliver the benefits of self-driving technology safely, q...
Location: United States, Pittsburgh
Salary: 126000.00 - 201000.00 USD / Year
Company: Aurora Innovation
Expiration Date: Until further notice
Requirements
  • BS or higher degree in Computer Science/Engineering or related fields
  • More than 6 months of experience
  • Strong programming skills in C++ or Python, ideally both
  • Experience with machine learning frameworks (PyTorch or TensorFlow)
  • Solid foundation in computer science fundamentals - especially operating system concepts including concurrency, memory management and process scheduling.
Job Responsibility
  • Develop large scale pipelines for data extraction, model training and model evaluation
  • Build and optimize onboard ML infrastructure used to deploy models and run inference onboard the vehicle
  • Collaborate closely with motion planning, systems engineering, and other autonomy groups to define and develop critical ML workflow requirements.

Senior Low Observables Design & Integration Engineer - Pole Model

At Boeing, we innovate and collaborate to make the world a better place. We’re c...
Location: United States, Berkeley
Salary: Not provided
Company: Boeing
Expiration Date: Until further notice
Requirements
  • Bachelor of Science degree in Engineering (with a focus in Electrical, Mechanical or Aeronautical), Computer Science, Data Science, Mathematics, Physics, Chemistry or non-US equivalent qualifications directly related to the work statement
  • Professional experience working with low observable materials and technologies, including hands-on exposure to LO integration, design, manufacturing, and test
  • Demonstrated experience using computational electromagnetic solvers (e.g., FEKO, HFSS, CST, WIPL-D, CARLOS, SENTRI, XPATCH or equivalent) for design and optimization of LO systems and RCS predictions
  • Strong understanding of electromagnetic principles relevant to scattering phenomena and Radar Cross Section
  • Proven track record supporting fabrication and testing of LO components/assemblies and correlating measurements to models
  • Proficiency in data processing and analysis of RCS/EM test data, including calibration, clutter and background rejection, and data visualization techniques
  • Excellent technical writing skills with experience producing engineering reports, test plans, and test reports
  • Active U.S. Top Secret Security Clearance
  • Ability to obtain Special Program Access (U.S. Only Citizenship required)
Job Responsibility
  • Lead detailed LO design and integration for the RCS pole model, including material selection, incorporation of advanced technologies, and supplier hardware integration
  • Use computational electromagnetic solvers to model, analyze, and optimize radar cross section (RCS) and scattering behavior across required frequency bands and aspect angles in support of pre-test predictions and diagnostics
  • Work with Manufacturing to ensure proper alignment between design, analysis and fabrication of LO components
  • Define and support fabrication processes, QA checks, and build plans for LO skins, coatings, RAM treatments, and attachments; identify and mitigate manufacturability risks; and provide LO liaison support to the shop
  • Develop and execute test plans for RCS characterization (anechoic chamber and outdoor ranges), including instrumentation, calibration, and measurement repeatability considerations
  • Prepare and execute data processing workflows to reduce, calibrate, and analyze measured RCS and related EM test data, and combine simulation and measurement data for validation and design iteration
  • Produce clear technical documentation: design descriptions, analysis reports, test plans, test reports, procedures, and presentation materials for program reviews
  • Mentor junior engineers and support continuous improvement of LO design, test, and data processing practices
What we offer
  • Best-in-class 401(k) plan: dollar-for-dollar matching contributions up to 10% of eligible pay, with immediate 100% vesting
  • Student Loan Match
  • health insurance
  • flexible spending accounts
  • health savings accounts
  • retirement savings plans
  • life and disability insurance programs
  • paid and unpaid time away from work
  • Potential signing bonus for eligible/qualified external candidates
  • Relocation based on candidate eligibility
Employment type: Full-time

Software Engineer, Autonomy Behavior Validation

As a Software Engineer on the Software Validation team within the AV organizatio...
Location: United States, Sunnyvale
Salary: 123200.00 - 189100.00 USD / Year
Company: General Motors
Expiration Date: Until further notice
Requirements
  • Master's degree in Computer Science, Software Engineering, Data Science, or related fields
  • 1–3 years of professional software engineering experience (including internships, co-ops, or research engineering roles) building automation, internal tools, or data/analysis pipelines
  • Experience using large language models (LLMs) to summarize results, generate reports, or accelerate analysis
  • Experience building simple agents or scripts that chain tools together to complete tasks end-to-end
  • Strong programming skills in Python and experience with SQL
  • Experience writing clean, well-tested, and maintainable code for data processing, backend services, or scientific/analytical workflows
  • Experience working with large datasets to derive insights, build analyses, or drive decisions
  • Strong analytical thinking skills with the ability to interpret data and derive impactful conclusions
  • Ability to adapt and operate under ambiguity, going from quick code prototypes to longer-term, production-ready solutions on brief time horizons
  • Excellent communication skills, capable of switching between high-level and detailed technical discussions
Job Responsibility
  • Design and deploy metrics and test strategies at scale to evaluate the behavior of autonomous vehicles in simulation and on-road
  • Translate validation strategies into production-quality code and automation pipelines that execute high-quality AV behavior analysis for continuous and scaled software release cycles
  • Leverage AI-assisted and agentic workflows to build internal tools and frameworks that make it easier to author, configure, and deploy metrics, tests, and validation artifacts
  • Ensure the quality and reliability of behavior validation outputs through monitoring, alerting, automated checks, and continuous improvement of the underlying code and data pipelines
  • Collaborate across teams to establish coding and automation best practices for the Software Validation organization, and understand stakeholder needs and translate them into robust tools and workflows
What we offer
  • Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
Employment type: Full-time

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...
Location: Canada, Markham
Salary: 106400.00 - 159600.00 CAD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
  • Bachelor’s, Master’s, or Ph.D. degree in Computer/Software Engineering, Computer Science, or a related technical discipline
Job Responsibility
  • Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
  • Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for multi-thousand GPU clusters on Kubernetes.
  • Debug and resolve training-specific issues including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
  • Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
  • Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
  • Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
  • Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
  • Drive end-to-end inference enablement on new AMD GPU silicon - be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
  • Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
  • Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.
Employment type: Full-time

Sr Staff AV Behavior Safety Engineer

The Safety Assurance for Effective Autonomous Driving Software (SAFE‑ADS) depart...
Location: United States
Salary: 185100.00 - 284100.00 USD / Year
Company: General Motors
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field
  • 6+ years of experience in machine learning, engineering, data science, or a related field
  • 6+ years in autonomous vehicle or robotics development or related field
  • Demonstrated experience working on production‑intent AV programs
  • Track record providing technical safety leadership in AV development (e.g., defining safety strategies, risk assessments, validation methodologies, safety case contributions)
  • Deep understanding of AV behavior development: defining ODDs, behaviors, and evaluation criteria; analyzing simulation, closed‑course, and public‑road test data; and generating prioritized, actionable recommendations for developers
  • Experience applying AV safety standards and best practices, such as ISO 5083, ISO 21448 (SOTIF), and AVSC practices
  • Excellent communication and storytelling skills, including the ability to explain complex technical tradeoffs to executives and non‑technical stakeholders
Job Responsibility
  • Lead the strategy and support execution for how GM defines, measures, and validates the safety of SAE Level 3 – 4 Automated Driving Systems powered by machine‑learned models
  • Reference and interpret standards such as ISO 21448 (SOTIF), ISO 5083, and AVSC best practices to define GM’s strategy for safe autonomous system development, validation and deployment
  • Own the behavior‑focused portion of the ADS Safety Case, including key claims, sufficiency criteria, and recommended evidence for AV behavior safety performance
  • Collaborate with Software Validation, Embodied AI, Simulation, and Safety Metrics teams to define the end‑to‑end AV behavior validation methodology for AI‑driven systems
  • Set the strategy for how we systematically break down ODDs and how performance is validated per behavior and in aggregate
  • Collaborate on evaluation metrics, human benchmarks, and safety launch targets for AV behaviors and overall system performance, including supporting development of safety performance indicators (SPIs) for AV behaviors
  • Assess AV performance across safety and reliability dimensions using simulation, closed‑course, and public‑road data and provide clear, prioritized feedback to engineering teams
  • Define and run an assurance process to verify the sufficiency criteria and safety targets to support launch readiness
What we offer
  • Medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts
  • Company vehicle evaluation program
  • Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance
Employment type: Full-time