LLM Inference Performance & Evals Engineer

Cerebras Systems

Location:
Toronto, Canada

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Join the inference model team dedicated to bringing up state-of-the-art models and to numerically validating and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.

Job Responsibility:

  • Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge
  • Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull-requests
  • Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
  • Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities

Requirements:

  • 3+ years building high-performance ML or systems software
  • Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
  • Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration
  • Prior experience in modeling, compilers, or crafting benchmarks or performance studies, not just black-box QA tests
  • Strong passion for leveraging AI agents or workflow orchestration tools to boost personal productivity

Nice to have:

  • Hands-on with flash-attention, Triton kernels, linear-attention, or sparsity research
  • Performance-tuning experience on custom silicon, GPUs, or FPGAs
  • Proficiency in C/C++ programming and experience with low-level optimization
  • Proven experience in compiler development, particularly with LLVM and/or MLIR
  • Publications, repos, or blog posts dissecting model speed-ups
  • Contributions to open-source agent frameworks

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Enjoy a simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for LLM Inference Performance & Evals Engineer

Principal AI Engineer

We are looking for a Principal AI Engineer to lead the design and deployment of ...
Location:
United States
Salary:
200000.00 - 300000.00 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 10+ years of software engineering experience
  • at least 3 years in applied LLM or agentic AI systems (2023–present)
  • proven success in deploying LLM-powered products used by real users at scale
  • deep backend & systems engineering expertise with Python, distributed systems, and scalable APIs
  • familiarity with LangChain, LlamaIndex, or similar orchestration frameworks
  • experience with RAG pipelines, vector DBs, embedding models, and semantic search tuning
  • experience managing performance across cloud providers (e.g., AWS Bedrock, OpenAI, Anthropic, etc.)
  • demonstrated experience building multi-step agents, planning workflows, chaining reasoning steps, and integrating APIs with agent memory/state
  • comfort with advanced prompting strategies, few-shot and chain-of-thought reasoning, and embedding retrieval setups
  • strong understanding of AI system evaluation, human ratings, A/B experimentation, and feedback loop pipelines
Job Responsibility:
  • Architect and lead the development of multi-agent systems capable of long-horizon planning, reasoning, and API orchestration
  • build reusable agentic components that integrate deeply into sales and marketing processes
  • own and evolve our in-house platform for scalable, low-latency, and cost-efficient LLM and agent deployments
  • lead design of interfaces powered by natural language understanding and retrieval-augmented generation (RAG)
  • build embedding-based, intent-aware search and personalization systems tuned to business user needs
  • drive innovation in personalized outreach generation using context-aware generation pipelines
  • tune inference pipelines, caching layers, and model selection logic for high-scale, cost-aware performance
  • define and drive robust offline and online testing methodologies (A/B, sandboxing, human evals) across agents and LLM flows
  • architect human-in-the-loop systems and telemetry to improve accuracy, UX, and explainability over time
What we offer:
  • equity
  • company bonus or sales commissions/bonuses
  • 401(k) plan
  • at least 10 paid holidays per year
  • flex PTO
  • parental leave
  • employee assistance program
  • wellbeing benefits
  • global travel coverage
  • life/AD&D/STD/LTD insurance
  • Fulltime

Tech Lead, LLM & Generative AI

Tech Lead to take helm of LLM team (currently 3 engineers) and own architecture,...
Location:
Italy
Salary:
Not provided
EverAI
Expiration Date:
Until further notice
Requirements:
  • 8+ years of engineering experience with significant portion dedicated to shipping ML/LLM features to millions of active users
  • Proficient in Python/PyTorch and comfortable with modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
  • Comfortable working with NSFW content and understanding technical rigor required to moderate it effectively without breaking user experience
  • Intuition for alignment: understanding how prompt conditioning, temperature, and sampling affect chatbot behavior
  • Doer mindset: ability to distinguish between perfect academic solution and shippable production solution
  • Owner mentality: obsess over metrics, regressions, and user experience
Job Responsibility:
  • Ship code & lead from the front: architect system and mentor team while spending significant time hands-on in codebase (Python/PyTorch)
  • Own core chat loop: optimize context windows, memory/RAG retrieval, and inference latency for seamless real-time experience
  • Own model lifecycle: drive strategy for SFT and RLHF/DPO, decide when to prompt, fine-tune, or architect new RAG pipeline
  • Manage data engine: oversee sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
  • Architect high-precision moderation: design and train custom classifiers to detect and filter non-consensual or illegal content within explicit environment
  • Create nuanced, context-aware moderation systems beyond binary safe/unsafe flags
What we offer:
  • 4 weeks (20 working days) of PTO per year
  • Annual in-person meetup
  • Up to $200 per year for wellbeing expenses
  • Unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
  • Co-working space budget up to twice per month (35 EUR / 40 USD per visit)
  • Learning budget for professional growth: courses, books, conferences, events, or certifications
  • Company laptop provided + monitor budget up to $250 for workspace setup
  • Premium access to AI tools: ChatGPT, Cursor, Hugging Face, and others
  • Fulltime

Tech Lead, LLM & Generative AI

EverAI is processing 80 million tokens per day and growing. We are looking for a...
Location:
Germany
Salary:
Not provided
EverAI
Expiration Date:
Until further notice
Requirements:
  • 8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
  • Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
  • Comfortable working with NSFW content, with an understanding of the technical rigor required to moderate it effectively without breaking the user experience
  • Intuition for Alignment: understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
  • Doer Mindset: value velocity, can distinguish between a 'perfect' academic solution and a 'shippable' production solution
  • Owner: obsess over metrics, regressions, and the user experience long after the code has been merged
Job Responsibility:
  • Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
  • Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
  • Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
  • Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
  • Architect High-Precision Moderation: design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
  • Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems
What we offer:
  • Work From Anywhere: Fully remote
  • Paid Time Off: 4 weeks (20 working days) of PTO per year
  • Annual Gathering: A yearly in-person meetup
  • Health & Wellness Support: Up to $200 per year for wellbeing expenses + unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
  • Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
  • Learning Budget: Dedicated funds to support your professional growth
  • Equipment: Company laptop provided + monitor budget up to $250 for your workspace setup
  • AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others
  • Fulltime

Tech Lead, LLM & Generative AI

At EverAI, we’re shaping what it means to connect with AI. With 40 million users...
Location:
Salary:
Not provided
EverAI
Expiration Date:
Until further notice
Requirements:
  • 8+ years of engineering experience with a significant portion dedicated to shipping ML/LLM features to millions of active users
  • Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
  • Comfortable working with NSFW content and understanding the technical rigor required to moderate it effectively without breaking the user experience
  • Intuition for Alignment: understanding the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
  • Doer Mindset: valuing velocity and distinguishing between a 'perfect' academic solution and a 'shippable' production solution
  • Owner: obsessing over metrics, regressions, and the user experience long after the code has been merged
Job Responsibility:
  • Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
  • Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
  • Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
  • Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
  • Architect High-Precision Moderation: Build the immune system of the platform
  • Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
  • Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems
What we offer:
  • Work From Anywhere: Fully remote
  • Paid Time Off: 4 weeks (20 working days) of PTO per year
  • Annual Gathering: A yearly in-person meetup
  • Health & Wellness Support: Up to $200 per year for wellbeing expenses + unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
  • Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
  • Learning Budget: Dedicated funds to support your professional growth: courses, books, conferences, events, or certifications
  • Equipment: Company laptop provided + monitor budget up to $250 for your workspace setup
  • AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others
  • Fulltime

Tech Lead, LLM & Generative AI

We are looking for a Tech Lead to take the helm of our LLM team (currently 3 eng...
Location:
Hungary
Salary:
Not provided
EverAI
Expiration Date:
Until further notice
Requirements:
  • 8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
  • Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
  • Comfortable working with NSFW content, with an understanding of the technical rigor required to moderate it effectively without breaking the user experience
  • Understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
  • Can distinguish between a 'perfect' academic solution and a 'shippable' production solution
  • Obsess over metrics, regressions, and the user experience long after the code has been merged
Job Responsibility:
  • Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
  • Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
  • Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
  • Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
  • Architect High-Precision Moderation: Build the immune system of the platform
  • Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
  • Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems
What we offer:
  • Fully remote
  • 4 weeks (20 working days) of PTO per year
  • Annual in-person meetup
  • Up to $200 per year for wellbeing expenses
  • Unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
  • Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
  • Learning Budget: Dedicated funds to support your professional growth: courses, books, conferences, events, or certifications
  • Company laptop provided + monitor budget up to $250 for your workspace setup
  • Premium access to ChatGPT, Cursor, Hugging Face, and others
  • Fulltime

Tech Lead, LLM & Generative AI

EverAI is processing 80 million tokens per day and growing. We are looking for a...
Location:
Spain
Salary:
Not provided
EverAI
Expiration Date:
Until further notice
Requirements:
  • 8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
  • Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
  • Comfortable working with NSFW content, with an understanding of the technical rigor required to moderate it effectively without breaking the user experience
  • Intuition for Alignment: understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
  • Doer Mindset: value velocity, can distinguish between a 'perfect' academic solution and a 'shippable' production solution
  • Owner: obsess over metrics, regressions, and the user experience long after the code has been merged
Job Responsibility:
  • Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
  • Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
  • Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
  • Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
  • Architect High-Precision Moderation: Build the immune system of the platform
  • Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
  • Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems
What we offer:
  • Paid Time Off: 4 weeks (20 working days) of PTO per year
  • Annual Gathering: A yearly in-person meetup
  • Health & Wellness Support: Up to $200 per year for wellbeing expenses + unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
  • Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
  • Learning Budget: Dedicated funds to support your professional growth: courses, books, conferences, events, or certifications
  • Equipment: Company laptop provided + monitor budget up to $250 for your workspace setup
  • AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others
  • Fulltime

Tech Lead, LLM & Generative AI

We are looking for a Tech Lead to take the helm of our LLM team (currently 3 eng...
Location:
Norway
Salary:
Not provided
EverAI
Expiration Date:
Until further notice
Requirements:
  • 8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
  • Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
  • Comfortable working with NSFW content, with an understanding of the technical rigor required to moderate it effectively without breaking the user experience
  • Understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
  • Value velocity; can distinguish between a 'perfect' academic solution and a 'shippable' production solution
  • Obsess over metrics, regressions, and the user experience long after the code has been merged
Job Responsibility:
  • Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
  • Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
  • Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
  • Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
  • Architect High-Precision Moderation: Build the immune system of the platform
  • Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
  • Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems
What we offer:
  • Fully remote
  • 4 weeks (20 working days) of PTO per year
  • Annual in-person meetup
  • Up to $200 per year for wellbeing expenses
  • Unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
  • Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
  • Learning Budget: Dedicated funds to support your professional growth
  • Company laptop provided + monitor budget up to $250 for your workspace setup
  • AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others
  • Fulltime

Tech Lead, LLM & Generative AI

At EverAI, we’re shaping what it means to connect with AI. With 40 million users...
Location:
Serbia
Salary:
Not provided
EverAI
Expiration Date:
Until further notice
Requirements:
  • 8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
  • Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
  • Comfortable working with NSFW content, with an understanding of the technical rigor required to moderate it effectively without breaking the user experience
  • Understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
  • Value velocity; can distinguish between a 'perfect' academic solution and a 'shippable' production solution
  • Obsess over metrics, regressions, and the user experience long after the code has been merged
Job Responsibility:
  • Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
  • Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
  • Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
  • Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
  • Architect High-Precision Moderation: Build the immune system of the platform
  • Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
  • Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems
What we offer:
  • Fully remote
  • 4 weeks (20 working days) of PTO per year
  • Annual in-person meetup
  • Up to $200 per year for wellbeing expenses
  • Unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
  • Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
  • Learning Budget: Dedicated funds to support your professional growth
  • Company laptop provided + monitor budget up to $250 for your workspace setup
  • Premium access to ChatGPT, Cursor, Hugging Face, and others
  • Fulltime