LLM Inference Performance & Evals Engineer Job at Cerebras Systems (Toronto)

AI Engineer

We're hiring a senior AI Engineer to design, build, and ship production AI syste...

Location

India, Jaipur

Salary:

Not provided

HabileLabs

Expiration Date

Until further notice

Requirements

4+ yrs ML/software engineering
2+ yrs on production AI systems
Strong Python
PyTorch or TensorFlow
LLM fine-tuning: LoRA / QLoRA / PEFT
End-to-end ML pipeline experience (train → serve)
Cloud (AWS / GCP / Azure) + Docker / Kubernetes
ASR & TTS integration in real-time streaming systems
VAD, noise suppression, and barge-in handling
Telephony APIs (Twilio, Vonage) or WebRTC experience

Job Responsibility

LLM & GenAI: Fine-tune and deploy LLMs; build RAG pipelines and agentic workflows (LangChain, LlamaIndex)
Voice Pipelines: Architect real-time ASR → LLM → TTS pipelines with <300 ms latency
Voice Agents: Build production voice agents with turn-taking, barge-in handling, and emotion-aware dialogue
Speech Fine-Tuning: Adapt ASR/TTS models for domain-specific accents, terminology, and speaking styles
MLOps: Build reproducible ML pipelines (Kubeflow / MLflow); maintain CI/CD, monitoring, and model versioning
Inference Optimization: Apply quantization (GGUF, GPTQ), distillation, and hardware-aware inference (TensorRT, vLLM) to cut cost and latency
APIs & Services: Ship high-performance inference APIs in Python (FastAPI) or Go on Kubernetes
Data & Evaluation: Curate text + speech corpora

Fulltime

Senior Applied AI Engineer, Image Generation

We’re hiring a Senior Applied AI Engineer, Image Generation to join a fast‑movin...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Master’s Degree AND 3+ years of experience in engineering, problem solving, model building, evaluation, data analysis OR equivalent experience.
PhD in engineering, applied math, statistics, or related analytical field.
2+ years shipping production-level code, models, or data analysis.
1+ years using AI-assisted coding and analysis techniques.
Solid grasp of deep learning: loss functions, optimization, regularization, training stability
Experience deploying ML models at scale (inference optimization, quantization, distillation)
Familiarity with image preprocessing pipelines, data augmentation, and dataset curation
Experience working on small teams and mid-stage startup environments.
Experience working on AI products.

Job Responsibility

Model Development & Training
Train, fine-tune, and evaluate image generation models (diffusion, GAN, transformer-based)
Implement and adapt techniques from research papers into working production systems
Design and run experiments to improve image quality, diversity, and controllability
Curate, clean, and manage large-scale image-text training datasets
Evaluation, Hillclimbing & Quality Systems
Build and maintain evaluation frameworks for correctness, safety, grounding, and UX quality.
Run hillclimbing loops across prompts, models, and tool‑use strategies to continuously improve assistant performance.
Analyze failure modes, design mitigations, and drive systematic improvements across the stack.
LLM Tooling & Internal Infrastructure

Fulltime

Tech Lead, LLM & Generative AI

At EverAI, we’re shaping what it means to connect with AI. With 40 million users...

Location

Salary:

Not provided

EverAI

Expiration Date

Until further notice

Requirements

8+ years of engineering experience with a significant portion dedicated to shipping ML/LLM features to millions of active users
Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
Comfortable working with NSFW content and understanding the technical rigor required to moderate it effectively without breaking the user experience
Intuition for Alignment: understanding the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
Doer Mindset: valuing velocity and distinguishing between a 'perfect' academic solution and a 'shippable' production solution
Owner: obsessing over metrics, regressions, and the user experience long after the code has been merged

Job Responsibility

Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
Architect High-Precision Moderation: Build the immune system of the platform
Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems

What we offer

Work From Anywhere: Fully remote
Paid Time Off: 4 weeks (20 working days) of PTO per year
Annual Gathering: A yearly in-person meetup
Health & Wellness Support: Up to $200 per year for wellbeing expenses + unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
Learning Budget: Dedicated funds to support your professional growth: courses, books, conferences, events, or certifications
Equipment: Company laptop provided + monitor budget up to $250 for your workspace setup
AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others

Fulltime

Tech Lead, LLM & Generative AI

We are looking for a Tech Lead to take the helm of our LLM team (currently 3 eng...

Location

Hungary

Salary:

Not provided

EverAI

Expiration Date

Until further notice

Requirements

8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
Comfortable working with NSFW content and understand the technical rigor required to moderate it effectively without breaking the user experience
Understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
Can distinguish between a 'perfect' academic solution and a 'shippable' production solution
Obsess over metrics, regressions, and the user experience long after the code has been merged

Job Responsibility

Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
Architect High-Precision Moderation: Build the immune system of the platform
Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems

What we offer

Fully remote
4 weeks (20 working days) of PTO per year
Annual in-person meetup
Up to $200 per year for wellbeing expenses
Unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
Learning Budget: Dedicated funds to support your professional growth: courses, books, conferences, events, or certifications
Company laptop provided + monitor budget up to $250 for your workspace setup
Premium access to ChatGPT, Cursor, Hugging Face, and others

Fulltime

Tech Lead, LLM & Generative AI

EverAI is processing 80 million tokens per day and growing. We are looking for a...

Location

Spain

Salary:

Not provided

EverAI

Expiration Date

Until further notice

Requirements

8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
Comfortable working with NSFW content and understand the technical rigor required to moderate it effectively without breaking the user experience
Intuition for Alignment: understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
Doer Mindset: value velocity and can distinguish between a 'perfect' academic solution and a 'shippable' production solution
Owner: obsess over metrics, regressions, and the user experience long after the code has been merged

Job Responsibility

Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
Architect High-Precision Moderation: Build the immune system of the platform
Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems

What we offer

Paid Time Off: 4 weeks (20 working days) of PTO per year
Annual Gathering: A yearly in-person meetup
Health & Wellness Support: Up to $200 per year for wellbeing expenses + unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
Learning Budget: Dedicated funds to support your professional growth: courses, books, conferences, events, or certifications
Equipment: Company laptop provided + monitor budget up to $250 for your workspace setup
AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others

Fulltime

Tech Lead, LLM & Generative AI

We are looking for a Tech Lead to take the helm of our LLM team (currently 3 eng...

Location

Norway

Salary:

Not provided

EverAI

Expiration Date

Until further notice

Requirements

8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
Comfortable working with NSFW content and understand the technical rigor required to moderate it effectively without breaking the user experience
Understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
Value velocity and can distinguish between a 'perfect' academic solution and a 'shippable' production solution
Obsess over metrics, regressions, and the user experience long after the code has been merged

Job Responsibility

Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
Architect High-Precision Moderation: Build the immune system of the platform
Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems

What we offer

Fully remote
4 weeks (20 working days) of PTO per year
Annual in-person meetup
Up to $200 per year for wellbeing expenses
Unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
Learning Budget: Dedicated funds to support your professional growth
Company laptop provided + monitor budget up to $250 for your workspace setup
AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others

Fulltime

Tech Lead, LLM & Generative AI

At EverAI, we’re shaping what it means to connect with AI. With 40 million users...

Location

Serbia

Salary:

Not provided

EverAI

Expiration Date

Until further notice

Requirements

8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
Comfortable working with NSFW content and understand the technical rigor required to moderate it effectively without breaking the user experience
Understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
Value velocity and can distinguish between a 'perfect' academic solution and a 'shippable' production solution
Obsess over metrics, regressions, and the user experience long after the code has been merged

Job Responsibility

Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
Architect High-Precision Moderation: Build the immune system of the platform
Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems

What we offer

Fully remote
4 weeks (20 working days) of PTO per year
Annual in-person meetup
Up to $200 per year for wellbeing expenses
Unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
Learning Budget: Dedicated funds to support your professional growth
Company laptop provided + monitor budget up to $250 for your workspace setup
Premium access to ChatGPT, Cursor, Hugging Face, and others

Fulltime

Tech Lead, LLM & Generative AI

We are looking for a Tech Lead to take the helm of our LLM team (currently 3 eng...

Location

Romania

Salary:

Not provided

EverAI

Expiration Date

Until further notice

Requirements

8+ years of engineering experience, with a significant portion dedicated to shipping ML/LLM features to millions of active users
Proficient in Python/PyTorch and comfortable with the modern LLM stack (vLLM, HuggingFace, fine-tuning pipelines, eval frameworks)
Comfortable working with NSFW content and understand the technical rigor required to moderate it effectively without breaking the user experience
Intuition for Alignment: understand the physics of LLMs—how prompt conditioning, temperature, and sampling affect the 'soul' of a chatbot
Doer Mindset: value velocity and can distinguish between a 'perfect' academic solution and a 'shippable' production solution
Owner: obsess over metrics, regressions, and the user experience long after the code has been merged

Job Responsibility

Ship Code & Lead from the Front: Architect the system and mentor the team, but spend significant time hands-on in the codebase (Python/PyTorch)
Own the core chat loop: Optimize context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience
Own the Model Lifecycle: Drive our strategy for SFT (Supervised Fine-Tuning) and RLHF/DPO (Preference Optimization)
Manage the 'Data Engine': oversee the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance
Architect High-Precision Moderation: Build the immune system of the platform
Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment
Move beyond binary 'safe/unsafe' flags to create nuanced, context-aware moderation systems

What we offer

Fully remote
4 weeks (20 working days) of PTO per year
Annual Gathering: A yearly in-person meetup
Health & Wellness Support: Up to $200 per year for wellbeing expenses + unlimited 1:1 sessions with psychologists and lifestyle experts through OpenUp (also available for up to three family members)
Co-Working Space Budget: Work from a co-working space up to twice per month (35 EUR / 40 USD per visit)
Learning Budget: Dedicated funds to support your professional growth: courses, books, conferences, events, or certifications
Equipment: Company laptop provided + monitor budget up to $250 for your workspace setup
AI Tools Access: Premium access to ChatGPT, Cursor, Hugging Face, and others

Fulltime
