LLM Inference Performance & Evals Engineer

Cerebras Systems

Location:
Canada, Toronto

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Join the inference model team dedicated to bringing up state-of-the-art models and to numerically validating and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.

Job Responsibility:

  • Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge
  • Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull-requests
  • Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
  • Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities

Requirements:

  • 3+ years building high-performance ML or systems software
  • Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
  • Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration
  • Prior experience in modeling, compilers, or crafting benchmarks or performance studies, not just black-box QA tests
  • Strong passion for using AI agents or workflow orchestration tools to boost personal productivity

Nice to have:

  • Hands-on experience with flash-attention, Triton kernels, linear-attention, or sparsity research
  • Performance-tuning experience on custom silicon, GPUs, or FPGAs
  • Proficiency in C/C++ programming and experience with low-level optimization
  • Proven experience in compiler development, particularly with LLVM and/or MLIR
  • Publications, repos, or blog posts dissecting model speed-ups
  • Contributions to open-source agent frameworks

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Enjoy a simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for LLM Inference Performance & Evals Engineer

Principal AI Engineer

We are looking for a Principal AI Engineer to lead the design and deployment of ...
Location:
United States
Salary:
200000.00 - 300000.00 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 10+ years of software engineering experience
  • at least 3 years in applied LLM or agentic AI systems (2023–present)
  • proven success in deploying LLM-powered products used by real users at scale
  • deep backend & systems engineering expertise with Python, distributed systems, and scalable APIs
  • familiarity with LangChain, LlamaIndex, or similar orchestration frameworks
  • experience with RAG pipelines, vector DBs, embedding models, and semantic search tuning
  • experience managing performance across cloud providers (e.g., AWS Bedrock, OpenAI, Anthropic, etc.)
  • demonstrated experience building multi-step agents, planning workflows, chaining reasoning steps, and integrating APIs with agent memory/state
  • comfort with advanced prompting strategies, few-shot and chain-of-thought reasoning, and embedding retrieval setups
  • strong understanding of AI system evaluation, human ratings, A/B experimentation, and feedback loop pipelines
Job Responsibility:
  • Architect and lead the development of multi-agent systems capable of long-horizon planning, reasoning, and API orchestration
  • build reusable agentic components that integrate deeply into sales and marketing processes
  • own and evolve our in-house platform for scalable, low-latency, and cost-efficient LLM and agent deployments
  • lead design of interfaces powered by natural language understanding and retrieval-augmented generation (RAG)
  • build embedding-based, intent-aware search and personalization systems tuned to business user needs
  • drive innovation in personalized outreach generation using context-aware generation pipelines
  • tune inference pipelines, caching layers, and model selection logic for high-scale, cost-aware performance
  • define and drive robust offline and online testing methodologies (A/B, sandboxing, human evals) across agents and LLM flows
  • architect human-in-the-loop systems and telemetry to improve accuracy, UX, and explainability over time
What we offer:
  • equity
  • company bonus or sales commissions/bonuses
  • 401(k) plan
  • at least 10 paid holidays per year
  • flex PTO
  • parental leave
  • employee assistance program
  • wellbeing benefits
  • global travel coverage
  • life/AD&D/STD/LTD insurance
  • Fulltime

Principal Engineer - Generative AI Infra Capabilities

Wells Fargo is seeking a Principal Engineer - Generative Gen AI GPU Infrastructu...
Location:
India, Bengaluru
Salary:
Not provided
Wells Fargo
Expiration Date:
February 20, 2026
Requirements:
  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Design GPU cluster topologies (H100/H200, NVLink/NVSwitch), networking, and storage paths for high‑throughput inferencing; document sizing and perf baselines.
  • Implement Run:ai constructs (Collections/Departments/Projects/workloads) for MDEV/MDEP/UCEP/MRM; codify quota, priority, and fair‑share policies.
  • POC & benchmark disaggregated inferencing (prefill/decode) with vLLM/TensorRT‑LLM; publish guidance for H100/H200 tuning (FP8/INT8/AWQ) and KV‑transfer behavior over NVLink.
  • Operationalize OpenShift AI parity for GPU scheduling, time slicing/MIG profiles, and preemption; validate upgrade paths and helm/kustomize packaging.
  • Integrate Triton Inference Server for multi‑model serving
Job Responsibility:
  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Fulltime

AI Engineer

Our next frontier is a strategic shift: We're evolving beyond traditional analyt...
Location:
United Kingdom, London
Salary:
Not provided
MVF
Expiration Date:
Until further notice
Requirements:
  • Python and service development: write clean, typed, production-ready code; comfortable with Pydantic, Asyncio, and FastAPI; treat prompts as code: versioned, tested, and decoupled from business logic
  • Cloud-native experience: hands-on experience deploying and operating containerised services on AWS (or GCP/Azure) using CI/CD platforms (Jenkins, GitHub Actions, CircleCI, BuildKite), cloud monitoring tools (Datadog, Sumologic, NewRelic), and container orchestrators (EKS, ECS); comfortable with Terraform for infrastructure as code
  • Hands-on LLM experience: built something real with language models, whether production systems, serious side projects, or internal tools; understand that prompting is engineering, not magic
Job Responsibility:
  • Architect & Engineer Agentic Systems: build agents that act, not just answer; design agents that perform deterministic actions based on probabilistic reasoning; build systems that can reliably analyse data, execute function calls, and manage state across multi-step workflows without getting stuck in loops
  • Production-Grade RAG: go beyond basic vector search; implement hybrid search (keyword + semantic), re-ranking strategies, and metadata filtering
  • Structured Data Extraction: build pipelines that turn unstructured conversations into structured data that our downstream systems can use
  • Establish AI Engineering Foundations. Observability First: implement the "nervous system" of our AI; choose and set up tools (e.g., LangSmith, LangFuse, ADK, or custom) to trace execution chains
  • Evals as a Service: build the testing harness; create automated evaluation pipelines that test prompts against "Golden Datasets"
What we offer:
  • Summer Fridays
  • Competitive holiday benefits - 25 days a year paid holiday, plus 8 bank holidays (increases 1 day a year up to 30 days)
  • Hybrid working - 3 days a week in the office
  • Closed for Christmas holidays - Extra days not taken from your annual holiday allowance
  • Work from anywhere for 2 weeks a year
  • Life Assurance and Income Protection to protect your loved ones
  • Benefits allowance for health, dental, and vision coverage
  • Six months paid maternity leave, and one month paid paternity leave (subject to qualifying conditions) inclusive of same-sex and adoptive parents
  • Defined Contribution Pension and Salary Sacrifice Scheme
  • Be Well: Our award-winning wellbeing and mental health programme to support all MVFers and their families
  • Fulltime

Senior AI Software Developer

The Senior AI Engineer owns end-to-end delivery of AI features—from design to pr...
Location:
Not provided
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or closely related quantitative discipline
  • Typically, 7-10 years’ experience
  • LLMs & Agents: Prompt engineering, function/tool calling, orchestration frameworks, RAG
  • ML/DS: Evaluation metrics (precision/recall, BLEU/ROUGE where relevant), error analysis
  • Data/RAG: Embeddings, similarity (cosine/IP), chunking, rerankers, vector DB operations
  • Backend: Python (FastAPI/Flask), microservices patterns
  • MLOps/Infra: Docker, Kubernetes, CI/CD, artifact management, GPU scheduling
  • Observability: Metrics/logging/tracing, dashboards, automated evaluation pipelines
  • Frameworks: PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex
  • Data: Pandas, SQL/NoSQL, Parquet/Arrow, Kafka/queues
Job Responsibility:
  • Translate high-level designs into clear component contracts, APIs, and service boundaries
  • Implement LLM integrations, RAG pipelines, agents, tool/function calling, and prompt strategies
  • Own feature delivery for sprints/releases; maintain high code quality and documentation
  • Fine-tune models when needed; design evaluation harnesses and metrics
  • Build A/B testing setups; track accuracy, latency, robustness, and task success rates
  • Conduct error analysis; iterate using feedback efficacy loops and prompt refinement
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Deployment Strategist

At its core, the Deployment Strategist role centers around using data in operati...
Location:
United States, Chicago
Salary:
110000.00 - 170000.00 USD / Year
Palantir Technologies
Expiration Date:
Until further notice
Requirements:
  • Willing and able to work from the Chicago Metropolitan Area
  • Ability to travel 25 – 75% required
  • Experience with programming, scripting or statistical packages (e.g. Python, R, Matlab, SQL)
Job Responsibility:
  • Synthesize disconnected streams of thought into an understanding of the most important problem
  • Immerse in customers' most intricate workflows
  • Partner with customer teams and explore the data
  • Plunge into the product landscape to enable scaling
  • Go onsite and meet with customer analysts to understand critical questions
  • Identify relevant datasets through deep engagement with customer problems
  • Work with Forward Deployed Engineers to integrate data into a stable pipeline
  • Work with the customer to build customized workflows for new user groups
  • Lead training sessions
  • Present results and proposals to audiences ranging from analysts to C-suite executives
What we offer:
  • Medical, dental, and vision insurance
  • Voluntary life insurance
  • Basic life, AD&D and disability insurance
  • Commuter benefits
  • Take-what-you-need paid time off
  • 2 weeks paid time off built into the end of each year
  • 10 paid holidays
  • Supportive leave of absence program
  • Paid leave for new parents
  • Subsidized back-up care for all parents
  • Fulltime

Resin Material Handler

12-hour Day Shift - 6am - 6pm. Assist warehouse activities during times when ask...
Location:
United States, Buffalo Grove, Illinois
Salary:
21.00 - 23.00 USD / Hour
Nemera
Expiration Date:
Until further notice
Requirements:
  • High school diploma or equivalent
  • Ability to frequently lift up to 55 lbs.
  • Ability to sit and stand for long periods of varying length
  • Daily warehouse cleaning tasks and 5S activities
  • Excellent communication skills
  • Well organized with strong attention to detail
  • Ability to operate a forklift (experience with other powered equipment a plus)
  • Commitment to getting tasks done in a timely, accurate manner.
  • Ability to perform material handler and packer responsibilities
Job Responsibility:
  • Assist warehouse activities during times when asked
  • Identify issues and perform corrective actions.
  • Work in accordance with set departmental/company 12-hour schedule.
  • Assist colleagues when necessary to ensure a smooth running of the warehouse
  • Provide recommendations on process improvements
  • Assist with continuous improvement initiatives implementation
  • Perform specific approved operator level equipment maintenance
  • Follow PPE safety standards and escalate issues such as non-adherence to PPE requirements.
  • Monitor departmental 5S activities, ensuring compliance and standards are met
  • Ensure all warehouse documentation applicable to the job is completed in a timely, accurate manner
  • Fulltime

Data & Analytics Developer

If you enjoy working with data and automating business processes, and have exper...
Location:
United Kingdom
Salary:
Not provided
NEC Software Solutions
Expiration Date:
Until further notice
Requirements:
  • 3+ years’ experience in Power BI Development including DAX queries, Power Query and data modelling
  • Proficiency in Power Apps and Power Automate development.
  • You have a keen interest in data analytics and workflow automation.
  • Strong diligence and critical thinking skills.
  • You have excellent interpersonal, communication and organisational skills.
  • Able and willing to travel throughout the UK.
  • Knowledge of Power Tools (Power BI, Power Apps and Power Automate).
  • Experience of creating reports, dashboards and using M query.
  • Familiarity with Microsoft 365 tools (SharePoint, Teams, etc)
  • Exposure to AI concepts or willingness to learn.
Job Responsibility:
  • Contribute to data discovery and the design of data models for analytical projects.
  • Create and maintain Power BI reports and dashboards.
  • Develop and automate workflows using Power Automate to streamline and optimize business processes.
  • Build and deploy custom business applications using Power Apps to solve business challenges and enhance productivity.
  • Maintain reporting datasets and dataflows.
  • Help with documentation and optimising the analytical reporting delivery process.
  • Collaborate with other team members to ensure reporting assets are compliant and secure.
  • Stay current with industry trends and emerging technologies related to Power BI, Power Apps, Power Automate, and SharePoint to recommend enhancements.
  • Identify new areas of influence for the Data & Analytics division.
What we offer:
  • Private Medical Cover funded by NEC for Employees (with the option to add family members at an additional cost)
  • 25 days paid holiday with the option to buy/sell (FTE)
  • 4 x basic salary life assurance cover funded by NEC (with the option to increase cover at an additional cost)
  • A Group Pension Plan with fantastic employer contributions up to a maximum of 8.5%
  • A selection of flexible benefits to suit your individual needs
  • All colleagues get free access to LinkedIn Learning. Over 15000 courses covering a huge breadth of subjects. Learn about what you like, when you like, how you like.
  • Fulltime

Sales Consultant

We are seeking an enthusiastic and motivated Sales Consultant to join our Fortit...
Location:
Australia, Fortitude Valley
Salary:
Not provided
Plush Think Sofas
Expiration Date:
Until further notice
Requirements:
  • Previous sales experience, ideally selling furniture or high-value items such as jewellery, cars, bedding, luxury goods / travel, etc.
  • Positive attitude and enthusiasm, especially during busy periods
  • Strong interpersonal skills with a focus on teamwork and collaboration
  • Open to feedback and eager to learn, demonstrating a growth mindset
  • Excellent organizational skills and the ability to manage multiple responsibilities
Job Responsibility:
  • Deliver outstanding customer service to create the optimal Nick Scali experience
  • Utilize your product knowledge and selling skills to achieve daily and weekly sales targets
  • Ensure accurate completion of sales order paperwork and internal documentation for timely order processing
  • Maximize sales through effective selling techniques, including room solutions and add-on sales
  • Collaborate with the Showroom Manager to uphold showroom standards, including visual merchandising and pricing accuracy
What we offer:
  • Flexible working models available, ranging from 2 to 5 days a week, promoting work-life balance
  • Competitive salary with generous uncapped commission
  • Continuous training and career development opportunities
  • A supportive team environment that values innovation and improvement
  • Fulltime