Senior LLM Inference Performance Engineer

Finland, Helsinki · Job Posted May 29, 2026

Job Description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

The focus of this role is the performance analysis and optimization of production-grade AI services, in particular within the AMD Inference Microservice (AIM) ecosystem. You will be part of a diverse and ambitious team responsible for ensuring reliable performance of various AI microservices on diverse hardware configurations. You will work with state-of-the-art AI tooling and models on cutting-edge AI infrastructure. This role requires both a deep understanding of LLMs and hands-on knowledge of AI tooling such as inference servers.

Job Responsibility

  • LLM and AI Performance: Measure, analyze, and optimize LLM and AI service performance across metrics like latency and throughput for various training and inference use cases
  • Design and implement methodologies for measuring model performance, and automating optimization strategies to identify optimal configurations
  • Stay on top of current advances in AI, models, APIs, and open-source ecosystems, and translate them into scalable solutions
  • LLM and AI Tooling: Design and develop tooling to measure and analyze the performance of AI model deployments and the effect of different configurations and infrastructure, on both standalone deployments and Kubernetes clusters
  • Develop and maintain tooling for interacting with different ecosystem functions to improve developer and user experience
  • Develop and maintain internal tooling to support LLM and AI performance tuning at scale

Requirements

  • Seasoned in deploying LLMs and other AI model types in production using frameworks like vLLM, SGLang, or similar tooling
  • Deep knowledge of LLM serving and performance metric evaluation
  • Comfortable with Python software development and bash scripting
  • Experience with Docker, Kubernetes and Helm
  • Desire and ability to continuously learn in a fast-changing environment
  • Initiative, pragmatic problem solving, and great collaboration skills
  • Bachelor's or master's degree in computer science, computer engineering, electrical engineering, or an equivalent field

Nice to have

  • Experience with multi-objective hyper-parameter optimization
  • Knowledge of GPU architecture, kernel development, and debugging (C/C++/CUDA)

Similar Jobs for Senior LLM Inference Performance Engineer

8 matching positions

Senior LLM Backend Engineer

We are looking for a Senior Backend Engineer with a strong focus on Large Langua...
Location: Spain
Salary: Not provided
Company: Bark
Expiration Date: Until further notice
Requirements
  • Extensive production experience with Python in backend engineering
  • Proven experience integrating LLMs into applications via APIs or SDKs
  • Strong experience building and maintaining APIs for LLM-based features
  • Strong experience building and maintaining event-driven workflows
  • Strong experience building and maintaining business logic that consumes AI outputs
  • Strong experience building and maintaining integrations with 3rd party AI/ML platforms
  • Solid SQL and NoSQL experience (especially in AI data pipelines)
  • Production experience with Docker, ideally with Kubernetes or AWS Fargate/ECS/EKS
  • Experience deploying and maintaining AI services in cloud environments
  • Strong organisational skills and ability to deliver in a fast-paced, product-focused environment
Job Responsibility
  • Work with product managers to understand user needs and translate them into AI-powered functionality
  • Design and build APIs, services, and workflows that integrate LLMs (both proprietary and open-source)
  • Implement prompt engineering, RAG pipelines, and model fine-tuning where required
  • Optimise AI inference performance, scalability, and cost-effectiveness
  • Ensure AI features meet high standards for security, reliability, and maintainability
  • Collaborate with other engineers to integrate AI features seamlessly into the wider system
  • Stay on top of emerging LLM technologies and best practices, running experiments and sharing knowledge across the team
What we offer
  • Fully remote working
  • Personal annual L&D Budgets with 600€ to spend on your development
  • Being at the forefront of an industry with new and exciting problems to solve
  • Fulltime

Senior Machine Learning Engineering Manager, Gen AI

We're seeking a Senior Machine Learning Manager (M60) to lead a cross-functional...
Location: United States
Salary: 193500.00 - 303150.00 USD / Year
Company: Atlassian
Expiration Date: Until further notice
Requirements
  • 8+ years in ML, search, or backend engineering roles, with 3+ years leading teams
  • Strong track record of shipping ML-powered or LLM-integrated user-facing products
  • Experience with RAG systems (vector search, hybrid retrieval, LLM orchestration)
  • Deep experience in either modeling (e.g., LLMs, search, NLP) or engineering (e.g., backend infra, full-stack), with the ability to lead end-to-end
  • Deep understanding of LLM ecosystems (OpenAI, Claude, Mistral, OSS), orchestration frameworks (LangChain, LlamaIndex), and vector databases (Weaviate, Pinecone, FAISS, etc.)
  • Strong product intuition and ability to translate complex tech into valuable user features
  • Familiarity with GenAI evaluation methods: hallucination detection, groundedness scoring, and human-in-the-loop feedback loops
  • Master’s or PhD in Computer Science, Machine Learning, or related field preferred—or equivalent practical experience
Job Responsibility
  • Lead the vision, design, and execution of LLM-powered AI products, leveraging advanced AI modeling (e.g. SLM post-training/fine-tuning), RAG architectures, and hybrid ranking systems
  • Define system architecture across retrievers, rankers, orchestration layers, prompt templates, and feedback mechanisms
  • Work closely with product and design teams to ensure delightful, fast, and grounded user experiences
  • Build and manage a cross-disciplinary team including ML engineers, backend/frontend engineers, and applied scientists
  • Foster a culture of E2E ownership — empowering the team to move from prototype to production quickly and iteratively
  • Mentor individuals to grow in both technical depth and product acumen
  • Shape the technical roadmap and long-term strategy for GenAI search across Atlassian’s product suite
  • Partner with platform and infra teams to scale inference, evaluate performance, and integrate usage signals for continuous improvement
  • Champion data quality, grounding, and responsible AI practices in all deployed features
What we offer
  • health and wellbeing resources
  • paid volunteer days
  • Fulltime

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...
Location: United States, San Francisco
Salary: 216500.00 - 324500.00 USD / Year
Company: GoFundMe
Expiration Date: Until further notice
Requirements
  • 9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
  • Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
  • Extensive experience designing, developing, and operating scalable backend systems
  • Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
  • Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
  • Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
  • Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
  • Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
  • Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)
Job Responsibility
  • Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
  • Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
  • Design, build, and optimize scalable model training, fine tuning, and inference pipelines, ensuring robust integration with production systems
  • Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
  • Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
  • Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
  • Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
  • Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
  • Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
  • Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure
What we offer
  • Competitive pay
  • Comprehensive healthcare benefits
  • Financial assistance for things like hybrid work, family planning
  • Generous parental leave
  • Flexible time-off policies
  • Mental health and wellness resources
  • Learning, development, and recognition programs
  • Fulltime

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...
Location: Canada; United States
Salary: 300000.00 - 450000.00 USD / Year
Company: Apollo.io
Expiration Date: Until further notice
Requirements
  • 10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
  • Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
  • Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering
  • Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
  • Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
  • Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
  • Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
  • Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
  • Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
  • Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency
Job Responsibility
  • Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
  • Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
  • Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
  • Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
  • Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
  • Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
  • Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
  • Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
  • Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
  • Ensure robust observability, monitoring, and safety guardrails for all AI systems in production
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA
  • Fulltime

Senior LLM / Generative AI / Agentic Solutions Engineer

Our partner is a fast-growing, innovation-driven company building and deploying ...
Location: Hungary, Budapest
Salary: Not provided
Company: Randstad
Expiration Date: June 12, 2026
Requirements
  • Senior AI Expertise: 5+ years building production ML/AI systems, including 2+ years in lead roles with strong Python engineering (performance, testing, packaging)
  • LLM & Agentic AI: Hands-on experience with orchestration, tool-calling, and workflow integration, including LLM adaptation (PEFT/LoRA) and safety engineering
  • Production RAG & Data: Proven track record of operating RAG pipelines, vector databases, and retrieval performance tuning in production
  • MLOps & Cloud: Proficiency in containerized services (REST/gRPC), CI/CD, and monitoring within cloud environments (AWS/GCP/Azure)
  • Advanced Optimization: Experience in inference optimization (vLLM/quantization), event-driven orchestration, and automated evaluation (LLM-as-judge)
Job Responsibility
  • Build Agentic Systems: Design supervisor/executor patterns, memory strategies, and robust tool-calling failure handling
  • LLM Adaptation & Deployment: Fine-tune open-source models and optimize inference for production-scale latency and cost
  • Advanced RAG: Implement high-performance embedding, retrieval, and re-ranking pipelines for grounded outputs
  • Structured Generation: Enforce schemas and guardrails to minimize hallucinations and ensure reliable system behavior
  • Evaluation & Quality: Develop automated evaluation harnesses, regression tests, and versioning for prompts and models
  • Production Engineering: Ship containerized APIs with full CI/CD, observability, and reliability monitoring (SLOs)
  • Cross-functional Delivery: Collaborate with product teams to integrate GenAI features and mentor junior engineers
What we offer
  • Career Growth
  • Collaborative Team
  • Exciting Projects
  • Remote Work
  • Fulltime

Senior LLM / Generative AI / Agentic Solutions Engineer

Our partner is a fast-growing, innovation-driven company building and deploying ...
Location: Hungary, Budapest
Salary: Not provided
Company: Randstad
Expiration Date: May 31, 2026
Requirements
  • Senior AI Expertise: 5+ years building production ML/AI systems, including 2+ years in lead roles with strong Python engineering (performance, testing, packaging)
  • LLM & Agentic AI: Hands-on experience with orchestration, tool-calling, and workflow integration, including LLM adaptation (PEFT/LoRA) and safety engineering
  • Production RAG & Data: Proven track record of operating RAG pipelines, vector databases, and retrieval performance tuning in production
  • MLOps & Cloud: Proficiency in containerized services (REST/gRPC), CI/CD, and monitoring within cloud environments (AWS/GCP/Azure)
  • Advanced Optimization: Experience in inference optimization (vLLM/quantization), event-driven orchestration, and automated evaluation (LLM-as-judge)
Job Responsibility
  • Build Agentic Systems: Design supervisor/executor patterns, memory strategies, and robust tool-calling failure handling
  • LLM Adaptation & Deployment: Fine-tune open-source models and optimize inference for production-scale latency and cost
  • Advanced RAG: Implement high-performance embedding, retrieval, and re-ranking pipelines for grounded outputs
  • Structured Generation: Enforce schemas and guardrails to minimize hallucinations and ensure reliable system behavior
  • Evaluation & Quality: Develop automated evaluation harnesses, regression tests, and versioning for prompts and models
  • Production Engineering: Ship containerized APIs with full CI/CD, observability, and reliability monitoring (SLOs)
  • Cross-functional Delivery: Collaborate with product teams to integrate GenAI features and mentor junior engineers
What we offer
  • Career Growth
  • Collaborative Team
  • Exciting Projects
  • Remote Work
  • Fulltime

Senior Applied Scientist

Microsoft Ads powers experiences at global scale through large-scale machine lea...
Location: India, Bangalore
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Mathematics, Software Engineering, Computer Engineering, or related technical field, and 5+ years of related experience in machine learning systems, distributed systems, inference infrastructure, or software engineering
  • OR Doctorate in Computer Science, Mathematics, Software Engineering, Computer Engineering, or related technical field, and 2+ years of related experience
  • Strong programming skills in Python, C++, or C#
  • Hands-on experience in one or more of the following areas:
      • Large-scale ML/LLM inference serving in production
      • MLSys for model deployment, serving, or runtime optimization
  • Experience building or optimizing systems for online inference, batch inference, or near-real-time inference
  • Strong understanding of inference bottlenecks such as batching, queuing, tail latency, KV-cache pressure, memory bandwidth limits, caching, and heterogeneous resource utilization
  • Experience with one or more modern inference stacks or runtimes such as vLLM, TensorRT-LLM, SGLang, Triton, ONNX Runtime, DeepSpeed, or PyTorch inference tooling
  • Experience with modern LLM inference and serving techniques, including areas such as KV-cache management, prefix caching, speculative decoding, quantization, prefill/decode disaggregation, or MoE inference optimization
  • Experience with production-scale model serving platforms and distributed inference systems, including multi-node or multi-tenant deployments, resource-aware scheduling, and optimization across heterogeneous workloads
Job Responsibility
  • Design and optimize end-to-end ML/LLM inference workflows across online low-latency serving, near-real-time inference, and large-scale batch inference scenarios
  • Build scalable serving and execution systems for large-scale models, including scheduling, batching, routing, admission control, and resource-aware execution
  • Improve inference performance and efficiency across compute, memory, storage, network, and concurrency dimensions, with strong focus on latency, throughput, reliability, and cost
  • Develop and apply modern serving techniques such as continuous or dynamic batching, prefix caching, KV-cache optimization, request shaping, tail-latency reduction, and runtime-level performance tuning
  • Optimize systems for key generative inference metrics such as time to first token, inter-token latency, throughput, accelerator utilization, and cost per request
  • Work on runtime and serving optimizations for modern inference stacks such as vLLM, TensorRT-LLM, SGLang, Triton, ONNX Runtime, and PyTorch-based serving systems
  • Partner with applied scientists to productionize new models and inference patterns, including agentic workflows with tool use, structured outputs, and long-context workloads, and evaluate quality-latency-cost tradeoffs in real production scenarios
  • Design and improve scheduling and resource management for heterogeneous and multi-tenant inference workloads, including GPU-aware placement, admission control, burst handling, and workload isolation
  • Build strong observability and diagnostics for inference services, including bottleneck analysis, performance regression detection, and end-to-end latency and cost measurement
  • Fulltime

Software Engineer II and Senior Software Engineer - Performance

The Artificial Intelligence Performance team at Microsoft develops AI software t...
Location: United States, Mountain View
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
  • Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
  • Optimize and monitor performance of LLMs and build SW tooling to enable insights into performance opportunities ranging from the model level to the systems and silicon level to improve customer experience and reduce the footprint of the computing fleet
  • Enable fast time to market of LLMs/models and their deployments at scale by building SW tools that afford velocity in porting models on new Nvidia and AMD GPUs
  • Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools
  • Speed up and reduce the complexity of key components/pipelines to improve the performance and/or efficiency of our systems
  • Communicate and collaborate with our partners both internal and external
  • Embody Microsoft's Culture and Values
  • Fulltime