Software Development Engineer – Distributed Inference

Location: United States, Austin
Salary: 143280.00 - 214920.00 USD / Year
Job Posted: February 04, 2026

Job Description

AMD is looking for a software engineer who is passionate about Distributed Inferencing on AMD GPUs and improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

Job Responsibility

  • Enable and benchmark AI models on large-scale distributed systems to evaluate performance, accuracy, and scalability
  • Optimize AI workloads across scale-up (multi-GPU), scale-out (multi-node), and scale-across distributed system configurations
  • Collaborate closely with internal GPU library teams to analyze and optimize distributed workloads for high throughput and low latency
  • Develop and apply optimal parallelization strategies for AI workloads to achieve best-in-class performance across diverse system configurations
  • Contribute to distributed model management systems, model zoos, monitoring frameworks, benchmarking pipelines, and technical documentation
  • Build and maintain real-time dashboards reporting performance, accuracy, and reliability metrics for internal stakeholders and external users

Requirements

  • Bachelor’s, Master’s, or PhD degree in Computer Science, Computer Engineering, or a related field, or equivalent practical experience
  • Strong technical expertise in C++/Python development
  • Experience solving performance problems and investigating scalability issues on multi-GPU, multi-node clusters
  • Passionate about quality assurance, benchmarking, and automation in the AI/ML space
  • Strong C/C++ and Python skills, with experience in software design, debugging, performance analysis, and test development
  • Experience running AI workloads on large-scale, heterogeneous compute clusters
  • Familiarity with cluster management and orchestration platforms such as SLURM and Kubernetes (K8s)
  • Experience with GitHub, Jenkins, or similar CI/CD tools and modern development workflows

Nice to have

  • Hands-on experience with AI inference or serving frameworks such as vLLM, SGLang, and llama.cpp
  • Understanding of KV cache transfer mechanisms and technologies (e.g., Mooncake, NIXL/RIXL) and expert parallelization approaches (e.g., DeepEP, MORI, PPLX-Garden)

Similar Jobs for Software Development Engineer – Distributed Inference

8 matching positions

Senior Software Engineer and Principal Software Engineer - PowerPoint AI Team

The PowerPoint team is embarking on an exciting new chapter - evolving a product...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 8+ years of experience in backend service engineering, including work on high-scale infrastructures
  • Proficiency in one or more systems programming languages such as C#, C++
  • 1+ years of experience in software engineering, designing and developing systems (and APIs) that deploy and integrate with AI models
  • 2+ years of experience working with rich telemetry, making data driven decisions, and carrying out rapid experimentation
  • 2+ years of experience building software for scale, performance, and reliability
  • Academic or industry experience building, fine-tuning, or deploying models (any category), or building eval-driven systems that use them
Job Responsibility
  • Lead design and delivery of complex, scalable AI features ensuring resilience and exceptional user experience
  • Drive technical strategy and architecture decisions across multiple services, influencing partner teams and aligning with compliance and security requirements
  • Champion modern engineering practices, including AI-driven approaches, automation, and cloud-native patterns, across the full development lifecycle
  • Mentor and guide engineers, fostering technical excellence and continuous improvement in security, reliability, and performance
  • Collaborate cross-org to solve challenging technical problems, streamline processes, and reduce operational costs while improving live-site health
  • Design and implement scalable backend services optimized for machine learning workflows and large language model integration
  • Develop and maintain evaluation-driven systems that leverage text and multimodal inputs (e.g., images) to power visual-creation experiences
  • Build and optimize APIs and infrastructure to support high-performance model inference and experimentation at scale
  • Collaborate with product, ML, and design teams to integrate models into user-facing features, ensuring seamless functionality and performance
  • Conduct model evaluations and experiments, analyze results, and iterate on improvements to enhance accuracy and user experience
Employment Type: Full-time

Software Development Engineer

We are looking for a dynamic, upbeat software engineer to join our growing team....
Location: China, Shanghai
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Familiarity with Python
  • Familiarity with C++ or async programming
  • Understanding of LLM or multimodal model concepts
  • Knowledge of transformer architectures, attention mechanisms, vision-language alignment, and inference pipelines
  • Theoretical grounding in Transformer/Attention/MoE/KV Cache, and quantization (FP8/FP4)
  • Linux development environment
  • Experience with profiling and diagnosing compute, memory, and communication bottlenecks across multi-GPU and multi-node environments
  • Solid Python/C++ coding skills and experience with debugging and testing practices
  • Experience with multimodal models (e.g., Qwen-VL, Qwen-Image-Edit, Wan) or diffusion-based generative models
  • Familiarity with techniques like quantization, PagedAttention, continuous batching, or speculative decoding
Job Responsibility
  • Deep Learning & LLM Framework Optimization for AMD GPUs
  • Model-Aware Implementation with LLMs and multimodal architectures
  • Performance-Conscious Coding in multi-GPU environments
  • Profiling using tools to evaluate impact of changes
  • End-to-End Performance Engineering across multi-GPU and multi-node setups
  • Compiler & Pipeline Acceleration using compiler technologies and graph compilers
  • Research & Advanced Techniques like speculative decoding and weight-only quantization
  • Cross-Team & Open-Source Collaboration with internal GPU library teams and open-source maintainers
  • Software Engineering Excellence for maintainable and production-quality performance optimizations
What we offer
  • AMD benefits at a glance
Employment Type: Full-time

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location: China, Shanghai
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
  • Experience in GPU Kernel Development & Optimization for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in Deep Learning Integration into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Skilled in Python and C++, with experience in debugging, performance tuning, and test design
  • Solid experience in running large-scale workloads on heterogeneous compute clusters
Job Responsibility
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer
  • Benefits offered are described: AMD benefits at a glance

Sr. Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location: China, Shanghai
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Ability to define goals, manage development efforts, and deliver high-quality solutions
  • Strong problem-solving skills
  • Proactive approach
  • Keen understanding of software engineering best practices
  • Experience in GPU kernel development & optimization for AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch)
  • Skilled in Python and C++
Job Responsibility
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location: China, Shanghai
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices are essential
  • GPU Kernel Development & Optimization: Experienced in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Deep Learning Integration: Experienced in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Software Engineering: Skilled in Python and C++
  • Experience in debugging, performance tuning, and test design
Job Responsibility
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer
  • AMD benefits at a glance

Staff Software Engineer, Inference Infrastructure

Our mission is to scale intelligence to serve humanity. We’re training and deplo...
Location: San Francisco, Toronto, London, New York, Montreal
Salary: Not provided
Company: Cohere
Expiration Date: Until further notice
Requirements
  • 5+ years of engineering experience running production infrastructure at a large scale
  • Experience designing large, highly available distributed systems with Kubernetes, and GPU workloads on those clusters
  • Experience with Kubernetes dev and production coding and support
  • Experience with GCP, Azure, AWS, OCI, multi-cloud on-prem / hybrid serving
  • Experience in designing, deploying, supporting, and troubleshooting in complex Linux-based computing environments
  • Experience in compute/storage/network resource and cost management
  • Excellent collaboration and troubleshooting skills to build mission-critical systems, and ensure smooth operations and efficient teamwork
  • The grit and adaptability to solve complex technical challenges that evolve day to day
  • Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
  • Strong understanding or working experience with distributed systems
Job Responsibility
  • Developing, deploying, and operating the AI platform delivering Cohere's large language models through easy to use API endpoints
  • Working closely with many teams to deploy optimized NLP models to production in low latency, high throughput, and high availability environments
  • Interfacing with customers and creating customized deployments to meet their specific needs
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment Type: Full-time

Senior Software Development Engineer in Test (SDET) - AI Cluster Networking and Security

In AI infrastructure organization, simplifying large hardware deployments with p...
Location: India, Bengaluru
Salary: Not provided
Company: Cerebras Systems
Expiration Date: Until further notice
Requirements
  • Bachelor's or master's degree in engineering in computer science, electrical, AI, data science, or a related field
  • 10+ years of experience in testing in areas such as enterprise software, distributed systems, or datacenter hardware and software
  • Experience working in large enterprise or cloud networking infrastructure, high speed switches, routers, firewalls
  • Experience in qualifying networking vendor platforms like Juniper, Arista or Cisco and network test equipment like Ixia/Spirent
  • Experience in Datacenter technology like BGP, ECN, PFC
  • Experience testing networking security, compliance and firewalls
  • Strong coding skills in a programming language such as Python, Go, or C/C++
  • Strong debugging skills for issues in large distributed systems, hardware, and software. Experience with debugging tools like gdb, strace, and network monitors
  • Strong understanding of operating system internals such as memory management, file systems, security basics, and performance
  • Strong understanding of datacenter layout, device performance characteristics like PCIe, networking and storage
Job Responsibility
  • Innovate and execute tests on cutting edge AI infrastructure
  • Define optimized test strategies and methodologies
  • Be a quick learner, adapt to new technologies
  • Build a strong understanding of how to break large distributed-system challenges into smaller components that can be unit tested
  • Take an automation-first approach: aim for 100% automated tests covering all cluster features in the areas of high availability, failure scenarios, performance, stress, and security
  • Champion cluster security, reliability (99.9999% uptime), and ease of use with observability
  • Test all components of the AI cluster, including but not limited to cluster software (Kubernetes, Prometheus, and Grafana) and cluster hardware such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights through the MemoryX interconnect
  • Qualify cluster networking solutions, which consist of high-speed switches, routers, and optics from various vendors
  • Qualify cluster security features, including OS security, network security, cloud compliance, user access, and security certifications
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Principal Software Engineer - CoreAI Model Inference & Serving

Join our team within CoreAI, where we are building the AI data-plane that powers...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Java, OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Be a hands-on technical leader, designing, coding, and shipping core serving systems, smart routing, and request distribution for a broad portfolio of LLMs, including OpenAI, Mistral, Grok, DeepSeek, and others
  • Build large-scale AI services and platform capabilities that power new products and customer experiences
  • Drive cutting-edge innovation in AI systems alongside world-class engineers and cross-functional partners
  • Lead through architecture, code reviews, mentorship, and technical excellence while staying close to implementation
  • Improve reliability, scalability, observability, efficiency, and performance across mission-critical services
Employment Type: Full-time