Senior AI Hardware Architect

Microsoft Corporation

Location:
United States, Mountain View

Contract Type:
Not provided

Salary:

119800.00 - 234700.00 USD / Year

Job Description:

Join the Systems Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and its “Intelligent Cloud” mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide. We deliver the core infrastructure and foundational technologies for Microsoft’s cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams, and Xbox Live.

We are seeking a Senior AI Hardware Architect to join the AI Systems Architecture (ASA) group, where we define, analyze, and optimize next-generation AI accelerator platforms and large-scale inference and training systems. In this role, you will lead performance analysis, profiling, kernel-level optimization, and end-to-end performance characterization across GPU and accelerator architectures, working across hardware, software, and system boundaries. You will analyze real-world AI workloads across modern GPU platforms and in-house AI accelerators, identifying performance bottlenecks and architectural trade-offs through rigorous measurement and benchmarking. A key aspect of this role is correlating on-silicon measurements, software traces, and kernel execution behavior with architectural models and simulators, enabling deep insight into performance behavior and guiding data-driven architectural decisions.

You will collaborate closely with architecture, microarchitecture, compiler, runtime, and systems teams, and contribute to the development of data correlation, analysis, and visualization tools that improve performance insight and optimization velocity. Through quantitative analysis and cross-platform understanding, you will play a critical role in shaping future accelerator and system architectures across the AI hardware and software stack.

Job Responsibility:

  • Lead performance analysis, profiling, and benchmarking across GPU and in-house AI accelerator architectures, applying rigorous data and statistical analysis to identify complex performance bottlenecks, root causes, and optimization opportunities across hardware, software, and system layers
  • Run and analyze end-to-end AI models on production-like serving infrastructure, performing deep dives into modern AI serving stacks (e.g., optimized LLM serving frameworks, schedulers, runtimes, and memory management systems) to understand performance behavior, scalability limits, and system-level trade-offs
  • Provide data-driven recommendations and architectural trade-offs to senior technical leadership, balancing performance, complexity, cost, quality, reliability, and development timelines to inform accelerator and system architecture decisions
  • Develop and implement technical solutions to complex performance, quality, and design challenges, including kernel-level optimization, architectural tuning, and system-level performance improvements across multiple products or feature areas
  • Correlate on-silicon measurements, software traces, and kernel execution behavior with architectural models and simulators, ensuring alignment between measured performance and architectural intent, and identifying gaps that drive future design enhancements
  • Design, build, and evolve data correlation, analysis, and visualization tools and workflows that scale performance insight, accelerate debugging, and improve clarity and communication of optimization opportunities across teams
  • Lead and contribute to design and performance documentation, including architecture reviews, performance reports, functional specifications, and customized analyses
  • Communicate progress, risks, and recommendations within and across teams, and help identify and mitigate significant project risks

Requirements:

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements for this role
  • Passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Nice to have:

  • Doctorate in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience OR Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 6+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years technical engineering experience OR equivalent experience
  • MS or PhD in Machine Learning, Computer Architecture/Systems, Electrical Engineering, High-Performance Computing, or related areas
  • 4+ years of experience in Computer Architecture, AI Systems, or closely related technical domains
  • Experience with GPU and AI accelerator architectures, including compute pipelines, memory hierarchies, interconnects, and parallel execution models
  • Demonstrated expertise in performance profiling, benchmarking, and root-cause analysis, using hardware performance counters, software traces, and workload-level measurements
  • Hands-on experience with kernel-level performance analysis and optimization, and correlating kernel behavior with architectural and system-level performance
  • Strong programming and scripting skills in Python and C/C++ for performance analysis, tooling, benchmarking, and automation
  • Experience with architectural modeling or simulators and correlating modeled behavior with measured hardware performance
  • Experience running and analyzing end-to-end AI models on serving or training infrastructure, with the ability to diagnose performance issues across hardware, runtime, and system layers
  • Hands-on experience with AI frameworks and runtimes, including PyTorch, and familiarity with modern AI serving stacks such as vLLM and SGLang frameworks
  • Ability to communicate complex technical concepts clearly through design documentation, performance reports, functional specifications, and technical presentations

Additional Information:

Job Posted:
February 13, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior AI Hardware Architect

Senior Solution Architect AI & HPC

AI is a high-growth market for HPE, and we believe we are uniquely suited to bri...
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Engineering, Computer Science, or similar quantitative focus preferred
  • Ability to quickly prototype functionality into scripts for demos, integrations, troubleshooting, etc.
  • Expertise in cloud architectures, specifically with public cloud platforms such as AWS, Azure, or Google Cloud
  • Strong understanding of AI technologies, including machine learning, deep learning, and neural networks
  • Experience participating in solution configurations and the creation of PoCs to meet customer requirements
  • Solid knowledge of infrastructure components, including servers, storage, networking, and virtualization
  • Experience with high-performance computing (HPC) and GPU-accelerated systems is advantageous
  • Demonstrates expert technical skills in assigned area of specialization
  • Expert knowledge of the company offerings, strategic initiatives, current trends, competitor products and strategies within area of responsibility
  • Expert level written and verbal communication skills and mastery over English and local language
Job Responsibility:
  • Collaborate with sales teams to understand customer requirements and develop tailored solutions for their AI infrastructure needs
  • Engage in pre-sales activities, including technical presentations, demonstrations, and proof-of-concepts
  • Act as a trusted advisor to customers, addressing their questions, concerns, and technical challenges effectively
  • Stay up-to-date with the latest advancements in AI technologies, cloud architectures, and infrastructure trends
  • Lead Proof-of-Concepts (PoC) for HPE customers expanding into Deep Learning or Machine Learning use cases
  • Architect reusable end-to-end AI solutions for HPE customers and prospects
  • Lead technical discussions with customers and partners to propose HPE and partner Integrated solutions
  • Identify solutions, define action plans, and help coordinate and deliver optimal solutions and enhancements
  • Recommend configurations and settings for different types of hardware and interconnect fabrics
  • Assist in any product or technical issue towards an initial sale or renewal of a customer
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
Employment Type:
Full-time

Distinguished Technologist, Presales Engineering

Distinguished Technologist, Presales Engineering. This role has been designated ...
Location:
United States, All, New Jersey
Salary:
203500.00 - 492500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • AI/ML Networking Expertise: Proven experience designing networks for AI clusters, with a deep understanding of lossless Ethernet, congestion control algorithms (PFC, ECN), and load-balancing techniques specific to AI traffic patterns
  • Edge & Distributed Compute Knowledge: Technical proficiency in MEC (Multi-access Edge Computing) and how it intersects with AI inferencing models and 5G/6G transport
  • Advanced Network Design: 15+ years of experience in Network Infrastructure Design, with at least 3-5 years focused on Hyperscale Datacenter or Cloud-Scale SP architectures
  • Modern Protocol Fluency: Working knowledge of and hands-on experience with JunOS, EOS, or IOS (certifications preferred), with specialized knowledge in EVPN-VXLAN, Segment Routing (SRv6), and telemetry-driven automation
  • Systems Thinking: Ability to bridge the gap between hardware (GPU/NIC/Switch) and software (AI Frameworks, Kubernetes, Virtualization) to provide a holistic "AI Fabric" vision
  • Thought Leadership: Significant experience providing both solution and commercial leadership, specifically in translating the "Cost of AI" into a value-based networking ROI for C-level stakeholders
  • Executive Presence: Professional with strong business acumen and the ability to build relationships with technical decision makers and C-level executives in client organizations
  • Sales Mastery: Experience preparing RFP/Tender response documents, including compliance, bill of materials, and solution documents to drive a successful response
  • Resource Management: Strong resource management skills, including how and when to effectively engage SMEs, specialists, and Engineering resources
  • Minimum Qualifications: 15+ years relevant industry experience in Cloud/Networking Infrastructure, routing, and switching domains
Job Responsibility:
  • Lead AI Grid Strategy: Serve as the primary architect for "AI Grid" initiatives, helping Service Providers build interconnected, high-performance compute and networking fabrics designed specifically for distributed AI workloads
  • Architect Edge AI Solutions: Design and implement low-latency networking architectures to support Inferencing at the Edge, ensuring SPs can deliver AI services closer to the end-user with minimal jitter and maximum throughput
  • Optimize AI Infrastructure: Articulate the value of specialized AI/ML networking, including the orchestration of RDMA over Converged Ethernet (RoCE v2) and InfiniBand-to-Ethernet transitions within the Modern Datacenter
  • End-to-End AI Design: Responsible for the architecture of end-to-end Networking Infrastructure that supports both the "Front-end" (management/client) and "Back-end" (GPU-to-GPU) AI clusters
  • Strategic Advisory: Advise Sales Engineers and partners on the transition from traditional SP routing to AI-Optimized WAN and Fabric solutions, ensuring customers' business requirements for massive scale and predictive analytics are met
  • Ecosystem Integration: Develop deep partnerships with Silicon providers, AI software innovators, and Integrators to solve emerging "AI-scale" problems across multi-tenant environments
  • Hyperscale AI Networking: Architect and design Hyperscale solutions that specifically address "Job Completion Time" (JCT) metrics critical to AI training customers
  • Complex Deal Orchestration: In complex Networking Infrastructure deals, the Distinguished Technologist will be responsible for the end-to-end technical solution & customizations, orchestrating other Specialists and Technical resources including Systems Engineers as well as Product Management
  • Technical Validation: Excel in delivering Demos and Proof of Concept on solutions as well as clearly articulating the value proposition aligned to customer use cases
  • Stakeholder Engagement: Build relationships with senior resources and technical decision makers with key customers to ensure smooth knowledge transfer and hand-over to the delivery team
What we offer:
  • Health & Wellbeing: We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing
  • Personal & Professional Development: We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good
Employment Type:
Full-time

Senior AI Network Architect

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 3+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems
  • Proven expertise in system architecture across compute, networking, and accelerator domains
  • Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing
  • Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration
  • Familiarity with signal integrity modeling, link training, and physical layer optimization
Job Responsibility:
  • Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms, with a focus on ultra-high bandwidth, low-latency backend networks
  • Drive system-level integration across compute, storage, and interconnect domains to support scalable AI training workloads
  • Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure that meets performance, reliability, and deployment goals
  • Influence platform decisions across rack, chassis, and pod-level implementations
  • Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers to co-develop differentiated solutions
  • Represent Microsoft in joint architecture forums and technical workshops
  • Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains
  • Frame decisions in terms of TCO, performance, scalability, and deployment risk
  • Lead design reviews and contribute to PRDs and system specifications
  • Shape the direction of hyperscale AI infrastructure by engaging with standards bodies (e.g., IEEE 802.3), influencing component roadmaps, and driving adoption of novel interconnect protocols and topologies
Employment Type:
Full-time

Director Product Management (Artificial Intelligence Hardware)

Do you want to be at the forefront of innovating the latest hardware designs to ...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND 8+ years experience in product/service/project/program management or software development OR equivalent experience
  • 3+ years of experience working on AI systems as an architect or a product manager
  • 7+ years of technical product management experience, including products within datacenter Hardware systems and/or Cloud infrastructure
  • 7+ years experience creating product roadmap(s) from conception to launch, driving end-to-end program execution, defining product go-to-market strategy, and leading program direction discussions
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Collaborate with customers and partner organizations to define future generations of Artificial Intelligence (AI) Hardware for Azure at Microsoft
  • Lead the strategic product vision, roadmap and product requirements for our next generations of AI hardware platforms
  • Identify and prioritize customer needs, market opportunities, and competitive gaps, and translate them into clear and actionable product requirements and specifications
  • Drive executive decision making for new investments, including competitive analysis, program goals and business requirements, architectural concepts, risk management strategies, financial analysis, schedule and hardware strategy
  • Lead technical programs from concept to execution, collaborating with architecture, engineering and business teams to develop and drive end-to-end product development
  • Develop and maintain a high level of technical proficiency in AI workload requirements, AI technology landscape and AI Industry roadmaps
  • Engage with senior leadership, highlighting risks across functional teams and providing recommendations to support product level decisions
  • Operate effectively in ambiguity. Apply process where it creates value, and design process where it’s needed. Recognize the situations where each approach is most appropriate
Employment Type:
Full-time

Director, Product Manager (Artificial Intelligence Hardware)

Do you want to be at the forefront of innovating the latest hardware designs to ...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree AND 8+ years experience in product/service/program management or software development OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 3+ years of experience working on AI systems as an architect or a product manager
  • Strong understanding of several of the following areas: GPU architecture, scale-up and scale-out networking, data center specifications, model architecture, workloads, and system reliability
  • 7+ years of technical product management experience, including products within datacenter Hardware systems and/or Cloud infrastructure
  • 7+ years experience creating product roadmap(s) from conception to launch, driving end-to-end program execution, defining product go-to-market strategy, and leading program direction discussions
  • Understanding of and passion for technical product management, including presentation skills and written communication
  • Ability to build relationships and influence in a matrix organization
  • Managing cross-team deliverables, including program costs, schedules, risks, and issue mitigation, and establishing process and framework for large-scale collaboration
Job Responsibility:
  • Collaborate with customers and partner organizations to define future generations of Artificial Intelligence (AI) Hardware for Azure at Microsoft
  • Lead the strategic product vision, roadmap and product requirements for our next generations of AI hardware platforms
  • Identify and prioritize customer needs, market opportunities, and competitive gaps, and translate them into clear and actionable product requirements and specifications
  • Drive executive decision making for new investments, including competitive analysis, program goals and business requirements, architectural concepts, risk management strategies, financial analysis, schedule and hardware strategy
  • Lead technical programs from concept to execution, collaborating with architecture, engineering and business teams to develop and drive end-to-end product development
  • Develop and maintain a high level of technical proficiency in AI workload requirements, AI technology landscape and AI Industry roadmaps
  • Engage with senior leadership, highlighting risks across functional teams and providing recommendations to support product level decisions
  • Operate effectively in ambiguity. Apply process where it creates value, and design process where it’s needed. Recognize the situations where each approach is most appropriate
Employment Type:
Full-time

Senior AI Presales Consultant

We are seeking a high-impact, strategic AI Presales Consultant to join our elite...
Location:
India, Mumbai
Salary:
Not provided
Eviden
Expiration Date:
Until further notice
Requirements:
  • 7+ years in a customer-facing technical role (e.g., Presales, Solutions Architecture, AI Specialist, or Technical Consulting), with a proven track record of designing large-scale AI, ML, or HPC solutions
  • Deep, hands-on understanding of LLM architectures. Must be able to architect, explain, and build PoCs for RAG pipelines, including vector databases (e.g., Milvus, Pinecone, Chroma), embedding models, and data ingestion strategies
  • Direct experience in sizing AI infrastructure. Must be able to perform "napkin math" and detailed calculations for GPU, CPU, memory, and network requirements
  • Must be able to fluently discuss performance metrics (tokens/second, latency, throughput, TFLOPS) and their relationship to hardware choice (e.g., NVIDIA H100 vs. A100, memory bandwidth, interconnects like NVLink/InfiniBand)
  • Expertise in the AI software stack. Strong understanding of MLOps principles (Kubeflow, MLflow), Kubernetes (K8s) for AI workloads, and model serving platforms (NVIDIA Triton, KServe, or similar)
  • Strong, current knowledge of the AI model landscape (e.g., Llama family, Mistral, GPT-family, foundation models). Ability to discuss fine-tuning techniques, quantization, and pruning
  • Exceptional communication, whiteboarding, and presentation skills. Ability to translate executive-level business needs into detailed technical architecture and build a compelling C-level value proposition
  • Bachelor's or Master's degree in Computer Science, AI, Data Science, or a related engineering field
Job Responsibility:
  • Strategic Client Advisory: Lead executive-level "Art of the Possible" workshops and technical discovery sessions to understand a client's business goals, data readiness, and AI maturity
  • Full-Stack Solution Architecture: Design holistic, end-to-end AI solutions that synergize our supercomputing hardware, AI software platform, and MLOps capabilities to meet specific client needs
  • Generative AI & LLM Expertise: Act as the subject matter expert on Generative AI. Architect and evangelize scalable data ingestion and preparation pipelines, specializing in Retrieval-Augmented Generation (RAG) frameworks
  • Infrastructure Sizing & Performance Modelling: Analyse customer workloads (data volume, model complexity, training frequency, inference throughput) to accurately size the required platform infrastructure, including Kubernetes clusters, data storage, and software licenses. This includes calculating compute, storage, and network requirements based on key performance metrics like model parameters, token performance (tokens/sec), desired latency, and concurrent user load
  • Model & Software Consultation: Advise clients on AI model selection, comparing the trade-offs of open-source vs. proprietary LLMs, fine-tuning vs. foundation models, and model quantization
  • Position and demonstrate our proprietary AI software platform, MLOps tools, and libraries, integrating them into the client's ecosystem
  • Inference Optimization: Design and architect robust, low-latency, and high-throughput inference solutions for complex AI models, including large-scale LLM serving
  • User Experience (UX) Advocacy: Collaborate with client teams to define the end-user experience, ensuring the solution delivers tangible business value and a seamless interface for data scientists, analysts, and application users
  • Sales Cycle Enablement: Own the technical narrative throughout the sales cycle. Build and deliver compelling presentations, custom demonstrations, and Proofs of Concept (PoCs). Lead the technical response to complex RFIs/RFPs
Employment Type:
Full-time

Software Engineer II and Senior Software Engineer - AI Compilers

The AI Frameworks team at Microsoft develops the AI software used to train and d...
Location:
United States, Mountain View
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Invent and implement innovative compiler features and advanced optimization passes, leveraging tools such as LLVM, MLIR, TorchDynamo, and Triton
  • Develop code generation techniques for new hardware platforms
  • Design and develop cutting edge AI software in C++ and Python
  • Optimize AI workloads
  • Design new programming abstractions for AI
  • Collaborate broadly across multiple disciplines from hardware architects to ML developers
  • Identify requirements, plan and design solutions, estimate effort, and schedule deliverables
  • Help establish and drive the adoption of outstanding coding standards and patterns and help enhance our inclusive engineering culture
  • Embody Microsoft's culture and values
Employment Type:
Full-time

Senior AI Software Architect

Do you want to be at the forefront of innovating the latest hardware designs to ...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Port and optimize large-scale AI models (e.g., foundation models, diffusion models, YOLO) to run efficiently on Maia hardware
  • Integrate models using frameworks such as PyTorch, ONNX, vLLM, and SGLang
  • Apply techniques like KV cache quantization (e.g., BF16 → FP8), checkpointing, and re-sharding for efficient inference and training
  • Experiment with parallelism strategies (TP, PP) and analyze performance impacts across interconnects (NVLink vs PCIe)
  • Collaborate on improving inference pipelines, including KV caching in SGLang/vLLM and performance tuning at the PyTorch level
  • Work with Triton kernels for basic operations (e.g., FP8 dequantization) and assist in kernel performance analysis
  • Partner with hardware architects and kernel developers for co-design discussions
  • Communicate effectively with multiple stakeholders to align on performance goals and deliverables
Employment Type:
Full-time