Performance Engineer, Low-Level Libraries Job at Meta (Bellevue)

Software Engineer - Performance Tools

Join our team as a Software Engineer - Performance Tools and take the lead in il...

Location

United States, San Jose

Salary:

150000.00 - 275000.00 USD / Year

Etched

Expiration Date

Until further notice

Requirements

Strong proficiency in C++ or Rust
Proficiency in Python is a plus
Deep understanding of computer architecture (CPU, GPU, accelerators), memory hierarchies (caches, DRAM), and interconnects (especially PCIe)
Proven experience in low-level performance analysis, profiling, and bottleneck identification on complex hardware systems (GPUs, CPUs, FPGAs, or custom ASICs)
Experience with performance analysis tools (e.g., NVIDIA Nsight, AMD uProf, Intel VTune, perf, Tracy, ETW)
Experience working close to hardware, potentially reading performance counters or interacting directly with device drivers

Job Responsibility

Tool Architecture & Design: Lead the design and architecture of a comprehensive performance analysis suite, including data collection mechanisms, data processing pipelines, analysis engines, and user interfaces (CLI and/or GUI)
Low-Level Data Collection: Develop robust methods to capture performance data directly from our custom ML accelerator hardware (e.g., hardware performance counters, execution unit status, memory access patterns) via driver interfaces or other mechanisms
Host & System Tracing: Implement tracing for host-side API calls (runtime libraries, driver interactions) and system-level events (CPU activity, PCIe traffic, memory usage, network contention) related to Sohu workloads
Data Correlation & Synchronization: Design and implement techniques to accurately correlate performance events across the host CPU, device driver, PCIe bus, multiple accelerators, and multiple hosts, ensuring precise time synchronization
Performance Analysis Engine: Build analysis modules to automatically interpret collected trace and counter data, identifying key performance limiters (e.g., compute-bound, memory bandwidth-bound, latency-bound, PCIe-bound, specific hardware bottlenecks)
Visualization & Reporting: Develop intuitive visualizations (timelines, dependency graphs, resource utilization charts, statistical summaries) to clearly communicate performance characteristics and bottlenecks to users
Collaboration & Support: Work closely with hardware architects, firmware engineers, driver developers, compiler engineers, and ML application engineers to understand their needs, define tool requirements, and provide expert guidance on performance analysis and optimization using the tool

What we offer

Medical, dental, and vision packages with generous premium coverage
$500 per month credit for waiving medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more
Daily lunch + dinner in our office

Fulltime

Software Development Engineer - Advanced Graphics Programs

At AMD, our mission is to build great products that accelerate next-generation c...

Location

Poland, Gdansk

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Experience writing efficient high-level shader code such as HLSL SM6, GLSL, Slang, or similar, alongside modern C++
Knowledge of real-time rendering and graphics algorithms
Excellent written and verbal communication skills in English
Knowledge of applied mathematics, especially linear algebra, geometry, and trigonometry
Familiarity with modern game console and desktop GPU architectures
Understanding of low-level machine learning concepts and design patterns, including automatic differentiation, computational graphs, and tensor broadcasting
Experience working with modern machine learning libraries such as PyTorch or TensorFlow
Knowledge of physically based rendering algorithms, including sampling, shading, and light transport
Experience with modern graphics APIs such as DirectX 12 or Vulkan
Experience contributing to shipped AAA game titles is preferred

Job Responsibility

Collaborate with research engineers to transform proof-of-concept prototypes into robust, production-ready solutions with a high standard of quality
Partner with external game developers and internal AMD teams to integrate advanced graphics technologies into real-world applications and titles
Optimize, extend, package, and document high-level compute shader and modern C++ code for performance, scalability, and usability
Build a strong understanding of the team’s tools, workflows, and technology landscape in the first few months, while contributing to core engineering tasks
Within the first 6 to 12 months, take ownership of significant technical deliverables, help shape implementation direction, and contribute to the successful delivery of advanced graphics initiatives

Senior Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...

Location

United States, Redmond

Salary:

119800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
Strong collaboration and communication skills, with the ability to work across organizational boundaries.

Job Responsibility

Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement.
Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.

What we offer

Benefits and other compensation

Fulltime

Software Engineer with Russian

We are seeking a highly skilled and experienced Software Engineer to join our te...

Location

Poland, Warsaw

Salary:

189660.00 - 322940.00 PLN / Year

Citi

Expiration Date

Until further notice

Requirements

8+ years of relevant experience in engineering software applications or products
Proven experience in systems analysis and programming of software applications
Demonstrated success in managing and implementing software projects
Working knowledge of consulting and project management techniques/methods
Ability to work effectively under pressure and manage deadlines, as well as adapt to unexpected changes in expectations or requirements
Conversant with Continuous Integration/Continuous Delivery (CI/CD) practices
Languages & Frameworks: Java 8+, Spring Boot, Spring Core, Spring MVC, Spring Security, REST, Microservices
Databases: Experience with MSSQL & Oracle
Distributed Cache: Redis/Hazelcast
Messaging: Kafka/ActiveMQ, TIBCO EMS, IBM MQ

Job Responsibility

Research, design, implement, and manage software programs, coordinating with stakeholders to ensure extensible low-level design with appropriate separation of concerns and abstractions
Write modular, extensible, readable, performant, and secured code, actively participating in code reviews
Prioritize application security by adhering to secure design architecture and established security standards and practices
Create technical solution artifacts, code review records, and deployment plans
Troubleshoot and resolve complex cross-component issues, including those identified during static analysis, penetration testing, or deployment, by identifying root causes and implementing effective solutions
Apply advanced language constructs, design principles, design patterns, libraries, frameworks, appropriate data structures, and performance/scalability concepts

What we offer

Employer paid Defined Contribution Pension Plan contribution of 6% of employee’s pensionable earnings (PPE Program)
Employer paid Private Medical Care Package for employees and Private Medical Care Packages for certain family members available at preferential rates
Employer paid Life Insurance Program for employees and Life Insurance for certain family members available at preferential rates
Employee Assistance Program financed by Employer
Paid Parental Leave Program (statutory maternity and paternity leave, plus 2 weeks of additional paid paternity leave)
Sport Card for employees subsidised via Social Benefits Fund and Sport Cards for certain family members available at preferential rates
Additional benefits from Company’s Social Benefit Fund, in particular: Holidays Allowance, support for sport and cultural activities, team building events
Additional day off for volunteering
Cafeteria/flex benefit: a company benefits system that lets employees select and purchase benefits offered by a provider on the platform

Fulltime

Software Engineer

We are seeking a highly skilled and experienced Software Engineer to join our te...

Location

Ireland, Dublin

Salary:

71440.00 - 107160.00 EUR / Year

Citi

Expiration Date

Until further notice

Requirements

8+ years of relevant experience in engineering software applications or products
Proven experience in systems analysis and programming of software applications
Demonstrated success in managing and implementing software projects
Working knowledge of consulting and project management techniques/methods
Ability to work effectively under pressure and manage deadlines, as well as adapt to unexpected changes in expectations or requirements
Conversant with Continuous Integration/Continuous Delivery (CI/CD) practices
Java 8+, Spring Boot, Spring Core, Spring MVC, Spring Security, REST, Microservices
Experience with MSSQL & Oracle
Redis/Hazelcast
Kafka/ActiveMQ, TIBCO EMS, IBM MQ

Job Responsibility

Research, design, implement, and manage software programs, coordinating with stakeholders to ensure extensible low-level design with appropriate separation of concerns and abstractions
Write modular, extensible, readable, performant, and secured code, actively participating in code reviews
Prioritize application security by adhering to secure design architecture and established security standards and practices
Create technical solution artifacts, code review records, and deployment plans
Troubleshoot and resolve complex cross-component issues, including those identified during static analysis, penetration testing, or deployment, by identifying root causes and implementing effective solutions
Apply advanced language constructs, design principles, design patterns, libraries, frameworks, appropriate data structures, and performance/scalability concepts

What we offer

Hybrid working model (up to 2 days working at home per week)
Competitive base salary (annually reviewed)
Additional benefits supporting well-being, living well, and saving well
Business casual workplace

Fulltime

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...

Location

United States, Redmond

Salary:

139900.00 - 331200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
Strong collaboration and communication skills, with the ability to work across organizational boundaries

Job Responsibility

Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement
Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability

Fulltime

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...

Location

China, Shanghai

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
Experience in GPU Kernel Development & Optimization for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
Experience in Deep Learning Integration into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
Skilled in Python and C++, with experience in debugging, performance tuning, and test design
Solid experience in running large-scale workloads on heterogeneous compute clusters

Job Responsibility

Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions

What we offer

Benefits offered are described in "AMD benefits at a glance"

Senior Software Engineer

The R&D of Search Ads aims to build an online advertising ecosystem of users, ad...

Location

China, Beijing

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, CUDA, or ROCm OR equivalent experience
3+ years of practical experience working on applications that use GPUs, including optimizing their performance
Practical experience writing new GPU kernels, beyond running GPU workloads with existing library kernels
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm OR Master's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm OR equivalent experience
Experience in low-level performance analysis and optimization, including proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute
Technical background and solid foundation in software engineering principles and architecture design
Familiarity with inference optimization and experience developing popular inference frameworks such as TensorRT-LLM, SGLang, or vLLM
Exposure to deep neural network inference and experience with one or more deep learning frameworks such as PyTorch, TensorFlow, or ONNX Runtime

Job Responsibility

Design, develop, and maintain high-performance software in C/C++ and Python, including GPU programming with CUDA, ROCm, or Triton
Optimize model inference and training pipelines for speed, throughput, memory efficiency, and cost across GPU platforms
Collaborate with platform teams to integrate and tune solutions on emerging accelerator stacks and rapidly evolving toolchains
Profile workloads end-to-end, identify bottlenecks, and implement kernel-level and system-level performance improvements
Partner with internal and external stakeholders to translate requirements into scalable performance features and optimizations for state-of-the-art models
Validate performance, stability, and correctness through benchmarking, automated testing, and production readiness reviews

Fulltime
