
Performance Engineer, Low-Level Libraries

United States, Bellevue 257000.00 USD / Year · Job Posted January 26, 2026

Job Description

Are you committed to squeezing out every drop of performance? Join Meta's Low-Level Libraries team and drive impact across our foundational infrastructure. We own the performance and Developer Experience (DevX) of critical C/C++ libraries such as *folly*, *jemalloc*, and GEMM libraries (MKL, AOCL-BLAS, etc.). Our work involves cutting-edge optimizations in domains such as memory management, concurrency, architecture-specific enablement, and AI frameworks. The result: significant power savings and the enablement of new platforms at Meta.

Job Responsibility

  • Develop and optimize C/C++ libraries for Meta services: memory allocation, thread pools and work scheduling, thread synchronization and lockless data structures, highly performant collections, async processing and I/O, RPC, etc.
  • Analyze resource utilization in server applications (CPU, GPU, memory, network, etc.), identify bottlenecks, scope out opportunities for improved resource utilization, and implement improvements, such as modifying core libraries to optimize Meta server workloads, implementing efficiency improvements in production code (e.g., change core data structures), or improving server utilization
  • Work with internal customers and partners to define requirements
  • Reflect requirements in the team roadmap and plan out execution

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 7+ years of professional C/C++ experience
  • Knowledge of computer architecture, CPU and memory subsystem, and OS-level resource management
  • Experience using performance-profiling tools and optimizing native applications for execution-time and memory efficiency

Nice to have

  • Experience implementing and optimizing low-level libraries, such as memory management, threading, GEMM, data compression, or string processing
  • Knowledge of modern ISAs, such as x86 and ARM
  • Experience hand-tuning code, e.g., with loop optimizations, vectorization, parallelization, HW-architecture-specific optimizations
  • Experience developing operating-system kernels

What we offer

  • bonus
  • equity
  • benefits

Similar Jobs for Performance Engineer, Low-Level Libraries (8 matching positions)

Software Engineer - Performance Tools

Join our team as a Software Engineer - Performance Tools and take the lead in il...
Location: United States, San Jose
Salary: 150000.00 - 275000.00 USD / Year
Company: Etched
Expiration Date: Until further notice
Requirements
  • Strong proficiency in C++ or Rust
  • Proficiency in Python is a plus
  • Deep understanding of computer architecture (CPU, GPU, accelerators), memory hierarchies (caches, DRAM), and interconnects (especially PCIe)
  • Proven experience in low-level performance analysis, profiling, and bottleneck identification on complex hardware systems (GPUs, CPUs, FPGAs, or custom ASICs)
  • Experience with performance analysis tools (e.g., NVIDIA Nsight, AMD uProf, Intel VTune, perf, Tracy, ETW)
  • Experience working close to hardware, potentially reading performance counters or interacting directly with device drivers
Job Responsibility
  • Tool Architecture & Design: Lead the design and architecture of a comprehensive performance analysis suite, including data collection mechanisms, data processing pipelines, analysis engines, and user interfaces (CLI and/or GUI)
  • Low-Level Data Collection: Develop robust methods to capture performance data directly from our custom ML accelerator hardware (e.g., hardware performance counters, execution unit status, memory access patterns) via driver interfaces or other mechanisms
  • Host & System Tracing: Implement tracing for host-side API calls (runtime libraries, driver interactions) and system-level events (CPU activity, PCIe traffic, memory usage, network contention) related to Sohu workloads
  • Data Correlation & Synchronization: Design and implement techniques to accurately correlate performance events across the host CPU, device driver, PCIe bus, multiple accelerators, and multiple hosts, ensuring precise time synchronization
  • Performance Analysis Engine: Build analysis modules to automatically interpret collected trace and counter data, identifying key performance limiters (e.g., compute-bound, memory bandwidth-bound, latency-bound, PCIe-bound, specific hardware bottlenecks)
  • Visualization & Reporting: Develop intuitive visualizations (timelines, dependency graphs, resource utilization charts, statistical summaries) to clearly communicate performance characteristics and bottlenecks to users
  • Collaboration & Support: Work closely with hardware architects, firmware engineers, driver developers, compiler engineers, and ML application engineers to understand their needs, define tool requirements, and provide expert guidance on performance analysis and optimization using the tool
What we offer
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Fulltime

Software Development Engineer - Advanced Graphics Programs

At AMD, our mission is to build great products that accelerate next-generation c...
Location: Poland, Gdansk
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Experience writing efficient high-level shader code such as HLSL SM6, GLSL, Slang, or similar, alongside modern C++
  • Knowledge of real-time rendering and graphics algorithms
  • Excellent written and verbal communication skills in English
  • Knowledge of applied mathematics, especially linear algebra, geometry, and trigonometry
  • Familiarity with modern game console and desktop GPU architectures
  • Understanding of low-level machine learning concepts and design patterns, including automatic differentiation, computational graphs, and tensor broadcasting
  • Experience working with modern machine learning libraries such as PyTorch or TensorFlow
  • Knowledge of physically based rendering algorithms, including sampling, shading, and light transport
  • Experience with modern graphics APIs such as DirectX 12 or Vulkan
  • Experience contributing to shipped AAA game titles is preferred
Job Responsibility
  • Collaborate with research engineers to transform proof-of-concept prototypes into robust, production-ready solutions with a high standard of quality
  • Partner with external game developers and internal AMD teams to integrate advanced graphics technologies into real-world applications and titles
  • Optimize, extend, package, and document high-level compute shader and modern C++ code for performance, scalability, and usability
  • Build a strong understanding of the team’s tools, workflows, and technology landscape in the first few months, while contributing to core engineering tasks
  • Within the first 6 to 12 months, take ownership of significant technical deliverables, help shape implementation direction, and contribute to the successful delivery of advanced graphics initiatives

Senior Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location: United States, Redmond
Salary: 119800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries.
Job Responsibility
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement.
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
What we offer
  • Benefits and other compensation
  • Fulltime

Software Engineer with Russian

We are seeking a highly skilled and experienced Software Engineer to join our te...
Location: Poland, Warsaw
Salary: 189660.00 - 322940.00 PLN / Year
Company: Citi
Expiration Date: Until further notice
Requirements
  • 8+ years of relevant experience in engineering software applications or products
  • Proven experience in systems analysis and programming of software applications
  • Demonstrated success in managing and implementing software projects
  • Working knowledge of consulting and project management techniques/methods
  • Ability to work effectively under pressure and manage deadlines, as well as adapt to unexpected changes in expectations or requirements
  • Conversant with Continuous Integration/Continuous Delivery (CI/CD) practices
  • Languages & Frameworks: Java 8+, Spring Boot, Spring Core, Spring MVC, Spring Security, REST, Microservices
  • Databases: Experience with MSSQL & Oracle
  • Distributed Cache: Redis/Hazelcast
  • Messaging: Kafka/ActiveMQ, TIBCO EMS, IBM MQ
Job Responsibility
  • Research, design, implement, and manage software programs, coordinating with stakeholders to ensure extensible low-level design with appropriate separation of concerns and abstractions
  • Write modular, extensible, readable, performant, and secured code, actively participating in code reviews
  • Prioritize application security by adhering to secure design architecture and established security standards and practices
  • Create technical solution artifacts, code review records, and deployment plans
  • Troubleshoot and resolve complex cross-component issues, including those identified during static analysis, penetration testing, or deployment, by identifying root causes and implementing effective solutions
  • Apply advanced language constructs, design principles, design patterns, libraries, frameworks, appropriate data structures, and performance/scalability concepts
What we offer
  • Employer paid Defined Contribution Pension Plan contribution of 6% of employee’s pensionable earnings (PPE Program)
  • Employer paid Private Medical Care Package for employees and Private Medical Care Packages for certain family members available at preferential rates
  • Employer paid Life Insurance Program for employees and Life Insurance for certain family members available at preferential rates
  • Employee Assistance Program financed by Employer
  • Paid Parental Leave Program (maternity and paternity leave: statutory and 2 weeks additional paid paternity leave)
  • Sport Card for employees subsidised via Social Benefits Fund and Sport Cards for certain family members available at preferential rates
  • Additional benefits from Company’s Social Benefit Fund, in particular: Holidays Allowance, support for sport and cultural activities, team building events
  • Additional day off for volunteering
  • Cafeteria/flex benefit: a company benefits system that lets employees select and purchase benefits offered by a provider and available on the platform
  • Fulltime

Software Engineer

We are seeking a highly skilled and experienced Software Engineer to join our te...
Location: Ireland, Dublin
Salary: 71440.00 - 107160.00 EUR / Year
Company: Citi
Expiration Date: Until further notice
Requirements
  • 8+ years of relevant experience in engineering software applications or products
  • Proven experience in systems analysis and programming of software applications
  • Demonstrated success in managing and implementing software projects
  • Working knowledge of consulting and project management techniques/methods
  • Ability to work effectively under pressure and manage deadlines, as well as adapt to unexpected changes in expectations or requirements
  • Conversant with Continuous Integration/Continuous Delivery (CI/CD) practices
  • Java 8+, Spring Boot, Spring Core, Spring MVC, Spring Security, REST, Microservices
  • Experience with MSSQL & Oracle
  • Redis/Hazelcast
  • Kafka/ActiveMQ, TIBCO EMS, IBM MQ
Job Responsibility
  • Research, design, implement, and manage software programs, coordinating with stakeholders to ensure extensible low-level design with appropriate separation of concerns and abstractions
  • Write modular, extensible, readable, performant, and secured code, actively participating in code reviews
  • Prioritize application security by adhering to secure design architecture and established security standards and practices
  • Create technical solution artifacts, code review records, and deployment plans
  • Troubleshoot and resolve complex cross-component issues, including those identified during static analysis, penetration testing, or deployment, by identifying root causes and implementing effective solutions
  • Apply advanced language constructs, design principles, design patterns, libraries, frameworks, appropriate data structures, and performance/scalability concepts
What we offer
  • Hybrid working model (up to 2 days working at home per week)
  • Competitive base salary (annually reviewed)
  • Additional benefits supporting well-being, living well, and saving well
  • Business casual workplace
  • Fulltime

Principal Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location: United States, Redmond
Salary: 139900.00 - 331200.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits)
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries
Job Responsibility
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost)
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs)
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up)
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability
  • Fulltime

Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location: China, Shanghai
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices
  • Experience in GPU Kernel Development & Optimization for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in Deep Learning Integration into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference
  • Skilled in Python and C++, with experience in debugging, performance tuning, and test design
  • Solid experience in running large-scale workloads on heterogeneous compute clusters
Job Responsibility
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions
What we offer
  • Benefits offered are described: AMD benefits at a glance

Senior Software Engineer

The R&D of Search Ads aims to build an online advertising ecosystem of users, ad...
Location: China, Beijing
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, CUDA, or ROCm OR equivalent experience
  • 3+ years' practical experience working on applications that use GPUs, including experience optimizing their performance
  • Practical experience writing new GPU kernels, going beyond experience of GPU workloads with existing library kernels
  • Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm OR Master's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm OR equivalent experience
  • Experience in low-level performance analysis and optimization, including proficiency using GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute
  • Technical background and solid foundation in software engineering principles and architecture design
  • Familiarity with inference optimization and experience developing popular inference frameworks such as TensorRT-LLM, SGLang, or vLLM
  • Exposure to Deep Neural Network inference and experience in one or more deep learning frameworks such as PyTorch, TensorFlow, or ONNX Runtime
Job Responsibility
  • Design, develop, and maintain high-performance software in C/C++ and Python, including GPU programming with CUDA, ROCm, or Triton
  • Optimize model inference and training pipelines for speed, throughput, memory efficiency, and cost across GPU platforms
  • Collaborate with platform teams to integrate and tune solutions on emerging accelerator stacks and rapidly evolving toolchains
  • Profile workloads end-to-end, identify bottlenecks, and implement kernel-level and system-level performance improvements
  • Partner with internal and external stakeholders to translate requirements into scalable performance features and optimizations for state-of-the-art models
  • Validate performance, stability, and correctness through benchmarking, automated testing, and production readiness reviews
  • Fulltime