Systems Research Engineer, GPU Programming

Location: United States, San Francisco · Salary: 160000.00 - 230000.00 USD / Year · Job Posted February 18, 2026

Job Description

As a Systems Research Engineer specializing in GPU Programming, you will play a crucial role in developing and optimizing GPU-accelerated kernels and algorithms for ML/AI applications. Working closely with the modeling and algorithm team, you will co-design GPU kernels and model architectures to improve the performance and efficiency of our AI systems. In collaboration with the hardware and software teams, you will also contribute to the co-design of efficient GPU architectures and programming models, drawing on your expertise in GPU programming and parallel computing. Your research skills will be essential for keeping up with the latest advances in GPU programming techniques and ensuring that our AI infrastructure remains at the forefront of innovation.

Job Responsibility

  • Optimize and fine-tune GPU code to achieve better performance and scalability
  • Collaborate with cross-functional teams to integrate GPU-accelerated solutions into existing software systems
  • Stay up-to-date with the latest advancements in GPU programming techniques and technologies

Requirements

  • Strong background in GPU programming and parallel computing, such as CUDA and/or Triton
  • Knowledge of ML/AI applications and models
  • Knowledge of performance profiling and optimization tools for GPU programming
  • Excellent problem-solving and analytical skills
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or equivalent practical experience

What we offer

  • competitive compensation
  • startup equity
  • health insurance
  • flexibility in terms of remote work

Similar Jobs for Systems Research Engineer, GPU Programming

8 matching positions

Software Engineer, Systems ML - Compilers / Backend

We are seeking a software engineer to support the development of the compiler to...
Location: United States, Sunnyvale
Salary: 181000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • Experience in software design and programming in Python and/or C/C++ for development, debugging, testing, and performance analysis
  • Experience in AI framework development or accelerating models on hardware architectures (GPU, TPU, custom AI ASICs)
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Analyze and design effective compiler passes and optimizations. Implement and/or enhance code generation targeting machine learning accelerators
  • Work with algorithm research teams to support the co-design of hardware features mapping ML graphs to hardware implementations, modeling data-flows, creating cost-benefit analysis and estimating silicon power and performance
  • Work with hardware architects to co-design hardware features that maximize performance, power efficiency and programmability
  • Contribute to the development of machine-learning libraries, intermediate representations, export formats, and analysis tools
  • Collaborate with the team to enhance the efficiency, scalability, and stability of our toolchains by focusing on kernel optimization and tuning
  • Conduct design and code reviews. Evaluate code performance, debug, diagnose and drive resolution of compiler and cross-disciplinary system issues
  • Interface with other compiler-focused teams to evaluate and incorporate their innovations and vice versa
What we offer
  • bonus
  • equity
  • benefits

Staff Software Engineer, GPU Infrastructure (HPC)

The internal infrastructure team is responsible for building world-class infrast...
Salary: Not provided
Company: Cohere
Expiration Date: Until further notice
Requirements
  • Deep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environments
  • Kubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloads
  • Strong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for open-source contributions over reinventing solutions
  • Low-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads
  • Research collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challenges
  • Self-directed problem-solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast-paced environment
Job Responsibility
  • Build and scale ML-optimized HPC infrastructure: Deploy and manage Kubernetes-based GPU/TPU superclusters across multiple clouds, ensuring high throughput and low-latency performance for AI workloads
  • Optimize for AI/ML training: Collaborate with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, leveraging technologies like RDMA, NCCL, and high-speed interconnects
  • Troubleshoot and resolve complex issues: Proactively identify and resolve infrastructure bottlenecks, performance degradation, and system failures to ensure minimal disruption to AI/ML workflows
  • Enable researchers with self-service tools: Design intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently
  • Drive innovation in ML infrastructure: Work closely with AI researchers to understand emerging needs (e.g., JAX, PyTorch, distributed training) and translate them into robust, scalable infrastructure solutions
  • Champion best practices: Advocate for observability, automation, and infrastructure-as-code (IaC) across the organization, ensuring systems are maintainable and resilient
  • Mentorship and collaboration: Share expertise through code reviews, documentation, and cross-team collaboration, fostering a culture of knowledge transfer and engineering excellence
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment type: Full-time

Software Engineer, Systems ML - Compilers / Backend

We are seeking a software engineer to support the development of the compiler to...
Location: United States, Sunnyvale
Salary: 217000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 2+ years experience developing compilers, toolchains, runtime, or similar code optimization software
  • Experience in software design and programming in Python and/or C/C++ for development, debugging, testing, and performance analysis
  • Experience in AI framework development or accelerating models on hardware architectures (GPU, TPU, custom AI ASICs)
Job Responsibility
  • Analyze and design effective compiler passes and optimizations. Implement and/or enhance code generation targeting machine learning accelerators
  • Work with algorithm research teams to map ML graphs to hardware implementations, model data-flows, create cost-benefit analysis and estimate silicon power and performance
  • Work with hardware architects to co-design hardware features that maximize performance, power efficiency and programmability
  • Contribute to the development of machine-learning libraries, intermediate representations, export formats, and analysis tools
  • Analyze and improve the efficiency, scalability, and stability of our toolchains. Optimize and tune kernels and compiled code to achieve latency targets for ML inference
  • Conduct design and code reviews. Evaluate code performance, debug, diagnose and drive resolution of compiler and cross-disciplinary system issues
  • Interface with other compiler-focused teams to evaluate and incorporate their innovations and vice versa
What we offer
  • bonus
  • equity
  • benefits

Principal Research Engineer - Agent 365

Copilot usage is growing rapidly across Microsoft 365 and custom agent experienc...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Architect and deliver AI systems across model development, data, infra, evaluation, and deployment spanning multiple product lines
  • Set technical direction for large programs and drive alignment across Research, Engineering, and Product
  • Integrate LLMs, multimodal models, multi-agent architectures, and RAG into Microsoft’s ecosystem
  • Establish standards for MLOps, governance, and Responsible AI, compliant with Microsoft principles and industry standards
  • Drive original research and thought leadership (whitepapers, internal notes, patents) and convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work, identify high-potential methods, and adapt them to Microsoft problem spaces
  • Production Integration: Turn research prototypes into production-quality code optimized for scale, latency, and maintainability
Employment type: Full-time

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location: United States
Salary: 210000.00 - 309000.00 USD / Year
Company: Assembly
Expiration Date: Until further notice
Requirements
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer
  • competitive equity grants
  • 100% employer-paid benefits
  • flexibility of being fully remote
Employment type: Full-time

Geoint Systems Engineer

Reinventing Geospatial (RGi) is a leading expert in geospatial solutions for Def...
Location: United States, Aberdeen Proving Ground; Alexandria
Salary: Not provided
Company: Reinventing Geospatial
Expiration Date: Until further notice
Requirements
  • Active Top Secret clearance with an ability to obtain SCI access and willingness to obtain CI Polygraph
  • US Citizenship Required
  • Experience with installation, configuration, security hardening, operation, maintenance, and troubleshooting of Windows operating systems (Server and Desktop environments) and Linux operating systems (RHEL, CentOS, Ubuntu, or similar distributions)
  • Proficiency in managing and troubleshooting enterprise software, including web servers (Apache, Nginx, IIS), database systems (PostgreSQL, SQL Server, MySQL, Oracle), web applications and services, and middleware and application servers
  • Strong scripting and automation capabilities, including knowledge of general programming paradigms (data types, control flow structures, and logic constructs) and scripting experience in PowerShell, Python, and Bash/Shell
  • Experience with REST API technologies, including an understanding of HTTP methods (GET, POST, PUT, DELETE, PATCH), the ability to automate API interactions for system integration and operations, and JSON/XML data handling
  • Comprehensive understanding of networking fundamentals: network protocols (TCP, UDP, multicast, unicast), file sharing protocols (SMB, NFS), IP addressing schemes (IPv4/IPv6) and subnet calculations, routing concepts and implementation, and the OSI model and troubleshooting methodology
  • Experience with network troubleshooting tools and techniques
  • Knowledge of system hardware architecture for selection, suitability analysis, operation, and troubleshooting: RAID configurations (0, 1, 5, 6, 10), HDD vs. SSD performance characteristics, SAN architecture and management, CPU architectures and performance considerations, RAM capacity and speed requirements, and GPU capabilities for geospatial processing workloads
  • Ability to perform hardware capacity planning and performance optimization
Job Responsibility
  • Support the installation, configuration, operation, and maintenance of geospatial software systems
  • Utilize technical expertise across operating systems, enterprise applications, automation technologies, and hardware infrastructure to ensure mission-critical geospatial capabilities remain operational and secure
  • Analyze system capabilities against AGE and COE compliance requirements and identify gaps
  • Maintain functional specifications that define essential technical requirements of Legacy DCGS-A, IS&A, Mission Command, and COE CPCE
  • Maintain system engineering documentation, including the System Engineering Plan and the Software Requirements Traceability Matrix
  • Cross-reference and map GEOINT functional specifications to Intelligence or Mission Command Systems specifications and program-level documents, such as the Capabilities Production Document (CPD), Information Systems Interface Control Document (IS-ICD), and Requirements Definition Package (RDP)
  • Interact with systems users to translate their requirements into systems, hardware, and software requirements and design
  • Plan and perform engineering research, design development, and other assignments in conformance with design, engineering and customer specifications
  • Lead a team of engineers through project completion
  • Take responsibility for major technical/engineering projects of higher complexity
What we offer
  • 100% paid employee healthcare & dental insurance
  • Paid parental leave
  • 401k with matching
  • Escalating vacation time
  • Referral bonuses
  • Tuition reimbursement
  • Professional development training
  • Free beverages and snacks
  • Weekly catered lunches and breakfast on Fridays
Employment type: Full-time

Research Intern - Systems For Efficient AI

Research Internships at Microsoft provide a dynamic environment for research car...
Location: United States, Redmond
Salary: 6710.00 - 13270.00 USD / Month
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Accepted or currently enrolled in a PhD program in Computer Science, Software Engineering, Electrical Engineering, or a related STEM field
  • Experience with LLM architectures, systems for LLM inference, and/or AI hardware
  • Experience with GPUs and understanding of CUDA/ROCm frameworks
  • Experience with computer systems and/or networks
  • Experience in conducting research and writing peer-reviewed publications
  • Proficient written and verbal communication skills
  • Ability to work in a cross-functional and multidisciplinary setting across research and product
  • Proficient software development skills, preferably in C++ and Python
Job Responsibility
  • Research Interns put inquiry and theory into practice
  • Learn, collaborate, and network for life
  • Advance their own careers and contribute to exciting research and development strides
  • Are paired with mentors and expected to collaborate with other Research Interns and researchers
  • Present findings
  • Contribute to the vibrant life of the community
Employment type: Full-time

Founding GPU Compiler Engineer

We're hiring a Founding GPU Compiler Engineer to build the core compilation infr...
Location: United States, San Francisco
Salary: 285000.00 - 315000.00 USD / Year
Posted via: YC Work at a Startup
Expiration Date: Until further notice
Requirements
  • Deep experience with compiler infrastructure (LLVM, MLIR, or similar)
  • Strong background in GPU architecture and low-level optimization (CUDA, ROCm, or equivalent)
  • Hands-on experience with at least one of: PTX/SASS, GCN/RDNA assembly, or other GPU ISAs
  • Familiarity with ML compiler stacks (XLA, TVM, Triton, torch.compile, or similar)
  • Solid systems programming skills in C++ and/or Rust
  • Proven track record of building production-grade compiler infrastructure
Job Responsibility
  • Design and implement the main compilation pipeline, from StableHLO to executable GPU and host binaries
  • Build and extend MLIR dialects and passes to optimize AI workloads
  • Develop backend code generation for multiple targets (NVIDIA PTX/SASS, AMD GCN/RDNA, Trainium, TPU)
  • Implement classic compiler optimizations customized for large-scale training (fusion, tiling, memory planning, scheduling)
  • Build search-based compiler infrastructure to explore different optimization options
  • Create hybrid codegen paths for cases where direct MLIR lowering isn't practical
  • Set up testing, benchmarking, and performance regression systems
  • Work closely with ML researchers to understand workload characteristics and find optimization opportunities
What we offer
  • bonus
  • equity
  • benefits
  • relocation assistance
Employment type: Full-time