Software Engineer (Technical Leadership) - Kernel

United States, Menlo Park · 219000.00 - 301000.00 USD / Year · Job Posted January 23, 2026

Job Description

At Meta, we're building and operating one of the world's most dynamic and fast-paced networks, powering our global data centers and supporting cutting-edge technologies like AI, Generative AI, recommendation engines, and the Metaverse. Our network infrastructure teams are responsible for developing, deploying, and operating this complex system, covering the entire network lifecycle from hardware development to operation. We're seeking software engineers with proven experience to join our teams and help build scalable distributed systems, develop innovative solutions to our challenges, and ship them into production. As part of our network engineering teams, you'll have the opportunity to work on cutting-edge switching technology, collaborate with talented engineers, and contribute to the development of Meta's hyper-scale network infrastructure.

The Kernel team supports the Linux kernel used in Meta's production infrastructure. Our work advances Meta infrastructure projects through innovation and leadership in the open source community. Our engineers have the unique opportunity to build scope and influence internally at Meta and also through collaboration with our peers in the industry. The Kernel team works on tasks like:

  • Creating custom kernel changes for internal needs
  • Merging upstream changes into the Meta Linux Kernel
  • Working with the Linux community outside of Meta to develop features and fix bugs
  • Investigating Linux-related performance issues and failures
  • Periodically building and running initial tests on Meta's new kernel RPMs
  • Creating tooling to assist with kernel development

Job Responsibility

  • Design, develop, and validate Linux Kernel and userspace software
  • Debug complex system-level issues and lead performance tuning exercises to optimize software stack performance
  • Understand software components from multiple partner teams, lead integration efforts, and drive continued development
  • Develop and automate test suites for CI/CD framework and various components
  • Collaborate with partner teams to integrate software components, align on goals, and participate in oncall rotations
  • Participate in multiple open source communities through patch review, conferences, and discussions

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 10+ years of software development experience in industry settings, or a PhD with 4+ years of experience
  • 3+ years of relevant experience with the Linux kernel, firmware, or other low-level systems programming
  • Proficiency in C/C++ and at least one scripting language (Python/Shell Scripting)
  • Experience leading projects with industry-wide impact
  • Vast experience communicating and working across functions to drive solutions
  • Significant experience in mentoring/influencing experienced engineers across organizations
  • Proven track record of planning multi-year roadmaps in which shorter-term projects ladder up to the long-term mission
  • Experience in driving large cross-functional/industry-wide engineering efforts

Nice to have

  • Active contributor to the Linux Kernel, Systemd or other relevant open source projects
  • Working knowledge of virtualization, CPU scheduling, memory management, filesystems, or eBPF
  • Experience in hardware driver development and debugging

What we offer

  • bonus
  • equity
  • benefits

Similar Jobs for Software Engineer (Technical Leadership) - Kernel

8 matching positions

Staff Software Engineer : Storage, Search, & Data Platforms

The Storage, Search, and Data (SSD) group is the custodian of Uber's digital int...
Location: United States, Seattle; San Francisco; Sunnyvale
Salary: 232000.00 - 258000.00 USD / Year
Company: Uber
Expiration Date: Until further notice
Requirements
  • 12+ years of software engineering experience, with a proven history of designing and operating massive-scale distributed data systems
  • Elite engineering skills in Go, Java, C++, or Rust. You are comfortable deep-diving into database internals, kernel-level optimizations, and complex distributed consensus protocols
  • Proven experience leading technical strategy across multiple teams or organizations, turning high-level business goals into concrete technical realities
  • Extensive experience managing Tier-0, mission-critical systems with 99.99% availability and global blast-radius constraints
Job Responsibility
  • Define and execute the multi-year roadmap to transition Uber from Data Storage to a Cloud-Native Data Provider, solving for cross-region latency, global metadata consistency, and exabyte-scale cost efficiency
  • Partner with Uber's AI/ML leadership to architect the Data-to-GPU pipeline. You will design the one-stop storage APIs that allow researchers to leverage high-performance data access across multi-cloud regions and vendors seamlessly
  • Drive the next generation of our core engines: Docstore (NoSQL), Vitess (Sharded MySQL), Apache Pinot (Real-time Analytics), and OpenSearch (Discovery)
  • You will represent Uber in the global community as a leader in key open source technologies including Apache, Hudi, Iceberg and many others
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible to participate in a 401(k) plan
  • Various benefits
Employment type: Full-time

Principal Software Engineer

Microsoft Advertising is seeking a Principal Software Engineer to join our Ads E...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
  • Industry experience in advertising or search engine backend systems, such as large-scale ad ranking, real-time bidding (RTB), or relevance-serving infrastructure
  • Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming), feature-store integration, and multi-region deployment for low-latency, globally distributed services
  • Familiarity with LLM inference optimization—model sharding, tensor/kv-cache parallelism, paged attention, continuous batching, quantization (AWQ/FP8), and hybrid CPU–GPU orchestration
  • Demonstrated success operating large-scale systems with SLA-based capacity forecasting, autoscaling, and performance telemetry
  • Proven leadership in cross-functional architecture initiatives and technical mentorship
Job Responsibility
  • Design and lead the development of large-scale, distributed online serving systems—including GPU-accelerated and CPU-based ranking/inference pipelines—to process millions of ad requests per second with ultra-low latency, high throughput, and solid reliability
  • Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration across heterogeneous hardware (GPU, CPU, and memory tiers)
  • Profile and optimize performance across the full stack—from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling—identifying bottlenecks, tuning latency tails, and improving cost efficiency through advanced profiling and instrumentation
  • Own live-site reliability as a DRI: design telemetry, alerting, and fault-tolerance mechanisms; drive rapid diagnosis and mitigation of performance regressions or outages in globally distributed systems
  • Collaborate and mentor across teams—driving architecture reviews, enforcing engineering excellence, promoting system-level optimization practices, and mentoring others in deep debugging, profiling, and performance engineering
Employment type: Full-time

Principal Software Engineer

In this Software Engineering role, you will be responsible for investigating, en...
Location: United States, Columbia; Morrisville; Danbury
Salary: Not provided
Company: Owl Cyber Defense
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree or higher in Computer Science, Engineering or Mathematics
  • 15+ years of software development experience
  • 3+ years in technical leadership or team lead capacity
  • Advanced proficiency in one or more of the following: Rust, Java, C, or C++
  • Understanding of Linux/Unix kernel-level functionality
  • Strong automated testing and quality assurance practices
  • Proven ability to mentor developers and foster collaborative team culture
  • Git version control and collaborative development workflows
  • Excellent written and verbal communication skills
  • Deep understanding of software architecture and design patterns
Job Responsibility
  • Investigating, enhancing, designing, developing, and testing Linux based security systems
  • Work on multiple projects identifying and resolving complex security issues
  • Projects involve deep security vulnerabilities that target the operating system level
  • Deepen operating system security knowledge
  • Learn SELinux and other security processes to harden complex systems
  • Be part of a strong technical team with a high degree of autonomy and significant responsibility

Staff Software Engineer

The Staff Software Engineer on the Engineering team is responsible for the imple...
Location: India, Pune
Salary: Not provided
Company: LogicMonitor
Expiration Date: Until further notice
Requirements
  • 8+ years of software development experience in commercial or enterprise applications
  • 6+ years of full-time experience as a Java developer on Linux platform
  • BS or above degree in computer science or related field
  • Expertise with latest Java development framework and open-source tools
  • Extensive experience and knowledge with inner workings of JVM
  • Strong understanding of web application architectures, specifically Apache Tomcat
  • Experience in SaaS Product Development dealing with large volumes of data
  • Deep SQL / NoSQL database knowledge, including following databases: MySQL, Cassandra, and ElasticSearch
  • Extensive experience with one of the following Big Data technologies: Apache Spark, Kafka Streams, AWS Kinesis/Firehose
  • Experience designing large, complex distributed systems
Job Responsibility
  • Prioritize and plan deliverables in an iterative development strategy, according to our 2-week scrum schedule and 1-week regression testing
  • Design, document, code, and test technical solutions for new systems or enhancements to existing systems
  • Follow agile software development methodologies for implementation
  • Work with various teams in LogicMonitor to deliver software products that support LogicMonitor's business growth
  • Provide technical leadership, mentoring, and guidance at senior engineering levels and below
  • Act as a trusted representative of the team to other functional teams
  • Coordinate, communicate, and collaborate among management, product, techops, support, and developers
  • Envision system features and functionalities by analyzing business requirements
  • Troubleshoot and resolve product/application issues for escalated support cases
  • Collaborate with a diverse, distributed development organization
Employment type: Full-time

Senior Staff Software Engineer - AI

GEICO is seeking an experienced Engineer with a passion for building high-perfor...
Location: United States, Seattle, WA; Austin, TX; Palo Alto, CA; Chicago, IL; Dallas, TX
Salary: 110000.00 - 230000.00 USD / Year
Company: GEICO
Expiration Date: Until further notice
Requirements
  • Experience building and deploying ML systems in production with cross-functional engineering teams
  • Fluency in at least two modern languages such as Python, Go, Java, C++, or C# including object-oriented design
  • Experience architecting multi-component ML platforms using open-source/cloud-agnostic components: datastores (PostgreSQL; NoSQL such as MongoDB, Cassandra, CosmosDB) and streaming (Kafka, Flink, or Spark Streaming)
  • Experience with end-to-end ML lifecycle: version control, CI/CD, Kubernetes, testing, monitoring, and production support
  • Experience with cloud providers (Azure, AWS or GCP) in production ML environments
  • Experience with observability tools and distributed systems monitoring, logging, tracing, and root cause analysis
  • Experience building multi-agent systems using LLMs and agentic frameworks (e.g., LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI)
  • Hands-on experience with RAG, semantic search, and vector databases (e.g., Milvus, pgvector, Qdrant, ElasticSearch)
  • Experience designing human-in-the-loop workflows and safety controls for autonomous systems
  • Strong architecture and design skills with ability to influence technical direction and roadmap
Job Responsibility
  • Design and build a multi-agent AI platform where specialized agents autonomously detect, diagnose, and resolve issues through agent-to-agent (A2A) collaboration
  • Develop intelligent agents using LLMs and agentic frameworks that coordinate detection, diagnostic, remediation, and knowledge tasks with minimal human intervention
  • Define agent interaction protocols, A2A communication standards, and evaluation frameworks for agent decision quality and autonomous action safety
  • Architect vector database solutions (Milvus, pgvector, Qdrant) for semantic search and RAG to enable context-aware agent decision-making
  • Build end-to-end ML pipelines for severity classification, anomaly detection, failure pattern recognition, and impact forecasting using observability data
  • Establish scalable orchestration infrastructure for multi-agent workflows with CI/CD, automated evaluation, canary releases, and rollback strategies
  • Implement monitoring for agent interactions, A2A communication patterns, decision quality, data drift, and system reliability
  • Lead technical architecture ensuring scalability, observability, and integration with existing alerting, logging, and monitoring systems
  • Define standards for agent safety, explainability, governance, and human-in-the-loop controls for high-impact automated actions
  • Partner with SRE, Product, and Engineering teams to translate reliability goals into measurable ML objectives and maintain pragmatic technical roadmaps
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Support for flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment type: Full-time

Software Engineer, Hardware

As a software engineer on the Scaling team, you’ll help build and optimize the l...
Location: United States, San Francisco
Salary: 266000.00 - 455000.00 USD / Year
Company: OpenAI
Expiration Date: Until further notice
Requirements
  • Proficient in systems programming (e.g., Rust, C++) and scripting languages like Python
  • Experience in one or more of the following areas: compiler development, kernel authoring, accelerator programming, runtime systems, distributed systems, or high-performance simulation
  • Deep curiosity about how large-scale systems work, and enjoyment in making them faster, simpler, and more reliable
  • Excited to work in a fast-paced, highly collaborative environment with evolving hardware and ML system demands
  • Value engineering excellence, technical leadership, and thoughtful system design
Job Responsibility
  • Design and build APIs and runtime components to orchestrate computation and data movement across heterogeneous ML workloads
  • Contribute to compiler infrastructure, including the development of optimizations and compiler passes to support evolving hardware
  • Engineer and optimize compute and data kernels, ensuring correctness, high performance, and portability across simulation and production environments
  • Profile and optimize system bottlenecks, especially around I/O, memory hierarchy, and interconnects, at both local and distributed scales
  • Develop simulation infrastructure to validate runtime behaviors, test training stack changes, and support early-stage hardware and system development
  • Rapidly deploy runtime and compiler updates to new supercomputing builds in close collaboration with hardware and research teams
  • Work across a diverse stack, primarily using Rust and Python, with opportunities to influence architecture decisions across the training framework
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment type: Full-time

Systems Software Engineer

The Crusoe Cloud Software Development team is seeking a passionate and experienc...
Location: United States, San Francisco
Salary: 137000.00 - 161000.00 USD / Year
Company: Crusoe
Expiration Date: Until further notice
Requirements
  • Linux Systems Familiarity: Experience building applications on Linux kernels, specifically pertaining to virtualization, device drivers, memory management, and process scheduling
  • Hardware Integration: Solid understanding of hardware devices such as GPUs, CPUs, InfiniBand and Ethernet NICs, Ephemeral Disks, and PCI Express
  • Systems Design: Strong grasp of distributed applications and highly-scalable systems design. Specific focus around communications protocols (gRPC, REST, TCP/IP, etc.), databases (Postgres, Redis), and systems design applications (Pub/Sub, Kafka)
  • Software Architecture: Strong experience building software applications, both at the higher (Golang, Java, Python) and lower (C, C++, Rust) levels. Keen eye for clean, maintainable code, and a unit-test driven mindset
  • Excellent Communication Skills: Ability to collaborate with teams across an organization, blocking out noise, and focusing on what needs to get done to get a project across the line
  • Rapid and Agile Learner: Capable of adapting quickly, eager to research new technology and not get overwhelmed by unfamiliar tech stacks
  • Virtualization Concepts: General knowledge of hypervisors, virtual machine lifecycles, and Linux KVM tooling
  • CI/CD and Validation: Understanding of how to build GitLab or GitHub CI/CD pipelines that deliver bug-free code across a multitude of compute platforms
Job Responsibility
  • Compute Application Development & Scaleout: Design highly reliable and performant Linux applications used to manage our virtualization stack across thousands of AI compute servers in multiple global datacenters
  • AI Hardware Platform Integration: Integrate Crusoe applications with a wide variety of hardware and software AI chip-vendor stacks. Build solutions to optimize and monitor virtualized hardware (GPUs, InfiniBand/RoCE NICs, Ephemeral Storage, etc.) in cutting-edge AI/HPC environments
  • Kernel & Hypervisor Integration: Work side by side with our Linux Kernel and Hypervisor teams to ensure our Crusoe applications are seamlessly integrated with a variety of kernels and hypervisors
  • Performance Analysis & Tuning: Analyze and enhance the performance of the entire virtualization stack, from the hypervisor to the virtualized guest OS, with a specific focus on optimizing AI/ML workloads. This includes profiling, bottleneck identification, and implementing low-level optimizations
  • System-Level Troubleshooting: Diagnose and resolve complex system issues across our virtualization stack (drivers, kernel, hypervisor, guest OS, and Crusoe applications). Work closely with kernel and hypervisor teams to debug and resolve integration challenges
  • Code Review and Quality Assurance: Conduct thorough code reviews to ensure the highest level of software quality, reliability, and security within compute applications and virtualization stack
  • Cross-Functional Collaboration: Collaborate with other engineering teams, including hardware design, OS development, and AI/ML application teams, to ensure cohesive and integrated product development
  • Technical Leadership: Provide technical guidance and mentorship to junior engineers, fostering a culture of technical excellence and collaborative problem-solving within the compute applications team
What we offer
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment type: Full-time

Senior Staff Software Engineer, SDN Networking

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...
Location: United States, San Francisco; Sunnyvale
Salary: 214000.00 - 259000.00 USD / Year
Company: Crusoe
Expiration Date: Until further notice
Requirements
  • 8+ years of proven experience in system programming with C, C++, and/or Rust
  • Extensive knowledge of Linux Systems Internals, including kernel internals, memory management, and I/O subsystems
  • Expertise in Network Programming and Packet Processing pipelines (TCP/IP, UDP, etc.)
  • Hands-on experience with kernel bypass technologies such as XDP/eBPF, AF_XDP, and DPDK
  • In-depth understanding of network accelerators like Mellanox/Nvidia SmartNIC (ConnectX-6/7), BlueField-3 DPU, and Intel IPU
  • Familiarity with SR-IOV, vDPA, scalable functions, Open vSwitch, OpenFlow, and Open Virtual Networking
  • Knowledge of professional software engineering practices and best practices for the full software development life cycle
  • Demonstrated track record of contributions to the open source community (e.g., Open vSwitch/OVS, Open Virtual Networking/OVN, Multus, Cilium)
Job Responsibility
  • Define and Execute SDN Strategy: Develop and execute the roadmap for Crusoe Energy Cloud's Software Defined Networking strategy
  • Provide technical leadership and guidance to the engineering team
  • Drive architectural decisions, design processes, design reviews, code reviews, and implementation tasks
  • Collaborate closely with the network infrastructure organization to develop and deploy industry-leading networking infrastructure
  • Lead the development and maintenance of Linux Kernel modules and drivers, leveraging technologies like XDP/eBPF, DPDK, and network accelerators
  • Design and implement high-performance, scalable, and reliable network architectures
  • Provide ongoing support for production systems, including troubleshooting, performance tuning, and incident response
  • Foster strong collaboration with other engineering teams (e.g., Software Infrastructure, Product) and cross-functional departments
What we offer
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment type: Full-time