Software Engineer (Technical Leadership) - Kernel Job at Meta (Menlo Park)


Staff Software Engineer : Storage, Search, & Data Platforms

The Storage, Search, and Data (SSD) group is the custodian of Uber's digital int...

Location

United States, Seattle; San Francisco; Sunnyvale

Salary:

232000.00 - 258000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

12+ years of software engineering experience, with a proven history of designing and operating massive-scale distributed data systems
Elite engineering skills in Go, Java, C++, or Rust. You are comfortable deep-diving into database internals, kernel-level optimizations, and complex distributed consensus protocols
Proven experience leading technical strategy across multiple teams or organizations, turning high-level business goals into concrete technical realities
Extensive experience managing Tier-0, mission-critical systems with 99.99% availability and global blast-radius constraints

Job Responsibility

Define and execute the multi-year roadmap to transition Uber from Data Storage to a Cloud-Native Data Provider, solving for cross-region latency, global metadata consistency, and exabyte-scale cost efficiency
Partner with Uber's AI/ML leadership to architect the Data-to-GPU pipeline. You will design the one-stop storage APIs that allow researchers to leverage high-performance data access across multi-cloud regions and vendors seamlessly
Drive the next generation of our core engines: Docstore (NoSQL), Vitess (Sharded MySQL), Apache Pinot (Real-time Analytics), and OpenSearch (Discovery)
You will represent Uber in the global community as a leader in key open source technologies including Apache Hudi, Iceberg, and many others

What we offer

Eligible to participate in Uber's bonus program
May be offered an equity award & other types of comp
Eligible to participate in a 401(k) plan
Various benefits

Fulltime

Principal Software Engineer

Microsoft Advertising is seeking a Principal Software Engineer to join our Ads E...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience
Industry experience in advertising or search engine backend systems, such as large-scale ad ranking, real-time bidding (RTB), or relevance-serving infrastructure
Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming), feature-store integration, and multi-region deployment for low-latency, globally distributed services
Familiarity with LLM inference optimization—model sharding, tensor/kv-cache parallelism, paged attention, continuous batching, quantization (AWQ/FP8), and hybrid CPU–GPU orchestration
Demonstrated success operating large-scale systems with SLA-based capacity forecasting, autoscaling, and performance telemetry
Proven leadership in cross-functional architecture initiatives and technical mentorship

Job Responsibility

Design and lead the development of large-scale, distributed online serving systems—including GPU-accelerated and CPU-based ranking/inference pipelines—to process millions of ad requests per second with ultra-low latency, high throughput, and solid reliability
Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration across heterogeneous hardware (GPU, CPU, and memory tiers)
Profile and optimize performance across the full stack—from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling—identifying bottlenecks, tuning latency tails, and improving cost efficiency through advanced profiling and instrumentation
Own live-site reliability as a DRI: design telemetry, alerting, and fault-tolerance mechanisms; drive rapid diagnosis and mitigation of performance regressions or outages in globally distributed systems
Collaborate and mentor across teams—driving architecture reviews, enforcing engineering excellence, promoting system-level optimization practices, and mentoring others in deep debugging, profiling, and performance engineering

Fulltime

Principal Software Engineer

In this Software Engineering role, you will be responsible for investigating, en...

Location

United States, Columbia; Morrisville; Danbury

Salary:

Not provided

Owl Cyber Defense

Expiration Date

Until further notice

Requirements

Bachelor’s degree or higher in Computer Science, Engineering or Mathematics
15+ years of software development
3+ years in a technical leadership or team lead capacity
Advanced proficiency in one or more of the following: Rust, Java, C, or C++
Understanding of Linux/Unix kernel-level functionality
Strong automated testing and quality assurance practices
Proven ability to mentor developers and foster collaborative team culture
Git version control and collaborative development workflows
Excellent written and verbal communication skills
Deep understanding of software architecture and design patterns

Job Responsibility

Investigating, enhancing, designing, developing, and testing Linux-based security systems
Work on multiple projects identifying and resolving complex security issues
Projects involve deep security vulnerabilities that target the operating system level
Deepen operating system security knowledge
Learn SELinux and other security processes to harden complex systems
Be part of a strong technical team with a high degree of autonomy and significant responsibility

Staff Software Engineer

The Staff Software Engineer on the Engineering team is responsible for the imple...

Location

India, Pune

Salary:

Not provided

LogicMonitor

Expiration Date

Until further notice

Requirements

8+ years of software development experience in commercial or enterprise applications
6+ years of full-time experience as a Java developer on the Linux platform
BS degree or higher in computer science or a related field
Expertise with the latest Java development frameworks and open-source tools
Extensive experience with and knowledge of the inner workings of the JVM
Strong understanding of web application architectures, specifically Apache Tomcat
Experience in SaaS Product Development dealing with large volumes of data
Deep SQL / NoSQL database knowledge, including the following databases: MySQL, Cassandra, and ElasticSearch
Extensive experience with one of the following Big Data technologies: Apache Spark, Kafka Streams, AWS Kinesis/Firehose
Experience designing large, complex distributed systems

Job Responsibility

Prioritize and plan for deliverables in an iterative development strategy, according to our 2-week scrum schedule and 1-week regression testing
Design, document, code, and test technical solutions for new systems or enhancements to existing systems
Follow agile software development methodologies for implementation
Work with various teams at LogicMonitor to deliver software products that support LogicMonitor's business growth
Provide technical leadership, mentoring, and guidance to engineers at senior level and below
Trusted to represent the team to other functional teams
Coordinate, communicate, and collaborate with management, product, TechOps, support, and developers
Envision system features and functionalities by analyzing business requirements
Troubleshoot and resolve product/application issues for escalated support cases
Collaborate with a diverse, distributed development organization

Fulltime

Senior Staff Software Engineer - AI

GEICO is seeking an experienced Engineer with a passion for building high-perfor...

Location

United States, Seattle, WA; Austin, TX; Palo Alto, CA; Chicago, IL; Dallas, TX

Salary:

110000.00 - 230000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Experience building and deploying ML systems in production with cross-functional engineering teams
Fluency in at least two modern languages such as Python, Go, Java, C++, or C# including object-oriented design
Experience architecting multi-component ML platforms using open-source/cloud-agnostic components: datastores (PostgreSQL; NoSQL such as MongoDB, Cassandra, CosmosDB) and streaming (Kafka, Flink, or Spark Streaming)
Experience with end-to-end ML lifecycle: version control, CI/CD, Kubernetes, testing, monitoring, and production support
Experience with cloud providers (Azure, AWS or GCP) in production ML environments
Experience with observability tools and distributed systems monitoring, logging, tracing, and root cause analysis
Experience building multi-agent systems using LLMs and agentic frameworks (e.g., LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI)
Hands-on experience with RAG, semantic search, and vector databases (e.g., Milvus, pgvector, Qdrant, ElasticSearch)
Experience designing human-in-the-loop workflows and safety controls for autonomous systems
Strong architecture and design skills with ability to influence technical direction and roadmap

Job Responsibility

Design and build a multi-agent AI platform where specialized agents autonomously detect, diagnose, and resolve issues through agent-to-agent (A2A) collaboration
Develop intelligent agents using LLMs and agentic frameworks that coordinate detection, diagnostic, remediation, and knowledge tasks with minimal human intervention
Define agent interaction protocols, A2A communication standards, and evaluation frameworks for agent decision quality and autonomous action safety
Architect vector database solutions (Milvus, pgvector, Qdrant) for semantic search and RAG to enable context-aware agent decision-making
Build end-to-end ML pipelines for severity classification, anomaly detection, failure pattern recognition, and impact forecasting using observability data
Establish scalable orchestration infrastructure for multi-agent workflows with CI/CD, automated evaluation, canary releases, and rollback strategies
Implement monitoring for agent interactions, A2A communication patterns, decision quality, data drift, and system reliability
Lead technical architecture ensuring scalability, observability, and integration with existing alerting, logging, and monitoring systems
Define standards for agent safety, explainability, governance, and human-in-the-loop controls for high-impact automated actions
Partner with SRE, Product, and Engineering teams to translate reliability goals into measurable ML objectives and maintain pragmatic technical roadmaps

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Software Engineer, Hardware

As a software engineer on the Scaling team, you’ll help build and optimize the l...

Location

United States, San Francisco

Salary:

266000.00 - 455000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Proficient in systems programming (e.g., Rust, C++) and scripting languages like Python
Experience in one or more of the following areas: compiler development, kernel authoring, accelerator programming, runtime systems, distributed systems, or high-performance simulation
Deep curiosity about how large-scale systems work, and enjoyment in making them faster, simpler, and more reliable
Excited to work in a fast-paced, highly collaborative environment with evolving hardware and ML system demands
Value engineering excellence, technical leadership, and thoughtful system design

Job Responsibility

Design and build APIs and runtime components to orchestrate computation and data movement across heterogeneous ML workloads
Contribute to compiler infrastructure, including the development of optimizations and compiler passes to support evolving hardware
Engineer and optimize compute and data kernels, ensuring correctness, high performance, and portability across simulation and production environments
Profile and optimize system bottlenecks, especially around I/O, memory hierarchy, and interconnects, at both local and distributed scales
Develop simulation infrastructure to validate runtime behaviors, test training stack changes, and support early-stage hardware and system development
Rapidly deploy runtime and compiler updates to new supercomputing builds in close collaboration with hardware and research teams
Work across a diverse stack, primarily using Rust and Python, with opportunities to influence architecture decisions across the training framework

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Systems Software Engineer

The Crusoe Cloud Software Development team is seeking a passionate and experienc...

Location

United States, San Francisco

Salary:

137000.00 - 161000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

Linux Systems Familiarity: Experience building applications on Linux kernels, specifically pertaining to virtualization, device drivers, memory management, and process scheduling
Hardware Integration: Solid understanding of hardware devices such as GPUs, CPUs, InfiniBand and Ethernet NICs, Ephemeral Disks, and PCI Express
Systems Design: Strong grasp of distributed applications and highly-scalable systems design. Specific focus around communications protocols (gRPC, REST, TCP/IP, etc.), databases (Postgres, Redis), and systems design applications (Pub/Sub, Kafka)
Software Architecture: Strong experience building software applications, both at the higher (Golang, Java, Python) and lower (C, C++, Rust) levels. Keen eye for clean, maintainable code, and a unit-test driven mindset
Excellent Communication Skills: Ability to collaborate with teams across an organization, blocking out noise, and focusing on what needs to get done to get a project across the line
Rapid and Agile Learner: Capable of adapting quickly, eager to research new technology and not get overwhelmed by unfamiliar tech stacks
Virtualization Concepts: General knowledge of hypervisors, virtual machine lifecycles, and Linux KVM tooling
CI/CD and Validation: Understanding of how to build GitLab or GitHub CI/CD pipelines that deliver bug-free code across a multitude of compute platforms

Job Responsibility

Compute Application Development & Scaleout: Design highly reliable and performant Linux applications used to manage our virtualization stack across thousands of AI compute servers in multiple global datacenters
AI Hardware Platform Integration: Integrate Crusoe applications with a wide variety of hardware and software AI chip-vendor stacks. Build solutions to optimize and monitor virtualized hardware (GPUs, InfiniBand/RoCE NICs, Ephemeral Storage, etc.) in cutting-edge AI/HPC environments
Kernel & Hypervisor Integration: Work side by side with our Linux Kernel and Hypervisor teams to ensure our Crusoe applications are seamlessly integrated with a variety of kernels and hypervisors
Performance Analysis & Tuning: Analyze and enhance the performance of the entire virtualization stack, from the hypervisor to the virtualized guest OS, with a specific focus on optimizing AI/ML workloads. This includes profiling, bottleneck identification, and implementing low-level optimizations
System-Level Troubleshooting: Diagnose and resolve complex system issues across our virtualization stack (drivers, kernel, hypervisor, guest OS, and Crusoe applications). Work closely with kernel and hypervisor teams to debug and resolve integration challenges
Code Review and Quality Assurance: Conduct thorough code reviews to ensure the highest level of software quality, reliability, and security within compute applications and virtualization stack
Cross-Functional Collaboration: Collaborate with other engineering teams, including hardware design, OS development, and AI/ML application teams, to ensure cohesive and integrated product development
Technical Leadership: Provide technical guidance and mentorship to junior engineers, fostering a culture of technical excellence and collaborative problem-solving within the compute applications team

What we offer

Restricted Stock Units in a fast growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Fulltime

Senior Staff Software Engineer, SDN Networking

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...

Location

United States, San Francisco; Sunnyvale

Salary:

214000.00 - 259000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

8+ years of proven experience in system programming with C, C++, and/or Rust
Extensive knowledge of Linux Systems Internals, including kernel internals, memory management, and I/O subsystems
Expertise in Network Programming and Packet Processing pipelines (TCP/IP, UDP, etc.)
Hands-on experience with kernel bypass technologies such as XDP/eBPF, AF_XDP, and DPDK
In-depth understanding of network accelerators like Mellanox/Nvidia SmartNIC (ConnectX-6/7), BlueField-3 DPU, and Intel IPU
Familiarity with SR-IOV, vDPA, scalable functions, Open vSwitch, OpenFlow, and Open Virtual Network
Knowledge of professional software engineering practices and best practices for the full software development life cycle
Demonstrated track record of contributions to the open source community (e.g., Open vSwitch/OVS, Open Virtual Network/OVN, Multus, Cilium)

Job Responsibility

Define and Execute SDN Strategy: Develop and execute the roadmap for Crusoe Energy Cloud's Software Defined Networking strategy
Provide technical leadership and guidance to the engineering team
Drive architectural decisions, design processes, design reviews, code reviews, and implementation tasks
Collaborate closely with the network infrastructure organization to develop and deploy industry-leading networking infrastructure
Lead the development and maintenance of Linux Kernel modules and drivers, leveraging technologies like XDP/eBPF, DPDK, and network accelerators
Design and implement high-performance, scalable, and reliable network architectures
Provide ongoing support for production systems, including troubleshooting, performance tuning, and incident response
Foster strong collaboration with other engineering teams (e.g., Software Infrastructure, Product) and cross-functional departments

What we offer

Restricted Stock Units in a fast growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Fulltime
