Staff Software Engineer, Search & Distributed Systems

USA, Buffalo · 165000.00 - 260000.00 USD / Year · Job Posted May 05, 2026

Job Description

We are looking for a Staff Software Engineer who would thrive on being accountable for our Search infrastructure: its scalability, reliability, and data resiliency. We don't just need someone who knows how to write a complex query; we need a battle-scarred Distributed Systems expert who understands the deep internals of Elasticsearch and who has a deep toolbox for analyzing, monitoring, alerting, and quickly resolving critical issues as they arise. You know exactly how Elasticsearch fails, why it fails under load, and how to architect a topology that prevents it. Because our search ecosystem doesn't exist in a vacuum, you will also own the architectural connective tissue—ensuring our service layers and event-based ecosystem interact with Search flawlessly. As a Staff Engineer, you will set the technical standard, drive systemic reliability, and mentor senior engineers across the organization.

Job Responsibility

  • Architect for Scale: Design, configure, and scale our Elasticsearch clusters. You will define our global strategies for shard routing, Index Lifecycle Management (ILM), heap tuning, and data tiering to support massive auction throughput.
  • Master the Failure Modes: Anticipate and engineer away points of failure. You will design circuit breakers, implement backpressure mechanisms, and tune asymmetric timeouts to prevent retry storms between our BFFs, K8s services, and the Search layer.
  • Expert Troubleshooting & IR: Act as the ultimate technical escalation point for complex, cross-system performance degradation. You will dive deep into JVM metrics, Garbage Collection pauses, K8s network bottlenecks, and slow logs to uncover and remediate root causes.
  • Holistic System Ownership: Manage the entire data lifecycle. You will optimize the ingestion pipelines syncing our event datastreams driven by producers and consumers (Kafka) to Elasticsearch, ensuring eventual consistency and data integrity at scale.
  • Drive Engineering Excellence: Draft authoritative architectural Blueprints, SOPs, and Runbooks. You will elevate the surrounding engineering culture by coaching teams on distributed systems design, observability best practices, and incident management.
  • Modernize & Innovate: Scan the horizon for emerging technologies. You will help evaluate and integrate next-generation search capabilities (e.g., Vector Search, RAG architectures) to support our broader AI and machine learning initiatives.

Requirements

  • 8+ years of software engineering experience, with at least 3 years operating at a Senior or Staff level focusing on distributed systems and high-throughput platforms.
  • Deep, authoritative knowledge of Elasticsearch internals. You have managed large-scale clusters and deeply understand mapping, analysis, query optimization, cluster state management, and split-brain mitigation.
  • Proficiency in the systems upstream and downstream of Search. You have hands-on experience with Kubernetes (EKS/GKE), API Gateway/BFF architectures, and event streams (Kafka).
  • A proven track record of implementing fault-tolerant patterns (retries, rate limiting, circuit breaking, dead letter queues) in microservice architectures.
  • Expert-level ability to instrument systems and diagnose complex performance issues using modern observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry).
  • Strong communication skills with a proven ability to influence cross-functional teams, build consensus around architectural decisions (the Knoster model!), and mentor mid-level and senior engineers.

Nice to have

  • Experience with Infrastructure as Code (Terraform, Helm) for stateful applications.
  • Familiarity with FinOps practices, specifically optimizing Elasticsearch compute and storage costs.
  • Experience integrating AI-assisted development tools into your daily workflow.

What we offer

  • Multiple medical plans, including a high-deductible, low-cost health plan
  • Company-sponsored (paid) Short-Term Disability, Long-Term Disability, and Life Insurance
  • Comprehensive optional benefits such as Dental, Vision, Supplemental Life/AD&D, Legal/ID Protection, and Accident and Critical Illness Insurance
  • Generous paid time off options, including uncapped vacation days, the greater of 3 paid sick days or in accordance with the applicable state or local paid sick leave law, 6 paid company holidays, 2 floating holidays, parental leave, bereavement leave, jury duty leave, voting leave, and other forms of paid leave as required by applicable law or regulation
  • Employee Stock Purchase Program with additional opportunities to earn stock in the Company
  • Retirement planning through the Company's 401(k)

Similar Jobs for Staff Software Engineer, Search & Distributed Systems

8 matching positions

Staff Software Developer, Search & Distributed Systems

If you are looking for a career at a dynamic company with a people-first mindset...
Location: Canada, Toronto
Salary: 147000.00 - 220000.00 CAD / Year
Company: ACV Auctions
Expiration Date: Until further notice
Requirements
  • 8+ years of software engineering experience, with at least 3 years operating at a Senior or Staff level focusing on distributed systems and high-throughput platforms
  • Deep, authoritative knowledge of Elasticsearch internals. You have managed large-scale clusters and deeply understand mapping, analysis, query optimization, cluster state management, and split-brain mitigation
  • Proficiency in the systems upstream and downstream of Search. You have hands-on experience with Kubernetes (EKS/GKE), API Gateway/BFF architectures, and event streams (Kafka)
  • A proven track record of implementing fault-tolerant patterns (retries, rate limiting, circuit breaking, dead letter queues) in microservice architectures
  • Expert-level ability to instrument systems and diagnose complex performance issues using modern observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry)
  • Strong communication skills with a proven ability to influence cross-functional teams, build consensus around architectural decisions (the Knoster model!), and mentor mid-level and senior engineers
Job Responsibility
  • Architect for Scale: Design, configure, and scale our Elasticsearch clusters. You will define our global strategies for shard routing, Index Lifecycle Management (ILM), heap tuning, and data tiering to support massive auction throughput
  • Master the Failure Modes: Anticipate and engineer away points of failure. You will design circuit breakers, implement backpressure mechanisms, and tune asymmetric timeouts to prevent retry storms between our BFFs, K8s services, and the Search layer
  • Expert Troubleshooting & IR: Act as the ultimate technical escalation point for complex, cross-system performance degradation. You will dive deep into JVM metrics, Garbage Collection pauses, K8s network bottlenecks, and slow logs to uncover and remediate root causes
  • Holistic System Ownership: Manage the entire data lifecycle. You will optimize the ingestion pipelines syncing our event datastreams driven by producers and consumers (Kafka) to Elasticsearch, ensuring eventual consistency and data integrity at scale
  • Drive Engineering Excellence: Draft authoritative architectural Blueprints, SOPs, and Runbooks. You will elevate the surrounding engineering culture by coaching teams on distributed systems design, observability best practices, and incident management
  • Modernize & Innovate: Scan the horizon for emerging technologies. You will help evaluate and integrate next-generation search capabilities (e.g., Vector Search, RAG architectures) to support our broader AI and machine learning initiatives
What we offer
  • Company Sponsored (paid) Healthcare
  • Dental
  • Vision
  • Life/AD&D
  • Short-Term and Long-Term Disability
  • Comprehensive additional optional benefits such as Critical Illness and Supplemental Life/AD&D
  • Generous Parental Leave Top-Up Pay and Vacation Programs
  • Employee Stock Purchase Program with additional opportunities to earn stock in the company
  • Retirement planning through the Company's RRSP
Employment type: Full-time

Senior Staff Software Engineer (Search)

We are on a mission to build a reliable, fast, and scalable search for DoorDash....
Location: United States, San Francisco, CA; Sunnyvale, CA; Seattle, WA
Salary: 231200.00 - 340000.00 USD / Year
Company: DoorDash
Expiration Date: Until further notice
Requirements
  • B.S. or M.S. in Computer Science or equivalent
  • 10+ years of industry experience, with a track record of leading large-scale, high-impact components and systems
  • Proven ability to drive multi-quarter technical roadmaps as a technical lead, with clear ownership of architectural decisions
  • Deep expertise in distributed systems and data pipelines at scale. Expertise in search infrastructure, including indexing and serving stack
  • Strong understanding of ML systems, embedding-based retrieval, model serving tradeoffs, and multi-stage ranking architectures
  • Strong technical intuition paired with the ability to influence and align cross-functional stakeholders
  • Humility and growth mindset, leading through expertise and collaboration, not hierarchy
Job Responsibility
  • Lead at scale: Serve as the uber tech lead for Search, providing technical vision and architectural direction across the entire organization. Own a multi-year roadmap that spans multiple services and teams powering mission-critical products at DoorDash
  • Architect the next-generation search stack: Drive the rebuild of core search infrastructure, including indexing pipelines, embedding-based retrieval, and ML ranker serving in latency-sensitive paths. This stack will power both consumer search and agentic commerce experiences at DoorDash scale
  • Influence across teams: Drive alignment across ML, Infrastructure, Product, and partner engineering teams. Mentor staff and senior engineers across the search org to raise the bar of technical excellence
  • Shape engineering culture: Set the technical bar for how the search org designs, ships, and operates large-scale systems, and define the patterns that will outlast any single project
  • Hands-on problem solving: Dig into complex distributed systems challenges, from low-latency serving to indexing freshness tradeoffs, and write code that moves the needle
What we offer
  • 401(k) plan with employer matching
  • 16 weeks of paid parental leave
  • wellness benefits
  • commuter benefits match
  • paid time off and paid sick leave in compliance with applicable laws
  • medical, dental, and vision benefits
  • 11 paid holidays
  • disability and basic life insurance
  • family-forming assistance
  • mental health program
Employment type: Full-time

Staff Software Engineer, Search

As a Staff Software Engineer, you will help make SiriusXM’s massive content cata...
Location: United States, Texas; Georgia; New Jersey; New York
Salary: 101500.00 - 195000.00 USD / Year
Company: SiriusXM
Expiration Date: Until further notice
Requirements
  • 7+ years of professional software engineering experience building large-scale backend systems in Java, including backend microservices and scalable distributed systems
  • 3+ years of experience with Python
  • Deep experience with search frameworks, metadata indexing, and retrieval systems
  • Proven experience leading complex, cross-functional technical initiatives
  • Expert-level experience with AWS, including deploying applications using services such as EC2, Lambda, S3, DynamoDB, CloudWatch, ElastiCache, and IAM
  • Strong foundation in object-oriented design, system design, and design patterns
  • Proven ability to make sound architectural trade-offs while maintaining long-term extensibility
  • Comfort working across system boundaries (infrastructure, ML, and adjacent service domains)
  • Ability to handle multiple tasks in a fast-paced environment
  • Excellent interpersonal and communication skills
Job Responsibility
  • Design, build, and operate well-architected, scalable microservices for the search services stack
  • Drive architectural evolution of indexing, retrieval, and serving pipelines from prototype to production
  • Evaluate, fine-tune, and integrate off-the-shelf LLMs, rapidly prototyping where needed
  • Drive cross-functional initiatives, collaborating with product, science, design, and infrastructure partners
  • Act as a strong technical bridge between science/ML and engineering teams
  • Partner with the Voice Search team to eliminate redundancies and enhance the overall search ecosystem
  • Influence and uphold engineering best practices, mentoring other engineers as the team scales
Employment type: Full-time

Staff Software Engineer, Search Platform

As a vital member of the Search Platform Team, you will be part of a specialized ...
Location: Singapore; China, Singapore; Shanghai
Salary: Not provided
Company: Airwallex
Expiration Date: Until further notice
Requirements
  • More than 7 years of back-end development experience
  • Experience developing large-scale distributed systems
  • Proficient in coding and scripting languages (Java/Kotlin, C++, Python, etc.) with strong software and system design abilities
  • Deep familiarity with the standard library, idiomatic usage, and best practices of your primary programming languages
  • Able to write clear, maintainable, and efficient code
  • In-depth knowledge of storage and streaming with PostgreSQL or Kafka
Job Responsibility
  • Work closely with Product Managers to analyze product requirements and produce the technical solutions and execution plan to deliver the software products
  • Design, implement, and deliver production-grade streaming ingestion hands-on, using Flink (or similar technologies), with a focus on low-latency, high-throughput, and fault-tolerant design
  • Tackle challenging problems in timely computation, stateful stream processing, partitioning, and resilience
  • Proactively troubleshoot and address technical bottlenecks
  • Participate in and contribute to critical code, design, and performance reviews, raising the technical bar across the team
  • Engage with the technical leads in building a backlog that continuously contributes to the execution of the roadmap
  • Collaborate with local/global engineering teams, infrastructure teams, and product development teams and translate business requirements into robust engineering solutions
Employment type: Full-time

Staff Software Engineer : Storage, Search, & Data Platforms

The Storage, Search, and Data (SSD) group is the custodian of Uber's digital int...
Location: United States, Seattle; San Francisco; Sunnyvale
Salary: 232000.00 - 258000.00 USD / Year
Company: Uber
Expiration Date: Until further notice
Requirements
  • 12+ years of software engineering experience, with a proven history of designing and operating massive-scale distributed data systems
  • Elite engineering skills in Go, Java, C++, or Rust. You are comfortable deep-diving into database internals, kernel-level optimizations, and complex distributed consensus protocols
  • Proven experience leading technical strategy across multiple teams or organizations, turning high-level business goals into concrete technical realities
  • Extensive experience managing Tier-0, mission-critical systems with 99.99% availability and global blast-radius constraints
Job Responsibility
  • Define and execute the multi-year roadmap to transition Uber from Data Storage to a Cloud-Native Data Provider, solving for cross-region latency, global metadata consistency, and exabyte-scale cost efficiency
  • Partner with Uber's AI/ML leadership to architect the Data-to-GPU pipeline. You will design the one-stop storage APIs that allow researchers to leverage high-performance data access across multi-cloud regions and vendors seamlessly
  • Drive the next generation of our core engines: Docstore (NoSQL), Vitess (Sharded MySQL), Apache Pinot (Real-time Analytics), and OpenSearch (Discovery)
  • You will represent Uber in the global community as a leader in key open source technologies including Apache, Hudi, Iceberg and many others
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible to participate in a 401(k) plan
  • Various benefits
Employment type: Full-time

Staff Software Engineer, AI Agentic Search

We are seeking a Staff Software Engineer with 10+ years of experience to work in...
Location: India, Hyderabad
Salary: Not provided
Company: Notion
Expiration Date: Until further notice
Requirements
  • 10+ years of fullstack engineering experience
  • Proven track record of execution
  • Experience building world-class product experiences
  • Experience shipping quality user interfaces with web technologies (HTML, CSS, JavaScript, modern UI framework like React)
  • Experience with distributed systems, data pipelines, vector databases, and production infrastructure
  • Ability to write clean code, take end-to-end ownership, and make pragmatic architectural decisions
  • Ability to mentor teammates
  • Thoughtful problem-solving approach
  • Ability to navigate ambiguity and decompose complex problems
  • Not ideological about technology
Job Responsibility
  • Build the end-to-end AI agentic search experience
  • Design and ship connectors from scratch, keeping third-party data continuously in sync
  • Build semantic search infrastructure powered by vector embeddings and efficient storage systems
  • Transform natural language questions into intelligent queries across multiple data sources
  • Implement ranking systems that surface the most relevant results while respecting user permissions and access controls
  • Scale to millions of documents across thousands of customers
  • Work with a team of engineers and cross-functional partners to define product strategy and drive execution
  • Build and maintain foundational pieces of Notion’s building blocks
  • Contribute to the overall performance, reliability, and robustness of the Notion product
  • Partner with engineering leaders to identify and execute against high-leverage technical investments
Employment type: Full-time

Staff Software Engineer – Discrete Event Simulation & Route Optimization

The Autonomous Robotics Center (ARC) is a multidisciplinary organization develop...
Location: United States, Austin, Texas; Mountain View, California; Warren, Michigan
Salary: Not provided
Company: General Motors
Expiration Date: Until further notice
Requirements
  • Master's degree in Computer Science, Computer Engineering, Electrical Engineering, Operations Research, Applied Mathematics, or a related field
  • PhD or equivalent experience preferred
  • 10+ years of software engineering experience with a strong focus on algorithms, simulation, or optimization
  • Strong skills in Python, C++, C#, or similar languages, with a track record of shipping production-quality software
  • Deep experience implementing and optimizing shortest-path and routing algorithms (e.g., Dijkstra, A*, flows, matchings, search on large graphs)
  • Solid expertise in probability, statistics, and stochastic processes applied to modeling and simulation (e.g., Monte Carlo simulation)
  • Experience running large batches of simulations or distributed experiments (cloud or on-prem)
  • Excellent communication and collaboration skills, with a history of influencing architecture and technical direction
  • Practical experience with simulation frameworks (e.g., SimPy, AnyLogic, Arena, or custom DES frameworks)
Job Responsibility
  • Design and implement core algorithms for discrete event simulation, scheduling, routing, graph-based modeling, and system optimization
  • Build and maintain DES models of complex systems (e.g., production lines, AMR flows, logistics networks), including event logic, resources, and KPIs
  • Build and run large-scale simulation experiments and translate results into actionable recommendations for throughput, cost, and reliability improvements
  • Develop and maintain core data structures and services for maps, graphs, and spatial databases
  • Develop production-quality software (primarily in Python and C#) and expose capabilities via stable APIs and internal tools
  • Partner with cross-functional teams (manufacturing, robotics, data, and platform engineering) to connect models with real-world telemetry and digital twins
  • Define and champion best practices for algorithm design, simulation modeling, testing, and observability
  • Mentor engineers and provide technical leadership on algorithms, modeling, and software design
What we offer
  • Relocation benefits
Employment type: Full-time

Sr Staff Engineer Software, Fullstack (Prisma AIRS) - NetSec

Join our team building a cutting-edge multi-tenanted GenAI Security Platform tha...
Location: India, Bengaluru
Salary: Not provided
Company: Palo Alto Networks
Expiration Date: Until further notice
Requirements
  • Proven experience building and scaling multi-tenant SaaS platforms with strict data isolation
  • Strong knowledge of API design, RESTful principles, and OpenAPI specifications
  • Proficiency in modern JavaScript frameworks (React, Vue, or Svelte) with TypeScript
  • Experience building data-intensive dashboards with complex visualisations and real-time data
  • Strong CSS/styling skills and responsive design principles
  • Demonstrated experience working with production AI/ML systems at scale
  • Practical experience integrating LLM APIs and managing inference at scale
  • Understanding of LLM operational challenges: rate limiting, cost optimisation, latency management, fallback strategies
  • Familiarity with AI agent frameworks (LangChain, AutoGen, MCP, or similar)
  • Knowledge of prompt engineering, semantic search, and vector databases
Job Responsibility
  • Design and implement high-performance REST APIs with enterprise-grade multi-tenant isolation and strict security boundaries
  • Work on distributed systems architecture handling high-throughput workloads with mission-critical uptime requirements
  • Build responsive dashboards and administrative interfaces for platform management, data visualisation, and system configuration
  • Integrate multiple LLM providers, implement semantic search capabilities, and build intelligent agent workflows
  • Architect complex, multi-step AI evaluation pipelines for asynchronous job execution and large-scale data processing
  • Design and implement database schemas with proper indexing, query optimisation, and data isolation strategies
  • Build and maintain scalable micro-services with async/await patterns and type-safe code
  • Develop data-intensive UIs with real-time updates, complex state management, and intuitive user experiences
  • Deploy and manage containerised applications on Kubernetes with comprehensive observability
  • Write thorough tests (frontend and backend) and maintain high code quality standards with automated tooling
Employment type: Full-time