Software Engineer, Distributed Systems - Infra

United States, San Francisco · Employment contract · 180000.00 - 275000.00 USD / Year · Job Posted May 15, 2026

Job Description

You'll build and scale the application and data infrastructure that supports 70M+ users creating millions of gammas every day. This means working on real-time collaborative editing, databases, public APIs, and high-volume event pipelines while helping define and evolve the core data model and storage systems powering Gamma's business. You'll ship backend systems that directly impact growth metrics and user experience, balancing long-term technical investments with rapid shipping velocity.

As a Software Engineer on the Platform team, you'll collaborate across frontend, product, and data teams in a fast-paced, product-led environment. You'll bring a product-minded approach, understanding how technical decisions impact user experience and business metrics while thriving in an environment where shipping quality directly impacts growth.

Our team has a strong in-office culture and works in person 4–5 days per week in San Francisco. We love working together to stay creative and connected, with flexibility to work from home when focus matters most.

Job Responsibility

  • Design and implement scalable APIs, distributed systems, and data infrastructure that serve millions of users
  • Help define and evolve the core data model and storage systems powering Gamma's business
  • Ship backend systems that directly impact growth metrics and user experience
  • Work on real-time collaborative editing, databases, public APIs, and high-volume event pipelines
  • Balance long-term technical investments with rapid shipping velocity
  • Collaborate across frontend, product, and data teams to deliver high-quality solutions under tight timelines

Requirements

  • 3–5+ years of backend engineering experience building scalable systems
  • Strong proficiency in backend technologies (Node.js, Python, or similar) and databases (PostgreSQL, Redis)
  • Experience with high-traffic production systems and performance optimization
  • Track record shipping high-quality, complex applications under tight timelines
  • Product-minded approach with understanding of how technical decisions impact user experience and business metrics
  • Thrives in fast-paced, product-led environments where shipping quality directly impacts growth

Nice to have

Experience with real-time collaboration systems, event pipelines, or AI-powered applications

What we offer

Equity

Similar Jobs for

Software Engineer, Distributed Systems - Infra

8 matching positions

Software Engineer, Systems

Meta Platforms, Inc. (Meta), formerly known as Facebook Inc., builds technologie...
Location
United States, Bellevue
Salary
182547.00 - 209000.00 USD / Year
Company
Meta
Expiration Date
Until further notice
Requirements
  • Requires a Master's degree (or foreign degree equivalent) in Computer Science, Engineering, Statistics, Information Systems, Analytics, Mathematics, Physics, Applied Sciences, or a related field
  • Requires completion of a university-level course, research project, internship, or thesis in the following: Python, PHP, or Haskell
  • Relational databases and SQL
  • Software development tools: Code editors (VIM or Emacs), and revision control systems (Subversion, GIT, or Perforce)
  • Linux, UNIX, or other *nix-like OS as evidenced by file manipulation, advanced commands, and shell scripting
  • Core web technologies: HTML, CSS, or JavaScript
  • Building highly-scalable performant solutions
  • Data processing, programming languages, databases, networking, operating systems, computer graphics, or human-computer interaction
  • Big data computing platform, libraries and data modeling
  • In-depth knowledge of mathematics, statistics, and algorithms
Job Responsibility
  • Research, design, develop, build and test platforms, services and products that support Meta's big data computing and analysis with a focus on data privacy and security
  • Build new features and improve existing systems
  • Work on problems of moderate scope
  • Push code, drive the development of the privacy and security solutions in Meta's big data computing platform, and be a part of a team to protect billions of users' data
  • Develop a strong understanding of relevant product areas, the codebase, and/or systems
  • Demonstrate proficiency in data analysis, programming and software engineering
  • Produce high-quality code with good test coverage, using modern abstractions and frameworks
  • Receive general instructions on routine work and detailed instructions on new projects or assignments, work independently, use available resources to get unblocked, and complete tasks on-schedule by exercising strong judgment and problem solving skills
  • Master internal development standards from developing to releasing code to taking on tasks and projects with increasing levels of complexity
  • Actively seek and give feedback in alignment with company Performance Philosophy
What we offer
  • bonus
  • equity
  • benefits

Software Engineer, Infra PyTorch (PhD)

This role is about developing the core PyTorch 2.0 technologies, innovating and ...
Location
United States, Menlo Park
Salary
181000.00 USD / Year
Company
Meta
Expiration Date
Until further notice
Requirements
  • Currently has, or is in the process of obtaining a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • Currently has or is in the process of obtaining a PhD degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • Research or industry experience in developing compilers, ML systems, ML accelerators, GPU performance, and similar
  • Advanced in Python or C++ programming
Job Responsibility
  • Develop the PT2 compiler (e.g., TorchDynamo, TorchInductor, PyTorch Distributed, PyTorch Core)
  • Improve PyTorch performance via systematic solutions for the entire community
  • Explore the intersection of the PyTorch compiler and PyTorch distributed
  • Optimize Generative AI models across the stack (pre-training, fine-tuning, and inference)
  • Collaborate with users of PyTorch to enable new use cases of PT2 technologies both inside and outside Meta
What we offer
  • bonus
  • equity
  • benefits

Software Systems Engineer

As part of New Product Introductions, you will have direct access to bleeding ed...
Location
United States, Fremont
Salary
173000.00 - 245000.00 USD / Year
Company
Meta
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 8+ years of programming experience in a relevant programming language
  • 6+ years relevant experience building large-scale infrastructure applications or similar experience
  • Experience with scripting languages such as Python, Javascript or Hack
  • Experience leading major initiatives successfully
  • Experience leading projects and teams accordingly
  • Experience building and shipping high quality work and achieving high reliability
  • Experience improving quality through thoughtful code reviews, appropriate testing, proper rollout, monitoring, and proactive changes
  • Experience in utilizing data and analysis to explain technical problems and providing detailed feedback and solutions
Job Responsibility
  • Design, develop, and maintain an Infra stack of solutions that drive Provisioning, Scheduling, Monitoring, and Alerting
  • Collaborate with hardware and software engineering teams to identify product requirements
  • Contribute to and Influence Open Compute Project Industry standards
  • Troubleshoot and debug distributed systems and product issues
  • Implement and maintain infrastructure at scale
  • Continuously improve processes and methodologies
What we offer
  • bonus
  • equity
  • benefits

Software Engineer, Maps Infra

We are looking for a Software Engineer to partner with our Mapping team to deliv...
Location
United States, San Francisco
Salary
162000.00 - 260000.00 USD / Year
Company
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • 4+ years experience building server side and data processing systems
  • Expert proficiency in C++ with a commitment to writing clean, testable, and production-ready code
  • Deep understanding of distributed systems principles, with a proven ability to deliver scalable, reliable backend systems
  • Strong understanding of cloud-native technologies (e.g., AWS, GCP, Kubernetes)
  • Excellent communication and collaboration skills
  • Proven ability to rapidly learn new technologies and adapt to evolving requirements
Job Responsibility
  • Design, develop, and maintain the scalable backend infrastructure and data processing pipeline for storing and serving map data as we onboard the Aurora Driver to more commercial routes
  • Establish and maintain robust testing and performance optimization practices to ensure the stability and scalability of the Atlas system
  • Partner closely with internal and external customers to influence existing and future designs and features
What we offer
  • Annual bonus
  • Equity compensation
  • Benefits
  • Fulltime

Software Engineer, Maps Infra

We are looking for a Software Engineer to partner with our Mapping team to deliv...
Location
United States, Mountain View
Salary
162000.00 - 260000.00 USD / Year
Company
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • 4+ years experience building server side and data processing systems
  • Expert proficiency in C++ with a commitment to writing clean, testable, and production-ready code
  • Deep understanding of distributed systems principles, with a proven ability to deliver scalable, reliable backend systems
  • Strong understanding of cloud-native technologies (e.g., AWS, GCP, Kubernetes)
  • Excellent communication and collaboration skills
  • Proven ability to rapidly learn new technologies and adapt to evolving requirements
Job Responsibility
  • Design, develop, and maintain the scalable backend infrastructure and data processing pipeline for storing and serving map data as we onboard the Aurora Driver to more commercial routes
  • Establish and maintain robust testing and performance optimization practices to ensure the stability and scalability of the Atlas system
  • Partner closely with internal and external customers to influence existing and future designs and features
What we offer
  • Annual bonus
  • Equity compensation
  • Benefits
  • Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location
United States, Chevy Chase; New York City; Palo Alto
Salary
115000.00 USD / Year
Company
Geico
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location
United States, Palo Alto
Salary
90000.00 USD / Year
Company
Geico
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime

ML Infra Engineer (Data Systems)

As an ML Infra Engineer (Data Systems), you’ll build and operate the data infras...
Location
United States, San Francisco
Salary
Not provided
Company
Physical Intelligence
Expiration Date
Until further notice
Requirements
  • Strong software engineering fundamentals
  • Experience building distributed systems or large-scale data pipelines
  • Comfort reasoning about performance, memory, I/O, and storage efficiency
  • Familiarity with batch and/or streaming processing systems
  • Experience with object storage systems and data format tradeoffs
  • Ownership mindset: design, build, operate, and iterate on systems end-to-end
  • Enjoy working closely with researchers and unblocking fast-moving projects
Job Responsibility
  • Data Ingestion & Processing: Design and build high-throughput pipelines that validate, transform, and featurize raw multimodal data
  • Batch & Streaming Systems: Operate large-scale batch and streaming workflows over massive datasets
  • Storage Systems: Design object storage layouts, metadata systems, and efficient access patterns; choose file formats with performance and scalability in mind
  • Data Lifecycle Management: Build systems for backfills, dataset rebuilds, garbage collection, and large-scale transformations
  • Training-Time Performance: Optimize dataloaders, sharding, prefetching, caching, and throughput to reduce time from data arrival → model training
  • Metadata & Indexing: Build scalable metadata stores for datasets, annotations, and training artifacts
  • Data Movement: Move hundreds of terabytes to petabytes efficiently across clusters and environments
  • Operational Correctness: Implement observability, validation, and guardrails to prevent silent data regressions
  • Cross-Functional Collaboration: Work closely with cross-functional teams of researchers, engineers and roboticists to translate evolving data needs into robust systems
  • Fulltime