Software Engineer, Caching Infrastructure

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

230000.00 - 385000.00 USD / Year

Job Description:

The Caching Infrastructure team is responsible for building a caching layer that powers many critical use cases at OpenAI. We aim to provide a high-availability, multi-tenant cache platform that scales automatically with workload, minimizes tail latency, and supports a diverse range of use cases. We’re looking for an experienced engineer to help design and scale this critical infrastructure.

Job Responsibility:

  • Design, build, and operate OpenAI’s multi-tenant caching platform used across inference, identity, quota, and product experiences
  • Define the long-term vision and roadmap for caching as a core infra capability, balancing performance, durability, and cost
  • Collaborate with other infra teams (e.g., networking, observability, databases) and product teams to ensure our caching platform meets their needs

Requirements:

  • 5+ years of experience building and scaling distributed systems, with a strong focus on caching, load balancing, or storage systems
  • Deep expertise with Redis, Memcached, or similar solutions, including clustering, durability configurations, client-side connection patterns, and performance tuning
  • Production experience with Kubernetes, service meshes (e.g., Envoy), and autoscaling systems
  • Think rigorously about latency, reliability, throughput, and cost in designing platform capabilities
  • Thrive in a fast-paced environment and enjoy balancing pragmatic engineering with long-term technical excellence

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided
  • Equity
  • Performance-related bonus(es) for eligible employees

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Software Engineer, Caching Infrastructure

Software Engineer, AI Infrastructure

As a Software Engineer on our AI Infrastructure team, you will help design the c...
Location:
United States, New York, NY; San Mateo, CA
Salary:
Not provided
Fireworks AI
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 3 years of experience in software engineering, with a focus on infrastructure or machine learning systems
  • Strong programming skills in Python, Go, or a similar language
  • Proven experience in ML infrastructure and tooling (e.g., PyTorch, MLflow, Vertex AI, SageMaker, Kubernetes)
  • Basic understanding of LLM concepts (e.g., context length, disaggregated prefill, KV cache memory estimation)
Job Responsibility
  • Contribute to the design and development of scalable backend infrastructure that supports distributed training, inference, and data pipelines
  • Build and maintain core backend services such as LLM CI/CD pipeline, control plane, and model serving systems
  • Support performance optimization, cost efficiency, and reliability improvements across compute, storage, and networking layers
  • Build frameworks and safeguards to ensure Fireworks AI has the best model quality in the industry
  • Collaborate with performance, training, and product teams to translate research and product needs into infrastructure solutions
  • Participate in code reviews, technical discussions, and continuous integration and deployment processes
What we offer
  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure
  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally
  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results
  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation
  • Full-time

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Full-time

Senior Infrastructure Engineer – Hosting

As a Senior Infrastructure Engineer – Hosting you will be responsible for the de...
Location:
United States
Salary:
150000.00 USD / Year
Corporate Tools
Expiration Date
Until further notice
Requirements
  • 3-5 years of experience in Linux system administration, virtualization, and cloud infrastructure
  • Experience with Proxmox or other hypervisors (VMware, KVM, Xen, Hyper-V)
  • Experience with Ceph or SAN storage solutions for virtualization
  • Ability to manage kernel tuning, system performance, and process optimization
  • Hands-on experience with Ceph storage, ZFS, iSCSI, NFS, RAID, and SAN architectures
  • Understanding of storage performance metrics (IOPS, throughput, latency)
  • Ability to work on projects solo or with a team
  • Love for learning and improving code
  • Strong communication and collaboration skills
  • Experience with WordPress hosting, database replication, and caching techniques
Job Responsibility
  • Develop and design robust and scalable hardware solutions
  • Take ownership of projects from conception to deployment, ensuring timely delivery and meeting the specified requirements
  • Work closely with cross-functional teams, including IT, product management, and other software teams, to ensure seamless integration and alignment with business objectives
  • Deploy, configure, and maintain Proxmox VE clusters for virtualization or other hypervisors
  • Implement high-availability (HA) and failover solutions for virtual machines
  • Manage resource allocation (CPU, memory, disk, network) to optimize performance for hosted applications
  • Automate VM deployment and configuration using Ansible, Terraform, or SaltStack
  • Maintain backups and disaster recovery plans for virtualized environments
  • Design and manage Ceph clusters or SAN storage (iSCSI, NFS, ZFS, etc.) for high-performance, redundant storage
  • Monitor and optimize storage performance, including IOPS, latency, and throughput
What we offer
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days Paid Time Off accrued annually, and 4 holidays
  • After 3 years, PTO increases to 29 days. Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
  • The 4 holidays are: New Year’s Day, Fourth of July, Thanksgiving, and Christmas Day
  • Paid Parental Leave
  • Up to 6% company matching 401(k) with no vesting period
  • Quarterly allowance: use it to make your remote work setup more comfortable, for continuing education classes, a plant for your desk, coffee for your coworker, a massage for yourself... really, whatever
  • Open concept office with friendly coworkers
  • Full-time

Senior Software Engineer - Transactional Data Platform

As a Senior Software Engineer, you will play a critical role in designing, build...
Location:
Australia, Sydney
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
  • 5+ years of experience in backend software development
  • 3+ years of hands-on experience working with AWS cloud services, particularly AWS storage technologies (S3, DynamoDB, EBS, EFS, FSx, or Glacier)
  • 3+ years of experience in designing and developing distributed systems or high-scale backend services
  • Strong programming skills in Kotlin
  • Experience working in agile environments following DevOps and CI/CD best practices
  • Strong Backend Development Skills
  • Proficiency in Kotlin, Java for backend development
  • Experience building high-performance, scalable microservices and APIs
  • Strong understanding of RESTful APIs, gRPC, and event-driven architectures
Job Responsibility
  • Designing, building, and optimizing high-performance, scalable, and resilient backend storage solutions on AWS cloud infrastructure
  • Developing distributed storage systems, APIs, and backend services that power mission-critical applications, ensuring low-latency, high-throughput, and fault-tolerant data storage
  • Collaborating closely with principal engineers, architects, SREs, and product teams to define technical roadmaps, improve storage efficiency, and optimize access patterns
  • Driving performance tuning, data modeling, caching strategies, and cost optimization across AWS storage services like S3, DynamoDB, EBS, EFS, FSx, and Glacier
  • Contributing to infrastructure automation, security best practices, and monitoring strategies using tools like Terraform, CloudWatch, Prometheus, and OpenTelemetry
  • Troubleshooting and resolving production incidents related to data integrity, latency spikes, and storage failures, ensuring high availability and disaster recovery preparedness
  • Mentoring junior engineers, participating in design reviews and architectural discussions, and advocating for engineering best practices such as CI/CD automation, infrastructure as code, and observability-driven development
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Flexibility for eligible candidates to work remotely across the West US
  • Full-time

Senior Staff Software Engineer

As a Senior Staff Software Engineer, you will join a highly performing team of e...
Location:
United States, New York
Salary:
156400.00 - 225000.00 USD / Year
SiriusXM
Expiration Date
Until further notice
Requirements
  • Bachelor’s Degree in Computer Science/Mathematics or a similar field
  • 12+ years of software engineering experience in Java programming language (preferably JDK17 or higher)
  • 5+ years of experience developing and designing data applications and data pipelines
  • 7+ years of experience crafting microservices and scalable products, utilizing diverse cloud platforms (ideally AWS)
  • Strong understanding of engineering software processes, lifecycle methodology, configuration management, release management, and system verification and testing
  • Ability to work independently and oversee entire projects or significant parts, focusing on completing the tasks on time
  • Proven ability to research and become proficient in new technologies
  • Strong analytical and problem-solving skills, with meticulous attention to detail and a dedication to continuous improvement
  • Proficiency in constructing detailed software architecture diagrams
  • Interpersonal skills and ability to interact and work with staff at all levels
Job Responsibility
  • Design and build high-performance, reliable, and scalable reporting APIs and data pipelines
  • Lead high-level architecture discussions and planning sessions; work with ad measurement, revenue, and addressability data
  • Work with various teams of engineers building software in a collaborative development process
  • Execute tasks with utmost clarity and precision, demonstrating a strong sense of ownership and providing clear direction to drive projects forward effectively
  • Collaborate with the Product team to clarify the scope of the projects by giving thoughtful feedback which challenges and clarifies requirements intent
  • Collaborate with the Global Operations and Cloud Infrastructure teams to ensure flawless production deployments and support the Incident Management team
  • Responsible for conducting technical interviews as needed, ensuring a consistently high bar for engineering excellence and performance standards
  • Responsible for mentoring and guiding junior engineers
What we offer
  • discretionary short-term and long-term incentives
  • Full-time

Senior Software Engineer - Data Team

We’re looking for Software Engineers to join our Data Department, developers wit...
Location:
Spain, Barcelona; Madrid
Salary:
50000.00 - 70000.00 EUR / Year
Fever
Expiration Date
Until further notice
Requirements
  • Python Proficiency: confident working deeply in Python, understand topics like the GIL, concurrency (asyncio), generators, and decorators, care about maintainable typing and thoughtful performance optimization
  • Architecture Patterns: comfortable applying Hexagonal Architecture to keep domain logic clean and decoupled, can leverage patterns like CQRS and the Transactional Outbox to support consistency and reliability in an event-driven environment
  • Database Polyglot: strong SQL fundamentals, know how to design for performance (PostgreSQL internals, indexing strategies), understand when tools like Redis (caching) or Elasticsearch (search/aggregations) are the right fit
  • Communication: communicate clearly in English across audiences
  • Pragmatic mindset: balance quality with impact, able to make thoughtful trade-offs, deliver iteratively, and keep an eye on long-term sustainability while moving at a good pace
Job Responsibility
  • Architect and Build: Design, implement, and maintain scalable microservices using Python (FastAPI/Django), take ownership of breaking down complex monoliths or building new services from the ground up, applying DDD principles
  • Master the Event Stream: Build robust, event-driven flows with Kafka, ensure that our events are durable, ordered, and processed idempotently, managing eventual consistency with care
  • Integrate at Scale: Design fault-tolerant integrations with third-party ecosystems (Meta Ads, Google Marketing Platform, Salesforce), navigate rate limits, retries, and circuit breakers to maintain platform stability
  • Bridge OLTP and OLAP: Work at the intersection of transactional applications and analytical data, optimize PostgreSQL for operational efficiency while designing ingestion pipelines for Snowflake and Elasticsearch, using Airflow and dbt
  • Productionize Data Capabilities: Partner closely with Data Science, Machine Learning, and Data Engineering teams to ensure seamless integration of data sources and model infrastructure
  • Elevate the Bar: Lead thorough code reviews, write RFCs for key technical decisions, and mentor mid-level engineers, champion testing strategies (unit, integration, contract testing) and advocate for clean, sustainable code architecture
What we offer
  • Responsibility from day one and professional and personal growth
  • Opportunity to have a real impact in a high-growth global category leader
  • A compensation package consisting of base salary and the potential to earn a significant bonus for top performance
  • Stock options plan
  • 40% discount on all Fever events and experiences
  • Home office friendly
  • Health insurance and other benefits such as Flexible remuneration with a 100% tax exemption through Cobee
  • English / Spanish Lessons
  • Wellhub Membership
  • Option to receive part of your salary in advance through Payflow
  • Full-time

Principal Software Engineer, Trusted Data Platform

As a Principal Software Engineer, you will be a technical leader and hands-on co...
Location:
India, Bangalore
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
  • 10+ years of experience in backend software development, focusing on distributed systems and storage solutions
  • 5+ years of experience working with AWS storage services (S3, DynamoDB, EBS, EFS, FSx, Glacier)
  • Strong expertise in system design, architecture, and scalability for large-scale storage solutions
  • Proficiency in at least one major backend programming language (Kotlin, Java, Go, Rust, or Python)
  • Experience designing and implementing highly available, fault-tolerant, and cost-efficient storage architectures
  • Deep understanding of distributed systems, replication strategies, sharding, and caching
  • Knowledge of data security, encryption best practices, and compliance requirements (SOC2, GDPR, HIPAA)
  • Experience leading engineering teams, mentoring senior engineers, and driving technical roadmaps
  • Proficiency with observability tools, performance monitoring, and troubleshooting at scale
Job Responsibility
  • Designing and optimizing high-scale, distributed storage systems built on AWS storage technologies
  • Shaping the architecture, performance, and reliability of backend storage solutions that power critical applications at scale
  • Designing, implementing, and optimizing backend storage services that support high throughput, low latency, and fault tolerance
  • Working closely with senior engineers, architects, and cross-functional teams to drive scalability, availability, and efficiency improvements in large-scale storage solutions
  • Leading technical deep dives, architecture reviews, and root cause analyses to resolve complex production issues related to storage performance, consistency, and durability
  • Driving best practices in distributed system design, security, and cloud cost optimization
  • Mentoring senior engineers, contributing to technical roadmaps, and helping shape the long-term storage strategy
  • Collaborating with Site Reliability Engineers (SREs) to implement observability, monitoring, and disaster recovery strategies, ensuring high availability and compliance with industry standards
  • Advocating for automation, Infrastructure-as-Code (IaC), and DevOps best practices, leveraging tools like Terraform, AWS CloudFormation, Kubernetes (EKS), and CI/CD pipelines to enable scalable deployments and operational excellence
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities
  • We can hire people in any country where we have a legal entity
  • Interviews and onboarding are conducted virtually
  • Whatever your preference - working from home, an office, or in between - you can choose the place that's best for your work and your lifestyle

Software Engineer, Edge

Vercel is looking for engineers to help us build functional systems that improve...
Location:
United States, San Francisco; New York City
Salary:
196000.00 - 294000.00 USD / Year
Vercel
Expiration Date
Until further notice
Requirements
  • Intrigued by the complex challenges of serving petabytes of data and billions of requests to millions of people
  • A collaborative team player who believes in the power of strong teams to drive significant changes and innovations
  • Have at least 5 years of relevant experience
  • Have deep experience with how to make high performance systems scale in the cloud
  • Want to help protect the Vercel platform and our customers from abuse
  • Are keen to experiment, challenge norms and deliver secure and reliable systems that delight Vercel's users
  • Are knowledgeable and experienced with web servers and network protocols
  • Comfortable in Golang
  • Familiar with Cloud Services (e.g. AWS, Google Cloud, or others)
Job Responsibility
  • Help to scale and improve our infrastructure, availability and reliability by working with our backend engineers and product team to identify problems, create tooling and automation
  • Be comfortable designing systems, writing code, and debugging systems in production
  • Orchestrate deploying, routing and serving for our customers through capabilities and tooling we maintain that leverage our own and other cloud providers' infrastructure for networking, TLS termination, routing, storage, caching and other novel edge services
What we offer
  • Competitive compensation package, including equity
  • Inclusive Healthcare Package
  • Learn and Grow - we provide mentorship and send you to events that help you build your network and skills
  • Flexible Time Off
  • We will provide you the gear you need to do your role, and a WFH budget for you to outfit your space as needed
  • Full-time