CrawlJobs Logo

Staff Site Reliability Engineer

affirm.com Logo

Affirm

Location Icon

Location:
Spain

Category Icon
Category:
IT - Software Development

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

101000.00 - 131000.00 EUR / Year

Job Description:

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. Site Reliability Engineering at Affirm is a small, yet crucial, team that helps our Engineering partners to “Operate What They Own” with excellence to protect their customers’ experience. SRE accomplishes this through defining frameworks and best practices for operating applications, building tooling, and providing training and consulting.

Job Responsibility:

  • Set technical strategy vision for your team on a multi year-long time scale, and help your team tie it together with critical, business-impacting projects
  • Collaborate across teams in the product development lifecycle by collaborating with infrastructure, product management, developer experience & analytics to ensure technical sustainability, risks and trade-offs are well understood and managed
  • Act as a force-multiplier for your team through your definition and advocacy of technical solutions and operational processes
  • Take ownership of your team’s operations and availability by ensuring you have the right monitoring, triage rotations, playbooks, policies, testing and alerting in place to support “keep the lights on” & on-call efforts
  • Foster a culture of quality and ownership on your team by setting code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks
  • Help develop talent on your team by providing feedback and guidance, and leading by example
  • Participate in an on-call rotation

Requirements:

  • 8+ years of experience designing, developing, advocating as a point subject of reference, and launching backend systems at scale using scripting and development languages like Bash, Python or Kotlin
  • Extensive track record of developing highly available distributed systems using technologies like AWS, MySQL, Spark and Kubernetes
  • Track record of managing, driving and improving the Incident Livecycle process from live incident management through retrospective and post-incident analysis to provide actional insights to enhance overall system reliability, resilience, and performance
  • 7+ years experience in Site Reliability or Production Engineering teams
  • Experience delivering major features, system components or deprecating existing functionality in a system through the definition of a technical and execution plan
  • Ability to write high quality code that is easily understood and used by others
  • Strong verbal and written communication skills that support effective collaboration with our global engineering team and key stakeholders of an organization
  • Equivalent practical experience or a Bachelor’s degree in a related field
  • Based in Spain for the role

Nice to have:

  • Demonstrate curiosity with empathy, and strong opinions loosely held
  • Thrive in ambiguity, and are comfortable moving from low level language idioms all the way to the architecture of large systems to understand how they work
  • Growth and impact trajectory demonstrates that you have mastered gathering and iterating on feedback from your engineering and cross-functional peers
What we offer:
  • Flexible Spending Wallets for tech, food and lifestyle
  • Away Days - wellness days to take off work and recharge
  • Learning & Development programs
  • Parental benefit
  • Employee Resource & Community Groups
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Visa sponsorship

Additional Information:

Job Posted:
December 15, 2025

Employment Type:
Fulltime
Work Type:
Remote work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Staff Site Reliability Engineer

New

Staff Site Reliability Engineer

Site Reliability Engineering at Affirm is a small, yet crucial, team that helps ...
Location
Location
Poland
Salary
Salary:
358000.00 - 458000.00 PLN / Year
affirm.com Logo
Affirm
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years of experience designing, developing, advocating as a point subject of reference, and launching backend systems at scale using scripting and development languages like Bash, Python or Kotlin
  • Extensive track record of developing highly available distributed systems using technologies like AWS, MySQL, Spark and Kubernetes
  • Track record of managing, driving and improving the Incident Livecycle process from live incident management through retrospective and post-incident analysis to provide actional insights to enhance overall system reliability, resilience, and performance
  • 7+ years experience in Site Reliability or Production Engineering teams
  • Demonstrate curiosity with empathy, and strong opinions loosely held
  • Experience delivering major features, system components or deprecating existing functionality in a system through the definition of a technical and execution plan
  • Write high quality code that is easily understood and used by others
  • Thrive in ambiguity, and are comfortable moving from low level language idioms all the way to the architecture of large systems to understand how they work
  • Growth and impact trajectory demonstrates that you have mastered gathering and iterating on feedback from your engineering and cross-functional peers
  • Strong verbal and written communication skills that support effective collaboration with our global engineering team and key stakeholders of an organization
Job Responsibility
Job Responsibility
  • Set technical strategy vision for your team on a multi year-long time scale, and help your team tie it together with critical, business-impacting projects
  • Collaborate across teams in the product development lifecycle by collaborating with infrastructure, product management, developer experience & analytics to ensure technical sustainability, risks and trade-offs are well understood and managed
  • Act as a force-multiplier for your team through your definition and advocacy of technical solutions and operational processes
  • Take ownership of your team’s operations and availability by ensuring you have the right monitoring, triage rotations, playbooks, policies, testing and alerting in place to support “keep the lights on” & on-call efforts
  • Foster a culture of quality and ownership on your team by setting code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks
  • Help develop talent on your team by providing feedback and guidance, and leading by example
What we offer
What we offer
  • Flexible Spending Wallets for tech, food and lifestyle
  • Away Days - wellness days to take off work and recharge
  • Learning & Development programs
  • Parental leave
  • Employee Resource & Community Groups
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Fulltime
Read More
Arrow Right
New

Staff Engineer, Site Reliability

LearnUpon is looking for a Staff Site Reliability Engineer to join our team in I...
Location
Location
Ireland , Dublin
Salary
Salary:
Not provided
learnupon.com Logo
LearnUpon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years of experience in a software or Ops role
  • 5+ years of cloud engineering experience, with at least 2 years experience with AWS
  • Experience deploying Microservice environments, using containerisation technologies such as Kubernetes and Docker
  • Experience in designing and implementing Observability tech stacks
  • Have championed the benefits of Observability to Engineering teams
  • Can architect the design of SLO/SLI implementation that balances the needs of different teams
  • Familiar with cost analysis of Observability metrics gathering, Engineering effort, and tooling
  • Experience building and supporting large-scale distributed systems that back a consumer app or website with associated requirements of performance, security and disaster recovery
  • Experience with implementing IaaC (e.g. CloudFormation, Terraform etc.), automation tooling (e.g. Puppet, Ansible etc.), CI/CD (e.g. Jenkins, Travis CI, GitLab etc.)
  • Able to effectively communicate technical ideas to and collaborate with both technical and non-technical peers
Job Responsibility
Job Responsibility
  • Identifying opportunities to improve and scale our infrastructure for performance, observability, maintainability, and cost, by creating innovative solutions
  • Leading our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management
  • Driving the processes to maintain resilient, scalable and cost-effective infrastructure
  • Working with other Engineering teams to provide infrastructure solutions that meet their ongoing requirements
  • Building tools focused on measuring, monitoring and alerting, with an eye towards self-service in order to promote Engineers’ ownership of observability
  • Reacting quickly to changing customer and business needs
  • Participate in on-call rota
  • Mentoring junior talent
What we offer
What we offer
  • Work in a fun and supportive environment with regular team events
  • Excellent career progression
  • Structured learning environment
  • Competitive salary and company ESOP
  • Private health insurance
  • 26 days annual leave
  • Fulltime
Read More
Arrow Right

Staff Site Reliability Engineer

We are looking for a Site Reliability Engineer to own our internal systems infra...
Location
Location
United States , Sunnyvale
Salary
Salary:
175000.00 - 250000.00 USD / Year
figure.ai Logo
Figure
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong experience with Linux/Unix systems administration
  • Proficiency in programming/scripting
  • Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
  • Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
  • Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
  • Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
  • Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
  • Experience defining Service Level Objectives (SLO), developing runbooks/incident response plans, facilitating post-mortems and managing systems assets
  • Ability to work in cross-functional teams with developers, infra, and product teams
  • Excellent verbal and written communication skills
Job Responsibility
Job Responsibility
  • Be the go to person for mission critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing and more
  • Migrate SaaS to self-hosted solutions to enhance security and reliability
  • Implement monitoring and alerting systems, and define incident response plans and runbooks
  • Reduce human workload through automation to automate deployment and scaling
  • Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives
  • Use a data driven approach to demonstrate service robustness and track optimization work
  • Partner with the security team to ensure that security remediations and updates are applied in a timely manner
  • Fulltime
Read More
Arrow Right

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...
Location
Location
France , Paris
Salary
Salary:
Not provided
https://www.ledger.com Logo
Ledger
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years on cloud engineering at scale, on organizations operating SaaS solutions
  • Proficiency in working in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, Argocd, Ansible, configuration management, etc.
  • Strong knowledge on observability practices, with experience implementing and managing Logging, Monitoring and Alerting framework with solutions such as Datadog or Prometheus/Grafana/Loki.
  • Experience of cross-functional work and the ability to demonstrate a collaborative approach with regards to building key relationships across the organization and define projects scope, goals, plan and deliverables
  • Customer focused with the ability to identify and understand both internal and external customer's needs
  • Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
  • Excellent presentation and written communication
  • Ability to deal with ambiguity, high level of pressure and rapidly changing environments
  • Engineering degree.
Job Responsibility
Job Responsibility
  • Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
  • Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs, game-changing technologies emergence and challenge scope / deadlines
  • Perform integration of platform software components
  • Participate to design and deliver solutions to improve the availability, scalability, latency, and efficiency of systems
  • Influence and create standards & best practices in support of service level objectives
  • Automate key SRE metrics including SLOs/SLAs and error budgets
  • Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
  • Apply analytics on past incidents and usage patterns to predict issues and take proactive actions
  • Ensure control of technical debt and promote quality practices
  • Follow SRE and chaos engineering approaches across all strategic systems to predict in coordination with Service Design and prevent outages and improve solution availability
What we offer
What we offer
  • Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
  • Flexibility: A hybrid work policy
  • Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
  • Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
  • Well-being: Personal development, coaching & fitness with our dedicated partners
  • Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
  • High tech: Access to high performance office equipment and gadgets, including Apple products
  • Transport: Ledger reimburses part of your preferred means of transportation
  • Discounts: Employee discount on all our products.
  • Fulltime
Read More
Arrow Right
New

Staff Platform Engineer

Join our dynamic team as a Compute Platform Engineer and play a pivotal role in ...
Location
Location
United States , Mountain View, California
Salary
Salary:
180000.00 - 280000.00 USD / Year
inworld.ai Logo
Inworld AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7 years of experience in software engineering
  • 5 years of experience with infrastructure-as-code
  • Proficiency in managing Kubernetes clusters and applications, including creating Kustomize manifests/Helm charts for new applications
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.)
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud)
  • Proficient in at least one backend programming/scripting languages such as Golang, Python, and Bash
  • Candidates must be based in the SF Bay Area or willing to relocate (you will be working on-site in our South Bay office a few days a week)
Job Responsibility
Job Responsibility
  • Work closely with backend and ML engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI engine and Studio
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment
  • Identify and implement opportunities to enhance engineering speed and efficiency
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence
  • Develop and share best practices to improve automation and efficiency across our engineering teams
What we offer
What we offer
  • equity and benefits
  • Fulltime
Read More
Arrow Right

Sr. Staff SRE Engineer

Archer is an aerospace company based in San Jose, California building an all-ele...
Location
Location
United States , San Jose
Salary
Salary:
163000.00 - 204000.00 USD / Year
archer.com Logo
Archer Aviation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years of experience in Site Reliability Engineering, DevOps, or a similar role with a strong focus on operational excellence
  • Deep expertise in Amazon EKS, including cluster provisioning, management, and troubleshooting
  • Extensive experience with observability tools and practices, including Prometheus, Grafana, ELK stack, or similar
  • Proven track record in designing and implementing robust data pipelines (e.g., Kafka, Airflow, Spark)
  • Strong background in CI/CD methodologies and tools (e.g., Jenkins, GitLab CI, ArgoCD)
  • Expert-level knowledge of cloud platforms (AWS preferred), including infrastructure-as-code principles
  • Comprehensive understanding of security best practices for cloud environments, applications, and data
  • Proficiency in Docker for containerization and orchestration
  • Advanced scripting and programming skills in Python, Bash, and PowerShell
  • Solid understanding of networking concepts, distributed systems, and operating systems
Job Responsibility
Job Responsibility
  • Lead the design, implementation, and maintenance of highly available, scalable, and secure cloud-native infrastructure on Amazon Elastic Kubernetes Service (EKS)
  • Develop and implement comprehensive observability strategies, including monitoring, logging, and alerting, to ensure the health and performance of our systems
  • Architect and optimize data pipelines to ensure efficient and reliable data flow across various platforms
  • Drive the continuous improvement of our CI/CD pipelines, promoting best practices for automated testing, deployment, and release management
  • Champion cloud-first strategies, leveraging the full capabilities of cloud platforms for infrastructure, services, and operations
  • Implement and enforce robust security practices across our infrastructure, applications, and data
  • Design and maintain Docker-based containerization solutions for our applications
  • Develop and maintain automation scripts and tools using Python, Bash, and PowerShell
  • Collaborate with development teams to ensure reliability is built into the software development lifecycle from inception
  • Troubleshoot complex production issues across various layers of the stack, identifying root causes and implementing preventative measures
What we offer
What we offer
  • Equal-opportunity employer committed to creating a diverse and inclusive workplace
  • Reasonable accommodations for job applicants with physical or mental disabilities, and those with sincerely held religious beliefs
  • Fulltime
Read More
Arrow Right
New

Staff Software Engineer

As a Staff Forward Deployed Engineer (FDE) at Invisible, you'll lead high-impact...
Location
Location
United States , Austin; New York; San Francisco Bay Area; Washington DC–Baltimore
Salary
Salary:
213000.00 - 300000.00 USD / Year
invisible.co Logo
Invisible Technologies
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years of software engineering experience, including significant time spent building data, ML, or backend systems
  • Deep proficiency in Python with hands-on experience using Hugging Face, LangChain, OpenAI, Pinecone, and related ecosystems
  • Skilled in full-stack and API-based deployment patterns, including Docker, FastAPI, Kubernetes, and cloud environments (GCP, AWS)
  • Experienced with workflow orchestration libraries, pub/sub systems (Kafka), and schema governance
  • Expertise in data governance and operations, including Unity Catalog and policy management, cluster/job orchestration, data contracts and quality enforcement, Delta/ETL pipelines, and replay processes
  • Strong product and system design instincts — you understand business needs and how to translate them into technical architecture
  • Experience building usable systems from messy data and ambiguous requirements
  • Excellent communication and client-facing skills
  • you’ve led conversations with technical and non-technical stakeholders alike
  • Proven experience owning projects from scoping through deployment in ambiguous, high-stakes environments
Job Responsibility
Job Responsibility
  • Partner with delivery and executive stakeholders to scope, design, and lead implementation of AI-driven solutions
  • Identify transformational opportunities in messy, ambiguous workflows and turn them into repeatable systems
  • Lead architecture design and trade-off discussions across performance, scalability, cost, and reliability
  • Own projects from first discovery call through full deployment — including client-facing delivery, internal coordination, and post-launch iteration
  • Build shared infrastructure, reusable components, and internal playbooks to level-up the team
  • Coach and mentor mid-level engineers and help shape the culture of forward-deployed AI engineering at Invisible
What we offer
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime
Read More
Arrow Right
New

Staff Software Engineer, Forward Deployed

As a Staff Forward Deployed Engineer (FDE) at Invisible, you'll lead high-impact...
Location
Location
United Kingdom , London
Salary
Salary:
Not provided
invisible.co Logo
Invisible Technologies
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of software engineering experience, including significant time spent building data, ML, or backend systems
  • Deep proficiency in Python and experience with ML/LLM frameworks such as Hugging Face, LangChain, OpenAI, Pinecone, etc.
  • Familiarity with full-stack or API-based deployment patterns (Docker, FastAPI, Kubernetes, GCP/AWS)
  • Strong product and system design instincts — you understand business needs and how to translate them into technical architecture
  • Experience building usable systems from messy data and ambiguous requirements
  • Excellent communication and client-facing skills
  • you’ve led conversations with technical and non-technical stakeholders alike
  • Proven experience owning projects from scoping through deployment in ambiguous, high-stakes environments
  • Be willing to be on-call for our customers when situations ari
  • Ability to travel roughly 25–50 % of the time, sometimes short-notice trips—primarily across Europe with occasional international roll-outs—to work directly on-site with clients
Job Responsibility
Job Responsibility
  • Partner with delivery and executive stakeholders to scope, design, and lead implementation of AI-driven solutions
  • Identify transformational opportunities in messy, ambiguous workflows and turn them into repeatable systems
  • Lead architecture design and trade-off discussions across performance, scalability, cost, and reliability
  • Own projects from first discovery call through full deployment — including client-facing delivery, internal coordination, and post-launch iteration
  • Build shared infrastructure, reusable components, and internal playbooks to level-up the team
  • Coach and mentor mid-level engineers and help shape the culture of forward-deployed AI engineering at Invisible
What we offer
What we offer
  • Bonuses and equity are included in offers above entry level
  • Fulltime
Read More
Arrow Right
Welcome to CrawlJobs.com
Your Global Job Discovery Platform
At CrawlJobs.com, we simplify finding your next career opportunity by bringing job listings directly to you from all corners of the web. Using cutting-edge AI and web-crawling technologies, we gather and curate job offers from various sources across the globe, ensuring you have access to the most up-to-date job listings in one place.