Staff Software Engineer - Site Reliability

Ironclad

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

210000.00 - 235000.00 USD / Year

Job Description:

Ironclad is the leading AI contracting platform that transforms agreements into assets. Contracts move faster, insights surface instantly, and agents push work forward, all with you in control. Whether you’re buying or selling, Ironclad unifies the entire process on one intelligent platform, providing leaders with the visibility they need to stay one step ahead. That’s why the world’s most transformative organizations, from OpenAI to the World Health Organization and the Associated Press, trust Ironclad to accelerate their business.

The Site Reliability Engineer sits under the umbrella of Product and Engineering and plays a pivotal role in ensuring developers have the tools, infrastructure, and monitoring to provide our customers with an enterprise-grade experience. As a staff-level SRE, you will be expected to help set the technical strategy for the team, drive cross-team impact, improve organizational efficiency, and champion SRE culture at Ironclad.

Job Responsibility:

  • Be part of the Cloud Platform SRE Team, focused on building our Cloud Platform using modern tools and best practices
  • Champion SRE best practices within the team and throughout the organization
  • Ensure the reliability, availability, and performance of services and infrastructure
  • Solve the whole problem. Design, implement, and maintain scalable systems
  • Automate repetitive operational tasks to streamline processes
  • Monitor system performance and troubleshoot issues proactively
  • Develop and document best practices for system operations
  • Collaborate with development teams to enhance system design
  • Manage incident responses and perform root cause analysis
  • Participate in on-call rotations to handle critical issues as they arise
  • Be a mentor: multiply our team’s output with leadership and guidance

Requirements:

  • Minimum of 5 years of experience in a Site Reliability Engineering / DevOps role
  • Expert knowledge of Docker and Kubernetes; Crossplane experience is a plus
  • Strong knowledge of cloud platforms such as AWS and Google Cloud
  • Proficiency in scripting and programming languages like Python, TypeScript, or Bash
  • Experience with infrastructure-as-code tools like Terraform or Pulumi
  • Strong troubleshooting and analytical skills, drive to help customers, and the ability to dive deep and learn a new product
  • Experience with CI/CD pipelines and deployment automation tools such as CircleCI and ArgoCD
  • Strong understanding of networking and security principles

Nice to have:

  • Service mesh experience is a plus

What we offer:
  • 100% health coverage for employees (medical, dental, and vision), and 75% coverage for dependents with buy-up plan options available
  • Market-leading leave policies, including gender-neutral parental leave and compassionate leave
  • Family forming support through Maven for you and your partner
  • Paid time off - take the time you need, when you need it
  • Monthly stipends for wellbeing, hybrid work, and (if applicable) cell phone use
  • Mental health support through Modern Health, including therapy, coaching, and digital tools
  • Pre-tax commuter benefits (US Employees)
  • 401(k) plan with Fidelity with employer match (US Employees)
  • Regular team events to connect, recharge, and have fun
  • And most importantly: the opportunity to help build the company you want to work at
  • Offers Equity

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Staff Software Engineer - Site Reliability

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...
Location:
France, Paris
Salary:
Not provided
Ledger
Expiration Date:
Until further notice
Requirements:
  • 8+ years of cloud engineering at scale, in organizations operating SaaS solutions
  • Proficiency in working in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, ArgoCD, Ansible, configuration management, etc.
  • Strong knowledge of observability practices, with experience implementing and managing logging, monitoring and alerting frameworks with solutions such as Datadog or Prometheus/Grafana/Loki
  • Experience of cross-functional work and a demonstrably collaborative approach to building key relationships across the organization and defining project scope, goals, plans and deliverables
  • Customer focused, with the ability to identify and understand the needs of both internal and external customers
  • Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
  • Excellent presentation and written communication
  • Ability to deal with ambiguity, high level of pressure and rapidly changing environments
  • Engineering degree
Job Responsibility:
  • Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
  • Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines
  • Perform integration of platform software components
  • Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems
  • Influence and create standards & best practices in support of service level objectives
  • Automate key SRE metrics including SLOs/SLAs and error budgets
  • Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
  • Apply analytics on past incidents and usage patterns to predict issues and take proactive actions
  • Ensure control of technical debt and promote quality practices
  • Follow SRE and chaos engineering approaches across all strategic systems to predict and prevent outages, in coordination with Service Design, and improve solution availability
What we offer:
  • Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
  • Flexibility: A hybrid work policy
  • Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
  • Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
  • Well-being: Personal development, coaching & fitness with our dedicated partners
  • Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
  • High tech: Access to high performance office equipment and gadgets, including Apple products
  • Transport: Ledger reimburses part of your preferred means of transportation
  • Discounts: Employee discount on all our products
Employment Type:
Full-time

Staff Site Reliability Engineer

We are looking for a Site Reliability Engineer to own our internal systems infra...
Location:
United States, Sunnyvale
Salary:
175000.00 - 250000.00 USD / Year
Figure
Expiration Date:
Until further notice
Requirements:
  • Strong experience with Linux/Unix systems administration
  • Proficiency in programming/scripting
  • Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
  • Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
  • Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
  • Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
  • Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
  • Experience defining Service Level Objectives (SLO), developing runbooks/incident response plans, facilitating post-mortems and managing systems assets
  • Ability to work in cross-functional teams with developers, infra, and product teams
  • Excellent verbal and written communication skills
Job Responsibility:
  • Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing and more
  • Migrate SaaS to self-hosted solutions to enhance security and reliability
  • Implement monitoring and alerting systems, and define incident response plans and runbooks
  • Reduce human workload by automating deployment and scaling
  • Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives
  • Use a data-driven approach to demonstrate service robustness and track optimization work
  • Partner with the security team to ensure that security remediations and updates are applied in a timely manner
Employment Type:
Full-time

Staff Software Engineer

As a Staff Forward Deployed Engineer (FDE) at Invisible, you'll lead high-impact...
Location:
United States, Austin; New York; San Francisco Bay Area; Washington DC–Baltimore
Salary:
213000.00 - 300000.00 USD / Year
Invisible Technologies
Expiration Date:
Until further notice
Requirements:
  • 8+ years of software engineering experience, including significant time spent building data, ML, or backend systems
  • Deep proficiency in Python with hands-on experience using Hugging Face, LangChain, OpenAI, Pinecone, and related ecosystems
  • Skilled in full-stack and API-based deployment patterns, including Docker, FastAPI, Kubernetes, and cloud environments (GCP, AWS)
  • Experienced with workflow orchestration libraries, pub/sub systems (Kafka), and schema governance
  • Expertise in data governance and operations, including Unity Catalog and policy management, cluster/job orchestration, data contracts and quality enforcement, Delta/ETL pipelines, and replay processes
  • Strong product and system design instincts — you understand business needs and how to translate them into technical architecture
  • Experience building usable systems from messy data and ambiguous requirements
  • Excellent communication and client-facing skills: you’ve led conversations with technical and non-technical stakeholders alike
  • Proven experience owning projects from scoping through deployment in ambiguous, high-stakes environments
Job Responsibility:
  • Partner with delivery and executive stakeholders to scope, design, and lead implementation of AI-driven solutions
  • Identify transformational opportunities in messy, ambiguous workflows and turn them into repeatable systems
  • Lead architecture design and trade-off discussions across performance, scalability, cost, and reliability
  • Own projects from first discovery call through full deployment — including client-facing delivery, internal coordination, and post-launch iteration
  • Build shared infrastructure, reusable components, and internal playbooks to level up the team
  • Coach and mentor mid-level engineers and help shape the culture of forward-deployed AI engineering at Invisible
What we offer:
  • bonus
  • equity
  • benefits
Employment Type:
Full-time

Staff Engineer, Site Reliability

LearnUpon is looking for a Staff Site Reliability Engineer to join our team in I...
Location:
Ireland, Dublin
Salary:
Not provided
LearnUpon
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in a software or Ops role
  • 5+ years of cloud engineering experience, with at least 2 years of experience with AWS
  • Experience deploying microservice environments, using containerisation technologies such as Kubernetes and Docker
  • Experience in designing and implementing Observability tech stacks
  • Have championed the benefits of Observability to Engineering teams
  • Can architect the design of SLO/SLI implementation that balances the needs of different teams
  • Familiar with cost analysis of Observability metrics gathering, Engineering effort, and tooling
  • Experience building and supporting large-scale distributed systems that back a consumer app or website with associated requirements of performance, security and disaster recovery
  • Experience with implementing IaC (e.g. CloudFormation, Terraform), automation tooling (e.g. Puppet, Ansible), and CI/CD (e.g. Jenkins, Travis CI, GitLab)
  • Able to effectively communicate technical ideas to and collaborate with both technical and non-technical peers
Job Responsibility:
  • Identifying opportunities to improve and scale our infrastructure for performance, observability, maintainability, and cost, by creating innovative solutions
  • Leading our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management
  • Driving the processes to maintain resilient, scalable and cost-effective infrastructure
  • Working with other Engineering teams to provide infrastructure solutions that meet their ongoing requirements
  • Building tools focused on measuring, monitoring and alerting, with an eye towards self-service in order to promote Engineers’ ownership of observability
  • Reacting quickly to changing customer and business needs
  • Participating in the on-call rota
  • Mentoring junior talent
What we offer:
  • Work in a fun and supportive environment with regular team events
  • Excellent career progression
  • Structured learning environment
  • Competitive salary and company ESOP
  • Private health insurance
  • 26 days annual leave
Employment Type:
Full-time

Staff Software Engineer, Forward Deployed

As a Staff Forward Deployed Engineer (FDE) at Invisible, you'll lead high-impact...
Location:
United Kingdom, London
Salary:
Not provided
Invisible Technologies
Expiration Date:
Until further notice
Requirements:
  • 5+ years of software engineering experience, including significant time spent building data, ML, or backend systems
  • Deep proficiency in Python and experience with ML/LLM frameworks such as Hugging Face, LangChain, OpenAI, Pinecone, etc.
  • Familiarity with full-stack or API-based deployment patterns (Docker, FastAPI, Kubernetes, GCP/AWS)
  • Strong product and system design instincts — you understand business needs and how to translate them into technical architecture
  • Experience building usable systems from messy data and ambiguous requirements
  • Excellent communication and client-facing skills: you’ve led conversations with technical and non-technical stakeholders alike
  • Proven experience owning projects from scoping through deployment in ambiguous, high-stakes environments
  • Be willing to be on-call for our customers when situations arise
  • Ability to travel roughly 25–50% of the time, sometimes at short notice (primarily across Europe, with occasional international roll-outs), to work directly on-site with clients
Job Responsibility:
  • Partner with delivery and executive stakeholders to scope, design, and lead implementation of AI-driven solutions
  • Identify transformational opportunities in messy, ambiguous workflows and turn them into repeatable systems
  • Lead architecture design and trade-off discussions across performance, scalability, cost, and reliability
  • Own projects from first discovery call through full deployment — including client-facing delivery, internal coordination, and post-launch iteration
  • Build shared infrastructure, reusable components, and internal playbooks to level up the team
  • Coach and mentor mid-level engineers and help shape the culture of forward-deployed AI engineering at Invisible
What we offer:
  • Bonuses and equity are included in offers above entry level
Employment Type:
Full-time

Staff Platform Engineer

Join our dynamic team as a Compute Platform Engineer and play a pivotal role in ...
Location:
United States, Mountain View, California
Salary:
180000.00 - 280000.00 USD / Year
Inworld AI
Expiration Date:
Until further notice
Requirements:
  • 7 years of experience in software engineering
  • 5 years of experience with infrastructure-as-code
  • Proficiency in managing Kubernetes clusters and applications, including creating Kustomize manifests/Helm charts for new applications
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.)
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud)
  • Proficient in at least one backend programming/scripting language such as Golang, Python, or Bash
  • Candidates must be based in the SF Bay Area or willing to relocate (you will be working on-site in our South Bay office a few days a week)
Job Responsibility:
  • Work closely with backend and ML engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI engine and Studio
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment
  • Identify and implement opportunities to enhance engineering speed and efficiency
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence
  • Develop and share best practices to improve automation and efficiency across our engineering teams
What we offer:
  • equity and benefits
Employment Type:
Full-time

Staff Software Engineer, Developer Experience (DevEx)

At Harvey, we’re transforming how legal and professional services operate — not ...
Location:
United States, San Francisco
Salary:
238000.00 - 290000.00 USD / Year
Harvey
Expiration Date:
Until further notice
Requirements:
  • 7+ years of software engineering experience, including building scalable backend systems or internal developer platforms
  • Proficiency in Python (or similar languages) and deep knowledge of backend development fundamentals and distributed systems
  • Hands-on experience with CI/CD systems (Buildkite, GitHub Actions), test frameworks, or load and performance testing
  • Hands-on experience with container technologies (Docker, Kubernetes) and infrastructure as code (Pulumi, Terraform)
  • A track record of producing high-quality, well-tested code, consistently following software development best practices to ensure quality and reliability
  • Proven technical leadership throughout the entire project lifecycle, including ideation, design, implementation, and productionization
  • Experience mentoring engineers, guiding architectural decisions, and shaping culture to foster engineering excellence
  • Strong problem-solving skills and a passion for improving developer experience — you enjoy creating tools or frameworks that make other engineers more productive
  • Excellent collaboration and communication skills, with the ability to work across teams and incorporate feedback
Job Responsibility:
  • Develop and scale a world-class developer platform to accelerate Harvey's hyper growth. Boost velocity and stability through robust CI/CD systems, effective test frameworks, and reliable development environments.
  • Build load testing and benchmarking infrastructure essential for evaluating and optimizing the performance of AI-native applications.
  • Pioneer the future of software development and site reliability engineering by integrating AI agents across the software development, deployment and maintenance lifecycle.
  • Collaborate with Backend Platform teams to embed testability, reliability and observability into the platform, ensuring services built on our foundation are robust, easy to test and maintain.
  • Work closely with engineering teams to gather feedback, evangelize best practices, and make the “paved road” a reality — empowering every Harvey engineer to move fast with confidence.
  • Set the strategic direction and roadmap for scaling developer experience as Harvey expands, and contribute strategically to team decision-making.
  • Provide strong technical leadership and mentorship, upholding a high bar for engineering excellence across the team.
What we offer:
  • Comprehensive health, dental and vision coverage
  • Retirement benefits (401k match up to 4%)
  • Flexible PTO
  • Offers Equity
  • Offers Bonus
Employment Type:
Full-time

Staff Data Ops Engineer - Platform

We are looking for a Staff Data Ops Engineer - Platform to join the Data & AI Pl...
Location:
France, Paris
Salary:
Not provided
Doctolib
Expiration Date:
Until further notice
Requirements:
  • 7+ years of post-graduation experience as a Staff Data Platform Engineer, Staff Data Ops Engineer, Staff Site Reliability Engineer, or in a similar role, with a history of architecting and scaling robust data platforms
  • Extensive experience with Google Cloud Platform and a command of Kubernetes & Terraform for automated deployments
  • Authority on implementing network and IAM security best practices
  • Deep technical proficiency in orchestrating data pipelines using Airflow or Dagster, deploying applications to the cloud, and leveraging modern data warehouses such as BigQuery
  • Highly skilled in Python programming, with a solid understanding of software development principles
  • Excellent troubleshooter who excels at diagnosing and fixing data infrastructure and identifying performance bottlenecks
  • Strong communicator who can articulate complex technical concepts to both technical and non-technical audiences
Job Responsibility:
  • Design and implement enterprise-scale data infrastructure strategies, conducting thorough impact and cost analysis for major technical decisions, and establishing architectural standards across the organization
  • Build and optimize complex, multi-region data pipelines handling petabyte-scale datasets, ensuring 99.9% reliability and implementing advanced monitoring and alerting systems
  • Lead cost analysis initiatives, identify optimization opportunities across our data stack, and implement solutions that reduce infrastructure spend while improving performance and reliability
  • Provide technical guidance to data engineers and cross-functional teams, conduct architecture reviews, and drive adoption of best practices in DataOps, security, and governance
  • Evaluate emerging technologies, conduct proof-of-concepts for new data tools and platforms, and lead the technical roadmap for data infrastructure modernization
What we offer:
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Works council subsidy to refund part of a sports club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
Employment Type:
Full-time