Site Reliability Operations Analyst

United States, New York · 93000.00 - 160000.00 USD / Year · Job Posted February 18, 2026

Job Description

As a Site Reliability Operations Analyst, you are the engine behind Palantir deployments. You are responsible for crafting, implementing, and executing processes to streamline workflows and reduce friction. You track and stabilize projects, remove roadblocks, and anticipate customer needs so that our engineers can focus their time and attention on the technical problems they are best equipped to solve.

Job Responsibility

  • Work on many different types of problems and challenges
  • Be the first responder when things go wrong
  • Craft and implement processes to reduce friction and enable all team members to spend their time on what they do best
  • Think creatively, work collaboratively, and go above and beyond to get the job done

Requirements

  • Active US Security clearance or eligibility and willingness to obtain a US Security clearance
  • Ability to travel 25-75%, varies by location and team
  • 3+ years of project/program management experience, preferably in a fast-paced or dynamic environment

What we offer

  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Relocation assistance
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
  • Stipend to help with expenses that come with a new child
  • Employees can enroll in Palantir’s 401k plan

Similar Jobs for Site Reliability Operations Analyst

8 matching positions

Site Reliability Operations Analyst

As a Site Reliability Operations Analyst you are the engine behind Palantir depl...
Location: United States, Washington, D.C.
Salary: 93000.00 - 160000.00 USD / Year
Palantir Technologies
Expiration Date: Until further notice
Requirements
  • Active US Security clearance or eligibility and willingness to obtain a US Security clearance
  • Ability to travel 25-75%, varies by location and team
  • 3+ years of project/program management experience, preferably in a fast-paced or dynamic environment
Job Responsibility
  • Work on many different types of problems and challenges
  • Be the first responder when things go wrong
  • Craft and implement processes to reduce friction and enable all team members to spend their time on what they do best
  • Think creatively, work collaboratively, and go above and beyond to get the job done
What we offer
  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Relocation assistance
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
Employment type: Full-time

Site Reliability Operations Analyst - Commercial

As a Site Reliability Operations Analyst you are the engine behind Palantir depl...
Location: United States, New York
Salary: 93000.00 - 160000.00 USD / Year
Palantir Technologies
Expiration Date: Until further notice
Requirements
  • Ability to travel 25-75%, varies by location and team
  • 3+ years of project/program management experience, preferably in a fast-paced or dynamic environment
Job Responsibility
  • Work on many different types of problems and challenges
  • Be the first responder when things go wrong
  • Craft and implement processes to reduce friction and enable all team members to spend their time on what they do best
  • Think creatively, work collaboratively, and do whatever it takes to get the job done
What we offer
  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Relocation assistance
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
Employment type: Full-time

Site Reliability Operations Analyst - Commercial

As a Site Reliability Operations Analyst you are the engine behind Palantir depl...
Location: South Korea, Seoul
Salary: Not provided
Palantir Technologies
Expiration Date: Until further notice
Requirements
  • Ability to travel 25-75%, varies by location and team
  • 3+ years of project/program management experience, preferably in a fast-paced or dynamic environment
  • Ability to read, write, and speak fluent business Korean and English is a requirement
Job Responsibility
  • Work on many different types of problems and challenges
  • Be the first responder when things go wrong
  • Craft and implement processes to reduce friction and enable all team members to spend their time on what they do best
  • Think creatively, work collaboratively, and do whatever it takes to get the job done
What we offer
  • Promoting health and well-being across all areas of Palantirians’ lives
Employment type: Full-time

Market Risk Analyst - Site Reliability Engineer

Join us at Barclays as a Market Risk Analyst - Site Reliability Engineer (SRE). ...
Location: United Kingdom, Glasgow
Salary: Not provided
Barclays
Expiration Date: Until further notice
Requirements
  • Hands-on/technical experience with high proficiency in SQL, database technologies, Unix, and Windows, primarily within the Investment Banking domain
  • Experience with ITIL concepts and best practices
  • Experience using configuration management tools and reporting (preferred service management tool: Service First / SNOW)
  • Experience with batch monitoring tools (preferably Autosys)
Job Responsibility
  • Effectively monitor and maintain the bank’s critical technology infrastructure and resolve more complex technical issues, whilst minimising disruption to operations
  • Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients
  • Develop the support model and service offering to improve the service to customers and stakeholders
  • Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance
  • Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing
  • Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes
  • Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management
  • Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues
  • Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency
  • Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur
What we offer
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
Employment type: Full-time

Site Reliability Engineering Analyst - Assistant Vice President

The Engineer Sr Analyst is an intermediate level position responsible for a vari...
Location: India, Pune
Salary: Not provided
Citi
Expiration Date: Until further notice
Requirements
  • 5-8 years of relevant experience in an Engineering role
  • Experience working in Financial Services or a large complex and/or global environment
  • Project Management experience
  • Consistently demonstrates clear and concise written and verbal communication
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, prioritize, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement
Job Responsibility
  • Contribute to the budgetary requirement definition for assigned product area, develop functional specifications, and create project plans and software release schedules
  • Partner with business and development teams to identify engineering requirements and assist in defining application and system requirements and processes and maintain engineering relationships with the end user/client
  • Ensure requirements/tasks from technology departments and/or end users are communicated to stakeholders
  • Provide solutions and processes in accordance with audit initiatives and requirements and consult with Business Information Security officers (BISOs) and TISOs
  • Exhibit in-depth understanding of engineering concepts and principles
  • Assist with training activities and mentor junior team members
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
  • Automate Core Processes: Design, develop, and implement automation solutions to replace manual activities and repetitive processes and to support migrations to new infrastructure
  • Continuous Improvement: Proactively identify opportunities for process improvements and efficiency gains across the service lifecycle
  • Support AI Integration: Collaborate with development and data science teams to support the seamless integration of services with AI solutions
Employment type: Full-time

Site Reliability Engineer

As Site Reliability Engineer you will contribute to the overarching implementati...
Location: Romania, Bucuresti
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field
  • Minimum 5 years proven work experience as a Reliability Engineer or similar role
  • Expert knowledge and hands-on experience with applications hosted on cloud platforms such as Google Cloud Platform as well as with Docker / Kubernetes in combination with Google Kubernetes Engine (GKE), Terraform or similar technology
  • Experience in resilient software development in Python/Java and the usage of modern CI/CD pipelines, e.g. GitHub, GitHub Actions, Bitbucket, Helm
  • Strong experience in the setup of observability, monitoring and self-healing solutions, for instance with New Relic, Splunk, Google Cloud Operations, Lightstep and Ansible
  • Very good knowledge of security standards (e.g. TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt), microservice architectures and experience with API Management with Apigee or WSO2
  • Proactive attitude and collaborative team-player mindset paired with self-confidence
  • Keeping your cool and your eye for detail even in stressful situations where time matters
  • Having a creative approach towards solving technical problems
  • Excellent communication skills in English
Job Responsibility
  • Define Service Level Objectives (SLOs), and enable an end-to-end view on customer satisfaction based on best practices for setting up Service Level Indicators (SLIs) to create effective strategies for maintaining and improving system performance and availability
  • Collaborate with Business Functional Analysts and Solution Architects to find improvements in the solution design to improve the resilience of technical solutions early on
  • Consult and guide the squad on the prioritization of reliability improvement and actively deliver them as part of the sprint
  • Hands-on experience in implementing reliability and resilience patterns like auto-scaling, circuit breakers, bulkheads, rate limiters, retry mechanisms, etc.
  • Actively work on service request fulfilment and incident and problem management to identify and reduce toil and MTTR with engineering best practices
  • Align on and contribute to state-of-the-art SRE best practices, e.g. Distributed Tracing, OpenTelemetry and Chaos Engineering, with the SRE chapter function
  • Be a knowledge and skill multiplier for your profession as a lead of the Site Reliability Engineer population
  • Increase the seniority of the overall Site Reliability Engineer chapter by establishing events and procedures, and foster a culture of high standards
  • Lead the people in your engineering profession and help them improve each day
What we offer
  • Smooth integration and a supportive mentor
  • Pick your working style: choose from Remote, Hybrid or Office work opportunities
  • Our projects have different working hours to suit your needs
  • Sponsored certifications, trainings and top e-learning platforms
  • Private Health Insurance – custom-made for you
  • Individual coaching sessions or accredited Coaching School
  • Epic parties or themed events – lovingly designed for our people and their families
Employment type: Full-time

Site Reliability Engineering (SRE) / Lead Engineer

We are currently seeking a Site Reliability Engineering (SRE) / Lead Engineer to...
Location: Mexico, Guadalajara
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • 8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
  • Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation
  • Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
  • Strong proficiency in Infrastructure as Code (IaC) using Terraform
  • Solid understanding of cloud platforms including AWS, GCP, or Azure
  • Experience with automation/configuration management tools like Ansible, Chef, or Puppet
  • Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
  • Experience managing Kubernetes and containerized environments (Docker, Helm)
  • Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
  • Excellent leadership, communication, and collaboration skills
Job Responsibility
  • Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
  • Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
  • Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
  • Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
  • Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
  • Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
  • Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
  • Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
  • Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence
Employment type: Full-time

Site Reliability Engineering (SRE) / Observability Technical Lead

Join a dynamic team as a Site Reliability Engineer, leading observability and re...
Location: United Kingdom, London
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
  • Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
  • Hands-on experience with OpenTelemetry (OTel) for distributed tracing and observability instrumentation
  • Strong proficiency in Infrastructure as Code (IaC) using Terraform
  • Solid understanding of cloud platforms including AWS, GCP, or Azure
  • Experience with automation/configuration management tools like Ansible, Chef, or Puppet
  • Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
  • Experience managing Kubernetes and containerized environments (Docker, Helm)
  • Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
  • Excellent leadership, communication, and collaboration skills
Job Responsibility
  • Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
  • Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
  • Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
  • Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
  • Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
  • Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
  • Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
  • Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
  • Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence
What we offer
  • Tailored benefits that support your physical, emotional, and financial wellbeing
  • Continuous growth and development opportunities
  • Flexible work options
Employment type: Full-time