Site Reliability Operations Analyst Job at Palantir Technologies (New York)

Site Reliability Operations Analyst

As a Site Reliability Operations Analyst you are the engine behind Palantir depl...

Location

United States, Washington, D.C.

Salary:

93,000 - 160,000 USD / Year

Palantir Technologies

Expiration Date

Until further notice

Requirements

Active US Security clearance or eligibility and willingness to obtain a US Security clearance
Ability to travel 25-75%, varies by location and team
3+ years of project/program management experience, preferably in a fast-paced or dynamic environment

Job Responsibility

Work on many different types of problems and challenges
Be the first responders when things go wrong
Craft and implement process to reduce friction and enable all team members to spend their time on what they do best
Think creatively, work collaboratively, and go above and beyond to get the job done

What we offer

Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
Commuter benefits
Relocation assistance
Take-what-you-need paid time off (not accrual-based)
2 weeks paid time off built into the end of each year (subject to team and business needs)
10 paid holidays throughout the calendar year
Supportive leave of absence program including time off for military service and medical events
Paid leave for new parents and subsidized back-up care for all parents
Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation

Full-time

Site Reliability Operations Analyst - Commercial

As a Site Reliability Operations Analyst you are the engine behind Palantir depl...

Location

United States, New York

Salary:

93,000 - 160,000 USD / Year

Palantir Technologies

Expiration Date

Until further notice

Requirements

Ability to travel 25-75%, varies by location and team
3+ years of project/program management experience, preferably in a fast-paced or dynamic environment

Job Responsibility

Work on many different types of problems and challenges
Be the first responders when things go wrong
Craft and implement process to reduce friction and enable all team members to spend their time on what they do best
Think creatively, work collaboratively, and do whatever it takes to get the job done

What we offer

Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
Commuter benefits
Relocation assistance
Take-what-you-need paid time off (not accrual-based)
2 weeks paid time off built into the end of each year (subject to team and business needs)
10 paid holidays throughout the calendar year
Supportive leave of absence program including time off for military service and medical events
Paid leave for new parents and subsidized back-up care for all parents
Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation

Full-time

Site Reliability Operations Analyst - Commercial

As a Site Reliability Operations Analyst you are the engine behind Palantir depl...

Location

South Korea, Seoul

Salary:

Not provided

Palantir Technologies

Expiration Date

Until further notice

Requirements

Ability to travel 25-75%, varies by location and team
3+ years of project/program management experience, preferably in a fast-paced or dynamic environment
Ability to read, write, and speak fluent business Korean and English is a requirement

Job Responsibility

Work on many different types of problems and challenges
Be the first responders when things go wrong
Craft and implement process to reduce friction and enable all team members to spend their time on what they do best
Think creatively, work collaboratively, and do whatever it takes to get the job done

What we offer

Promoting health and well-being across all areas of Palantirians’ lives

Full-time

Market Risk Analyst - Site Reliability Engineer

Join us at Barclays as a Market Risk Analyst - Site Reliability Engineer (SRE). ...

Location

United Kingdom, Glasgow

Salary:

Not provided

Barclays

Expiration Date

Until further notice

Requirements

Hands-on/technical experience with high proficiency in SQL, Database Technologies, Unix, Windows, primarily within Investment Banking domain
Experience with ITIL concepts and best practices
Experience of using configuration management tools and reporting (preferred Service Management Tool - Service First / SNOW)
Experience in batch monitoring tools (preferably, Autosys)

Job Responsibility

Effectively monitor and maintain the bank’s critical technology infrastructure and resolve more complex technical issues, whilst minimising disruption to operations
Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients
Develop the support model and service offering to improve the service to customers and stakeholders
Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance
Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing
Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes
Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management
Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues
Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency
Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur

What we offer

Competitive holiday allowance
Life assurance
Private medical care
Pension contribution

Full-time

Site Reliability Engineering Analyst - Assistant Vice President

The Engineer Sr Analyst is an intermediate level position responsible for a vari...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

5-8 years of relevant experience in an Engineering role
Experience working in Financial Services or a large complex and/or global environment
Project Management experience
Consistently demonstrates clear and concise written and verbal communication
Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
Demonstrated analytic/diagnostic skills
Ability to work in a matrix environment and partner with virtual teams
Ability to work independently, prioritize, and take ownership of various parts of a project or initiative
Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
Proven track record of operational process change and improvement

Job Responsibility

Contribute to the budgetary requirement definition for assigned product area, develop functional specifications, and create project plans and software release schedules
Partner with business and development teams to identify engineering requirements and assist in defining application and system requirements and processes and maintain engineering relationships with the end user/client
Ensure requirements/tasks from technology departments and/or end users are communicated to stakeholders
Provide solutions and processes in accordance with audit initiatives and requirements and consult with Business Information Security Officers (BISOs) and TISOs
Exhibit in-depth understanding of engineering concepts and principles
Assist with training activities and mentor junior team members
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Automate Core Processes: Design, develop, and implement automation solutions to replace manual activities and repetitive processes, and to support migrations to new infrastructure
Continuous Improvement: Proactively identify opportunities for process improvements and efficiency gains across the service lifecycle
Support AI Integration: Collaborate with development and data science teams to support the seamless integration of services with AI solutions

Full-time

Site Reliability Engineer

As Site Reliability Engineer you will contribute to the overarching implementati...

Location

Romania, Bucuresti

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Engineering, or related field
Minimum 5 years proven work experience as a Reliability Engineer or similar role
Expert knowledge and hands-on experience with applications hosted on cloud platforms such as Google Cloud Platform as well as with Docker / Kubernetes in combination with Google Kubernetes Engine (GKE), Terraform or similar technology
Experience in resilient software development in Python/Java and the usage of modern CI/CD pipelines, e.g. GitHub, GitHub Actions, Bitbucket, Helm
Strong experience in the setup of observability, monitoring and self-healing solutions for instance with New Relic, Splunk, Google Cloud Operations, Lightstep and Ansible
Very good knowledge of security standards (e.g. TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt), microservice architectures and experience with API Management with Apigee or WSO2
Proactive attitude and collaborative team-player mindset paired with self-confidence
Keeping your cool and your eye for detail even in stressful situations where time matters
Having a creative approach towards solving technical problems
Excellent communication skills in English

Job Responsibility

Define Service Level Objectives (SLOs), and enable an end-to-end view on customer satisfaction based on best practices for setting up Service Level Indicators (SLIs) to create effective strategies for maintaining and improving system performance and availability
Collaborate with Business Functional Analysts and Solution Architects to find improvements in the solution design to improve the resilience of technical solutions early on
Consult and guide the squad on the prioritization of reliability improvement and actively deliver them as part of the sprint
Hands-on experience in implementing reliability and resilience patterns such as auto-scaling, circuit breakers, bulkheads, rate limiters, and retry mechanisms
Actively work on service request fulfilment and incident and problem management to identify and reduce toil and MTTR with engineering best practices
Align with the SRE chapter function on, and contribute to, state-of-the-art SRE practices such as distributed tracing, OpenTelemetry, and chaos engineering
Be a knowledge and skill multiplier for your profession by leading the Site Reliability Engineer population
Increase the seniority of the overall Site Reliability Engineer chapter by establishing events and procedures, and foster a culture of high standards
Lead the engineers in your profession and help them improve each day

What we offer

Smooth integration and a supportive mentor
Pick your working style: choose from Remote, Hybrid or Office work opportunities
Our projects have different working hours to suit your needs
Sponsored certifications, trainings and top e-learning platforms
Private Health Insurance – custom-made for you
Individual coaching sessions or accredited Coaching School
Epic parties or themed events – lovingly designed for our people and their families

Full-time

Site Reliability Engineering (SRE) / Lead Engineer

We are currently seeking a Site Reliability Engineering (SRE) / Lead Engineer to...

Location

Mexico, Guadalajara

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation
Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
Strong proficiency in Infrastructure as Code (IaC) using Terraform
Solid understanding of cloud platforms including AWS, GCP, or Azure
Experience with automation/configuration management tools like Ansible, Chef, or Puppet
Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
Experience managing Kubernetes and containerized environments (Docker, Helm)
Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
Excellent leadership, communication, and collaboration skills

Job Responsibility

Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence

Full-time

Site Reliability Engineering (SRE) / Observability Technical Lead

Join a dynamic team as a Site Reliability Engineer, leading observability and re...

Location

United Kingdom, London

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

5+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
Hands-on experience with OpenTelemetry (OTel) for distributed tracing and observability instrumentation
Strong proficiency in Infrastructure as Code (IaC) using Terraform
Solid understanding of cloud platforms including AWS, GCP, or Azure
Experience with automation/configuration management tools like Ansible, Chef, or Puppet
Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
Experience managing Kubernetes and containerized environments (Docker, Helm)
Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
Excellent leadership, communication, and collaboration skills

Job Responsibility

Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence

What we offer

Tailored benefits that support your physical, emotional, and financial wellbeing
Continuous growth and development opportunities
Flexible work options

Full-time
