Product Reliability Engineer - Defense

Palantir Technologies

Location:
United States, Washington, D.C.

Contract Type:
Not provided

Salary:
82000.00 - 140000.00 USD / Year

Job Description:

Product Reliability Engineers (PREs) are responsible for the health, performance, and stability of the services that power Palantir. PREs take ownership of the entire end-to-end cycle of service reliability, from responding to outages to improving codebases and building lasting solutions. You will tackle critical issues for key customers, introduce observability into complex systems, address tech debt in essential codebases, and inform strategic investments in core products. We are looking for engineers who enjoy deep-dive troubleshooting, feel strong ownership over the problems they encounter, and recognize the urgency of customer-facing outages.

PREs spend the majority of their time on forward-looking product work, including but not limited to infrastructure migrations, product contributions to improve stability and observability, and codebase enhancements that increase resilience. During periodic on-call shifts, we respond to automated alerts, investigate issues reported by customers, and share technical expertise with adjacent product teams. Whatever the technical issue or question about your service, you'll play a central and critical role in resolving it, seeking not just a one-time fix but a permanent solution. We provide new team members with an experienced mentor and a clear onboarding framework to set them up for success in the role.

Job Responsibility:

  • Continuously invest in documentation, metrics, monitors and other troubleshooting tools
  • Participate in on-call rotations during business hours and occasional weekends. This is a challenging yet rewarding opportunity to help remediate the most pressing issues across the Palantir fleet
  • Diagnose, resolve, and prevent issues encountered in the field, and deliver end-to-end improvements to core products based on those issues
  • Improve observability by refactoring codepaths and introducing telemetry
  • Identify and implement data-driven opportunities for improved service resilience
  • Develop strategic opinions on stability investments and inform the vision for long-term product stability

Requirements:

  • Engineering background in Computer Science, Mathematics, Software Engineering, Physics or similar field
  • Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
  • Experience producing code in backend languages such as Java, as part of a past role or personal projects
  • Familiarity with storage and data processing systems and cloud infrastructure
  • Strong written and verbal communication and ability to iterate quickly with teammates and incorporate feedback
  • Eligibility and willingness to obtain a US Security clearance

Nice to have:

  • Comfort with and curiosity about large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, and configuration management
  • Confidence in troubleshooting complex issues independently using observability tools and stack traces
  • Familiarity with monitoring tools such as Prometheus and health checks
  • Experience coding with Java, Go and/or web technologies (e.g. HTML, CSS, JavaScript, Python/Ruby, Django/Flask/Ruby on Rails, etc.) is a plus
  • Track record of identifying bugs in codebases and contributing fixes leading to long-term service stability
  • Demonstrated ability to make data-driven decisions and engage with stakeholders on strategy

What we offer:

  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Take-what-you-need paid time off (not accrual-based)
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
  • Stipend to help with expenses that come with a new child
  • Employees can enroll in Palantir’s 401k plan

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time

Work Type:
Hybrid work

Similar Jobs for Product Reliability Engineer - Defense

Principal LRU Test Equipment Development Engineer

Contribute extensive aerospace LRU test experience towards the conceptualization...
Location:
United States, South Windsor
Salary:
Not provided
Bloomy
Expiration Date:
Until further notice
Requirements:
  • BS in engineering or science; electrical, systems or aerospace engineering preferred; MS a plus
  • 15 years of experience with the specification, design, maintenance and/or use of automated testing equipment in the aerospace and defense industries, including a minimum of 10 years of LRU test development experience
  • Outstanding verbal and written communication and presentation skills, including the expression of requirements, systems and solutions
  • Strong team as well as customer orientation
Job Responsibility:
  • Work together with BLOOMY's engineering teams to refresh and extend a growing portfolio of commercial test equipment to support evolving new Advanced Air Mobility (AAM) standards and requirements, spanning engineering, integration, certification, production as well as flightline and MRO depot testing
  • Contribute to key customer meetings, presentations and bid and proposal strategies
  • Participate in standards boards and industry events
  • Contribute to the development of marketing collateral, application notes, case studies, video clips, webinars, blogs, demos, and exhibits
  • Liaise with industry partners
  • Support the company's mission to provide automated test solutions for mission-critical and emerging applications which increase product safety, performance and reliability while reducing cost

Electrical Engineer

The Electrical Engineer is responsible for the electrical design of simulated tr...
Location:
United States, Tampa
Salary:
Not provided
Aero Simulation, Inc. (ASI)
Expiration Date:
Until further notice
Requirements:
  • Working knowledge of hardware and electrical system design and test processes
  • Working knowledge of A/C and D/C power distribution, grounding, I/O distribution, networking, KVM, Emergency Power Off, Overheat, and Audio electrical designs
  • Demonstrated experience producing manufacturable electrical designs
  • Previous experience working with government customers with preferred experience presenting and/or supporting requirements reviews, design reviews, and acceptance testing
  • Proficiency in common business software (Microsoft Office – Word, Outlook, PowerPoint, Excel, SharePoint, Visio)
  • Knowledge and proficiency with AutoCAD software desired
  • Ability to develop and maintain positive working relationships with internal and external customers
  • Ability to adapt communication style and messaging to different audiences
  • Ability to manage multiple priorities and projects simultaneously, ensuring stakeholder expectations are managed appropriately
  • Ability to work in a project-oriented, fast paced environment to meet deadlines
Job Responsibility:
  • Work closely with Systems and Mechanical Engineering to produce electrical designs and details that are manufacturable, ergonomic, reliable, and maintainable
  • Work with and mentor electrical engineers of multiple levels to develop comprehensive and cohesive electrical designs
  • Ensure specification of hardware by working closely with vendors and suppliers for successful technology solutions
  • Communicate effectively and work closely with the Computer Aided Design team to generate wire lists, cable drawings, system drawings, and top-level assemblies/installations
  • Support customer meetings including requirement reviews, design reviews, and acceptance testing
  • Provide support to manufacturing by resolving design and documentation issues during production phases, and document changes by creating engineering change documents
  • Support customer events such as configuration audits and maintenance training
What we offer:
  • Employee Stock Ownership Plan (ESOP)
  • Flexible work environment
  • Generous paid time off
  • Professional development opportunities
  • Industry competitive compensation
  • Medical benefits
  • Dental benefits
  • 401k
Employment Type:
Full-time

Product Reliability Engineer - Defense

Product Reliability Engineers (PREs) are responsible for the health, performance...
Location:
United States, New York
Salary:
82000.00 - 140000.00 USD / Year
Palantir Technologies
Expiration Date:
Until further notice
Requirements:
  • Engineering background in Computer Science, Mathematics, Software Engineering, Physics or similar field
  • Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
  • Experience producing code in backend languages such as Java, as part of a past role or personal projects
  • Familiarity with storage and data processing systems and cloud infrastructure
  • Strong written and verbal communication and ability to iterate quickly with teammates and incorporate feedback
  • Eligibility and willingness to obtain a US Security clearance
Job Responsibility:
  • Continuously invest in documentation, metrics, monitors and other troubleshooting tools
  • Participate in on-call rotations during business hours and occasional weekends. This is a challenging yet rewarding opportunity to help remediate the most pressing issues across the Palantir fleet
  • Diagnose, resolve, and prevent issues encountered in the field, and deliver end-to-end improvements to core products based on those issues
  • Improve observability by refactoring codepaths and introducing telemetry
  • Identify and implement data-driven opportunities for improved service resilience
  • Develop strategic opinions on stability investments and inform the vision for long-term product stability
What we offer:
  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Take-what-you-need paid time off (not accrual-based)
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
  • Stipend to help with expenses that come with a new child
Employment Type:
Full-time

Software Engineer, Internship - Defense Tech

Software Engineers at Palantir build software at scale to transform how organiza...
Location:
United States, Palo Alto
Salary:
10500.00 USD / Month
Palantir Technologies
Expiration Date:
Until further notice
Requirements:
  • Engineering background in fields such as Computer Science, Mathematics, Software Engineering, and Physics
  • Familiarity with data structures, storage systems, cloud infrastructure, front-end frameworks, and other technical tools
  • Active US Security clearance, or eligibility and willingness to obtain a US Security clearance prior to start of internship
  • Experience coding in programming languages, such as Java, C++, Python, JavaScript, or similar languages
  • Must be planning on graduating in 2027. This should be your final internship before graduating
Job Responsibility:
  • Ownership: We see projects through from beginning to end in spite of obstacles we may encounter
  • Collaboration: We work internally with people from a variety of backgrounds — such as other Software Engineers, Product Managers, Designers and Product Reliability Engineers. We also partner with our business development teams (Forward Deployed Engineers, Deployment Strategists) in order to understand and solve our customers' problems
  • Trust: We trust each other to effectively handle time and priorities, and don't micromanage. We want people to have the space to think for themselves, while feeling supported by their team
What we offer:
  • Promoting health and well-being across all areas of Palantirians’ lives is just one of the ways we’re investing in our community
Employment Type:
Full-time

Reliability Engineer – Performance & Life-Cycle Assurance

Mach Industries is seeking a Reliability Engineer who will own the end-to-end re...
Location:
United States, Huntington Beach
Salary:
150000.00 - 200000.00 USD / Year
Mach Industries
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Mechanical Engineering, Electrical/Electronic Engineering, Aerospace Engineering, Systems Engineering or related discipline
  • 5+ years of reliability engineering (or similar) experience in complex hardware-centric systems, preferably in aerospace/defense/unmanned systems or high-reliability industrial/automotive environments
  • Demonstrated experience applying reliability methods such as FMEA, FMECA, and RCFA
  • Strong data-analysis skills: ability to ingest large data sets (field returns, operational logs), perform statistical/trend analysis, build dashboards, derive actionable insights
  • Experience with reliability testing: accelerated life tests, environmental stress screening, vibration/thermal/thermal-cycle/shock/humidity, life-cycle modelling
  • Knowledge of safety‐critical system standards and regulatory requirements (e.g., MIL-STD, DO-178, DO-254)
Job Responsibility:
  • Develop, deploy and maintain a reliability program plan for our UAS platforms and key subsystems (hardware, firmware, software) following best-practices (e.g., failure-mode and effects analysis (FMEA))
  • Define reliability and maintainability requirements and metrics (e.g., MTBF, MTBR, availability, mission readiness, failure rate targets) early in the design lifecycle, and track performance through production and field operation
  • Using data (lab testing, manufacturing, field returns, in-service logs) perform analytics to identify trends, root causes of failures (RCFA), latent defects, and reliability risks—then drive corrective and preventive actions
  • Define and oversee reliability test plans, accelerated life testing, environmental stress screening, field-data analysis, degradation modelling and life-cycle modelling in collaboration with test & validation teams
  • Monitor key reliability indicators (e.g., failure-rate trending, early‐life failures, wear-out characteristics, maintenance cost per unit time/mission, parts-life forecasting) and provide actionable insights to leadership
  • Communicate reliability status, risk posture, and improvement plans to senior leadership and stakeholders, including interfacing with defense-customer reliability/quality requirements and audits if applicable
What we offer:
  • Equity
  • Healthcare
  • Dental and vision plans
  • Retirement savings
  • Paid time off
  • Continuing education
  • Training
  • Career growth
Employment Type:
Full-time

Principal Site Reliability Engineer

Arcadia’s customers rely on us to securely process and deliver high-value health...
Location:
Not provided
Salary:
Not provided
The Muse
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in SRE, platform engineering, systems engineering, or related roles operating production services at scale
  • Demonstrated principal-level impact: leading cross-team initiatives, influencing architecture decisions, and driving sustained improvements in reliability and operations
  • Expertise in Kubernetes operations and troubleshooting, including safe rollout/rollback patterns, workload debugging, and operational guardrails
  • Strong GitOps experience with Argo CD; experience building delivery workflows and automation using Argo Workflows
  • Strong infrastructure orchestration and provisioning experience with Crossplane and Terraform; ability to define reusable platform patterns and controls
  • Deep AWS experience (IAM, networking/VPC, compute, storage, managed services, observability) and strong understanding of reliability and failure modes in cloud systems
  • Proficiency in Python for building automation, tooling, and reliability improvements
  • Strong incident management and on-call leadership experience, including measurable improvements (availability, MTTR, alert quality, cost, or operational maturity)
Job Responsibility:
  • Act as the technical leader for reliability for one or more domains; set direction and standards while remaining hands-on where it matters most
  • Drive reliability strategy across critical services: define SLOs/SLIs, error budgets, and reliability KPIs aligned to customer journeys and outcomes
  • Own incident response maturity: lead complex incidents, improve incident command practices, and ensure high-quality RCAs with prioritized, tracked remediation
  • Architect and implement automation to reduce toil and risk: runbook automation, self-service tools, and safe operational workflows (Python + Argo Workflows)
  • Advance GitOps delivery practices using Argo CD: promotion strategies, progressive delivery/canaries, and guardrails that reduce deploy risk
  • Scale infrastructure management with Crossplane and Terraform: reusable patterns, policy controls, and paved roads for teams
  • Lead operational readiness and reliability reviews for new features/architectural changes; reinforce non-functional requirements (availability, latency, security, cost)
  • Improve performance and cost efficiency through capacity planning, load testing, right-sizing, and architecture recommendations across AWS services
What we offer:
  • Pet Insurance
  • Health Insurance
  • Dental Insurance
  • Vision Insurance
  • FSA
  • HSA
  • HSA With Employer Contribution
  • Life Insurance
  • Short-Term Disability
  • Long-Term Disability

Senior Site Reliability Engineering Manager

Microsoft Substrate is the foundational cloud platform that powers many of Micro...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Ability to obtain and maintain appropriate background investigations and customer screenings for access to GCC Moderate, GCC High, and Department of Defense environments
  • For access to GCCH and DoD environments, ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
  • For access to GCCM environments, ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
  • For manager-level roles, a Tier 5 (T5) background investigation is preferred
  • Pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Lead and develop a team of Site Reliability Engineer ICs, providing clear expectations, regular coaching, and career guidance across senior and principal levels
  • Own the operational health and reliability posture of Substrate services running in regulated environments
  • Drive change and influence across the org as you establish and drive SLOs, SLIs, and operational metrics
  • Lead effective incident management and post-incident reviews
  • Serve as an actively engaged on-call engineer (OCE) and participate in an on-call rotation
  • Own reliability, resilience, and disaster recovery, including driving and coordinating DR and game day exercises
  • Drive engineering-led operational excellence at scale
  • Partner with engineering and product teams to embed reliability, security, and compliance considerations early in service design
  • Influence technical and operational strategy beyond your immediate team
  • Represent your team’s work clearly to leadership and partners
Employment Type:
Full-time

Principal Site Reliability Engineering Manager

Microsoft Substrate is the foundational cloud platform that powers many of Micro...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Candidates must be able to meet Microsoft, customer and/or government security screening requirements required for this role
  • This role requires access to Microsoft Government cloud environments, including GCC Moderate (GCCM), GCC High (GCCH), and Department of Defense (DoD) environments
  • For access to GCCH and DoD environments, this role requires the ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
  • For access to GCCM environments, this role requires the ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
  • For manager-level roles, a Tier 5 (T5) background investigation is preferred
  • Candidates may be considered without currently holding these background investigations, provided they are eligible for and able to successfully obtain them
Job Responsibility:
  • Lead and develop a team of Site Reliability Engineer ICs, providing clear expectations, regular coaching, and career guidance across senior and principal levels
  • Own the operational health and reliability posture of Substrate services running in regulated environments
  • Drive change and influence across the org as you establish and drive SLOs, SLIs, and operational metrics
  • Lead effective incident management and post-incident reviews
  • Serve as an actively engaged on-call engineer (OCE) and participate in an on-call rotation
  • Own reliability, resilience, and disaster recovery, including driving and coordinating DR and game day exercises
  • Drive engineering-led operational excellence at scale
  • Partner with engineering and product teams to embed reliability, security, and compliance considerations early in service design
  • Influence technical and operational strategy beyond your immediate team
  • Represent your team’s work clearly to leadership and partners
Employment Type:
Full-time