Product Reliability Engineer - Defense

United States, New York · 82000.00 - 140000.00 USD / Year · Job Posted February 20, 2026

Job Description

Product Reliability Engineers (PREs) are responsible for the health, performance, and stability of the services that power Palantir’s products. PREs take ownership of the entire end-to-end cycle of service reliability, from responding to outages to improving codebases and building lasting solutions. You will tackle critical issues for key customers, introduce observability into complex systems, address tech debt in essential codebases, and inform strategic investments in core products. We are looking for engineers who enjoy deep-dive troubleshooting, feel strong ownership of the problems they encounter, and recognize the urgency of customer-facing outages.

PREs spend the majority of their time on forward-looking product work, including, but not limited to, infrastructure migrations, product contributions that improve stability and observability, and codebase enhancements that increase resilience. During periodic on-call shifts, we respond to automated alerts, investigate issues reported by customers, and share technical expertise with adjacent product teams. Whatever the technical issue or question about your service, you'll play a central role in resolving it, seeking not just a one-time fix but a permanent solution.

We provide new team members with an experienced mentor and a clear onboarding framework to set them up for success in the role.

Job Responsibility

  • Continuously invest in documentation, metrics, monitors and other troubleshooting tools
  • Participate in on-call rotations during business hours and occasional weekends. This is a challenging yet rewarding opportunity to help remediate the most pressing issues across the Palantir fleet.
  • Diagnose, resolve, and prevent issues encountered in the field. Deliver end-to-end improvements to core products based on those issues.
  • Improve observability by refactoring codepaths and introducing telemetry
  • Identify and implement data-driven opportunities for improved service resilience
  • Develop strategic opinions on stability investments and inform the vision for long-term product stability

Requirements

  • Engineering background in Computer Science, Mathematics, Software Engineering, Physics or similar field
  • Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
  • Experience producing code in backend languages such as Java, as part of a past role or personal projects
  • Familiarity with storage and data processing systems and cloud infrastructure
  • Strong written and verbal communication and ability to iterate quickly with teammates and incorporate feedback
  • Eligibility and willingness to obtain a US Security clearance

Nice to have

  • Comfort with and curiosity about large-scale production systems and technologies, such as load balancing, monitoring, distributed systems, and configuration management
  • Confidence in troubleshooting complex issues independently using observability tools and stack traces
  • Familiarity with monitoring tools such as Prometheus and health checks
  • Experience coding with Java, Go, and/or web technologies (e.g. HTML, CSS, JavaScript, Python/Ruby, Django/Flask/Ruby on Rails)
  • Track record of identifying bugs in codebases and contributing fixes that lead to long-term service stability
  • Demonstrated ability to make data-driven decisions and engage with stakeholders on strategy

What we offer

  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
  • Stipend to help with expenses that come with a new child
  • Employees can enroll in Palantir’s 401k plan

Similar Jobs for Product Reliability Engineer - Defense

8 matching positions

Product Reliability Engineer - Defense

Product Reliability Engineers (PREs) are responsible for the health, performance...
Location: United States, Washington, D.C.
Salary: 82000.00 - 140000.00 USD / Year
Company: Palantir Technologies
Expiration Date: Until further notice
Requirements
  • Engineering background in Computer Science, Mathematics, Software Engineering, Physics or similar field
  • Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
  • Experience producing code in backend languages such as Java, as part of a past role or personal projects
  • Familiarity with storage and data processing systems and cloud infrastructure
  • Strong written and verbal communication and ability to iterate quickly with teammates and incorporate feedback
  • Eligibility and willingness to obtain a US Security clearance
Job Responsibility
  • Continuously invest in documentation, metrics, monitors and other troubleshooting tools
  • Participate in on-call rotations during business hours and occasional weekends. This is a challenging yet rewarding opportunity to help remediate the most pressing issues across the Palantir fleet
  • Diagnose, resolve, and prevent issues encountered in the field. Deliver end-to-end improvements to core products based on those issues
  • Improve observability by refactoring codepaths and introducing telemetry
  • Identify and implement data-driven opportunities for improved service resilience
  • Develop strategic opinions on stability investments and inform the vision for long-term product stability
What we offer
  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
  • Stipend to help with expenses that come with a new child
Employment type: Full-time

Principal Site Reliability Engineer

Arcadia’s customers rely on us to securely process and deliver high-value health...
Salary: Not provided
Company: The Muse
Expiration Date: Until further notice
Requirements
  • 8+ years of experience in SRE, platform engineering, systems engineering, or related roles operating production services at scale
  • Demonstrated principal-level impact: leading cross-team initiatives, influencing architecture decisions, and driving sustained improvements in reliability and operations
  • Expertise in Kubernetes operations and troubleshooting, including safe rollout/rollback patterns, workload debugging, and operational guardrails
  • Strong GitOps experience with Argo CD; experience building delivery workflows and automation using Argo Workflows
  • Strong infrastructure orchestration and provisioning experience with Crossplane and Terraform; ability to define reusable platform patterns and controls
  • Deep AWS experience (IAM, networking/VPC, compute, storage, managed services, observability) and strong understanding of reliability and failure modes in cloud systems
  • Proficiency in Python for building automation, tooling, and reliability improvements
  • Strong incident management and on-call leadership experience, including measurable improvements (availability, MTTR, alert quality, cost, or operational maturity)
Job Responsibility
  • Act as the technical leader for reliability for one or more domains; set direction and standards while remaining hands-on where it matters most
  • Drive reliability strategy across critical services: define SLOs/SLIs, error budgets, and reliability KPIs aligned to customer journeys and outcomes
  • Own incident response maturity: lead complex incidents, improve incident command practices, and ensure high-quality RCAs with prioritized, tracked remediation
  • Architect and implement automation to reduce toil and risk: runbook automation, self-service tools, and safe operational workflows (Python + Argo Workflows)
  • Advance GitOps delivery practices using Argo CD: promotion strategies, progressive delivery/canaries, and guardrails that reduce deploy risk
  • Scale infrastructure management with Crossplane and Terraform: reusable patterns, policy controls, and paved roads for teams
  • Lead operational readiness and reliability reviews for new features/architectural changes; reinforce non-functional requirements (availability, latency, security, cost)
  • Improve performance and cost efficiency through capacity planning, load testing, right-sizing, and architecture recommendations across AWS services
What we offer
  • Pet Insurance
  • Health Insurance
  • Dental Insurance
  • Vision Insurance
  • FSA
  • HSA
  • HSA With Employer Contribution
  • Life Insurance
  • Short-Term Disability
  • Long-Term Disability

Reliability Engineer – Performance & Life-Cycle Assurance

Mach Industries is seeking a Reliability Engineer who will own the end-to-end re...
Location: United States, Huntington Beach
Salary: 150000.00 - 200000.00 USD / Year
Company: Mach Industries
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Mechanical Engineering, Electrical/Electronic Engineering, Aerospace Engineering, Systems Engineering or related discipline
  • 5+ years of reliability engineering (or similar) experience in complex hardware-centric systems, preferably in aerospace/defense/unmanned systems or high-reliability industrial/automotive environments
  • Demonstrated experience applying reliability methods such as FMEA, FMECA, and RCFA
  • Strong data-analysis skills: ability to ingest large data sets (field returns, operational logs), perform statistical/trend analysis, build dashboards, derive actionable insights
  • Experience with reliability testing: accelerated life tests, environmental stress screening, vibration/thermal/thermal-cycle/shock/humidity, life-cycle modelling
  • Knowledge of safety‐critical system standards and regulatory requirements (e.g., MIL-STD, DO-178, DO-254)
Job Responsibility
  • Develop, deploy, and maintain a reliability program plan for our UAS platforms and key subsystems (hardware, firmware, software) following best practices (e.g., failure mode and effects analysis (FMEA))
  • Define reliability and maintainability requirements and metrics (e.g., MTBF, MTBR, availability, mission readiness, failure rate targets) early in the design lifecycle, and track performance through production and field operation
  • Using data (lab testing, manufacturing, field returns, in-service logs) perform analytics to identify trends, root causes of failures (RCFA), latent defects, and reliability risks—then drive corrective and preventive actions
  • Define and oversee reliability test plans, accelerated life testing, environmental stress screening, field-data analysis, degradation modelling and life-cycle modelling in collaboration with test & validation teams
  • Monitor key reliability indicators (e.g., failure-rate trending, early‐life failures, wear-out characteristics, maintenance cost per unit time/mission, parts-life forecasting) and provide actionable insights to leadership
  • Communicate reliability status, risk posture, and improvement plans to senior leadership and stakeholders, including interfacing with defense-customer reliability/quality requirements and audits if applicable
What we offer
  • Offers Equity
  • healthcare
  • dental and vision plans
  • retirement savings
  • paid time off
  • continuing education
  • training
  • career growth
Employment type: Full-time

Software Engineer, Internship - Defense Tech

Software Engineers at Palantir build software at scale to transform how organiza...
Location: United States, Palo Alto
Salary: 10500.00 USD / Month
Company: Palantir Technologies
Expiration Date: Until further notice
Requirements
  • Engineering background in fields such as Computer Science, Mathematics, Software Engineering, and Physics
  • Familiarity with data structures, storage systems, cloud infrastructure, front-end frameworks, and other technical tools
  • Active US Security clearance, or eligibility and willingness to obtain a US Security clearance prior to start of internship
  • Experience coding in programming languages, such as Java, C++, Python, JavaScript, or similar languages
  • Must be planning on graduating in 2027. This should be your final internship before graduating
Job Responsibility
  • Ownership: We see projects through from beginning to end in spite of obstacles we may encounter
  • Collaboration: We work internally with people from a variety of backgrounds — such as other Software Engineers, Product Managers, Designers and Product Reliability Engineers. We also partner with our business development teams (Forward Deployed Engineers, Deployment Strategists) in order to understand and solve our customers' problems
  • Trust: We trust each other to effectively handle time and priorities, and don't micromanage. We want people to have the space to think for themselves, while feeling supported by their team
What we offer
  • Promoting health and well-being across all areas of Palantirians’ lives is just one of the ways we’re investing in our community
Employment type: Full-time

Software Engineer - Data Infra Reliability

Luma's mission is to build multimodal AI to expand human imagination and capabil...
Location: United States, Palo Alto
Salary: 220000.00 - 280000.00 USD / Year
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Deep SRE/DevOps proficiency: You live and breathe Linux, networking, and automation
  • Infrastructure-as-Code Native: You have extensive experience with Terraform, Ansible, or similar tools to manage complex cloud environments (AWS/GCP)
  • Kubernetes Expert: You have managed Kubernetes in production and understand its internals, not just how to deploy containers
  • Python Proficiency: You can write high-quality Python code for automation, tooling, and infrastructure management
  • Data-Minded: You understand the specific challenges of stateful data systems and high-throughput storage (S3/Object Store)
Job Responsibility
  • Automate Everything: Apply Infrastructure-as-Code (IaC) principles using Terraform to provision, manage, and scale our data infrastructure
  • Harden Data Pipelines: Build reliability and fault tolerance into our core data ingestion and processing workflows, ensuring high availability for research jobs
  • Scale Kubernetes & Ray: Operate and optimize large-scale Kubernetes clusters and Ray deployments to handle bursty, high-throughput workloads
  • Define Reliability: Establish Service Level Objectives (SLOs) and observability standards (Prometheus/Grafana) for our data platforms
  • Debug & Heal: Serve as the first line of defense for complex infrastructure failures, diagnosing root causes in distributed storage and compute systems
Employment type: Full-time

General Manager

We are looking for a General Manager to lead an aviation operation in North Caro...
Location: United States, Winston-Salem
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Textile Engineering, Industrial Engineering, Business, or a related field
  • At least 5 years of leadership experience in textiles, manufacturing, aerospace, or a closely related industry
  • Demonstrated understanding of aviation textile standards, regulatory compliance, and quality assurance practices
  • Experience overseeing operational performance, daily production activity, and cross-functional manufacturing teams
  • Proven ability to manage financial results, including budgets, forecasts, and profit-and-loss accountability
  • Knowledge of supply chain coordination, inventory management, and production planning within a manufacturing environment
  • Strong communication, client management, and relationship-building skills
Job Responsibility
  • Direct day-to-day operations for aviation textile manufacturing, ensuring production targets, quality expectations, and delivery commitments are achieved
  • Guide planning across production, materials, inventory, and supply chain activities to maintain efficient workflow and reliable product availability
  • Ensure products and processes align with applicable aviation regulations, customer specifications, and required certification standards
  • Lead quality programs, internal reviews, external audits, and regulatory inspections to uphold compliance across the division
  • Drive process improvement efforts that increase productivity, reduce waste, and support cost-effective manufacturing performance
  • Develop business growth plans focused on aviation customers, including OEM, MRO, commercial, defense, and private aviation markets
  • Maintain strong working relationships with clients, vendors, and industry partners to support long-term business success and customer satisfaction
  • Manage the division’s financial performance through budgeting, forecasting, KPI tracking, and disciplined cost control
  • Provide leadership to cross-functional teams spanning production, engineering, quality, and sales while promoting safety, accountability, and career development
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Enrollment in company 401(k) plan
Employment type: Full-time

Systems Engineer (Experienced or Lead)

At Boeing, we innovate and collaborate to make the world a better place. We’re c...
Location: United States, Hazelwood; Saint Charles; Berkeley
Salary: 112200.00 - 185150.00 USD / Year
Company: Boeing
Expiration Date: June 15, 2026
Requirements
  • Bachelor of Science degree in Engineering, Engineering Technology (including Manufacturing Technology), Computer Science, Data Science, Mathematics, Physics, Chemistry or non-US equivalent qualifications directly related to the work statement
  • 4 or more years' related engineering experience
  • Prior Systems Engineering experience (i.e. system design, functional decomposition, requirements development, analysis, verification, and validation)
Job Responsibility
  • Lead the systems engineering efforts on new development, production, and/or sustainment programs, ensuring alignment with program goals and objectives
  • Translate customer and operational needs into system performance requirements
  • Guide cross-functional teams to define and maintain system requirements, interfaces, behaviors, and verification criteria for complex systems
  • Perform analyses in affordability, safety, reliability, maintainability, testability, human factors, survivability, vulnerability, security, certification, and product assurance to achieve mission success
  • Run design reviews and technical assessments, giving recommendations to improve system performance and reliability
  • Maintain and improve requirements management, risk/issues/opportunity tracking, tools, and technology readiness assessment processes
  • Lead the program in implementing and/or adopting the latest SE methodologies (e.g. Model Based Systems Engineering) to meet customer expectations
  • Mentor, coach and advise engineers across the program in SE tools, techniques, planning and strategy
What we offer
  • competitive base pay and variable compensation opportunities
  • health insurance
  • flexible spending accounts
  • health savings accounts
  • retirement savings plans
  • life and disability insurance programs
  • paid and unpaid time away from work
  • Best in class 401(k) plan: we'll match your contributions dollar for dollar, up to 10% of eligible pay with Immediate 100% vesting
  • Student Loan Match: The Boeing 401(k) Student Loan Match allows eligible enrolled U.S. employees to have their qualified student loan debt payments counted, along with any match-eligible contributions they make, for purposes of determining the Company Match to employees' Boeing 401(k) accounts
  • Generous company match to your 401(k)
Employment type: Full-time

Electrical Engineer

The Electrical Engineer is responsible for the electrical design of simulated tr...
Location: United States, Tampa
Salary: Not provided
Company: Aero Simulation, Inc. (ASI)
Expiration Date: Until further notice
Requirements
  • Working knowledge of hardware and electrical system design and test processes
  • Working knowledge of A/C and D/C power distribution, grounding, I/O distribution, networking, KVM, Emergency Power Off, Overheat, and Audio electrical designs
  • Demonstrated experience producing manufacturable electrical designs
  • Previous experience working with government customers with preferred experience presenting and/or supporting requirements reviews, design reviews, and acceptance testing
  • Proficiency in common business software (Microsoft Office – Word, Outlook, PowerPoint, Excel, SharePoint, Visio)
  • Bachelor’s degree in Electrical Engineering or related field
  • Equivalent experience to education and three years’ experience related to aircraft or flight simulation (Mid-Level)
  • U.S. Citizenship Required
  • Must be able to successfully pass an initial background screening
  • Must be able to obtain and maintain an active Department of Defense (DoD) security clearance
Job Responsibility
  • Work closely with Systems and Mechanical Engineering
  • Gather a comprehensive view for the design and development of new training systems in accordance with Government/Industry standards and Customer specifications
  • Use the collected information and electrical engineering background to produce electrical designs and details that are manufacturable, ergonomic, reliable, and maintainable. This includes the design of wire lists, cable drawings, system drawings, and top-level assemblies/installations
  • Work with and mentor electrical engineers of multiple levels to develop comprehensive and cohesive electrical designs. Typical systems are A/C and D/C power distribution, grounding, I/O distribution, networking, KVM, Emergency Power Off, Overheat, and Audio
  • Ensures specification of hardware by working closely with vendors and suppliers for successful technology solutions
  • Communicate effectively and work closely with the Computer Aided Design team to generate wire lists, cable drawings, system drawings, and top-level assemblies/installations
  • Support customer meetings including requirement reviews, design reviews, and acceptance testing. Creation of electrical material and presentation of the material is required
  • Provides support to manufacturing in the form of resolving design and documentation issues during production phases, and documenting changes by creating engineering change documents
  • Support customer events such as configuration audits and maintenance training
What we offer
  • flexible work environment
  • generous paid time off
  • professional development opportunities
  • industry competitive compensation
  • medical
  • dental
  • 401k
  • Employee Stock Ownership Plan (ESOP)
Employment type: Full-time