Cleared Senior/Principal Computing Infrastructure Engineer

Sandia National Laboratories

Location:
United States, Albuquerque

Contract Type:
Not provided

Salary:

87400.00 - 199700.00 USD / Year

Job Description:

We are seeking a Computing Infrastructure Engineer to perform the full lifecycle management (analysis, design, development, testing, integration, and maintenance) of multi-user information systems, including servers, storage, virtual and cloud infrastructures, and associated components and subsystems. The engineer maintains the integrity of servers and systems to meet established requirements for service levels, disaster recovery, business continuity, and security, and coordinates and interfaces with customers, suppliers, and domain experts such as networking, software, and desktop personnel.

Job Responsibility:

  • Performs the full lifecycle management (analysis, design, development, testing, integration, and maintenance) of multi-user information systems, including servers, storage, virtual and cloud infrastructures, and associated components and subsystems
  • Maintains the integrity of servers and systems to meet established requirements for service levels, disaster recovery, business continuity, and security
  • Coordinates and interfaces with customers, suppliers, and domain experts such as networking, software, and desktop personnel
  • Applies systematic, disciplined, and quantifiable engineering and life cycle management techniques and processes to deliver integrated information systems hardware for mission-enabled or direct mission deployment and production
  • Performs technical research and development to enable continuing innovation within the infrastructure
  • Evaluates, recommends, and may decide hardware and software technologies for hardware system solutions
  • Introduces new computer systems and technologies into current configurations for optimum information systems engineering functionality
  • Collaborates with customers, suppliers, and other domain experts such as networking, software and desktop personnel in the development and integration of hardware systems solutions
  • Performs hardware systems analysis, monitoring, troubleshooting, repair and recovery, disaster recovery, software installations, systems OS and software applications upgrades and security patches, performance tuning, hardware upgrades and resource optimization, maintenance of user accounts, and training users

Requirements:

  • A Bachelor's degree in a relevant discipline and five (5) years of directly relevant experience, or an equivalent combination of directly relevant education and engineering or scientific experience that demonstrates the knowledge, skills, and ability to perform independent research and development
  • Experience in various operating systems (e.g., Linux, Windows), networking protocols, virtualization technologies
  • Experience in scripting languages (e.g., Python, Bash) and configuration management tools (e.g., Ansible, Puppet)
  • Experience in various network hardware components (e.g., Dell/HP servers and desktops, Thin Clients, Zero Clients, Firewalls)
  • Active DOE Q clearance

Nice to have:

  • Experience in designing, implementing, and managing computing infrastructure, including servers, storage systems, and network devices
  • Experience with data center operations and best practices
  • Understanding of cybersecurity principles and practices, including secure network design, access controls, and vulnerability management
  • Certifications: Industry certifications such as Certified Information Systems Security Professional (CISSP), Certified Cloud Security Professional (CCSP), or vendor-specific certifications (e.g., AWS Certified Solutions Architect) are highly desirable
  • Project Management: Familiarity with project management methodologies and tools to effectively plan, execute, and deliver infrastructure projects on time and within budget
  • Automation and DevOps: Experience with automation tools (e.g., Jenkins, GitLab CI/CD) and knowledge of DevOps principles to streamline infrastructure deployment and management processes
  • Cloud Expertise: Proficiency in deploying and managing infrastructure in public, private, or hybrid cloud environments. Knowledge of containerization technologies (e.g., Docker, Kubernetes) is a plus
  • Continuous Learning: Demonstrated commitment to staying updated with emerging technologies, industry trends, and best practices through self-learning, training, or participation in professional communities
  • Knowledge of NIST 800-53 requirements
  • Strong analytical and problem-solving skills to identify and resolve infrastructure issues efficiently. Experience with monitoring tools and performance optimization techniques is important
  • Excellent verbal and written communication skills to interact with team members, stakeholders, and vendors. Ability to work collaboratively in a multidisciplinary environment

What we offer:

  • Challenging work with amazing impact that contributes to security, peace, and freedom worldwide
  • Extraordinary co-workers
  • Some of the best tools, equipment, and research facilities in the world
  • Career advancement and enrichment opportunities
  • Flexible work arrangements for many positions include 9/80 (work 80 hours every two weeks, with every other Friday off) and 4/10 (work 4 ten-hour days each week) compressed workweeks, part-time work, and telecommuting (a mix of onsite work and working from home)
  • Generous vacation, strong medical and other benefits, competitive 401k, learning opportunities, relocation assistance and amenities aimed at creating a solid work/life balance

Additional Information:

Job Posted:
March 14, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Cleared Senior/Principal Computing Infrastructure Engineer

Senior ML Data Engineer

As a Senior Data Engineer, you will play a pivotal role in our AI/ML workstream,...
Location:
Poland, Warsaw
Salary:
Not provided
Awin Global
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in data science, data engineering, or computer science with a focus on math and statistics; a Master's degree is preferred
  • At least 5 years of experience as an AI/ML data engineer performing the tasks and accountabilities of this role
  • Strong foundation in computer science principles and statistical methods
  • Strong experience with cloud technology (AWS or Azure)
  • Strong experience creating data ingestion pipelines and ETL processes
  • Strong knowledge of big data tools such as Spark, Databricks, and Python
  • Strong understanding of common machine learning techniques and frameworks (e.g. mlflow)
  • Strong knowledge of natural language processing (NLP) concepts
  • Strong knowledge of Scrum practices and an agile mindset
  • Strong analytical and problem-solving skills with attention to data quality and accuracy
Job Responsibility:
  • Design and maintain scalable data pipelines and storage systems for both agentic and traditional ML workloads
  • Productionise LLM- and agent-based workflows, ensuring reliability, observability, and performance
  • Build and maintain feature stores, vector/embedding stores, and core data assets for ML
  • Develop and manage end-to-end traditional ML pipelines: data prep, training, validation, deployment, and monitoring
  • Implement data quality checks, drift detection, and automated retraining processes
  • Optimise cost, latency, and performance across all AI/ML infrastructure
  • Collaborate with data scientists and engineers to deliver production-ready ML and AI systems
  • Ensure AI/ML systems meet governance, security, and compliance requirements
  • Mentor teams and drive innovation across both agentic and classical ML engineering practices
  • Participate in team meetings and contribute to project planning and strategy discussions
What we offer:
  • Flexi-Week and Work-Life Balance: We prioritise your mental health and well-being, offering you a flexible four-day Flexi-Week at full pay and with no reduction to your annual holiday allowance. We also offer a variety of different paid special leaves as well as volunteer days
  • Remote Working Allowance: You will receive a monthly allowance to cover part of your running costs. In addition, we will support you in setting up your remote workspace appropriately
  • Pension: Awin offers access to an additional pension insurance to all employees in Germany
  • Flexi-Office: We offer an international culture and flexibility through our Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Development: We’ve built our extensive training suite Awin Academy to cover a wide range of skills that nurture you professionally and personally, with trainings conveniently packaged together to support your overall development
  • Appreciation: Thank and reward colleagues by sending them a voucher through our peer-to-peer program

Director of Engineering

The Director of Engineering is the senior technical execution leader responsible...
Location:
United States, Aberdeen Proving Ground
Salary:
Not provided
VES
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Engineering, Computer Science, or a related technical field (Master's degree preferred)
  • 15+ years of engineering experience, including significant hands-on technical responsibility for complex systems
  • 7+ years in senior technical leadership roles, such as Principal Engineer, Chief Engineer, Lead Architect, or equivalent
  • Demonstrated ability to independently solve complex, cross-domain technical problems involving software, systems, infrastructure, and security
  • Strong understanding of software engineering, systems engineering, integration practices, and modern deployment environments
  • Experience implementing and enforcing SDLC, configuration management, and quality standards
  • Experience working in a government contracting or regulated environment, including DoD or Federal programs
  • Ability to communicate complex technical concepts clearly to engineers, program leadership, executives, and customers
  • Excellent written and oral communication skills with respect to the above requirements
  • Ability to obtain and maintain a U.S. Government security clearance
Job Responsibility:
  • Lead and oversee engineering execution across multiple concurrent programs, ensuring solutions meet cost, schedule, performance, quality, and architectural expectations
  • Serve as the primary technical execution lead across the organization, with authority to make technical decisions necessary to unblock delivery and resolve engineering challenges
  • Act as the first escalation point for complex technical problems, integration failures, and cross-program dependencies, independently driving solutions for the majority of issues before CTO involvement is required
  • Apply deep systems-level technical judgment to diagnose, frame, and resolve difficult engineering problems spanning software, systems, infrastructure, deployment, and security
  • Ensure engineering decisions made under delivery pressure preserve long-term system maintainability, reliability, and scalability
  • Develop and maintain a deep understanding of VES engineering processes, standards, and technical expectations, and ensure they are applied consistently across programs
  • Partner with Principal Engineers to review and approve system architectures, technical approaches, and major design decisions
  • Ensure architectural consistency and technical coherence across programs while allowing appropriate flexibility to meet mission and customer needs
  • Identify systemic technical issues, recurring failure modes, and architectural debt across the portfolio and drive corrective action
  • Work closely with Principal Engineers (Mission Command, Land Systems, Emerging Technologies, Cyber Security, Systems Engineering) as domain technical authorities
What we offer:
  • 401(k) match
  • Highly Competitive Salary
  • Up to 15 Paid Vacation days / year
  • 11 Paid Holidays
  • Flexible work/life balance culture
Employment Type:
Fulltime

Principal Technical Program Manager - AI and Search Infrastructure

Atlassians can choose where they work – whether in an office, from home, or a co...
Location:
United States, San Francisco; Mountain View; Seattle
Salary:
166100.00 - 266800.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience with deep domain expertise in Search Infrastructure (e.g., Search Quality, Search Relevance, Vector Search) or Artificial Intelligence (e.g., LLMs, Inference, Fine-tuning, Content Generation)
  • 10+ years of experience in a Technical Product/Program Management or Technical Leadership role
  • Experience in 0-1 product / platform development and scaling
  • Excellent collaborator who can work across teams effectively and represent domain area to senior leadership (VP+ levels)
  • Experience in gathering product needs, setting strategy & plans, creating clear roadmap with aligned success metrics
  • Experience driving platform evangelization and adoption internally and with external customers (i.e. Enterprise / SMB)
  • Experience building platform products for Enterprise is a plus
  • Master's degree or higher in Computer Science is a plus
  • Excellent verbal, written, and facilitation skills (including experience with facilitating meetings and engaging with an executive audience)
  • Demonstrated experience and success leading high-impact, cross-functional programs or products
Job Responsibility:
  • Own the strategic vision for the Atlassian platform area (AI or Search Infra), with a focus on driving the best outcomes for customers (Atlassians)
  • Provide technical expertise for Atlassian platform area (AI or Search Infra)
  • Define product / platform strategy, plans, priorities and roadmap
  • Partner with engineering team and other disciplines on strategy, plans, roadmaps and execution
  • Lead communication with senior leadership, customers, and stakeholders on a regular cadence
  • Establish key performance indicators (KPIs) and metrics to measure the effectiveness of key initiatives / projects
What we offer:
  • health coverage
  • paid volunteer days
  • wellness resources
Employment Type:
Fulltime

Senior Principal Engineering Manager

Microsoft Research (MSR) is working to transform the future of artificial intell...
Location:
United States, Redmond
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of people management experience leading software engineering teams, including managing principal engineers
  • Experience building or operating infrastructure for large-scale distributed systems, cloud platforms, or artificial intelligence (AI)/machine learning (ML) workloads
  • Track record of driving execution on complex, multi-workstream infrastructure projects with clear milestones and accountability
  • Technical fluency in one or more of: large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), or High-Performance Compute (HPC) environments
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
  • Expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • A track record of strong cross-functional partnerships, including the ability to align on strategic direction, deliver joint accountabilities, and develop relationships with staff members with widely varied expertise
  • Experience scaling engineering teams through significant growth phases (hiring, onboarding, and integrating new engineers into a high-performing team)
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Job Responsibility:
  • Lead, mentor, and grow the engineering team that builds MSR’s AI research infrastructure
  • Recruit and develop exceptional engineering talent, building a diverse team - including hiring, onboarding, career development, and performance management
  • Drive execution across the team by setting clear goals, tracking milestones, managing dependencies, and ensuring accountability for delivering complex infrastructure projects on time and at high quality
  • Lead team culture and process changes, cultivating an AI-first mentality that accelerates our progress through agentic coding, automation, and skills development
  • Provide technical vision and judgment on the team's architecture, strategy, and roadmap — spanning supercomputer GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows — while empowering engineers to own deep technical details
  • Collaborate closely cross-discipline with engineers, program managers, and research and science teams to align priorities, resolve dependencies, and build better solutions together
  • Foster a team culture of operational excellence, continuous improvement, and high psychological safety where engineers are empowered to take ownership and innovate
Employment Type:
Fulltime

Senior Principal Product Development - Hardware

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the...
Location:
United States, Redmond
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 9+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 11+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Technical Leadership & Strategy: Lead the definition and evolution of scalable, repeatable processes across hardware development and sustainment
  • Cross‑Functional Collaboration: Partner closely with development teams across Microsoft, ODM partners, and platform engineering groups to ensure system requirements are clearly defined, aligned, and executed with precision
  • Process & Engineering Innovation: Drive continuous improvements in hardware development workflows using data, telemetry, and modern engineering methodologies
  • Stakeholder Engagement & Alignment: Influence and communicate effectively with senior Microsoft and supplier leaders to ensure alignment on scalable architectures, engineering best practices, and long‑term platform strategy
  • Operational Excellence & Data‑Driven Execution: Establish and uphold the highest standards for quality, sustainability, and resilience across hardware development
Employment Type:
Fulltime

Principal Site Reliability Engineering Manager

Microsoft Substrate is the foundational cloud platform that powers many of Micro...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Candidates must be able to meet Microsoft, customer and/or government security screening requirements required for this role
  • This role requires access to Microsoft Government cloud environments, including GCC Moderate (GCCM), GCC High (GCCH), and Department of Defense (DoD) environments
  • For access to GCCH and DoD environments, this role requires the ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
  • For access to GCCM environments, this role requires the ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
  • For manager-level roles, a Tier 5 (T5) background investigation is preferred
  • Candidates may be considered without currently holding these background investigations, provided they are eligible for and able to successfully obtain them
Job Responsibility:
  • Lead and develop a team of Site Reliability Engineer ICs, providing clear expectations, regular coaching, and career guidance across senior and principal levels
  • Own the operational health and reliability posture of Substrate services running in regulated environments
  • Drive change and influence across the org as you establish and drive SLOs, SLIs, and operational metrics
  • Lead effective incident management and post-incident reviews
  • Serve as an actively engaged on-call engineer (OCE) and participate in an on-call rotation
  • Own reliability, resilience, and disaster recovery, including driving and coordinating DR and game day exercises
  • Drive engineering-led operational excellence at scale
  • Partner with engineering and product teams to embed reliability, security, and compliance considerations early in service design
  • Influence technical and operational strategy beyond your immediate team
  • Represent your team’s work clearly to leadership and partners
Employment Type:
Fulltime

Senior Site Reliability Engineering Manager

Microsoft Substrate is the foundational cloud platform that powers many of Micro...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Ability to obtain and maintain appropriate background investigations and customer screenings for access to GCC Moderate, GCC High, and Department of Defense environments
  • For access to GCCH and DoD environments, ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
  • For access to GCCM environments, ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
  • For manager-level roles, a Tier 5 (T5) background investigation is preferred
  • Pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Lead and develop a team of Site Reliability Engineer ICs, providing clear expectations, regular coaching, and career guidance across senior and principal levels
  • Own the operational health and reliability posture of Substrate services running in regulated environments
  • Drive change and influence across the org as you establish and drive SLOs, SLIs, and operational metrics
  • Lead effective incident management and post-incident reviews
  • Serve as an actively engaged on-call engineer (OCE) and participate in an on-call rotation
  • Own reliability, resilience, and disaster recovery, including driving and coordinating DR and game day exercises
  • Drive engineering-led operational excellence at scale
  • Partner with engineering and product teams to embed reliability, security, and compliance considerations early in service design
  • Influence technical and operational strategy beyond your immediate team
  • Represent your team’s work clearly to leadership and partners
Employment Type:
Fulltime

Software Engineer II, AI Developer Tools

At Docker, we make app development easier so developers can focus on what matter...
Location:
United States, Seattle
Salary:
128000.00 - 181500.00 USD / Year
Docker
Expiration Date:
Until further notice
Requirements:
  • 2+ years building backend systems, APIs, or developer-facing tools with strong software engineering fundamentals
  • Proficiency in Go (preferred), Rust, Java, or Python with understanding of data structures, algorithms, and design patterns
  • Basic understanding of AI/ML concepts with eagerness to learn about LLM APIs, prompt engineering, and AI agent development through hands-on work
  • Experience with cloud platforms (AWS, GCP, or Azure) and understanding of distributed systems or microservices
  • Familiarity with CI/CD pipelines, automated testing, version control (Git), and modern development workflows
  • Strong problem-solving skills with ability to work through technical challenges with guidance from senior engineers
  • Good communication skills in remote, asynchronous environments with ability to document technical decisions
  • Collaborative mindset with eagerness to learn from code reviews and feedback
  • Self-motivated with ability to work autonomously while knowing when to ask for help
  • Passion for developer tools and user experience
Job Responsibility:
  • Build AI Developer Tool Features: Implement features for AI-powered developer tools such as code review assistants, test generators, deployment diagnostics, and on-call assistance tools
  • Implement LLM Integrations: Build integrations with LLM APIs (OpenAI, Anthropic, etc.), including prompt engineering, response handling, error management, and performance optimization
  • Contribute to Platform Infrastructure: Help build self-service platform capabilities such as deployment pipelines, observability integration, security controls, and operational tooling that enable teams to rapidly deploy AI developer tools
  • Support AI-Native Development Adoption: Contribute to tools and programs that help teams adopt AI developer tools such as Claude Code, Cursor, and Warp across Docker's engineering organization
  • Write Quality Code: Develop well-tested code with unit and integration tests; follow team coding standards and participate actively in code reviews to learn best practices
  • Maintain Production Systems: Assist with monitoring, alerting, and troubleshooting production AI systems; participate in incident response and learn operational best practices
  • Collaborate and Learn: Work closely with Senior Engineers and Principal Engineer on technical designs; ask questions, seek feedback, and continuously improve your skills in AI/LLM technologies and platform engineering
What we offer:
  • Freedom & flexibility; fit your work around your life
  • Designated quarterly Whaleness Days plus end of year Whaleness break
  • Home office setup; we want you comfortable while you work
  • 16 weeks of paid Parental leave
  • Technology stipend equivalent to $100 net/month
  • PTO plan that encourages you to take time to do the things you enjoy
  • Training stipend for conferences, courses and classes
  • Equity
Employment Type:
Fulltime