Senior HPC Deployment Engineer Job at Hewlett Packard Enterprise (Melbourne)

Senior Network Engineer, Deployment

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...

Location

United States, San Francisco, Sunnyvale

Salary:

162000.00 - 196000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

10+ years of related experience building and operating at scale in a production environment
In-depth knowledge of network protocols including TCP/IP, QoS, BGP, OSPF/IS-IS, EVPN, VXLAN, and MPLS-related technologies such as RSVP-TE and LDP
Good understanding of network monitoring protocols and tools, such as SNMP, IPFIX, sFlow/NetFlow, and telemetry
Familiar with data center network architecture, such as fat-tree and Clos topologies, BGP-TE, and peering for edge
Hands-on experience with major network devices like Mellanox, Cisco, Arista, Juniper, and other mainstream vendors
Familiar with mainstream commercial switch/router chipsets, such as Broadcom, Barefoot, etc.
In-depth knowledge of public cloud architecture connectivity options to AWS, GCP, Azure, Ali Cloud, OCI, etc.
Good understanding of IPv6 and IPv4-IPv6 coexistence technologies
Self-motivated, with good communication and writing skills
Team player willing to participate in the Crusoe Energy Cloud network's global on-call rotation

Job Responsibility

Deploy, build, and optimize Crusoe Energy Cloud's global network, including edge, backbone, data center, and public cloud connectivity
Work with cross-functional teams, including but not limited to Software Infrastructure and Product, to drive the innovation and evolution of the Crusoe Energy Cloud network
Work with external vendors and ISPs to test and verify device and carrier selection
Participate in 24/7 on-call support for the Crusoe network

What we offer

Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Fulltime

Senior Cybersecurity Engineer

Senior Cybersecurity Engineer LOCATION: Eglin AFB, FL JOB STATUS: Full-time C...

Location

United States, Eglin Air Force Base

Salary:

Not provided

Astrion

Expiration Date

Until further notice

Requirements

Master’s Degree (in Computer Science, Cybersecurity or a related field). Relevant experience may be substituted for the degree
10 years' total experience, at least 8 of which are in cybersecurity engineering, architecture, or R&D infrastructure
Top Secret Clearance with SCI. Eligible for Special Access Program (SAP) access. US Citizenship is required
DoD 8570/8140 IAT Level III (CISSP, CISM, or equivalent). Certifications: Security+, CEH, or other relevant security certifications
Expert-level knowledge of cybersecurity principles, risk management, and secure computing architectures
Hands-on experience with security tools and technologies, such as SIEM, intrusion detection/prevention systems, vulnerability scanners, and endpoint protection solutions. Experience with Host-Based Security System (HBSS), Assured Compliance Assessment Solution (ACAS), Nessus, Tenable.sc, Tenable.io, NNM, LCE, Nessus Manager, Agents, and Scanner
Experience with scripting (Python, PowerShell) and automation tools (Ansible, Chef)
Familiarity with Risk Management Framework (RMF), Authority to Operate (ATO) documentation, and enclave compliance management
Physically able to lift up to 50 lbs; adaptable to fieldwork and hands-on installations

Job Responsibility

Collaborate with network engineers to architect secure network topologies for current and future connected and isolated environments, ensuring security is embedded in the design phase
Design and deploy security solutions for S&T environments that support continuous research, development, and DevSecOps, working closely with network engineers to implement and maintain these solutions
Advise on security planning for long-term initiatives, including SDREN integration and the Weapons Technology Integration Center (WTIC) and other facility projects, in conjunction with network planning efforts
Develop security innovation roadmaps aligned with mission goals and emerging technologies, coordinating with network engineers to ensure alignment with network modernization efforts
Coordinate with facilities, engineering, and network teams to ensure robust infrastructure supports secure research operations, focusing on the security aspects of network hardware/power/cooling needs and structured cabling
Lead security aspects of containerization, virtualization, and orchestration of systems to support laboratory computing, HPC, and edge devices, working with network engineers to implement secure configurations
Engineer the security architecture of multiple S&T networks in compliance with NIST 800-series, DoD RMF, DISA Security Technical Implementation Guides (STIGs), and cybersecurity best practices, collaborating with network engineers to ensure seamless integration. Review engineering, architecture, and designs to ensure DoD security policies are met
Implement DevSecOps pipelines to automate security scans and CI/CD deployments, working with network engineers to integrate security into existing pipelines
Manage ATO package development and collaborate with ISSMs, network engineers, and cybersecurity stakeholders to ensure compliance. Review and develop RMF Assessment and Authorization (A&A) documentation, e.g. System Security Plans (SSPs), Security Assessment Reports (SARs), and Plans of Action and Milestones (POA&Ms)
Integrate identity management and single sign-on solutions across enclaves and hybrid environments, coordinating with network engineers to implement and maintain these solutions. Analyze and tune HBSS policies for assets during integration test events. Perform verification and troubleshooting across all HBSS modules. Install updates to HBSS software as released and in compliance with STIG requirements. Monitor HBSS software to ensure that the clients/servers are operational and reporting properly

What we offer

Competitive salaries
Continuing education assistance
Professional development
Multiple healthcare benefits package options
401K with employer matching
Competitive time off policy along with a federally recognized holiday schedule

Fulltime

Senior Software Engineer

We are looking for a dynamic, energetic Sr. Software Systems Design Engineer to ...

Location

India, Bangalore

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Very strong data structure and algorithmic skills
Experience in software development using C/C++ and debugging skills on multicore systems
Experience in identifying performance bottlenecks, and designing/implementing optimizations to relieve analyzed bottlenecks
Experience in x86 (or other architecture based) optimizations
Understanding of the cache subsystem, instruction set architecture, and pipeline (for any CPU)
Experience in performance analysis for data center, HPC (High Performance Computing), and MPI (Message Passing Interface) applications
Bachelor's or Master's degree in Computer Science Engineering or related field.

Job Responsibility

Problem solving across multiple software layers (user space, kernel, applications, libraries) and hardware
Optimization/development of the CPU performance stack (applications, libraries) for AMD server processors
Analyze and solve performance and scalability bottlenecks when code is running on multi-core, multi-node deployments
Innovate, publish papers and patents, and participate in technical conferences to advance AMD technologies
Continuously learn and grow along with evolving x86 server CPU architecture and application landscape
Lead collaborative approaches with multiple teams
Mentor others to deliver integrated projects.

Fulltime

Senior Software Engineer - Performance Tooling

The Artificial Intelligence (AI) Frameworks team at Microsoft develops AI softwa...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. This includes passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C++, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C++, or Python OR equivalent experience
4+ years’ practical experience working on high performance applications and performance debugging and optimization on CPUs/GPUs
Experience in DNN/LLM inference and in one or more DL frameworks such as PyTorch, TensorFlow, or ONNX Runtime, and familiarity with CUDA, ROCm, Triton
Technical background and solid foundation in software engineering principles, computer architecture, GPU architecture, hardware neural net acceleration
Experience in end-to-end performance analysis and optimization of state of the art LLMs and HPC applications, including proficiency using GPU profiling tools
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
Ability to independently lead projects

Job Responsibility

Work across multiple layers of the AI software stack (abstractions, programming models, compilers, runtimes, libraries, and APIs) to enable large-scale model training and inference
Benchmark OpenAI and other LLMs for performance on GPUs and Microsoft hardware
Debug, profile, and optimize performance for training/inference workloads on Central Processing Units (CPUs)/Graphics Processing Units (GPUs)
Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint
Collaborate across teams of researchers and engineers to deliver scalable, production-ready AI performance improvements

Fulltime

Senior Power Engineer

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Electrical Engineering, Computer Engineering, or related field AND 3+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, or related field AND 5+ years technical engineering experience OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Job Responsibility

Define and design rack-level power systems to enable integration of a variety of IT gear into Microsoft data centers
Work with cross-functional teams to drive full test coverage during product qualification, meet product requirements, and ensure product deployment
Lead rack level power system testing and identify integration issues between PSU, power shelf, firmware and system load
Contribute to rack level power system solution roadmap with cross-functional teams and vendor partners to ensure long term scalability of Microsoft power infrastructure
Collaborate with cross-functional teams on power delivery for high-performance compute (HPC), AI workloads, and ODM platforms
Define and enforce power system design standards, safety protocols, and compliance with global regulatory requirements
Engage with external partners, suppliers, and academic collaborators to advance power system innovation

Senior Software Engineer - Copilot Security

Copilot Security is at the core of Microsoft’s mission to deliver trusted, human...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
3+ years in technical engineering roles building large-scale services.
Hands-on experience designing and operating security-critical or AI-powered systems at scale, including agentic AI, secure orchestration, or advanced threat defenses.
Proven ability to design, build, and ship agentic AI features or frameworks.
Ability to clearly explain complex systems and security concepts to technical and non-technical stakeholders and influence cross-org roadmaps.
Agentic AI Development & Orchestration: Experience building production agent systems using frameworks such as LangGraph, Amazon Strands SDK, or similar platforms; familiarity with agentic design patterns including tool calling, multi-agent coordination, and secure delegation patterns.
Hands-on experience with distributed training frameworks (Ray, Slurm, HPC), containerization and orchestration technologies (Docker, Kubernetes) for ML model deployment, and ML lifecycle management in production environments.
Experience designing evaluation frameworks for LLM-based applications and implementing observability for agent systems using tools such as Phoenix, MLFlow, LangFuse, or custom eval harnesses; understanding of AI safety evaluation methodologies including adversarial testing and red-teaming.

Job Responsibility

Develop and ship agentic AI-powered security features that protect users from threats such as prompt injection, adversarial manipulation, and abuse of agentic workflows.
Implement secure orchestration frameworks that enable Copilot to safely delegate, coordinate, and execute actions across devices, services, and platforms.
Invent and apply new intelligent agents that leverage information flow analysis and apply common sense and judgement guardrails for security and privacy.
Collaborate with product, engineering, security, privacy, and AI teams to adopt agentic security patterns and best practices across Copilot and MAI.
Monitor key metrics for agentic AI security and innovation, using data-driven insights to improve defenses and enablement.
Document secure agentic AI patterns, ensuring they address novel risks, support safe delegation, and enable responsible orchestration of actions.

Fulltime

Senior DevOps Engineer (AI & Cloud Infrastructure)

We are seeking a Senior DevOps Engineer to design, deploy, and operate the next ...

Location

United States, Palo Alto

Salary:

175000.00 - 250000.00 USD / Year

Inflection AI

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience in DevOps, Site Reliability Engineering, or ML Infrastructure supporting high-scale, production systems
Deep expertise in Azure and AWS, including storage, compute, networking, databases, and cloud-native monitoring services
Strong Kubernetes administration experience, including GPU scheduling, operator deployment, and management of core infrastructure components; experience with Slurm is highly desirable
Proven experience deploying, scaling, and operating Large Language Models (LLMs) and inference engines such as vLLM, TGI, or Triton
Strong experience with modern DevOps tooling: Terraform, Helm, Kustomize, ArgoCD, GitHub Actions or GitLab CI, Prometheus, Grafana, and Clickhouse
Advanced scripting and automation skills in Python and Bash, with the ability to debug complex distributed systems and optimize performance at scale
Demonstrated ability to troubleshoot LLM servers, Kubernetes workloads, GPU utilization, and cloud infrastructure bottlenecks
Bachelor’s degree or equivalent in a field related to the requirements of the offered position.

Job Responsibility

Architect, deploy, and operate large-scale LLM inference servers and AI applications with a focus on low latency, high availability, and production reliability
Design, provision, and maintain complex cloud architectures across Azure and AWS, including storage, compute, networking, databases, and native LLM services
Manage GPU-enabled Kubernetes clusters and Slurm-based HPC environments, optimizing resource allocation for AI training and inference workloads
Deploy and operate core Kubernetes infrastructure components and operators (GPU operators, ingress controllers, service meshes, CNIs, CSIs, and storage drivers)
Build scalable infrastructure-as-code and deployment workflows using Terraform, Helm, Kustomize, ArgoCD, and GitOps best practices
Design and maintain centralized observability systems using Prometheus, Grafana, Clickhouse, and cloud-native monitoring tools
Participate in on-call rotations, lead incident response, perform post-mortems, and continuously improve system reliability and SLAs.

What we offer

Diverse medical, dental and vision options
401k matching program
Unlimited paid time off
Parental leave and flexibility for all parents and caregivers
Support of country-specific visa needs for international employees living in the Bay Area
Meaningful equity component.

Fulltime

Senior Research Engineer

The HPE HPC & AI EMEA Research Lab (ERL) is characterized by a unique blend of i...

Location

Germany, Munich, Berlin

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Development experience in compiled languages such as C, C++ or Fortran and experience with interpreted environments such as Python
At least a B.Sc. equivalent in a Science, Technology, Engineering or Mathematical discipline
Parallel programming experience, with programming models such as OpenMP, MPI, CUDA, OpenACC, HIP, PGAS languages, etc.
An understanding of AI/ML frameworks; experience with frameworks such as TensorFlow or PyTorch is highly desirable
An interest in system and data center monitoring and operational data analysis
Professional language skills in English and German

Job Responsibility

Perform world-class research while also shaping products of the future
Work with the most esteemed research partners across Europe
Enable high performance research software on pre-Exascale and Exascale supercomputers
Provide new environments/abstractions to support application developers to build, deploy, and run applications taking advantage of leading-edge hardware at scale
Make and operate HPC/AI systems and datacenters in a sustainable way
Manage modern data-intensive workloads in high performance environments

What we offer

Competitive salary and extensive benefits package (pension scheme, insurances, bike and car leasing, and other fringe benefits)
Work-life balance (flexible working time and hybrid workplace model, 30 vacation days, four HPE Wellness-Fridays, up to six months paid parental leave)
Support for education, training, and career development
Diverse and dynamic work environment
