Principal Software Engineer, SDN Networking Job at Crusoe (San Francisco)

Principal Software Engineer

You will be responsible for the design and development of a scalable distributed...

Location

United States, Santa Clara

Salary:

147000.00 - 237500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, IT, or equivalent experience
5+ years of experience specifically operating Red Hat OpenShift (OCP) in a production environment
Deep experience racking/stacking and cabling high-density GPU systems (e.g., NVIDIA DGX or similar) and specialized AI/ML hardware
Advanced proficiency in Ansible or Pulumi for automating bare-metal provisioning and cluster configuration
Strong Python and Bash skills for developing custom health-check scripts and API integrations
Expert-level CoreOS and RHEL administration, including kernel tuning and systemd management
Solid understanding of BGP, VLAN tagging, LACP, and load balancing (F5/NGINX), essential for cluster ingress
Experience with vSphere or KVM, and persistent storage solutions like OpenShift Data Foundation (ODF) or Ceph
Familiarity with DCIM tools (NetBox) and monitoring stacks (ELK, Loki, etc.)
Ability to lift and move equipment up to 50 pounds (e.g., high-density 2U/4U servers)

Job Responsibility

Design and develop a scalable distributed management plane infrastructure to manage Palo Alto Networks’ next-generation network security solutions
Monitor and maintain data center systems with a focus on 'Zero Single Point of Failure' (ZSPoF) architecture for OpenShift control planes and worker nodes
Implement and manage OpenShift 4.x clusters across multiple power and cooling zones to ensure 99.99% uptime
Design, test, and execute automated failover strategies and backup/restore procedures using tools like OADP (Velero) and Red Hat ACM
Perform routine maintenance and upgrades using GitOps (ArgoCD) and the Machine Config Operator to ensure zero-downtime node evacuations and patching
Resolve deep-stack hardware and software issues, from faulty GPU firmware to OpenShift SDN (OVN-Kubernetes) network latencies
Coordinate with vendors for specialized hardware (e.g., NVIDIA, Dell, Cisco) while maintaining strict security and firmware compliance
Optimize rack density for high-performance GPU clusters while managing thermal loads and power distribution (PDU) to prevent circuit-trip outages
Maintain accurate documentation and integrate hardware health metrics (IPMI/SNMP) into Prometheus/Grafana for proactive alerting
Rack and stack high-density GPU servers, ensuring redundant power-pathing and high-speed (100G/200G) InfiniBand or Ethernet cabling

Fulltime

Principal Software Engineer

We’re looking for a seasoned Principal Engineer to take full ownership of Airwal...

Location

Singapore, Singapore

Salary:

Not provided

Airwallex

Expiration Date

Until further notice

Requirements

Deep experience in cloud-native edge networking (API Gateway, DNS, CDN, GA, firewalls)
Proficiency with SDN concepts and tools (e.g., OpenDaylight, Envoy, NGINX/OpenResty, Kong, APISIX)
Familiarity with Cloudflare, AWS, or GCP cloud networking techniques
Knowledge of hybrid/multi-cloud patterns and traffic engineering at scale
Hands-on experience with cloud firewall systems, WAF, rate limiting, and bot detection
A security-aware mindset with the ability to balance protection and developer experience
Experience defining cross-team processes and governance frameworks
Strong communication skills and ability to lead across engineering and security teams

Job Responsibility

Own the Edge Network Stack
Design and evolve the architecture for Airwallex's external traffic stack, including API gateways (routing, filtering, throttling), DNS services (global resolution and routing), CDNs (caching strategies and invalidation), and global accelerators (latency and route optimization)
Define and Enforce Border Security
Partner with InfoSec to design and operationalize DDoS protection, bot mitigation, and anomaly detection (e.g., Cloud Armor, WAF); rate limiting and QoS policy enforcement for prioritized customer/partner APIs; firewall rule governance and bad-actor prevention mechanisms; and intrusion prevention and auth mechanisms at the border
Policy-Driven API Route Management
Build end-to-end processes and tooling for how engineers expose public APIs: define policy and controls for route registration, approval, and change management; work with platform teams to enforce compliance across microservices and gateways; and contribute to internal tools for observability, access review, and lifecycle auditing
Enable Global-Scale, Secure Performance
Establish reliability and quality of service (QoS) goals for critical paths (e.g., payments, onboarding, auth)
Design for a hybrid/multi-cloud edge strategy and backbone traffic replication
Tune latency, failover, and availability posture across regions

Fulltime

Principal Software Engineer

The HPC/AI (High-Performance Computing and Artificial Intelligence) team is on a...

Location

United States, Multiple Locations

Salary:

163000.00 - 296400.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements
Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, NVLink)
In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems
Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning
Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure
Experience with telemetry and observability tools for network monitoring at scale
Background in building scalable and fault-tolerant systems in large, distributed environments

Job Responsibility

Partner with appropriate stakeholders to determine user requirements for a set of scenarios
Lead identification of dependencies and the development of design documents for a product, application, service, or platform
Lead by example and mentor others to produce extensible and maintainable code used across products
Design, develop, and optimize networking solutions tailored for large-scale AI training infrastructure
Architect and implement high-performance, low-latency, and low-jitter communication frameworks for distributed systems
Benchmark, analyze, and enhance the scalability and reliability of networking systems to handle petabyte-scale data transfer
Debug and resolve complex networking issues in large-scale, high-performance environments
Drive identification of dependencies and the development of design documents for a product, application, service, or platform
Create, implement, optimize, debug, refactor, and reuse code to improve performance, maintainability, effectiveness, and return on investment (ROI)
Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate

Fulltime

Principal Software Engineer - Azure Compute

Do you love to work on the latest technologies? Are you looking to make a real d...

Location

India, Hyderabad

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Strong background in Linux systems, systems design, and OS fundamentals
Experience in host networking, network overlays, and SDN is a plus
Bachelor’s or advanced degree with 10+ years of experience in software development and leading teams

Job Responsibility

Partners with appropriate stakeholders to determine user requirements for a set of scenarios
Leads identification of dependencies and the development of design documents for a product, application, service, or platform
Leads by example and mentors others to produce extensible and maintainable code used across products
Leverages subject-matter expertise of cross-product features with appropriate stakeholders (e.g., project managers) to drive multiple groups' project plans, release plans, and work items
Holds accountability as a Designated Responsible Individual (DRI), mentoring engineers across products/solutions, working on-call to monitor system/product/service for degradation, downtime, or interruptions
Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale and shares knowledge with other engineers

Fulltime

Sr Principal Presales, Systems Engineer - Cloud & AI Networking

Sr Principal Presales, Systems Engineer - Cloud & AI Networking. This role has b...

Location

United States

Salary:

194500.00 - 456500.00 USD / Year

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Bachelor's degree in engineering or related field required, advanced degree preferred
15+ years of technical experience in network infrastructure architecture, design, and solutions consulting, including designing large-scale AI and cloud data center networks in hyperscale environments
15+ years of industry experience applying technical expertise with Junos, EOS, IOS, and hyperscale network architecture frameworks
JNCIE or equivalent certification
Deep hands-on expertise with hyperscale routing, MPLS traffic engineering, switching, SDN overlays (EVPN/VXLAN, MP-BGP), network fabrics and DCI solutions, and network provisioning, automation, and monitoring (Apstra, Paragon, NETCONF/REST APIs, YANG)
Familiarity with major networking silicon, hardware platforms, and related software (e.g., Juniper, Arista, Cisco, SONiC, Cumulus)
Subject-matter expertise in virtualization technologies on x86 and in scaling with technologies such as DPDK, SR-IOV, and SmartNICs
Proven success in complex pre-sales roles, with a strong ability to build relationships with senior technical and C-Level stakeholders
Experience with programming/scripting (Python, APIs, JSON, etc.) to enable solution integration and automation
Strong communication, presentation, and interpersonal skills with the ability to influence diverse technical and business audiences

Job Responsibility

Act as a primary technical architect early in complex sales cycles, working independently or with Sales Engineers to translate business needs into scalable technical solutions
Architect end-to-end networking infrastructure solutions with strong expertise in data center, AI networking, hyperscale environments, and WAN technologies
Lead solution design and customization, orchestrating input from specialists, systems engineers, and product teams to meet customer requirements
Deliver compelling demos, proof-of-concepts, and technical presentations that clearly articulate HPE Networking’s value in customer use cases
Engage with hyperscalers, large enterprises, service providers and technology partners to co-design solutions addressing emerging challenges across industries and workloads
Influence technical strategy throughout deal validation, solutioning, and initial execution, ensuring smooth handoff to delivery teams
Develop best practices, enablement collateral, and architecture playbooks to scale solutions internally and with partners
Mentor and provide technical leadership to junior technologists and sales engineering teams
Navigate and adapt through complex technology transitions, maintaining thought leadership in networking innovations and industry trends

What we offer

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

Fulltime

Principal Engineer

The Senior Data Center Operations Engineer is responsible for the bedrock of our...

Location

United States, Santa Clara

Salary:

147000.00 - 237500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, IT, or equivalent experience
5+ years of experience specifically operating Red Hat OpenShift (OCP) in a production environment
Deep experience racking/stacking and cabling high-density GPU systems (e.g., NVIDIA DGX or similar) and specialized AI/ML hardware
Advanced proficiency in Ansible or Pulumi for automating bare-metal provisioning and cluster configuration
Strong Python and Bash skills for developing custom health-check scripts and API integrations
Expert-level CoreOS and RHEL administration, including kernel tuning and systemd management
Solid understanding of BGP, VLAN tagging, LACP, and load balancing (F5/NGINX), essential for cluster ingress
Experience with vSphere or KVM, and persistent storage solutions like OpenShift Data Foundation (ODF) or Ceph
Familiarity with DCIM tools (NetBox) and monitoring stacks (ELK, Loki, etc.)
Ability to lift and move equipment up to 50 pounds (e.g., high-density 2U/4U servers)

Job Responsibility

Design and develop a scalable distributed management plane infrastructure to manage Palo Alto Networks’ next-generation network security solutions
Ensure 99.99% availability by architecting resilient physical layouts and automating the deployment, scaling, and self-healing capabilities of our production clusters
Monitor and maintain data center systems with a focus on 'Zero Single Point of Failure' (ZSPoF) architecture for OpenShift control planes and worker nodes
Implement and manage OpenShift 4.x clusters across multiple power and cooling zones to ensure 99.99% uptime
Design, test, and execute automated failover strategies and backup/restore procedures using tools like OADP (Velero) and Red Hat ACM
Perform routine maintenance and upgrades using GitOps (ArgoCD) and the Machine Config Operator to ensure zero-downtime node evacuations and patching
Resolve deep-stack hardware and software issues, from faulty GPU firmware to OpenShift SDN (OVN-Kubernetes) network latencies
Coordinate with vendors for specialized hardware (e.g., NVIDIA, Dell, Cisco) while maintaining strict security and firmware compliance
Optimize rack density for high-performance GPU clusters while managing thermal loads and power distribution (PDU) to prevent circuit-trip outages
Maintain accurate documentation and integrate hardware health metrics (IPMI/SNMP) into Prometheus/Grafana for proactive alerting

Fulltime

Principal Software Engineering Manager

The HPC/AI (High-Performance Computing and Artificial Intelligence) organization...

Location

United States, Multiple Locations

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements, which are required for this role
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
4+ years people management experience
10+ years of professional software design and development experience in large-scale distributed systems
Experience building and operating networking infrastructure for hyperscale datacenters or AI clusters
Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, MRC, NVLink)
In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems
Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning
Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure

Job Responsibility

Hire, manage, and grow a high-performing team of software engineers, fostering a culture of excellence, inclusion, and innovation
Lead the design and development of large-scale distributed systems and services that power Azure’s AI infrastructure
Drive engineering planning and execution while ensuring alignment with organizational OKRs and long-term strategy
Establish lean, scalable, and efficient processes that promote innovation and engineering rigor
Deliver best-in-class engineering by ensuring services and components are modular, secure, reliable, diagnosable, observable, and reusable
Improve test coverage, automation, and integration testing to proactively identify and resolve reliability gaps
Ensure live-site reliability and service health through robust monitoring, telemetry, and automation
Collaborate across Microsoft and partner organizations to deliver cohesive, end-to-end infrastructure solutions
Apply data-driven insights to optimize performance, scalability, and customer satisfaction
Champion Microsoft’s culture by modeling, coaching, and caring—nurturing diversity, inclusion, and continuous growth for your team and peers

Fulltime

NaaS Architect Principal

The NaaS Architect Principal is central to BT International's network transforma...

Location

Spain, Madrid

Salary:

Not provided

Plusnet

Expiration Date

Until further notice

Requirements

Strategic Architecture Leadership – Proven ability to define and communicate network architectural vision, with track record driving large-scale network transformation programs in service provider or cloud environments
Network Architecture Expertise – Deep understanding of service provider networks including SDN, segment routing, MPLS, BGP and overlay technologies, combined with cloud-native networking and container networking patterns
Platform Engineering Mindset – Strong understanding of platform-as-a-product principles, building self-service capabilities and treating internal teams as customers with clear SLAs
API & Integration Architecture – Extensive experience designing API-driven architectures using RESTful, gRPC and event-driven patterns, with knowledge of industry standards including TMF, MEF and CAMARA
Technical Depth – Hands-on background in network engineering with coding capability in at least one language (Python, Go) and participation in technical spikes or proof-of-concept work
Automation & Infrastructure-as-Code – Strong background in network automation, infrastructure-as-code (Terraform, Ansible) and GitOps with Flux/Argo CD
Cloud-Native & Multi-Cloud – Experience with cloud-native patterns including Kubernetes, containers and orchestration, operating across multi-vendor and multi-cloud environments
Observability & Network Operations – Knowledge of observability systems (ELK, Prometheus, Grafana, gNMI), telemetry pipelines, event streaming platforms (Kafka), orchestration platforms (Itential, NetBox) and traffic engineering controllers
Telco Transformation Context – Experience navigating organizational and technical challenges of telco network modernization while maintaining operational continuity
Zero Touch Operations – Knowledge of intent-based networking, automated remediation, workflow-driven operations and compliance management that enable zero-touch networking principles

Job Responsibility

Define and lead the architectural strategy for NaaS platform evolution, establishing target state architectures that balance functional requirements with non-functional requirements including scalability, resilience, security and cost optimization
Work hand in hand with product engineering squads to provide hands-on architectural guidance, working directly with engineers to deliver product excellence as well as technical spikes and proof-of-concepts
Drive API-first architecture across network services, establishing patterns for exposing network capabilities through modern integration approaches including RESTful APIs, gRPC and event-driven patterns, with alignment to industry standards including TMF, MEF and CAMARA
Lead vendor rationalization strategy across network equipment vendors, cloud providers and orchestration platforms, reducing vendor dependencies through strategic build vs buy decisions and phasing out unnecessary third-party systems in favor of composable in-house capabilities
Champion modern architecture patterns including infrastructure-as-code, GitOps, automated provisioning and cloud-native networking that enable continuous delivery and operational excellence
Establish observability frameworks for network services including telemetry pipelines, metrics collection, distributed tracing and logging strategies that enable proactive operations and rapid troubleshooting
Collaborate with platform engineering teams to build Internal Developer Platform capabilities that abstract network complexity and provide self-service access to network functions
Drive architectural governance through design reviews and conformance processes, ensuring solutions align with platform standards while empowering product team autonomy
Provide technical thought leadership on network architecture including SDN underlay, control plane, management plane APIs and telemetry, translating industry trends and technological advances into roadmaps that align with BT International's platform strategy
Mentor architects and engineers, fostering architectural thinking and technical leadership capability across both the architecture and product engineering organizations

