Senior Software Engineer - Networking / RDMA

Microsoft Corporation

Location:
United States, Santa Clara

Contract Type:
Not provided

Salary:

119800.00 - 234700.00 USD / Year

Job Description:

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 Microsoft online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.
The Azure Data Processing Unit (DPU) team brings together state-of-the-art software and hardware expertise to create a highly programmable and high-performance chip with the capability to efficiently handle large data volumes. Thanks to its integrated design, the DPU lets Azure solve next-generation problems with greater agility and performance by leveraging its compute, storage, and networking capabilities.
As a Principal Software Engineer in the DPU Networking software team, you will design, develop, deploy, and support networking packet forwarding and control plane functions that enable high-performance data processing within various network endpoints in Azure data centers. You will work as part of a dynamic, multi-talented team of engineers from across the world. You will collaborate with technical stakeholders across functions and contribute to the success of multiple projects and initiatives across the organization. This opportunity will allow you to develop new solutions for the Azure fleet, participate in the design of cutting-edge networking solutions, and hone your design and performance optimization skills.
As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the DPU Networking Software team is instrumental in defining and delivering operational measures of success for quality, delivery, scale, and sustainability related to Microsoft cloud software. We are looking for seasoned engineers with a dedicated passion for customer-focused solutions, insight, and industry knowledge to envision and implement future technical solutions that will manage and optimize the Cloud infrastructure.

Job Responsibility:

  • Collaborate with stakeholders to understand business needs and translate them into technical requirements and solutions
  • Work across team and organizational boundaries to drive clarity and alignment
  • Drive identification of dependencies and the development of design documents for a product, application, service, or platform
  • Drive, create, implement, optimize, debug, refactor, and reuse code to establish and improve performance, maintainability, effectiveness, and return on investment (ROI)
  • Conduct research, stay updated with the latest industry trends, and experiment with cutting-edge technologies to drive innovation
  • Leverage subject-matter expertise of product features and partner with appropriate stakeholders (e.g., project managers) to drive a workgroup's project plans, release plans, and work items
  • Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate
  • Proactively seek new knowledge and adapt to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale
  • Coach and mentor fellow team members
  • Communicate effectively and bring a passion for delivering scalable solutions through a diverse team of engineers

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Nice to have:

  • 2+ years of experience in developing networking software stack for RDMA forwarding or control plane functions
  • 4+ years of experience in software design and coding of Layer 2/3/4 Ethernet/IP networking data plane packet forwarding and control plane processing functions within a programmable NIC, network switches and routers, or an architecture with hardware offload
  • Experience in developing networking software on DPUs or programmable NICs or other hardware offload architectures
  • Experience in developing technologies for reliable data transfer across network with efficient fabric utilization and deterministic latency
  • CI/CD Experience: Knowledge of Continuous Integration and Continuous Deployment (CI/CD) practices for streamlined software development and deployment processes
  • Scripting for Developer Tools: Proficiency in scripting languages to build and enhance developer tools, automating repetitive tasks and improving workflow efficiency

Additional Information:

Job Posted:
January 31, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Senior Software Engineer - Networking / RDMA

Principal Software Engineer – CXI Drivers & Kernel Networking

Principal Software Engineer – CXI Drivers & Kernel Networking. This role is part...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 10+ years of systems software experience with deep expertise in Linux kernel development
  • Strong experience with: PCIe, DMA, interrupts, memory management
  • Strong experience with: Linux networking stack (netdev, IP, sockets, RDMA/RXE)
  • Hands-on experience with Switch or NIC Software Stacks, especially in the low-level kernel and user space
  • Proven ability to debug complex kernel + hardware interactions
  • Excellent C programming and kernel debugging skills
Job Responsibility:
  • Architect, develop, and maintain Linux kernel drivers for the CXI interconnect, including: CXI Core Driver (shared hardware abstraction and resource management), CXI User Driver (user-space access, queue management, protection domains), CXI Ethernet Driver (IP, RXE, sockets integration)
  • Lead 800G CXI driver development: resource partitioning; interaction with IOMMU, PCIe, and virtualization stacks
  • Own kernel interfaces used by: Lustre/LNet (kCXI, kfabric provider), Verbs / RXE paths, user-space libraries (libcxi, libfabric providers)
  • Drive performance, scalability, and reliability improvements: low-latency paths, queueing models, retry/timeout handling; error reporting, recovery, and fault isolation
  • Collaborate closely with ASIC, firmware, and validation teams to deliver Chip-to-Ship outcomes
  • Act as a technical leader: design reviews, code reviews, mentoring senior engineers; influence long-term driver architecture and roadmap
What we offer:
  • Health & Wellbeing: comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Personal & Professional Development: specific programs catered to helping you reach any career goals
  • Unconditional Inclusion: unconditionally inclusive in the way we work and celebrate individual uniqueness

Senior Cloud Support Engineer

Crusoe Cloud is revolutionizing high-performance computing by offering sustainab...
Location:
Ireland, Dublin
Salary:
Not provided
Crusoe
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience
  • Strong command-line interface (CLI) skills in Linux environments
  • Proficiency with Git for code management and collaboration
  • 5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments
  • Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm, Terraform), and monitoring tools (e.g., Grafana)
  • Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP)
  • Excellent communication and customer service skills, including the ability to prioritize competing escalations
  • Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and Software Defined Networking (SDN)
Job Responsibility:
  • Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining high CSAT (95%+)
  • Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues
  • Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools
  • Manage alert triage, prepare for maintenance windows, and conduct node delivery testing
  • Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery
  • Adhere to global team collaboration and handoff processes for ticketing and on-call procedures
  • Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs)
What we offer:
  • pension contributions
  • private health and dental insurance
  • income protection
  • life assurance
Employment Type:
Fulltime

Senior Cloud Support Engineer

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...
Location:
United States, San Francisco; Denver
Salary:
125000.00 - 151000.00 USD / Year
Crusoe
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience
  • Strong command-line interface (CLI) skills in Linux environments
  • Proficiency with Git for code management and collaboration
  • 5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments
  • Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm, Terraform), and monitoring tools (e.g., Grafana)
  • Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP)
  • Excellent communication and customer service skills, including the ability to prioritize competing escalations
  • Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and Software Defined Networking (SDN)
Job Responsibility:
  • Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining high CSAT (95%+)
  • Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues
  • Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools
  • Manage alert triage, prepare for maintenance windows, and conduct node delivery testing
  • Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery
  • Adhere to global team collaboration and handoff processes for ticketing and on-call procedures
  • Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs)
What we offer:
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment Type:
Fulltime

Senior Cloud Support Engineer

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...
Location:
Ireland, Dublin
Salary:
Not provided
Crusoe
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience
  • Strong command-line interface (CLI) skills in Linux environments
  • Proficiency with Git for code management and collaboration
  • 5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments
  • Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm, Terraform), and monitoring tools (e.g., Grafana)
  • Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP)
  • Excellent communication and customer service skills, including the ability to prioritize competing escalations
  • Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and Software Defined Networking (SDN)
Job Responsibility:
  • Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining high CSAT (95%+)
  • Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues
  • Primary point of contact for incident management, focusing on initial triage, communication, and procedural rigor throughout the incident lifecycle
  • Lead response efforts, ensuring clear communication with both technical and non-technical stakeholders and acting as a customer advocate to minimize disruption
  • Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools
  • Manage alert triage, prepare for maintenance windows, and conduct node delivery testing
  • Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery
  • Adhere to global team collaboration and handoff processes for ticketing and on-call procedures
  • Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs)
What we offer:
  • pension contributions
  • private health and dental insurance
  • income protection
  • life assurance
Employment Type:
Fulltime

Senior Staff Cloud Support Engineer

As a Senior Staff Cloud Support Engineer, you are a technical authority within C...
Location:
United States, San Francisco; Sunnyvale
Salary:
180000.00 - 220000.00 USD / Year
Crusoe
Expiration Date:
Until further notice
Requirements:
  • 8+ years experience in SRE, DevOps, HPC, or Cloud Infrastructure roles
  • Advanced Linux systems expertise
  • Deep Kubernetes operational experience (CKA-level or higher)
  • Strong networking knowledge: InfiniBand, RDMA, RoCE, SDN
  • Experience supporting AI/ML workloads at scale (GPU clusters)
  • Proven track record of resolving multi-layer, distributed system failures
  • Strong customer communication and executive-facing presence
Job Responsibility:
  • Serve as highest-level escalation point for complex P1/P0 incidents
  • Lead cross-functional root cause investigations involving compute, networking (IB/RDMA/RoCE), storage, and orchestration layers
  • Partner with SRE, Software teams (Storage, Networking, Compute, K8) to design systemic fixes rather than recurring workarounds
  • Design and improve node validation, burn-in processes, performance baselining, and release readiness
  • Influence Kubernetes architecture, workload orchestration (Slurm, Terraform), and AI/ML cluster stability
  • Reduce MTTR and incident recurrence through structural improvements
  • Troubleshoot NCCL, IB, GPU driver/firmware issues, distributed training failures
  • Support complex AI workloads (training + inference) with performance tuning and observability improvements
  • Act as senior technical advisor during high-risk customer incidents
  • Deliver executive-ready RCAs with clarity and confidence
What we offer:
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment Type:
Fulltime

Senior Infrastructure Engineer

We are seeking a highly skilled and motivated GPU Fleet Operations Engineer to j...
Location:
United States, San Francisco; Sunnyvale
Salary:
183000.00 - 210000.00 USD / Year
Crusoe
Expiration Date:
Until further notice
Requirements:
  • Proven experience diagnosing and repairing high-density, rack-mounted compute hardware in production environments
  • Deep understanding of GPU architectures and hands-on experience with GPU-based systems
  • Experience supporting NVIDIA A100, H200, GB200, B200 and AMD 350X / 355X series platforms
  • Familiarity with high-speed interconnects such as InfiniBand, NVLink, and RDMA over Converged Ethernet (RoCE)
  • Strong Linux experience (Ubuntu, Rocky Linux, CentOS) using the command line for diagnostics and testing
  • Proficiency with GPU and system diagnostic tools such as NVIDIA DCGM and NVIDIA field diagnostic utilities
  • Experience working with enterprise server hardware, power delivery, and cooling systems
  • Strong analytical and problem-solving skills
  • Excellent communication and collaboration skills
  • Ability to work independently in a fast-paced data center or operations environment
Job Responsibility:
  • Perform deep-level diagnosis and troubleshooting of hardware faults within GPU racks and high-density compute systems
  • Troubleshoot and support GPU platforms including NVIDIA A100, H200, GB200, B200 and AMD 350X / 355X
  • Execute component-level diagnosis and remediation for failed or degraded hardware
  • Partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for GPUs, power supplies, cooling systems, interconnects, and networking hardware
  • Conduct post-repair validation, burn-in testing, torch testing, and NVIDIA NCCL testing to ensure system stability and performance
  • Implement and execute preventative maintenance procedures to improve fleet reliability and extend hardware lifespan
  • Perform firmware and BIOS upgrades across the GPU fleet
  • Maintain detailed documentation of maintenance activities, failures, and resolutions in ticketing and asset management systems
  • Develop and update standard operating procedures (SOPs) for troubleshooting, repair, and validation workflows
  • Collaborate with engineering, software, and data center operations teams to identify root causes of systemic failures and implement preventative solutions
What we offer:
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment Type:
Fulltime

Care Assistant

This is an excellent opportunity to enhance your current skill base and join us ...
Location:
United Kingdom, Preston
Salary:
13.00 GBP / Hour
Care Line Homecare Limited
Expiration Date:
Until further notice
Requirements:
  • Compassion and ability to care
  • Resilience
  • Willingness to learn new skills and develop knowledge as part of a close-knit team
  • Access to a vehicle is required because of the geographical location of the role
Job Responsibility:
  • Support client with Cerebral Palsy to attend university and social events
  • Enhance skill base in complex care areas such as tracheotomy, ventilation, seizure management, stoma care
  • Care for and support people of all ages in the community and their homes with spinal cord injury, muscular dystrophy, acquired brain injuries, and other complex needs
What we offer:
  • Maternity/Paternity leave
  • Pension scheme
  • Paid annual leave
  • Refer a friend scheme
  • Cycle-to-work scheme
  • Enhanced DBS check
  • Full training and clinical support
  • Access to resources, career pathways, benefits, investments, opportunities, and security of being part of a larger group
Employment Type:
Fulltime

Construction Project Coordinator

Microsoft Cloud Operations + Innovation (CO+I) is the team behind the cloud. CO+...
Location:
United States, Wenatchee
Salary:
96500.00 - 188400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Construction Project Management, Architecture, Engineering, or related field AND 5+ years related experience OR equivalent experience
Job Responsibility:
  • Support construction activities from project initiation through closeout, ensuring all tasks, schedules, and action items are tracked and executed on time
  • Coordinate meetings, vendor activities, engineering site visits, and internal cross-functional discussions to maintain project progress
  • Assist in preparing project approval tools, requests for proposals, and scope-of-work documentation to support on-time delivery
  • Participate in weekly meetings for construction projects across the Metro
  • Oversee project repositories (SharePoint or equivalent), ensuring clear organization, retrieval, and version control of documentation across all phases
  • Maintain, track, and archive all required project documentation, including drawings, logs, compliance documents, engineering reports, and turnover packages
  • Actively participate in project site safety, including daily site reviews for EHS observations, and assist with coordination of EHS related documentation
  • Track project constraints and risks; escalate delays or issues and initiate acceleration programs when critical path tasks are compromised
  • Assist in monitoring vendor performance and ensuring contract compliance, including coordinating documentation required for commissioning managers and contractors
Employment Type:
Fulltime