Fleet Engineering Debug

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:

119800.00 - 234700.00 USD / Year

Job Description:

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding cloud infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 Microsoft online businesses worldwide, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, through our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for a Fleet Engineering Debug engineer to help achieve that mission.

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the Hardware, Infrastructure Management, and Fundamentals Engineering (HIFE) team is instrumental in defining and delivering operational measures of success for hardware manufacturing and in improving the planning process, quality, delivery, scale, and sustainability of Microsoft cloud hardware. The ideal candidate brings a dedicated commitment to customer-focused solutions, along with the insight and industry knowledge to envision and implement future technical solutions that manage and optimize the cloud infrastructure.

Job Responsibility:

  • Execute system-level, end-to-end debug solutions for at-scale datacenter systems
  • Lead collaborative projects with hardware, firmware, and software teams that drive root cause analysis
  • Be accountable for the successful execution of targeted system-level root cause analysis and defect reduction projects
  • Provide technical recommendations on diagnostics or debug deployment technologies
  • Lead the debug of complex problems, drawing on both technical and business understanding
  • Develop innovative, scalable debug methodologies, test strategies, and test routines for data center solutions
  • Solve problems relating to essential services and build automation to improve debug efficiency
  • Communicate effectively with partners and stakeholders, using data to plan and report progress on initiatives

Requirements:

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 3+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years technical engineering experience OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice to have:

  • 7+ years of technical leadership experience as a platform, software, or validation architect, a lead debug engineer, or in an equivalent industry leadership position
  • In-depth understanding of modern computer architectures or System on Chip (SoC) features, such as reliability, availability, and serviceability (RAS) features, virtualization technologies, or major architectural blocks like memory controllers, central processing units, storage, or networking solutions for cloud or datacenter infrastructures
  • Ability to lead in-depth technical reviews of software solutions used in at-scale environments or datacenter infrastructure, cloud-native operating systems, or virtualization technologies
  • Platform or software level debug and validation experience
  • Software and data analytical skills.

Additional Information:

Job Posted:
April 11, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work
Similar Jobs for Fleet Engineering Debug

Senior C++ Engineer - Satellite Real-Time Control Systems

The Mission of the Senior C++ Engineer - Satellite Real-Time Control Systems ICE...
Location:
Finland, Helsinki
Salary:
Not provided
ICEYE
Expiration Date:
Until further notice
Requirements:
  • You love writing modern C++ and know what production-quality code looks like
  • Proven track record of shipping real-time control software for autonomous or safety-critical systems—satellites, drones, robotics, automotive…
  • Understand hard real-time constraints, latency budgeting and deterministic behaviour
  • Comfortable interfacing with sensors, actuators and embedded Linux environments
  • Champion of good engineering practice: rigorous testing at all levels, CI/CD, clear documentation
  • Ownership through full software lifecycle—from whiteboard concepts to on-orbit maintenance
  • Clear communicator who enjoys solving problems with colleagues across disciplines
Job Responsibility:
  • Write and optimize real-time C++ code that meets strict determinism and latency budgets needed for safe and precise on-orbit execution
  • Build & own the software layer that bridges sensors, actuators and control algorithms - deterministic loops, telemetry pipelines and on-orbit autonomy
  • Drive quality through full development lifecycle: requirements → design → code → HIL/MIL testing → launch → on-orbit support
  • Collaborate with GNC, electronics, ground-segment and mission-ops engineers to debug, iterate and improve performance
  • Lead architecture evolution as our fleet and use-cases grow—refactor, optimise and introduce new technologies where they add value
  • Investigate anomalies: deep-dive into flight telemetry, reproduce issues on ground and roll out fixes that keep the constellation healthy
What we offer:
  • Occupational healthcare, occupational and private insurance
  • Yearly benefit budget to spend on sport, transport, wellness, lunch, etc.
  • Phone subscription with iPhone of choice
  • Relocation support (flight tickets, accommodation, relocation agency support)
  • Time and resources for self-development, research, training, conferences, and certification schemes
  • Inspiring office environment with collaborative spaces and silent workspaces
  • Access to state-of-the-art labs and testing facilities
  • Opportunities to attend international space conferences
  • Fulltime

Staff Software Engineer, Technical Lead

Provide hands-on technical leadership for the core software platform that powers...
Location:
United States, San Francisco
Salary:
140000.00 - 240000.00 USD / Year
Chef Robotics
Expiration Date:
Until further notice
Requirements:
  • 8+ years of software engineering with 3+ years in technical leadership roles
  • Track record delivering production robotic systems, IoT devices, or autonomous systems at scale
  • Experience designing reliable systems for B2B/enterprise deployments
  • On-device platform expertise: OS configuration, device drivers, system services, networking stack configuration
  • Robotics middleware: ROS/ROS2, real-time systems, sensor integration
  • Infrastructure: Containerization (Docker/K8s), CI/CD pipelines, monitoring/observability
Job Responsibility:
  • Define and evolve the architecture for on-robot software, including OS configuration, hardware abstraction, middleware, and system services
  • Lead middleware architecture decisions for real-time robot control, sensor integration, and inter-process communication
  • Establish patterns for full-stack development, connecting on-robot systems to cloud services and web interfaces
  • Write production code for high-impact features across the stack: robotics middleware, backend services, and cloud infrastructure
  • Lead critical technical initiatives, including robotic platform software, cloud data pipelines, and fleet management platform
  • Build robust deployment, monitoring, and OTA update systems for production robot fleets
  • Debug the most challenging issues from kernel/driver level through the application layer
  • Establish engineering standards and processes that balance rigor with startup agility
  • Champion reliability, observability, and testing practices across embedded and cloud systems
  • Mentor engineers through code reviews, design discussions, and pairing sessions
What we offer:
  • Medical, dental, and vision insurance
  • Commuter benefits
  • Flexible paid time off (PTO)
  • Catered lunch
  • 401(k) matching
  • Fulltime

Systems Engineer, Diagnostics

As a Systems Engineer on the Diagnostics Engineering team, you will lead efforts...
Location:
United States, Palo Alto
Salary:
110000.00 - 240000.00 USD / Year
1X Technologies
Expiration Date:
Until further notice
Requirements:
  • Degree in engineering or a related field, or equivalent experience
  • Hands-on experience debugging complex subsystems involving microprocessors and software-controlled electrical or electromechanical devices
  • Ability to read and interpret C++, Python, and similar embedded systems languages
  • Proficient in data visualization techniques and tools
  • Experience with Linux, Git, command line tools, and standard diagnostic equipment (e.g., oscilloscopes, multimeters, log analyzers)
  • Experience designing and building mechanical test fixtures, diagnostic tools, or custom hardware
  • Solid fundamentals in electrical and embedded systems troubleshooting
  • Experience supporting hardware bring-up, calibration, or production validation workflows
  • Familiarity with ROS2, behavior trees, or motion-planning stacks
Job Responsibility:
  • Diagnose mechanical, electrical, software, and controls failures, and document root causes
  • Debug complex mechanical failures using engineering fundamentals such as drawings, mechanisms, tolerances, fits, inspection, and measurement techniques
  • Use electrical and system-level instrumentation to investigate faults in robots and sub-components
  • Develop test tools and scripts in C++ and Python to support diagnostics, integration, and data analysis
  • Create clear reporting from diagnostic and fleet data to drive decision-making and track improvements
  • Collaborate with prototyping and design engineers to iterate on hardware changes based on diagnostic findings
  • Communicate design recommendations effectively to hardware, electrical, and controls teams
What we offer:
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime

ASIC Engineer Intern - Infra Silicon Enablement

Meta is seeking an ASIC Engineering Intern to join our Release to Production Eng...
Location:
India, Bangalore
Salary:
Not provided
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has, or is in the process of obtaining, a Bachelor's degree in Electrical Engineering, Computer Engineering or related engineering fields
  • Completed coursework in computer architecture and/or electrical engineering
  • Experience with troubleshooting, debug and analytics for Silicon products
  • Experience in Linux, Python, C/C++ and/or similar languages (data structures, algorithms, and OOP)
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • Intent to return to a degree program after the completion of the internship/co-op
Job Responsibility:
  • Work across all aspects of the silicon lifecycle to deliver reliable and performant silicon solutions: from early architecture and design inputs, pre-silicon validation, bring-up, and post-silicon characterization to deployment in the fleet
  • Create and develop validation and automation tool sets targeted at silicon validation and productization, including, but not limited to, silicon diagnostics, performance analysis, debug tools, and bare-metal and full-stack systems, from early labs to data center deployments
  • Understand production system use cases to improve validation
  • Provide feedback into next generation architecture and design with insights from the production fleet
  • Root-cause, resolve and remediate issues with silicon across the product lifecycle

Vehicle Electrical Integration Technician

We are looking for an experienced Vehicle Integration Technician who is highly m...
Location:
United States, Odessa
Salary:
Not provided
Kodiak Robotics
Expiration Date:
Until further notice
Requirements:
  • Strong technical background
  • 2+ years of experience as a technician in the autonomous, automotive, aerospace or related industry
  • Confident with automotive electronics and familiar with communication protocols and tools such as Ethernet, CAN, LIN, 100BASE-T1
  • Experience with electrical/harnessing standards and manufacturing
  • Strong verbal and written communication
  • Deep understanding of manufacturing processes for both low and high volume production
  • Ability to regularly lift and/or move up to 10 pounds, frequently lift and/or move up to 25 pounds, and occasionally lift and/or move up to 45 pounds
  • Willingness to work in a traditional shop setting or outdoors in all weather conditions during the day/night
  • Ability and willingness to climb ladders, stand, crouch, kneel and generally maneuver as required to repair mechanical and electrical equipment in, on, and around trucks
Job Responsibility:
  • Troubleshoot and debug issues in the field, and trace failures to the correct subsystems
  • Develop and perform corrective and preventative actions for top failure modes
  • Hands-on work with mechanical, electrical, and software systems to diagnose, repair, and root cause issues seen across our fleet
  • Maintain, repair, and upfit mechanical, electrical, and software systems to contribute to and improve fleet-wide uptime
  • Lead and support engineering test activities by applying engineering practices, principles, and various analysis tools to achieve performance targets and standards
  • Provide feedback and inspect hardware quality at various levels and resolve manufacturing/build issues
  • Create and review documentation such as drawings, schematics, build instructions, and incoming hardware quality tests/checks
What we offer:
  • Competitive compensation package including equity and annual bonuses
  • Excellent Medical, Dental, and Vision plans through Kaiser Permanente, Cigna, and MetLife (including a medical plan with infertility benefits)
  • MetLife Legal Services, Identity & Fraud Protection, Hospital Indemnity Insurance, Accident Insurance, & Critical Illness Insurance
  • Flexible PTO, 10 paid holidays, and generous parental leave policies
  • Office perks: dog-friendly, free catered lunch, a fully stocked kitchen, and free EV charging
  • Long Term Disability, Short Term Disability, Life Insurance
  • Wellbeing Benefits - Headspace through Cigna, Calm through Kaiser, One Medical, Gympass, Spring Health through Cigna, Rula (mental health navigation)
  • Fidelity 401(k)
  • Commuter, FSA, Dependent Care FSA, HSA
  • Various incentive programs (referral bonuses, patent bonuses, etc.)
  • Fulltime

Systems Software Engineer - Fleet Management

We're looking for a strong systems software engineer to lead our camera fleet ma...
Location:
United States, San Mateo
Salary:
240000.00 - 300000.00 USD / Year
Verkada
Expiration Date:
Until further notice
Requirements:
  • BS/MS in Computer Science (or similar degree)
  • 5+ years of industry experience in distributed software engineering
  • Deep Linux expertise: Strong understanding of Linux internals—not just using it, but understanding how it works (process management, system-level debugging)
  • Systems programming skills: Proficiency in systems-oriented languages such as C and Go
  • Observability & metrics: Experience with metrics systems (e.g., Grafana, Prometheus) and building data pipelines
  • Data-driven mindset: You rely on data and analysis rather than intuition; you're comfortable with statistics and quantitative reasoning
  • Ownership mentality: Willingness to own and lead the entire effort—not just writing code, but defining architecture, establishing processes, and driving team direction
  • Proven crisis management: Track record of handling high-stakes incidents and making sound decisions under pressure
  • Must be willing and able to work onsite five days per week.
Job Responsibility:
  • Architect for scale: Design systems that can handle the complexities of managing software across a large and growing camera fleet
  • Build observability infrastructure: Design and implement dashboards, metrics systems, and monitoring solutions using tools like Grafana
  • Lead safe release operations: Own the end-to-end process for releasing camera software, ensuring reliability and minimizing risk across the fleet
  • Develop analysis tools: Create data pipelines and analytical tools to measure release health, identify issues early, and drive data-informed decisions
  • Establish accountability: Define release procedures, hold teams accountable to standards, and ensure adherence to the Safe Release Procedure
  • Create automated safeguards: Develop alerts and automated tests that catch problems before they impact customers
  • Respond to critical incidents: Be comfortable making decisions under pressure during high-stakes situations
What we offer:
  • Healthcare programs that can be tailored to meet the personal health and financial well-being needs - Premiums are 100% covered for the employee under at least one plan and 80% for family premiums under all plans
  • Nationwide medical, vision and dental coverage
  • Health Savings Account (HSA) with annual employer contributions and Flexible Spending Account (FSA) with tax-saving options
  • Expanded mental health support
  • Paid parental leave policy & fertility benefits
  • Time off to relax and recharge through our paid holidays, firmwide extended holidays, flexible PTO and personal sick time
  • Professional development stipend
  • Fertility Stipend
  • Wellness/fitness benefits
  • Healthy lunches provided daily
  • Fulltime

Applied AI Engineer - Flywheel Automation & Continuous Learning

Kodiak is seeking a world-class Applied AI Engineer to design and build the AI F...
Location:
United States, Mountain View
Salary:
180000.00 - 240000.00 USD / Year
Kodiak Robotics
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s, Master’s, or PhD in Computer Science, Machine Learning, Robotics, or a related field
  • 3+ years of experience building production-grade ML infrastructure or model pipelines
  • Deep proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow)
  • Experience with distributed training and pipeline orchestration (e.g., Airflow, Kubeflow, Dagster)
  • Strong engineering fundamentals, debugging skills, and ability to scale systems
  • Passion for turning real-world data into self-improving AI systems
Job Responsibility:
  • Design and implement the end-to-end AI Flywheel: platforms for training, validation, and deployment, built into a robust automated system
  • Build and maintain multi-node distributed training pipelines using tools like PyTorch DDP, Horovod, or Ray
  • Develop smart data mining and active learning strategies to prioritize valuable training data from petabyte-scale logs
  • Automate model evaluation and selection pipelines to support rapid iteration and closed-loop deployment
  • Build infrastructure for seamless model image packaging, validation, and rollout across Kodiak’s autonomous fleet and AI platform
  • Ensure that the flywheel is reliable, reproducible, and scalable, capable of learning from millions of real-world miles
What we offer:
  • Competitive compensation package including equity and annual bonuses
  • Excellent Medical, Dental, and Vision plans through Kaiser Permanente, Cigna, and MetLife (including a medical plan with infertility benefits)
  • MetLife Legal Services, Identity & Fraud Protection, Hospital Indemnity Insurance, Accident Insurance, & Critical Illness Insurance
  • Flexible PTO, 10 paid holidays, and generous parental leave policies
  • Office perks: dog-friendly, free catered lunch, a fully stocked kitchen, and free EV charging
  • Long Term Disability, Short Term Disability, Life Insurance
  • Wellbeing Benefits - Headspace through Cigna, Calm through Kaiser, One Medical, Gympass, Spring Health through Cigna, Rula (mental health navigation)
  • Fidelity 401(k)
  • Commuter, FSA, Dependent Care FSA, HSA
  • Various incentive programs (referral bonuses, patent bonuses, etc.)
  • Fulltime

Software Engineer, Data Infrastructure - Research

The Workload team is responsible for designing and running OpenAI’s LLM training...
Location:
United States, San Francisco
Salary:
250000.00 - 380000.00 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • Strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure
  • Experience building APIs, modular code, and scalable abstractions
  • Comfortable debugging bottlenecks across large fleets of machines
  • Pride in building infrastructure that 'just works'
  • Collaborative, humble, and excited to own a foundational part of the ML stack
Job Responsibility:
  • Design and implement the dataset infrastructure that powers OpenAI’s next-generation training stack
  • Design and maintain standardized dataset APIs, including for multimodal (MM) data that cannot fit in memory
  • Build proactive testing and scale validation pipelines for dataset loading at GPU scale
  • Collaborate with teammates to integrate datasets seamlessly into training and inference pipelines
  • Document and maintain dataset interfaces so they are discoverable, consistent, and easy for other teams to adopt
  • Establish safeguards and validation systems to ensure datasets remain reproducible and unchanged once standardized
  • Debug and resolve performance bottlenecks in distributed dataset loading
  • Provide visualization and inspection tools to surface errors, bugs, or bottlenecks in datasets
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Fulltime