Senior Software Engineer - Datacenter Platform

Microsoft Corporation

Location:
Serbia, Belgrade

Contract Type:
Not provided

Salary:
Not provided

Job Description:

It is a mobile-first, cloud-first world, and we are enabling it. Microsoft Azure is at the core of the Microsoft Cloud, providing the foundational infrastructure for large-scale, distributed, and dynamic computing. Our team within Azure delivers the software platform that powers internal Microsoft services such as Office 365, Bing.com, Xbox Live, Skype, and OneDrive, as well as external customers who rely on us to run mission-critical cloud applications for their businesses.

We are seeking a Senior Software Engineer to help evolve, expand, and define our software platform and infrastructure. Areas of focus include core infrastructure services at the lowest levels of the stack, achieving five nines (99.999%) reliability, fault tolerance, distributed service monitoring, operational efficiency across the data center hardware lifecycle, performance metrics collection and analysis, alerting, visualization, device operations, and coordination of node diagnostics and repairs.

This is a dynamic and fast-paced environment offering a unique opportunity to work on something highly strategic to Microsoft and impactful across the industry. Few roles in computer science provide the chance to operate at this massive scale. If you are passionate about building robust, highly distributed software systems that form the backbone of the Microsoft Cloud, we would love to connect.

Job Responsibility:

  • Design and develop solutions that build and improve cloud services running on distributed systems.
  • Provide new features for Microsoft Cloud internal infrastructure software.
  • Keep infrastructure services running and deliver code updates on a regular cadence to improve performance and reliability.
  • Collaborate with appropriate stakeholders to determine user requirements for a scenario.
  • Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook, working on call to monitor the system, product, or service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system, product, or service for simple and complex problems when appropriate.

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice to have:

  • Master's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Additional Information:

Job Posted:
February 04, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Senior Software Engineer - Datacenter Platform

Network Software Test – Senior Software Engineer

About Arrcus: Arrcus was founded to enhance business efficiency through superior...
Location:
India, Bangalore
Salary:
Not provided
Arrcus
Expiration Date:
Until further notice
Requirements:
  • BS/MS in Computer Engineering/Computer Science or equivalent degree
  • Ability to write high quality automated test cases using Python
  • 5+ years of hands-on test experience of Networking protocols such as OSPF, BGP, ISIS, MPLS, BFD, MLAG, EVPN, VxLAN, SR-MPLS, SRv6
  • Proficient in the use of traffic generators to develop Data Path and Control Plane Test cases
  • Growing the existing automation framework to support customer use case testing scenarios and cross-feature integrations
  • Working knowledge of test harnesses like Robot Framework and of Jinja2 templating
  • Expertise in Scale and Performance Testing using simulation for customer networks
  • Using development infrastructure tools, such as Jenkins, Git, JIRA, etc.
  • Familiarity with Docker containers and VMs is expected
  • Knowledge of Network merchant silicon chipsets and Whitebox platforms
Job Responsibility:
  • Deep understanding of Layer 2/3 protocols like BGP, BGP EVPN, ISIS, SR, MPLS, L3VPN, SRv6, and ability to validate networking functionality and performance through automation
  • Ability to understand and learn Service Provider, Datacenter, and Campus/Enterprise customer solutions
  • Influence the development team to align with customer expectations with respect to deployment and UX needs
  • Creative problem solving and excellent troubleshooting skills
  • Ability to handle multiple tasks and complete them on time
  • Good documentation and presentation skills
What we offer:
  • Generous compensation packages including equity
  • Medical Insurance
  • Parental Leave
  • Sabbatical leave (After 4 years of service)
Employment Type:
Fulltime

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location:
United States, Santa Clara
Salary:
210400.00 - 315600.00 USD / Year
AMD
Expiration Date:
Until further notice
Requirements:
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Solid, hands-on expertise preferred in at least one of three domains: compute, network, or storage
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers with diverse technical disciplines in avenues such as Proof of Concept, Competitive evaluations, Early Field Trials etc
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, hands on experience in RoCEv2 Design, VXLAN-EVPN, BGP, and Lossless Fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on Network deployment expertise and proven track record of delivering large projects on time. Cisco, Juniper or Arista experience is preferred
  • Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
  • Excellent communication with audiences ranging from engineers to mid-management to C-level
Job Responsibility:
  • Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments, and work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain specific knowledge to other groups at AMD, share the lessons learnt to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Senior Engineering Manager DevSecOps

AMD India (SPSE) is looking for a strong Manager to lead an Embedded Software De...
Location:
India, Bangalore
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • 15+ years of overall relevant industry experience and a track record of shipping server/storage/networking products for the enterprise, cloud data center, and service provider markets
  • Prior experience in a customer-facing or applications engineering role is a big plus
  • At least 5 years of experience as a first/second-line manager leading the development of embedded software
  • Deep understanding of full product life cycle, software development methods (both Agile and Waterfall), and development and build environments
  • Ability to undertake loosely defined goals or complex problems to create order, and drive closure
  • Ability to organize, delegate, and effectively deliver to large and complex programs
  • Ability to drive multi-geo projects by working effectively with remote teams
  • Ability to thrive in a fast-paced, highly dynamic environment, with a bias towards action and results
  • Manage major software release deliveries as a release manager
  • Conflict resolution skills, including the ability to bridge style differences
Job Responsibility:
  • Lead an Embedded Software DevSecOps engineering team to deliver modular, quality-oriented, and extensible FW infrastructure
  • Manage resources effectively to deliver commitments on schedule
  • Cultivate a high performing team and constantly raise the bar
  • Closely collaborate with peer development teams, architecture, customer support and product line management
  • Contribute to the vision and strategy of continuous integration, improved development processes, quality and productivity improvements
  • Lead end-to-end DevOps programs from planning through execution, ensuring alignment with business goals
  • Design and implement secure CI/CD pipelines across cloud, hybrid, and on-prem environments
  • Review technical designs and pipeline code for security, scalability, and reliability
  • Establish monitoring and observability frameworks to track performance, adoption, and security posture
  • Identify and mitigate technical risks and process inefficiencies across teams

Senior Platform Service Engineering Manager

In alignment with our Microsoft values, we are committed to cultivating an inclu...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, or systems engineering OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility:
  • People Management and Support
  • Technical Knowledge and Expertise
  • Operational Excellence
  • Collaboration and Knowledge Sharing
Employment Type:
Fulltime

Senior+ Software Engineer - Cloud Availability Platform Engineering (Observability)

We are looking for a highly skilled engineer with deep expertise in building and...
Location:
United States, San Francisco
Salary:
166000.00 - 201000.00 USD / Year
Crusoe
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in infrastructure or platform engineering, with a focus on observability and monitoring systems
  • Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/Opensearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)
  • Strong programming skills in Go or Python for automation, operators, and custom integrations
  • Experience running observability platforms on Kubernetes and operating them at scale across multi-datacenter environments
  • Proven ability to design, optimize, and scale telemetry pipelines handling high cardinality and high throughput data
  • Solid understanding of distributed systems, performance engineering, and debugging complex workloads
  • Strong collaboration skills and the ability to influence engineering teams to adopt observability best practices
Job Responsibility:
  • Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments
  • Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
  • Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
  • Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/Opensearch stacks
  • Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs
  • Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
  • Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
  • Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
  • Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
  • Partnering with engineering teams to embed observability into applications, services, and infrastructure
What we offer:
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment Type:
Fulltime

Director engineering solution architecture

Join Arm’s Solution Architecture Team as a Technical Leader, where you'll drive ...
Location:
United Kingdom, Cambridge
Salary:
Not provided
ARM
Expiration Date:
Until further notice
Requirements:
  • Meaningful experience leading progressive architecture and platform engineering initiatives at scale in a senior or managerial role
  • Advancing datacenter (OpenStack) and cloud (AWS/Azure/GCP) hybrid platforms with autonomous composable infrastructure through Infrastructure as Code (IaC), and dynamically assembled via policy
  • Demonstrated expertise in efficient High-Performance/High-Throughput Computing (HPC/HTC), with a deep understanding of scheduling, tooling, licensing, data and engineering processes (ideally EDA) for compute-intensive workloads across diverse platforms
  • Solid grounding in modern software development practices (SDLC, DevOps, version control, automated testing, CI/CD, code review, collaborative workflows, developer experience)
  • A background in system observability (Dynatrace, OpenTelemetry), SRE, and security (e.g., TLS, OAuth, SPIFFE, zero-trust models), ensuring resilience, self-healing, and compliance
Job Responsibility:
  • Strategic Leadership: Define roadmaps for systems supporting Arm’s engineering platforms. Align Arm's strategic goals with architectural decisions
  • Innovation & Research: Be curious, continuously evaluate new technologies, practices, patterns, and industry trends. Introduce exemplary solutions for visionary platform and operations
  • Architectural Ownership: Be responsible for the design and integration of scalable, secure, and high-performance systems. Lead all aspects of end-to-end lifecycle from conception to delivery
  • Multi-functional Partnership: Engage with senior engineering, product, and IT leaders to understand evolving needs, and craft infrastructure solutions that drive operational excellence
  • Mentorship & Team Development: Inspire and mentor a team of skilled architects. Nurture a culture of technology passion, innovation, and continuous improvement

Senior Staff Engineer, Software Engineering

Our Senior Staff Engineer works with our Staff and Sr. Engineers to innovate and...
Location:
United States, Chevy Chase; Austin; Richardson; Seattle; Palo Alto
Salary:
110000.00 - 260000.00 USD / Year
Geico
Expiration Date:
Until further notice
Requirements:
  • Exemplary ability to design, perform experiments, and influence engineering direction and product roadmap
  • Experience partnering with engineering teams and transferring research to production
  • Track record of publications in credible conferences and journals
  • Experience with continuous delivery and infrastructure as code
  • In-depth knowledge of CS data structures and algorithms
  • Experience solving analytical problems with quantitative approaches
  • Ability to excel in a fast-paced, startup-like environment
  • Knowledge of developer tooling across the software development life cycle (task management, source code, building, deployment, operations, real-time communication)
  • Fluency and specialization in at least two modern languages such as Go, Java, C++, Python, or C#, including object-oriented design
  • Experience with Microservices oriented architecture and extensible REST APIs
Job Responsibility:
  • Focus on multiple areas and provide technical and thought leadership to the enterprise
  • Collaborate with product managers, team members, customers, and other engineering teams to solve our toughest problems
  • Develop and execute technical software development strategy for a variety of domains
  • Accountable for the quality, usability, and performance of the solutions
  • Utilize programming languages like Python, C# or other object-oriented languages, SQL, and NoSQL databases, Container Orchestration services including Docker and Kubernetes, and a variety of Azure tools and services
  • Be a role model and mentor, helping to coach and strengthen the technical expertise and know-how of our engineering and product community. Influence and educate executives
  • Consistently share best practices and improve processes within and across teams
  • Analyze cost and forecast, incorporating them into business plans
  • Determine and support resource requirements, evaluate operational processes, measure outcomes to ensure desired results, demonstrate adaptability, and sponsor continuous learning
What we offer:
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Support for flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment Type:
Fulltime

Senior Software Engineer

The Budget Optimization Engineering team at Microsoft builds the real-time data ...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C#, Java, Go, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
  • 7+ years of technical experience in software development, service engineering, or systems engineering.
  • 5+ years of experience building and operating large-scale distributed systems, backend services, or data platforms with strict SLA requirements.
  • Apache Kafka: solid understanding of consumers, producers, offset management, partition strategies, performance tuning, and cross-datacenter replication patterns.
  • Kubernetes: production experience writing and deploying Helm charts; hands-on with Deployments, StatefulSets, Services, ConfigMaps, Secrets, Jobs, and HPAs; comfortable with multi-cluster and multi-datacenter environments.
  • Cloud infrastructure: practical experience with Azure (AKS, ACR, Azure Key Vault, Azure Application Insights, Azure Log Analytics); familiarity with Azure DevOps or equivalent CI/CD platforms.
Job Responsibility:
  • Design and build highly scalable backend services and data pipelines that support privacy-preserving measurement and analytics scenarios using Java, Python (and C# where applicable).
  • Maintain and improve production services across the optimization platform — including Kafka streaming pipelines, budget controllers, job orchestration (job-broker), and deal monitoring — with a focus on reliability and strict SLA adherence.
  • Drive integrations with external data and measurement partners, designing stable interfaces, schema governance patterns, and robust validation pipelines.
  • Work closely with PMs, data science, privacy, and security teams to translate measurement needs into scalable platform capabilities.
  • Contribute to the full service lifecycle: design, implementation, testing, code review, and deployment.
  • Improve reliability and observability of Kafka consumer/producer pipelines (offset management, retry strategies, delivery guarantees) across cross-datacenter replication flows.
  • Design and implement Kubernetes/Helm deployments for services currently running on legacy orchestration (Maestro, SAND instances, bare Docker), targeting Azure-native cloud infrastructure.
  • Integrate application telemetry (Prometheus/Dropwizard Metrics) with Azure Application Insights and Azure Log Analytics to support production observability and SLA monitoring.
  • Apply practical experience with Azure services — including AKS, ACR, and Azure Key Vault — to support secure, cloud-native deployments.
  • Lead initiatives to make delivery of high-quality software routine and efficient across the full SDLC, from inception and technical design through testing and production operations.
Employment Type:
Fulltime