Software Engineer - Observability

Ireland, Dublin Employment contract 63400.00 - 105700.00 EUR / Year · Job Posted July 05, 2026

Job Description

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.
Microsoft's Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.
Within Azure Data, the messaging and real-time analytics team provides comprehensive solutions and a robust platform that enables users to ingest high-granularity signals (real-time & observability) and complex data, converting those into a competitive advantage in real time for both end users and modern applications.
We are looking for a Software Engineer to help shape the next phase of our Observability Platform. On this team, you'll work on how trillions of signals from Microsoft's intelligent cloud are ingested, processed, and turned into reliable, real-time insights. We own the core telemetry ingestion pipelines that handle more than an exabyte of data every day and underpin observability across Azure, Office, Windows, Xbox, and a broad ecosystem of customer applications. This role sits close to the hard problems of scale, with strict requirements on throughput, reliability, correctness, and operability, and offers meaningful ownership over systems that run continuously at global scale.
Beyond individual contributions, you'll collaborate across teams, mentor engineers, and help establish technical direction and engineering standards as the platform evolves. We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Job Responsibility

  • Design, develop, and operate large-scale, multi-tenant telemetry ingestion pipelines and services (real-time and batch) to handle massive data volumes
  • Build and enhance APIs, tools, and subsystems for telemetry collection, routing, storage, and efficient data access
  • Integrate advanced capabilities (e.g., machine learning–based anomaly detection and data validation) to enhance platform intelligence and insights
  • Own core components of the ingestion and observability platform, driving continuous improvements in reliability, scalability, performance, and data quality
  • Implement robust monitoring, alerting, and diagnostics and ensure production services run reliably, including participation in on-call rotations and incident response
  • Collaborate with partner teams to deliver end-to-end observability solutions and contribute to design reviews and best practices that uphold high engineering standards
  • Embody our culture and values

Requirements

  • Bachelor's degree in computer science or related discipline, or equivalent experience
  • Software development experience, with a demonstrated record of shipping products or services
  • Solid understanding of data structures, algorithms, and system design fundamentals
  • Strong problem-solving and analytical skills, with a structured approach to software design
  • Ability to collaborate effectively in a cross-functional team environment
  • Strong communication skills
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice to have

  • Experience with cloud platforms (Azure, AWS, or Google Cloud)
  • Experience building high-performance, scalable, and high-throughput systems
  • Experience with Service Fabric, AKS, or Azure DevOps
  • Experience debugging and resolving complex production issues
  • Experience working with AI/ML concepts or systems

Similar Jobs for Software Engineer - Observability

8 matching positions

Senior Software Engineer and Software Engineer II

OneDrive and SharePoint are rapidly growing services at the center of Microsoft'...
Location: United States, Redmond
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience related to cloud-scale distributed design and patterns
  • The ability to deliver informed designs and plans ahead of production and execution
  • Knowledge of others' expertise and the ability to involve multiple players (within and outside the organization) in the creation or development of novel products, processes, or research streams
Job Responsibility
  • Design and deliver systems that enable partners and ISVs to migrate from other cloud providers, improve core systems performance and efficiencies, and ensure zero customer impact throughout the change management cycle
  • Deliver systems to meet our business continuity planning goals, provide telemetry for optimizing the service, and drive down our response time for detecting and resolving service issues
  • Create, implement, optimize, debug, refactor, and reuse code to establish and improve performance and maintainability, effectiveness, and return on investment (ROI)
  • Contribute to the identification of dependencies and the development of design documents for a product area with little oversight
  • Help to identify other teams and technologies that will be leveraged, how they will interact, and when one's system may provide support to others
  • Contribute to determining back-end dependencies associated with product, application, service, or platform functionality for product features
  • Understand downstream effects of solutions and work provided
  • Help to identify areas of dependency and overlap with other teams or team members and drive coordination
  • Remain current in skills by investing time and effort into staying abreast of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale
  • Review work items to deepen knowledge of product features in partnership with appropriate stakeholders (e.g., project managers) and execute project plans, release plans, and work items
Employment type: Full-time

Senior Software Engineer, Observability

You will work on core observability systems (metrics, logs, traces) while also d...
Location: India, Bengaluru
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • 8+ years in software engineering, building distributed, high-throughput systems or observability platforms
  • 4+ years of Go/Golang experience; our observability ecosystem is built on Go, making it the most effective language for this role
  • Experience with, or strong interest in, observability tools (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenTracing, OpenMetrics)
  • Deep understanding of distributed systems and data models
  • Hands-on experience with Kubernetes and cloud platforms (AWS, GCP, Azure)
Job Responsibility
  • Extend and integrate open-source observability systems, and when necessary, structurally overhaul core components, such as storage layers and query paths, to enhance the performance, reliability, and usability of these tools at scale
  • Build services to improve performance, usability, reliability, and cost efficiency
  • Implement features like pre-aggregation, downsampling, and sampling to reduce load and accelerate queries
  • Create developer-facing capabilities for metrics, logs, and traces usage, data quality, and cost management
  • Automate onboarding, dashboards, alerting, and tracing
  • Collaborate across platform and infrastructure teams to integrate observability into Roku’s cloud-native stack
What we offer
  • global access to mental health and financial wellness support and resources
  • healthcare (medical, dental, and vision)
  • life, accident, disability, commuter, and retirement options (401(k)/pension)
Employment type: Full-time

Software Engineer, Observability

As a Software Engineer in Observability, you’ll be responsible for our metrics a...
Location: India, Bengaluru
Salary: Not provided
Company: Dialpad
Expiration Date: Until further notice
Requirements
  • Background in Systems and/or Software Engineering
  • Experience in designing, automating, maintaining, and optimizing observability platforms (logging, metrics, and tracing)
  • Experience with configuration management tools such as Ansible, Terraform, etc.
  • Experience with Public Cloud environments such as GCP, AWS, etc.
  • Familiarity with languages such as Python, Go, Rust, etc.
  • Previous direct experience with Grafana, Loki, Prometheus
  • Experience with Linux
  • Experience with Kubernetes (including GKE/EKS) and building containerized applications
  • Undergraduate degree in Computer Science or Engineering
Job Responsibility
  • Develop and improve instrumentation for monitoring and logging the health and availability of services
  • Develop and maintain the observability stack within Dialpad engineering
  • Define best practices and standards around making systems and services measurable, and work with various teams to get those best practices applied
  • Create tools and libraries for other engineering teams to enable them to build self-monitoring capabilities
  • Create and own internal documentation used by the other engineering teams
  • Stay up-to-date with the latest trends in observability, logging, monitoring, and cloud technologies
  • Collaborate with different engineering teams to integrate observability practices into their workflows
  • Participate in a rotating on-call within the larger Infrastructure Engineering division
What we offer
  • Competitive salary
  • comprehensive benefits
  • real opportunities for growth
  • cutting-edge AI tools
  • robust training program
Employment type: Full-time

Senior Software Engineer, Observability

We are looking for an experienced Senior Engineer to join our newly formed Obser...
Location: Germany, Berlin
Salary: Not provided
Company: Aiven Deutschland GmbH
Expiration Date: Until further notice
Requirements
  • Extensive experience with observability concepts at large scale
  • A good grasp of monitoring and observability tools like Prometheus, Grafana, and OpenTelemetry
  • Understanding of SLAs, SLOs, and SLIs
  • Strong knowledge of database fundamentals, including OLAP vs. OLTP, persistence, replication, and clustering
  • Experience with ClickHouse specifically regarding logs, metrics, and OpenTelemetry is highly desirable
  • Experience in building and designing distributed systems in a cloud environment
  • Ability to work with SQL to interact with our platform's master database
  • Deep understanding of release management and testing best practices to own the delivery pipeline
  • A genuine interest in solving complex technical challenges with customer-focused solutions
Job Responsibility
  • Ensure our existing observability offering is up and running all the time
  • Ideate and develop innovative new features that attract our target customer segment, drive product engagement, and ultimately fuel growth
  • Support our existing external customer base by resolving escalated support issues and collaborating with them to understand and solve their needs
  • Guide the team in the hands-on implementation of key platform features, ensuring maintainability and performance
  • Empower your team to act as 'product custodians' by consistently addressing foundational and production issues
  • Practise effective communication and collaboration both within the team and across the wider organization and act as a role model in transparency for your peers
What we offer
  • Participate in Aiven’s equity plan
  • Balance work and life with our hybrid work policy
  • Choose the equipment you need to set yourself up for success
  • Use your Professional Development Plan budget for learning opportunities
  • Receive holistic wellbeing support through our global Employee Assistance Program
  • Inquire about our Global Time Off Commitment (Parental and Sick Leave, as well as Personal Time)
  • Enjoy country-specific benefits for our global cast
Employment type: Full-time

Sr/Staff Software Engineer, Observability

We are looking for a highly skilled engineer with deep expertise in building and...
Location: United States, San Francisco
Salary: 172000.00 - 253000.00 USD / Year
Company: Crusoe
Expiration Date: Until further notice
Requirements
  • 6+ years of experience with distributed systems, with a focus on observability and monitoring systems
  • Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/OpenSearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)
  • Strong programming skills in Go or Python for automation, operators, and custom integrations
  • Experience running observability platforms on Kubernetes and operating them at scale across multi-datacenter environments
  • Proven ability to design, optimize, and scale telemetry pipelines handling high cardinality and high throughput data
  • Solid understanding of distributed systems, performance engineering, and debugging complex workloads
  • Familiarity with service meshes, networking, and workload instrumentation (Envoy, Istio, OpenTelemetry SDKs)
  • Strong collaboration skills and the ability to influence engineering teams to adopt observability best practices
Job Responsibility
  • Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments
  • Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
  • Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
  • Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks
  • Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs
  • Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
  • Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
  • Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
  • Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
  • Mentoring engineers and shaping Crusoe’s observability strategy and technical roadmap
What we offer
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment type: Full-time

Backend Software Engineer, Observability Product (Agent)

NetBox Labs is seeking a Backend Software Engineer to join our rapidly expanding...
Location: United States; United Kingdom
Salary: 165000.00 - 195000.00 USD / Year
Company: NetBox Labs
Expiration Date: Until further notice
Requirements
  • Deep knowledge of the OSI framework, networks and protocols - esp. DPI, SNMP, sFlow/NetFlow, gNMI
  • Linux system and network programming experience (e.g. system calls, IPC, processes, threads, sockets)
  • Experience with C++ (and/or Rust), as well as Go and Python
  • Experience with eBPF helpful
  • 5+ years of professional experience as a software engineer, and 2+ years in a startup environment
  • Experience in distributed systems and backend microservices development
  • Strong understanding of gRPC, protobuf, event-driven architecture, and streaming data systems
  • Experience with Redis streams, Kafka, MQTT, AMQP or other messaging systems
  • Familiarity with programmatic interaction with network infrastructure via APIs, SSH/CLI automation (e.g., Netmiko, NAPALM), or other network automation frameworks
  • Familiarity with observability concepts (metrics, logs, traces) and related protocols, especially OpenTelemetry
Job Responsibility
  • Work with a full stack team to build and maintain open source, source available, and closed source software across our observability project portfolio – shipping to the community and delivering into our commercial cloud and on‑premise products
  • Integrate closely with NetBox’s data model to drive workflows for reconciling observed vs intended state and enriching telemetry and monitoring data
  • Define and maintain data schemas and APIs shared across products
  • Ensure observability systems meet scalability and reliability goals (SLAs/SLOs)
  • Implement testing, CI/CD automation, and code quality standards across observability services
What we offer
  • Offers Equity
  • Offers Bonus
Employment type: Full-time

Senior Software Engineer - Observability

As a Senior Software Engineer, you will be directly responsible for Palantir’s o...
Location: United States, New York
Salary: 135000.00 - 200000.00 USD / Year
Company: Palantir Technologies
Expiration Date: Until further notice
Requirements
  • 5+ years of professional software development experience
  • 2+ years of experience contributing to the system design or architecture (architecture, design patterns, reliability and scaling) of new and existing systems
  • 1+ years of experience as a mentor or tech lead, or leading an engineering team
  • Strong coding skills in Go, Java, or equivalent
  • Experience designing, building, and operating high-scale observability or infrastructure systems
  • Bachelor's degree in Computer Science or equivalent
  • Active US Security clearance, or eligibility and willingness to obtain a US Security clearance
Job Responsibility
  • Partner with our extended leadership team to set and define a technical strategy for your team aligned with the wider team strategy
  • Build and champion a long-term tech roadmap to reduce operational burden, ensure scalability, reduce risk, and guide your team towards step-changes whenever possible
  • Be technically involved and engage in substantive discussion when reviewing technical roadmaps and project implementation with the team
  • Work closely with teammates and stakeholders to enable sustainable and timely delivery of technical solutions to address business needs
  • Facilitate partnerships between engineering teams and operators to build innovative products that help Palantir scale
  • Act as a multiplier for other engineers on the team. Define where the technical bar should be, and help engineers achieve it. Lead engineers and accelerate their growth by providing thoughtful feedback and technical mentorship and by effectively managing performance
  • Foster a non-hierarchical exchange of ideas, valuing the idea rather than the individual who communicates it
What we offer
  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Relocation assistance
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
Employment type: Full-time

Senior Software Engineer - Observability and Reliability

We are growing the engineering team and looking for engineers who have the chops...
Location: United States, San Francisco
Salary: 150000.00 - 220000.00 USD / Year
Company: Sigma Computing
Expiration Date: Until further notice
Requirements
  • Strong Computer Science fundamentals
  • 5+ years industry experience building and maintaining high-quality software, especially software other engineers use
  • You apply a product mindset to infrastructure systems and feel accomplished enabling others
  • Desire to be a great teammate and have fun at work
  • Strong sense of craftsmanship, and a healthy academic curiosity
Job Responsibility
  • Build observability tools and platforms, including: metrics, logging, distributed tracing, dashboarding, alerting, application performance management
  • Build with modern tools and languages like Go, OpenTelemetry, and Kubernetes
  • Participate in on-call rotation and ensure uptime of services
  • Create runtime tools/processes that optimize cloud triaging and limit downtime
  • Define best practices around making our systems and services measurable
  • Collaborate with peers and stakeholders through design and code reviews to ensure best practices amongst available technologies. We expect successful candidates to be coding a majority of their time
What we offer
  • Equity
  • Generous health benefits
  • Flexible time off policy
  • Paid bonding time for all new parents
  • Traditional and Roth 401k
  • Commuter and FSA benefits
  • Lunch Program
  • Dog friendly office
Employment type: Full-time