Principal Engineer I - Cloud Observability

Confluent

Location:
India

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We’re not just building better tech. We’re rewriting how data moves and what the world can do with it. With Confluent, data doesn’t sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them. It takes a certain kind of person to join this team. Those who ask hard questions, give honest feedback, and show up for each other. No egos, no solo acts. Just smart, curious humans pushing toward something bigger, together. One Confluent. One Team. One Data Streaming Platform.

Job Responsibility:

  • You will work with a team of engineers and architects to help evolve Confluent Observability features
  • Work closely with product management, engineering leadership, and other key stakeholders across various teams in Confluent to build and drive the overall roadmap
  • Be a strong technical voice for Confluent Observability across the rest of Confluent
  • Influence the overall domain health and operational hygiene for Confluent Observability
  • Act as the technical champion for the observability capabilities we provide to our customers
  • Review designs and code, and raise our technical standards
  • Lead the technology charter for our observability features in Confluent Cloud and in hybrid scenarios with Confluent Platform
  • Mentor a team of high-performing engineers and leads, helping them keep growing their skill sets through hands-on experience and coaching
  • Be a strong technical leader and representative for engineering teams in India
  • Provide timely and productive feedback, encourage a growth mindset, and advise team members in setting and working toward personal development goals
  • Nurture a culture of excellence on the team through a focus on hiring, communication, execution, and work quality
  • Create and manage processes that enable the team to do its best work

Requirements:

  • 15+ years of hands-on software development experience, with the ability to anticipate future technical needs for the product and craft plans to realize them
  • A track record of taking ideas to production
  • Ready to roll up your sleeves (code, debug, design) and do whatever it takes to ship the product to production
  • Experience building and operating large-scale systems. Solid understanding of basic systems operations (disk, network, operating systems, etc.). Experience running production services in the cloud
  • Strong fundamentals in distributed systems design and development. Solid fundamentals in concurrent and multi-threaded programming
  • A self-starter with the ability to work effectively in teams. You proactively identify the symptoms of technical issues, reason about their causes, and then fix the root causes
  • Timely shipping of deliverables, trading off short-term technical decisions against long-term goals. Move fast, build in increments, and iterate. A sense of urgency, a results-oriented mindset, and excellent prioritization skills
  • Ability to influence the team, peers and upper management in technology decisions using effective communication and collaborative techniques
  • Degree in Computer Science, Engineering or equivalent experience. Understanding of various technologies, programming paradigms and frameworks is needed. Ability to be pragmatic and trade off their usage in production is essential
  • Ability to take on intense customer production issues while on call, debugging and mitigating them. This requires patient log and metrics analysis with solid reasoning to pin down the issue

Nice to have:

  • Experience in designing and developing effective solutions for systems observability problems, including effective enablement of metrics, logging, events, or traces capabilities
  • Experience using and operating Apache Kafka, Apache Flink, Apache Druid, and OpenSearch is a big plus
  • Interest in evangelism (giving talks at tech conferences, writing blog posts evangelizing Kafka)
  • Experience working on stream processing technology or query processing systems

What we offer:

  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth

Additional Information:

Job Posted:
May 04, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Principal Engineer I - Cloud Observability

Principal Cloud Infrastructure Engineer

As Highspot continues to scale rapidly, building a robust and efficient platform...
Location:
United States, Seattle
Salary:
188696.00 - 282609.00 USD / Year
Highspot
Expiration Date:
Until further notice
Requirements:
  • 15+ years of experience in software or infrastructure engineering
  • At least 5 years focused on platform engineering or cloud infrastructure at scale
  • Proven success designing and operating internal developer platforms in AWS environments
  • Expert-level experience with Kubernetes, including provisioning, cluster lifecycle management, workload orchestration, and multi-tenant design
  • Strong expertise in Terraform, GitOps tools (e.g., ArgoCD), and CI/CD systems (e.g., GitHub Actions, Spinnaker)
  • Deep understanding of cloud networking, IAM, service meshes, and container orchestration at scale
  • Familiar with the CNCF landscape and how to leverage open-source tools to solve platform problems
  • Passion for developer experience
  • Track record of technical leadership, mentoring, and influencing engineering culture at a large scale
  • Bachelor's or Master’s in Computer Science or related discipline, or equivalent practical experience
Job Responsibility:
  • Design and build scalable platform capabilities that empower engineering teams to ship features reliably, securely, and quickly
  • Create and maintain developer-facing tools and paved paths (e.g., CI/CD pipelines, Kubernetes platforms, observability stacks, secrets management)
  • Implement Infrastructure-as-Code and GitOps patterns to promote consistency, automation, and compliance across environments
  • Collaborate with product, security, and compliance stakeholders to build platform services that meet SLAs and governance standards
  • Drive efforts to standardize and simplify infrastructure across cloud environments (AWS, Azure), enabling secure multi-cloud operation
  • Lead incident response, reliability engineering, and observability improvements that ensure platform uptime and performance
  • Act as a technical mentor and thought leader, guiding teams on infrastructure architecture, platform adoption, and best practices
  • Define and execute on a strategic roadmap to evolve the internal platform in line with company growth and technology direction
What we offer:
  • Comprehensive medical, dental, vision, disability, and life benefits
  • Health Savings Account (HSA) with employer contribution
  • 401(k) Matching with immediate vesting on employer match
  • Flexible PTO
  • 8 paid holidays and 5 paid days for Annual Holiday Week
  • Quarterly Recharge Fridays (paid days off for mental health recharge)
  • 18 weeks paid parental leave
  • Access to Coaches and Therapists through Modern Health
  • 2 volunteer days per year
  • Commuting benefits
Employment Type:
Fulltime

Principal Cloud Infrastructure Engineer

As Highspot continues to scale rapidly, building a robust and efficient platform...
Location:
Canada, Vancouver
Salary:
170435.00 - 230435.00 CAD / Year
Highspot
Expiration Date:
Until further notice
Requirements:
  • 15+ years of experience in software or infrastructure engineering
  • At least 5 years focused on platform engineering or cloud infrastructure at scale
  • Proven success designing and operating internal developer platforms in AWS and/or Azure environments
  • Expert-level experience with Kubernetes, including provisioning, cluster lifecycle management, workload orchestration, and multi-tenant design
  • Strong expertise in Terraform, GitOps tools (e.g., ArgoCD), and CI/CD systems (e.g., GitHub Actions, Spinnaker)
  • Deep understanding of cloud networking, IAM, service meshes, and container orchestration at scale
  • Familiar with the CNCF landscape and how to leverage open-source tools to solve platform problems
  • Passion for developer experience
  • Track record of technical leadership, mentoring, and influencing engineering culture at a large scale
  • Bachelor's or Master’s in Computer Science or related discipline, or equivalent practical experience
Job Responsibility:
  • Design and build scalable platform capabilities that empower engineering teams to ship features reliably, securely, and quickly
  • Create and maintain developer-facing tools and paved paths (e.g., CI/CD pipelines, Kubernetes platforms, observability stacks, secrets management)
  • Implement Infrastructure-as-Code and GitOps patterns to promote consistency, automation, and compliance across environments
  • Collaborate with product, security, and compliance stakeholders to build platform services that meet SLAs and governance standards
  • Drive efforts to standardize and simplify infrastructure across cloud environments (AWS, Azure), enabling secure multi-cloud operation
  • Lead incident response, reliability engineering, and observability improvements that ensure platform uptime and performance
  • Act as a technical mentor and thought leader, guiding teams on infrastructure architecture, platform adoption, and best practices
  • Define and execute on a strategic roadmap to evolve the internal platform in line with company growth and technology direction
What we offer:
  • Comprehensive medical, dental, vision, disability, and life benefits
  • Group Retirement Savings Plan (RRSP) and matching employer contributions (DPSP) with immediate vesting
  • Flexible PTO
  • Generous Holiday Schedule + 5 Days for Annual Holiday Week
  • Quarterly Recharge Fridays (paid days off for mental health recharge)
  • Flexible work schedules
  • Access to Coaches and Therapists through Modern Health
  • 2 Volunteer days per year
  • Monthly transportation allowance for employees that work in our Vancouver Hub location
  • Eligible for bonuses and stock options
Employment Type:
Fulltime

Principal Platform Engineer

Principal Platform Engineer role at Endor Labs building the Application Security...
Location:
India, Bengaluru
Salary:
Not provided
Endor Labs
Expiration Date:
Until further notice
Requirements:
  • 12+ years of Site Reliability Engineering or Platform Engineering experience
  • Deep hands-on expertise with Kubernetes and CNCF ecosystem in production environments
  • Significant experience with at least one major cloud provider (Azure, Google Cloud, or AWS)
  • Strong experience managing large infrastructure deployments using Terraform, OpenTofu, or Terragrunt
  • Hands-on experience with open source observability tools (Prometheus, Grafana, Mimir, Pyroscope)
  • Self-driven problem solver with initiative
  • Customer-focused engineering mindset
  • Clear communication skills across technical and non-technical audiences
Job Responsibility:
  • Build Cloud Infrastructure at Scale on Azure, Google Cloud, and AWS
  • Master Kubernetes & CNCF Ecosystem with multi-tenant clusters
  • Scale Observability Platform with Prometheus, Grafana, Mimir, and Pyroscope
  • Transform Developer Experience with self-service tools and automation
  • Drive Infrastructure as Code with Terraform/OpenTofu
  • Solve Complex Technical Challenges like zero-downtime migrations and cost optimization
  • Collaborate Across Teams with Security, Backend, and Product Engineering
  • Iterate and Innovate in a fast-paced environment
Employment Type:
Fulltime

Principal Data Engineer

PointClickCare is searching for a Principal Data Engineer who will contribute to...
Location:
United States
Salary:
183200.00 - 203500.00 USD / Year
PointClickCare
Expiration Date:
Until further notice
Requirements:
  • Principal Data Engineer with at least 10 years of professional experience in software or data engineering, including a minimum of 4 years focused on streaming and real-time data systems
  • Proven experience driving technical direction and mentoring engineers while delivering complex, high-scale solutions as a hands-on contributor
  • Deep expertise in streaming and real-time data technologies, including frameworks such as Apache Kafka, Flink, and Spark Streaming
  • Strong understanding of event-driven architectures and distributed systems, with hands-on experience implementing resilient, low-latency pipelines
  • Practical experience with cloud platforms (AWS, Azure, or GCP) and containerized deployments for data workloads
  • Fluency in data quality practices and CI/CD integration, including schema management, automated testing, and validation frameworks (e.g., dbt, Great Expectations)
  • Operational excellence in observability, with experience implementing metrics, logging, tracing, and alerting for data pipelines using modern tools
  • Solid foundation in data governance and performance optimization, ensuring reliability and scalability across batch and streaming environments
  • Experience with Lakehouse architectures and related technologies, including Databricks, Azure ADLS Gen2, and Apache Hudi
  • Strong collaboration and communication skills, with the ability to influence stakeholders and evangelize modern data practices within your team and across the organization
Job Responsibility:
  • Lead and guide the design and implementation of scalable streaming data pipelines
  • Engineer and optimize real-time data solutions using frameworks like Apache Kafka, Flink, Spark Streaming
  • Collaborate cross-functionally with product, analytics, and AI teams to ensure data is a strategic asset
  • Advance ongoing modernization efforts, deepening adoption of event-driven architectures and cloud-native technologies
  • Drive adoption of best practices in data governance, observability, and performance tuning for streaming workloads
  • Embed data quality in processing pipelines by defining schema contracts, implementing transformation tests and data assertions, enforcing backward-compatible schema evolution, and automating checks for freshness, completeness, and accuracy across batch and streaming paths before production deployment
  • Establish robust observability for data pipelines by implementing metrics, logging, and distributed tracing for streaming jobs, defining SLAs and SLOs for latency and throughput, and integrating alerting and dashboards to enable proactive monitoring and rapid incident response
  • Foster a culture of quality through peer reviews, providing constructive feedback and seeking input on your own work
What we offer:
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
Employment Type:
Fulltime

Senior Principal Cloud Developer

The role involves designing and building innovative Agentic AI applications and ...
Location:
United States, San Jose
Salary:
157500.00 - 361500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 10-15 years of experience in developing highly scalable cloud and cloud-native applications using technology stacks, architecture, design, development, and support
  • at least one year of recent multi-agent Agentic and RAG GenAI Software Development experience applied to Networking and/or Observability domains
  • experience developing Network Observability software for large scale Network Monitoring, Network Performance, Network Configuration or Network Capacity Management products
  • deep understanding and experience in Networking Protocol and Networking Best Practices for Enterprise and Service Provider networks
  • proven skills and programming experience in Golang, scalable concurrent processing, REST, Data Caching Services, DB schema design and data access technologies
  • experience in building, orchestrating, and deploying highly scalable REST based stateless APIs/web services for web applications in Kubernetes environment
  • familiarity with code versioning tools such as Git
  • knowledge of Network and NetFlow Logs processing and indexing
  • ability to communicate with senior Executives and with customers
Job Responsibility:
  • design and build large scale distributed systems
  • apply best practices for high availability, scalability, resilience, performance, and security requirements in the cloud
  • transition proof-of-concept implementations into R&D teams to accelerate new product delivery
  • create technical content such as designs, specifications, and initial software implementations
  • mentor less-experienced staff members
  • collect product feedback from field interactions to provide input into Engineering and Product Management
  • maintain knowledge of OpsRamp SaaS product and roadmap, as well as competition
  • collaborate with product team to translate functional requirements into technical solutions
  • develop monitoring solutions using tools and services that are part of the cloud infrastructure
  • facilitate CI/CD by integrating development processes
What we offer:
  • comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • personal and professional development programs
  • unconditional inclusion and flexibility to manage work and personal needs
Employment Type:
Fulltime

Principal Data Engineer

We are on the lookout for a Principal Data Engineer to help define and lead the ...
Location:
United Kingdom
Salary:
Not provided
Dotdigital
Expiration Date:
Until further notice
Requirements:
  • Extensive experience delivering Python-based projects in the data engineering space
  • Extensive experience working with SQL and NoSQL database technologies (e.g. SQL Server, MongoDB & Cassandra)
  • Proven experience with modern data warehousing and large-scale data processing tools (e.g. Snowflake, dbt, BigQuery, ClickHouse)
  • Hands-on experience with data orchestration tools like Airflow, Dagster or Prefect
  • Experience using cloud environments (e.g. Azure, AWS, GCP) to process, store and surface large-scale data
  • Experience using Kafka or similar event-based architectures (e.g. Pub/Sub via AWS SQS, Azure Event Hubs, AWS Kinesis)
  • Strong grasp of data architecture and data modelling principles for both OLAP and OLTP workloads
  • Capable in the wider software development lifecycle in terms of agile ways of working and continuous integration/deployment of data solutions
  • Experience as a lead or Principal Engineer on large-scale data initiatives or product builds
  • Demonstrated ability to architect data systems and data structures for high volume, high throughput systems
Job Responsibility:
  • Lead the design and implementation of scalable, secure and resilient data systems across streaming, batch and real-time use cases
  • Architect data pipelines, models and storage solutions that power analytical and product use cases, using primarily Python and SQL via orchestration tooling that runs workloads in the cloud
  • Leverage AI to automate both data processing and engineering processes
  • Assure and drive best practices relating to data infrastructure, governance, security and observability
  • Work with technologists across multiple teams to deliver coherent features and data outcomes
  • Support the data team to help adopt data engineering principles
  • Identify, validate and promote new tools and technologies that improve the performance and stability of data services
What we offer:
  • Parental leave
  • Medical benefits
  • Paid sick leave
  • Dotdigital day
  • Share reward
  • Wellbeing reward
  • Wellbeing Days
  • Loyalty reward
Employment Type:
Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Employment Type:
Fulltime

Principal Engineer, AI Strategy and Innovation

Shape the architecture and execution of CLEAR’s AI platform strategy, from infra...
Location:
United States, New York
Salary:
250000.00 - 290000.00 USD / Year
Clear
Expiration Date:
Until further notice
Requirements:
  • 12+ years in software engineering and/or technical experience with deep expertise in AI systems, ML platforms, and data infrastructure
  • At least 5 years of experience with various AI technologies including GenAI, ML, Deep Learning, RPA or others
  • Proven ability to scale AI capabilities into high-throughput, low-latency environments
  • Strong technical background in cloud-native architectures (AWS or similar) and modern AI/ML stacks (TensorFlow/PyTorch, MLflow, RAG, MCP, etc.)
  • Experience leading AI strategy and platform adoption in enterprise-scale environments
  • Skilled at translating regulatory and compliance requirements into responsible AI practices
  • Track record of partnering closely with Product, Engineering, Analytics, and Security teams as well as business executives
  • Excellent communicator who can set a vision for AI, explain technical trade-offs, and influence executives, peers, and partners
  • Passionate about embedding AI into core products to deliver measurable impact for members and enterprise partners
Job Responsibility:
  • Define and scale CLEAR’s AI strategy: spanning data pipelines, ML lifecycle management, and intelligent applications
  • Lead engineering execution for AI models (development, deployment, monitoring, retraining) with a focus on reliability, observability, and ethical AI practices
  • Modernize analytics and intelligence systems to deliver predictive insights and partner-facing transparency in real time
  • Operationalize trust in AI by embedding privacy, compliance, and security into all platforms and workflows
  • Influence cross-functional stakeholders across the business, fostering a culture of technical rigor, collaboration, and innovation, advising C Suite executives, leaders, and individual contributors
  • Lead the AI Governance group and drive best practices across business functions
  • Track and optimize KPIs on AI adoption, model performance, scalability, and business impact
What we offer:
  • Comprehensive healthcare plans
  • Family-building benefits (fertility and adoption/surrogacy support)
  • Flexible time off
  • Annual wellness stipend
  • Free OneMedical memberships for you and your dependents
  • A CLEAR Plus membership
  • A 401(k) retirement plan with employer match
  • Catered lunches every day
  • Fully stocked kitchens
  • Stipends and reimbursement programs for well-being and learning & development
Employment Type:
Fulltime