Observability Engineer – Splunk Focus

Inetum
https://www.inetum.com

Location:
Portugal, Lisbon

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Join our growing Monitoring team! As a Splunk Specialist, you will collaborate closely with colleagues across all regions and interact with various internal teams to support and enhance our monitoring capabilities.

Job Responsibility:

  • Provide support for monitoring tools: Splunk (Enterprise & ITSI), OpenTelemetry, Cribl, SolarWinds, Dynatrace
  • Automate daily tasks using Ansible
  • Assist development and production teams in migrating to the new Splunk Enterprise and ITSI platforms
  • Build dashboards and define relevant metrics
  • Propose and implement improvements across tools, processes, and KPIs

Requirements:

  • Proven expertise in Splunk Enterprise
  • Strong experience with Splunk ITSI
  • Knowledge of Cribl
  • Ability to design and implement Splunk dashboards
  • Familiarity with automation tools (e.g., Ansible)
  • Experience working in multi-regional teams is a plus

Nice to have:

French language skills

Additional Information:

Job Posted:
July 25, 2025

Employment Type:
Full-time

Similar Jobs for Observability Engineer – Splunk Focus

Monitoring & Observability Engineer

The Monitoring & Observability Engineer is a senior level position responsible f...
Location:
India, Chennai; Pune
Salary:
Not provided
Citi
https://www.citi.com/
Expiration Date:
Until further notice
Requirements:
  • 3-7 years of relevant experience in an Engineering & IT role
  • 2+ years of hands-on working experience, including a strong understanding of UI/UX principles and best practices
  • Proficient in JavaScript, TypeScript, HTML, CSS, React, and Node.js
  • Experience with backend technologies and databases (e.g., MongoDB)
  • Experience with Python Programming
  • Experience with version control systems (e.g., Git)
  • Strong problem-solving and analytical skills
  • Excellent communication and collaboration skills
  • Create modular and reusable React components to streamline development and maintain consistency across the application
  • Continuously improve existing applications, addressing bugs, and implementing new features
Job Responsibility:
  • Drive best-in-class monitoring using a range of tools across all regions of the Global Consumer Bank
  • Drive POCs and incubate new features and capabilities
  • Be forward-looking and ensure long-term strategic success
  • Work closely with the monitoring operations teams, production support, performance test teams, operations, and application owners to deliver best-in-class monitoring
  • Explain complicated performance bottlenecks to stakeholders
  • Understand complicated application architecture, including Java app servers, Web Servers, Cloud (PCF, AWS, Google), Kubernetes, TIBCO, mainframe
  • Build advanced dashboards and queries
  • Be a subject matter expert for the Global Consumer Bank, including conducting brown bags and office hours
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
Employment Type:
Full-time

Lead Systems Operations Engineer - Platform Reliability Engineering, SRE, Observability and Monitoring, Platform Support

Wells Fargo is seeking a Lead Systems Operations Engineer. Platform Reliability ...
Location:
India, Bengaluru
Salary:
Not provided
Wells Fargo
https://www.wellsfargo.com/
Expiration Date:
May 21, 2026
Requirements:
  • 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of experience in Systems Operations, SRE, Platform Engineering, or Production Support with deep expertise in at least one platform domain: Database, Cloud, Network, Compute/Storage, Middleware, or Enterprise Application Support
  • Strong hands-on experience applying SRE practices, including SLI/SLO definition, error budgets, and reliability metrics
  • Proven experience troubleshooting and resolving large-scale, distributed production systems
  • Hands-on experience with observability and monitoring tools such as Grafana, Splunk, Prometheus, Cribl, ThousandEyes, AppDynamics, or equivalent, including dashboards, alerting, logs, and metrics
  • Strong scripting and automation skills using Python, Bash, and/or PowerShell to reduce operational toil
  • Experience building automation or reliability tooling using APIs, Git-based workflows, and modern engineering practices
  • Solid understanding of incident, problem, and change management in enterprise production environments
  • Strong communication and influencing skills across engineering teams and senior leadership
  • Experience with capacity management, performance engineering, and resiliency design (HA, fault tolerance, RTO/RPO)
Job Responsibility:
  • Lead complex, broad-impact initiatives including provision of high-level systems consultation for the technology teams
  • Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
  • Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
  • Make decisions on technical changes and enhancements
  • Consult with engineering team on change design requiring solid understanding of technical process controls or standards that influence and drive new initiatives
  • Collaborate and consult with technical peers, colleagues, and mid to more experienced level managers to resolve systems support issues and achieve goals
Employment Type:
Full-time

Principal Architect - Cloud and Observability

We're building a world of health around every individual — shaping a more connec...
Location:
United States
Salary:
144200.00 - 288400.00 USD / Year
CVS Health
https://www.cvshealth.com/
Expiration Date:
June 29, 2026
Requirements:
  • 10+ years in infrastructure, cloud architecture, platform engineering, or SRE
  • 8+ years of architecture work in observability, cloud infrastructure, or both at a large enterprise
  • Solid experience with at least two of Azure, AWS, or GCP -- including networking, identity, compute, and storage
  • 5+ years with Kubernetes in production (OpenShift, EKS, AKS, or GKE)
  • 5+ years with OpenTelemetry or similar frameworks (collectors, SDKs, semantic conventions, pipeline design)
  • 5+ years with observability platforms: Grafana/Mimir/Loki/Tempo, Prometheus, Datadog, Splunk, Dynatrace, or comparable tools
  • Experience defining SLOs/SLIs and building alerting strategies at an organizational level
  • Proven track record writing architecture standards that other teams adopted and followed
  • Able to communicate clearly with both engineers and senior leadership
Job Responsibility:
  • Own the enterprise observability reference architecture covering metrics, logs, traces, and events across all environments (cloud and on-prem)
  • Drive the OpenTelemetry-first instrumentation strategy -- standard libraries, semantic conventions, collector topologies (DaemonSet, gateway, sidecar), and pipeline design
  • Build and operate telemetry pipelines on Grafana Mimir, Loki, and Tempo, including multi-tenant configurations, retention policies, and capacity planning
  • Define how we measure reliability: SLOs, SLIs, error budgets, and alerting frameworks -- consistently across all lines of business
  • Own the integration between observability tooling and incident management (ServiceNow ITOM, xMatters)
  • Drive telemetry schema standards to ensure teams emit data that is useful downstream, not just technically compliant
  • Build and maintain reference architectures for our hybrid footprint: OpenShift on-prem with KVM/libvirt and Dell PowerFlex storage, plus Azure, AWS, and GCP
  • Lead standards work around workload identity and federation using SPIFFE/SPIRE and cloud-native IAM patterns to move away from static secrets
  • Provide guidance on compute runtime selection -- containers vs. VMs vs. bare metal vs. serverless -- with a clear decision framework for teams
  • Help teams connect autoscaling and capacity planning behavior to actual telemetry signals
What we offer:
  • medical, dental, and vision coverage
  • paid time off
  • retirement savings options
  • wellness programs
  • other resources, based on eligibility
  • bonus, commission or short-term incentive program
  • equity award program
Employment Type:
Full-time

Senior Lead Systems Operations Engineer

Wells Fargo is seeking a Senior Lead Systems Operations Engineer.
Location:
India, Bengaluru
Salary:
Not provided
Wells Fargo
https://www.wellsfargo.com/
Expiration Date:
May 24, 2026
Requirements:
  • 7+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 7+ years of experience in Systems Operations, SRE, Platform Engineering, or Production Support with deep expertise in at least one platform domain: Database, Cloud, Network, Compute/Storage, Middleware, or Enterprise Application Support
  • Strong hands-on experience applying SRE practices, including SLI/SLO definition, error budgets, and reliability metrics
  • Proven experience troubleshooting and resolving large-scale, distributed production systems
  • Hands-on experience with observability and monitoring tools such as Grafana, Splunk, Prometheus, Cribl, ThousandEyes, AppDynamics, or equivalent, including dashboards, alerting, logs, and metrics
  • Strong scripting and automation skills using Python, Bash, and/or PowerShell to reduce operational toil
  • Experience building automation or reliability tooling using APIs, Git-based workflows, and modern engineering practices
  • Solid understanding of incident, problem, and change management in enterprise production environments
  • Strong communication and influencing skills across engineering teams and senior leadership
  • Experience with capacity management, performance engineering, and resiliency design (HA, fault tolerance, RTO/RPO)
Job Responsibility:
  • Act as an advisor to senior leadership to develop or influence platform support solutions for highly complex business and technical needs or technology initiatives
  • Lead highly complex, broad-impact initiatives including provision of high-level systems consultation for the technology teams related to large-scale planning of computer systems and network infrastructure for Systems Operations functional areas
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to senior leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Provide training and mentoring to less experienced team members on guidebook changes and lead team to meet technical deliverables, while leveraging solid understanding of technical process controls or standards
  • Act as a Platform Reliability Engineering (PRE) subject matter expert, providing deep technical leadership in one core domain (Database, Cloud, Network, Compute/Storage, Middleware, or Application Support)
  • Lead analysis and resolution of complex, systemic production reliability issues, translating recurring incidents into long-term engineering solutions
Employment Type:
Full-time

DevOps Engineer

Radix is building the most trusted data and analytics platform in multifamily. J...
Location:
Kosovo, Prishtine
Salary:
Not provided
Radix (AZ)
radix.com
Expiration Date:
Until further notice
Requirements:
  • Proven experience in a DevOps, SRE, or infrastructure-focused engineering role with a strong understanding of CI/CD concepts and tools such as Jenkins, GitLab CI, or CircleCI
  • Hands-on experience working in cloud environments like AWS, Azure, or Google Cloud, and comfort designing, deploying, and managing scalable cloud infrastructure
  • Proficiency in scripting languages such as Python, Bash, or Ruby to automate tasks and improve operational efficiency
  • Practical experience with containerization using Docker and orchestration using Kubernetes
  • Experienced with configuration management tools such as Ansible, Chef, or Puppet to maintain consistent environments
  • Naturally approach problems with curiosity, seeking to understand root causes and explore innovative solutions
  • Demonstrate resilience in fast-moving, high-growth environments and remain effective when priorities or conditions change quickly
  • Adapt easily to ambiguity, shifting requirements, and evolving technologies, adjusting your approach with confidence
  • Thrive in a startup environment where ownership, iteration, and continuous improvement are core to how you work
  • Bring additional value through familiarity with infrastructure-as-code tools (Terraform, CloudFormation, Pulumi), cloud security best practices, and observability tools such as Prometheus, Grafana, ELK, or Splunk
Job Responsibility:
  • Design and maintain CI/CD pipelines that accelerate software delivery and improve release reliability
  • Collaborate with engineering teams to streamline development workflows and strengthen DevOps best practices
  • Ensure high availability, scalability, and performance across production and development environments
  • Implement infrastructure as code (IaC) using Terraform, Ansible, CloudFormation, or similar tools
  • Enhance the security and compliance posture of Radix infrastructure and applications
  • Troubleshoot and resolve issues across development, staging, and production systems with urgency and clarity
  • Build and maintain monitoring, logging, and alerting systems to proactively detect and respond to incidents
  • Improve observability and system visibility to support data-driven operational decisions
  • Introduce automation to reduce operational toil and increase engineering efficiency
  • Stay current with emerging DevOps technologies and recommend improvements to infrastructure, tooling, and processes
What we offer:
  • Medical, dental and vision coverage designed to support your wellbeing
  • Unlimited PTO
  • Pre-IPO Equity
  • Performance Bonus
  • Learn From the Best
  • Build Category-Defining Products

Site Reliability Engineering (SRE) / Observability Technical Lead

Join a dynamic team as a Site Reliability Engineer, leading observability and re...
Location:
United Kingdom, London
Salary:
Not provided
NTT DATA
nttdata.com
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
  • Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
  • Hands-on experience with OpenTelemetry (OTel) for distributed tracing and observability instrumentation
  • Strong proficiency in Infrastructure as Code (IaC) using Terraform
  • Solid understanding of cloud platforms including AWS, GCP, or Azure
  • Experience with automation/configuration management tools like Ansible, Chef, or Puppet
  • Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
  • Experience managing Kubernetes and containerized environments (Docker, Helm)
  • Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
  • Excellent leadership, communication, and collaboration skills
Job Responsibility:
  • Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
  • Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
  • Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
  • Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
  • Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
  • Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
  • Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
  • Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
  • Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence
What we offer:
  • Tailored benefits that support your physical, emotional, and financial wellbeing
  • Continuous growth and development opportunities
  • Flexible work options
Employment Type:
Full-time

Site Reliability Engineering (SRE) / Lead Engineer

We are currently seeking a Site Reliability Engineering (SRE) / Lead Engineer to...
Location:
Mexico, Guadalajara
Salary:
Not provided
NTT DATA
nttdata.com
Expiration Date:
Until further notice
Requirements:
  • 8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
  • Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation
  • Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
  • Strong proficiency in Infrastructure as Code (IaC) using Terraform
  • Solid understanding of cloud platforms including AWS, GCP, or Azure
  • Experience with automation/configuration management tools like Ansible, Chef, or Puppet
  • Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
  • Experience managing Kubernetes and containerized environments (Docker, Helm)
  • Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
  • Excellent leadership, communication, and collaboration skills
Job Responsibility:
  • Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
  • Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
  • Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
  • Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
  • Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
  • Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
  • Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
  • Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
  • Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence
Employment Type:
Full-time

Senior/Staff Software Engineer (Backend), Growth

Our Growth Team sits at the heart of Airwallex’s mission, focusing on user acqui...
Location:
United States, San Francisco
Salary:
140000.00 - 230000.00 USD / Year
Airwallex
airwallex.com
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree or above in computer science or an engineering-related major
  • 5+ years of experience in modern engineering practices focused on continuous integration and quality engineering
  • Strong computer science fundamentals, solid understanding of OOP concepts
  • Experience with domain-driven design and event-driven architectures
  • Deep working experience with high-throughput, low-latency, highly available distributed systems
  • Cloud experience with GCP (preferred) or AWS (EC2, RDS, ELB, CloudFront, etc.) with Docker and Kubernetes
  • Familiarity with observability tooling such as Splunk, Grafana, and Prometheus
Job Responsibility:
  • Design and Build Scalable Systems
  • Enable Data-Driven Decisions
  • Collaborate and Innovate
  • Experiment and Iterate
  • Promote Security & Compliance
What we offer:
  • medical, dental, and vision insurance
  • a 401(k) plan
  • short-term and long-term disability
  • basic life insurance
  • well-being benefits
  • 20 paid days of vacation
  • 12 paid days of company holidays
Employment Type:
Full-time