Java SRE

Realign

Location:
United States, Phoenix

Contract Type:
Not provided

Salary:

117000.00 USD / Year

Job Responsibility:

  • Excellent problem-solving skills and the ability to work under pressure in a fast-paced environment
  • Monitor and maintain the health, availability, and performance of production systems and applications
  • Troubleshoot and resolve production incidents, ensuring minimal downtime and service disruption
  • Identify defects and work with the development team to get them fixed based on priority
  • Handle the implementation of RFCs
  • Perform pre- and post-validation of servers during traffic diversion
  • Collaborate with engineering teams to implement reliability best practices and improve system performance
  • Develop and maintain monitoring alerts and dashboards to ensure visibility into system metrics
  • Participate in on-call rotation and provide timely support for high-impact incidents
  • Implement automation tools and processes to streamline operations and reduce manual workloads
  • Document incidents and solutions for knowledge management and continuous improvement

Requirements:

  • Core Java
  • Splunk
  • Kibana
  • Grafana
  • Databases: Postgres, MongoDB
  • Experience in Production support engineering or SRE roles, preferably within the banking industry
  • Skilled in L1/L2 support, debugging, performance monitoring, and working in Agile/Scrum environments
  • Hands-on with ServiceNow, Spring Boot, REST APIs, and CI/CD pipelines
  • Strong knowledge of cloud services

Additional Information:

Job Posted:
March 19, 2026

Employment Type:
Fulltime

Similar Jobs for Java SRE

VP - Cloud Security Reliability Engineer (SRE)

This role sits within the Cloud Security team which is responsible for Private a...
Location:
Singapore, Singapore
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree or equivalent work experience
  • 6+ years of relevant work experience
  • Highly motivated self-starter with excellent interpersonal and communication skills
  • Certification or formal training in site reliability engineering concepts and practices
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
  • 4+ years experience in Python (preferable) or Java, on large scale systems alongside Linux based scripting languages
  • Experience working on observability, logging and metrics toolsets: Prometheus, Grafana, Splunk, ELK
  • Experience with Kubernetes (k8s) and container technologies: Docker, OpenShift and EKS
  • Experience with public cloud technologies: AWS, GCP or Azure
  • Experience with Secrets products: HashiCorp Vault or CyberArk
Job Responsibility:
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production level incidents till resolution
Employment Type:
Fulltime

SRE Production Support

We’re passionate about building software that solves problems. We count on our s...
Location:
United States, Livonia
Salary:
Not provided
Select Minds
Expiration Date:
Until further notice
Requirements:
  • Master’s degree in Computer Science or related discipline
  • Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
  • 5 to 6 years of experience in Production Support
  • Minimum 6+ years of professional experience in SRE Production Support
  • Experience with NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, Yarn)
  • Must provide 24×7 support on the production servers on a rotation basis
Job Responsibility:
  • Monitoring and reporting on application behavior analytics, conducting smart triage by identifying, diagnosing, and coordinating resolution of performance problems before they impact end users, and participating in rapid root cause diagnosis of problems occurring within the application and infrastructure
  • Identifying the functional domain in which problems reside (server utilization, network saturation, application tuning)
  • Participating in all Major Incident Management and Root Cause Analysis calls and providing expert troubleshooting support as needed
  • Troubleshooting incidents and problems, resolving issues in a timely manner, and determining the fault or underlying issue, working with both customer and vendor personnel
  • Monitoring high-value business-centric transactions and managing response actions
  • Maintaining accurate documentation for the assigned workspace and procedures, and updating procedures, including but not limited to the software and hardware layers
  • Understanding and using de-escalation techniques when working with difficult customers
  • Monitoring application infrastructure and network through monitoring tools such as Splunk, AppDynamics, and Dynatrace
  • Proactively detecting, reporting, logging, and responding to all network performance and availability problems in each part of the application
  • Following incident, problem, and change management processes related to the technology infrastructure being supported, and reviewing system requirements and application dependencies to determine monitoring configuration
Employment Type:
Fulltime

Applications Development Sr Programmer Analyst

Integration Services within Common Platform Engineering is responsible for devel...
Location:
Canada, Mississauga
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Experience working in Financial Services or a large complex and/or global environment
  • Experience with the following technologies: Kafka ecosystem (Confluent distribution preferred), Kubernetes and OpenShift, Java, React
  • Familiarity with SRE practices
  • Consistently demonstrates clear and concise written and verbal communication
Job Responsibility:
  • Designing and developing workflow solutions to integrate Kafka with our data governance and control platforms
  • Understanding the existing onboarding flow and working to streamline and simplify the process
  • Design and develop developer facing tooling to manage topics and connectors
  • Help to deliver the SRE requirements for this stack
Employment Type:
Fulltime

Principal Site Reliability Engineer

We are looking for a reliability expert who is passionate about scaling Cloud se...
Location:
United States, San Francisco; Mountain View
Salary:
170800.00 - 274300.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Expert-level proficiency with 8+ years experience in at least Java
  • Expert-level proficiency with 5+ years experience in public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Expert-level proficiency with 5+ years experience in operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
  • Experience in driving large, complex, cross-organizational initiatives from inception to completion
  • Excellent communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
  • Experience in leadership positions, able to influence others and drive impactful outcomes through delegation
  • An ability and desire to mentor and coach engineers
Job Responsibility:
  • Advocate for reliability methodologies
  • Work with a variety of platform, product and SRE teams to both build reliability into our platform and drive adoption of those practices into our products
  • Analyze and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency
What we offer:
  • Health and wellbeing resources
  • Paid volunteer days
  • Equity
  • Bonuses
  • Commissions
Employment Type:
Fulltime

Site Reliability Engineer

We are recruiting a Senior SRE for a company that provides an advanced data, ope...
Location:
Portugal, Lisboa
Salary:
Not provided
Precise
Expiration Date:
Until further notice
Requirements:
  • Up to 5 years of experience in a Site Reliability Engineering (SRE), DevOps, or Production Engineering role, with a deep understanding of SRE principles and best practices
  • Incident management expertise, including triaging, escalation, and resolution of high-severity outages
  • Proficiency in at least one coding language (Python or Java) for automation and debugging
  • Hands-on experience in Kubernetes (K8s) for managing and orchestrating containerized applications
  • Cloud experience (AWS preferred) with exposure to key services like EC2, S3, Lambda, and CloudWatch
  • Excellent communication skills to articulate technical challenges and solutions effectively
  • Strong troubleshooting and problem-solving skills, with experience diagnosing complex production issues
  • Ability to stay calm under pressure, multitask, and prioritize effectively in fast-moving environments
  • Fluency in English (spoken and written) is required
  • Must have the legal right to work in the country
Employment Type:
Fulltime

Site Reliability Engineer

We are looking for an engineer who is passionate about scaling cloud services to...
Location:
United States, San Francisco; Austin; Mountain View; Washington DC; Seattle; New York
Salary:
116700.00 - 187400.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 1+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
  • 1+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Familiarity with Unix / Linux operating systems
  • Strong emphasis on debugging, improving code, and automating routine tasks
  • Backend engineering experience in one or more prominent languages such as Java, Go or Python
  • Strong communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
Job Responsibility:
  • Scaling cloud services
  • Owning the caching infrastructure, tooling, and automation that support Atlassian’s suite of Cloud products
  • Analyzing and improving services and processes to achieve higher levels of reliability, performance, scalability, and cost efficiency
What we offer:
  • Health coverage
  • Paid volunteer days
  • Wellness resources
Employment Type:
Fulltime

Site Reliability Engineer

As a Site Reliability Engineer (SRE), you will actively work to improve the perf...
Location:
United States
Salary:
116700.00 - 187400.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Strong coding/scripting experience
  • Serious troubleshooting skills across different levels of the stack
  • Maintaining a high standard of code quality
  • Understanding of Linux systems
  • Experience configuring and managing enterprise monitoring/metrics/logging solutions
  • Building, automating, and maintaining infrastructure in Amazon Web Services with infrastructure as code
  • Maintaining and troubleshooting continuous integration / continuous delivery pipelines in support of development teams
Job Responsibility:
  • Improve the performance and reliability of Atlassian Analytics and our Analytics Visualization Platform
  • Expand our system to handle new system capabilities
  • Scale to support growing usage by customers and adoption in new Atlassian products
  • Address root causes of incidents and reduce incident rates
  • Serve in an on-call weekly rotation to make sure our products meet established SLO targets
What we offer:
  • Health coverage
  • Paid volunteer days
  • Wellness resources
Employment Type:
Fulltime