SRE Production Support

Select Minds

Location:
United States, Livonia

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to give users the rich feature set, high availability, and stellar performance they need to pursue their missions. As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.

Job Responsibility:

  • Monitoring and reporting on application behavior analytics, conducting smart triage by identifying, diagnosing, and coordinating resolution of performance problems before they impact end users, and participating in rapid root cause diagnosis of problems occurring within the application and infrastructure
  • Identifying the functional domain in which problems reside (server utilization, network saturation, application tuning)
  • Participating in all Major Incident Management and Root Cause Analysis calls and providing expert troubleshooting support as needed
  • Understanding troubleshooting, incidents, and problems; working to resolve issues in a timely manner and determine the fault or underlying issue; working with both customer and vendor personnel
  • Monitoring high-value business-centric transactions and managing response actions
  • Maintaining accurate documentation for the assigned workspace and procedures, and updating procedures, including but not limited to those for software and hardware layers
  • Understanding and using de-escalation techniques when working with difficult customers
  • Monitoring application infrastructure and network through monitoring tools such as Splunk, AppDynamics, and Dynatrace
  • Proactively detecting, reporting, logging, and responding to all network performance and availability problems in each part of the application
  • Following incident, problem, and change management processes related to the technology infrastructure being supported, and reviewing system requirements and application dependencies to determine monitoring configuration
  • Contributing to the creation of documentation

Requirements:

  • Master’s degree in Computer Science or related discipline
  • Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
  • 5 to 6 years of experience in Production Support
  • Minimum of 6 years of professional experience in SRE Production Support
  • Experience with NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)
  • Must provide 24×7 support for the production servers on a rotation basis

Additional Information:

Job Posted:
December 11, 2025

Employment Type:
Fulltime

Similar Jobs for SRE Production Support

Head of Support

Coralogix is a modern, full-stack observability platform transforming how busine...
Location:
Israel, Ramat Gan
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • Experience in technical support, DevOps, SRE, or similar roles
  • Strong knowledge of AWS/Azure/GCP and Kubernetes ecosystems
  • Familiarity with observability tools (Kibana, Grafana, Prometheus, Datadog, Splunk, ELK)
  • Hands-on experience with Kubernetes, Docker, and distributed systems
  • Proficiency with ELK concepts, RegEx, Lucene, and PromQL
  • Proven leadership of global/multi-regional support teams (35+ people)
  • Strong incident management and escalation-handling skills
  • Ability to optimize support operations, workflows, and tooling
  • Strong analytical and data-driven decision-making abilities
  • Excellent communicator with technical and non-technical audiences
Job Responsibility:
  • Lead and coach global Technical Support Engineering teams
  • Ensure high-quality support with improvements in CSAT, response/resolution times, backlog, and KPIs
  • Maintain clear global processes and standards
  • Align with regional leads for coverage across time zones
  • Act as the senior escalation point for complex issues
  • Guide engineers in root cause analysis, distributed systems, and observability
  • Oversee incident management with strong communication and collaboration
  • Maintain hands-on knowledge of Coralogix architecture and tooling
  • Drive continuous improvement to streamline workflows and reduce escalations
  • Enhance productivity through better tools, processes, and automation
Employment Type:
Fulltime

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location:
Ireland, Dublin
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Solid SRE process experience
  • 5+ years of leading a high-performance, 24x7 DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience on ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility:
  • Taking end-to-end Ownership of Application Support for Production Systems Issues resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with an SRE mindset
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and/or associated production systems, diagnostics, and network health monitoring
  • Coordinating and/or deploying hands-on fixes, patches and software updates at the application level and, as appropriate, at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within organization, along with observations from investigative and diagnostic assessments
  • Coordinating the investigation of repeated technical issues affecting user systems and seeing them through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer:
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
Employment Type:
Fulltime

FX Applications Support Senior Analyst

As an OpsTech Application Support Analyst, the candidate will play a pivotal rol...
Location:
Australia, Sydney
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5-8 years of experience in an Application Support role
  • experience installing, configuring or supporting business applications
  • experience with some programming languages and willingness/ability to learn
  • advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • effective written and verbal communications including ability to explain technical issues in simple terms that non-IT staff can understand
  • demonstrated analytical skills
  • issue tracking and reporting using tools
  • knowledge/experience of problem Management Tools
  • good all-round technical skills
  • effectively share information with other support team members and with other technology teams
Job Responsibility:
  • Provide technical and business support for users of Citi Applications
  • maintain application systems
  • manage, maintain and support applications
  • perform start of day checks, continuous monitoring, and regional handover
  • develop and maintain technical support documentation
  • maximize the potential of applications
  • assess risk and impact of production issues and escalate
  • ensure storage and archiving procedures are functioning correctly
  • formulate and define scope and objectives for complex application enhancements
  • prioritize bug fixes and support tooling requirements
What we offer:
  • Rewarding work in a supportive environment
  • clear opportunities for progression
  • exciting company benefits
Employment Type:
Fulltime

Lead SRE

We are looking for a Lead SRE to join our Inetum Team and be part of a work cult...
Location:
Portugal, Lisbon
Salary:
Not provided
Inetum
Expiration Date:
Until further notice
Requirements:
  • SRE IT production processes
  • Agile / DevOps Mindset Problem Solving
  • Scripting: Python, YAML, Shell
  • Monitoring: Dynatrace, Nagios
  • Linux
  • Admin Network (DNS, Firewall, Switch)
  • DevOps stack: Git & Git Flow, Artifactory, Jenkins or GitLab CI, Ansible Tower, Digital.ai Release
  • Cloud: Kubernetes, Docker, Argo CD, Vault, Helm
  • End-to-end IT organization and processes (from development to run / operate)
  • Technical Architecture
Job Responsibility:
  • Train SREs and their managers on SRE practices
  • Co-construct the transformation strategy and the support plan by participating in workshops, brainstorming with the transformation team and producing training content
  • Coach and support
Employment Type:
Fulltime

Mainframe Developer with Vision Plus

Role: Mainframe Developer with Vision Plus
Location:
Canada, Toronto
Salary:
110000.00 USD / Year
Realign
Expiration Date:
Until further notice
Requirements:
  • Mainframe DB2 - Application Development and Vision Plus
  • Provide SRE Production Support for critical systems
  • Troubleshoot and resolve production issues
  • Collaborate with development teams to enhance system scalability and reliability
  • Optimize database queries and ensure efficient data management
  • Participate in on-call rotations
  • 8+ years of overall experience in IT
  • 4+ years of hands-on experience in Vision Plus
  • Responsible for monitoring and providing solutions for Production and UAT Incidents
  • Hands-on experience of one or more of the sub-systems in Vision PLUS (CMS Posting, FAS, TRAMS, VMX)
Job Responsibility:
  • Provide SRE Production Support for critical systems, ensuring high availability and reliability
  • Troubleshoot and resolve production issues to minimize downtime and improve system performance
  • Collaborate with development teams to enhance system scalability and reliability
  • Optimize database queries and ensure efficient data management
  • Participate in on-call rotations to address production incidents promptly
  • Responsible for monitoring and providing solutions for Production and UAT Incidents
  • Should be able to work independently and escalate if issue is not resolved on time
  • Collaborate with multiple teams to provide technical know-how, and solutions to complex business problems
  • Assist in resolving complex issues and incidents during implementation, testing and in production
Employment Type:
Fulltime

Vice President, Applications Support Technology Lead Analyst

The Apps Support Lead Analyst is a seasoned professional role providing Level 2 ...
Location:
Japan, Chiyoda, Tokyo
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5–10 years’ experience in L2 application production support in a securities/investment bank or financial services trading environment
  • Demonstrated experience providing trade floor support to Front/Middle Office users in Equities or capital markets
  • Excellent business-level English communication (written/verbal)
  • Japanese language capability desirable
  • Proven ability to prioritize and multi-task effectively under extreme time pressure in a real-time trading environment
  • Strong diagnostic skills including analysis of application/server logs, GC logs, thread/heap dumps, and traces to identify root cause and mitigations
  • Hands-on experience with monitoring/alerting platforms (e.g., ITRS Geneos, Grafana, or equivalents)
  • Working knowledge of Change Management and deployment practices, including CI/CD pipelines and rollback procedures
  • Experience with middleware messaging technologies (IBM MQ, Solace, Kafka, Tibco EMS, or similar)
  • Familiarity with incident/problem management tooling (e.g., ServiceNow/JIRA) and structured RCA/problem management
Job Responsibility:
  • Provide Level 2 production support for Equities trading applications, acting as the primary technical escalation point for trading-impacting incidents
  • Respond to critical incidents during market hours, executing rapid diagnosis and restoration activities to minimize business disruption
  • Maintain hands-on trade floor coverage, delivering direct support to Front/Middle Office users in a high-pressure, real-time environment
  • Serve as a key liaison between business users and Technology (development, infrastructure, vendors), ensuring timely triage, escalation, and resolution
  • Perform deep technical troubleshooting across applications and environments, including analysis of logs and runtime evidence to identify root cause and remediation paths
  • Proactively monitor production using enterprise tooling (e.g., ITRS Geneos) to detect anomalies and prevent outages
  • Execute operational routines including start-of-day checks, continuous monitoring, and regional handover to support global coverage
  • Support production integrity activities, including same-day risk reconciliations and data consistency validation across trading systems
  • Manage change, deployment, and release execution using CI/CD and Change Management controls, including rollback readiness and zero-impact implementation practices
  • Drive service stability through post-incident review, problem management input, and continuous improvement initiatives across stability/efficiency/effectiveness
Employment Type:
Fulltime

Java SRE

Location:
United States, Phoenix
Salary:
117000.00 USD / Year
Realign
Expiration Date:
Until further notice
Requirements:
  • Core Java
  • Splunk
  • Kibana
  • Grafana
  • Databases: Postgres, MongoDB
  • Experience in Production support engineering or SRE roles, preferably within the banking industry
  • Skilled in L1/L2 support, debugging, performance monitoring, and working in Agile/Scrum environments
  • Hands-on with ServiceNow, Spring Boot, REST APIs, and CI/CD pipelines
  • Strong knowledge of cloud services
Job Responsibility:
  • Excellent problem-solving skills and the ability to work under pressure in a fast-paced environment
  • Monitor and maintain the health, availability, and performance of production systems and applications
  • Troubleshoot and resolve production incidents, ensuring minimal downtime and service disruption
  • Identifying defects and working with the development team to get them fixed based on priority
  • Handling implementation of RFCs
  • Performing pre- and post-validation of servers during traffic diversion
  • Collaborate with engineering teams to implement reliability best practices and improve system performance
  • Develop and maintain monitoring alerts and dashboards to ensure visibility into system metrics
  • Participate in on-call rotation and provide timely support for high-impact incidents
  • Implement automation tools and processes to streamline operations and reduce manual workloads
Employment Type:
Fulltime

Vice President, Applications Support Technology Lead Analyst

The Apps Support Lead Analyst is a seasoned professional role providing Level 2 ...
Location:
Japan, Chiyoda, Tokyo
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5–10 years’ experience in L2 application production support in a securities/investment bank or financial services trading environment
  • Demonstrated experience providing trade floor support to Front Office users in Equities or capital markets
  • Excellent business-level English communication (written/verbal)
  • Japanese language capability desirable
  • Proven ability to prioritize and multi-task effectively under extreme time pressure in a real-time trading environment
  • Strong diagnostic skills including analysis of application/server logs, GC logs, thread/heap dumps, and traces
  • Hands-on experience with monitoring/alerting platforms (e.g., ITRS Geneos, Grafana, or equivalents)
  • Working knowledge of Change Management and deployment practices, including CI/CD pipelines and rollback procedures
  • Experience with middleware messaging technologies (IBM MQ, Solace, Kafka, Tibco EMS, or similar)
  • Familiarity with incident/problem management tooling (e.g., ServiceNow/JIRA) and structured RCA/problem management
Job Responsibility:
  • Provide Level 2 production support for Equities trading applications, acting as the primary technical escalation point for trading-impacting incidents
  • Respond to critical incidents during market hours, executing rapid diagnosis and restoration activities to minimize business disruption
  • Maintain hands-on trade floor coverage, delivering direct support to Front Office users in a high-pressure, real-time environment
  • Serve as a key liaison between business users and Technology (development, infrastructure, vendors), ensuring timely triage, escalation, and resolution
  • Perform deep technical troubleshooting across applications and environments, including analysis of logs and runtime evidence to identify root cause and remediation paths
  • Proactively monitor production using enterprise tooling (e.g., ITRS Geneos) to detect anomalies and prevent outages
  • Execute operational routines including start-of-day checks, continuous monitoring, and regional handover to support global coverage
  • Support production integrity activities, including same-day risk reconciliations and data consistency validation across trading systems
  • Manage change, deployment, and release execution using CI/CD and Change Management controls, including rollback readiness and zero-impact implementation practices
  • Drive service stability through post-incident review, problem management input, and continuous improvement initiatives across stability/efficiency/effectiveness
Employment Type:
Fulltime