SRE Production Support

Select Minds

Location:
United States, Livonia

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to give users a rich feature set, high availability, and stellar performance as they pursue their missions. As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we’re looking for someone with fresh ideas and a unique viewpoint who enjoys collaborating with a cross-functional team to develop real-world solutions and a positive user experience in every interaction.

Job Responsibility:

  • Monitoring and reporting on application behavior analytics, conducting smart triage by identifying, diagnosing, and coordinating resolution of performance problems before they impact end users, and participating in rapid root cause diagnosis of problems occurring within the application and infrastructure
  • Identifying the functional domain in which problems reside (server utilization, network saturation, application tuning)
  • Participating in all Major Incident Management and Root Cause Analysis calls and providing expert troubleshooting support as needed
  • Troubleshooting incidents and problems, resolving issues in a timely manner, and determining the fault or underlying issue, working with both customer and vendor personnel
  • Monitoring high-value business-centric transactions and managing response actions
  • Maintaining accurate documentation for the assigned workspace and procedures, and updating procedures including, but not limited to, software and hardware layers
  • Understanding and using de-escalation techniques when working with difficult customers
  • Monitoring application infrastructure and network through monitoring tools like Splunk, AppDynamics, and Dynatrace
  • Proactively detecting, reporting, logging, and responding to all network performance and availability problems in each part of the application
  • Following incident, problem, and change management processes related to the technology infrastructure being supported, and reviewing system requirements and application dependencies to determine monitoring configuration
  • Contributing to documentation

Requirements:

  • Master’s degree in Computer Science or related discipline
  • Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
  • 5 to 6 years of experience in Production Support
  • 6+ years of professional experience in SRE Production Support
  • Experience with NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, Yarn)
  • Must provide 24×7 support for the production servers on a rotation basis

Additional Information:

Job Posted:
December 11, 2025

Employment Type:
Fulltime

Similar Jobs for SRE Production Support

Head of Support

Coralogix is a modern, full-stack observability platform transforming how busine...
Location
Israel, Ramat Gan
Salary:
Not provided
Coralogix
Expiration Date
Until further notice
Requirements
  • Experience in technical support, DevOps, SRE, or similar roles
  • Strong knowledge of AWS/Azure/GCP and Kubernetes ecosystems
  • Familiarity with observability tools (Kibana, Grafana, Prometheus, Datadog, Splunk, ELK)
  • Hands-on experience with Kubernetes, Docker, and distributed systems
  • Proficiency with ELK concepts, RegEx, Lucene, and PromQL
  • Proven leadership of global/multi-regional support teams (35+ people)
  • Strong incident management and escalation-handling skills
  • Ability to optimize support operations, workflows, and tooling
  • Strong analytical and data-driven decision-making abilities
  • Excellent communicator with technical and non-technical audiences
Job Responsibility
  • Lead and coach global Technical Support Engineering teams
  • Ensure high-quality support with improvements in CSAT, response/resolution times, backlog, and KPIs
  • Maintain clear global processes and standards
  • Align with regional leads for coverage across time zones
  • Act as the senior escalation point for complex issues
  • Guide engineers in root cause analysis, distributed systems, and observability
  • Oversee incident management with strong communication and collaboration
  • Maintain hands-on knowledge of Coralogix architecture and tooling
  • Drive continuous improvement to streamline workflows and reduce escalations
  • Enhance productivity through better tools, processes, and automation
  • Fulltime

Platform Support Technology Director

Location
United States, New York; Iselin
Salary:
215000.00 - 355000.00 USD / Year
Wells Fargo
Expiration Date
May 30, 2026
Requirements
  • 10+ years of Technology Strategic Leadership experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 4+ years of management or leadership experience
  • 7+ years of hands-on production support and SRE engineering experience, including monitoring instrumentation, production deployments, incident management, and problem management
  • 8+ years total in Production Support, SRE, or Production Engineering roles
  • 5+ years of leadership experience managing teams of 5–10 engineers in high‑availability trading or market‑facing environments
  • 8+ years of equity finance, trading, prime brokerage, or similar capital markets workflows
Job Responsibility
  • Manage a team of engineering managers and engineering leads
  • Focus on delivering commitments aligned to enterprise strategic priorities
  • Build support for strategies with business and technology leaders
  • Serve as the primary production technology leader for Equity Prime Finance
  • Lead, mentor, and develop a team of 5–10 production engineers/SREs, ensuring skills growth, operational rigor, and high performance
  • Drive true SRE practices, including automation, observability, SLIs/SLOs, error budgets, root-cause reduction, and reliability engineering
  • Guide development of actionable roadmaps and plans
  • Identify opportunities and strategies for continuous improvement of software engineering practices
  • Provide oversight to software craftsmanship, security, availability, resilience, and scalability of solutions developed by the teams or third-party providers
  • Identify financial management and strategic resourcing
What we offer
  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Fulltime

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location
Ireland, Dublin
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Solid SRE process experience
  • 5+ years leading a high-performance, 24x7 DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience on ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility
  • Taking end-to-end Ownership of Application Support for Production Systems Issues resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with an SRE mindset
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and/or associated production systems diagnostics, and network health monitoring
  • Coordinating and/or deploying hands-on fixes, patches and software updates at the application level, and as appropriate at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within organization, along with observations from investigative and diagnostic assessments
  • Coordinating the investigation of repeated technical issues affecting user systems and seeing them through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
  • Fulltime

FX Applications Support Senior Analyst

As an OpsTech Application Support Analyst, the candidate will play a pivotal rol...
Location
Australia, Sydney
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • 5-8 years experience in an Application Support role
  • experience installing, configuring or supporting business applications
  • experience with some programming languages and willingness/ability to learn
  • advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • effective written and verbal communications including ability to explain technical issues in simple terms that non-IT staff can understand
  • demonstrated analytical skills
  • issue tracking and reporting using tools
  • knowledge/experience of problem Management Tools
  • good all-round technical skills
  • effectively share information with other support team members and with other technology teams
Job Responsibility
  • Provide technical and business support for users of Citi Applications
  • maintain application systems
  • manage, maintain and support applications
  • perform start of day checks, continuous monitoring, and regional handover
  • develop and maintain technical support documentation
  • maximize the potential of applications
  • assess risk and impact of production issues and escalate
  • ensure storage and archiving procedures are functioning correctly
  • formulate and define scope and objectives for complex application enhancements
  • prioritize bug fixes and support tooling requirements
What we offer
  • Rewarding work in a supportive environment
  • clear opportunities for progression
  • exciting company benefits
  • Fulltime

Lead SRE

We are looking for a Lead SRE to join our Inetum Team and be part of a work cult...
Location
Portugal, Lisbon
Salary:
Not provided
Inetum
Expiration Date
Until further notice
Requirements
  • SRE IT production processes
  • Agile / DevOps mindset, problem solving
  • Scripting: Python, YAML, Shell
  • Monitoring: Dynatrace, Nagios
  • Linux
  • Admin Network (DNS, Firewall, Switch)
  • DevOps stack: Git & Git Flow, Artifactory, Jenkins or Gitlab CI, Ansible Tower, Digital ai Release
  • Cloud: Kubernetes, Docker, Argo CD, Vault, Helm
  • End-to-end IT organization and processes (from development to run / operate)
  • Technical Architecture
Job Responsibility
  • Train SREs and their managers on SRE practices
  • Co-construct the transformation strategy and the support plan by participating in workshops, brainstorming with the transformation team and producing training content
  • Coach and support
  • Fulltime

Mainframe Developer with Vision Plus

Role- Mainframe Developer with Vision Plus
Location
Canada, Toronto
Salary:
110000.00 USD / Year
Realign
Expiration Date
Until further notice
Requirements
  • Mainframe DB2 - Application Development and Vision Plus
  • Provide SRE Production Support for critical systems
  • Troubleshoot and resolve production issues
  • Collaborate with development teams to enhance system scalability and reliability
  • Optimize database queries and ensure efficient data management
  • Participate in on-call rotations
  • Overall 8+ Years of experience in IT
  • 4+ years hands on experience in Vision Plus
  • Responsible for monitoring and providing solutions for Production and UAT Incidents
  • Hands-on experience of one or more of the sub-systems in Vision PLUS (CMS Posting, FAS, TRAMS, VMX)
Job Responsibility
  • Provide SRE Production Support for critical systems, ensuring high availability and reliability
  • Troubleshoot and resolve production issues to minimize downtime and improve system performance
  • Collaborate with development teams to enhance system scalability and reliability
  • Optimize database queries and ensure efficient data management
  • Participate in on-call rotations to address production incidents promptly
  • Responsible for monitoring and providing solutions for Production and UAT Incidents
  • Should be able to work independently and escalate if issue is not resolved on time
  • Collaborate with multiple teams to provide technical know-how, and solutions to complex business problems
  • Assist in resolving complex issues and incidents during implementation, testing and in production
  • Fulltime

Vice President, Applications Support Technology Lead Analyst

The Apps Support Lead Analyst is a seasoned professional role providing Level 2 ...
Location
Japan, Chiyoda, Tokyo
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • 5–10 years’ experience in L2 application production support in a securities/investment bank or financial services trading environment
  • Demonstrated experience providing trade floor support to Front/Middle Office users in Equities or capital markets
  • Excellent business-level English communication (written/verbal)
  • Japanese language capability desirable
  • Proven ability to prioritize and multi-task effectively under extreme time pressure in a real-time trading environment
  • Strong diagnostic skills including analysis of application/server logs, GC logs, thread/heap dumps, and traces to identify root cause and mitigations
  • Hands-on experience with monitoring/alerting platforms (e.g., ITRS Geneos, Grafana, or equivalents)
  • Working knowledge of Change Management and deployment practices, including CI/CD pipelines and rollback procedures
  • Experience with middleware messaging technologies (IBM MQ, Solace, Kafka, Tibco EMS, or similar)
  • Familiarity with incident/problem management tooling (e.g., ServiceNow/JIRA) and structured RCA/problem management
Job Responsibility
  • Provide Level 2 production support for Equities trading applications, acting as the primary technical escalation point for trading-impacting incidents
  • Respond to critical incidents during market hours, executing rapid diagnosis and restoration activities to minimize business disruption
  • Maintain hands-on trade floor coverage, delivering direct support to Front/Middle Office users in a high-pressure, real-time environment
  • Serve as a key liaison between business users and Technology (development, infrastructure, vendors), ensuring timely triage, escalation, and resolution
  • Perform deep technical troubleshooting across applications and environments, including analysis of logs and runtime evidence to identify root cause and remediation paths
  • Proactively monitor production using enterprise tooling (e.g., ITRS Geneos) to detect anomalies and prevent outages
  • Execute operational routines including start-of-day checks, continuous monitoring, and regional handover to support global coverage
  • Support production integrity activities, including same-day risk reconciliations and data consistency validation across trading systems
  • Manage change, deployment, and release execution using CI/CD and Change Management controls, including rollback readiness and zero-impact implementation practices
  • Drive service stability through post-incident review, problem management input, and continuous improvement initiatives across stability/efficiency/effectiveness
  • Fulltime

Credit Risk Support Lead- SRE

Join Barclays as a Credit Risk Support Lead- SRE role, where to effectively moni...
Location
India, Pune
Salary:
Not provided
Barclays
Expiration Date
Until further notice
Requirements
  • 14+ years’ experience in production support
  • High energy, hands-on and results & goal-oriented
  • Expertise in log debugging, root cause analysis and troubleshooting live issues
  • Experience with observability tools like ESaaS, AppD / ITRS, Netcool
  • Experience in data analysis to identify underlying themes impacting stability, performance, and customer experience
  • Ensures and promotes ITIL best practices for Incident, Problem, Change, Release management (including managing and running triages, conducting root cause analysis, post incident reviews etc)
  • Strong Credit Risk business knowledge
  • Negotiate SLAs/OLAs with customer and other support elements
  • Business (IT) Continuity Management
  • KPI reporting and monitoring
Job Responsibility
  • Provision of technical support for the service management function to resolve more complex issues for a specific client or group of clients. Develop the support model and service offering to improve the service to customers and stakeholders.
  • Execution of preventative maintenance tasks on hardware and software and utilisation of monitoring tools/metrics to identify, prevent and address potential issues and ensure optimal performance.
  • Maintenance of a knowledge base containing detailed documentation of resolved cases for future reference, self-service opportunities and knowledge sharing.
  • Analysis of system logs, error messages and user reports to identify the root causes of hardware, software and network issues, and providing a resolution to these issues by fixing or replacing faulty hardware components, reinstalling software, or applying configuration changes.
  • Automation, monitoring enhancements, capacity management, resiliency, business continuity management, front office specific support and stakeholder management.
  • Identification and remediation or raising, through appropriate process, of potential service impacting risks and issues.
  • Proactively assess support activities implementing automations where appropriate to maintain stability and drive efficiency. Actively tune monitoring tools, thresholds, and alerting to ensure issues are known when they occur.
What we offer
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
  • Fulltime