SRE Production Support

Select Minds

Location:
United States, Livonia

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to give users a rich feature set, high availability, and stellar performance as they pursue their missions. As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.

Job Responsibility:

  • Monitoring and reporting on application behavior analytics; conducting smart triage by identifying, diagnosing, and coordinating resolution of performance problems before they impact end users; and participating in rapid root cause diagnosis of problems occurring within the application and infrastructure
  • Identifying the functional domain in which problems reside (server utilization, network saturation, application tuning)
  • Participating in all Major Incident Management and Root Cause Analysis calls and providing expert troubleshooting support as needed
  • Understanding troubleshooting, incidents, and problems; working to resolve issues in a timely manner and determine the fault or underlying issue; working with both customer and vendor personnel
  • Monitoring high-value business-centric transactions and managing response actions
  • Maintaining accurate documentation for the assigned workspace and procedures, and updating procedures including, but not limited to, those for software and hardware layers
  • Understanding and using de-escalation techniques when working with difficult customers
  • Monitoring application infrastructure and the network with tools such as Splunk, AppDynamics, and Dynatrace
  • Proactively detecting, reporting, logging, and responding to all network performance and availability problems in each part of the application
  • Following incident, problem, and change management processes related to the technology infrastructure being supported; reviewing system requirements and application dependencies to determine monitoring configuration
  • Contributing to documentation

Requirements:

  • Master’s degree in Computer Science or related discipline
  • Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
  • 5 to 6 years of experience in Production Support
  • Minimum of 6 years of professional experience in SRE Production Support
  • Experience with NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)
  • Must provide 24×7 support for the production servers on a rotating basis

Additional Information:

Job Posted:
December 11, 2025

Employment Type:
Full-time

Similar Jobs for SRE Production Support

Head of Support

Coralogix is a modern, full-stack observability platform transforming how busine...
Location:
Israel, Ramat Gan
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • Experience in technical support, DevOps, SRE, or similar roles
  • Strong knowledge of AWS/Azure/GCP and Kubernetes ecosystems
  • Familiarity with observability tools (Kibana, Grafana, Prometheus, Datadog, Splunk, ELK)
  • Hands-on experience with Kubernetes, Docker, and distributed systems
  • Proficiency with ELK concepts, RegEx, Lucene, and PromQL
  • Proven leadership of global/multi-regional support teams (35+ people)
  • Strong incident management and escalation-handling skills
  • Ability to optimize support operations, workflows, and tooling
  • Strong analytical and data-driven decision-making abilities
  • Excellent communicator with technical and non-technical audiences
Job Responsibility:
  • Lead and coach global Technical Support Engineering teams
  • Ensure high-quality support with improvements in CSAT, response/resolution times, backlog, and KPIs
  • Maintain clear global processes and standards
  • Align with regional leads for coverage across time zones
  • Act as the senior escalation point for complex issues
  • Guide engineers in root cause analysis, distributed systems, and observability
  • Oversee incident management with strong communication and collaboration
  • Maintain hands-on knowledge of Coralogix architecture and tooling
  • Drive continuous improvement to streamline workflows and reduce escalations
  • Enhance productivity through better tools, processes, and automation
Employment Type:
Full-time

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location:
Ireland, Dublin
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Solid SRE process experience
  • 5+ years of leading a high-performance, 24x7 DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft Windows and Windows Server
  • Experience tracking tickets and resolving them on time
  • Hands-on experience with ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility:
  • Taking end-to-end ownership of application support for production system issue resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with an SRE mindset
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and/or associated production systems diagnostics, and network health monitoring
  • Coordinating and/or deploying hands-on fixes, patches, and software updates at the application level and, as appropriate, at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within the organization, along with observations from investigative and diagnostic assessments
  • Coordinating the investigation of repeated technical issues affecting user systems and seeing them through to resolution
  • Escalating and resolving production incidents, guiding the team, and tracking incidents to closure
What we offer:
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
Employment Type:
Full-time

FX Applications Support Senior Analyst

As an OpsTech Application Support Analyst, the candidate will play a pivotal rol...
Location:
Australia, Sydney
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5-8 years of experience in an Application Support role
  • Experience installing, configuring, or supporting business applications
  • Experience with some programming languages and willingness/ability to learn
  • Advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • Effective written and verbal communication, including the ability to explain technical issues in simple terms that non-IT staff can understand
  • Demonstrated analytical skills
  • Issue tracking and reporting using tools
  • Knowledge of or experience with problem management tools
  • Good all-round technical skills
  • Ability to share information effectively with other support team members and with other technology teams
Job Responsibility:
  • Provide technical and business support for users of Citi Applications
  • Maintain application systems
  • Manage, maintain and support applications
  • Perform start of day checks, continuous monitoring, and regional handover
  • Develop and maintain technical support documentation
  • Maximize the potential of applications
  • Assess risk and impact of production issues and escalate
  • Ensure storage and archiving procedures are functioning correctly
  • Formulate and define scope and objectives for complex application enhancements
  • Prioritize bug fixes and support tooling requirements
What we offer:
  • Rewarding work in a supportive environment
  • Clear opportunities for progression
  • Exciting company benefits
Employment Type:
Full-time

Lead SRE

We are looking for a Lead SRE to join our Inetum Team and be part of a work cult...
Location:
Portugal, Lisbon
Salary:
Not provided
Inetum
Expiration Date:
Until further notice
Requirements:
  • SRE IT production processes
  • Agile / DevOps mindset, problem solving
  • Scripting: Python, YAML, Shell
  • Monitoring: Dynatrace, Nagios
  • Linux
  • Network administration (DNS, firewall, switch)
  • DevOps stack: Git & Git Flow, Artifactory, Jenkins or GitLab CI, Ansible Tower, Digital.ai Release
  • Cloud: Kubernetes, Docker, Argo CD, Vault, Helm
  • End-to-end IT organization and processes (from development to run / operate)
  • Technical Architecture
Job Responsibility:
  • Train SREs and their managers on SRE practices
  • Co-construct the transformation strategy and the support plan by participating in workshops, brainstorming with the transformation team and producing training content
  • Coach and support
Employment Type:
Full-time

API Production Support Lead (SRE)

At Citi, we’re passionate about building and maintaining highly reliable APIs th...
Location:
Canada, Mississauga, Ontario
Salary:
94300.00 - 141500.00 USD / Year
Citi
Expiration Date:
Until further notice
Requirements:
  • Extensive experience supporting Java and J2EE based applications and tooling
  • Deep technical knowledge and hands-on experience supporting and troubleshooting environments including AWS, ECS, Oracle DB, and MongoDB
  • A strong understanding and practical application of SRE concepts, particularly in defining and measuring SLIs, SLOs and Error Budgets
  • Demonstrated experience in building and utilizing comprehensive monitoring solutions such as AppDynamics, Splunk, Kibana to proactively alert on production API-related issues and ensure system health
  • In-depth knowledge and hands-on experience with API Gateway technologies, specifically Apigee, and CDN solutions like Akamai
  • Proven ability to proactively identify and address problems, areas for improvement, and performance bottlenecks within complex API ecosystems using software-based solutions
  • Strong coding experience beyond simple scripts, preferably in Java or Python, for automation and internal tool development
  • Bachelor’s/University degree in Computer Science, Engineering, or a related field
Job Responsibility:
  • Champion stability initiatives to enable high availability and resilience for our API applications
  • Exhibit calm and analytical leadership when faced with major incidents on critical API systems
  • Lead the proactive monitoring and management of production API environments
  • Drive the definition, analysis, and reporting of SLIs and SLOs for all supported APIs and clients
  • Contribute to the development and implementation of tools and systems designed to enhance API operational management
  • Measure and optimize API system performance
  • Provide leadership and expert operational support for critical, large-scale distributed API ecosystems
  • Lead the gathering and analysis of performance metrics from API platforms and underlying infrastructure
  • Partner closely with API development teams to improve services through rigorous operational feedback loops, testing, and release procedures
  • Drive the creation of sustainable API operational systems and services through automation and continuous uplifts
Employment Type:
Full-time

Senior Product Manager - AppTrust

At JFrog, we’re reinventing DevOps to help the world’s greatest companies innova...
Location:
Israel, Netanya/Tel Aviv
Salary:
Not provided
JFrog
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in E2E Product Management, preferably in B2B products and SaaS platforms
  • Experience driving elements of the product development lifecycle such as product vision, go-to-market strategy, driving requirements, UX, and product launch
  • Experience with user-facing products
  • Solid understanding of UX and product design
  • Technical experience in Engineering, DevOps, SRE, and Tech Support — a huge advantage
  • Experience in driving strategic initiatives in a cross-organization environment
  • Excellent analytical, interpersonal, and problem-solving skills
Job Responsibility:
  • Own the full cycle of product development including ideation, competitive analysis, client validation, discovery with R&D, spec writing, launching and monitoring
  • Understand customer needs and gather product requirements, identify market opportunities, and define product vision and strategy
  • Work closely with multiple teams within the company to deliver a high-quality B2D product on schedule, including Sales, Support, Marketing, and Engineering
  • Master the product and lead the requirements through the full lifecycle, from ideation to development and launch
  • Build positive relationships and trust through strong cross-team interactions, and get buy-in for the product vision across internal and external stakeholders
  • Identify, design, experiment, and iterate product decisions by leveraging data and evidence gathered from customer usage and interviews, market research, and usage/adoption metrics

Equities Electronic Trading Support / SRE

Embark on a transformative journey as an Equities Electronic Trading Support/SRE...
Location:
United States, New York
Salary:
120000.00 - 175000.00 USD / Year
Barclays
Expiration Date:
Until further notice
Requirements:
  • Working in Unix/Linux environments for production support, troubleshooting, and performance analysis
  • Writing scripts in Bash and Python to automate operational tasks and improve system reliability
  • Supporting containerized applications using Kubernetes and Docker in production environments
  • Supporting electronic trading systems that use FIX messaging in low-latency environments
Job Responsibility:
  • Development and delivery of high-quality software solutions by using industry aligned programming languages, frameworks, and tools. Ensuring that code is scalable, maintainable, and optimized for performance
  • Cross-functional collaboration with product managers, designers, and other engineers to define software requirements, devise solution strategies, and ensure seamless integration and alignment with business objectives
  • Collaboration with peers, participation in code reviews, and promotion of a culture of code quality and knowledge sharing
  • Staying informed of industry technology trends and innovations and actively contributing to the organization’s technology communities to foster a culture of technical excellence and growth
  • Adherence to secure coding practices to mitigate vulnerabilities, protect sensitive data, and ensure secure software solutions
  • Implementation of effective unit testing practices to ensure proper code design, readability, and reliability
What we offer:
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
Employment Type:
Full-time

SRE Design & Support Engineer

We are looking for a self-driven, software engineering mindset SRE engineer to •...
Location:
India, Hyderabad
Salary:
Not provided
PepsiCo
Expiration Date:
Until further notice
Requirements:
  • 8-11 years of work experience evolving into an SRE engineer
  • 3-5 years of experience in continuously improving and transforming IT operations ways of working
  • Bachelor’s degree in Computer Science, Information Technology or a related field
  • Proven experience as an SRE in designing event diagnostics, performance measures, and alerting solutions to meet SLAs/SLOs/SLIs
  • The ideal engineer will be highly quantitative, have great judgment, be able to connect dots across ecosystems, and work efficiently across teams to ensure SRE orchestration solutions are meeting customer/end-user expectations
  • The candidate will take a pragmatic approach to resolving incidents, including the ability to systemically triangulate root causes and work effectively with external and internal teams to meet objectives
  • Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes, with a track record of improving service offerings: proactively resolving incidents, providing a seamless customer/end-user experience, and proactively identifying and mitigating areas of risk
  • Hands-on experience in Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka, and any SRE Ops toolsets
  • A firm understanding of cloud architecture for distributed environments
  • Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js
Job Responsibility:
  • Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications
  • Ensure non-functional requirements (NFRs), including SLAs/SLOs/SLIs and error budgets, are embedded early into the product’s offerings as part of the engineering solution
  • Act as a proactive SRE support engineer, preventing P1, P2, and potential P3 incidents, diagnosing anomalies before any user is affected, and driving the necessary remediations across the teams involved in the end-to-end availability, performance, and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions
  • Collaborate with engineering and support teams, including participation in escalations and blameless postmortems
  • Work closely with customer-facing support teams to empower them with SRE insights and tooling
  • Observe, diagnose, and improve the end-to-end ecosystem performance of the modern-architected application portfolio, i.e., a technical “understanding of interactions” of a full-stack application, alongside peer SRE team members
  • Continuously optimize the L2/support operations work via SRE workflow automation
  • Shape the SRE orchestration platform design with inputs from Production Operations, Business usage & Product and engineering teams
  • Actively engage and drive AIOps adoption across teams