Site Reliability Engineer Job at Apex Systems (Irving)

Job Description

As part of tech modernization and cloud migration, digital applications are being migrated to the Azure cloud environment. These applications need to be performance tested and tuned with the required tunables so that they are sufficiently resilient.

Job Responsibilities

Performance Testing: Design and execute performance tests to evaluate the system's responsiveness, stability, scalability, and resource usage
Identify performance bottlenecks and provide recommendations for improvements
Analyze test results and generate detailed performance reports
Resiliency Testing: Conduct resiliency tests to ensure the system can handle failures and recover gracefully
Implement and test failure scenarios to validate the system's fault tolerance
Recommend and validate resiliency patterns such as circuit breakers, bulkheads, and retries
Performance Monitoring: Set up and maintain performance monitoring tools to continuously track system performance
Analyze performance metrics and logs to detect and diagnose performance issues in real time
Capacity Planning: Perform capacity planning to ensure the system can handle expected and peak loads
Provide recommendations for scaling resources based on performance data and future growth projections
Performance Optimization: Collaborate with development and operations teams to optimize code, database queries, and infrastructure configurations
Recommend best practices for performance tuning and optimization
Kubernetes Performance Parameters: Recommend and configure performance parameters for Kubernetes clusters, such as resource limits, requests, and autoscaling policies
Ensure optimal performance of containerized applications running in Kubernetes environments
Resiliency Patterns: Recommend and implement resiliency patterns like circuit breakers, rate limiters, and fallback mechanisms to enhance system reliability
Validate the effectiveness of these patterns through testing and monitoring
Documentation and Training: Document performance testing methodologies, tools, and best practices
Provide training and support to development and operations teams on performance and resiliency best practices
Continuous Improvement: Continuously evaluate and improve performance testing and monitoring processes
Stay updated with the latest performance engineering tools, techniques, and industry trends

Requirements

Experience with containerization technologies like Docker
Strong scripting skills in languages such as Bash and Python
Effective problem-solving and analytical skills
Familiarity with observability and APM tools such as Splunk, ELK, and AppDynamics
Good understanding of architecture patterns and resiliency
Programming experience in Java and Spring Boot
Strong microservices application support experience
Proficient understanding of algorithms, data structures, architectural design patterns and best practices
Cloud experience is required

Nice to have

Experience working with applications on the Kubernetes platform is preferred
Understanding of networking concepts, including DNS, load balancing, firewalls, and VPNs

What we offer

Medical, dental, vision, life, disability, and other insurance plans
ESPP (employee stock purchase program)
401(k) program with company match after 12 months
HSA (Health Savings Account on the HDHP plan)
SupportLinc Employee Assistance Program (EAP) with up to 8 free counseling sessions
Corporate discount savings program
On-demand training program
Access to certification prep and a library of technical and leadership courses/books/seminars after 6+ months
Certification discounts
Dedicated customer service team
Certified Career Coach
