FreeWheel is seeking an SRE to join the FreeWheel OPS team based in Denver, CO or Chicago, IL. As a member of the Global Operation team, you will be responsible for ensuring the reliability, scalability, and performance of FreeWheel systems. Working closely with engineers and other operations sub-teams, you will manage infrastructure, optimize system reliability, automate daily operations, and resolve technical issues that impact upstream and downstream platforms.
Job Responsibilities:
System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Participate in on-call shifts to respond to and resolve issues quickly
Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
Performance Optimization: Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed
Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events
Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
Cross-Team Collaboration: Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues
Requirements:
3+ years of experience as an SRE, DevOps or Operations Engineer
Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus
Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, or Docker for automating system deployment
Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
System Monitoring and Log Management: Familiar with monitoring and log management tools such as Prometheus, Grafana, the ELK Stack, or similar
Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
Proactive learner eager to grow in operations and governance
Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field
What we offer:
Medical, prescription, vision, and dental insurance for eligible employees
401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
Paid time off including eight observed company holidays and flex time
Exclusive perks + discounts, including tuition assistance, commuter benefits and more