We are currently seeking a Unix & Linux Systems Administrator (Hybrid / Partially Client Onsite) to join our team in Montreal, Quebec (CA-QC), Canada (CA). The role is critical to our day-to-day incident management function, with the primary responsibilities listed below.
Job Responsibilities:
Diagnose and resolve immediate production-impacting issues in the compute and storage plant
Work with other infrastructure teams, including networking, database administration and hosted solution teams, to resolve outages, and with customers aligned with the business users of our plant to determine scope, impact and the appropriate resolution path
Carry out proactive health and hygiene tasks to maintain operational stability and compliance with risk and control programs, ensuring the production environment is not put at risk
Collaborate with engineering teams to test and certify new hardware and software products
Occasionally work on weekend projects to onboard new UNIX assets, either for growth or for large programs such as new data center build-outs
Requirements:
5 to 7 years of experience in a similar role
Must have strong knowledge of and experience with Linux, preferably Red Hat, or other Linux distributions
Strong knowledge of and experience with various services (e.g., DNS, DHCP, NTP, Kerberos, SSHD, PXE, SFTP, HTTPD, Docker)
Knowledge of various enterprise server hardware models (blade, rackmount, standalone), as well as networking, routers and switches
Must be able to read, understand and write intermediate to complex scripts using KSH, Bash, Perl, Python, etc.
Good working understanding of configuration management tools (e.g., Red Hat Satellite, Quattor, Puppet, Chef)
Good knowledge and understanding of clustering, virtualization, NAS, NFS and SAN
Excellent verbal and written communication skills, including the ability to explain technical problems to a non-technical audience
Available for on-call duty (1 week out of every 4 to 6 weeks, rotated weekly within the team) and able to act as the point person for any production issues
Ability to work in a global distributed team
Nice to have:
Experience troubleshooting incidents involving compute resources, networking and remote storage (SAN, NAS)
Experience with analyzing and diagnosing kernel crash/core dumps, network packet captures and identifying the root cause of problems
Sound knowledge of networking, including TCP/IP, Layer 2/3 network design, firewalls, switches and routers
Experience working in a DevOps environment
Knowledge of and experience with various server hardware models and vendors (e.g., IBM, Dell, HP)
Ability to identify performance bottlenecks and tune system parameters to improve throughput
Good understanding and knowledge of load balancing, high availability and business continuity (BC)