McGill University is seeking a Systems Administrator, Storage Systems to take a significant role in operating and maintaining current systems and in planning future initiatives in Advanced Research Computing (ARC), specifically the large-scale, multi-petabyte tape library and near-line data storage platform. Reporting to the Associate Director, Operations, of the McGill High Performance Computing Centre (“HPC Centre”), the incumbent will work within the Calcul Québec (CQ) organization and join a vibrant team of HPC systems administrators and analysts across several Quebec institutions. This is an opportunity to work with leading-edge technology and with a team that has installed and operated supercomputers in the Top 500.
McGill is a founding member of CQ, a consortium of Québec universities whose objective is to provide advanced research computing (ARC) to the research community including HPC data centres at the leading edge of technology and highly qualified computing experts. More than 600 research groups take advantage of the resources made available to them by CQ to conduct research in various fields. CQ is a Regional Partner of Compute Canada (CC), the non-profit organization in charge of coordinating ARC efforts throughout Canada.
The ARC environment includes an HPC cluster consisting of over one thousand nodes with a mixture of CPU and GPU processors, a cloud environment, as well as a multi-petabyte disk and tape storage environment with backup and archive capabilities.
Duties and Responsibilities:
The systems administrator will be responsible for the core operations, maintenance and growth planning for the HPC Centre’s storage systems and servers based upon Lustre, and specifically the multi-petabyte tape backup and archive environment.
Perform daily routine maintenance such as reviewing activity logs, performing TSM database and storage pool maintenance, and on-site and off-site tape management.
Installation and maintenance of TSM server and client environments.
Server and client troubleshooting and issue resolution.
Manage future software upgrades of the TSM environment.
Perform proactive maintenance and tuning of all elements of the TSM environment, including libraries and host operating systems.
Reporting of operational statistics, performance metrics and forecasting data to management.
Establish clear and thorough documentation regarding the TSM and storage environments.
Deliver reliable, high-performance access to storage in a high-availability environment for storage operations, including multipath and automated failover.
Education and Experience:
Bachelor’s degree in Computer Science or in a related scientific field
Three (3) years' related experience
Other Qualifying Skills and Abilities:
At least 3 years' experience in a large enterprise environment containing hundreds of server, storage and network elements operating in a clustered setup.
Demonstrated expertise in the following areas:
Linux systems administration including RedHat Linux or CentOS Linux. System administration of Power Systems running RedHat Linux is preferred.
Three to five years of demonstrated experience with Tivoli Storage Manager (TSM), now IBM Spectrum Protect, for backup and archive in an enterprise environment
Architecting, implementing and operating large-scale TSM installations growing to multi-petabyte scale
Disk and tape storage systems, including tape libraries, in an enterprise storage environment
High-availability and load-balanced environments
Automation and monitoring of systems administration tasks
Shell scripting and other scripting languages (e.g., Python, Perl)
Knowledge of the following areas is considered an asset:
Fibre Channel and SAN systems, including design, configuration, maintenance, troubleshooting and issue resolution
xCAT, Puppet and monitoring tools such as Icinga
Database management, including MySQL and PostgreSQL
Systems and network security architecture, configuration and maintenance on Linux systems including RedHat and CentOS
Large-scale storage systems such as Lustre
Attention to detail, taking pride, responsibility and a sense of ownership for the successful operation of the systems under their administration and for the availability and reliability of those systems in support of all research users.
Advanced problem-solving skills.
Good oral and written communication skills in both French and English.
Ability to work in complex technical environments.
Ability to work effectively under pressure, with multiple concurrent tasks and priorities, to achieve successful outcomes.
Ability to take supervisory and management direction and to complete tasks effectively with little direct supervision.
Ability to work effectively with a distributed team in a collaborative environment across Quebec and Canada.
Ability to work cooperatively with a diverse team of professionals, acting as a technical resource for others in the team, as well as to work together with other staff on projects of significant importance and value to the organization and to the clients we serve.
Ability to identify problems and resolve issues in a complex environment. A demonstrated aptitude for learning new technologies.
How to Apply:
Please submit your cover letter and curriculum vitae, clearly indicating the reference number, to Staffing:
McGill University, Human Resources (Staffing)
688 Sherbrooke Street West, suite 1520
apply.hr [at] mcgill.ca
The deadline to apply for this position is March 5, 2020 at 5:00 PM.
Current employees: please indicate your McGill ID number in your application.
We thank all applicants for their interest in McGill University. However, Staffing will only contact applicants selected for an interview.
To maintain internal priority, McGill employees must apply within the time limits specified in the MUNACA collective agreement for positions covered by the collective agreement, or according to the personnel policies for positions covered by those policies.
McGill University hires on the basis of merit and is strongly committed to equity and diversity within its community. We welcome applications from racialized persons/visible minorities, women, Indigenous persons, persons with disabilities, ethnic minorities, and persons of minority sexual orientations and gender identities, as well as from all qualified candidates with the skills and knowledge to productively engage with diverse communities. McGill implements an employment equity program and encourages members of designated groups to self-identify. Persons with disabilities who anticipate needing accommodations for any part of the application process may contact, in confidence, accessibilityrequest.hr [at] mcgill.ca or 514-398-3711.
Office of the Vice-Principal (Research and Innovation)
McGill High Performance Computing Centre (HPC) – Calcul Québec
Salary: Grade 05, $56,800 to $92,500 (midpoint $71,100)
Two (2) years
Associate Director, Operations (HPC)
Friday, February 21, 2020