High-Performance Computing/Linux Systems Administrator – CNSI at UCLA
Position title: Systems Administrator 3
Application due date: October 11, 2022
Applications will continue to be accepted after this date, but those received after it will be considered only if the position has not yet been filled.
The California NanoSystems Institute (CNSI) at UCLA produces life-changing scientific, economic and social impact in California and across the world. The CNSI environment fosters interdisciplinary teams that collaborate on high-impact research, leveraging the Institute’s leading-edge facilities and expert services to address grand challenges of our time.
A successful High-Performance Computing (HPC)/Linux Systems Administrator at CNSI is passionate about technology and will thrive working on a collaborative team of technologists and scientists to support and grow the Institute’s research computing infrastructure. Under the direction of the Information Technology Manager, this role is responsible for the Institute’s HPC cluster, high-performance storage, Linux servers and workstations, advanced networking, and associated management and monitoring infrastructure.
The systems administrator will work with Technology Center (core research laboratories) leadership and research staff to understand their workflows and computing needs, and to analyze, specify, design and implement operating system and application software solutions in a medium-sized, heterogeneous computing environment made up of standalone, virtualized, and clustered Windows and Linux systems. The systems administrator will also work with research staff to optimize the performance and utilization of the Institute’s HPC resources, including the parallel file system storage array. Additional responsibilities include security patching, OS and package upgrades, automation, scripting, troubleshooting hardware, software and networking issues, and ensuring maximum system availability.
The incumbent will enforce IT security best practices, monitor, investigate and remediate cybersecurity threats, provide network administration and troubleshooting, and actively participate in strategic planning discussions regarding the future state of the Institute’s infrastructure. The systems administrator will work collaboratively with the other members of the IT team to develop, plan and implement projects that achieve team objectives.
The incumbent is expected to stay informed of the latest IT security threats and to work with the IT team to assess network security levels and compliance, institute IT security best practices, and develop and implement action plans to improve system and network security. The incumbent will monitor the health and overall performance of the department’s network hardware, physical and virtual servers, uninterruptible power supplies, storage solutions, databases, and web services. Other tasks include evaluating, recommending, implementing and supporting new hardware and software technologies; documenting systems and processes; training staff, researchers and CNSI building residents on network access procedures, software applications, security best practices and policies, and hardware usage; serving as backup for other IT team members; and performing other duties as assigned.
CNSI is committed to supporting work-life balance for its employees. Due to the complex, laboratory-centric work done at the Institute, on-site work is required. The Institute’s IT team works on a rotating and partially overlapping schedule. System maintenance activities will require intermittent night and weekend work, some of which can be done remotely.
- Minimum of 5 years of experience administering a high-performance computing cluster.
- Minimum of 7 years of experience with Linux (CentOS 7, Scientific Linux 7, Rocky Linux) administration with Windows Active Directory integration.
- Minimum of 7 years of experience with desktop and server hardware and RAID, and the ability to methodically diagnose and resolve hardware and software issues.
- Ability to build and maintain positive, constructive and collaborative working relationships in a dynamic environment.
- Advanced knowledge of system and network performance monitoring and optimization techniques.
- Advanced knowledge of network hardware, network topology, network and routing protocols, subnetting and addressing, and network security.
- Advanced knowledge of Active Directory, DNS, DHCP, LDAP, NFS, SMB, SNMP, server clustering and replication, virtual servers, wireless protocols, and security certificates.
- Expert knowledge of local, parallel and distributed file systems such as ext4, XFS, ZFS, Lustre, BeeGFS and/or GPFS.
- Demonstrated skill in developing and implementing methods and procedures to ensure information security and data integrity.
- Experience in writing technical documentation and user guides.
- Skilled in communicating effectively both orally and in writing, including developing and delivering presentations on technical topics to audiences with varying levels of knowledge.
- Demonstrated skill in analyzing complex or ambiguous technical information, problems, or situations.
- Ability to work efficiently and effectively in the midst of diversified responsibilities, changing priorities and frequent interruptions.
- Ability to plan, coordinate, and prioritize tasks/projects while keeping leadership and stakeholders apprised of status on a regular basis.
- Ability to work nights and weekends to perform maintenance tasks and to respond to emergencies.
- Ability to lift 42 lbs.
- Experience with HPC job scheduling software such as Slurm.
- Experience with Linux automation software such as Ansible or Puppet.
- Demonstrated experience with hard drive cloning and with enterprise backup and disaster recovery software such as Veeam Backup & Replication, Veritas Backup Exec or NetBackup.
- Strong knowledge of common software packages, including next-generation antivirus, Microsoft Office, Adobe Creative Cloud, web browsers, file managers, and cloud storage.
- Experience with VPNs and next-generation firewalls.
- Knowledge of University IT policies and procedures.
- Experience with shell scripting or with programming languages such as Perl, Python, or PowerShell.