Senior HPC Infrastructure Engineer (105480-1122). Job Reference: 1889133

University of Warwick
December 19, 2022
Offered Salary: Competitive
Contract Type: Other
Working Time: Full time

Vacancy Type/Job category

Management & Professional

Department

Research Technology Platform

Sub Department

Scientific Computing RTP

Salary

Competitive Salary

Location

University of Warwick, Coventry

Vacancy Overview

Previous candidates need not apply.

Permanent position, 1.0 FTE.

We are seeking an experienced and talented infrastructure engineer to take a senior role in delivering and supporting high-performance computing (HPC) services at the University of Warwick. As part of the Scientific Computing Research Technology Platform (SCRTP), you will operate a diverse range of HPC equipment across two on-campus research data centres. This includes a £1.3M system commissioned in 2021, and the £3M Sulis system (sulis.ac.uk) hosted by Warwick on behalf of HPC Midlands+, one of eight tier 2 HPC platforms in the UK. A further refresh of our HPC capability is planned for 2022.

The SCRTP infrastructure handles many workloads, from processing experimental data gathered at particle accelerators, observatories, and on-campus experimental facilities, to massively parallel simulations of star system formation and GPU accelerated deep learning. HPC underpins Warwick research tackling the greatest global challenges. Our HPC users design energy harvesting and battery materials at the quantum scale, pioneer computational discovery of new energy efficient chemical syntheses, and undertake molecular scale investigation of anti-microbial resistance.

The successful candidate will ensure this key component of our research infrastructure is maintained, secured, and operated to standards commensurate with enabling world-class research. You will work closely with Research Software Engineers, Software Specialists and the SCRTP Facility Manager to provide solutions for efficient and effective utilisation by our growing HPC user community of 200+ staff and research students.

Interview Date: TBC.

Job Description

JOB PURPOSE

The purpose of this role is to engineer, develop, operate and support the University's High Performance Computing (HPC) facilities to standards necessary to enable internationally competitive research. This role forms part of a specialist team of highly skilled technical experts providing the University's Scientific Computing facilities.

DUTIES & RESPONSIBILITIES

1. Operations

a. Design and engineer High Performance Computing (HPC) and associated storage infrastructure, to specifications and standards necessary to enable internationally competitive research.

b. Be responsible for the systems engineering of the HPC facilities.

c. Co-ordinate HPC operations within the broader context of University scientific computing facilities and regional services, by working with the team.

d. Work with the team to implement configuration management and revision control processes to ensure changes are properly tracked and available for audit when needed.

e. Specify, implement and maintain systems to manage the deployment of application software stacks aligned to the needs of Warwick researchers and research projects.

f. Ensure the workload placed on the HPC facilities is managed, prioritised and executed in line with agreed service levels and policies for the facilities.

g. Design, develop and maintain tools to analyse and report usage data for the HPC facilities.

h. Co-ordinate HPC operations within the broader context of University scientific computing facilities, by working with teams responsible for research storage, Linux desktop and research software.

i. Develop and manage appropriate security policies to ensure secure authentication, ongoing software maintenance and patching are in place on the HPC facilities.

j. Implement backup and disaster recovery procedures in line with relevant policies, processes and service level agreements for the facilities.

k. Work with the team to implement infrastructure monitoring and alerting systems to ensure problems are detected and downtime reduced.

l. Plan and implement physical infrastructure deployments within the data centre working with the Estates Office and central IT Services where needed.

m. Test, evaluate and recommend new technology for future use as part of the facilities.

n. Write and maintain documentation on system design and management processes to ensure knowledge is accessible and disseminated appropriately within the team.

o. Keep up to date with the state of the art through a programme of conference attendance, technical briefings and engagement with developers, practitioners and vendors of technology.

p. Provide reasonable and appropriate out-of-hours systems support where warranted by major disruption or outages affecting mission-critical services.

q. Work as directed by the Scientific Computing Platform Manager.

2. Customer engagement

a. Deliver specialist research computing support for HPC as required by prioritising requests and responding in line with established service level agreements where applicable.

b. Develop good relations with users of the facilities through good communication, evaluation of feedback and providing standards of support, guidance and expert advice appropriate for enabling internationally competitive research.

c. Develop good relations with external users, and with providers of regional and national HPC facilities.

d. Write and maintain web-based documentation and training material for users of the HPC facilities.

3. Financial

a. Draft technical specifications for invitation to tender documents and technically assess tender responses.

b. Support research grant applications with costs for facilities access and dedicated equipment.

c. Procure new equipment in line with University procurement policies, processes and regulations.

4. Health and safety

a. Work within established health and safety procedures liaising with the Estates Office and University Security where needed.

Person Specification

The Person Specification focuses on the knowledge, skills, experience and qualifications required to undertake the role effectively. This is measured by (a) Application Form, (b) Test/Exercise, (c) Interview, (d) Presentation.

Essential Criteria 1

Degree or equivalent in a scientific discipline relevant to scientific computing or equivalent experience. (a, c)

Essential Criteria 2

Strong track record in Linux systems engineering gained in a large-scale, HPC environment. (a, c)

Essential Criteria 3

Experience of working as part of a highly skilled technical team. (a, c)

Essential Criteria 4

Experience of supporting expert and non-expert users of Linux-based computing facilities. (a, c)

Essential Criteria 5

Development of user-facing documentation and example scripts. (a, c)

Essential Criteria 6

Experienced with automated Linux deployment and provisioning tools. (a, c)

Essential Criteria 7

Experience with parallel high performance file systems such as IBM Spectrum Scale (formerly GPFS) or Lustre. (a, c)

Essential Criteria 8

Configuring and managing batch processing management software such as SLURM, PBS or Torque. (a, c)

Essential Criteria 9

Proven knowledge of cluster network administration and parallel computing. (a, c)

Essential Criteria 10

Experience of server-based GPU processing technology and associated software. (a, c)

Essential Criteria 11

Excellent written and verbal communication skills. (a, c)

Desirable Criteria 1

Higher degree desirable but not essential. (a)

Desirable Criteria 2

Experience of undertaking computational research within a university environment. (a, c)

Further Particulars

The Scientific Computing Research Technology Platform (SCRTP) is one of 12 RTPs which serve as central shared facilities to the Warwick research community. The SCRTP team have been delivering research computing services since 2002. The facility provides a managed Linux desktop environment for research, a multi-Petabyte storage pool for shared datasets and a pool of taskfarm servers for offloading long running computations. We also host and operate Warwick's high performance computing clusters and the Sulis tier 2 HPC system (sulis.ac.uk) on behalf of HPC Midlands+.

Robust and performant computing facilities are recognised as vital to delivering Warwick's ambitious research strategy. Since 2016 we have invested upwards of £5M in expanding and refreshing our scientific computing provision and benefitted from over £3.5M of research council equipment funding. Our growing team now includes research software engineering, graduate training provision and software support.

For further information on working life at the University, consult our HR pages at https://warwick.ac.uk/services/humanresources/about/.

Further information on the SCRTP can be found at https://warwick.ac.uk/scrtp, or by contacting the SCRTP Academic Director, Prof David Quigley (D.Quigley@warwick.ac.uk).

We will consider applications for employment on a part-time or other flexible working basis, even where a position is advertised as full-time, unless there are operational or other objective reasons why it is not possible to do so.

Warwick is committed to building an organisation of mutual respect and dignity, promoting a welcoming, diverse and inclusive working and learning environment. We recognise that everyone is different in a variety of visible and non-visible ways, and that those differences are to be recognised, respected, and valued. Where possible, we go beyond legislation to provide a place where everyone can thrive, supporting all staff to achieve their full potential. We aspire to remove economic, social and cultural barriers that may otherwise prevent people from succeeding.

We therefore welcome and encourage applications from all communities regardless of culture, background, age, disability, sex/gender, gender identity or expression, ethnicity, religion/belief, or sexual or romantic orientation. To find out more about our social inclusion work at Warwick visit our webpages here.

The University of Warwick holds an Athena SWAN Silver award, part of a national initiative to promote gender equality for all staff and students. Further information about the work of the University in relation to Athena SWAN can be found at the following link: Athena Swan (warwick.ac.uk)

The University of Warwick is one of the six founder institutions of the EUTOPIA European University alliance, whose aim is to become by 2025 an open, multicultural, confederated operation of connected campuses.

Right to work in the UK

If you do not yet have the right to work in the UK and/or are seeking sponsorship for a Skilled Worker visa in the UK points-based immigration system, please click on this link, which contains further information about obtaining the right to work in the UK and details about eligibility for sponsorship for a Skilled Worker visa.

Recruitment of Ex-Offenders Policy

As an organisation using the Disclosure and Barring Service (DBS) to assess applicants' suitability for positions of trust, the University of Warwick complies with the DBS Code of Practice and undertakes not to discriminate unfairly against any subject of a Disclosure on the basis of a conviction or other information revealed. More information is available on the University's Vacancy pages, and applicants may request a copy of the DBS Code of Practice.

Closing Date

19 Dec 2022
