2021-03799 - Post-Doctoral Research Visit F/M Performance Modeling of HPC
Contract type : Fixed-term contract
Level of qualifications required : PhD or equivalent
Function : Post-Doctoral Research Visit
About the research centre or Inria department
Team STORM combines strengths in high-level DSLs, heterogeneous runtimes and
analysis tools to help programmers get the highest efficiency from modern
computer architectures in a portable manner.
This work will be developed within the framework of the HPCProSol joint team.
This team was established in 2021 as a collaboration between Inria Bordeaux
(TADaaM and STORM teams) and the National Laboratory for Scientific
Computing (LNCC) in Petrópolis, Brazil.
The team's main goal is to study and characterize the new High-Performance
Computing workload, represented by a set of scientific applications that are
important to the LNCC because they are representative of its supercomputer's
workload. Their machine, named Santos Dumont, was the largest in Latin America
and is used by a diverse scientific community, so it runs applications from
many fields. Therefore, its workload allows for drawing conclusions that can
be generalized for many similar applications and systems. The generated
knowledge will guide the proposal of monitoring and profiling techniques for
applications, and the design of new coordination mechanisms to arbitrate
resources in HPC environments.
Trips between Bordeaux and Petrópolis are planned during the contract. Travel
expenses are covered by the joint team within limits set by Inria.
HPC architectures, the supercomputers, were conceived to efficiently run
traditional HPC applications, namely numerical simulations. However, in the
context of the convergence between HPC and Big Data 1, the notion of
scientific application is evolving into a scientific workflow, composed of
CPU-intensive and data-intensive tasks. This evolution characterizes the new
HPC workload.
In this new scenario, efficient application execution becomes more challenging
due to a mismatch between systems and applications. New applications include
new methods, libraries, and runtime systems that may not have been properly
optimized to the supercomputer, leading to problems such as load imbalance and
poor communication performance. Meanwhile, supercomputers' resources are
arbitrated between applications using little information, such as the number of
CPUs and the estimated execution time, which potentially wastes resources that
are unused at different moments during application execution 2. Additionally,
although running on independent nodes, concurrent applications still share the
network and I/O infrastructures, which means they can interfere with each
other. The contention in the access to shared I/O resources has been shown to
affect applications' performance non-uniformly, depending on their
characteristics 3, 4. Hence, these problems are expected to become worse as
the new HPC workload includes more diverse codes. They should be tackled by
better scheduling at the application and system levels that considers
applications' characteristics to avoid issues such as interference 5.
1 M. Asch et al. Big data and extreme-scale computing: Pathways to
convergence-toward a shaping strategy for a future software and data ecosystem
for scientific inquiry. The International Journal of High Performance
Computing Applications, 32(4):435–479, 2018.
2 J. L. Bez, A. Miranda, R. Nou, F. Zanon Boito, T. Cortes, and P. Navaux.
Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms.
In IPDPS 2021 - 35th IEEE International Parallel and Distributed Processing
Symposium, Portland, Oregon / Virtual, United States, May 2021.
3 X. Ji, B. Yang, T. Zhang, X. Ma, X. Zhu, X. Wang, N. El-Sayed, J. Zhai,
W. Liu, and W. Xue. Automatic, application-aware i/o forwarding resource
allocation. In 17th USENIX Conference on File and Storage Technologies (FAST
19), pages 265–279, Boston, MA, Feb. 2019. USENIX Association.
4 O. Yildiz, M. Dorier, S. Ibrahim, R. Ross, and G. Antoniu. On the root
causes of cross-application I/O interference in HPC storage systems. In 2016
IEEE International Parallel and Distributed Processing Symposium (IPDPS),
pages 750–759, 2016.
5 J. Yu, G. Liu, W. Dong, X. Li, J. Zhang, and F. Sun. On the load
imbalance problem of I/O forwarding layer in HPC systems. In 2017 3rd IEEE
International Conference on Computer and Communications (ICCC), volume 2018,
pages 2424–2428. IEEE, Dec. 2017.
The goal of this post-doctoral research is to study and model the performance
of applications that represent the new HPC workload, selected from the LNCC's
- A numerical simulation library, called MHM, developed by the LNCC 6.
  This library implements a number of finite element methods and offers
  support for hybrid parallelism (OpenMP + MPI) for classic and multiscale
- Data analysis tasks and workflows from the BioInfoPortal science gateway
  7, developed by the LNCC to allow for easy execution of bioinformatics
  applications on the Santos Dumont machine.
The recruited person will work in collaboration with researchers from the
joint team to profile these applications at different scales, and when running
concurrently with other codes and stress benchmarks. The recruited person will
also be responsible for modeling the applications' performance, for finding
ways to generalize these profiles to similar applications, and for identifying
the information that should be obtained during application execution. This
information should be useful for obtaining new profiles automatically and for
computing metrics that can help the runtime predict deviations from the
standard application behavior (for instance, if the input phase of an HPC
simulation lasts longer than expected, it is possible the application is
treating a larger problem and thus will run longer, with longer and more
spaced output phases).
6 A. T. A. Gomes, D. Paredes, W. D. S. Pereira, R. P. Souto, and F.
Valentin. Performance analysis of the MHM simulator in a petascale machine.
In Proceedings of the XXXVIII Iberian Latin American Congress on Computational
Methods in Engineering, 2017.
7 K. A. Ocaña, M. Galheigo, C. Osthoff, L. M. Gadelha, F. Porto, A. T. A.
Gomes, D. de Oliveira, and A. T. Vasconcelos. BioInfoPortal: A scientific
gateway for integrating bioinformatics applications on the Brazilian national
high-performance computing network. Future Gener. Comput. Syst.,
107(C):192–214, June 2020.
- Design and run experiments with applications on a supercomputer
- Model application performance and the effects of interference
- Identify useful information and performance metrics for modeling and
  predicting application behavior
- Write reports and papers on the subject
- Organize scripts and datasets for the reproduction of results and
Knowledge of parallel computing, HPC, and performance profiling and
modeling is required.
Communication skills in English (reading, writing, presenting) are
Knowledge of the French and Portuguese languages is a plus.
Technical skills: command line usage of Linux-based HPC systems; script
programming; ability to modify the source code of applications written in
different programming languages; statistical analysis using R or Python.
This is a post-doctoral position for 12 to 24 months, offered in the context
of the collaboration between Inria and the LNCC. The application must be
submitted by email to firstname.lastname@example.org before July 10th 2021 with all the
documents listed below. If you want to apply, please contact us beforehand at
email@example.com, firstname.lastname@example.org, and jean-
The summary sheet;
A research project detailing the research program, the work plan, the
visits expected during the post-doc and when it should begin (November
the 1st by default, January the 1st 2022 at the latest);
A detailed CV of the candidate including a description of the work
conducted during the Ph.D., a complete list of the publications, and the 2
most important publications;
A motivation letter from the candidate;
Two recommendation letters (from people working in France or outside);
A support letter from the Inria team;
A support letter from the LNCC;
A copy of the passport.
Partial reimbursement of public transport costs
Possibility of teleworking and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer
Social, cultural and sports events and activities
Access to vocational training
Social security coverage
2653€ / month (before taxes)
Theme/Domain : Distributed and High Performance Computing
Scientific computing (BAP E)
Town/city : Talence
Inria Center : CRI Bordeaux - Sud-Ouest
Starting date : 2021-11-01
Duration of contract : 12 months
Deadline to apply : 2021-07-10
Inria Team : STORM
Lima Pilla Laercio / email@example.com
The keys to success
Important qualities to succeed in this work include the capacity for
initiative and autonomy, integrity, a willingness to learn, and relational
abilities to work in a diverse and geographically-distributed team. A thesis
in computer science is required. A thesis in performance profiling or modeling
is a real asset.
Inria is the French national research institute dedicated to digital science
and technology. It employs 2,600 people. Its 200 agile project teams,
generally run jointly with academic partners, include more than 3,500
scientists and engineers working to meet the challenges of digital technology,
often at the interface with other disciplines. The Institute also employs
numerous talents in over forty different professions. 900 research support
staff contribute to the preparation and development of scientific and
entrepreneurial projects that have a worldwide impact.
Instruction to apply
Please send:
- Cover letter
- Support letters (mandatory)
- List of publications
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as
defined in Decree No. 2011-1425 relating to the protection of national
scientific and technical potential (PPST). Authorisation to enter an area is
granted by the director of the unit, following a favourable Ministerial
decision, as defined in the decree of 3 July 2012 relating to the PPST. An
unfavourable Ministerial decision in respect of a position situated in a ZRR
would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people
Warning : you must enter your e-mail address in order to save your
application to Inria. Applications must be submitted online on the Inria
website. Processing of applications sent from other channels is not