Post-Doctoral Research Visit F/M Performance Modeling of HPC Applications [Inria/LNCC]



July 10, 2021


Team STORM combines strengths in high-level DSLs, heterogeneous runtimes, and performance analysis tools to help programmers get the highest efficiency from modern computer architectures in a portable manner.


This work will be developed within the framework of the HPCProSol joint team. This team was established in 2021 as a collaboration between Inria Bordeaux (TADaaM and STORM teams) and the National Laboratory for Scientific Computing (LNCC) in Petrópolis, Brazil.

The team's main goal is to study and characterize the new High-Performance Computing workload, represented by a set of scientific applications that are important to the LNCC because they are representative of its supercomputer's workload. Their machine, named Santos Dumont, was the largest in Latin America and used by a diverse scientific community, so it runs applications from many fields. Its workload therefore allows for drawing conclusions that can be generalized to many similar applications and systems. The generated knowledge will guide the proposal of monitoring and profiling techniques for applications, and the design of new coordination mechanisms to arbitrate resources in HPC environments.

Trips between Bordeaux and Petrópolis are planned during the contract. Travel expenses are covered by the joint team within limits set by Inria.

Scientific context

HPC architectures (supercomputers) were conceived to run traditional HPC applications, namely numerical simulations, efficiently. However, in the context of the convergence between HPC and Big Data [1], the notion of a scientific application is evolving into that of a scientific workflow, composed of CPU-intensive and data-intensive tasks. This evolution characterizes the new HPC workload.

In this new scenario, efficient application execution becomes more challenging due to a mismatch between systems and applications. New applications include new methods, libraries, and runtime systems that may not have been properly optimized for the supercomputer, leading to problems such as load imbalance and poor communication performance. Meanwhile, supercomputers' resources are arbitrated between applications using little information, such as the number of CPUs and the estimated execution time, which potentially wastes resources that go unused at different moments during application execution [2]. Additionally, although they run on independent nodes, concurrent applications still share the network and I/O infrastructures, which means they can interfere with each other. Contention in the access to shared I/O resources has been shown to affect applications' performance non-uniformly, depending on their characteristics [3, 4]. These problems are expected to become worse as the new HPC workload includes more diverse codes. They should be tackled by better scheduling at the application and system levels, taking applications' characteristics into account to avoid issues such as interference [5].


[1] M. Asch et al. Big data and extreme-scale computing: Pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. The International Journal of High Performance Computing Applications, 32(4):435–479, 2018.

[2] J. L. Bez, A. Miranda, R. Nou, F. Zanon Boito, T. Cortes, and P. Navaux. Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms. In IPDPS 2021 - 35th IEEE International Parallel and Distributed Processing Symposium, Portland, Oregon / Virtual, United States, May 2021.

[3] X. Ji, B. Yang, T. Zhang, X. Ma, X. Zhu, X. Wang, N. El-Sayed, J. Zhai, W. Liu, and W. Xue. Automatic, application-aware I/O forwarding resource allocation. In 17th USENIX Conference on File and Storage Technologies (FAST 19), pages 265–279, Boston, MA, Feb. 2019. USENIX Association.

[4] O. Yildiz, M. Dorier, S. Ibrahim, R. Ross, and G. Antoniu. On the root causes of cross-application I/O interference in HPC storage systems. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 750–759, 2016.

[5] J. Yu, G. Liu, W. Dong, X. Li, J. Zhang, and F. Sun. On the load imbalance problem of I/O forwarding layer in HPC systems. In 2017 3rd IEEE International Conference on Computer and Communications (ICCC), volume 2018, pages 2424–2428. IEEE, Dec. 2017.


The goal of this post-doctoral research is to study and model the performance of applications that represent the new HPC workload, selected from the LNCC's workload:

  • A numerical simulation library, called MHM, developed by the LNCC [6]. This library implements a number of finite element methods and supports hybrid parallelism (OpenMP + MPI) for classic and multiscale numerical simulations.
  • Data analysis tasks and workflows from the BioInfoPortal science gateway [7], developed by the LNCC to allow for easy execution of bioinformatics applications on the Santos Dumont machine.

The recruited person will work in collaboration with researchers from the joint team to profile these applications at different scales, and concurrently with other codes and stress benchmarks. The recruited person will also be responsible for modeling the applications' performance, for finding ways to generalize these profiles to similar applications, and for identifying the information that should be obtained during application execution. This information should be useful for obtaining new profiles automatically, and for computing metrics that can help the runtime predict deviations from the standard application behavior. For instance, if the input phase of an HPC simulation lasts longer than expected, the application may be handling a larger problem and will thus run longer, with longer and more widely spaced output phases.
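
As a rough illustration of the deviation example above, the following minimal Python sketch compares an observed input-phase duration against a reference profile. Everything in it is hypothetical: the `PhaseProfile` structure, the 20% tolerance, the numeric values, and in particular the assumption that the remaining phases scale linearly with the input phase. It is not a method used by the team.

```python
from dataclasses import dataclass


@dataclass
class PhaseProfile:
    """Reference phase durations in seconds (hypothetical values)."""
    input_s: float
    compute_s: float
    output_s: float


def predict_remaining(profile: PhaseProfile, observed_input_s: float,
                      tolerance: float = 0.2):
    """Return (deviates, predicted_remaining_seconds).

    Assumption (illustrative only): if the input phase deviates from the
    profile by more than `tolerance`, the remaining phases are scaled by
    the same ratio; otherwise the profiled durations are kept.
    """
    ratio = observed_input_s / profile.input_s
    deviates = abs(ratio - 1.0) > tolerance
    scale = ratio if deviates else 1.0
    remaining = (profile.compute_s + profile.output_s) * scale
    return deviates, remaining


# Hypothetical profile: 10 s of input, 100 s of compute, 20 s of output.
profile = PhaseProfile(input_s=10.0, compute_s=100.0, output_s=20.0)
deviates, remaining = predict_remaining(profile, observed_input_s=15.0)
print(deviates, remaining)
```

A real model would likely need per-application scaling laws and would have to account for interference from concurrent jobs, which a single ratio cannot capture.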


[6] A. T. A. Gomes, D. Paredes, W. D. S. Pereira, R. P. Souto, and F. Valentin. Performance analysis of the MHM simulator in a petascale machine. In Proceedings of the XXXVIII Iberian Latin American Congress on Computational Methods in Engineering, 2017.

[7] K. A. Ocaña, M. Galheigo, C. Osthoff, L. M. Gadelha, F. Porto, A. T. A. Gomes, D. de Oliveira, and A. T. Vasconcelos. BioInfoPortal: A scientific gateway for integrating bioinformatics applications on the Brazilian national high-performance computing network. Future Gener. Comput. Syst., 107(C):192–214, June 2020.

Main activities

  • Design and run experiments with applications on a supercomputer
  • Model application performance and the effects of interference
  • Identify useful information and performance metrics for modeling and predicting application behavior
  • Write reports and papers on the subject
  • Organize scripts and datasets for the reproduction of results and statistical analyses

Skills

  • Knowledge of parallel computing, HPC, and performance profiling and modeling is required.
  • Communication skills in English (reading, writing, presenting) are required.
  • Knowledge of French and Portuguese is a plus.
  • Technical skills: command-line usage of Linux-based HPC systems; script programming; ability to modify the source code of applications written in different programming languages; statistical analysis using R or Python.

Important qualities to succeed in this work include the capacity for initiative and autonomy, integrity, a willingness to learn, and the interpersonal skills to work in a diverse and geographically distributed team. A thesis in computer science is required. A thesis in performance profiling or modeling is a real asset.

