Abstract:
Efficient and scalable system software for large-scale systems, especially
for performance analysis and monitoring, is increasingly important to both
the developers of parallel applications and the designers of next-generation
HPC systems. However, conventional performance tools incur significant time
and space overhead as problem sizes and system scales keep growing.
For instance, memory monitoring is critical to understanding applications
and evaluating systems. Because programs’ memory accesses are dynamic,
common practice today defers large amounts of address examination and data
recording to runtime, at the cost of substantial performance overhead.
On the other hand, the cost of source code analysis is independent of the
problem size and system scale, making it very appealing for large-scale
performance analysis. Inspired by this observation, we have designed a
series of lightweight system software tools for HPC systems, such as a
memory access monitoring tool, a performance variance detection tool, and a
communication trace compression tool. In this talk, I will share our
experience in building these tools by combining static analysis with runtime
analysis, and point out the main challenges in this direction.
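Jidong Zhai is an associate professor and doctoral advisor in the Department of Computer Science at Tsinghua University. His main research areas are high-performance computing, performance evaluation, and the performance analysis and optimization of large-scale parallel programs. From 2015 to 2016 he was a visiting assistant professor in the Department of Computer Science at Stanford University. His research has been published in leading international conferences and journals in high-performance computing, including SC, PPoPP, ICS, MICRO, ASPLOS, ATC, CGO, IEEE TPDS, and IEEE TC. His SC14 paper was selected as a Best Paper Finalist, the first time researchers from mainland China were shortlisted for the award. He has served as program committee chair of NPC 2018, program committee member of ACM/IEEE SC 2018 and 2019, PPOPP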