Torsten Hoefler
Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
T Ben-Nun, T Hoefler
ACM Computing Surveys (CSUR) 52 (4), 1-43, 2019
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
T Hoefler, D Alistarh, T Ben-Nun, N Dryden, A Peste
Journal of Machine Learning Research 22 (241), 1-124, 2021
The convergence of sparsified gradient methods
D Alistarh, T Hoefler, M Johansson, N Konstantinov, S Khirirat, C Renggli
Advances in Neural Information Processing Systems 31, 2018
MPI: A Message-Passing Interface Standard
MPI Forum
Technical Report, 2012
Slim Fly: A cost effective low-diameter network topology
M Besta, T Hoefler
SC'14: Proceedings of the International Conference for High Performance …, 2014
Characterizing the influence of system noise on large-scale applications by simulation
T Hoefler, T Schneider, A Lumsdaine
SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High …, 2010
Generic topology mapping strategies for large-scale parallel architectures
T Hoefler, M Snir
Proceedings of the International Conference on Supercomputing, 75-84, 2011
The PERCS high-performance interconnect
B Arimilli, R Arimilli, V Chung, S Clark, W Denzel, B Drerup, T Hoefler, ...
2010 18th IEEE Symposium on High Performance Interconnects, 75-82, 2010
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results
T Hoefler, R Belli
Proceedings of the International Conference for High Performance Computing …, 2015
Implementation and performance analysis of non-blocking collective operations for MPI
T Hoefler, A Lumsdaine, W Rehm
Proceedings of the 2007 ACM/IEEE conference on Supercomputing, 1-10, 2007
Neural code comprehension: A learnable representation of code semantics
T Ben-Nun, AS Jakobovits, T Hoefler
Advances in Neural Information Processing Systems 31, 2018
GPTQ: Accurate post-training quantization for generative pre-trained transformers
E Frantar, S Ashkboos, T Hoefler, D Alistarh
arXiv preprint arXiv:2210.17323, 2022
Graph of thoughts: Solving elaborate problems with large language models
M Besta, N Blach, A Kubicek, R Gerstenberger, M Podstawski, ...
Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17682 …, 2024
LogGOPSim: simulating large-scale applications in the LogGOPS model
T Hoefler, T Schneider, A Lumsdaine
Proceedings of the 19th ACM International Symposium on High Performance …, 2010
Augment your batch: Improving generalization through instance repetition
E Hoffer, T Ben-Nun, I Hubara, N Giladi, T Hoefler, D Soudry
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020
DARE: High-performance state machine replication on RDMA networks
M Poke, T Hoefler
Proceedings of the 24th International Symposium on High-Performance Parallel …, 2015
Using automated performance modeling to find scalability bugs in complex codes
A Calotoiu, T Hoefler, M Poke, F Wolf
Proceedings of the International Conference on High Performance Computing …, 2013
Using advanced MPI: Modern features of the message-passing interface
W Gropp, T Hoefler, R Thakur, E Lusk
MIT Press, 2014
The digital revolution of Earth-system science
P Bauer, PD Dueben, T Hoefler, T Quintino, TC Schulthess, NP Wedi
Nature Computational Science 1 (2), 104-113, 2021
To push or to pull: On reducing communication and synchronization in graph computations
M Besta, M Podstawski, L Groner, E Solomonik, T Hoefler
Proceedings of the 26th International Symposium on High-Performance Parallel …, 2017
