RaNNC, Automatic Parallelization Middleware for Deep Learning, Wins First Place at PyTorch Annual Hackathon 2021

6 Dec. 2021

National Institute of Information and Communications Technology
The University of Tokyo

RaNNC (Rapid Neural Network Connector), automatic parallelization middleware for deep learning developed jointly by the Data-driven Intelligent System Research Center (DIRECT) of the National Institute of Information and Communications Technology (NICT, President: TOKUDA Hideyuki, Ph.D.) and the University of Tokyo (President: FUJII Teruo), won first place at the PyTorch Annual Hackathon 2021 (PyTorch Developer Tools & Libraries category).

PyTorch is the de facto standard framework for deep learning, and this hackathon is the only worldwide event that both awards PyTorch projects and is officially held by Facebook, the leading PyTorch developer (https://pytorch2021.devpost.com/). RaNNC drastically simplifies the training of large-scale neural networks, which has been very difficult with PyTorch's existing features. RaNNC is available as open-source software, and anyone can download and use it for free, even for commercial purposes.

The PyTorch Annual Hackathon (https://pytorch2021.devpost.com/) is an event where PyTorch users come together to develop software or machine learning models with PyTorch. It has been held annually since 2019 and is known as the only worldwide event that both awards PyTorch projects and is officially held by Facebook, the leading player in the development of PyTorch. This year, 1,947 people from around the world participated, and 65 projects were submitted.

RaNNC is middleware that automatically partitions large-scale neural networks and parallelizes their training across many GPUs*1. To train a large-scale neural network, users need to partition it and compute the parts on multiple GPUs, because the parameters of a huge neural network do not fit into the memory of a single GPU. However, partitioning a neural network while accounting for both memory usage and the efficiency of parallel processing is very difficult, even for experts. In contrast, RaNNC takes a description of a neural network that is designed to be computed on a single GPU and automatically partitions the network so that each sub-network fits into the memory of a GPU and a high training speed is achieved. This drastically simplifies the training of large-scale neural networks.
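To illustrate the kind of constraint involved, the following is a minimal sketch of partitioning under a memory limit. It is not RaNNC's algorithm: it assumes a purely sequential network, uses a simple greedy rule, and ignores training speed entirely. The function name `partition_layers`, the per-layer memory figures, and the 12 GB limit are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def partition_layers(layer_mem_gb: Sequence[float], gpu_mem_gb: float) -> List[List[int]]:
    """Greedily group consecutive layers so that each group fits on one GPU.

    Illustrative only: a real partitioner must also balance computation time
    and communication cost between the groups.
    """
    stages: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for idx, mem in enumerate(layer_mem_gb):
        if mem > gpu_mem_gb:
            raise ValueError(f"layer {idx} alone exceeds the memory of one GPU")
        # Start a new group when adding this layer would exceed the limit.
        if current and used + mem > gpu_mem_gb:
            stages.append(current)
            current, used = [], 0.0
        current.append(idx)
        used += mem
    if current:
        stages.append(current)
    return stages


# Hypothetical memory needs (in GB) of six consecutive layers.
layer_mem_gb = [6.0, 5.0, 4.0, 7.0, 3.0, 2.0]
stages = partition_layers(layer_mem_gb, gpu_mem_gb=12.0)
print(stages)  # [[0, 1], [2, 3], [4, 5]]
```

Each inner list holds the indices of the layers assigned to one GPU. Even in this toy setting, the grouping depends on every layer's memory need, which hints at why doing this by hand for a large network is laborious.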

RaNNC was first released in March 2021 and was significantly updated for the PyTorch Annual Hackathon 2021. To reduce GPU memory usage, the new feature introduced during the hackathon keeps most parameters of a neural network in main memory, which is much larger than GPU memory, and moves only the necessary parameters to GPU memory just before the computations on the GPU use them. This enables us to train large-scale neural networks with much less GPU memory.
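The effect of this approach on GPU memory can be sketched with a small simulation. This is a simplified model under stated assumptions, not RaNNC's implementation: it tracks parameters only (no gradients, activations, or optimizer state), processes one layer at a time, and releases each layer's parameters immediately after use. The function name `run_with_offloading` and the parameter sizes are hypothetical.

```python
from typing import Dict, List


def run_with_offloading(layer_param_gb: List[float]) -> Dict[str, float]:
    """Simulate keeping parameters in main memory and loading them per layer."""
    host = dict(enumerate(layer_param_gb))  # all parameters start in main memory
    gpu: Dict[int, float] = {}              # parameters currently in GPU memory
    peak = 0.0
    for idx in range(len(layer_param_gb)):
        gpu[idx] = host[idx]                # move in just before the layer runs
        peak = max(peak, sum(gpu.values()))
        # ... the layer's computation on the GPU would happen here ...
        del gpu[idx]                        # free GPU memory after use
    return {
        "peak_gpu_gb": peak,
        "all_resident_gb": sum(layer_param_gb),
    }


# Hypothetical parameter sizes (in GB) of four layers.
result = run_with_offloading([4.0, 8.0, 2.0, 6.0])
print(result)  # {'peak_gpu_gb': 8.0, 'all_resident_gb': 20.0}
```

In this model the peak GPU usage equals the size of the largest single layer rather than the sum of all layers, at the cost of extra transfers between main memory and GPU memory.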

RaNNC has been developed through collaborative research by NICT and the University of Tokyo. This research group has the following members:

Masahiro Tanaka	Senior Researcher, Data-driven Intelligent System Research Center, Universal Communication Research Institute, NICT
Kenjiro Taura	Professor, Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo / Director of Information Technology Center, The University of Tokyo
Toshihiro Hanawa	Professor, Supercomputing Research Division, Information Technology Center, The University of Tokyo
Kentaro Torisawa	NICT Fellow / Associate Director General, Universal Communication Research Institute, NICT / Distinguished Researcher, Data-driven Intelligent System Research Center, Universal Communication Research Institute, NICT

We confirmed that RaNNC could automatically parallelize the training of a neural network with 100 billion parameters. In previous work, training such a huge network required human experts to significantly rewrite the description of the neural network to optimize parallel processing. In contrast, RaNNC can automatically parallelize training and achieve a high training speed given a description of a neural network that is not designed for parallel processing.
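A rough calculation shows why a network of this size cannot be trained on a single GPU. The sketch below assumes 4 bytes per parameter (32-bit floating point) and a hypothetical GPU with 32 GB of memory; neither figure comes from the experiment described above, and gradients, optimizer state, and activations would add considerably more.

```python
import math


def param_memory_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed to store the parameters alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_params(num_params: int, gpu_mem_gb: float, bytes_per_param: int = 4) -> int:
    """Lower bound on the number of GPUs needed just to hold the parameters."""
    return math.ceil(param_memory_gb(num_params, bytes_per_param) / gpu_mem_gb)


total_gb = param_memory_gb(100_000_000_000)
gpus = min_gpus_for_params(100_000_000_000, gpu_mem_gb=32.0)
print(total_gb, gpus)  # 400.0 13
```

Under these assumptions, the parameters alone occupy 400 GB, so at least 13 such GPUs would be needed before any other training data is accounted for.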

In addition, some well-known frameworks designed to train large-scale neural networks are applicable only to specific types of networks, such as Transformer*2, while RaNNC is applicable to essentially any type of neural network.

The source code and usage examples of RaNNC are now available on GitHub*3 (https://github.com/nict-wisdom/rannc). RaNNC is licensed under the MIT License, which allows users to use RaNNC for free, even for commercial purposes.

Glossary

*1 GPU (Graphics Processing Unit)
A device originally developed for image-processing computations. Nowadays, it is widely used for general-purpose processing because it delivers high computational performance through parallel processing. In deep learning in particular, GPUs are widely used because they can efficiently parallelize a huge amount of computation. However, a GPU has much less memory than the main memory available to a CPU, and the parameters of large-scale neural networks often do not fit into the memory of a GPU.

*2 Transformer
A neural network proposed in 2017 that has mainly been used for language processing. It has had a significant impact on the research field and has been used in many successor networks, including BERT, which surpassed existing records in many language processing tasks.

*3 GitHub
A code repository site where the source code of a vast amount of software is registered. Most of the source code on the site is publicly available.

Contacts

< Technical Contact >
TANAKA Masahiro
Data-driven Intelligent System Research Center
Universal Communication Research Institute
NICT
E-mail: wisdom-contact[at]ml.nict.go.jp

< Media Contact >
Press Office
Public Relations Department
NICT
E-mail: publicity[at]nict.go.jp

Yoshihisa Obayashi
Public Relations Specialist
Information Technology Center, The University of Tokyo
E-mail: itc-press[at]itc.u-tokyo.ac.jp
Phone: +81 80-9422-7780