Monday, 04 May 2020
Location information: Virtual Meeting.

Schedule

9:00-9:20 Introduction & Motivation
9:20-10:30 Neural Network Compression + Q&A
10:30-11:00 Coffee Break
11:00-12:00 Distributed Learning
12:00-12:30 Code Demo + Q&A

Summary

Deep neural networks have recently demonstrated a remarkable ability to solve complex tasks. Today's models are trained on millions of examples using powerful GPU cards and can reliably annotate images, translate text, understand spoken language, or play strategic games such as chess or Go. Furthermore, deep learning will be an integral part of many future technologies, e.g., autonomous driving, the Internet of Things (IoT), or 5G networks. Especially with the advent of IoT, the number of intelligent devices has grown rapidly in the last couple of years. Many of these devices are equipped with sensors that allow them to collect and process data at unprecedented scales. This opens up unique opportunities for deep learning methods.

However, these new applications come with a number of additional constraints and requirements, which limit the out-of-the-box use of current models.
1. Embedded devices, IoT gadgets and smartphones have limited memory and storage capacities and restricted energy resources. Deep neural networks such as VGG-16 require over 500 MB for storing the parameters and up to 15 giga-operations for performing a single forward pass. Such models clearly cannot be deployed on-device in their current (uncompressed) form.
2. Training data is often distributed over devices and cannot simply be collected at a central server due to privacy issues or limited resources (bandwidth). Since training a model locally on only a few data points rarely yields good results, new collaborative training schemes are needed to bring the power of deep learning to these distributed applications.

This tutorial will discuss recently proposed techniques to tackle these two problems. We will start with a brief introduction to deep learning, its current use, and the limitations of today's models with respect to computational and memory complexity, energy efficiency, and use in distributed settings. We will stress the practical need to tackle these problems and discuss the recent developments towards this goal, including the emerging standardization activities by the ITU ML5G and MPEG AHG CNNMCD.
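
The VGG-16 figures quoted in the first constraint above can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only and rests on assumptions not stated in the text: the standard 16-layer VGG configuration with a 224x224 RGB input, parameters stored as 32-bit floats, and one "operation" counted as one multiply-accumulate. The function name `vgg16_cost` is hypothetical.

```python
# Layer layout of the standard 16-layer VGG network (assumed here).
# Integers are output channels of 3x3 conv layers; "M" marks 2x2 max pooling.
CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
            512, 512, 512, "M", 512, 512, 512, "M"]
FC_CFG = [4096, 4096, 1000]  # fully connected layers


def vgg16_cost(input_size=224, in_channels=3, bytes_per_param=4):
    """Return (parameter count, multiply-accumulates per forward pass, bytes)."""
    params, macs = 0, 0
    size, channels = input_size, in_channels
    for item in CONV_CFG:
        if item == "M":
            size //= 2  # pooling halves the spatial resolution
            continue
        weights = 3 * 3 * channels * item
        params += weights + item       # weights plus one bias per output channel
        macs += weights * size * size  # each weight is used at every output position
        channels = item
    features = channels * size * size  # flattened input to the first FC layer
    for out in FC_CFG:
        params += features * out + out
        macs += features * out
        features = out
    return params, macs, params * bytes_per_param


params, macs, size_bytes = vgg16_cost()
print(f"parameters: {params / 1e6:.1f} M")
print(f"storage (float32): {size_bytes / 1e6:.0f} MB")
print(f"multiply-accumulates per forward pass: {macs / 1e9:.1f} G")
```

Under these assumptions the parameter storage comes out above 500 MB and the forward pass at roughly 15 giga-operations, in line with the figures above. Most of the storage sits in the fully connected layers, while most of the computation sits in the convolutional layers.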

Then we will move on to the topic of neural network compression. We will start with a brief introduction to the basic concepts of source coding and information theory, including rate-distortion theory, quantization, entropy coding and the minimum description length principle. These concepts are needed to formalize the neural network compression problem. We will then discuss specific techniques for compressing DNNs. To that end, we distinguish the steps of the compression process, namely pruning & sparsification, quantization and entropy coding. The first two steps are lossy, whereas the last step is lossless. Since size reduction is not the only goal of neural network compression (fast inference and energy efficiency are others), we will also discuss approaches to efficient inference, including recently proposed neural network formats. We will finish this part with a presentation of a use case, namely on-device speech recognition, showing how to make use of compression methods in practical applications.
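
The three compression steps named above can be sketched on a toy weight vector. This is a minimal illustration, not the methods presented in the tutorial: it assumes magnitude-based pruning, a uniform quantization grid, and the first-order entropy of the quantization indices as an estimate of what an ideal lossless entropy coder would need. All function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def magnitude_prune(weights, sparsity):
    """Lossy step 1: zero out the smallest-magnitude fraction of the weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    # Ties at the threshold are pruned too, so at least k weights become zero.
    return [0.0 if abs(w) <= threshold else w for w in weights]


def uniform_quantize(weights, num_levels):
    """Lossy step 2: map each weight to the index of the nearest grid point."""
    max_abs = max(abs(w) for w in weights) or 1.0
    step = max_abs / (num_levels // 2)
    return [round(w / step) for w in weights], step


def entropy_bits(symbols):
    """Lossless step 3 (estimate): first-order entropy in bits per symbol."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # toy "layer"

pruned = magnitude_prune(weights, sparsity=0.8)
indices, step = uniform_quantize(pruned, num_levels=15)
reconstructed = [i * step for i in indices]

bits = entropy_bits(indices)
mse = sum((w - r) ** 2 for w, r in zip(weights, reconstructed)) / len(weights)
print(f"rate: {bits:.2f} bits/weight (vs. 32 for float32)")
print(f"distortion (MSE): {mse:.4f}")
```

The two printed numbers are one point on a rate-distortion curve: changing `sparsity` or `num_levels` trades bits per weight against reconstruction error. In practice the distortion that matters is the loss in model accuracy rather than the error on the weights themselves.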

After the Q&A and the coffee break we will present the recent developments in distributed learning. We present different distributed training scenarios and compare them with respect to their communication characteristics. We then focus on Federated Learning for the rest of the talk. We enumerate existing challenges in federated learning - communication efficiency, data heterogeneity, privacy, personalization, robustness - and present solutions to these challenges that have been proposed in the literature. We specifically focus on techniques proposed for reducing the communication overhead in distributed learning and discuss clustered FL, a new approach to model-agnostic distributed multi-task optimization. Here we will stress the similarity to concepts introduced in the first part of the tutorial, namely sparsification, quantization, and encoding.
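
To illustrate how sparsification reduces communication in federated training, the following toy sketch lets each client send only the k largest entries of its model update per round. It is an assumption-laden illustration rather than the method covered in the talk: each client's local dataset is replaced by a simple quadratic objective with its own optimum, and the unsent part of each update is carried over to the next round (error feedback), which is one common choice not named in the text above. All names and hyperparameters are hypothetical.

```python
import random


def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries and set all others to zero."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def federated_round(global_w, client_optima, residuals, lr, k):
    """One round: local step, sparsified upload, server-side averaging."""
    messages = []
    for c, optimum in enumerate(client_optima):
        # Local gradient step on the toy objective 0.5 * ||w - optimum||^2.
        update = [-lr * (w - o) for w, o in zip(global_w, optimum)]
        # Error feedback: add what was left unsent in earlier rounds.
        accumulated = [u + r for u, r in zip(update, residuals[c])]
        message = top_k_sparsify(accumulated, k)
        residuals[c] = [a - m for a, m in zip(accumulated, message)]
        messages.append(message)
    n = len(messages)
    average = [sum(m[j] for m in messages) / n for j in range(len(global_w))]
    return [w + a for w, a in zip(global_w, average)]


def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


random.seed(0)
DIM, NUM_CLIENTS, K = 10, 5, 3
# Heterogeneous clients: each has a slightly different local optimum.
client_optima = [[5.0 + random.gauss(0.0, 0.5) for _ in range(DIM)]
                 for _ in range(NUM_CLIENTS)]
# The minimizer of the averaged objective is the mean of the local optima.
target = [sum(o[j] for o in client_optima) / NUM_CLIENTS for j in range(DIM)]

w = [0.0] * DIM
residuals = [[0.0] * DIM for _ in range(NUM_CLIENTS)]
initial_error = distance(w, target)
for _ in range(500):
    w = federated_round(w, client_optima, residuals, lr=0.05, k=K)
final_error = distance(w, target)
print(f"entries sent per client and round: {K} of {DIM}")
print(f"distance to optimum: {initial_error:.2f} -> {final_error:.2f}")
```

The sparse messages could additionally be quantized and entropy coded before transmission, which is where the connection to the compression pipeline from the first part of the tutorial shows up.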

We will conclude the tutorial with a Q&A session.

For background material on the topic, see our reading list.

Outline of the tutorial

1. Introduction
- Current use of deep learning
- Practical limitations of current models and new applications
- Recent developments in research, industry & standardization

2. Neural Network Compression
- Background: Source Coding, Information Theory
- Pruning & Sparsification Methods
- Quantization & Fixed Point Inference
- Neural Network Formats
- Use Case Study: On-Device Speech Recognition

3. Questions

4. Coffee Break

5. Distributed Learning
- Background: SGD, Learning Theory
- Basic Concepts of Federated and Distributed Learning
- Reducing Communication Overhead & Connection to NN Compression
- Federated Learning & Differential Privacy
- Clustered Federated Learning

6. Questions

Slides

Slides Part 1
Slides Part 2

Organizers

Wojciech Samek, Fraunhofer Heinrich Hertz Institute
Felix Sattler, Fraunhofer Heinrich Hertz Institute