Tuesday, 05 May 2020

Location information: Virtual Meeting.


The number of intelligent devices on wireless networks (e.g., smartphones, IoT devices) has grown rapidly in recent years. Many of these devices are equipped with sensors and increasingly powerful processors that allow them to collect and process data at unprecedented scales. These developments provide a significant opportunity to dramatically change how machine learning algorithms involving data collected by wireless devices are implemented. In particular, due to limited resources (e.g., bandwidth and power), latency constraints, and data privacy concerns, centralized training schemes, which require all the data to reside at a central location, will increasingly be substituted by distributed machine learning. The latter allows multiple parties to jointly train a model on their combined data, without any of the participants having to reveal their local data to other parties or to a centralized server. Moreover, it concentrates learning in locations where models may be used (e.g., for autonomous driving, environmental mapping, etc.), and minimizes latency and resource consumption. This new form of collaborative learning, however, comes with various theoretical and methodological challenges.

For instance, the significant communication overhead can still be a limiting factor in large-scale distributed learning scenarios. Furthermore, the impairments of the wireless medium, such as fading and interference, can affect the quality of learning. Thus, new methods for optimal client selection, efficient quantization and encoding of the transmitted information, and error accumulation are required to make distributed training schemes practical in these settings. Also, a better theoretical understanding of the distributed learning problem (e.g., convergence results) and the development of learning schemes that are robust to adversarial or malfunctioning participants, or to other vagaries of the wireless medium, are of high practical value. Concepts and methods from information theory, optimization, and wireless communications can help to tackle these challenges.
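To make the combination of quantization and error accumulation mentioned above concrete, the sketch below is our own illustration, not drawn from any of the talks; the function name and toy dimensions are hypothetical. It shows scaled-sign gradient compression with an error-feedback residual that re-injects past quantization errors into later rounds.

```python
import numpy as np

def quantize_with_error_feedback(gradient, residual):
    """Scaled-sign compression with a locally kept error-feedback residual."""
    corrected = gradient + residual           # add back the past quantization error
    scale = np.mean(np.abs(corrected))        # one scalar accompanies the sign bits
    quantized = scale * np.sign(corrected)    # roughly one bit per coordinate
    new_residual = corrected - quantized      # error is kept and re-sent later
    return quantized, new_residual

# toy usage: a single worker compresses its gradients over a few rounds
rng = np.random.default_rng(0)
residual = np.zeros(4)
for step in range(3):
    grad = rng.normal(size=4)
    compressed, residual = quantize_with_error_feedback(grad, residual)
    print(step, compressed, residual)
```

The residual is what prevents aggressive compression from systematically biasing the training trajectory over many rounds, which is one reason error accumulation appears alongside quantization in the list above.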

This special session aims to bring together the latest research in this emerging field and to foster exchange with areas that are traditionally strongly represented at IEEE ICASSP, e.g., information theory, machine learning, optimization, and wireless communications.

For background material on the topic, see our reading list.

List of Speakers

Deniz Gunduz, Imperial College London

Deniz Gündüz received the B.S. degree in electrical and electronics engineering from the Middle East Technical University, Ankara, Turkey, in 2002, and the M.S. and Ph.D. degrees in electrical engineering from Polytechnic Institute of New York University (formerly Polytechnic University), Brooklyn, NY, in 2004 and 2007, respectively. He is currently with the Department of Electrical and Electronic Engineering at Imperial College London. Previously, he held appointments as a Postdoctoral Research Associate at Princeton University, Princeton, NJ, and as a Consulting Assistant Professor at Stanford University, Stanford, CA. In 2004, he was a summer researcher in the Laboratory of Information Theory (LTHI) at EPFL in Lausanne, Switzerland. His research interests lie in the areas of communications theory and information theory with special emphasis on joint source–channel coding, cooperative communications, network security, and cross-layer design. Dr. Gündüz is the recipient of the 2008 Alexander Hessel Award of Polytechnic University, given to the best Ph.D. dissertation, and a coauthor of the paper that received the Best Student Paper Award at the 2007 IEEE International Symposium on Information Theory.

Gauri Joshi, Carnegie Mellon University

Gauri Joshi received a B.Tech degree in Electrical Engineering and an M.Tech in Communication and Signal Processing from the Indian Institute of Technology (IIT) Bombay in 2009 and 2010, respectively, and S.M. and Ph.D. degrees from the Massachusetts Institute of Technology (MIT). She is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. She is a recipient of the Institute Gold Medal for academic excellence at IIT Bombay and the William Martin Memorial Prize for best S.M. thesis in computer science at MIT.

Mehdi Bennis, University of Oulu

Mehdi Bennis received his M.Sc. degree in electrical engineering jointly from EPFL, Switzerland, and the Eurecom Institute, France, in 2002. He obtained his Ph.D. from the University of Oulu in December 2009 on spectrum sharing for future mobile cellular systems. Currently he is an associate professor at the University of Oulu and an Academy of Finland research fellow. His main research interests are in radio resource management, heterogeneous networks, game theory, and machine learning in 5G networks and beyond. He has co-authored one book and published more than 200 research papers in international conferences, journals, and book chapters. He was the recipient of the prestigious 2015 Fred W. Ellersick Prize from the IEEE Communications Society, the 2016 Best Tutorial Prize from the IEEE Communications Society, the 2017 EURASIP Best Paper Award for the Journal of Wireless Communications and Networks, and the 2017 all-University of Oulu Award for Research.

Osvaldo Simeone, King's College London

Osvaldo Simeone received the M.Sc. degree (with honors) and the Ph.D. degree in information engineering from Politecnico di Milano, Milan, Italy, in 2001 and 2005, respectively. He is currently a Professor with the Department of Engineering at King's College London. Prior to that, he was with the Center for Wireless Communications and Signal Processing Research (CWCSPR) at the New Jersey Institute of Technology (NJIT), Newark, NJ. His research interests concern the cross-layer analysis and design of wireless networks, with emphasis on information-theoretic, signal processing, and queuing aspects. Specific topics of interest include cognitive radio, cooperative communications, ad hoc, sensor, mesh and hybrid networks, distributed estimation, and synchronization. Dr. Simeone is a corecipient of the best paper awards of IEEE SPAWC 2007 and IEEE WRECOM 2007. He has served as an Editor for the IEEE Transactions on Wireless Communications.

Vincent Poor, Princeton University

H. Vincent Poor is the Michael Henry Strater University Professor of Electrical Engineering at Princeton University, where his interests are in information theory and signal processing and their applications in wireless networks, energy systems, and related fields. He has served in editorial roles for several IEEE Signal Processing Society (SPS) publications, including the Transactions on Signal Processing, Signal Processing Magazine, and JSTSP. An IEEE Fellow, he received the Technical Achievement and Society Awards of the SPS in 2007 and 2011, respectively.

Wojciech Samek, Fraunhofer Heinrich Hertz Institute

Wojciech Samek is the Head of the Machine Learning Group at Fraunhofer Heinrich Hertz Institute. He is associated with the Berlin Big Data Center and the Berlin Center for Machine Learning, and is an editorial board member of IEEE TNNLS, Digital Signal Processing, and PLoS ONE. He is a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society and has received multiple best paper awards. He has co-authored more than 100 journal and conference papers, predominantly in the areas of deep learning, interpretable machine learning, robust signal processing, neural network compression, and distributed learning.

Schedule

09:00-09:20 Introduction
09:20-10:00 Talk 1: Deniz Gunduz - Hierarchical Federated Learning Across Heterogeneous Cellular Networks

We consider federated edge learning (FEEL), where mobile users (MUs) collaboratively learn a global model by sharing local updates on the model parameters rather than their datasets, with the help of a mobile base station (MBS). We optimize the resource allocation among MUs to reduce the communication latency in learning iterations. Observing that the performance in this centralized setting is limited due to the distance of the cell-edge users to the MBS, we consider small cell base stations (SBSs) orchestrating FEEL among MUs within their cells, and periodically exchanging model updates with the MBS for global consensus. We show that this hierarchical federated learning (HFL) scheme significantly reduces the communication latency without sacrificing the accuracy.
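As a rough illustration of the two-level aggregation described in this abstract, the following toy code performs several cheap small-cell (SBS) aggregation rounds for every costly global (MBS) aggregation. This is our own sketch under simplifying assumptions, not the speakers' implementation; the least-squares local objective and all function names are placeholders.

```python
import numpy as np

def local_sgd(model, data, steps=5, lr=0.1):
    """Placeholder local training: a few gradient steps on a least-squares loss."""
    X, y = data
    w = model.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def hierarchical_fl(cells, dim, sbs_rounds=3, mbs_rounds=2):
    """cells[i] is the list of client datasets served by small-cell base station i."""
    global_model = np.zeros(dim)
    for _ in range(mbs_rounds):                      # infrequent global (MBS) consensus
        cell_models = []
        for clients in cells:                        # each SBS orchestrates its own cell
            cell_model = global_model.copy()
            for _ in range(sbs_rounds):              # frequent, cheap intra-cell rounds
                updates = [local_sgd(cell_model, d) for d in clients]
                cell_model = np.mean(updates, axis=0)
            cell_models.append(cell_model)
        global_model = np.mean(cell_models, axis=0)  # one costly MBS aggregation
    return global_model

# toy usage: two cells with two clients each, all observing y = 2x plus noise
rng = np.random.default_rng(1)
def make_client():
    X = rng.normal(size=(20, 1))
    return X, 2.0 * X[:, 0] + 0.1 * rng.normal(size=20)
cells = [[make_client(), make_client()] for _ in range(2)]
print(hierarchical_fl(cells, dim=1))                 # should be close to 2.0
```

The latency saving comes from the nesting itself: most aggregation traffic stays within a cell and only the outer loop touches the MBS.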

10:00-10:40 Talk 2: Mehdi Bennis - Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning

In this paper, we propose a communication-efficient decentralized machine learning (ML) algorithm, coined quantized group ADMM (Q-GADMM). Every worker in Q-GADMM communicates only with two neighbors, and updates its model via the alternating direction method of multipliers (ADMM), thereby ensuring fast convergence while reducing the number of communication rounds. Furthermore, each worker quantizes its model updates before transmission, thereby decreasing the communication payload sizes. We prove that Q-GADMM converges for convex loss functions, and numerically show that Q-GADMM yields 7x less communication cost while achieving almost the same accuracy and convergence speed compared to a baseline without quantization, group ADMM (GADMM).
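The quantization step described in the abstract can be illustrated as follows. This is a simplified stand-in (unbiased uniform stochastic quantization of the change in a worker's model), not the exact Q-GADMM quantizer, and all names and toy values are placeholders.

```python
import numpy as np

def stochastic_quantize(delta, num_bits=2, rng=None):
    """Unbiased uniform stochastic quantization of a model-update vector.

    Each coordinate is randomly rounded up or down to a nearby quantization
    level with probabilities chosen so the quantizer is unbiased on average.
    """
    rng = rng or np.random.default_rng()
    levels = 2 ** num_bits - 1
    radius = np.max(np.abs(delta)) + 1e-12          # dynamic range of this update
    normalized = (delta + radius) / (2 * radius) * levels
    lower = np.floor(normalized)
    prob_up = normalized - lower                    # randomized rounding
    index = lower + (rng.random(delta.shape) < prob_up)
    return index * (2 * radius) / levels - radius   # dequantized value

# in a real system only the integer indices and the scalar radius would travel
# to the two neighbors; here the dequantized update is returned directly.
prev_model = np.zeros(5)
new_model = np.array([0.4, -0.7, 0.1, 0.0, 0.9])
quantized_delta = stochastic_quantize(new_model - prev_model)
print(prev_model + quantized_delta)                 # neighbor's reconstruction
```

Quantizing the difference from the previously transmitted model, rather than the model itself, is what keeps the payload small as the models converge.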

10:40-11:10 Coffee Break
11:10-11:50 Talk 3: Gauri Joshi - Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability, such as high communication delay or random node slowdowns, greatly impedes the performance of distributed SGD algorithms, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap Local-SGD (and its momentum variant) to overlap communication and computation so as to speed up the distributed training procedure. The approach can also help to mitigate straggler effects. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models are pulled back towards the synchronized anchor model rather than communicating with other nodes. Experimental results of training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.
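A minimal, single-process sketch of the pull-back mechanism is given below. The actual overlap of communication and computation requires asynchronous primitives that are omitted here, and the least-squares local objective, pull coefficient, and function names are assumptions for illustration only.

```python
import numpy as np

def local_steps(w, data, steps=4, lr=0.1):
    """Placeholder local SGD on a least-squares objective."""
    X, y = data
    w = w.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def overlap_local_sgd_round(models, anchors, datasets, pull=0.5):
    """One communication round, simulated sequentially for clarity.

    In the real algorithm the averaging of the anchor models happens in the
    background while workers keep computing; here it is done in-line.
    """
    # each worker runs several local updates starting from its own model
    models = [local_steps(w, d) for w, d in zip(models, datasets)]
    # the (possibly stale) anchors are averaged -- this is the communication step
    synced_anchor = np.mean(anchors, axis=0)
    # workers are pulled back toward the synchronized anchor instead of
    # exchanging their freshly trained models with every other worker
    models = [(1 - pull) * w + pull * synced_anchor for w in models]
    anchors = [w.copy() for w in models]
    return models, anchors

# toy usage: three workers fitting y = X @ [1, -2, 0.5] collaboratively
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5])
datasets = [(X, y)] * 3
models = [np.zeros(3) for _ in range(3)]
anchors = [m.copy() for m in models]
for _ in range(30):
    models, anchors = overlap_local_sgd_round(models, anchors, datasets)
print(models[0])   # approaches [1, -2, 0.5]
```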

11:50-12:30 Talk 4: Osvaldo Simeone - Cooperative Learning via Federated Distillation over Fading Channels

Cooperative training methods for distributed machine learning are typically based on the exchange of local gradients or local model parameters. The latter approach is known as Federated Learning (FL). An alternative solution with reduced communication overhead, referred to as Federated Distillation (FD), was recently proposed that exchanges only averaged model outputs. While prior work studied implementations of FL over wireless fading channels, here we propose wireless protocols for FD and for an enhanced version thereof that leverages an offline communication phase to communicate “mixed-up” covariate vectors. The proposed implementations consist of different combinations of digital schemes based on separate source-channel coding and of over-the-air computing strategies based on analog joint source-channel coding. It is shown that the enhanced version of FD has the potential to significantly outperform FL in the presence of limited spectral resources.
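The key quantity exchanged in Federated Distillation, per-label averaged model outputs, can be sketched as follows. This is our own illustration; the toy logits and function names are placeholders, and the wireless transmission aspects discussed in the talk are not modeled here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def local_label_averages(logits, labels, num_classes):
    """Per-label average model output -- the only quantity a client uploads in FD.

    Its size scales with the number of classes rather than with the model size,
    which is where the communication savings come from.
    """
    probs = softmax(logits)
    return np.stack([
        probs[labels == c].mean(axis=0) if np.any(labels == c)
        else np.full(num_classes, 1.0 / num_classes)   # no samples of this class
        for c in range(num_classes)
    ])

# toy usage: two clients upload per-label averages, the server averages them, and
# the result serves as a distillation target during each client's next local epoch
rng = np.random.default_rng(3)
num_classes = 3
uploads = []
for _ in range(2):
    logits = rng.normal(size=(50, num_classes))        # stand-in for model outputs
    labels = rng.integers(0, num_classes, size=50)
    uploads.append(local_label_averages(logits, labels, num_classes))
print(np.mean(uploads, axis=0))                        # shape (num_classes, num_classes)
```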

12:30-13:10 Talk 5: Wojciech Samek - On the Byzantine Robustness of Clustered Federated Learning

Federated Learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that Federated Learning yields suboptimal results if the local clients' data distributions diverge. The recently proposed Clustered Federated Learning (CFL) framework addresses this issue by separating the client population into different groups based on the pairwise cosine similarities between their parameter updates. In this work we investigate the application of CFL to Byzantine settings, where a subset of clients behaves unpredictably or tries to disturb the joint training effort in a directed or undirected way. We perform experiments with deep neural networks on common Federated Learning datasets which demonstrate that CFL (without modifications) is able to reliably detect Byzantine clients and remove them from training.
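The similarity-based separation that CFL relies on can be sketched as below. This uses a simplified spectral bipartition of the cosine-similarity matrix as a stand-in for the exact bipartitioning criterion in the CFL paper, with synthetic updates for illustration.

```python
import numpy as np

def cosine_similarity_matrix(updates):
    """Pairwise cosine similarities between flattened client parameter updates."""
    U = np.stack([u / (np.linalg.norm(u) + 1e-12) for u in updates])
    return U @ U.T

def bipartition(updates):
    """Split clients into two groups based on the similarity structure.

    A simplified spectral stand-in for the exact bipartitioning used by CFL:
    the sign pattern of the leading eigenvector separates the two blocks.
    """
    S = cosine_similarity_matrix(updates)
    _, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    leading = vecs[:, -1]
    return np.where(leading >= 0)[0], np.where(leading < 0)[0]

# toy usage: four benign clients push in one direction, two disruptive clients
# push in the opposite direction, and the split recovers the two groups
rng = np.random.default_rng(4)
direction = rng.normal(size=10)
benign = [direction + 0.1 * rng.normal(size=10) for _ in range(4)]
byzantine = [-direction + 0.1 * rng.normal(size=10) for _ in range(2)]
print(bipartition(benign + byzantine))
```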

13:10-13:50 Talk 6: Vincent Poor - Federated Learning with Quantization Constraints

Traditional deep learning models are trained on centralized servers using labeled sample data collected from edge devices. This data often includes private information, which the users may not be willing to share. Federated learning (FL) is an emerging approach to train such learning models without requiring the users to share their possibly private labeled data. In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model. A major challenge that arises in this method is the need for each user to efficiently transmit its learned model over the throughput-limited uplink channel. In this work, we tackle this challenge using tools from quantization theory. In particular, we identify the unique characteristics associated with conveying trained models over rate-constrained channels, and characterize a suitable quantization scheme for such setups. We show that combining universal vector quantization methods with FL yields a decentralized training system which is both efficient and feasible. We also derive theoretical performance guarantees for the system. Our numerical results illustrate the substantial performance gains of our scheme over FL with previously proposed quantization approaches.
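One standard building block for quantizing model updates under rate constraints is subtractive dithered quantization, sketched below. This is a generic illustration under our own assumptions (scalar rather than vector quantization, hypothetical step size and seeds), not the specific scheme presented in the talk.

```python
import numpy as np

def dithered_quantize(update, step, seed):
    """Subtractive dithered uniform quantization of a model update.

    Encoder and decoder share the pseudo-random dither through a common seed,
    so only the integer indices need to be sent over the uplink.
    """
    dither = np.random.default_rng(seed).uniform(-step / 2, step / 2, size=update.shape)
    return np.round((update + dither) / step).astype(int)

def dithered_dequantize(indices, step, seed):
    dither = np.random.default_rng(seed).uniform(-step / 2, step / 2, size=indices.shape)
    return indices * step - dither         # subtracting the dither whitens the error

# toy usage: averaging many users' dequantized updates largely cancels the
# quantization noise, the effect exploited when aggregating at the server
rng = np.random.default_rng(5)
true_update = rng.normal(size=8)
recovered = [
    dithered_dequantize(
        dithered_quantize(true_update + 0.05 * rng.normal(size=8), step=0.5, seed=s),
        step=0.5, seed=s)
    for s in range(20)
]
print(true_update)
print(np.mean(recovered, axis=0))          # close to the true update
```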

13:50-15:00 Panel Discussion + Conclusion

Note that the exact schedule is subject to change.

Organizers

H. Vincent Poor, Princeton University
Wojciech Samek, Fraunhofer Heinrich Hertz Institute