Tencent AI Lab open-sources PocketFlow, the world’s first automated model compression framework

Tencent AI Lab Machine Learning Center announced today that it has developed PocketFlow, the world’s first automated deep learning model compression framework, and will release the source code in the near future. According to Lei Feng.com’s AI Technology Review, PocketFlow is an automated model compression framework for mobile AI developers. It integrates current mainstream model compression and training algorithms and combines them with a self-developed hyperparameter optimization component to achieve fully automated, managed model compression and acceleration. Developers can quickly deploy AI technology in mobile products without knowing the details of specific algorithms, enabling efficient on-device processing of user data.

With the rapid development of AI technology, more and more companies hope to build AI capabilities into their mobile products. However, mainstream deep learning models often demand substantial computing resources, making them difficult to deploy directly on consumer-grade mobile devices. In response, many model compression and acceleration algorithms have emerged that can effectively improve the computational efficiency of network structures such as CNNs and RNNs with only a small loss of accuracy (or even none), making it possible to deploy deep learning models on mobile devices. However, choosing appropriate model compression and acceleration algorithms and the corresponding hyperparameter values for a given application scenario often requires considerable expertise and practical experience, which raises the barrier to entry for general developers.

Against this background, Tencent AI Lab Machine Learning Center developed the PocketFlow open source framework to automate deep learning model compression and acceleration and to help bring AI technology to more mobile products. By integrating multiple deep learning model compression algorithms and introducing a hyperparameter optimization component, the framework greatly increases the degree of automation in model compression. Developers do not need to choose specific model compression algorithms or their hyperparameter values. They only need to specify the expected performance metrics, and PocketFlow then produces a compressed model that meets their requirements and can be quickly deployed in mobile applications.

Framework introduction

The PocketFlow framework is mainly composed of two components, namely the model compression/acceleration algorithm component and the hyperparameter optimization component. The specific structure is shown in the following figure.

[Figure: structure of the PocketFlow framework]

The developer passes the uncompressed original model to the PocketFlow framework as input and specifies the desired performance metrics, such as the compression and/or speedup factor of the model. In each iteration, the hyperparameter optimization component selects a combination of hyperparameter values, and the model compression/acceleration algorithm component compresses the original model based on that combination to obtain a compressed candidate model. Based on the performance evaluation of the candidate model, the hyperparameter optimization component adjusts its own model parameters and selects a new combination of hyperparameter values to start the next iteration. When the iterations end, PocketFlow selects the best combination of hyperparameter values and the corresponding candidate model as the final output and returns it to the developer for deployment on mobile devices.
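
The iteration described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses plain random search as a stand-in for PocketFlow's optimizer (random search does not update any internal model between iterations), and `compress_and_evaluate`, its toy formulas, and the single `prune_ratio` hyperparameter are all hypothetical names invented here, not part of PocketFlow.

```python
import random


def compress_and_evaluate(prune_ratio):
    """Hypothetical stand-in for 'compress the model, then evaluate it'.

    Returns (speedup, accuracy) from toy formulas. A real system would run
    a compression algorithm and measure the resulting candidate model.
    """
    speedup = 1.0 / (1.0 - prune_ratio)
    accuracy = 0.93 - 0.05 * prune_ratio ** 2
    return speedup, accuracy


def search(target_speedup, n_iters=50, seed=0):
    """Random search: keep the most accurate candidate that meets the target."""
    rng = random.Random(seed)
    best = None  # (accuracy, prune_ratio, speedup)
    for _ in range(n_iters):
        # 1. Select a combination of hyperparameter values (here just one).
        prune_ratio = rng.uniform(0.0, 0.9)
        # 2. Compress the model and evaluate the candidate.
        speedup, accuracy = compress_and_evaluate(prune_ratio)
        # 3. Discard candidates that miss the developer's target.
        if speedup < target_speedup:
            continue
        # 4. Remember the best candidate seen so far.
        if best is None or accuracy > best[0]:
            best = (accuracy, prune_ratio, speedup)
    return best


best = search(target_speedup=2.5)
```

The developer only supplies `target_speedup`; everything else is chosen inside the loop, which is the division of labor the framework description implies.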

Specifically, PocketFlow achieves compression and acceleration of deep learning models with less accuracy loss and higher automation through the effective combination of the following algorithm components:

a) Channel pruning component: In a CNN, pruning the channel dimension of the feature maps reduces model size and computational complexity at the same time, and the compressed model can be deployed directly on existing deep learning frameworks. On the CIFAR-10 image classification task, channel pruning of the ResNet-56 model yields a classification accuracy loss of 0.4% at 2.5x acceleration and 0.7% at 3.3x acceleration.

b) Weight sparsification component: Introducing sparsity constraints on the network weights greatly reduces the number of non-zero weights; the weights of the compressed model can then be stored and transmitted as sparse matrices, which achieves model compression. For the MobileNet image classification model, removing 50% of the network weights costs only 0.6% in Top-1 classification accuracy on the ImageNet dataset.

c) Weight quantization component: Introducing quantization constraints on the network weights reduces the number of bits needed to represent each weight. The team supports both uniform and non-uniform quantization algorithms, which can take full advantage of hardware optimizations on devices such as ARM and FPGA to improve computing efficiency on mobile devices, and which provide software support for future neural network chip designs. For example, the ResNet-18 model for the ImageNet image classification task achieves 4x compression with no loss of accuracy under 8-bit fixed-point quantization.

d) Network distillation component: For each of the model compression components above, the output of the uncompressed original model is used as additional supervision to guide the training of the compressed model. At the same compression/acceleration factor, this yields accuracy improvements ranging from 0.5% to 2.0%.

e) Multi-GPU training component: Training deep learning models demands substantial computing resources, and a single GPU can rarely finish training in a short time. The team therefore provides comprehensive support for multi-machine, multi-GPU distributed training to speed up the user’s development process. Both the ResNet-50 image classification model trained on ImageNet data and the Transformer machine translation model trained on WMT14 data can be trained within an hour.

f) Hyperparameter optimization component: Most developers are not very familiar with model compression algorithms, yet hyperparameter values often have a large impact on the final result. The team therefore introduced a hyperparameter optimization component, which uses algorithms such as reinforcement learning, together with the AutoML automatic hyperparameter optimization framework developed by AI Lab, to determine the best combination of hyperparameter values for the specified performance requirements. For example, for the channel pruning algorithm, the hyperparameter optimization component can automatically apply a different pruning ratio to each layer according to that layer’s redundancy in the original model, so as to maximize the model’s recognition accuracy.
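
To make the idea of per-layer pruning ratios concrete, here is a minimal Python sketch. It assumes a simple criterion, dropping the output channels with the smallest L1 norm, which is a common textbook choice and not necessarily the criterion PocketFlow uses. The toy weights, the layer names, and the `ratios` dictionary are hypothetical, and the sketch ignores the matching input-channel removal that a real network would need in the following layer.

```python
def prune_channels(layer_weights, prune_ratio):
    """Drop the output channels with the smallest L1 norm.

    layer_weights: list of channels, each a flat list of floats.
    Returns the kept channels in their original order.
    """
    n_keep = max(1, round(len(layer_weights) * (1.0 - prune_ratio)))
    norms = [sum(abs(w) for w in channel) for channel in layer_weights]
    # Rank channel indices by norm (largest first), keep the top n_keep,
    # then sort the survivors back into their original order.
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [layer_weights[i] for i in keep]


# Toy two-layer model (hypothetical values).
model = {
    "conv1": [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [0.0, 0.03]],
    "conv2": [[0.3, 0.2], [0.7, -0.6]],
}

# Per-layer ratios, as a hyperparameter optimizer might propose them:
# conv1 is redundant (two near-zero channels), conv2 is left untouched.
ratios = {"conv1": 0.5, "conv2": 0.0}

pruned = {name: prune_channels(w, ratios[name]) for name, w in model.items()}
```

In this toy case `conv1` shrinks from four channels to two while `conv2` keeps both of its channels, which mirrors the point of item f): layers with more redundancy can absorb a higher pruning ratio.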

Performance

Introducing the hyperparameter optimization component not only removes the high barrier and tedium of manual parameter tuning, but also allows PocketFlow to outperform manual tuning across all of its compression algorithms. Taking image classification as an example, on datasets such as CIFAR-10 and ImageNet, PocketFlow can effectively compress and accelerate a variety of CNN architectures such as ResNet and MobileNet.

On the CIFAR-10 dataset, PocketFlow uses ResNet-56 as the baseline model for channel pruning and adds training strategies such as hyperparameter optimization and network distillation. It achieves a classification accuracy loss of 0.4% at 2.5x acceleration and 0.7% at 3.3x acceleration, significantly better than the uncompressed ResNet-44 model [2]. On the ImageNet dataset, PocketFlow can further sparsify the weights of the already very compact MobileNet model, reaching similar classification accuracy with a smaller model size. Compared with models such as Inception-V1 and ResNet-18, the resulting model is only about 20% to 40% of their size, while its classification accuracy is roughly the same (or even higher).
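
Network distillation, mentioned above as one of the added training strategies, can be illustrated with a small loss function. The sketch below assumes a widely used formulation: a weighted sum of the usual hard-label cross-entropy and a soft-target cross-entropy against the original model's temperature-softened outputs. The article does not specify PocketFlow's exact loss, so `alpha`, `temperature`, and the function names are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=4.0):
    """Blend hard-label loss with soft-target loss from the teacher.

    student_logits: outputs of the compressed model.
    teacher_logits: outputs of the uncompressed original model.
    label: index of the ground-truth class.
    """
    # Standard cross-entropy against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    # Cross-entropy between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return (1.0 - alpha) * hard + alpha * soft
```

With `alpha=0` the loss reduces to ordinary supervised training; raising `alpha` gives the original model's outputs more influence as the additional supervision signal.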

Compared with time-consuming and laborious manual tuning, the AutoML automatic hyperparameter optimization component in the PocketFlow framework reaches performance similar to manual tuning after only a little over 10 iterations. After 100 iterations, the hyperparameter combination it finds reduces the accuracy loss by about 0.6%. By using the hyperparameter optimization component to automatically determine the number of quantization bits for the weights of each layer in the network, PocketFlow achieves consistent performance gains when compressing the ResNet-18 model for the ImageNet image classification task. At an average of 4 quantization bits, introducing the hyperparameter optimization component raises the classification accuracy from 63.6% to 68.1% (the original model’s classification accuracy is 70.3%).
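
The notion of an "average number of quantization bits" across layers can be made concrete with a short sketch. It assumes simple min-max uniform quantization and 32-bit floating-point weights in the original model (consistent with the 4x compression reported for 8-bit quantization); the layer names, sizes, and bit widths are hypothetical and do not come from PocketFlow.

```python
def quantize_uniform(weights, n_bits):
    """Snap each weight to one of 2**n_bits evenly spaced levels in [min, max]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # a constant layer needs no quantization
    step = (hi - lo) / (2 ** n_bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def average_bits(layer_sizes, layer_bits):
    """Average bits per weight, weighted by the number of weights per layer."""
    total = sum(layer_sizes.values())
    return sum(layer_sizes[name] * layer_bits[name] for name in layer_sizes) / total


# Hypothetical per-layer choice: the small layer keeps more precision.
layer_sizes = {"conv1": 1000, "conv2": 3000}
layer_bits = {"conv1": 7, "conv2": 3}

avg = average_bits(layer_sizes, layer_bits)   # (7000 + 9000) / 4000 = 4.0
compression = 32 / avg                        # relative to 32-bit floats: 8.0
```

Two models with the same 4-bit average can therefore allocate bits very differently across layers, which is the degree of freedom the hyperparameter optimization component searches over.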

Compression and acceleration of deep learning models is a current research hotspot in academia and also has broad application prospects in industry. With the launch of PocketFlow, developers no longer need to understand the details of model compression algorithms or worry about selecting and tuning the various hyperparameters. This paves the way for AI capabilities in more mobile products.

micohuang