Deep Learning (DL) has become the dominant methodology in the machine learning field. But there is more to DL than just performance: as today’s complex models demand more and more parameters and compute resources, researchers are also critically examining these important footprint metrics.
A Google Research team recently conducted a survey on how to make Deep Learning models smaller, faster and better, focusing on core areas of model efficiency, from modeling techniques to hardware support. The team has also open-sourced an experiment-based guide and code to help practitioners optimize their model training and deployment.
The researchers first identify challenges in DL model training and deployment, including sustainable server-side scaling, enabling on-device deployment, privacy and data sensitivity, new applications, and the explosion of models. These challenges are all grounded in model efficiency, a quality the researchers further break down into inference efficiency and training efficiency.
The team defines their objective as achieving Pareto optimality, i.e., choosing a model that best balances the relevant trade-offs. To this end, they explore how various algorithms, techniques, tools, and infrastructures can work together to enable users to train and deploy Pareto-optimal models with respect to both model quality and model footprint.
The team organizes work toward Pareto-optimal models into five major areas: compression techniques, learning techniques, automation, efficient architectures, and infrastructure. Compressing network layers is a classical technique for optimizing a model's architecture. For example, quantization compresses the weight matrices of a layer by reducing their numerical precision, with minimal loss in quality. Learning techniques, meanwhile, focus on training methods that produce fewer prediction errors, require less data, converge faster, etc. Automation involves tools that improve a model's core metrics, such as hyperparameter optimization and architecture search. Model architectures can also be designed to be more efficient from the outset, as when attention layers solved the information bottleneck problem in Seq2Seq models. Finally, model training frameworks such as TensorFlow and PyTorch can also improve model efficiency.
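To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit post-training quantization of a weight matrix, using plain NumPy. The matrix, function names, and scale scheme are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Hypothetical weight matrix from a trained layer.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(4, 4)).astype(np.float32)

def quantize_int8(w):
    """Symmetric 8-bit quantization: map float weights to int8 plus one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each int8 weight takes 1 byte instead of 4, a 4x memory reduction,
# at the cost of a small per-weight rounding error (at most scale / 2).
max_error = np.abs(weights - restored).max()
```

In practice, libraries apply per-channel scales and calibrate activations as well, but the trade-off is the same: a smaller footprint in exchange for bounded reconstruction error.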
The Google researchers survey each of these areas in depth in their 44-page paper, which provides valuable insights for practitioners on how to obtain efficient DL models. The team proposes, and empirically validates, two strategies for achieving Pareto-optimal models:
- Shrink-and-Improve for Footprint-Sensitive Models: If, as a practitioner, you want to reduce your footprint while keeping quality the same, this strategy is useful for on-device deployments and server-side model optimization. Shrinking should ideally be minimally lossy in terms of quality (achievable via learned compression techniques, architecture search, etc.), but in some cases even the quality lost by naively reducing capacity can be recovered in the Improve phase. It is also possible to do the Improve phase before the Shrink phase.
- Grow-Improve-and-Shrink for Quality-Sensitive Models: When you want to deploy models with better quality while keeping the same footprint, it makes sense to follow this strategy. Here, capacity is first added by growing the model. The model is then improved via learning techniques, automation, etc., and finally shrunk back, either naively or in a learned manner. Alternatively, the model could be shrunk in a learned manner directly after the growing phase.
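The Shrink-and-Improve pattern can be sketched on a toy linear model: train at full capacity, shrink by naive magnitude pruning, then improve by fine-tuning the surviving weights. The data, threshold, and training loop below are hypothetical choices for illustration, not the paper's experiments:

```python
import numpy as np

# Toy regression problem: only some of the 8 input features matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
true_w = np.array([2.0, -1.5, 0.0, 0.0, 0.5, 0.0, 0.0, 1.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

def fit(X, y, mask, steps=500, lr=0.1):
    """Gradient-descent fit that keeps pruned (masked-out) weights at zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad * mask  # only unpruned weights are updated
    return w

# 1. Train the full-capacity model.
w_full = fit(X, y, np.ones(8))

# 2. Shrink: naively prune weights with small magnitude.
mask = (np.abs(w_full) > 0.1).astype(float)

# 3. Improve: fine-tune the remaining weights to recover quality.
w_pruned = fit(X, y, mask)

mse = np.mean((X @ w_pruned - y) ** 2)
```

The same loop read in the other order (add capacity, improve, then shrink back) illustrates Grow-Improve-and-Shrink; in real systems the Shrink step would use learned compression or architecture search rather than a fixed magnitude threshold.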
The researchers believe theirs is the first survey in the efficient deep learning space to comprehensively cover the landscape of model efficiency, from modeling techniques to hardware support. They hope the work can serve as a practical guide to help developers train and deploy more efficient models, and that it can inspire further studies in this field within the DL community.
The paper Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better is on arXiv.