What is model compression quantization?
by SMEBOOK (admin) · February 28, 2021
Model compression quantization is one of the techniques used to compress models. It involves bundling weights together by clustering them or rounding them off, so that the same number of connections can be represented using less memory. Quantization is the idea of representing these weights with fewer bits: the weights can be quantized to 16-bit, 8-bit, 4-bit, or even 1-bit. By reducing the number of bits used, the size of a deep neural network can be significantly reduced.
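As a minimal sketch of the rounding idea, here is symmetric 8-bit quantization in NumPy: each float32 weight is divided by a scale factor and rounded to an int8 code, cutting storage from 4 bytes per weight to 1. The function names here are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 codes plus one shared scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# int8 stores 1 byte per weight vs 4 for float32: a 4x size reduction
print(q.nbytes, weights.nbytes)
# rounding error is at most half a quantization step
print(np.max(np.abs(weights - restored)))
```

The same pattern extends to 4-bit or 1-bit quantization by shrinking the code range; the trade-off is that coarser rounding introduces larger approximation error in the restored weights.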