Introduction
Quantization in general can be defined as mapping values from a large set of real numbers i.e., FP32 or even FP16 to values in a small discrete set most likely Int8 or Int4. There are recent works trying to map to 1bit models.
Typically ...