Model compression is a crucial technology for vision models: it reduces a model's size and computational cost while keeping any loss in accuracy small. Commonly used techniques include quantization, pruning, and knowledge distillation. Compression makes it practical to deploy vision models on resource-constrained devices such as mobile phones and embedded systems.
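To make one of these techniques concrete, the sketch below shows unstructured magnitude pruning with NumPy: the smallest-magnitude fraction of a weight matrix is zeroed out, shrinking the effective model while leaving the largest weights intact. This is a minimal illustration, not any particular framework's pruning API; the function name and the choice of a global magnitude threshold are assumptions for the example.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude `sparsity` fraction of weights.

    Illustrative unstructured pruning: a global threshold is chosen so that
    the requested fraction of entries falls at or below it, and those
    entries are set to zero.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Toy 2x3 weight matrix; pruning at 50% sparsity removes the three
# smallest-magnitude entries (0.01, 0.02, -0.05) and keeps the rest.
w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, 0.02]])
pruned = magnitude_prune(w, sparsity=0.5)
```

In a real pipeline, pruning is typically followed by fine-tuning to recover accuracy, and the sparse weights are stored or executed with sparse kernels to realize the size and speed gains.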