LLM decoding is memory-bound. Diffusion models are compute-bound. (Why?)
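One common way to approach the "why" is arithmetic intensity: FLOPs performed per byte moved to and from memory. The sketch below is a rough back-of-the-envelope estimate, not a profile. The function name `matmul_intensity`, the layer width `d = 4096`, the 4096-token latent, and the ridge point of 200 FLOPs/byte are all illustrative assumptions. It assumes batch-1 decoding, where each step pushes a single token through every weight matrix, while a diffusion step pushes all latent tokens through at once.

```python
def matmul_intensity(m, k, n, bytes_per_elem=2):
    """FLOPs per byte for an (m x k) @ (k x n) matmul, assuming each
    operand and the output cross the memory bus exactly once."""
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

d = 4096  # assumed hidden size, for illustration only

# LLM decoding: one new token per step, so the weight matrix is read
# from memory to do very little arithmetic.
decode = matmul_intensity(1, d, d)

# Diffusion denoising: every step processes all latent tokens, so each
# weight that is read gets reused thousands of times.
denoise = matmul_intensity(4096, d, d)

# Assumed accelerator ridge point (peak FLOP/s divided by memory bandwidth).
# Below it a kernel is memory-bound, above it compute-bound.
ridge = 200.0

print(f"decode:  {decode:8.1f} FLOPs/byte")
print(f"denoise: {denoise:8.1f} FLOPs/byte")
```

Under these assumptions decoding lands around 1 FLOP/byte and the denoising step above 1000, which places them on opposite sides of the ridge point.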
Deep Compression:
- deep compression VAE
Distillation:
Quantization:
- SVDQuant
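As a rough illustration of the idea behind SVDQuant, the sketch below keeps a small low-rank piece of a weight matrix in full precision and quantizes only the residual to 4 bits. This is a simplified toy under stated assumptions, not the published method: it uses per-tensor symmetric quantization, skips the step that moves activation outliers into the weights, and the helper names `quantize_4bit` and `lowrank_plus_4bit` are made up for this example. The test matrix is synthetic (small noise plus a rank-1 component that creates outliers).

```python
import numpy as np

def quantize_4bit(m):
    """Symmetric per-tensor 4-bit quantization, returned in dequantized form."""
    scale = np.abs(m).max() / 7.0
    return np.clip(np.round(m / scale), -8, 7) * scale

def lowrank_plus_4bit(w, rank):
    """Keep the top-`rank` SVD component in full precision and
    quantize only the residual to 4 bits."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return low_rank + quantize_4bit(w - low_rank)

rng = np.random.default_rng(0)
# Synthetic weight: small noise plus a rank-1 term that produces outliers.
w = 0.02 * rng.standard_normal((64, 64)) + np.outer(
    rng.standard_normal(64), rng.standard_normal(64)
)

err_direct = np.linalg.norm(w - quantize_4bit(w))
err_lowrank = np.linalg.norm(w - lowrank_plus_4bit(w, rank=4))
print(f"direct 4-bit error:   {err_direct:.4f}")
print(f"low-rank + 4-bit err: {err_lowrank:.4f}")
```

The low-rank branch absorbs the large-magnitude structure, so the residual has a much smaller range and the 4-bit grid covers it more finely.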
Parallelism:
- tensor parallelism
- data parallelism
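The difference between the two schemes can be sketched on a single linear layer, simulating the devices with NumPy array slices. This is a minimal illustration with an assumed device count of 4 and made-up shapes. Real systems would use collective communication (for example an all-gather) where this sketch calls `np.concatenate`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))   # batch of 8 inputs
w = rng.standard_normal((16, 32))  # weight of one linear layer
n_devices = 4                      # assumed device count

# Tensor parallelism: each device holds a column slice of w and sees
# the full batch; outputs are concatenated along the feature axis.
w_shards = np.split(w, n_devices, axis=1)
y_tp = np.concatenate([x @ shard for shard in w_shards], axis=1)

# Data parallelism: each device holds all of w and sees a slice of the
# batch; outputs are concatenated along the batch axis.
x_shards = np.split(x, n_devices, axis=0)
y_dp = np.concatenate([xs @ w for xs in x_shards], axis=0)

reference = x @ w  # single-device result
```

Both layouts reproduce the single-device output. They differ in what is replicated: tensor parallelism splits the weights, data parallelism splits the inputs.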