Sometimes your finetuned language model works as expected, but you need faster inference. Other times you need to reduce its memory footprint. By converting your models to the GGUF format you can store them quantized and run them on top of the fast llama.cpp inference engine. [5 min read]
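
As a preview, here is a minimal sketch of what the conversion and quantization steps can look like with llama.cpp's tooling. It assumes a local llama.cpp checkout and a Hugging Face-style model directory; the script and binary names (`convert_hf_to_gguf.py`, `llama-quantize`), the output paths, and the `Q4_K_M` quantization type are illustrative and may differ depending on your llama.cpp version and build layout.

```python
import subprocess
from pathlib import Path

# Assumed paths, for illustration only: a local llama.cpp checkout and a
# Hugging Face-style directory holding the finetuned model's weights/config.
LLAMA_CPP = Path("llama.cpp")
MODEL_DIR = Path("my-finetuned-model")
F16_GGUF = Path("my-finetuned-model-f16.gguf")
Q4_GGUF = Path("my-finetuned-model-q4_k_m.gguf")

# Step 1: convert the Hugging Face checkpoint to an unquantized (f16) GGUF file.
subprocess.run(
    [
        "python",
        str(LLAMA_CPP / "convert_hf_to_gguf.py"),
        str(MODEL_DIR),
        "--outfile", str(F16_GGUF),
        "--outtype", "f16",
    ],
    check=True,
)

# Step 2: quantize the GGUF file (here to 4-bit Q4_K_M) to cut the memory footprint.
# The binary location depends on how you built llama.cpp.
subprocess.run(
    [
        str(LLAMA_CPP / "build" / "bin" / "llama-quantize"),
        str(F16_GGUF),
        str(Q4_GGUF),
        "Q4_K_M",
    ],
    check=True,
)
```

The resulting `.gguf` file can then be loaded directly by llama.cpp-based runtimes for inference.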