Sometimes your finetuned language model works as expected, but you need faster inference. Other times you need to reduce its memory footprint. By converting your models to the GGUF format you can store them quantized and run them on top of the fast llama.cpp inference engine. [5 min read]
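
As a preview, here is a minimal sketch of what the conversion and quantization steps can look like with llama.cpp's tooling. It assumes a local llama.cpp checkout and a Hugging Face-style model directory; the script and binary names (`convert_hf_to_gguf.py`, `llama-quantize`), the output paths, and the `Q4_K_M` quantization type are illustrative and may differ depending on your llama.cpp version and build layout.

```python
import subprocess
from pathlib import Path

# Assumed paths, for illustration only: a local llama.cpp checkout and a
# Hugging Face-style directory holding the finetuned model's weights/config.
LLAMA_CPP = Path("llama.cpp")
MODEL_DIR = Path("my-finetuned-model")
F16_GGUF = Path("my-finetuned-model-f16.gguf")
Q4_GGUF = Path("my-finetuned-model-q4_k_m.gguf")

# Step 1: convert the Hugging Face checkpoint to an unquantized (f16) GGUF file.
subprocess.run(
    [
        "python",
        str(LLAMA_CPP / "convert_hf_to_gguf.py"),
        str(MODEL_DIR),
        "--outfile", str(F16_GGUF),
        "--outtype", "f16",
    ],
    check=True,
)

# Step 2: quantize the GGUF file (here to 4-bit Q4_K_M) to cut the memory footprint.
# The binary location depends on how you built llama.cpp.
subprocess.run(
    [
        str(LLAMA_CPP / "build" / "bin" / "llama-quantize"),
        str(F16_GGUF),
        str(Q4_GGUF),
        "Q4_K_M",
    ],
    check=True,
)
```

The resulting `.gguf` file can then be loaded directly by llama.cpp-based runtimes for inference.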