This article addresses the challenges of containerizing and deploying a Large Language Model (LLM): the substantial disk space these models require, the time-consuming process of pushing and pulling them, the lengthy build times, and the need to select a model whose license permits production use without legal restrictions.
https://github.com/bitsector/containerised-hf-zephyr-7b-beta
Deploying an LLM locally, without depending on third-party APIs, has many benefits worth considering.
I will outline the steps to build, deploy, and test the container, and to expose the model's chat capability over HTTP.
Suppose we want to use Zephyr-7b-beta as a chatbot. We wrap it in a simple Python Flask web app, install the pip dependencies, package everything into a container image, build it, push it to Docker Hub, and deploy it. What could possibly go wrong?
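Before getting into the pitfalls, here is a minimal sketch of what such a Flask wrapper might look like. The route name, system prompt, and generation parameters are illustrative assumptions on my part; the actual application lives in the repository linked above. The model-loading code follows the usage shown on the zephyr-7b-beta model card (it also needs the `accelerate` package for `device_map="auto"`).

```python
from flask import Flask, request, jsonify
import torch
from transformers import pipeline

app = Flask(__name__)

# Load the model once at startup. The weights are roughly 15 GB,
# which is exactly what makes the container image so painful to build and push.
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

@app.route("/chat", methods=["POST"])  # assumed route name, for illustration
def chat():
    # Expect a JSON body like {"message": "Hello!"}
    user_message = request.get_json(force=True).get("message", "")
    messages = [
        {"role": "system", "content": "You are a helpful chatbot."},
        {"role": "user", "content": user_message},
    ]
    # Zephyr is a chat-tuned model, so format the turns with its chat template.
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
    # The pipeline returns the prompt plus the completion; strip the prompt.
    reply = outputs[0]["generated_text"][len(prompt):]
    return jsonify({"reply": reply})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

With the app running, a POST to `/chat` with `{"message": "Hi"}` should return a JSON reply. The container image then needs Python, the pip dependencies, and the model weights, and that is where the disk-space and build-time problems described above begin.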