🚧(WIP)🚧 Running an LLM on the NPU

This guide is not up to date and may no longer work. Please use the sources below and follow the instructions there.

Caution

🚧Work in Progress🚧
You could run into some problems.
Be aware of what you are doing and its consequences.

🚧(Work in Progress)🚧 Running a .rkllm model on the NPU

Needed Packages:

pkcon install git
pkcon install pip
pkcon install libgomp

Clone the git repo, which provides an Ollama-compatible server:

git clone https://github.com/notpunchnox/rkllama
cd rkllama
pip install -r requirements.txt
pip install transformers
pip install flask
pip install huggingface_hub
pip install dotenv
pip install requests
chmod +x setup.sh
./setup.sh --no-conda
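A quick, optional sanity check before continuing — this is a sketch, not part of the original guide: it assumes setup.sh installs into ~/RKLLAMA (the paths used later in this guide suggest so) and that your kernel ships the rknpu driver.

ls ~/RKLLAMA                 # should contain server.sh, client.sh and a models/ folder
dmesg | grep -i rknpu        # NPU driver probe messages, if the rknpu driver is loaded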

Set the CPU model to rk3588 in serve.sh and change the port:

serve.sh
8 CPU_MODEL="" #Change to CPU_MODEL="rk3588" 9 PORT=8080 #Change to something that's not 8080 like 8084

replace {"--no-conda" if use_no_conda else ""} with --no-conda in serve.py

client.py
357 os.system(f"bash ~/RKLLAMA/server.sh {"--no-conda" if use_no_conda else ""} --port={PORT}")

After the change, the line should read:

357 os.system(f"bash ~/RKLLAMA/server.sh --no-conda --port={PORT}")
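A one-liner that makes the same replacement — a sketch, assuming the line in client.py matches the excerpt above exactly:

sed -i 's/{"--no-conda" if use_no_conda else ""}/--no-conda/' client.py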

Run a model from a self-converted model file, or pull one from Hugging Face

# Switch to where it is installed
cd /root/rkllama
chmod +x client.sh
./client.sh --no-conda

Self-converted model: just add it to the /root/RKLLAMA/models folder.

Hugging Face: download a model

./client.sh --no-conda serve   # open a new terminal or continue running it in the background
./client.sh --no-conda pull    # follow the steps in the terminal: enter the repo path, then the model filename
./client.sh run 'model'
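Once the server is running, you can also talk to it over HTTP. The sketch below is an assumption based on the project describing itself as Ollama-compatible: it uses the Ollama-style /api/generate endpoint, assumes the port was changed to 8084 as suggested above, and uses 'model' as a placeholder for the name of the model you pulled.

curl http://localhost:8084/api/generate -d '{
  "model": "model",
  "prompt": "Hello from the NPU!"
}'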

Source

https://github.com/NotPunchnox/rkllama/ 
