🚧(WIP)🚧 Running an LLM on the NPU

This guide is not up to date and may no longer work. Please use the sources below and follow the instructions there.

Caution

🚧Work in Progress🚧
You could run into some problems.
Be aware of what you are doing and its consequences.

🚧(Work in Progress)🚧 Running a .rkllm model on the NPU

Needed Packages:

pkcon install git
pkcon install pip
pkcon install libgomp

Clone the git repo, which provides an Ollama-compatible server:

git clone https://github.com/notpunchnox/rkllama
cd rkllama
pip install -r requirements.txt
pip install transformers
pip install flask
pip install huggingface_hub
pip install dotenv
pip install requests
chmod +x setup.sh
./setup.sh --no-conda
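A quick, optional sanity check before continuing — this is a sketch, not part of the original guide: it assumes setup.sh installs into ~/RKLLAMA (the paths used later in this guide suggest so) and that your kernel ships the rknpu driver.

ls ~/RKLLAMA                 # should contain server.sh, client.sh and a models/ folder
dmesg | grep -i rknpu        # NPU driver probe messages, if the rknpu driver is loaded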

Set the CPU model to rk3588 in serve.sh and change the port:

serve.sh
8 CPU_MODEL="" #Change to CPU_MODEL="rk3588" 9 PORT=8080 #Change to something that's not 8080 like 8084

replace {"--no-conda" if use_no_conda else ""} with --no-conda in serve.py

client.py
357 os.system(f"bash ~/RKLLAMA/server.sh {"--no-conda" if use_no_conda else ""} --port={PORT}")

After the change, the line should read:

357 os.system(f"bash ~/RKLLAMA/server.sh --no-conda --port={PORT}")
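A one-liner that makes the same replacement — a sketch, assuming the line in client.py matches the excerpt above exactly:

sed -i 's/{"--no-conda" if use_no_conda else ""}/--no-conda/' client.py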

Run a model from a self-converted model file, or pull one from Hugging Face

# Switch to where it is installed
cd /root/rkllama
chmod +x client.sh
./client.sh --no-conda

Self-converted model: just add it to the /root/RKLLAMA/models folder.

Hugging Face: download a model

./client.sh --no-conda serve   # open a new terminal or continue running it in the background
./client.sh --no-conda pull    # follow the steps in the terminal: enter the repo path, then the model filename
./client.sh run 'model'
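Once the server is running, you can also talk to it over HTTP. The sketch below is an assumption based on the project describing itself as Ollama-compatible: it uses the Ollama-style /api/generate endpoint, assumes the port was changed to 8084 as suggested above, and uses 'model' as a placeholder for the name of the model you pulled.

curl http://localhost:8084/api/generate -d '{
  "model": "model",
  "prompt": "Hello from the NPU!"
}'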

Source

https://github.com/NotPunchnox/rkllama/ 
