How to run a local, uncensored LLM on your phone

21/08/2024

Introduction

LLMs are cool; restrictions and relying on big corporations are not. In this post, I describe how to self-host your own little unhinged assistant that you can always carry in your pocket.

Main challenges we're going to face

  1. Computational power - phones are weak, so big models won't run at a usable speed.
  2. Censorship - popular models refuse a lot of requests out of the box.
  3. Software - running llama.cpp in a terminal (e.g. via Termux) on a touchscreen is not fun; we want a proper UI.

Solving the challenges

Computational power

First of all, we need a model small enough to run on a phone - going over 5B parameters is probably not a good idea. You could distil one yourself, but I'm too lazy for that and don't really know how to do it, so I'm going to stick with something popular that already fits these constraints. Llama 3.2 is a good choice; it's new and ships in 1B and 3B variants out of the box.
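
Some quick napkin math shows why. A minimal sketch in Python; the ~4.5 bits-per-weight figure is a rough average for 4-bit K-quants, not an exact number:

    # Back-of-the-envelope: a quantised model needs roughly
    # parameters * bits_per_weight / 8 bytes, plus KV-cache and runtime overhead.

    def gguf_size_gib(params_billions: float, bits_per_weight: float) -> float:
        """Approximate size of a quantised model file in GiB."""
        return params_billions * 1e9 * bits_per_weight / 8 / 2**30

    for params in (1, 3, 5, 8):
        # ~4.5 bpw is in the ballpark of Q4_K_M quantisation
        print(f"{params}B params @ 4.5 bpw ~ {gguf_size_gib(params, 4.5):.1f} GiB")

That prints roughly 0.5, 1.6, 2.6 and 4.2 GiB - and since the whole thing has to sit in your phone's RAM alongside everything else, you can see why 5B+ gets dicey fast.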

Liberation of the model

We want our little buddy to be a freak; however, the people building big models don't always approve of that. One option is to uncensor our model of choice ourselves, which, again, I'm not going to do here because I'm lazy. Here's a nice guide on how to do it though: https://erichartford.com/uncensored-models. Luckily, there's a second option: publicly available models that someone else has already uncensored. I'm going to use this one.
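
If your pick is hosted on Hugging Face, grabbing the GGUF from a script is easy. A minimal sketch using huggingface_hub; the repo and file names below are made-up placeholders, not the actual model I linked:

    # Fetch a pre-uncensored GGUF from Hugging Face.
    # repo_id and filename are hypothetical - swap in your chosen model's.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="someuser/uncensored-llama-3.2-3b-GGUF",  # placeholder
        filename="model-Q4_K_M.gguf",                     # placeholder
    )
    print(f"Downloaded to {path}")  # copy this file over to your phone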

Software

So, we have our model. How do we run it? As I mentioned before, using Termux directly is not fun; a user-friendly UI makes for a much nicer experience. ChatterUI is one of the few projects I was able to find that does exactly that!
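
Before moving everything to the phone, you can sanity-check the GGUF on a computer. A minimal sketch with the llama-cpp-python bindings (pip install llama-cpp-python); the model path is whatever file you downloaded earlier:

    # Quick desktop sanity check that the GGUF loads and generates text.
    from llama_cpp import Llama

    llm = Llama(model_path="model-Q4_K_M.gguf", n_ctx=2048, verbose=False)
    out = llm("Q: Why run an LLM locally? A:", max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])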

Tutorial

  1. Download and install ChatterUI. Consider using Obtainium if you're on Android. Auto-updates are cool.
  2. Download your model of choice in the GGUF format. 4-bit quantisation or lower is the way to go if you don't feel like waiting forever for a response - grab yourself an IQ-quant if it's available.
  3. (Optional) Quantise your model (convert it to GGUF) yourself if your chosen model isn't available as a GGUF download - see the sketch after this list.
    This website is pretty nice if you have an HF account. If you don't, follow the guide in the llama.cpp README.
  4. Open ChatterUI, write a l33t system prompt, and load your model's GGUF file.
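
For step 3, here's roughly what DIY quantisation looks like, assuming a local clone of llama.cpp with its binaries built. The script and tool names (convert_hf_to_gguf.py, llama-quantize) match recent llama.cpp versions, but double-check the README for yours; the paths are placeholders:

    # Sketch of step 3: convert an HF checkpoint to GGUF, then quantise it.
    # Assumes a clone of https://github.com/ggerganov/llama.cpp with binaries built.
    import subprocess

    HF_MODEL_DIR = "path/to/hf-model"   # placeholder: the downloaded HF model
    F16_GGUF = "model-f16.gguf"
    QUANT_GGUF = "model-Q4_K_M.gguf"

    # 1. Convert the HF weights to an unquantised (f16) GGUF file.
    subprocess.run(
        ["python", "llama.cpp/convert_hf_to_gguf.py", HF_MODEL_DIR,
         "--outfile", F16_GGUF],
        check=True,
    )

    # 2. Quantise it down to ~4.5 bits per weight.
    subprocess.run(
        ["llama.cpp/llama-quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"],
        check=True,
    )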

That's it! You can now flex on your friends and hope they don't laugh at you. Good luck!