[VicPiMakers Projects] Running the new llamafile (llama.cpp) app
Greg H
greg.horie at gmail.com
Mon Apr 27 07:49:57 PDT 2026
Thanks Craig. I have it on my list to experiment more with self-hosting
LLMs. I think there will be calls for self-hosting once AI fervor has
peaked and labs have to show profitability.
Not on topic, but related to our NetSIG discussion on odd industry
behaviours around LLM resource consumption:
https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird-6b2
We're back to the days of "more K-LOCs!"
On Sun, Apr 26, 2026 at 8:21 AM Craig Miller <cvmiller at gmail.com> wrote:
> Hi Greg,
>
> No, I haven't. I think you could run 'strace' to see what the model was
> doing at the time, but it would be slow, and I am not sure it would tell
> you much.
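>
> If you want to try it anyway, something like this might work (a minimal
> sketch; the process name "llamafile" is an assumption and may differ on
> your system):
>
> # attach to the running process, follow threads, summarize syscalls
> strace -f -c -p "$(pidof llamafile)"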
>
> I don't think it was a RAM issue, since the container I am running the
> LLMs in is unrestricted (it can use all of the host's memory, which is 32
> GB), and the kernel is fairly recent (6.18.19-0-lts).
>
> I didn't spend much time on it because my objective at the time was to
> get a local LLM running, not to debug the model.
>
> Craig...
> On 4/26/26 07:56, Greg H wrote:
>
> I was curious whether you did any troubleshooting for the models that
> core dump. I don't have any experience with this, and I'm wondering if
> there's much you can do other than increase the resources (e.g., more
> RAM). Maybe upgrade the kernel? I'm guessing some models need the latest
> and greatest kernel versions to do their thing.
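>
> For what it's worth, if I were debugging it I'd probably start by
> checking whether the kernel's OOM killer was involved. A minimal sketch,
> assuming shell access on the host:
>
> # look for OOM-killer activity around the time of the crash
> dmesg | grep -i -E 'oom|out of memory'
>
> If dmesg is silent, it probably wasn't a memory problem.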
>
>
> On Sun, Apr 26, 2026 at 7:34 AM Craig Miller <cvmiller at gmail.com> wrote:
>
>> Hi Deid,
>>
>> Looking at the gguf models on HuggingFace:
>>
>> https://huggingface.co/models?library=gguf
>>
>> There were a couple of parameters I was looking at:
>>
>> 1. Not too big, somewhere between 5 and 10 GB in size
>> 2. Relatively recent
>> 3. Doesn't core dump right away
>>
>> I had the best luck running the Qwen models. I am running
>> Qwen2.5-VL-7B-Instruct-abliterated.Q4_K_M.gguf on my PN-50, and it seems
>> to run reasonably fast. Some of the other models were quite slow on the
>> PN-50.
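>>
>> If you want to grab a model from the command line, something like this
>> should work (a sketch; the repo ID and file name are placeholders, not a
>> specific recommendation):
>>
>> # download a single gguf file with the Hugging Face CLI
>> pip install -U "huggingface_hub[cli]"
>> huggingface-cli download <repo-id> <model-file>.gguf --local-dir .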
>>
>> Have fun!
>>
>> Craig...
>> On 4/26/26 07:13, Deid Reimer wrote:
>>
>> Hey Craig,
>>
>> Why did you pick that particular LLM?
>>
>> Deid VA7REI
>> On Apr 25, 2026, at 8:32 a.m., Craig Miller <cvmiller at gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> We were chatting before the most recent NetSIG about the new Llamafile
>>> app, which has excellent support for IPv6. The app runs a web server
>>> (which is IPv6 accessible). The new llamafile app takes a -m parameter
>>> that points to the gguf LLM model.
>>>
>>> *Old way*
>>> ./google_gemma-3-4b-it-Q6_K.llamafile --server -v2 --host lxcllama.example.com
>>>
>>> *New way*
>>> llamafile -m model.gguf --server --port 8080
>>>
>>> Find the new llamafile at:
>>>
>>> https://github.com/mozilla-ai/llamafile/releases/tag/0.10.0
>>>
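>>> A minimal sketch of getting it onto disk (the release asset name is an
>>> assumption based on the version above; check the release page for the
>>> exact file name):
>>>
>>> # asset name assumed; see the release page above
>>> wget https://github.com/mozilla-ai/llamafile/releases/download/0.10.0/llamafile-0.10.0
>>> chmod +x llamafile-0.10.0
>>>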
>>> You can find gguf (LLM models) at:
>>>
>>> https://huggingface.co/models?library=gguf
>>>
>>> I start my llamafile using this command:
>>>
>>> ./llamafile-0.10.0 -m Qwen3.5-9B.Q4_K_M.gguf --server --port 8080 --host lxcllama.example.com
>>>
>>> This way, any web browser at my house can access the LLM.
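>>>
>>> You can also query it from scripts, not just a browser. A minimal
>>> sketch, assuming the server exposes the usual llama.cpp
>>> OpenAI-compatible chat endpoint:
>>>
>>> curl http://lxcllama.example.com:8080/v1/chat/completions \
>>>   -H "Content-Type: application/json" \
>>>   -d '{"messages": [{"role": "user", "content": "Hello"}]}'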
>>>
>>> Happy LLM-ing,
>>>
>>> Craig...
>>>
>>> --
>>> IPv6 is the future, the future is here http://ipv6hawaii.org/
>>>
>>> --
>>> Projects mailing list
>>> Projects at vicpimakers.ca
>>> http://vicpimakers.ca/mailman/listinfo/projects_vicpimakers.ca
>>>
>>>
>> --
>> IPv6 is the future, the future is here http://ipv6hawaii.org/
>>
>> --
>> Projects mailing list
>> Projects at vicpimakers.ca
>> http://vicpimakers.ca/mailman/listinfo/projects_vicpimakers.ca
>>
>
> --
> IPv6 is the future, the future is here http://ipv6hawaii.org/
>
> --
> Projects mailing list
> Projects at vicpimakers.ca
> http://vicpimakers.ca/mailman/listinfo/projects_vicpimakers.ca
>