<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Good article, Greg,</p>
    <p>Clearly, you aren't using enough tokens (to be tokenmaxxing).</p>
    <p>BTW, llamafile (a.k.a. llama.cpp) will kick out token
      statistics. Not sure what they mean:</p>
    <pre>srv   prompt_save:  - saving prompt with length 631, total state size = 34.516 MiB
srv          load:  - looking for better prompt, base f_keep = 0.022, sim = 0.500
srv        update:  - cache state: 5 prompts, 74.066 MiB (limits: 8192.000 MiB, 128000 tokens, 149758 est)
srv        update:    - prompt 0x7fc12c15f5b0:     431 tokens, checkpoints:  0,    23.576 MiB
srv        update:    - prompt 0x7fc12c1634a0:      75 tokens, checkpoints:  0,     4.103 MiB
srv        update:    - prompt 0x7fc12c15fb40:      75 tokens, checkpoints:  0,     4.103 MiB
srv        update:    - prompt 0x7fc15781b190:     142 tokens, checkpoints:  0,     7.768 MiB
srv        update:    - prompt 0x7fc12c1631b0:     631 tokens, checkpoints:  0,    34.516 MiB

</pre>
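    <p>My best guess: those "srv" lines are the server's prompt-cache
      bookkeeping, where it saves the KV state of recent prompts and
      reuses the longest matching prefix on later requests. If you just
      want per-request token counts, the OpenAI-compatible endpoint
      reports a "usage" object. A minimal sketch against my local
      server (adjust host/port to yours):</p>
    <pre>curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
# the JSON reply includes something like:
#   "usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}
</pre>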
    <p>Craig....</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 4/27/26 07:49, Greg H wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAEuphw3G3gwBnKsitSbxDFZkv4zcEAYaKextQCW8zdZX9zhvmQ@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">Thanks Craig. I have it on my list to experiment
        more with self-hosting LLMs. I think there will be calls for
        self-hosting once AI fervor has peaked and labs have to show
        profitability.<br>
        <br>
        Not on topic, but related to our NetSIG discussion on odd
        industry behaviours around LLM resource consumption:<br>
        <br>
        <a
href="https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird-6b2"
          moz-do-not-send="true" class="moz-txt-link-freetext">https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird-6b2</a>
        <div><br>
        </div>
        <div>We're back to the days of "more K-LOCs!"<br>
          <br>
        </div>
      </div>
      <br>
      <div class="gmail_quote gmail_quote_container">
        <div dir="ltr" class="gmail_attr">On Sun, Apr 26, 2026 at
          8:21 AM Craig Miller &lt;<a href="mailto:cvmiller@gmail.com"
            moz-do-not-send="true" class="moz-txt-link-freetext">cvmiller@gmail.com</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <p>Hi Greg,</p>
            <p>No, I haven't. I think you could run 'strace' to see
              what the model was doing at the time, but it would be
              slow, and I am not sure it would tell you much.</p>
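            <p>Something like this is what I had in mind (the model
              file name is just a placeholder):</p>
            <pre>strace -f -tt -o llama.trace ./llamafile-0.10.0 -m model.gguf --server --port 8080
# -f follows threads/forks, -tt adds timestamps, -o writes to a file;
# after a crash, look at the tail of the trace:
tail -50 llama.trace</pre>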
            <p>I don't think it was a RAM issue, since the container I
              am running the LLMs in is unrestricted (it can use all
              the host's memory, which is 32 GB), and the kernel is
              fairly recent (6.18.19-0-lts).</p>
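            <p>An easy way to double-check that the container isn't
              memory-capped (assuming cgroup v2, which kernels this
              recent use by default):</p>
            <pre># inside the container:
cat /sys/fs/cgroup/memory.max   # "max" means no limit
free -h                         # should show the full 32 GB</pre>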
            <p>I didn't spend much time on it because my objective was
              to get a local LLM running, not to debug the model.</p>
            <p>Craig...</p>
            <div>On 4/26/26 07:56, Greg H wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">I was curious if you do any troubleshooting
                for the models that core dump. I don't have any
                experience with this and I'm wondering if there's much
                that you can do other than increase the resources (i.e.
                more RAM). Maybe upgrade the kernel? Guessing some
                models need the latest / greatest kernel versions to do
                their thing. 
                <div> </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Sun, Apr 26, 2026
                  at 7:34 AM Craig Miller &lt;<a
                    href="mailto:cvmiller@gmail.com" target="_blank"
                    moz-do-not-send="true" class="moz-txt-link-freetext">cvmiller@gmail.com</a>&gt;
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  <div>
                    <p>Hi Deid,</p>
                    <p>Looking at the gguf models on HuggingFace:</p>
                    <p><a
href="https://huggingface.co/models?library=gguf" target="_blank"
                        moz-do-not-send="true"
                        class="moz-txt-link-freetext">https://huggingface.co/models?library=gguf</a></p>
                    <p>There were a few criteria I was looking at:</p>
                    <ol>
                      <li>Not too big, somewhere between 5 and 10 GB in
                        size</li>
                      <li>Relatively recent</li>
                      <li>Doesn't core dump right away</li>
                    </ol>
                    <p>I had the best luck running the Qwen models. I
                      am running
                      Qwen2.5-VL-7B-Instruct-abliterated.Q4_K_M.gguf on
                      my PN-50, and it seems to run reasonably fast.
                      Some of the other models were quite slow on the
                      PN-50.</p>
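                    <p>In case it helps, gguf files can be pulled
                      straight from HuggingFace with wget; the
                      USER/REPO part below is a placeholder, so copy
                      the real path from the model page:</p>
                    <pre>wget https://huggingface.co/USER/REPO/resolve/main/Qwen2.5-VL-7B-Instruct-abliterated.Q4_K_M.gguf</pre>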
                    <p>Have fun!</p>
                    <p>Craig...</p>
                    <div>On 4/26/26 07:13, Deid Reimer wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="auto">Hey Craig, <br>
                        <br>
                      </div>
                      <div dir="auto">Why did you pick that particular
                        LLM?<br>
                        <br>
                      </div>
                      <div dir="auto">Deid   VA7REI</div>
                      <div class="gmail_quote">On Apr 25, 2026, at 8:32
                        a.m., Craig Miller &lt;<a
                          href="mailto:cvmiller@gmail.com"
                          target="_blank" moz-do-not-send="true"
                          class="moz-txt-link-freetext">cvmiller@gmail.com</a>&gt;
                        wrote:
                        <blockquote class="gmail_quote"
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <p>Hi All,</p>
                          <p>We were chatting before the most recent
                            NetSIG about the new llamafile app, which
                            has excellent support for IPv6. The app
                            runs a web server (which is IPv6
                            accessible). The new llamafile takes a -m
                            parameter that points to the gguf LLM
                            model.</p>
                          <p><b>Old way</b></p>
                          <pre>./google_gemma-3-4b-it-Q6_K.llamafile --server -v2 --host lxcllama.example.com</pre>
                          <p><b>New way</b></p>
                          <pre>llamafile -m model.gguf --server --port 8080</pre>
                          <p>Find the new llamafile at:</p>
                          <p>    <a
href="https://github.com/mozilla-ai/llamafile/releases/tag/0.10.0"
                              target="_blank" moz-do-not-send="true"
                              class="moz-txt-link-freetext">https://github.com/mozilla-ai/llamafile/releases/tag/0.10.0</a></p>
                          <p>You can find gguf LLM models at:</p>
                          <p>     <a
href="https://huggingface.co/models?library=gguf" target="_blank"
                              moz-do-not-send="true"
                              class="moz-txt-link-freetext">https://huggingface.co/models?library=gguf</a></p>
                          <p>I start my llamafile using this command:</p>
                          <pre>./llamafile-0.10.0 -m Qwen3.5-9B.Q4_K_M.gguf --server --port 8080 --host lxcllama.example.com</pre>
                          <p>This way any web browser at my house can
                            access the LLM.</p>
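                          <p>A quick sanity check that the server is
                            reachable (llama.cpp's server exposes a
                            /health endpoint, and curl -6 forces IPv6;
                            the hostname is from my example above):</p>
                          <pre>curl http://lxcllama.example.com:8080/health
# or force the AAAA record explicitly:
curl -6 http://lxcllama.example.com:8080/health</pre>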
                          <p>Happy LLM-ing,</p>
                          <p>Craig...</p>
                          <pre cols="72">-- 
IPv6 is the future, the future is here
<a href="http://ipv6hawaii.org/" target="_blank" moz-do-not-send="true"
                          class="moz-txt-link-freetext">http://ipv6hawaii.org/</a></pre>
                          <pre>-- 
Projects mailing list
<a href="mailto:Projects@vicpimakers.ca" target="_blank"
                          moz-do-not-send="true"
                          class="moz-txt-link-freetext">Projects@vicpimakers.ca</a>
<a href="http://vicpimakers.ca/mailman/listinfo/projects_vicpimakers.ca"
                          target="_blank" moz-do-not-send="true"
                          class="moz-txt-link-freetext">http://vicpimakers.ca/mailman/listinfo/projects_vicpimakers.ca</a>
</pre>
                        </blockquote>
                      </div>
                      <br>
                    </blockquote>
                    <pre cols="72">-- 
IPv6 is the future, the future is here
<a href="http://ipv6hawaii.org/" target="_blank" moz-do-not-send="true"
                    class="moz-txt-link-freetext">http://ipv6hawaii.org/</a></pre>
                  </div>
                </blockquote>
              </div>
              <br>
            </blockquote>
            <pre cols="72">-- 
IPv6 is the future, the future is here
<a href="http://ipv6hawaii.org/" target="_blank" moz-do-not-send="true"
            class="moz-txt-link-freetext">http://ipv6hawaii.org/</a></pre>
          </div>
        </blockquote>
      </div>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
IPv6 is the future, the future is here
<a class="moz-txt-link-freetext" href="http://ipv6hawaii.org/">http://ipv6hawaii.org/</a></pre>
  </body>
</html>