<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Good article, Greg,</p>
    <p>Clearly, you aren't using enough tokens (to be tokenmaxxing).</p>
    <p>BTW, llamafile (a.k.a. llama.cpp) will kick out token
      statistics. Not sure what they mean:</p>
    <pre>srv   prompt_save:  - saving prompt with length 631, total state size = 34.516 MiB
srv          load:  - looking for better prompt, base f_keep = 0.022, sim = 0.500
srv        update:  - cache state: 5 prompts, 74.066 MiB (limits: 8192.000 MiB, 128000 tokens, 149758 est)
srv        update:    - prompt 0x7fc12c15f5b0:     431 tokens, checkpoints:  0,    23.576 MiB
srv        update:    - prompt 0x7fc12c1634a0:      75 tokens, checkpoints:  0,     4.103 MiB
srv        update:    - prompt 0x7fc12c15fb40:      75 tokens, checkpoints:  0,     4.103 MiB
srv        update:    - prompt 0x7fc15781b190:     142 tokens, checkpoints:  0,     7.768 MiB
srv        update:    - prompt 0x7fc12c1631b0:     631 tokens, checkpoints:  0,    34.516 MiB

</pre>
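    <p>My best guess: those "srv" lines are the server's prompt-cache
      bookkeeping, where it saves the KV state of recent prompts and
      reuses the longest matching prefix on later requests. If you just
      want per-request token counts, the OpenAI-compatible endpoint
      reports a "usage" object. A minimal sketch against my local
      server (adjust host/port to yours):</p>
    <pre>curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
# the JSON reply includes something like:
#   "usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}
</pre>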
    <p>Craig....</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 4/27/26 07:49, Greg H wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAEuphw3G3gwBnKsitSbxDFZkv4zcEAYaKextQCW8zdZX9zhvmQ@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">Thanks Craig. I have it on my list to experiment
        more with self-hosting LLMs. I think there will be calls for
        self-hosting once AI fervor has peaked and labs have to show
        profitability.<br>
        <br>
        Not on topic, but related to our NetSIG discussion on odd
        industry behaviours around LLM resource consumption:<br>
        <br>
        <a
href="https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird-6b2"
          moz-do-not-send="true" class="moz-txt-link-freetext">https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird-6b2</a>
        <div><br>
        </div>
        <div>We're back to the days of "more K-LOCs!"<br>
          <br>
        </div>
      </div>
      <br>
      <div class="gmail_quote gmail_quote_container">
        <div dir="ltr" class="gmail_attr">On Sun, Apr 26, 2026 at
          8:21 AM Craig Miller &lt;<a href="mailto:cvmiller@gmail.com"
            moz-do-not-send="true" class="moz-txt-link-freetext">cvmiller@gmail.com</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <p>Hi Greg,</p>
            <p>No, I haven't. I think you could run 'strace' to see
              what the model was doing at the time, but it would be
              slow, and I am not sure it would tell you much.</p>
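            <p>Something like this is what I had in mind (the model
              file name is just a placeholder):</p>
            <pre>strace -f -tt -o llama.trace ./llamafile-0.10.0 -m model.gguf --server --port 8080
# -f follows threads/forks, -tt adds timestamps, -o writes to a file;
# after a crash, look at the tail of the trace:
tail -50 llama.trace</pre>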
            <p>I don't think it was a RAM issue, since the container I
              am running the LLMs in is unrestricted (it can use all
              the host's memory, which is 32 GB), and the kernel is
              fairly recent (6.18.19-0-lts).</p>
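            <p>An easy way to double-check that the container isn't
              memory-capped (assuming cgroup v2, which kernels this
              recent use by default):</p>
            <pre># inside the container:
cat /sys/fs/cgroup/memory.max   # "max" means no limit
free -h                         # should show the full 32 GB</pre>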
            <p>I didn't spend much time on it because my objective was
              to get a local LLM running, not to debug the model.</p>
            <p>Craig...</p>
            <div>On 4/26/26 07:56, Greg H wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">I was curious if you do any troubleshooting
                for the models that core dump. I don't have any
                experience with this and I'm wondering if there's much
                that you can do other than increase the resources (i.e.
                more RAM). Maybe upgrade the kernel? Guessing some
                models need the latest / greatest kernel versions to do
                their thing. 
                <div> </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Sun, Apr 26, 2026
                  at 7:34 AM Craig Miller &lt;<a
                    href="mailto:cvmiller@gmail.com" target="_blank"
                    moz-do-not-send="true" class="moz-txt-link-freetext">cvmiller@gmail.com</a>&gt;
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  <div>
                    <p>Hi Deid,</p>
                    <p>Looking at the gguf models on HuggingFace:</p>
                    <p><a
href="https://huggingface.co/models?library=gguf" target="_blank"
                        moz-do-not-send="true"
                        class="moz-txt-link-freetext">https://huggingface.co/models?library=gguf</a></p>
                    <p>There were a few criteria I was looking at:</p>
                    <ol>
                      <li>Not too big, somewhere between 5 and 10 GB in
                        size</li>
                      <li>Relatively recent</li>
                      <li>Doesn't core dump right away</li>
                    </ol>
                    <p>I had the best luck running the Qwen models. I
                      am running
                      Qwen2.5-VL-7B-Instruct-abliterated.Q4_K_M.gguf on
                      my PN-50, and it seems to run reasonably fast.
                      Some of the other models were quite slow on the
                      PN-50.</p>
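                    <p>In case it helps, gguf files can be pulled
                      straight from HuggingFace with wget; the
                      USER/REPO part below is a placeholder, so copy
                      the real path from the model page:</p>
                    <pre>wget https://huggingface.co/USER/REPO/resolve/main/Qwen2.5-VL-7B-Instruct-abliterated.Q4_K_M.gguf</pre>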
                    <p>Have fun!</p>
                    <p>Craig...</p>
                    <div>On 4/26/26 07:13, Deid Reimer wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="auto">Hey Craig, <br>
                        <br>
                      </div>
                      <div dir="auto">Why did you pick that particular
                        LLM?<br>
                        <br>
                      </div>
                      <div dir="auto">Deid   VA7REI</div>
                      <div class="gmail_quote">On Apr 25, 2026, at 8:32
                        a.m., Craig Miller &lt;<a
                          href="mailto:cvmiller@gmail.com"
                          target="_blank" moz-do-not-send="true"
                          class="moz-txt-link-freetext">cvmiller@gmail.com</a>&gt;
                        wrote:
                        <blockquote class="gmail_quote"
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <p>Hi All,</p>
                          <p>We were chatting before the most recent
                            NetSIG about the new llamafile app, which
                            has excellent support for IPv6. The app
                            runs a web server (which is IPv6
                            accessible). The new llamafile takes a -m
                            parameter that points to the gguf LLM
                            model.</p>
                          <p><b>Old way</b></p>
                          <pre>./google_gemma-3-4b-it-Q6_K.llamafile --server -v2 --host lxcllama.example.com</pre>
                          <p><b>New way</b></p>
                          <pre>llamafile -m model.gguf --server --port 8080</pre>
                          <p>Find the new llamafile at:</p>
                          <p>    <a
href="https://github.com/mozilla-ai/llamafile/releases/tag/0.10.0"
                              target="_blank" moz-do-not-send="true"
                              class="moz-txt-link-freetext">https://github.com/mozilla-ai/llamafile/releases/tag/0.10.0</a></p>
                          <p>You can find gguf LLM models at:</p>
                          <p>     <a
href="https://huggingface.co/models?library=gguf" target="_blank"
                              moz-do-not-send="true"
                              class="moz-txt-link-freetext">https://huggingface.co/models?library=gguf</a></p>
                          <p>I start my llamafile using this command:</p>
                          <pre>./llamafile-0.10.0 -m Qwen3.5-9B.Q4_K_M.gguf --server --port 8080 --host lxcllama.example.com</pre>
                          <p>This way any web browser at my house can
                            access the LLM.</p>
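                          <p>A quick sanity check that the server is
                            reachable (llama.cpp's server exposes a
                            /health endpoint, and curl -6 forces IPv6;
                            the hostname is from my example above):</p>
                          <pre>curl http://lxcllama.example.com:8080/health
# or force the AAAA record explicitly:
curl -6 http://lxcllama.example.com:8080/health</pre>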
                          <p>Happy LLM-ing,</p>
                          <p>Craig...</p>
                          <pre cols="72">-- 
IPv6 is the future, the future is here
<a href="http://ipv6hawaii.org/" target="_blank" moz-do-not-send="true"
                          class="moz-txt-link-freetext">http://ipv6hawaii.org/</a></pre>
                          <pre>-- 
Projects mailing list
<a href="mailto:Projects@vicpimakers.ca" target="_blank"
                          moz-do-not-send="true"
                          class="moz-txt-link-freetext">Projects@vicpimakers.ca</a>
<a href="http://vicpimakers.ca/mailman/listinfo/projects_vicpimakers.ca"
                          target="_blank" moz-do-not-send="true"
                          class="moz-txt-link-freetext">http://vicpimakers.ca/mailman/listinfo/projects_vicpimakers.ca</a>
</pre>
                        </blockquote>
                      </div>
                      <br>
                    </blockquote>
                    <pre cols="72">-- 
IPv6 is the future, the future is here
<a href="http://ipv6hawaii.org/" target="_blank" moz-do-not-send="true"
                    class="moz-txt-link-freetext">http://ipv6hawaii.org/</a></pre>
                  </div>
                </blockquote>
              </div>
              <br>
            </blockquote>
            <pre cols="72">-- 
IPv6 is the future, the future is here
<a href="http://ipv6hawaii.org/" target="_blank" moz-do-not-send="true"
            class="moz-txt-link-freetext">http://ipv6hawaii.org/</a></pre>
          </div>
        </blockquote>
      </div>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
IPv6 is the future, the future is here
<a class="moz-txt-link-freetext" href="http://ipv6hawaii.org/">http://ipv6hawaii.org/</a></pre>
  </body>
</html>