Show HN: Local LLM Notepad – run a GPT-style model from a USB stick

17 points by davidye324 9 hours ago

What it is A single 45 MB Windows .exe that embeds llama.cpp and a minimal Tk UI. Copy it (plus any .gguf model) to a flash drive, double-click on any Windows PC, and you’re chatting with an LLM—no admin rights, Cloud, or network.

Why I built it Existing “local LLM” GUIs assume you can pip install, pass long CLI flags, or download GBs of extras.

I wanted something my less-technical colleagues could run during a client visit by literally plugging in a USB drive.

How it works PyInstaller one-file build → bundles Python runtime, llama_cpp_python, and the UI into a single PE.

On first launch, it memory-maps the .gguf; subsequent prompts stream at ~20 tok/s on an i7-10750H with gemma-3-1b-it-Q4_K_M.gguf (0.8 GB).

Tick-driven render loop keeps the UI responsive while llama.cpp crunches.

A parser bold-underlines every token that originated in the prompt; Ctrl+click pops a “source viewer” to trace facts. (Helps spot hallucinations fast.)

ensocode an hour ago

Interesting, will definitely try it. What can be expected? What other models do perform ok with this?

gxonatano 4 hours ago

> walk up to any computer

Windows users seem to think their OS is ubiquitous. But in fact for most hackers reading this site, using Windows is a huge step backwards in productivity and capability.

Zetaphor 2 hours ago

Surely you're hinting at Linux, in which case this runs fine with WINE

ge96 5 hours ago

Wonder if you can use/interface with those coral accelerator boards