I love Tron, and I really appreciated the new Tron Ares. In fact, I watched it twice in the cinema, which is a rarity for me! The visuals were astonishing, and the soundtrack by NIN raised the bar in the same way Daft Punk did back in 2010. The story is nice, with lots of references to the original movie. I think they really did a wonderful job.
One of the scenes I particularly loved (no spoiler since it's in one of the trailers) is when Ares goes into the "80s Grid", inside what appears to be an Apple III. That actually makes perfect sense since Flynn did use the same machine when trying to hack into Encom. There, he finds Bit, Clu's companion from the original movie. I always liked Bit; it's such a nerdy character concept, and I'm so glad they brought it back, even briefly, in the new installment.
Even though the scene was very cheesy and full of fan service, I could see the effort they put into recreating the exact style of the 1982 movie. The stark contrast between the sharp corners of the 80s Light Cycle trails and the sleek design of Dillinger's Light Cycles on the streets of Vancouver was amazing to see.
Inspired by the movie, I decided to go home and recreate Bit as a web app. I've seen lots of interesting recreations of Bit, mainly using the Math.random function, so I thought: what if I hook it up to an LLM? What sort of wacky experience could I come up with? :)
For this exercise, I wanted to build everything fully local because, as I mentioned in my previous post, I don't like cloud AI services. Since I didn't want to self-host my model on fly.io, and I couldn't exactly ask people to set up their own Ollama instance just to run this little app, I went in a different direction: can I run the entire model directly in the browser? And can I make it multiplatform?
After researching the topic and experimenting for a few weeks, as of November 2025, we have fundamentally only two options for running large language models directly in the browser: transformers.js and WebLLM. Some browsers, such as Opera and Chrome, are developing their own in-house solutions with private APIs, but nothing is really ready, widely available, or truly multiplatform yet.
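To give you an idea, running a model with transformers.js boils down to a few lines. Here's a rough sketch of its pipeline API; the model name is just an example of a small ONNX build:

// transformers.js: download an ONNX model and run it entirely in the browser
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'Xenova/gpt2'); // example model
const output = await generator('Greetings, program!', { max_new_tokens: 16 });
console.log(output[0].generated_text);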
I chose to give these two libraries a try, and I saw some really cool results. However, as soon as I tested them on a few smartphones, they either crashed (due to the large amount of data being loaded into the browser) or failed to run at all, since both libraries rely heavily on WebGPU, which is still very experimental. Here's an insightful article from Mozilla on the topic.
I really wanted to have this small web app on my smartphone, and it had to be easily shareable across devices. After searching for a few more days with no luck, I was about to give up when I stumbled upon a project called Wllama, which uses WebAssembly. CPU inference: that was exactly what I needed!
By using this WebAssembly binding for llama.cpp, I was able to run inference directly on the CPU, which removed the dependency on WebGPU. There's one huge caveat, though: the experience varies dramatically across devices. I noticed that on Apple Silicon it runs much more smoothly than on most Android ARM chips, and don't even get me started on Intel or AMD. This is probably due to the Unified Memory Architecture and high bandwidth of Apple's SoCs, which I think is also why most mainstream solutions rely on WebGPU. I decided to go ahead and develop my app anyway.
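Wiring it up is refreshingly simple. Here's roughly what the setup looks like (a sketch based on Wllama's README; the wasm paths depend on your bundler, and the model URL is a placeholder):

// Wllama: llama.cpp compiled to WebAssembly, running inference on the CPU
import { Wllama } from '@wllama/wllama';

// paths to the wasm binaries shipped with the package (adjust to your setup)
const wllama = new Wllama({
  'single-thread/wllama.wasm': '/esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/esm/multi-thread/wllama.wasm',
});

// any small GGUF model works here (placeholder URL)
await wllama.loadModelFromUrl('https://huggingface.co/<user>/<repo>/resolve/main/model.gguf');

const answer = await wllama.createCompletion('is the water wet?', {
  nPredict: 8, // Bit only ever needs a couple of tokens
  sampling: { temp: 0.7 },
});
console.log(answer);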
My next obstacle was on the horizon. Bit was taking far too long to reply to my inputs, and I quickly realized that my app was running on a single thread. Multithreading relies on Web Workers sharing memory through SharedArrayBuffer, which browsers only enable when the page is cross-origin isolated, and that requires two specific HTTP headers:
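Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp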
This was a little annoying since most popular static hosting providers, such as GitHub Pages or surge.sh, do not allow customization of HTTP headers. The only one that allowed me to do this was Netlify. By modifying the netlify.toml file like this:
[[headers]]
  for = "/*"
  [headers.values]
    Cross-Origin-Opener-Policy = "same-origin"
    Cross-Origin-Embedder-Policy = "require-corp"
I could finally set COOP and COEP headers and achieve multithreading once and for all.
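A quick way to verify the headers are actually doing their job is to check the flag the browser exposes for it:

// true only when COOP/COEP are set and the page is cross-origin isolated
if (!window.crossOriginIsolated) {
  console.warn('SharedArrayBuffer unavailable: falling back to a single thread');
}
console.log('threads available:', navigator.hardwareConcurrency);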
In the coming months, I'll keep experimenting with WebGPU+WebAssembly to see if I get a better outcome and support other platforms.
The last choice I had to make was picking the actual model to respond to my prompts. I was looking for something extremely small, ideally no more than 1 billion parameters. Keep in mind that this had to be loaded entirely into a browser tab, which has a limit of 2 GB, or it would crash. Initially, I opted to use Gemma3 270M (unsloth), but it gave very odd results; models this small require fine-tuning before they become truly useful. Then I came across a model from Liquid AI called LFM2-350M (the 4-bit quantized version weighs roughly 229 MB), which worked quite well out of the box. The model gave me consistent answers, and it felt like it understood my prompts correctly, even when I spoke Italian!
When working with tiny LLMs, you really don't know what to expect. The hallucinations can be funny in their own way, and it is really hard to control the output, so you need to guardrail the prompt quite a bit. Nonetheless, I came up with a very simple one:
You are a friendly binary answer bot.
You can only respond with a single word: "YES" or "NO".
Do not provide explanations, punctuation, or other text.
To emphasize your answer, you can use "LOUD YES" or "LOUD NO".
Examples:
is the water wet?
YES
are you angry at me?
LOUD NO
is the planet earth flat?
NO
are you a bit?
LOUD YES
I figured I would use the "loud" prefix so that the model could emphasize its answer in some cases; the "YES YES YES!" or "NO NO NO!" sound would then be played, just like in the movies.
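The glue code for this is tiny: normalize whatever the model spits out and map it to a sample. A sketch of the idea; the file names here are made up:

// map the model's answer to one of the sampled sounds (hypothetical file names)
const SOUNDS = {
  'YES': '/sounds/yes.mp3',
  'NO': '/sounds/no.mp3',
  'LOUD YES': '/sounds/yes-yes-yes.mp3', // the emphatic triple "YES YES YES!"
  'LOUD NO': '/sounds/no-no-no.mp3',
};

function playAnswer(rawOutput) {
  // tiny models love to sneak in stray punctuation, so normalize defensively
  const answer = rawOutput.trim().toUpperCase().replace(/[^A-Z ]/g, '');
  const file = SOUNDS[answer] ?? (answer.includes('YES') ? SOUNDS['YES'] : SOUNDS['NO']);
  new Audio(file).play();
}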
For the 3D assets, I discovered some amazing STL models made by Fernando Jerez, freely available on Printables. Since I didn't want the browser to render the STL files with three.js or similar tools, I converted them into simple, lightweight GIFs using IMAGEtoSTL. For the sounds, I sampled them directly from the movie. This was a bit of a pain because I had to manually filter out the background noise using Audacity's Notch Filter.
The small web app turned out quite nicely, I think. I even made it a PWA, so it should work offline. It is currently available at bit.simone.computer, and the source is on GitHub.
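The offline part is the usual service worker recipe: cache the assets on first load and serve them from the cache afterwards. A minimal sketch (the cache name is made up), registered from the page with navigator.serviceWorker.register('/sw.js'):

// sw.js: cache-first fetch handler so the app keeps working offline
const CACHE = 'bit-cache-v1'; // hypothetical cache name

self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') return;
  event.respondWith(
    caches.open(CACHE).then(async (cache) => {
      const cached = await cache.match(event.request);
      if (cached) return cached;
      const response = await fetch(event.request);
      cache.put(event.request, response.clone());
      return response;
    })
  );
});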
It's a real pity Disney doesn't seem very interested in the franchise. I truly think Ares was great, and I highly recommend everyone watch the TV show Tron: Uprising and play Tron 2.0, which still holds up so damn well even after 22 years - my god.
These movies have always been incredible from a technical perspective. I could watch deep-dive videos on how they animated the first movie in 1982 all day.
Anyway, I guess I'll see you all in 15 to 30 years if someone finally decides to resuscitate the IP.
END OF LINE.█