Blog Posts: Machine Learning
Language Modeling Without Neural Networks
Generating Shakespeare has become the “Hello World” of language models. Recently, I’ve been messing with alternative language models and came across unbounded n-gram models. These models are purely statistical and don’t require optimizing weights or …
Text Diffusion Models are Faster at Writing Code
In this post, I run small experiments showing that diffusion language models generate code (and other structured text) at a faster rate. Increased structure tends to correlate with reduced entropy, which leads to higher-confidence token predictions, which …
BERT is just a Single Text Diffusion Step
This article appeared on Hacker News. Link to the discussion here. Additionally, Andrej Karpathy wrote his thoughts about the post, linked here.
A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text …
Local SGD and DiLoCo Research Musings
Here are some notes I wrote on this topic. I’ve since switched my master’s thesis to a different topic, but I found many interesting research directions in this area.
Local SGD and DiLoCo Overview: It is October 15th, 2025. For the last year of my …
Running GPT-2 in WebGL with Classic GPGPU Programming
This article appeared on Hacker News. Link to the discussion here!
A few weeks back, I implemented GPT-2 using WebGL and shaders (GitHub repo), which made the front page of Hacker News. Here is a short write-up on what I learned about old-school …
Intro to Autograd Engines: Karpathy's Micrograd in Go
For a while, I’ve wanted to build a complete autograd engine. What is an autograd engine, you might ask? To find the answer, we first must know what a neural network is.
Neural Network Crash Course: A neural network can just be seen as a black-box function. …