Blog Posts: Machine Learning
Text Diffusion Models are Faster at Writing Code
In this post, I run small experiments showing that diffusion language models generate code (and other structured text) at a faster rate than they generate prose. Increased structure tends to correlate with reduced entropy, which leads to more confident token predictions, which …
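To make the entropy claim concrete, here is a minimal sketch (toy, hand-picked distributions, not outputs from a real model) comparing a peaked next-token distribution, like the one a model sees after a rigid code prefix, with a flat one typical of open-ended prose:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions (illustrative, not from a real model):
# after a rigid code prefix the continuation is nearly forced;
# mid-sentence in free-form prose, many continuations are plausible.
code_ctx  = [0.92, 0.05, 0.02, 0.01]        # highly peaked -> ~0.5 bits
prose_ctx = [0.20, 0.18, 0.15, 0.12, 0.35]  # flat -> ~2.2 bits

print(f"code-like context:  {entropy(code_ctx):.2f} bits")
print(f"prose-like context: {entropy(prose_ctx):.2f} bits")
```

Lower entropy means the model can commit tokens with high confidence, which is presumably where the speedup comes from.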
BERT is just a Single Text Diffusion Step
This article appeared on Hacker News; the discussion is linked here. Andrej Karpathy also wrote up his thoughts on the post, linked here.
A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text …
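The claim in the title can be made concrete with the HuggingFace fill-mask pipeline (a sketch of mine, not the post's code): BERT's masked-LM objective, corrupt some tokens and predict the originals in one forward pass, is exactly one denoising step of a masked text diffusion model.

```python
# pip install transformers torch
from transformers import pipeline

# BERT's pretraining task: given text with masked-out tokens, reconstruct
# the originals in a single forward pass. A masked-diffusion language model
# does the same thing repeatedly, starting from all-[MASK] text and
# unmasking a few confident positions per step.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for pred in unmasker("Diffusion models generate [MASK] by iterative denoising.")[:3]:
    print(f"{pred['token_str']!r}  (score={pred['score']:.3f})")
```

Run a pass like this repeatedly on progressively less-masked text and you get, roughly, the masked-diffusion generation loop; run it once and you get BERT.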
Local SGD and DiLoCo Research Musings
Here are some notes I wrote on this topic. I have since switched my master's thesis to a different topic, but I found many interesting research directions in this area.
Local SGD and DiLoCo Overview: It is October 15th, 2025. For the final year of my …
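For context on what the notes cover: in local SGD, each worker runs several optimizer steps independently and only then synchronizes, and DiLoCo layers an outer optimizer on top of the averaged update. A minimal sketch of that communication pattern (the toy quadratic objective and learning rate are my own stand-ins, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
num_workers, H, outer_rounds = 4, 8, 3   # H = local steps between syncs

def local_grad(params, worker):
    # Stand-in for a real minibatch gradient on this worker's data shard:
    # gradient of the toy objective 0.5 * ||params - worker||^2.
    return params - worker

global_params = rng.normal(size=5)
for _ in range(outer_rounds):
    replicas = []
    for w in range(num_workers):
        p = global_params.copy()
        for _ in range(H):               # H local SGD steps, no communication
            p -= 0.1 * local_grad(p, w)
        replicas.append(p)
    # One communication round: average the replicas (plain local SGD).
    # DiLoCo instead feeds the averaged *delta* to an outer optimizer
    # (Nesterov momentum in the paper).
    global_params = np.mean(replicas, axis=0)
    print(global_params.round(3))
```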
Running GPT-2 in WebGL with Classic GPGPU Programming
This article appeared on Hacker News; the discussion is linked here.
A few weeks back, I implemented GPT-2 using WebGL and shaders (GitHub repo), which made the front page of Hacker News. Here is a short write-up of what I learned about old-school …
Intro to Autograd Engines: Karpathy's Micrograd in Go
For a while, I have wanted to build a complete autograd engine. What is an autograd engine, you might ask? To find the answer, we must first know what a neural network is.
Neural Network Crash Course: A neural network can simply be seen as a black-box function. …
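The post builds the engine in Go; Karpathy's original micrograd is Python, and a minimal scalar version of the idea fits in a few dozen lines. This sketch is mine, trimmed to add and multiply: each Value remembers which values produced it, and backward() replays the graph in reverse, applying the chain rule.

```python
class Value:
    """A scalar that records how it was computed, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad                 # d(a+b)/da = 1
            other.grad += out.grad                # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad    # d(a*b)/da = b
            other.grad += self.data * out.grad    # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x, y = Value(3.0), Value(2.0)
z = x * y + x          # z = xy + x, so dz/dx = y + 1 = 3, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 3.0 3.0
```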