Blog Posts: Machine Learning
Language Modeling Without Neural Networks
Generating Shakespeare has become the “Hello World” of language models. Recently, I’ve been messing with alternative language models and came across unbounded n-gram models. These models are purely statistical and don’t require optimizing weights or …
Text Diffusion Models are Faster at Writing Code
In this post, I run small experiments showing that diffusion language models generate code (and other structured text) at a faster rate. Increased structure tends to correlate with reduced entropy, which leads to higher-confidence token predictions, which …
BERT is just a Single Text Diffusion Step
This article appeared on Hacker News. Link to the discussion here. Additionally, Andrej Karpathy wrote his thoughts about the post, linked here.
A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text …
Local SGD and DiLoCo Research Musings
Here are some notes I wrote on this topic. I’ve since switched my master’s thesis to a different topic, but I found many interesting research directions in this area.
Local SGD and DiLoCo Overview: It is October 15th, 2025. For the last year of my …
Running GPT-2 in WebGL with Classic GPGPU Programming
This article appeared on Hacker News. Link to the discussion here!
A few weeks back, I implemented GPT-2 using WebGL and shaders (GitHub repo), which made the front page of Hacker News. Here is a short write-up on what I learned about old-school …
Intro to Autograd Engines: Karpathy's Micrograd in Go
For a while, I’ve wanted to build a complete autograd engine. What is an autograd engine, you might ask? To find the answer, we first must know what a neural network is.
Neural Network Crash Course: A neural network can just be seen as a black-box function. …