
Blog Posts

Generating Shakespeare Without Neural Networks

Learning how to generate Shakespeare has become the “Hello World” of language models.[1] Recently, I’ve been messing with alternative language models (diffusion language models instead of autoregressive transformers) and came across unbounded n-gram models. …

Text Diffusion Models are Faster at Writing Code

In this post, I run small experiments showing that diffusion language models generate code (and other structured text) at a faster rate. Increased structure tends to correlate with reduced entropy, which leads to higher-confidence token predictions, which …
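The chain of reasoning in this teaser (more structure, lower entropy, more confident predictions) can be illustrated with a small sketch. The next-token distributions below are made up for illustration, and `tokens_committed` assumes a hypothetical decoding rule where a diffusion-style decoder unmasks every position whose top-token probability clears a fixed threshold; this is not necessarily the rule used in the post's experiments.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def confidence(probs):
    """Probability of the most likely token, used as the model's confidence."""
    return max(probs)


def tokens_committed(position_dists, threshold=0.9):
    """Count masked positions confident enough to unmask in a single step.

    Hypothetical confidence-threshold rule, assumed for illustration only.
    """
    return sum(1 for d in position_dists if confidence(d) >= threshold)


# Made-up distributions over a 4-token vocabulary.
structured = [0.97, 0.01, 0.01, 0.01]  # e.g. a closing bracket in code
free_form = [0.25, 0.25, 0.25, 0.25]   # e.g. the next word in prose

if __name__ == "__main__":
    for name, dist in [("structured", structured), ("free-form", free_form)]:
        print(f"{name}: entropy={entropy(dist):.2f} bits, "
              f"confidence={confidence(dist):.2f}")

    # Four masked positions of each kind: the low-entropy ones can all be
    # committed in one step, the high-entropy ones need more steps.
    print("structured committed:", tokens_committed([structured] * 4))
    print("free-form committed:", tokens_committed([free_form] * 4))
```

Under these assumptions the structured distribution has roughly 0.24 bits of entropy versus 2 bits for the flat one, so more positions clear the threshold per step and generation finishes in fewer steps.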

Curserve: Minimizing Agentic Coding End-to-End Latency

For Cal Hacks 2025, a few friends and I built Curserve, a fast and scalable server-side engine for agentic coding, which ended up placing for one of the sponsor prizes. We didn’t go to Cal Hacks to try and win, but instead to have a good excuse to work on …

BERT is just a Single Text Diffusion Step

This article appeared on Hacker News. Link to the discussion here. Additionally, Andrej Karpathy wrote his thoughts about the post, linked here. A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text …

Local SGD and DiLoCo Research Musings

Here are some notes I wrote over this topic. I’ve switched my master’s thesis to a different topic, but there were many interesting research directions I found in this area. Local SGD and DiLoCo Overview: It is October 15th, 2025. For my last year of my …

Running GPT-2 in WebGL with Classic GPGPU Programming

This article appeared on Hacker News. Link to the discussion here. A few weeks back, I implemented GPT-2 using WebGL and shaders (GitHub repo), which made the front page of Hacker News. Here is a short write-up over what I learned about old-school …

Mathematical Statistics

My notes over Mark Maxwell’s course, Introduction to Mathematical Statistics, and his textbook, Probability & Statistics with Applications, Second Edition. Sampling Distributions and Estimation: Normally in a probability experiment, we don’t know the true …

Common Probability Distributions

An overview of common discrete and continuous distributions found in probability and statistics, from Mark Maxwell’s textbook, Probability & Statistics with Applications, Second Edition. Common Discrete Distributions: Discrete Uniform: A random variable …

How to Fix Hugo's iOS Code-Block Text-Size Rendering Issue

Lately, I’ve been coming across many blogs that have weird font-size rendering issues for code blocks on iOS. Basically, in a code snippet, the text-size would sometimes be much larger for some lines than others. Below is a screenshot of the issue from a …

Intro to Autograd Engines: Karpathy's Micrograd in Go

For a while, I wanted to build a complete autograd engine. What is an autograd engine, you might ask? To find the answer, we first must know what a neural network is. Neural Network Crash Course: A neural network can just be seen as a black-box function. …