Blog Posts: 2025

Text Diffusion Models are Faster at Writing Code

December 13, 2025

Machine Learning, 2025

In this post, I run small experiments showing that diffusion language models generate code (and other structured text) at a faster rate. Increased structure tends to correlate with reduced entropy, which leads to higher-confidence token predictions, which …

Curserve: Minimizing Agentic Coding End-to-End Latency

November 9, 2025

Programming, 2025

For Cal Hacks 2025, a few friends and I built Curserve, a fast and scalable server-side engine for agentic coding, which ended up placing for one of the sponsor prizes. We didn’t go to Cal Hacks to try to win, but instead to have a good excuse to work on …

BERT is just a Single Text Diffusion Step

October 20, 2025

Machine Learning, 2025

This article appeared on Hacker News. Link to the discussion here. Additionally, Andrej Karpathy wrote his thoughts about the post, linked here. A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text …

Local SGD and DiLoCo Research Musings

October 14, 2025

Machine Learning, 2025

Here are some notes I wrote on this topic. I’ve switched my master’s thesis to a different topic, but I found many interesting research directions in this area. Local SGD and DiLoCo Overview: It is October 15th, 2025. For my last year of my …

Running GPT-2 in WebGL with Classic GPGPU Programming

May 24, 2025

Machine Learning, Programming, 2025

This article appeared on Hacker News. Link to the discussion here! A few weeks back, I implemented GPT-2 using WebGL and shaders (GitHub Repo) which made the front page of Hacker News. Here is a short write-up on what I learned about old-school …