BERT is just a Single Text Diffusion Step

A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text using diffusion. Unlike traditional GPT-style models that generate one word at a time, Gemini Diffusion creates whole blocks of text by refining …

Local SGD and DiLoCo Research Musings

Originally, I was doing my master’s thesis on this topic. I eventually changed my focus, but these are some of the notes I wrote while I was looking into Local SGD and DiLoCo. Local SGD and DiLoCo Overview: It is October 15th, 2025. For my last year of …