BERT is just a Single Text Diffusion Step · October 20, 2025 · Machine Learning · This article appeared on Hacker News. Link to the discussion here. A while back, Google DeepMind unveiled Gemini Diffusion, an …
Local SGD and DiLoCo Research Musings · October 14, 2025 · Research, Machine Learning · Here are some notes I wrote on this topic. I’ve switched my master’s thesis to a different topic, but there are many …
Running GPT-2 in WebGL with Classic GPGPU Programming · May 24, 2025 · Machine Learning, Programming · This article appeared on Hacker News. Link to the discussion here. A few weeks back, I implemented GPT-2 using WebGL and …
Intro to Autograd Engines: Karpathy's Micrograd in Go · November 11, 2023 · Machine Learning, Programming · For a while, I wanted to build a complete autograd engine. What is an autograd engine, you might ask? To find the answer, we …