How a bizarre training curve called "grokking" revealed that neural networks can seem to learn nothing, until suddenly they learn everything.