Why SSMs Struggle in Parameter Golf: A Structural Analysis at 25M Parameters
TL;DR: Over ~3 weeks of experimentation on an SSM-based submission to OpenAI’s Parameter Golf, I converged on a legal Mamba-3 hybrid at 1.1456 bpb (post-quant + TTT), the best SSM submission in the 16M...