The MAMBA Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
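To make the idea of tied weights concrete, here is a minimal, dependency-free sketch. It is not the library's implementation: the class `TiedLMHead`, its methods, and the toy weight values are all hypothetical names chosen for illustration. It only shows the general pattern in which a single matrix serves as both the input embedding table and the output projection.

```python
# Illustrative sketch of weight tying; all names here are hypothetical.
class TiedLMHead:
    """Toy model: one matrix is both the embedding table and the LM head."""

    def __init__(self, vocab_size, hidden_size):
        # Shared weight of shape (vocab_size, hidden_size), filled with
        # deterministic toy values instead of learned parameters.
        self.weight = [
            [0.01 * (row * hidden_size + col) for col in range(hidden_size)]
            for row in range(vocab_size)
        ]

    def embed(self, token_id):
        # Input embedding: look up one row of the shared matrix.
        return self.weight[token_id]

    def logits(self, hidden):
        # LM head: dot the hidden state with every row of the same matrix,
        # giving one score per vocabulary entry.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]


model = TiedLMHead(vocab_size=4, hidden_size=3)
hidden = model.embed(2)        # stand-in for the backbone's output
scores = model.logits(hidden)  # list of 4 scores, one per token
```

Because `embed` and `logits` read the same `weight` object, any update to it affects both the input and output sides, which is the point of tying: the head adds no parameters of its own.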
With these representations, there is a neat trick that we can easily use: specifically, opt for a