Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
This is a Plain English Papers summary of a research paper called Mixture-of-Depths: Dynamically allocating compute in transformer-based language models. If you like these kinds of analyses, you should subscribe to the AImodels.fyi newsletter or follow me on Twitter.

Overview