Developed using Databricks. A HuggingFace Community Blog post walks through this: https://huggingface.co/blog/AviSoori1x/makemoe-from-scratch
Part #2 detailing expert capacity: https://huggingface… [+2319 chars]
Implementation of a mixture of experts language model in a single PyTorch file
From-scratch implementation of a sparse mixture of experts language model, inspired by Andrej Karpathy's makemore :) - AviSoori1x/makeMoE
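To make "sparse mixture of experts" concrete, here is a minimal sketch of top-k routing for a single token. It is not the repository's code: it uses plain Python lists instead of PyTorch tensors so it runs with no dependencies, and the names `top_k_gate` and `sparse_moe`, the toy experts, and the choice of `k=2` are all assumptions made for illustration.

```python
import math


def top_k_gate(router_logits, k):
    """Keep the k largest router logits and softmax over them only.

    Returns a dict {expert_index: weight}; experts outside the top k are
    absent, which is what makes the mixture sparse. (Hypothetical helper.)
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sparse_moe(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    weights = top_k_gate(router_logits, k)
    out = [0.0] * len(x)
    for i, w in weights.items():
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy experts (assumed): expert s simply scales its input by s.
experts = [lambda v, s=s: [s * e for e in v] for s in (1.0, 2.0, 3.0, 4.0)]
token = [1.0, -1.0]
logits = [1.0, 3.0, 2.0, 0.0]  # in a real model these come from a learned router

gate = top_k_gate(logits, 2)
output = sparse_moe(token, logits, experts, k=2)
```

With these logits only experts 1 and 2 are run, and their gate weights sum to 1. In a real model the router logits are produced per token by a learned linear layer, and the experts are small feed-forward networks rather than scalings.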