=======================================
This is a minimalistic project for understanding how MoE (Mixture of Experts) architectures work. Training and evaluation are done on MNIST data, so you can get a good look at how things run under the hood.

- Simple Python code to run.
- Just run one file and you get outputs for:
  - `models`: trained models
  - `plots`: MoE activation outputs per epoch and batch number, for a deeper understanding
  - `csv`: MoE activations as a CSV output
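To make "under the hood" concrete, here is a minimal sketch of a soft MoE layer for MNIST. This is an illustrative assumption of the architecture, not the actual code in this repo; all names (`SimpleMoE`, dimensions, expert count) are hypothetical:

```python
# Minimal MoE sketch (hypothetical; not the repo's actual implementation):
# a gating network produces per-sample weights over a set of small experts.
import torch
import torch.nn as nn


class SimpleMoE(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=64, out_dim=10, num_experts=4):
        super().__init__()
        # Each expert is a small MLP; all experts see the same input.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, out_dim))
            for _ in range(num_experts)
        )
        # The gate outputs one logit per expert for each sample.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        gate_probs = torch.softmax(self.gate(x), dim=-1)                # (B, E)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
        # Weighted sum of expert outputs; gate_probs are the "activations"
        # that the plots/ and csv/ outputs track per epoch and batch.
        return (gate_probs.unsqueeze(-1) * expert_outs).sum(dim=1), gate_probs


x = torch.randn(32, 784)  # a batch of flattened 28x28 MNIST images
logits, activations = SimpleMoE()(x)
print(logits.shape, activations.shape)  # torch.Size([32, 10]) torch.Size([32, 4])
```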
- Install Anaconda (recommended): see the Anaconda Installation Guide.
- Create and activate the environment:

```bash
conda create -n moe_cnn python=3.10.15
conda activate moe_cnn
pip install -r requirements.txt
```
- Train and evaluate on MNIST:

```bash
python main.py
```
=======================================
Similarly, we have a 'guided' version of MoE. In this version we use labels to guide the routing, enforcing each expert to develop expertise only on particular labels. Labels and features flow into the Guided MoE, and we calculate 4 different losses, all of which are weighted (a sketch of the weighted combination follows below).
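The section above does not spell out what the four losses are, so the sketch below is a hedged guess at what such a weighted combination could look like. The loss terms (task, guidance, load-balancing, gating entropy), their weights, and all names are hypothetical assumptions, not the repo's actual code:

```python
# Hypothetical sketch of combining four weighted losses for a Guided MoE.
# The individual terms and weights are illustrative assumptions only.
import torch
import torch.nn.functional as F


def guided_moe_loss(logits, gate_probs, labels, expert_of_label,
                    weights=(1.0, 0.5, 0.1, 0.01)):
    w1, w2, w3, w4 = weights
    # 1. Standard classification loss on the mixture output.
    task_loss = F.cross_entropy(logits, labels)
    # 2. Guidance loss: push the gate toward the expert assigned to each label
    #    (expert_of_label is a label -> expert-index lookup tensor).
    target_experts = expert_of_label[labels]
    guide_loss = F.nll_loss(torch.log(gate_probs + 1e-9), target_experts)
    # 3. Load-balancing loss: keep average expert usage close to uniform.
    usage = gate_probs.mean(dim=0)
    balance_loss = ((usage - 1.0 / usage.numel()) ** 2).sum()
    # 4. Entropy loss: encourage confident (low-entropy) per-sample gating.
    entropy_loss = -(gate_probs * torch.log(gate_probs + 1e-9)).sum(dim=1).mean()
    return w1 * task_loss + w2 * guide_loss + w3 * balance_loss + w4 * entropy_loss
```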
- Simple Python code to run.
- Just run one file and you get outputs for:
  - `models`: trained models
  - `plots`: MoE activation outputs per epoch and batch number, for a deeper understanding
  - `csv`: MoE activations as a CSV output
How can we verify that the guided MoE is actually working? We can visualize the expert assignments during training and testing. Below are the two assignment maps.

From the maps it is pretty evident that the guided MoE has better assignments.
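If you want to reproduce such an assignment map yourself, here is a minimal sketch. It assumes you have collected gate activations and labels during evaluation; the function and variable names are hypothetical, not taken from this repo:

```python
# Hypothetical sketch: heatmap of how often each MNIST label is routed to
# each expert, built from collected gate activations. Names are assumptions.
import numpy as np
import matplotlib.pyplot as plt


def plot_assignment_map(gate_probs, labels, num_classes=10):
    num_experts = gate_probs.shape[1]
    counts = np.zeros((num_classes, num_experts))
    winners = gate_probs.argmax(axis=1)  # hard assignment per sample
    for label, expert in zip(labels, winners):
        counts[label, expert] += 1
    # Normalize each row to show the assignment distribution per label.
    counts /= counts.sum(axis=1, keepdims=True) + 1e-9
    plt.imshow(counts, cmap="viridis")
    plt.xlabel("Expert")
    plt.ylabel("MNIST label")
    plt.colorbar(label="Fraction of samples")
    plt.title("Expert assignment map")
    plt.show()
```

A guided MoE should produce a near block-diagonal map (each label concentrated on its assigned expert), while an unguided one tends to spread labels across experts.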