
mixture-of-experts

A simple design for MoE on MNIST data

MIXTURE OF EXPERTS on MNIST DATA
=======================================

Mixture of Experts Design Architecture

Mixture of Experts Architecture

This is a minimalistic project to understand how MoE architectures work. Training and evaluation are done on MNIST data. Using this code, you can take a good look at how things run under the hood.
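
For orientation, here is a minimal sketch of how a mixture-of-experts layer can be wired up in PyTorch: a gating network produces softmax weights over a set of experts, and the output is the gate-weighted sum of the expert outputs. This is only an illustrative sketch assuming a PyTorch-style setup with MLP experts; the class and argument names are not taken from this repository.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleMoE(nn.Module):
        """Illustrative mixture-of-experts layer: a gate plus a weighted sum of experts."""

        def __init__(self, in_dim=784, hidden_dim=128, out_dim=10, num_experts=10):
            super().__init__()
            # Each expert here is a small MLP; the repository appears to use CNN
            # experts (the environment is named moe_cnn), but an MLP keeps this short.
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                              nn.Linear(hidden_dim, out_dim))
                for _ in range(num_experts)
            ])
            # Gating network: maps each input to a probability over experts.
            self.gate = nn.Linear(in_dim, num_experts)

        def forward(self, x):
            x = x.flatten(1)                                   # (B, 784) for MNIST
            gate_probs = F.softmax(self.gate(x), dim=-1)       # (B, num_experts)
            expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
            # Combine the expert predictions, weighted by the gate.
            out = (gate_probs.unsqueeze(-1) * expert_out).sum(dim=1)       # (B, C)
            return out, gate_probs

    # Quick shape check on a fake MNIST batch.
    if __name__ == "__main__":
        model = SimpleMoE()
        logits, gates = model(torch.randn(32, 1, 28, 28))
        print(logits.shape, gates.shape)  # torch.Size([32, 10]) torch.Size([32, 10])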

Key Features

  • Simple Python code to run.
  • Just run one file and you get outputs for:
    • models: trained models
    • plots: MoE activation outputs per epoch and batch for a deeper understanding
    • csv: MoE activations as a CSV output (see the logging sketch after the activation figures below)

Activation of MoE

Gating Network Activation
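
The activation plots and CSV export listed under Key Features can be reproduced with a small amount of logging around the gate output. The sketch below is only an illustration of the idea; the file name, column layout, and the log_gate_activations helper are assumptions, not the repository's actual code.

    import csv
    import torch

    def log_gate_activations(gate_probs: torch.Tensor, epoch: int, batch: int,
                             path: str = "gate_activations.csv") -> None:
        """Append the batch-averaged gate activations to a CSV file.

        gate_probs: (batch_size, num_experts) softmax output of the gating network.
        """
        mean_activation = gate_probs.detach().mean(dim=0)  # average over the batch
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([epoch, batch] + [f"{v:.4f}" for v in mean_activation.tolist()])

    # Example usage inside a training loop, after the forward pass:
    # logits, gate_probs = model(images)
    # log_gate_activations(gate_probs, epoch, batch_idx)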

Setup Instructions

  1. Install Anaconda (Recommended):
    Anaconda Installation Guide

  2. Create and Activate the Environment:

    conda create -n moe_cnn python=3.10.15
    conda activate moe_cnn
    pip install -r requirements.txt
    

Running the Training

  • Train and evaluate the MoE on MNIST:

    python main.py
    

MIXTURE OF GUIDED EXPERTS on MNIST DATA
=======================================

Similarly, we have a 'guided' version of MoE. In this version, we use the labels to guide which expert handles which data, enforcing each expert to develop expertise only for particular labels.

Flow Diagram of Guided MoEs

Labels and features flow into the guided MoE, and we calculate 4 different losses, all of which are weighted.
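
As a rough sketch of how such a weighted multi-loss objective can be combined: the four loss names below (classification, guidance, load balancing, entropy) and their weights are assumptions chosen for illustration, not necessarily the four losses defined in this repository.

    import torch
    import torch.nn.functional as F

    def guided_moe_loss(logits, gate_probs, labels, num_experts=10,
                        w_cls=1.0, w_guide=0.5, w_balance=0.1, w_entropy=0.01):
        """Weighted sum of four illustrative losses for a guided MoE.

        Assumes one expert per MNIST digit (label i -> expert i); the actual
        losses and weights used in the repository may differ.
        """
        # 1) Standard classification loss on the combined output.
        cls_loss = F.cross_entropy(logits, labels)

        # 2) Guidance loss: push the gate to route label i to expert i.
        target_expert = labels % num_experts
        guide_loss = F.nll_loss(torch.log(gate_probs + 1e-8), target_expert)

        # 3) Load-balancing loss: keep average expert usage close to uniform.
        usage = gate_probs.mean(dim=0)
        balance_loss = ((usage - 1.0 / num_experts) ** 2).sum()

        # 4) Entropy loss: encourage confident (low-entropy) per-sample routing.
        entropy = -(gate_probs * torch.log(gate_probs + 1e-8)).sum(dim=1).mean()

        return (w_cls * cls_loss + w_guide * guide_loss
                + w_balance * balance_loss + w_entropy * entropy)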

Training flow diagram

Inference flow diagram

Key Features

  • Simple Python code to run.
  • Just run one file and you get outputs for:
    • models: trained models
    • plots: MoE activation outputs per epoch and batch for a deeper understanding
    • csv: MoE activations as a CSV output

Results with a Mixture of 10 Experts

Here are the losses and accuracies after 5 epochs.

5 Epochs results

Assignment heatmaps

How can we verify that the guided MoEs are actually working? We can visualize the expert assignments during training and testing. Below are the two assignment maps; a sketch of how such a map can be built follows them.

Base Mixture of Experts

Base MoE assignment Map

Guided Mixture of Experts

Guided MoE assignment Map

From the maps it is pretty evident that the guided MoE has better assignments.
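
For reference, one way to build such an assignment map is to count, for every sample, which expert the gate assigns it to, bucketed by true label, and render the counts as a heatmap. The sketch below illustrates the idea; the helper names and the use of matplotlib are assumptions, and the repository's plotting code may differ.

    import numpy as np
    import matplotlib.pyplot as plt

    def update_assignment_counts(counts, gate_probs, labels):
        """counts: (num_classes, num_experts) array of hard-assignment tallies."""
        assigned_expert = gate_probs.argmax(dim=1)  # hard assignment per sample
        for label, expert in zip(labels.tolist(), assigned_expert.tolist()):
            counts[label, expert] += 1
        return counts

    def plot_assignment_heatmap(counts, path="assignment_heatmap.png"):
        # Normalize each row so every class sums to 1.
        norm = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
        plt.imshow(norm, cmap="viridis")
        plt.xlabel("Expert")
        plt.ylabel("Class label")
        plt.colorbar(label="Fraction of samples routed")
        plt.savefig(path, bbox_inches="tight")
        plt.close()

    # Example usage during evaluation:
    # counts = np.zeros((10, 10))
    # for images, labels in test_loader:
    #     _, gate_probs = model(images)
    #     counts = update_assignment_counts(counts, gate_probs, labels)
    # plot_assignment_heatmap(counts)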
