TAGLETS stands for "Tasks Algorithmically Given Labels Established via Transferred Symbols." It is a novel framework for creating weak label sources from existing datasets, combining them through a modular architecture, and selecting key examples to label under limited budgets.
This package automatically constructs labeling functions for an image classification problem that does not have enough labeled data. For each amount of labeled data, it calls the appropriate modules, creates weak labelers called taglets, and then combines their outputs to train an end model.
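As a rough illustration of the "combine their outputs" step, the sketch below averages the per-class probabilities produced by several weak labelers and takes the most likely class as a pseudo-label. This is an assumption for illustration only: the function names (`combine_soft_votes`, `hard_labels`) and the unweighted-average rule are hypothetical and are not taken from this package, which may combine taglets differently.

```python
from typing import List, Sequence


def combine_soft_votes(votes: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-class probabilities across weak labelers.

    votes[t][i][c] is the probability that (hypothetical) labeler t
    assigns to class c for unlabeled example i.
    """
    num_labelers = len(votes)
    num_examples = len(votes[0])
    num_classes = len(votes[0][0])
    combined = []
    for i in range(num_examples):
        # Unweighted mean over labelers; an assumption, not the package's rule.
        row = [
            sum(votes[t][i][c] for t in range(num_labelers)) / num_labelers
            for c in range(num_classes)
        ]
        combined.append(row)
    return combined


def hard_labels(soft: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-probability class for each example."""
    return [max(range(len(row)), key=row.__getitem__) for row in soft]


# Two hypothetical labelers, two unlabeled examples, two classes.
labeler_a = [[0.9, 0.1], [0.4, 0.6]]
labeler_b = [[0.7, 0.3], [0.2, 0.8]]
soft = combine_soft_votes([labeler_a, labeler_b])
pseudo_labels = hard_labels(soft)
```

The resulting soft or hard pseudo-labels could then serve as training targets for an end model on the unlabeled images.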
To build the Docker image, run the following in the top-level directory:
docker build --tag brown_taglets:1.0 .
To start a container, run
docker run --rm --env-file env.list -v /lwll:/lwll:delegated --gpus all --shm-size 64G --ulimit nofile=1000000:1000000 brown_taglets:1.0
Note: "--shm-size 64G" and "--ulimit nofile=1000000:1000000" are crucial for our system to work.
To run the system for development without Docker, change to the top-level directory.
If this is the first time you are running the repository's code, run:
bash setup.sh # The first time after you clone the repo
Note that this installs the required Python packages, so you may want to activate a virtual environment before running it.
Next, edit num_processes in the file accelerate_config.yml to match the number of GPUs, or the number of processes if there are no GPUs.
Then, open dev_config.py and set the variables (default values are provided). Finally, double-check that run_jpl.sh uses --mode=dev, and launch the system:
bash run_jpl.sh
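For the num_processes step above, a small helper can suggest a value before you edit accelerate_config.yml. This is only a sketch under stated assumptions: the helper names are hypothetical, it relies on PyTorch's `torch.cuda.device_count()` when PyTorch is importable, and the fallback of one process on a machine without GPUs is an assumed default rather than something this repository prescribes.

```python
def detect_gpu_count() -> int:
    """Return the number of visible CUDA devices, or 0 if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def choose_num_processes(gpu_count: int, cpu_processes: int = 1) -> int:
    """Suggest a num_processes value.

    With GPUs, use one process per GPU. Without GPUs, fall back to
    cpu_processes (assumed default: 1).
    """
    if gpu_count > 0:
        return gpu_count
    return max(1, cpu_processes)


if __name__ == "__main__":
    print(f"Suggested num_processes: {choose_num_processes(detect_gpu_count())}")
```

Copy the printed value into the num_processes field by hand; the sketch does not modify any file.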