-
Clone the repository:
git clone https://github.com/voiceboxneurips/voicebox.git
-
We recommend working from a clean environment, e.g. using conda:
conda create --name voicebox python=3.9
source activate voicebox
-
Install dependencies:
cd voicebox
pip install -r requirements.txt
pip install -e .
-
Grant permissions:
chmod -R u+x scripts/
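As an optional sanity check after the steps above, the following minimal sketch reports whether the active interpreter and conda environment match what the instructions create. The helper name `check_environment` is hypothetical and not part of this repository; it relies on the `CONDA_DEFAULT_ENV` variable that conda exports when an environment is activated.

```python
import os
import sys


def check_environment(version_info=None, environ=None, expected_env="voicebox"):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper, not part of the repository.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    problems = []
    # The install steps create the environment with python=3.9.
    if tuple(version_info[:2]) != (3, 9):
        problems.append(
            f"expected Python 3.9, found {version_info[0]}.{version_info[1]}"
        )
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"expected conda env {expected_env!r}, found {active!r}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

If you named your environment differently, pass that name as `expected_env`.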
To reproduce our results, first download the corresponding data. Note that to download the VoxCeleb1 dataset, you must register and obtain a username and password.
Task | Dataset (Size) | Command |
---|---|---|
Objective evaluation | VoxCeleb1 (39G) | python scripts/downloads/download_voxceleb.py --subset=1 --username=<VGG_USERNAME> --password=<VGG_PASSWORD> |
WER / supplemental evaluations | LibriSpeech train-clean-360 (23G) | ./scripts/downloads/download_librispeech_eval.sh |
Train attacks | LibriSpeech train-clean-100 (11G) | ./scripts/downloads/download_librispeech_train.sh |
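The three datasets in the table add up to 73G, so it may be worth confirming free disk space before starting the downloads. The sketch below is one way to do that. It assumes the listed sizes are final on-disk sizes and treats 1G as 10^9 bytes; archives may need extra temporary space while extracting. The function names are hypothetical.

```python
import shutil

# Sizes in GB, copied from the table above.
DATASET_SIZES_GB = {
    "VoxCeleb1": 39,
    "LibriSpeech train-clean-360": 23,
    "LibriSpeech train-clean-100": 11,
}


def required_gb(datasets=DATASET_SIZES_GB):
    """Total size of the given datasets in GB."""
    return sum(datasets.values())


def has_enough_space(path=".", needed_gb=None, free_bytes=None):
    """Return True if `path` has at least `needed_gb` GB free.

    `free_bytes` can be supplied directly (e.g. for testing); otherwise it is
    read from the filesystem that contains `path`.
    """
    needed_gb = required_gb() if needed_gb is None else needed_gb
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 10**9


if __name__ == "__main__":
    print(f"Need about {required_gb()} GB; enough space: {has_enough_space()}")
```

To check a single dataset, pass its size explicitly, e.g. `has_enough_space(needed_gb=DATASET_SIZES_GB["VoxCeleb1"])`.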
We provide scripts to reproduce our experiments and save results, including generated audio, to named and time-stamped subdirectories within runs/. To reproduce our objective evaluation experiments using pre-trained attacks, run:
python scripts/experiments/evaluate.py
To reproduce our training, run:
python scripts/experiments/train.py
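Because each run writes to its own subdirectory of runs/, it can be handy to locate the most recent one programmatically. The following is a minimal sketch that picks the subdirectory with the latest modification time. It does not parse the directory names, whose exact format is not assumed here, and `latest_run` is a hypothetical helper rather than part of the repository.

```python
from pathlib import Path


def latest_run(runs_dir="runs"):
    """Return the most recently modified subdirectory of `runs_dir`, or None."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    # max() with default=None covers the case of an empty runs/ directory.
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    print(latest_run())
```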
As a proof of concept, we provide a streaming implementation of VoiceBox capable of modifying user audio in real time. Here, we provide installation instructions for macOS and Ubuntu 20.04.
See video below:
-
Open a terminal and follow the installation instructions above. Change directory to the root of this repository.
-
Run the following command:
pacmd load-module module-null-sink sink_name=voicebox sink_properties=device.description=voicebox
If you are using PipeWire instead of PulseAudio:
pactl load-module module-null-sink media.class=Audio/Sink sink_name=voicebox sink_properties=device.description=voicebox
PulseAudio is the default on Ubuntu. If you haven't changed your system defaults, you are probably using PulseAudio. This will add "voicebox" as an output device. Select it as the input to your chosen audio software.
-
Find which audio device to read from and write to. In your conda environment, run:
python -m sounddevice
You will get output similar to this:
  0 HDA Intel HDMI: 0 (hw:0,3), ALSA (0 in, 8 out)
  1 HDA Intel HDMI: 1 (hw:0,7), ALSA (0 in, 8 out)
  2 HDA Intel HDMI: 2 (hw:0,8), ALSA (0 in, 8 out)
  3 HDA Intel HDMI: 3 (hw:0,9), ALSA (0 in, 8 out)
  4 HDA Intel HDMI: 4 (hw:0,10), ALSA (0 in, 8 out)
  5 hdmi, ALSA (0 in, 8 out)
  6 jack, ALSA (2 in, 2 out)
  7 pipewire, ALSA (64 in, 64 out)
  8 pulse, ALSA (32 in, 32 out)
* 9 default, ALSA (32 in, 32 out)
In this example, we are going to route the audio through PipeWire (device 7). We will use this device number as both INPUT_NUM and OUTPUT_NUM.
-
First, we need to create a conditioning embedding. To do this, run the enrollment script and follow its on-screen instructions:
python scripts/streamer/enroll.py --input INPUT_NUM
-
We can now use the streamer. Run:
python scripts/stream.py --input INPUT_NUM --output OUTPUT_NUM
-
Once the streamer is running, open pavucontrol.
a. In pavucontrol, go to the "Playback" tab and find "ALSA plug-in [python3.9]: ALSA Playback on". Set the output to "voicebox".
b. Then, go to "Recording" and find "ALSA plug-in [python3.9]: ALSA Playback from", and set the input to your desired microphone device.
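The device lookup in the steps above (reading the number off the `python -m sounddevice` listing) can also be scripted. The sketch below searches a device list by name and returns the matching index. `find_device` is a hypothetical helper, not part of this repository; it expects entries shaped like those returned by `sounddevice.query_devices()`, i.e. dicts with `name`, `max_input_channels`, and `max_output_channels` keys.

```python
def find_device(devices, name, need_input=True, need_output=True):
    """Return the index of the first device whose name contains `name`.

    `devices` is an ordered iterable of dicts, such as the result of
    sounddevice.query_devices(); the position in the list is the device number.
    """
    for index, dev in enumerate(devices):
        if name.lower() not in dev["name"].lower():
            continue
        if need_input and dev["max_input_channels"] < 1:
            continue
        if need_output and dev["max_output_channels"] < 1:
            continue
        return index
    raise LookupError(f"no matching device named {name!r}")


# Usage inside the conda environment (requires the sounddevice package):
#   import sounddevice as sd
#   device_num = find_device(sd.query_devices(), "pipewire")
```

In the example listing above, searching for "pipewire" would return 7, which can then be passed as both INPUT_NUM and OUTPUT_NUM.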
If you use this in your academic research, please cite the following:
@inproceedings{authors2022voicelock,
title={VoiceBlock: Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models},
author={Patrick O'Reilly and Andreas Bugler and Keshav Bhandari and Max Morrison and Bryan Pardo},
booktitle={Neural Information Processing Systems},
month={November},
year={2022}
}