Optimize compression parameters of JARVIS3

This repository provides optimization algorithms applied to the JARVIS3 compressor.


  1. Clone this project and change script permissions:
git clone
cd OptimJV3/scripts
chmod +x *.sh

Quick Demo:

./                           # map each sequence to its DS, sorted by size; view sequence info
./ -s cy -ga "output" -lg 100 -t 10  # optimize compression of CY.seq (human chromosome Y) with the canonical GA for 100 generations, using 10 threads

Advanced Setup:


Alternatively, the setup can be done step by step as follows:

./      # install the listed compressors, GTO, and AlcoR
./     # download FASTA files
./        # gunzip cassava files
./     # simulate and store 2 synthetic FASTA sequences
./         # clean FASTA files and store raw sequence files
./ # download raw sequences from a balanced sequence corpus
./         # map sequences to their IDs, sorted by size; view sequence info

Then, if necessary, update the path names and file names in config.json.

View Downloaded Sequences:

View information about the stored sequences:

./ -v


Each of the following scripts lists its implemented features when run with -h:

./ -h            # main script features
./ -h              # GA features
./ -h  # initialization features
./ -h             # ...
./ -h      
./ -h
./ -h        # crossover and mutation features

Optimization examples:

To emulate random search, the following instruction may be executed (assuming cy is the sequence filename):

# GA applied to optimization of human chromosome Y compression
# -s: sequence filename (without extension)
# -ga: name of folder where GA results are stored
# -lg: last generation number
# -t: number of threads used to parallelize the execution of JARVIS3 solutions
./ -s cy -ga "randomSearch" -lg 1 -t 10
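For readers unfamiliar with the term, the following is a minimal sketch of random search itself: sample candidates at random, evaluate each one, and keep the best. It does not call the repository's scripts. `toy_cost`, `random_search`, the parameter range, and the optimum of 42 are all hypothetical stand-ins for a real compression run.

```shell
#!/usr/bin/env bash
# Toy random search: sample N candidates and keep the one with the lowest cost.
# toy_cost is a hypothetical stand-in for "compressed size"; it is not JARVIS3.

toy_cost() {
  # Cost is the distance of the candidate from 42 (an arbitrary optimum).
  local x=$1
  local d=$(( x - 42 ))
  echo $(( d < 0 ? -d : d ))
}

random_search() {
  # $1: number of candidates to sample
  # $2: optional seed, to make a run repeatable
  local n=$1 best_x="" best_cost="" x cost i=0
  if [ -n "${2:-}" ]; then
    RANDOM=$2
  fi
  while [ "$i" -lt "$n" ]; do
    x=$(( RANDOM % 100 ))          # candidate parameter in [0, 99]
    cost=$(toy_cost "$x")
    if [ -z "$best_cost" ] || [ "$cost" -lt "$best_cost" ]; then
      best_x=$x
      best_cost=$cost
    fi
    i=$(( i + 1 ))
  done
  echo "$best_x $best_cost"        # best candidate and its cost
}

random_search 20 1
```

Unlike a GA, nothing learned from one candidate influences the next; each sample is drawn independently.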

To run a single GA, the following instruction may be executed (assuming cy is the sequence filename):

# GA applied to optimization of human chromosome Y compression
# -s: sequence filename (without extension)
# -ga: name of folder where GA results are stored
# -lg: last generation number
# -t: number of threads used to parallelize the execution of JARVIS3 solutions
./ -s cy -ga "example" -lg 100 -t 10

Alternatively, a set of pre-configured GAs can be executed as follows (assuming cy is the sequence filename):

# GA applied to optimization of human chromosome Y compression
# -s: sequence filename (without extension)
# -lg: last generation number
# -t: number of threads used to parallelize the execution of JARVIS3 solutions
./ -s cy -lg 100 -t 10


Note that the algorithm validates each solution by comparing the memory it uses with the memory available, to avoid overusing memory resources. Since available memory can differ from one run to the next, there is no guarantee that all results will be identical.
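To illustrate why a memory-based check makes results depend on the machine state, here is a minimal sketch of such a check. It is not the repository's implementation: `available_kb`, `fits_in_memory`, the use of `MemAvailable` from `/proc/meminfo` (Linux only), and the 1 GiB figure are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical memory check; function names and the /proc/meminfo source
# are assumptions, not the repository's code.

available_kb() {
  # Print MemAvailable in kB on Linux; print nothing on other systems.
  [ -r /proc/meminfo ] || return 0
  awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

fits_in_memory() {
  # $1: estimated memory needed by a candidate solution (kB)
  # $2: memory currently available (kB)
  # Returns 0 (accept) if the candidate fits, 1 (reject) otherwise.
  local needed_kb=$1 avail_kb=$2
  [ "$needed_kb" -le "$avail_kb" ]
}

avail=$(available_kb)
if [ -n "$avail" ] && fits_in_memory 1048576 "$avail"; then
  echo "a 1 GiB candidate would be accepted"
else
  echo "a 1 GiB candidate would be rejected (or MemAvailable is unknown)"
fi
```

Under a check of this kind, the same candidate may be accepted in one run and rejected in another, depending on what else is using memory at the time.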

To reproduce the metameric CGA's results for Escherichia coli (100 generations), CY (100 generations), and Cassava (20 generations), run the following:

bash -x ./ -s "escherichia_coli" -ga "e0_ga1_lr0_cmga" -lr 0 -lg 100 1> out 2> err &
bash -x ./ -s cy -ga "e0_ga1_lr0_cmga" -lr 0 -lg 100 1> out 2> err &
bash -x ./ -s cassava -ga "e0_ga1_lr0_cmga" -lr 0 -lg 20 1> out 2> err &
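The three commands above redirect to the same `out` and `err` files, so if they are started together their logs can overwrite one another. One option is to give each run its own pair of log files. The sketch below uses a hypothetical helper, `run_logged`, which is not part of the repository, and demonstrates it with a stand-in command rather than the actual scripts.

```shell
#!/usr/bin/env bash
# Run a command in the background with its own log files.
# run_logged is a hypothetical helper, not part of the repository.

run_logged() {
  # $1: label used to name the log files; remaining arguments: command to run.
  local label=$1
  shift
  "$@" 1> "${label}.out" 2> "${label}.err" &
}

# Self-contained demonstration with a stand-in command:
log_dir=$(mktemp -d)
run_logged "$log_dir/demo" sh -c 'echo "to stdout"; echo "to stderr" >&2'
wait    # block until all background jobs have finished
cat "$log_dir/demo.out"
```

To apply this to the runs above, pass a label such as the sequence name, followed by the corresponding command from the list, in place of the stand-in.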

To reproduce the results for CY, execute the following:

bash -x ./ -s cy -lg 100 -t 10 1> out 2> err &

To reproduce the human genome sampling results, execute the instructions written in the following script:
