This pipeline is a comprehensive tool for whole genome pairwise alignment using lastal
. It is designed to facilitate alignment, chaining, and netting of genomic data. The primary objective is to provide an efficient and user-friendly approach to comparative genomics, specifically for aligning entire genomes.
- Nextflow: Workflow management that allows the pipeline to be run across multiple compute infrastructures in a portable manner.
- R: The pipeline requires R to be installed, along with the
CNEr
package, which is used for handling genomic alignment data. - lastal: A tool used for the alignment step of the pipeline.
- kentUtils: A collection of command-line tools for manipulating, analyzing, and visualizing genomic datasets.
To install the pipeline, clone the repository from GitHub:
git clone https://github.com/da-bar/whole_genome_pairwise_alignment.git
To run the pipeline, use the following command:
nextflow run main.nf -c nextflow.config
Configure the pipeline by editing the nextflow.config
file. A typical configuration might look like this:
params {
reference = '/path/to/reference/reference.fasta'
query = '/path/to/query/query.fasta'
output = '/path/to/output_folder'
}
Details about the parameters used in the pipeline will be provided (TBA).
- Aligning with lastal: Performs whole genome alignments using the
lastal
tool. - Chaining the Alignment: Chains the alignment data for better genomic alignment interpretation.
- Netting the Chains: Applies a netting process on the chained data to filter and refine alignments.
- Export in axt Format: The final alignments are exported in the
axt
file format for further analysis.
The pipeline produces several outputs, including:
- Alignment in MAF (Multiple Alignment Format) file.
- Alignment in PSL (Pairwise Sequence Alignment) format.
- Chained alignment files.
- Filtered chains and syntenic nets.
- Final netted alignment in AXT format.
An example Nextflow config, along with test genomes of S. cerevisiae and S. eubayanus, will be included in the GitHub repository.
Contributions to the pipeline are welcome. Please raise an issue or a pull request on the GitHub repository. For direct communication, contact the author via email.
This software is available under the MIT license. Please refer to the LICENSE file in the repository for more details.
If you use this pipeline in your research, please cite the CNEr paper: