Tiny is a terminal-based bioinformatics tool for DNA sequence analysis. It analyzes and compares DNA sequences and finds patterns in them, and it works with sequences from any organism, whether bacterial, fungal, viral, plant, or animal.
- Sequence validation with IUPAC ambiguous base support
- GC content calculation (handles ambiguous bases)
- Molecular weight calculation
- Base composition analysis
- Complement and reverse complement sequences
- Pairwise sequence alignment
- Global alignment (Needleman-Wunsch algorithm)
- Local alignment (Smith-Waterman algorithm)
- Semi-global alignment
- Mutation detection
- Sequence identity calculation
- Gap analysis
- Variable-length motif detection
- Frequency analysis
- Position tracking
- Consensus score calculation
- Custom minimum frequency thresholds
- TATA box detection
- GC box detection
- CAAT box detection
- Palindromic sequence identification
- Position information for all elements
- Comprehensive feature overview for GenBank files
- Feature type filtering and counting
- Customizable feature display limits
- Detailed qualifier information
- JSON export for complete feature data
- FASTA (.fa, .fasta)
- FASTQ (.fq, .fastq)
- GenBank (.gb, .gbk, .genbank)
- EMBL (.embl)
- JSON output format
- Progress bars for long operations
- Color-coded output
- Formatted tables
- Summary statistics
- Clear section separators
- --feature-limit: Control number of features displayed (0 for all)
- --feature-type: Filter specific feature types (CDS, gene, tRNA, etc.)
- --save-features: Export complete feature data to JSON
- --format-info: Show detailed format-specific information
- Python 3.9 or higher
- Poetry (Python package manager)
- To install Poetry on Arch Linux, run:
sudo pacman -S python-poetry
Alternatively, use the official installer:
curl -sSL https://install.python-poetry.org | python3 -
Or see the official Poetry documentation:
https://python-poetry.org/docs/
- Clone the repository:
git clone https://github.com/Bjorn99/Tiny.git
cd Tiny
- Install dependencies using Poetry:
poetry install
- Activate the virtual environment:
poetry shell
For a comprehensive list of examples and use cases, see Examples.md.
# Analyze single or multiple sequences
tiny analyze ATCG GCTA
# Analyze sequences from files
tiny analyze --input sequence.fasta
tiny analyze --input sequence.gb --format-info
# Control feature display
tiny analyze --input sequence.gb --format-info --feature-limit 10
tiny analyze --input sequence.gb --format-info --feature-type CDS
tiny analyze --input sequence.gb --format-info --save-features
# Save analysis results to a file
tiny analyze ATCG GCTA --output results.json
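To make the basic statistics reported by analyze more concrete, here is a minimal Python sketch of validation, GC content, and reverse complement with IUPAC ambiguous bases. It is not Tiny's actual implementation: the function names are hypothetical, and weighting each ambiguous base by the fraction of G/C among the bases it can stand for is an assumed convention that Tiny may not share.

```python
# Illustrative sketch only; names and the weighting rule are assumptions,
# not taken from Tiny's source code.

# Each IUPAC code mapped to the concrete bases it can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table covering the ambiguous codes as well (e.g. R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def is_valid(seq):
    """Return True if seq is non-empty and uses only IUPAC DNA codes."""
    return bool(seq) and all(base in IUPAC for base in seq.upper())


def gc_content(seq):
    """GC percentage; an ambiguous base counts as its G/C fraction."""
    seq = seq.upper()
    gc = sum(
        sum(b in "GC" for b in IUPAC[base]) / len(IUPAC[base])
        for base in seq
    )
    return 100 * gc / len(seq)


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(is_valid("ATCG"), gc_content("ATCG"), reverse_complement("ATCG"))
```

Under this convention N contributes 0.5 to the GC count and S contributes 1.0, so a sequence made only of N reports 50% GC.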
# Global alignment
tiny align ATCGATCG ATCTATCG --mode global
# Local alignment
tiny align ATCGATCG ATCTATCG --mode local
# Semi-global alignment
tiny align ATCGATCG ATCTATCG --mode semi-global
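The difference between the global and local modes can be sketched as two small dynamic-programming routines. This is a hedged illustration rather than Tiny's code: the function names are hypothetical, the scoring values (match +1, mismatch -1, gap -1) are assumptions, and only the score is computed, not the aligned strings. Semi-global alignment follows the same recurrence as the global case but does not penalize gaps at the sequence ends; it is not shown here.

```python
# Illustrative sketch; scoring parameters are assumed, not Tiny's defaults.

def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score, keeping only two rows of the DP matrix."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: all gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: all gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: cells are floored at 0, best cell wins."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(global_score("ATCGATCG", "ATCTATCG"))  # 6: seven matches, one mismatch
print(local_score("ATCGATCG", "ATCTATCG"))   # 6
```

Keeping two rows is enough when only the score is needed. Recovering the alignment itself requires the full matrix or a traceback structure, which is where the memory cost of pairwise alignment comes from.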
# Find motifs of length 4 that appear at least twice
tiny find-motifs ATCGATCG ATCTATCG ATCGAGCG --length 4 --min-freq 2
# Find motifs in sequences from a file
tiny find-motifs --fasta sequences.fasta --length 6 --min-freq 3
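One simple way to read the --length and --min-freq options is as exact k-mer counting with position tracking. The sketch below illustrates that reading only. The function name is hypothetical, and counting every occurrence across all input sequences (rather than the number of sequences containing the motif) is an assumption; Tiny's actual method may differ.

```python
# Illustrative sketch; not Tiny's implementation.
from collections import defaultdict


def find_motifs(sequences, length, min_freq):
    """Map each k-mer seen at least min_freq times to its positions.

    Positions are (sequence_index, start_offset) pairs, both 0-based.
    """
    positions = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        seq = seq.upper()
        # Slide a window of the requested length over the sequence.
        for start in range(len(seq) - length + 1):
            positions[seq[start:start + length]].append((seq_index, start))
    return {
        motif: hits
        for motif, hits in positions.items()
        if len(hits) >= min_freq
    }


motifs = find_motifs(["ATCGATCG", "ATCTATCG", "ATCGAGCG"], length=4, min_freq=2)
for motif, hits in motifs.items():
    print(motif, len(hits), hits)
```

For the three sequences from the first command above, this counting rule keeps ATCG (four occurrences) and TCGA (two occurrences).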
# Find regulatory elements in a sequence
tiny find-regulatory TATAAAAGGCGGGCCAATATCGATCG
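As a rough illustration of what regulatory element detection involves, the sketch below scans for exact consensus strings and for reverse-complement palindromes. Everything here is an assumption for illustration: the function names are hypothetical, and the patterns (TATAAA, GGGCGG, CCAAT) are common textbook consensus strings, not patterns confirmed to be the ones Tiny uses.

```python
# Illustrative sketch; patterns and names are assumptions, not Tiny's own.
import re

PATTERNS = {"TATA box": "TATAAA", "GC box": "GGGCGG", "CAAT box": "CCAAT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_elements(seq, patterns=PATTERNS):
    """Return the 0-based start positions of each pattern in seq."""
    seq = seq.upper()
    hits = {}
    for name, pattern in patterns.items():
        # A lookahead reports overlapping occurrences too.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


def find_palindromes(seq, min_len=4, max_len=8):
    """Return (start, window) for windows equal to their reverse complement."""
    seq = seq.upper()
    found = []
    # Such palindromes always have even length, so skip odd sizes.
    first = min_len + (min_len % 2)
    for size in range(first, max_len + 1, 2):
        for start in range(len(seq) - size + 1):
            window = seq[start:start + size]
            if window == window.translate(COMPLEMENT)[::-1]:
                found.append((start, window))
    return found


print(find_elements("TATAAAAGGCGGGCCAATATCGATCG"))
print(find_palindromes("GAATTC", min_len=4, max_len=6))
```

With these strict patterns, the example sequence above yields a TATA box at position 0 and a CAAT box at position 13 but no exact GGGCGG match. A tool that reports a GC box for it is presumably using a more permissive pattern.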
- Performance Limitations
  - Not optimized for very long sequences (>10,000 bp)
  - Memory usage increases significantly with sequence length in pairwise alignments
  - Motif finding can be computationally intensive for long sequences
- Input Capabilities
  - Handles DNA sequences with IUPAC ambiguous bases
  - Supports multiple file formats (FASTA, FASTQ, GenBank, EMBL)
  - Maximum recommended sequence length: 10,000 bp
- Analysis Limitations
  - No support for multiple sequence alignment
  - No secondary structure prediction
  - No phylogenetic analysis capabilities
  - No support for genome-scale analyses
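The memory growth noted under Performance Limitations follows from the size of the alignment matrix: a full dynamic-programming table for sequences of lengths n and m has (n + 1) x (m + 1) cells. The short calculation below makes that concrete. The helper names are hypothetical, and 8 bytes per cell is an assumed figure; the real footprint depends on how the alignment is implemented.

```python
# Back-of-the-envelope estimate; 8 bytes per cell is an assumption.

def dp_matrix_cells(len_a, len_b):
    """Number of cells in a full pairwise-alignment DP matrix."""
    return (len_a + 1) * (len_b + 1)


def dp_matrix_bytes(len_a, len_b, bytes_per_cell=8):
    """Estimated matrix size in bytes for the assumed cell width."""
    return dp_matrix_cells(len_a, len_b) * bytes_per_cell


for n in (100, 1_000, 10_000):
    mib = dp_matrix_bytes(n, n) / 2**20
    print(f"{n:>6} bp x {n:>6} bp: {dp_matrix_cells(n, n):>11,} cells, about {mib:,.1f} MiB")
```

Two 10,000 bp sequences give 100,020,001 cells, or about 763 MiB under this assumption. The quadratic growth is consistent with the 10,000 bp recommendation.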
- Validate your input sequences before analysis
- Use appropriate alignment modes based on your sequences
- Consider sequence length limitations (max 10,000 bp)
- Request format-specific information with the --format-info flag
- Save results to files for later analysis
- Use file input for multiple sequence analysis
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the GPL; see the LICENSE file for details.
- Built with BioPython
- Project and dependency management with Poetry
- Command-line interface powered by Typer
- Terminal formatting by Rich
Tiny is under active development. Future planned features include:
- Support for RNA sequences
- Multiple sequence alignment
- Phylogenetic analysis
- Secondary structure prediction
- Support for additional file formats
- Performance optimizations for longer sequences
- Advanced statistical analysis
- Integration with external databases
If you encounter any issues or have questions, please:
- Check the existing issues on GitHub
- Create a new issue if your problem isn't already reported
- Provide as much detail as possible about your problem
This tool implements methods and algorithms from various scientific publications. For a complete list of references, see REFERENCES.md. Key references include:
- Needleman-Wunsch algorithm: Needleman & Wunsch (1970), Journal of Molecular Biology
- Smith-Waterman algorithm: Smith & Waterman (1981), Journal of Molecular Biology
- IUPAC ambiguous base notation: Cornish-Bowden (1985), Nucleic Acids Research
- Motif finding methods: Bailey & Elkan (1994), ISMB Proceedings
- Regulatory element analysis: Bucher (1990), Journal of Molecular Biology
The tool is built using BioPython (Cock et al., 2009) and other open-source libraries. For implementation details and additional references, please refer to the full references list.