Deep Neural Network Tensorflow Model for viral genomes classification

Xarvalus/tensorflow-prediction

Tensorflow Prediction

A collection of scripts for building a deep neural network TensorFlow model that classifies viral genomes.

The scripts cover the whole pipeline: fetching the data, cleaning it and storing the unified dataset in a file-based database, and building an Estimator-based, production-ready classifier.

Model & Motivation

Genomes contain raw biological information about a species, so chains of nucleotides could be a reasonable basis for classification.

Where a human would find such data hard to work with, a machine learning model can use its computational capacity to make predictions from genomes such as DNA/RNA.

After many adjustments to fit the problem within tight limits on resources and complexity (this is just experimentation and personal research), the gathered virus genomes are used to predict one of 7 taxonomy groups (Baltimore classification):

I: dsDNA viruses (e.g. Adenoviruses, Herpesviruses, Poxviruses)
II: ssDNA viruses (+ strand or "sense") DNA (e.g. Parvoviruses)
III: dsRNA viruses (e.g. Reoviruses)
IV: (+)ssRNA viruses (+ strand or sense) RNA (e.g. Picornaviruses, Togaviruses)
V: (−)ssRNA viruses (− strand or antisense) RNA (e.g. Orthomyxoviruses, Rhabdoviruses)
VI: ssRNA-RT viruses (+ strand or sense) RNA with DNA intermediate in life-cycle (e.g. Retroviruses)
VII: dsDNA-RT viruses DNA with RNA intermediate in life-cycle (e.g. Hepadnaviruses)
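The seven groups above can serve as class labels for a classifier. The following is a minimal sketch of one possible label encoding. The names `BALTIMORE_GROUPS`, `GROUP_TO_INDEX` and `one_hot_label` are hypothetical and not taken from the repository.

```python
# Hypothetical label encoding for the seven Baltimore groups.
# The ordering and all names here are assumptions made for illustration.
BALTIMORE_GROUPS = ["I", "II", "III", "IV", "V", "VI", "VII"]

# Map each group numeral to a zero-based class index.
GROUP_TO_INDEX = {group: index for index, group in enumerate(BALTIMORE_GROUPS)}


def one_hot_label(group):
    """Return a one-hot list with a 1 at the position of the given group."""
    index = GROUP_TO_INDEX[group]
    return [1 if i == index else 0 for i in range(len(BALTIMORE_GROUPS))]
```

A classifier that expects integer class ids could use `GROUP_TO_INDEX` directly, while one that expects a vector per sample could use `one_hot_label`.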

This goal was only partially accomplished, with less than moderate results.

Installation

Installs the Python dependencies listed in requirements.txt, which was generated with pip freeze.

make install

Fetch genomes data from NCBI

Fetches virus genomes and metadata into text files, stored in a separate directory for each species.

make fetch_data
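As an illustration of what a fetch step might involve, here is a sketch that builds a request URL for the NCBI E-utilities `efetch` endpoint and parses a FASTA response. It is not the repository's actual script. The helper names `build_efetch_url` and `parse_fasta` are hypothetical, and the choice of the `nuccore` database with FASTA output is an assumption.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for retrieving records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Build an efetch URL requesting one nucleotide record as FASTA text.

    Assumption: genomes are pulled from the "nuccore" database.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the second record
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)
```

The URL could be passed to `urllib.request.urlopen` to download the record. The sketch leaves the network call out so that it can be run offline.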

Store genomes with metadata into database

Processes the files saved by the fetch step and stores them, normalized and unified, in a JSON file (TinyDB).

make store_into_db
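A rough sketch of what normalizing a genome into a unified, JSON-serializable record could look like is shown below. The record fields (`name`, `group`, `length`, `sequence`) and the function `normalize_record` are hypothetical; the repository's real schema may differ. The sketch uses only the standard library.

```python
import json


def normalize_record(name, group, raw_sequence):
    """Return a unified record for one genome.

    Assumed normalization: remove all whitespace and upper-case the bases.
    """
    sequence = "".join(raw_sequence.split()).upper()
    return {
        "name": name.strip(),
        "group": group,
        "length": len(sequence),
        "sequence": sequence,
    }


record = normalize_record(" Example virus ", "IV", "acgu\nacgu")
# Records like this are plain dicts, so they serialize directly to JSON.
# With TinyDB installed, TinyDB("db.json").insert(record) would persist one.
serialized = json.dumps(record)
```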

Train the genomes prediction model

Train the neural network and watch the prediction result 👊

make train
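Before a neural network can train on genomes, the nucleotide strings have to be turned into fixed-length numeric inputs. The sketch below shows one simple way to do that. It is an assumption for illustration, not the feature pipeline this repository uses; `NUCLEOTIDE_CODES` and `encode_sequence` are hypothetical names.

```python
# Hypothetical integer codes; U (RNA) shares a code with T (DNA).
# Code 0 doubles as padding and as the code for unknown bases.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4, "U": 4}


def encode_sequence(sequence, length):
    """Encode a genome as a list of exactly `length` integers.

    Longer sequences are truncated and shorter ones are right-padded with 0.
    """
    codes = [NUCLEOTIDE_CODES.get(base, 0) for base in sequence.upper()[:length]]
    return codes + [0] * (length - len(codes))
```

Truncating to a fixed length keeps memory use bounded, which fits the stated goal of minimal resources, but it discards everything past the cutoff.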

Analyze stored data

Shows basic information about the stored species and their genomes (e.g. count, largest genome, per-group counts).

make analyze_data
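The kind of summary described above could be computed along these lines. The sketch assumes each stored record is a dict with hypothetical `name`, `group` and `sequence` keys; the `summarize` function is likewise an illustration and not the repository's code.

```python
from collections import Counter


def summarize(records):
    """Return the record count, largest genome name and per-group counts."""
    if not records:
        return {"count": 0, "largest": None, "group_counts": {}}
    largest = max(records, key=lambda r: len(r["sequence"]))
    return {
        "count": len(records),
        "largest": largest["name"],
        "group_counts": dict(Counter(r["group"] for r in records)),
    }
```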

Highly influenced by (many thanks 👏)
