Server side: Making queries and debugging

The manage.py script provides a shell mode, which is the recommended tool for interactive scripting on the server, e.g. for cherry-picking specific information. The shell mode uses IPython if it is installed. The following steps describe a typical use of this tool with the framework. As a running example, let's find all proteins associated with acetylation sites.

Working remotely

Check for already running sessions:

screen -ls

Create a new (named) screen session:

screen -S my_session

Reconnect to it if the connection drops:

screen -x my_session

Running shell mode

First, activate your virtual environment (as described in the deployment instructions) and enter the website directory.

# might be `source virtual_environment_3.7.3_v2019/bin/activate`
source virtual_environment/bin/activate
cd website

(Optional) If you did not install IPython earlier, run: pip3 install ipython

Afterwards, just type ./manage.py shell and you are ready to go.

All models are accessible under the models module namespace. To avoid typing the models. prefix before each model, you can import the models you want to work with directly. If you have not yet decided which models you need, you can import everything with: from models import *

Here we import models.Site directly:

from models import Site

Making queries & getting the data

Now we can run a query using the SQLAlchemy query system:

from sqlalchemy import and_
sites = Site.query.filter(
    and_(Site.kinases.any(), Site.type.contains('acetylation'))
).all()

Let's check how many sites we have:

print(len(sites))

For example, if the result is 324, there are 324 acetylation sites with at least one kinase associated.

Importantly, you can inspect the actual SQL query before executing it: leave out .all() (or a similar method) and print the query object instead. You can also write plain SQL queries using SQLAlchemy's helper functions, either as textual queries or by telling an engine to execute SQL statements.
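The two techniques can be sketched on a throwaway in-memory SQLite database. This is a minimal illustration that assumes SQLAlchemy 1.4 or newer. DemoSite and the demo_site table are hypothetical stand-ins, not the project's models.Site. In the shell mode itself, printing Site.query.filter(...) without .all() should show the SQL in the same way.

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DemoSite(Base):
    """Hypothetical stand-in for models.Site, with only two columns."""
    __tablename__ = 'demo_site'
    id = Column(Integer, primary_key=True)
    type = Column(String)


# In-memory SQLite database, so nothing on the server is touched.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        DemoSite(type='acetylation'),
        DemoSite(type='phosphorylation'),
    ])
    session.commit()

    # Without .all() the query is not executed; str() renders its SQL.
    query = session.query(DemoSite).filter(DemoSite.type.contains('acetylation'))
    sql = str(query)
    print(sql)

    # The same question asked as a textual query with a bound parameter.
    statement = text('SELECT id, type FROM demo_site WHERE type LIKE :pattern')
    rows = session.execute(statement, {'pattern': '%acetylation%'}).all()
    print(rows)
```

Binding the pattern as a parameter (instead of formatting it into the SQL string) keeps the textual query safe from quoting problems.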

Post-processing

Now you can work with the retrieved objects as with ordinary Python class instances.

proteins_associated_with_acetylation_sites = set()
for site in sites:
    for kinase in site.kinases:
        if kinase.protein:
            proteins_associated_with_acetylation_sites.add(kinase.protein)

print(proteins_associated_with_acetylation_sites)

In this example, the result contains 27 proteins associated with acetylation sites:

{<Protein NM_012231 with seq of 1719 aa from PRDM2 gene>,
 <Protein NM_005030 with seq of 604 aa from PLK1 gene>,
 <Protein NM_002758 with seq of 335 aa from MAP2K6 gene>,
 <Protein NM_001514 with seq of 317 aa from GTF2B gene>,
 <Protein NM_145331 with seq of 607 aa from MAP3K7 gene>,
 <Protein NM_003884 with seq of 833 aa from KAT2B gene>,
 <Protein NM_001145415 with seq of 1292 aa from SETDB1 gene>,
 <Protein NM_002392 with seq of 498 aa from MDM2 gene>,
 <Protein NM_004424 with seq of 785 aa from E4F1 gene>,
 <Protein NM_003491 with seq of 236 aa from NAA10 gene>,
 <Protein NM_182710 with seq of 547 aa from KAT5 gene>,
 <Protein NM_001429 with seq of 2415 aa from EP300 gene>,
 <Protein NM_001282166 with seq of 424 aa from SUV39H1 gene>,
 <Protein NM_004380 with seq of 2443 aa from CREBBP gene>,
 <Protein NM_020197 with seq of 434 aa from SMYD2 gene>,
 <Protein NM_003642 with seq of 420 aa from HAT1 gene>,
 <Protein NM_030662 with seq of 401 aa from MAP2K2 gene>,
 <Protein NM_021078 with seq of 838 aa from KAT2A gene>,
 <Protein NM_000551 with seq of 214 aa from VHL gene>,
 <Protein NM_006709 with seq of 1211 aa from EHMT2 gene>,
 <Protein NM_001880 with seq of 506 aa from ATF2 gene>,
 <Protein NM_005923 with seq of 1375 aa from MAP3K5 gene>,
 <Protein NM_002613 with seq of 557 aa from PDPK1 gene>,
 <Protein NM_005204 with seq of 468 aa from MAP3K8 gene>,
 <Protein NM_004333 with seq of 767 aa from BRAF gene>,
 <Protein NM_001278549 with seq of 457 aa from PDK1 gene>,
 <Protein NM_030648 with seq of 367 aa from SETD7 gene>}
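The nested loop used for post-processing can also be written as a set comprehension. The sketch below runs outside the shell, so it uses small hypothetical stand-in classes (DemoProtein, DemoKinase, DemoSite) instead of the real models. Only the attribute names kinases and protein are taken from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DemoProtein:
    """Hypothetical stand-in for a protein; frozen so it is hashable."""
    refseq: str


@dataclass
class DemoKinase:
    name: str
    protein: Optional[DemoProtein] = None


@dataclass
class DemoSite:
    kinases: List[DemoKinase] = field(default_factory=list)


def proteins_of_kinases(sites):
    """Collect the distinct proteins of all kinases linked to the given sites."""
    return {
        kinase.protein
        for site in sites
        for kinase in site.kinases
        if kinase.protein  # skip kinases without a protein attached
    }


ep300 = DemoProtein('NM_001429')
kat5 = DemoProtein('NM_182710')
demo_sites = [
    DemoSite([DemoKinase('EP300', ep300), DemoKinase('KAT5', kat5)]),
    DemoSite([DemoKinase('EP300', ep300), DemoKinase('unknown')]),
]

proteins = proteins_of_kinases(demo_sites)
print(len(proteins))
```

Because the result is a set, a protein reached through several sites (EP300 in this toy data) is kept only once.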

Additional statistics

Some statistics are not saved to the database and are accessible only through the manage.py script. Run:

./manage.py shell -c 'results = stats.generate_source_specific_summary_table()'

to calculate those statistics. Then explore the results, e.g. with print(results).

You may want to use Python's double-optimized mode (-OO) to speed up the calculations. Unfortunately, this mode does not work with IPython, so the raw shell is needed:

python3 -OO manage.py shell --raw -c 'results = stats.generate_source_specific_summary_table()'
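One side effect of -OO is worth keeping in mind: Python drops assert statements and docstrings in this mode and sets __debug__ to False, so any check written with assert is silently skipped. The sketch below shows this by running the same one-off snippet in a child interpreter with and without the flag. The snippet and the run helper are made up for illustration.

```python
import subprocess
import sys

# A failing assertion followed by a print; hypothetical snippet for the demo.
SNIPPET = "assert 1 == 2, 'assertion executed'\nprint('debug:', __debug__)"


def run(*flags):
    """Run SNIPPET in a child interpreter and return (exit code, stdout)."""
    completed = subprocess.run(
        [sys.executable, *flags, '-c', SNIPPET],
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout.strip()


normal = run()          # the assertion fails, so the print is never reached
optimized = run('-OO')  # the assertion is stripped and the print runs

print(normal)
print(optimized)
```

If a script relies on assert for validation, run it without -O/-OO or replace the assertion with an explicit if check that raises an exception.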

(Advanced) Custom initialization

By default, an app instance is created for you when you enter the shell mode. If the default configuration of the created app does not suit your needs (e.g. you want to skip statistics loading), you may create a custom app instance.

To start, import the create_app factory from the app module and create an app instance, which initializes all necessary bindings and connects to the appropriate database. The following example shows how to modify the configuration to turn off statistics loading:

from app import create_app
app = create_app(config_override={'LOAD_STATS': False})

A more advanced configuration override might look like this:

app = create_app(config_override={'BDB_MODE': 'r', 'SCHEDULER_ENABLED': False, 'USE_CELERY': False})

You will find all available configuration options in the config.py file.

Are mutation counts equal?

The following test case is another example of server-side scripting for debugging purposes. It extends the Statistics class to check whether the overall mutation count equals the count calculated from the union of the mutation datasets (the sum of the per-dataset counts minus the common part, following the inclusion-exclusion principle). This was used when solving issue #46.

from stats import Statistics
from models import are_details_managed


class DetailsSensitiveStatistics(Statistics):
    def count_mutations(self, mutation_class):
        if are_details_managed(mutation_class):
            return super().count_mutations(mutation_class)
        else:
            return self.count(mutation_class)


# use of Statistics class may suffice, but the modified one will catch more errors
stats = DetailsSensitiveStatistics()
stats.calc_all()   # recalculate counts
counts = stats.get_all()
muts = counts['muts']
all_muts = (
    muts['MC3'] + muts['ClinVar'] + muts['ESP6500'] + muts['TKGenomes']
    - stats.from_more_than_one_source()
)
assert muts['all_confirmed'] - all_muts == 0
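The identity behind this check can be illustrated with plain Python sets. The mutation identifiers below are made up; only the source names come from the test case. What exactly stats.from_more_than_one_source() returns is not stated here, so the sketch computes both candidate corrections to show how they differ.

```python
from collections import Counter

# Made-up mutation identifiers per source (toy data).
datasets = {
    'MC3': {'m1', 'm2', 'm3'},
    'ClinVar': {'m2', 'm3', 'm4'},
    'ESP6500': {'m3', 'm5'},
    'TKGenomes': {'m6'},
}

# Number of sources that report each mutation.
occurrences = Counter(m for members in datasets.values() for m in members)

# Sum of the per-source counts; shared mutations are counted repeatedly.
total_with_repeats = sum(len(members) for members in datasets.values())

# A mutation present in k sources is counted k times, i.e. k - 1 times too many.
surplus = sum(k - 1 for k in occurrences.values())

# Number of distinct mutations across all sources.
distinct = len(set().union(*datasets.values()))

# Number of mutations reported by more than one source (each counted once).
shared = sum(1 for k in occurrences.values() if k > 1)

print(total_with_repeats, surplus, distinct, shared)
```

In this toy data the sum minus the surplus gives the number of distinct mutations, while the sum minus the number of shared mutations does not, because m3 occurs in three sources. The two corrections coincide only when no mutation appears in more than two sources.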
