Update the documentation

anergictcell · Jul 27, 2023 · ad699e2
1 parent 418b872

Showing 15 changed files with 485 additions and 62 deletions.
diff --git a/Makefile b/Makefile
@@ -16,7 +16,7 @@ check:
 
 docs:
 	@echo "Generating documentation"
-	sphinx-build -b html -d _build/doctrees docs _build/html/
+	sphinx-build -a -b html -d _build/doctrees docs _build/html/
 
 build:
 	@echo "Building packages"

diff --git a/README.rst b/README.rst
@@ -6,15 +6,17 @@ A Python library to work with, analyze, filter and inspect the `Human Phenotype
 
 Visit the `PyHPO Documentation`_ for a more detailed overview of all the functionality.
 
+.. _Human Phenotype Ontology: https://hpo.jax.org/
+.. _PyHPO Documentation: https://pyhpo.readthedocs.io/en/latest/
 
 Main features
 =============
 
-- Identify patient cohorts based on clinical features
-- Cluster patients or other clinical information for GWAS
-- Phenotype to Genotype studies
-- HPO similarity analysis
-- Graph based analysis of phenotypes, genes and diseases
+* 👫 Identify patient cohorts based on clinical features
+* 👨‍👧‍👦 Cluster patients or other clinical information for GWAS
+* 🩻→🧬 Phenotype to Genotype studies
+* 🍎🍊 HPO similarity analysis
+* 🕸️ Graph based analysis of phenotypes, genes and diseases
 
 
 **PyHPO** allows working on individual terms ``HPOTerm``, a set of terms ``HPOSet`` and the full ``Ontology``.
@@ -25,8 +27,45 @@ Internally the ontology is represented as a branched linked list, every term con
 
 It provides an interface to create a pandas ``DataFrame`` from its data, allowing integration into existing data analysis tools.
 
-Examples
---------
+
+Getting started
+===============
+
+The easiest way to install **PyHPO** is via pip:
+
+.. code:: bash
+
+    pip install pyhpo
+
+Alternatively, you can also install optional packages for extra functionality:
+
+.. code:: bash
+
+    # Include pandas during install
+    pip install pyhpo[pandas]
+
+    # Include scipy
+    pip install pyhpo[scipy]
+
+    # Include all dependencies
+    pip install pyhpo[all]
+
+.. note::
+
+    Some features of PyHPO require ``pandas`` and ``scipy``. The standard installation via pip does not include either package, and PyHPO works fine without them (you will get a warning on the initial import, though).
+
+    Without ``pandas``, you won't be able to export the Ontology as a ``DataFrame``; everything else works as usual.
+
+    Without ``scipy``, you won't be able to use the ``stats`` module, in particular the enrichment calculations.
+
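The note above describes optional extras. A minimal sketch of how such an optional-dependency check can work, using only the standard library; the helper names ``optional_dependency_available`` and ``check_extras`` are hypothetical and are not PyHPO's actual import logic:

```python
import importlib.util
import warnings
from typing import Dict


def optional_dependency_available(module_name: str) -> bool:
    """Return True if ``module_name`` can be imported, without importing it.

    Hypothetical helper illustrating the "optional extra" pattern;
    it is not taken from PyHPO's source.
    """
    # find_spec returns None when the module cannot be located
    return importlib.util.find_spec(module_name) is not None


def check_extras(extras: Dict[str, str]) -> Dict[str, bool]:
    """Warn for every missing extra and report which extras are usable."""
    status: Dict[str, bool] = {}
    for module_name, feature in extras.items():
        available = optional_dependency_available(module_name)
        if not available:
            # UserWarning (the default category) is shown by default filters
            warnings.warn(f"{module_name} is not installed; {feature} is unavailable")
        status[module_name] = available
    return status


if __name__ == "__main__":
    print(check_extras({
        "pandas": "DataFrame export",
        "scipy": "the stats module",
    }))
```

Using ``importlib.util.find_spec`` checks availability without importing the (possibly heavy) package, so a library can defer the real import until the feature is actually used.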
+
+Usage example
+=============
+
+Basic use cases
+---------------
+
+Some examples of basic PyHPO functionality:
 
 How similar are the phenotypes of two patients
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -85,40 +124,6 @@ How close are two HPO terms
     """
 
 
-Getting started
-===============
-
-The easiest way to install **PyHPO** is via pip
-
-.. code:: bash
-
-    pip install pyhpo
-
-or, you can additionally install optional packages for extra functionality
-
-.. code:: bash
-
-    # Include pandas during install
-    pip install pyhpo[pandas]
-
-    # Include scipy
-    pip install pyhpo[scipy]
-
-    # Include all dependencies
-    pip install pyhpo[all]
-
-.. note::
-
-    Some features of PyHPO require ``pandas`` and ``scipy``. The standard installation via pip will not include pandas or scipy and PyHPO will work just fine. (You will get a warning on the initial import though). 
-
-    Without installing ``pandas``, you won't be able to export the Ontology as a ``Dataframe``, everything else will work fine.
-
-    Without installing ``scipy``, you won't be able to use the ``stats`` module, especially the enrichment calculations.
-
-
-Usage example
-=============
-
 HPOTerm
 -------
 An ``HPOTerm`` contains various metadata about the term, as well as pointers to its parents and children terms. You can access its information content, calculate similarity scores to other terms, find the shortest or longest connection between two terms, list all associated genes or diseases, etc.
@@ -308,7 +313,6 @@ It can be reused across several modules, e.g:
         return Ontology.get_hpo_object(term)
 
 
-
 HPOSet
 ------
 An ``HPOSet`` is a collection of ``HPOTerm`` and can be used to represent e.g. a patient's clinical information. It provides APIs for filtering, comparisons to other ``HPOSet`` and term/gene/disease enrichments.
@@ -406,7 +410,8 @@ Examples:
 *(This script is complete, it should run "as is")*
 
 
-For a more detailed description of how to use PyHPO, visit the `PyHPO Documentation`_.
+For a more detailed description of how to use PyHPO, visit the `PyHPO Documentation <https://pyhpo.readthedocs.io/en/latest/>`_.
+
 
 
 Contributing
@@ -424,6 +429,4 @@ PyHPO is using the Human Phenotype Ontology. Find out more at http://www.human-p
 
 Sebastian Köhler, Leigh Carmody, Nicole Vasilevsky, Julius O B Jacobsen, et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Research. (2018) doi: 10.1093/nar/gky1105
 
-.. _PyHPO Documentation: https://centogene.github.io/pyhpo/
 .. _MIT license: http://www.opensource.org/licenses/mit-license.php
-.. _Human Phenotype Ontology: https://hpo.jax.org/
diff --git a/docs/conf.py b/docs/conf.py
@@ -57,7 +57,7 @@
 
 # General information about the project.
 project = 'PyHPO'
-copyright = '2021, CENTOGENE GmbH'
+copyright = '2023, Jonas Marcello'
 author = pyhpo.__author__
 
 # The version info for the project you're documenting, acts as replacement for
@@ -74,7 +74,7 @@
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
-language = None
+language = "en"
 
 # There are two options for replacing |today|: either, you set today to some
 # non-false value, then it is used:
@@ -296,3 +296,4 @@
 autodoc_member_order = 'bysource'
 
 napoleon_use_param = True
+autodoc_typehints = "both"
diff --git a/docs/index.rst b/docs/index.rst
@@ -1,10 +1,29 @@
-#################################
-Welcome to PyHPO's documentation!
-#################################
+###################
+PyHPO documentation
+###################
 
 .. toctree::
     :maxdepth: 1
-    :caption: API documentation:
+    :caption: 🚀 Getting started:
+
+    tutorial/installation
+    tutorial/basics
+    tutorial/data
+
+.. toctree::
+    :maxdepth: 1
+    :caption: 🖥️ Examples:
+
+    tutorial/examples
+    tutorial/terms
+    tutorial/ontology
+    tutorial/sets
+    tutorial/enrichment
+
+.. toctree::
+    :maxdepth: 1
+    :hidden:
+    :caption: 📄 API documentation:
 
     hpoterm
     ontology
@@ -16,17 +35,16 @@ Welcome to PyHPO's documentation!
     data
     parser
 
-*************
-Introduction:
-*************
+
 
 .. include:: ../README.rst
+    :end-before: Getting started
+
 
 ##################
 Indices and tables
 ##################
 
 * :ref:`genindex`
-* :ref:`modindex`
 * :ref:`search`
 
diff --git a/docs/tutorial/basics.rst b/docs/tutorial/basics.rst
@@ -0,0 +1,165 @@
+Basics
+------
+
+**PyHPO** provides an easy interface to work with the Human Phenotype Ontology. The main entry point is the :doc:`ontology` object, which must be instantiated once. ``Ontology`` is designed as a singleton, so the same instance can be used across different modules.
+
+Ontology
+~~~~~~~~
+
+``Ontology`` can be instantiated with the default master data simply by calling:
+
+.. code:: python
+
+    from pyhpo import Ontology
+
+    _ = Ontology()
+
+It can now be used across all modules. Imagine a package with the following layout:
+
+::
+
+    /mymodule
+      |- foo.py
+      |- bar.py
+      |- main.py
+
+
+``foo.py``:
+
+.. code:: python
+
+    from pyhpo import Ontology
+
+    def ontology_len():
+        print(len(Ontology))
+
+
+``bar.py``:
+
+.. code:: python
+
+    from pyhpo import Ontology
+
+    def get_term_name(term_id: int) -> str:
+        try:
+            term = Ontology[term_id]
+        except KeyError:
+            print("Term not present in Ontology")
+            return ""
+        return term.name
+
+
+``main.py``:
+
+.. code:: python
+
+    import foo
+    import bar
+
+    from pyhpo import Ontology
+
+    # This is the only time where the Ontology is instantiated.
+    _ = Ontology()
+
+    foo.ontology_len()
+    # ==> Prints the number of HPO terms in the Ontology
+
+    bar.get_term_name(118)  # ==> "Phenotypic abnormality"
+
+
+This works as expected: the ``Ontology`` singleton is shared across all modules and submodules. It must be instantiated only once; other modules only need to import the ``Ontology`` object.
+
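To illustrate why ``len(Ontology)`` or ``Ontology[118]`` work on the imported name itself, here is a sketch of one way to build such a shared object: a single module-level instance whose ``__call__`` loads data in place. This is an assumption for illustration, not PyHPO's implementation; ``_OntologySketch`` is hypothetical and uses toy data instead of the HPO master data.

```python
from typing import Dict, Iterator, Optional


class _OntologySketch:
    """Hypothetical stand-in for a shared ontology object.

    Every module that imports the module-level instance sees the same
    object, so loading it once (e.g. in main.py) makes the data
    visible everywhere.
    """

    def __init__(self) -> None:
        self._terms: Dict[int, str] = {}

    def __call__(self, terms: Optional[Dict[int, str]] = None) -> "_OntologySketch":
        # Load (or replace) the data on the existing instance and return it,
        # so ``_ = Ontology()`` mutates the shared object instead of making a new one
        self._terms = dict(terms or {})
        return self

    def __len__(self) -> int:
        return len(self._terms)

    def __getitem__(self, term_id: int) -> str:
        # Raises KeyError for unknown IDs, like a plain dict
        return self._terms[term_id]

    def __iter__(self) -> Iterator[str]:
        return iter(self._terms.values())


# The single shared instance; other modules would import this name
Ontology = _OntologySketch()

if __name__ == "__main__":
    # Toy data instead of the real HPO master data (assumption for the sketch)
    loaded = Ontology({118: "Phenotypic abnormality", 1: "All"})
    print(loaded is Ontology, len(Ontology), Ontology[118])
    # True 2 Phenotypic abnormality
```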
+
+By default, ``Ontology()`` loads the HPO version that ships with the library (see the :doc:`data` section for details on how to update or change the master data).
+
+
+The Ontology holds references to all HPO terms, genes and diseases. Since terms are the most common use case, ``Ontology`` supports indexing to retrieve terms. For this, use the integer form of the HPO-Term ID:
+
+.. code:: python
+
+    from pyhpo import Ontology
+    _ = Ontology()
+
+    term = Ontology[118] # ==> returns term `HP:0000118`
+    
+
+Alternatively, terms can be retrieved by using the full HPO-Term ID:
+
+.. code:: python
+
+    from pyhpo import Ontology
+    _ = Ontology()
+
+    term = Ontology.get_hpo_object("HP:0000118") # ==> returns term `HP:0000118`
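Both lookups refer to the same term: the integer form is just the numeric part of the zero-padded, seven-digit HPO ID. The hypothetical helpers below (``to_hpo_id`` and ``from_hpo_id``, not part of PyHPO) sketch that mapping:

```python
import re

# Hypothetical helpers, not part of PyHPO: they only show how the integer
# form (118) and the full ID ("HP:0000118") map onto each other.
_HPO_ID = re.compile(r"^HP:(\d{7})$")


def to_hpo_id(term_id: int) -> str:
    """Format an integer term ID as a full, zero-padded HPO ID."""
    if term_id < 0:
        raise ValueError("HPO term IDs are non-negative")
    return f"HP:{term_id:07d}"


def from_hpo_id(hpo_id: str) -> int:
    """Parse a full HPO ID back into its integer form."""
    match = _HPO_ID.match(hpo_id)
    if match is None:
        raise ValueError(f"Not a valid HPO ID: {hpo_id!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(to_hpo_id(118))             # HP:0000118
    print(from_hpo_id("HP:0000118"))  # 118
```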
+
+
+The ``Ontology`` can also be used as an iterable; iterating over it yields all HPO-Terms in no particular order:
+
+.. code:: python
+
+    from pyhpo import Ontology
+    _ = Ontology()
+
+    for term in Ontology:
+        print(term)
+
+
+HPOTerm
+~~~~~~~
+
+Another key component of **PyHPO** is the ``HPOTerm`` (see :doc:`terms`). HPOTerms are the building blocks of the ontology and provide a lot of relevant functionality. They hold references to all their ancestor and child terms, allowing fast traversal of individual branches of the ontology.
+
+.. code:: python
+
+    from pyhpo import Ontology
+    _ = Ontology()
+
+    term = Ontology[118]
+
+    for child in term.children:
+        print(f"{child}")
+
+    for parent in term.parents:
+        print(f"{parent}")
+
+    # You can also iterate over all parents and their parents and grandparents etc.
+    for ancestor in term.all_parents:
+        print(f"{ancestor}")
+
+Do not try to instantiate ``HPOTerm`` objects manually. Doing so would miss all the important links to parents, children, genes, diseases, etc.
+
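The ``all_parents`` traversal described above amounts to collecting the transitive closure of the parent links. A minimal sketch with a hypothetical ``ToyTerm`` class (not PyHPO's ``HPOTerm``), assuming each term only stores its direct parents:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so terms can go into sets
class ToyTerm:
    """Hypothetical minimal term that only knows its direct parents."""

    name: str
    parents: List["ToyTerm"] = field(default_factory=list)

    @property
    def all_parents(self) -> Set["ToyTerm"]:
        """Return parents, grandparents, etc. using an iterative depth-first walk."""
        seen: Set["ToyTerm"] = set()
        stack = list(self.parents)
        while stack:
            term = stack.pop()
            if term in seen:
                # The HPO is a DAG: one ancestor can be reachable via several paths
                continue
            seen.add(term)
            stack.extend(term.parents)
        return seen


if __name__ == "__main__":
    root = ToyTerm("All")
    pheno = ToyTerm("Phenotypic abnormality", [root])
    left = ToyTerm("A", [pheno])
    right = ToyTerm("B", [pheno])
    leaf = ToyTerm("Leaf", [left, right])  # two paths lead up to ``pheno``
    print(sorted(t.name for t in leaf.all_parents))
    # ['A', 'All', 'B', 'Phenotypic abnormality']
```

The ``seen`` set makes sure a shared ancestor is reported once, even when several paths lead to it.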
+
+HPOSet
+~~~~~~
+
+``HPOSet`` (see :doc:`sets`) is an important feature of **PyHPO** for patient- or disease-based data analysis. An HPOSet is primarily just that: a set of HPOTerms. You can use it to document the clinical information or full phenotype of a patient, or to describe a disease. HPOSets work on top of Python's standard ``set`` (``Set[HPOTerm]``) and can easily be built from one. They do, however, provide a lot of additional functionality.
+
+HPOSets can be compared to each other to identify similar patients or diseases. The similarity scores can be used for clustering patient cohorts.
+
+.. code:: python
+
+    from pyhpo import Ontology, HPOSet
+    _ = Ontology()
+
+    ci_1 = HPOSet.from_queries([
+        'HP:0002943',
+        'HP:0008458',
+        'HP:0100884',
+        'HP:0002944',
+        'HP:0002751'
+    ])
+
+    ci_2 = HPOSet.from_queries([
+        'HP:0002650',
+        'HP:0010674',
+        'HP:0000925',
+        'HP:0009121'
+    ])
+
+    # Determine the similarity
+    ci_1.similarity(ci_2)  # ==> 0.7593552670152157
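For comparison, the simplest possible set similarity is plain overlap (the Jaccard index). The sketch below is a hypothetical stand-in, not the method behind ``HPOSet.similarity``. Applied to the two sets above it returns ``0.0`` because they share no term IDs, which is why PyHPO's similarity also has to take the relationships between terms in the ontology into account.

```python
from typing import AbstractSet, Hashable


def jaccard(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Fraction of shared items: 0.0 for disjoint sets, 1.0 for identical ones."""
    if not a and not b:
        # Convention for two empty sets: treat them as identical
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    ci_1 = {"HP:0002943", "HP:0008458", "HP:0100884", "HP:0002944", "HP:0002751"}
    ci_2 = {"HP:0002650", "HP:0010674", "HP:0000925", "HP:0009121"}
    print(jaccard(ci_1, ci_2))  # 0.0, no term ID appears in both sets
```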
+
+
+Enrichment
+~~~~~~~~~~
+
+**PyHPO** includes statistical tests to determine the hypergeometric enrichment of linked diseases or genes in a set of HPOTerms. You can use this to find genes that are relevant for the phenotype of a patient. More examples are documented in :doc:`enrichment`.
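The hypergeometric test mentioned above asks how surprising it is to see ``k`` or more gene-annotated terms when drawing ``n`` terms from a population of ``N`` terms, ``K`` of which are annotated to that gene. A pure-Python sketch of the upper-tail p-value; the mapping of ``N``, ``K``, ``n`` and ``k`` onto HPO data is an assumption for illustration and may differ from what PyHPO's ``stats`` module does:

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for a hypergeometric random variable X.

    population: total number of terms considered (N)
    successes:  terms annotated to the gene or disease of interest (K)
    draws:      terms in the patient's set (n)
    k:          observed terms in the set that carry the annotation
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass of observing k, k+1, ..., upper annotated terms;
    # math.comb returns 0 for impossible combinations
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


if __name__ == "__main__":
    # 70 of the 210 possible 4-term draws contain at least 2 annotated terms
    print(hypergeom_sf(2, population=10, successes=3, draws=4))  # p = 1/3
```

With ``scipy`` installed, ``scipy.stats.hypergeom.sf(k - 1, N, K, n)`` should give the same value; the exact integer arithmetic of ``math.comb`` simply avoids the dependency.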