diff --git a/docs/source/experiments.rst b/docs/source/experiments.rst
index 1482156..f096530 100644
--- a/docs/source/experiments.rst
+++ b/docs/source/experiments.rst
@@ -53,8 +53,8 @@ images.
 The dataset is conformed by several histology patches of size 512x512
 px. Masks labels exits for an expert pathologist and 20 medical
-students. Every single patch contains label for every annotator as shown
-in the figure:
+students. Each patch contains labels from only some of the scorers, as
+shown in the figure:
 
 .. image:: resources/crowd-seg-example-instances.png
    :width: 100%
    :align: center
@@ -78,12 +78,12 @@ You can fetch your data simply like this:
    for i in range(1):
        img, mask = val[i]
        print(f"Batch {i}: {img.shape}, {mask.shape}")
-
-Output: 
+
+Output:
 
 .. code:: text
 
-   Batch 0: (8, 512, 512, 3), (8, 512, 512, 6, 23) 
+   Batch 0: (8, 512, 512, 3), (8, 512, 512, 6, 23)
 
 A single stage can also be fetched, and even visualized:
 
@@ -95,42 +95,60 @@ A single stage can also be fetched, and even visualized:
 
    val = get_stage_data(stage = Stage.VAL, batch_size=8)
    val.visualize_sample()
 
-When running
-the ``visualize_sample`` method, the generator will load the images and
-masks from the disk and display them, with a result similar to the
-following:
+For architecture debugging purposes, you can also fetch the data with
+downsampling to force balance between scorers:
+
+.. code:: python
+
+   train = get_stage_data(stage = Stage.TRAIN, batch_size=8, force_balance=True)
+
+Output:
+
+.. code:: text
+
+   Loading train data with forced balance...
+   INFO:seg_tgce.data.crowd_seg.generator:Forced balance: limiting to 102 images per scorer.
+
+When running the ``visualize_sample`` method, the generator will load
+the images and masks from the disk and display them, with a result
+similar to the following:
 
 .. image:: resources/crowd-seg-generator-visualization.png
    :width: 100%
    :align: center
    :alt: sample from the CrowdSeg dataset with the ``ImageDataGenerator`` class.
 
-Loading the dataset manually
+Loading the generator itself
 ============================
 
-If you already have a downloaded dataset in a certain directory, you can
-load it symply as a keras sequence with the ``ImageDataGenerator``
-class:
+You can also instantiate the generator directly from the
+``ImageDataGenerator`` class, which is the same class returned by the
+``get_all_data`` and ``get_stage_data`` factories:
 
 .. code:: python
 
-   from seg_tgce.data.crowd_seg import ImageDataGenerator
+   from seg_tgce.data.crowd_seg.generator import ImageDataGenerator
+   from seg_tgce.data.crowd_seg.stage import Stage
 
-   val_gen = ImageDataGenerator(
-       image_dir="/Histology Data/patches/Val",
-       mask_dir="/Histology Data/masks/Val",
+   val_gen = ImageDataGenerator(
+       stage=Stage.VAL,
        batch_size=16,
        n_classes=6,
    )
-   print(f"Train len: {len(val_gen)}")
-   print(f"Train masks scorers: {val_gen.n_scorers}")
-   print(f"Train masks scorers tags: {val_gen.scorers_tags}")
+   print(f"Val len: {len(val_gen)}")
+   print(f"Val masks scorers: {val_gen.n_scorers}")
+   print(f"Val masks scorers tags: {val_gen.scorers_tags}")
    val_gen.visualize_sample(
        batch_index=8,
        sample_index=8,
        scorers=["NP8", "NP16", "NP21", "expert"],
    )
 
-The ``ImageDataGenerator`` class is a subclass of the Keras ``Sequence``
-class, which allows us to load the dataset in a lazy way.
+.. note::
+   The ``ImageDataGenerator`` class is a subclass of the Keras
+   ``Sequence`` class, which allows us to load the dataset in a lazy
+   way. On the first instantiation, the generator will download the
+   dataset into a local directory, ``__data__/crowd_seg``, which might
+   take a while. After that, the generator will load the images and
+   masks on the fly, as requested by the training loop.
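+
+Because the generator is a Keras ``Sequence``, it can in principle be
+passed directly to ``model.fit``. The snippet below is only a sketch:
+the tiny stand-in model and the ``binary_crossentropy`` loss are
+placeholders for the actual segmentation network and annotator-aware
+loss, and it assumes the constructor accepts ``Stage.TRAIN`` in the
+same way the ``get_stage_data`` factory does:
+
+.. code:: python
+
+   import keras
+   from keras import layers
+
+   from seg_tgce.data.crowd_seg.generator import ImageDataGenerator
+   from seg_tgce.data.crowd_seg.stage import Stage
+
+   train_gen = ImageDataGenerator(stage=Stage.TRAIN, batch_size=8, n_classes=6)
+   val_gen = ImageDataGenerator(stage=Stage.VAL, batch_size=8, n_classes=6)
+
+   # Stand-in model: a 1x1 convolution whose output is reshaped to
+   # (height, width, n_classes, n_scorers), matching the mask tensors.
+   n_classes, n_scorers = 6, train_gen.n_scorers
+   inputs = keras.Input(shape=(512, 512, 3))
+   x = layers.Conv2D(n_classes * n_scorers, 1, activation="sigmoid")(inputs)
+   outputs = layers.Reshape((512, 512, n_classes, n_scorers))(x)
+   model = keras.Model(inputs, outputs)
+   model.compile(optimizer="adam", loss="binary_crossentropy")
+
+   # Keras consumes the Sequence lazily, batch by batch.
+   model.fit(train_gen, validation_data=val_gen, epochs=1)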
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 232f0a4..5ca5921 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -1,4 +1,3 @@
-
 ######################################
   Semantic Segmentation TCGE Framework
 ######################################