[WIP] When files are at different levels in the folder tree... #18

Open
alexsavio opened this issue Oct 31, 2016 · 0 comments

hansel is quite nice when all the files in your folder structure are at the same level.

The programming patterns or the features Crumb would need to work with files at different levels are not yet clear to me.

For example:

└── kr_281358
    └── session_0
        ├── anat
        │   ├── anat_hc_biascorrected.nii
        │   ├── tissues
        │   │   ├── native
        │   │   │   ├── anat_hc_csf.nii
        │   │   │   ├── anat_hc_gm.nii
        │   │   │   └── anat_hc_wm.nii
        │   ├── tissues_brain_mask.nii.gz
        │   └── transform
        │       ├── anat_hc_to_mni_affine.mat
        └── rest
            ├── anat_hc_rest.nii
            ├── artifact_stats
            │   ├── motion_stats.json
            ├── avg_epi_anat.nii

Imagine this tree hangs from /home/data.
If I needed a Crumb to work with this whole tree, I would need crumb arguments for each of its levels:

cr = Crumb("/home/data/{sid}/session_0/{modality}/{img}/{file}/{subfile}")

In this case I would have files at the img, file, and subfile levels. cr would find those files without any problems; the issue is how to make this almost as convenient and beautiful as if the tree had files at only one level. Of course you can still work with cr alone and set the crumb arguments at your convenience, however:

  1. giving sensible names to the crumb arguments can be tricky or impossible, and
  2. you might get into trouble when working with the earliest levels: you have to make sure that no folder at a deeper level, e.g. subfile, matches your search criteria. Unless you explicitly set the deeper arguments as empty, cr could find matches one or more levels below what you are searching for.
    This usually wouldn't be a problem if your file names and folder names don't match at all, but who knows?

So there should be a way to globally restrict the depth of the Crumb when you are looking for files in an early argument.

I have thought of some options, which may be improved with new features of varying complexity.

Option 1

The most straightforward way I can think of is to have different Crumbs for files at different levels. For example, given the previous cr:

import os.path as path

from hansel import Crumb

base_dir, crumbs = cr.split()

print(base_dir)
>>> '/home/data'

print(crumbs)
>>> '{sid}/session_0/{modality}/{img}/{file}/{subfile}'

# let's say we want a crumb for the files in the `img` level:
max_depth = '{img}'
crumbs = crumbs.split('/')
# keep every path component up to and including `max_depth`
img_crumbs = path.sep.join(crumbs[:crumbs.index(max_depth) + 1])

# we create the crumb for the img level files...
img_cr = Crumb(path.join(base_dir, img_crumbs))

print(img_cr)
>>> Crumb("/home/data/{sid}/session_0/{modality}/{img}")

One clean solution would be to add a method, e.g. branch_out, max_depth, or set_limit, that returns a copy of the Crumb whose path is truncated at the crumb argument passed to it.
This method would also have to take care of copying and correcting the internal patterns already set for the arguments, if any. For example:

# we create the crumb for the img level files...
img_cr = cr.branch_out('img')

print(img_cr)
>>> Crumb("/home/data/{sid}/session_0/{modality}/{img}")

The good thing about this solution is that it is easy to implement. However, it is not much different from the current situation, and you would still have to set or replace arguments and patterns for each Crumb object you want to work with.
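To make the truncation step concrete, here is a minimal sketch that works on the path string only, without touching hansel. The name `truncate_crumb_path` is hypothetical and not part of the library, and it assumes that patterns written as `{arg:pattern}` do not contain a `/`.

```python
import re


def truncate_crumb_path(crumb_path, arg_name):
    """Cut `crumb_path` right after the component that holds `{arg_name}`.

    Hypothetical helper, not part of hansel. Assumes argument patterns
    (`{arg:pattern}`) contain no '/' characters.
    """
    # Matches "{arg_name}" as well as "{arg_name:some_pattern}".
    arg_re = re.compile(r"\{" + re.escape(arg_name) + r"(:[^}]*)?\}")
    parts = crumb_path.split("/")
    for depth, part in enumerate(parts):
        if arg_re.search(part):
            return "/".join(parts[: depth + 1])
    raise ValueError("argument %r not found in %r" % (arg_name, crumb_path))


full_path = "/home/data/{sid}/session_0/{modality}/{img}/{file}/{subfile}"
print(truncate_crumb_path(full_path, "img"))
# /home/data/{sid}/session_0/{modality}/{img}
```

A real branch_out would additionally have to carry over the patterns already set on the arguments that survive the cut, which this string-only sketch ignores.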

Another way of seeing this solution would be:

biascorr_cr = cr.max_depth('img').set_pattern('img', 'anat_hc_biascorrected.nii')
motionst_cr = cr.max_depth('file').set_pattern('file', 'motion_stats.json')

print(biascorr_cr)
>>> Crumb("/home/data/{sid}/session_0/{modality}/{img:anat_hc_biascorrected.nii}")

print(motionst_cr)
>>> Crumb("/home/data/{sid}/session_0/{modality}/{img}/{file:motion_stats.json}")

motionstat_files = motionst_cr.ls()

For the example above this is not necessary, but I am thinking of a bigger and more complex folder tree.

Option 2

A max_depth argument in most of the functions in Crumb?
I think this would lead to an uglier and more complex solution.
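For comparison, this is roughly what a depth limit looks like when it is passed as an argument to a search function. It is a standard-library sketch only: `find_files` is a hypothetical name, nothing here is hansel's API, and depth is counted so that files directly inside the root are at depth 1.

```python
import fnmatch
import os


def find_files(root, pattern, max_depth):
    """Yield files under `root` whose name matches `pattern`.

    Hypothetical helper. Files directly inside `root` are at depth 1;
    folders that could only hold deeper files are never visited.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        dir_depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        file_depth = dir_depth + 1
        if file_depth >= max_depth:
            # Prune: os.walk will not descend into these sub-folders.
            dirnames[:] = []
        if file_depth <= max_depth:
            for name in fnmatch.filter(sorted(filenames), pattern):
                yield os.path.join(dirpath, name)
```

Every search-like function would need the same extra parameter and the same pruning logic, which is where the added complexity of this option would come from.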

Option 3

A set of synchronised Crumb objects accessible by name inside a Crumbs class...
By "synchronised" I mean that they keep only one global copy of the pattern values.
TBD
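Since this option is still to be defined, the following is only one possible reading of it, written with plain strings instead of real Crumb objects. The class name `SharedCrumbs` and its methods are hypothetical; the `{arg:pattern}` notation is the one used in the printed Crumbs above.

```python
import re


class SharedCrumbs:
    """Hypothetical: named path templates that share one pattern table."""

    def __init__(self, **templates):
        self.templates = dict(templates)
        self.patterns = {}  # the single, global copy of the pattern values

    def set_pattern(self, arg_name, pattern):
        # Setting a pattern once affects every template that uses arg_name.
        self.patterns[arg_name] = pattern

    def render(self, name):
        """Return the named template with the shared patterns filled in."""
        def fill(match):
            arg = match.group(1)
            if arg in self.patterns:
                return "{%s:%s}" % (arg, self.patterns[arg])
            return match.group(0)

        return re.sub(r"\{(\w+)\}", fill, self.templates[name])


crumbs = SharedCrumbs(
    img="/home/data/{sid}/session_0/{modality}/{img}",
    file="/home/data/{sid}/session_0/{modality}/{img}/{file}",
)
crumbs.set_pattern("sid", "kr_*")
print(crumbs.render("img"))
# /home/data/{sid:kr_*}/session_0/{modality}/{img}
print(crumbs.render("file"))
# /home/data/{sid:kr_*}/session_0/{modality}/{img}/{file}
```

The point of the sketch is only that a pattern set once for sid shows up in both the img-level and the file-level paths.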
