Short best practice blueprint for sharing code scripts with academic outputs #186
-
Thanks for raising @adamkucharski - I think this is pretty important. Just putting down some thoughts here - might sound a bit basic for some and I've mostly picked these up informally. Here's an example of how I've previously structured repos for submission to journals: "Source Code and Supplementary Material for: A Guide to Pre-processing High-throughput Animal Tracking Data" Some basic steps, in addition to a clear Readme:
Some extra steps I take, which definitely take time but are worth it for oneself and the wider community imo:
-
I think some general recommendations/suggestions would be good, but I also think there is potentially scope for a slightly more opinionated definition of "best practice" (as e.g. in the EPIFORGE guidelines) that paper reviewers could point to.
-
Thanks @sbfnk for pointing me to this post. I had a couple of more or less relevant thoughts :)
Firstly, similar to @avallecam, I thought I'd chip in to add an experience teaching on this theme (also with @avallecam and @bquilty25). I recently developed a workshop on "R for research - intro to good practices". It broadly covered the same principles as the Wilson et al paper (modularity, documentation etc), but was pitched slightly differently than the
The feedback from course attendees was that the material was either new to them, or that it was practice they'd seen and had maybe partly tried to adopt but without having any structured motivation or guidance for doing so (e.g. directory set up or having a README). They especially liked having lots of small steps/packages that they could use straightaway to improve existing code. In general there was loads of interest in the material, and I think there's definitely unmet demand for making this really simple and accessible.
Secondly, I had a thought which is more along the lines of @sbfnk's suggestion for an opinionated piece. Personally I feel I'm missing, and would love to see, opinions/guidance on when to leave code for a paper as it is and when to split it out into a package. I've struggled with this in a couple of papers whose code may or may not be re-used, and wondered whether the extra steps for package dev are worth it (simple as they may be with
-
This is a really nice aspect to provide for users/readers. That 'to package or not to package' question is something that's come up in a few other project discussions recently, and I agree it could be useful to include. Shall we start with a draft blog post outlining key points/opinions for best practice, and then decide whether to keep it as an online post (faster, informal) or refine it into a publication (slower, more formal)?
-
Interesting thread. I agree with many points already mentioned. I wonder if any blog post/paper we write should be more of a how-to guide that actively facilitates good practice, by pointing to existing tools and explaining step-by-step procedures that are crucial to reproducibility, e.g. version releases (DOIs) and repository structure. It will be easier if we focus on best practice for R rather than research software or analysis code in general, but that would of course limit the readership of the piece.
This is a good point because it is not clear where to draw the line on "best practice" for code sharing. Code is often shared in a very rough state, and there are lots of small, easy wins that a how-to guide could help with. However, there is also an optimal method of code sharing, which could involve containers (e.g. Docker) and tools like {renv}, and these are not as easy for all researchers to pick up quickly.
-
This seems like it might be relevant. Not read but sharing anyways ...
-
Description
As a paper reviewer, I increasingly see code shared on git repositories alongside papers (which is a good thing) but in an often impenetrable way (no README, minimal comments, not much file structure, unclear modularisation of code, no knitted vignettes, no licence).
I wonder if there is an opportunity to provide a short document giving contributors' opinions on best practice for sharing analysis code that isn't necessarily a full package. This isn't about asking users to fundamentally redesign their analysis code, but rather to ensure that it is clearly documented and structured, to enable ease of understanding and reproducibility.
It's analogous to similar discussions we're having with @joshwlambert and @CarmenTamayo in {epiparameter}: at one end, we have best practice for estimating parameters in the first place (more demanding on users) and at the other we have best practice for reporting (less demanding, but still valuable for removing obstacles to reuse).
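As a rough illustration of how lightweight such a check could be, here is a minimal Python sketch that flags a few of the gaps mentioned above (no README, no licence, not much file structure). The function names, the list of accepted file names and the "at least one subdirectory" rule are all assumptions made for this example, not an agreed standard or an existing tool.

```python
from pathlib import Path

# Hypothetical checklist: the accepted file names below are assumptions
# chosen for illustration, not an agreed standard.
CHECKS = {
    "README": ("README", "README.md", "README.rst", "README.txt"),
    "licence": ("LICENSE", "LICENSE.md", "LICENSE.txt",
                "LICENCE", "LICENCE.md", "COPYING"),
}


def check_repo(root):
    """Return a dict mapping each checklist item to True (present) or False."""
    root = Path(root)
    entries = list(root.iterdir())
    # Compare file names case-insensitively, top level only.
    file_names = {p.name.lower() for p in entries if p.is_file()}
    results = {
        item: any(name.lower() in file_names for name in candidates)
        for item, candidates in CHECKS.items()
    }
    # Crude proxy for "some file structure": at least one non-hidden folder.
    results["subdirectories"] = any(
        p.is_dir() and not p.name.startswith(".") for p in entries
    )
    return results


def missing_items(root):
    """Return a sorted list of the checklist items that were not found."""
    return sorted(item for item, ok in check_repo(root).items() if not ok)
```

Calling `missing_items(".")` from the top of a repository returns the names of the items that were not found. It says nothing about whether the README or comments are actually useful, which still needs a human reader.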
Typical end-users
Researchers publishing preprints/papers in outbreak analysis
Potential contributors
Others interested in best practice for code sharing
Key collaborators
Colleagues at LSHTM and beyond
Inputs
NA (not a package)
Outputs
NA (not a package)
Imports
NA
Used by
NA
Related projects
Model share programme (via @jamesmbaazam): https://sciencegateways.org/networking-community/community-news/n/introducing-modelshare-program
Repo quality metrics (@Bisaloo et al): WHO-Collaboratory/collaboratory-epipipeline-community#6
CODECHECK project: https://codecheck.org.uk/
Usage
Additional comments
...