[PRE REVIEW]: lucie: An Improved Python Package for Loading Datasets from the UCI Machine Learning Repository #7689

Closed
editorialbot opened this issue Jan 17, 2025 · 12 comments
Labels: HTML · JavaScript · pre-review · query-scope (Submissions of uncertain scope for JOSS) · rejected · TeX · Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning

Comments

@editorialbot
Collaborator

Submitting author: @kenneth-ge (Kenneth Ge)
Repository: https://github.com/ArnaoutLab/Lucie
Branch with paper.md (empty if default branch): main
Version: v1.0.2
Editor: Pending
Reviewers: Pending
Managing EiC: Chris Vernon

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/307ecb707a5d5102e19642d65d4ec755"><img src="https://joss.theoj.org/papers/307ecb707a5d5102e19642d65d4ec755/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/307ecb707a5d5102e19642d65d4ec755/status.svg)](https://joss.theoj.org/papers/307ecb707a5d5102e19642d65d4ec755)

Author instructions

Thanks for submitting your paper to JOSS @kenneth-ge. Currently, there isn't a JOSS editor assigned to your paper.

@kenneth-ge if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands

editorialbot added the pre-review and Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning labels on Jan 17, 2025

@editorialbot
Collaborator Author

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.98  T=0.02 s (534.4 files/s, 220755.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
TeX                              1            142              4           1597
HTML                             1              0              0            746
JavaScript                       1             95             87            551
Python                           3            148             93            484
CSS                              1            166              3            168
Markdown                         2             65              0            147
TOML                             1              2              0             22
YAML                             1              1              4             19
-------------------------------------------------------------------------------
SUM:                            11            619            191           3734
-------------------------------------------------------------------------------

Commit count by author:

    17	Kenneth Ge
     4	kennethge
     1	rarnaout

@editorialbot
Collaborator Author

Paper file info:

📄 Wordcount for paper.md is 2628

🔴 Failed to discover a Statement of need section in paper

@editorialbot
Collaborator Author

License info:

✅ License found: MIT License (Valid open source OSI approved license)

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1016/j.cub.2023.05.036 is OK
- 10.1038/nbt.2495 is OK
- 10.1128/JCM.01239-21 is OK
- 10.1101/2020.04.14.20065094 is OK
- 10.1371/journal.pone.0265233 is OK
- 10.3390/app11020796 is OK
- 10.1016/j.neunet.2018.07.011 is OK
- 10.1145/1007730.1007737 is OK
- 10.1145/1007730.1007736 is OK
- 10.1007/3-540-45153-6_7 is OK
- 10.1093/jamia/ocad055 is OK
- 10.1371/journal.pone.0282532 is OK
- 10.1161/CIR.0000000000001173 is OK
- 10.1016/j.jacc.2023.10.025 is OK
- 10.1016/j.jcmg.2023.05.012 is OK
- 10.1109/ACCESS.2020.3010287 is OK
- 10.1016/j.compbiomed.2021.104319 is OK
- 10.1038/s41591-021-01342-5 is OK
- 10.1016/j.media.2022.102680 is OK
- 10.1093/jamia/ocad055 is OK
- 10.1038/s41746-017-0013-1 is OK
- 10.1098/rstl.1763.0053 is OK
- 10.1038/163688a0 is OK
- 10.1126/science.168.3937.1345 is OK
- 10.2307/1934352 is OK
- 10.1890/06-1736.1 is OK
- 10.1098/rstb.2010.0272 is OK
- 10.1890/10-2402.1 is OK
- 10.1007/978-3-031-16749-2_13 is OK
- 10.1038/s41597-022-01721-8 is OK
- 10.48550/arXiv.2312.08912 is OK
- 10.1186/s40537-019-0197-0 is OK
- 10.1109/ICDAR.2003.1227801 is OK
- 10.1038/s41592-021-01302-4 is OK
- 10.48550/arXiv.2401.16298 is OK
- 10.1109/TPAMI.2024.3361979 is OK
- 10.1098/rsta.2021.0197 is OK
- 10.1056/NEJMra2214964 is OK
- 10.1109/CVPR46437.2021.01400 is OK
- 10.1016/j.artmed.2017.03.003 is OK
- 10.1109/TNN.2008.2005496 is OK
- 10.1038/s41598-019-42294-8 is OK
- 10.48550/arXiv.2203.09118 is OK
- 10.1007/s10994-022-06205-9 is OK
- 10.48550/arXiv.1805.03677 is OK
- 10.1109/WACV45572.2020.9093475 is OK
- 10.1109/TCYB.2021.3105696 is OK
- 10.1109/TCYB.2021.3054978 is OK
- 10.1162/EVCO_a_00102 is OK
- 10.1109/TSMCB.2012.2191953 is OK
- 10.1088/0031-9155/56/2/012 is OK
- 10.1109/TSMCB.2012.2206381 is OK
- 10.1109/TPAMI.2010.155 is OK
- 10.1109/TPAMI.2009.164 is OK
- 10.1109/TPAMI.2006.248 is OK
- 10.1109/AITEST52744.2021.00011 is OK
- 10.3390/app11020472 is OK
- 10.1007/978-981-19-8746-5_3 is OK
- 10.3390/jimaging6040016 is OK
- 10.1093/jamia/ocad055 is OK
- 10.1016/j.jcmg.2023.05.012 is OK
- 10.1002/uog.27503 is OK
- 10.1109/ITSC.2017.8317828 is OK
- 10.1093/pubmed/fdac031 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Beautiful soup documentation
- No DOI given, and none found for title: UCI Diabetes Data Set
- No DOI given, and none found for title: ucirvine/reuters21578 · Datasets at Hugging Face
- No DOI given, and none found for title: cernoch/movies
- No DOI given, and none found for title: reuters-text-categorization/reuters_loader.py at m...
- No DOI given, and none found for title: uci-ml-repo/ucimlrepo
- No DOI given, and none found for title: Top Sources For Machine Learning Datasets
- No DOI given, and none found for title: The UCI machine learning repository
- No DOI given, and none found for title: Beyond Size and Class Balance: Alpha as a New Data...
- No DOI given, and none found for title: Concept Learning and the Problem of Small Disjunct...
- No DOI given, and none found for title: On measures of entropy and information
- No DOI given, and none found for title: How to partition diversity
- No DOI given, and none found for title: Entropy and Diversity: The Axiomatic Approach
- No DOI given, and none found for title: Dropout: a simple way to prevent neural networks f...
- No DOI given, and none found for title: Dropout as a Bayesian Approximation: Representing ...
- No DOI given, and none found for title: Deep Bayesian Active Learning with Image Data
- No DOI given, and none found for title: MNIST handwritten digit database, 1998
- No DOI given, and none found for title: greylock: A Python Package for Measuring Th...
- No DOI given, and none found for title: Scikit-learn: Machine Learning in Python
- No DOI given, and none found for title: PyTorch: An Imperative Style, High-Performance Dee...
- No DOI given, and none found for title: Machine Learning for Imbalanced Data: Tackle imbal...
- No DOI given, and none found for title: You Only Condense Once: Two Rules for Pruning Cond...
- No DOI given, and none found for title: Tangent Prop - A formalism for specifying selected...
- No DOI given, and none found for title: Effective Training of a Neural Network Character C...
- No DOI given, and none found for title: Learning algorithms for classification: A comparis...
- No DOI given, and none found for title: The class imbalance problem: Significance and stra...
- No DOI given, and none found for title: Grokking: Generalization Beyond Overfitting on Sma...
- No DOI given, and none found for title: Exploiting Unlabeled Texts with Clustering-based I...
- No DOI given, and none found for title: A Survey of Data Optimization for Problems in Comp...
- No DOI given, and none found for title: A Comparative Survey of Deep Active Learning
- No DOI given, and none found for title: Image similarity using Deep CNN and Curriculum Lea...
- No DOI given, and none found for title: Self-Supervised Similarity Learning for Digital Pa...
- No DOI given, and none found for title: Deep Generative Models in the Real-World: An Open ...
- No DOI given, and none found for title: Image Complexity Guided Network Compression for Bi...
- No DOI given, and none found for title: Self-Supervised Learning with an Information Maxim...
- No DOI given, and none found for title: Understanding FAISS
- No DOI given, and none found for title: Self-Supervised Visual Representation Learning fro...
- No DOI given, and none found for title: A Bayesian Perspective of Convolutional Neural Net...
- No DOI given, and none found for title: The Minimum Information about CLinical Artificial ...
- No DOI given, and none found for title: Label-free segmentation from cardiac ultrasound us...
- No DOI given, and none found for title: Reduced, Reused and Recycled: The Life of a Datase...
- No DOI given, and none found for title: Are we cobblers without shoes? Making Computer Sci...

❌ MISSING DOIs

- 10.1126/science.zyhtj1t may be a valid DOI for title: UK Biobank releases half a million whole-genome se...
- 10.1109/icassp49357.2023.10095172 may be a valid DOI for title: Prune then distill: Dataset distillation with impo...
- 10.4135/9781412959384.n229 may be a valid DOI for title: A mathematical theory of communication
- 10.1109/isbi48211.2021.9434062 may be a valid DOI for title: MedMNIST Classification Decathlon: A Lightweight A...
- 10.1109/cvpr.2016.90 may be a valid DOI for title: Deep Residual Learning for Image Recognition
- 10.1007/978-3-642-25093-4_1 may be a valid DOI for title: KOIOS: Utilizing Semantic Search for Easy-Access a...

❌ INVALID DOIs

- https://doi.org/10.1016/j.envsoft.2016.10.006 is INVALID because of 'https://doi.org/' prefix
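
The entry above is flagged only because of its resolver prefix, so it can likely be fixed by storing the bare DOI in the bibliography. A minimal sketch of that normalization follows, assuming the checker expects values of the form `10.xxxx/...`; the helper name `bare_doi` is hypothetical and is not part of editorialbot or Lucie.

```python
import re

# Hypothetical helper: matches a leading resolver URL (doi.org or dx.doi.org)
# or a "doi:" label, case-insensitively, so it can be stripped off.
_RESOLVER_PREFIX = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE
)


def bare_doi(value: str) -> str:
    """Return the DOI without a leading resolver URL or 'doi:' label."""
    return _RESOLVER_PREFIX.sub("", value.strip())


# The flagged entry from the report, reduced to its bare form.
print(bare_doi("https://doi.org/10.1016/j.envsoft.2016.10.006"))
```

Values that are already bare DOIs pass through unchanged, so the helper could be applied to every `doi` field before regenerating the proof.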

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@editorialbot
Collaborator Author

Five most similar historical JOSS papers:

Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science
Submitting author: @blaiszik
Handling editor: @Fei-Tao (Active)
Reviewers: @duhd1993, @marshallmcdonnell
Similarity score: 0.6213

Rdataretriever: R Interface to the Data Retriever
Submitting author: @henrykironde
Handling editor: @fboehm (Active)
Reviewers: @rmhogervorst, @jsgalan
Similarity score: 0.6156

Autorank: A Python package for automated ranking of classifiers
Submitting author: @sherbold
Handling editor: @arfon (Active)
Reviewers: @JonathanReardon, @ejhigson
Similarity score: 0.6137

CGIMP: Real-time exploration and covariate projection for self-organizing map datasets
Submitting author: @adadiehl
Handling editor: @lpantano (Active)
Reviewers: @adriancbondia, @arunhpatil
Similarity score: 0.6074

AuDoLab: Automatic document labelling and classification for extremely unbalanced data
Submitting author: @ArneTillmann
Handling editor: @arfon (Active)
Reviewers: @linuxscout, @pps121
Similarity score: 0.6073

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

@crvernon

@editorialbot query scope

👋 @kenneth-ge - I am going to run this through review with our larger editorial board to ensure it meets our research software requirements. I'll be back in touch ASAP.

@editorialbot
Collaborator Author

Submission flagged for editorial review.

editorialbot added the query-scope (Submissions of uncertain scope for JOSS) label on Jan 17, 2025

@kenneth-ge

@crvernon Thanks sm! Lmk if there is anything you need me to do.

@crvernon

@editorialbot reject

@kenneth-ge - Thank you for your submission to JOSS. After further review with our larger editorial board, we have decided to reject your submission due to it not meeting our substantial scholarly effort requirements on the following grounds:

  • low lines of code count
  • short commit history

We wish you the best.

@editorialbot
Collaborator Author

Paper rejected.
