
print an informative output even if there is no validation issue #143

Closed
avallecam opened this issue Jul 30, 2024 · 4 comments · Fixed by #146

@avallecam
Member

avallecam commented Jul 30, 2024

Is your feature request related to a problem? Please describe.

When validation passes, the output of validate_linelist() is equivalent to the output of make_linelist(), so users get no indication that validation succeeded. It would be helpful to print a message saying that validation was successful.

library(outbreaks)
library(tidyverse)
library(linelist)

# cleaned data
ebola <- outbreaks::ebola_sim_clean %>% 
  pluck("linelist") %>% 
  as_tibble()

# messy data
ebola_messy <- ebola %>% 
  mutate(date_of_onset = as.character(date_of_onset))

# ebola %>% glimpse()
# linelist::tags_names()

# validate messy data - rejected
ebola_messy %>% 
  make_linelist(id = "case_id", date_onset = "date_of_onset") %>% 
  validate_linelist()
#> Error: Some tags have the wrong class:
#>   - date_onset: Must inherit from class 'integer'/'numeric'/'Date'/'POSIXct'/'POSIXlt', but has class 'character'


# validate cleaned data - passed
ebola %>%  
  make_linelist(id = "case_id", date_onset = "date_of_onset") %>% 
  validate_linelist()
#> 
#> // linelist object
#> # A tibble: 5,829 × 11
#>    case_id generation date_of_infection date_of_onset date_of_hospitalisation
#>    <chr>        <int> <date>            <date>        <date>                 
#>  1 d1fafd           0 NA                2014-04-07    2014-04-17             
#>  2 53371b           1 2014-04-09        2014-04-15    2014-04-20             
#>  3 f5c3d8           1 2014-04-18        2014-04-21    2014-04-25             
#>  4 6c286a           2 NA                2014-04-27    2014-04-27             
#>  5 0f58c4           2 2014-04-22        2014-04-26    2014-04-29             
#>  6 49731d           0 2014-03-19        2014-04-25    2014-05-02             
#>  7 f9149b           3 NA                2014-05-03    2014-05-04             
#>  8 881bd4           3 2014-04-26        2014-05-01    2014-05-05             
#>  9 e66fa4           2 NA                2014-04-21    2014-05-06             
#> 10 20b688           3 NA                2014-05-05    2014-05-06             
#> # ℹ 5,819 more rows
#> # ℹ 6 more variables: date_of_outcome <date>, outcome <fct>, gender <fct>,
#> #   hospital <fct>, lon <dbl>, lat <dbl>
#> 
#> // tags: id:case_id, date_onset:date_of_onset

Created on 2024-07-30 with reprex v2.1.0

Describe the solution you'd like
A printed message such as "All tagged variables are valid."
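A minimal base-R sketch of the requested behaviour, assuming a hypothetical helper `check_tag_classes()` that is not part of linelist: it stops with an error when a tagged column has the wrong class, and otherwise emits "All tagged variables are valid." via `message()` and returns the data unchanged, so it can still sit in the middle of a pipeline.

```r
# Hypothetical helper, not the linelist API: `tags` maps tag names to
# column names, `allowed` maps tag names to accepted classes.
check_tag_classes <- function(x, tags, allowed) {
  problems <- character(0)
  for (tag in names(tags)) {
    column <- x[[tags[[tag]]]]
    # inherits() is TRUE if the column has any of the allowed classes
    if (!inherits(column, allowed[[tag]])) {
      problems <- c(
        problems,
        sprintf("%s: has class '%s'", tag, class(column)[1])
      )
    }
  }
  if (length(problems) > 0) {
    stop(
      "Some tags have the wrong class:\n  - ",
      paste(problems, collapse = "\n  - "),
      call. = FALSE
    )
  }
  # Success: inform the user, then hand the data back unchanged.
  message("All tagged variables are valid.")
  x
}

df <- data.frame(
  case_id = c("d1fafd", "53371b"),
  date_of_onset = as.Date(c("2014-04-07", "2014-04-15")),
  stringsAsFactors = FALSE
)
tags <- list(id = "case_id", date_onset = "date_of_onset")
allowed <- list(
  id = c("character", "integer"),
  date_onset = c("integer", "numeric", "Date", "POSIXct", "POSIXlt")
)

res <- check_tag_classes(df, tags, allowed)
#> All tagged variables are valid.
```

Because the message is a condition rather than an error, the returned data can be piped on to the next step.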

Additional context
A similar request was made for the cleanepi package in epiverse-trace/cleanepi#150.

@Bisaloo
Member

Bisaloo commented Aug 8, 2024

What about the case where this is used in a pipeline and the user just wants the pipeline to continue if everything is valid?

Should this behaviour be controlled by an extra argument?
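One way such an argument could look, sketched with a hypothetical `quiet` argument whose default is read from a hypothetical `linelist_sketch.quiet` option (neither exists in linelist as far as this thread shows). Reading the default from an option lets users silence a long pipeline once instead of passing the argument at every call.

```r
# Hypothetical sketch: neither `quiet` nor the option name is part of
# linelist. The default comes from an option so that long pipelines
# can be silenced once, globally.
report_validation <- function(x,
                              quiet = getOption("linelist_sketch.quiet", FALSE)) {
  if (!isTRUE(quiet)) {
    message("All tagged variables are valid.")
  }
  x
}

df <- data.frame(case_id = c("a", "b"), stringsAsFactors = FALSE)

res1 <- report_validation(df)                # prints the message
res2 <- report_validation(df, quiet = TRUE)  # silent

old <- options(linelist_sketch.quiet = TRUE)  # silence every call
res3 <- report_validation(df)                 # silent
options(old)                                  # restore previous setting
```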

@avallecam
Member Author

> What about the case where this is used in a pipeline and the user just wants the pipeline to continue if everything is valid.
>
> Should this behaviour be controlled by an extra argument?

If everything is valid and we get a message similar to a warning (or just a neutral printed message), I think this will still allow the pipeline to continue, right?

I imagine something similar to this message:

#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

as in this reprex:

library(tidyverse)
starwars %>% 
  ggplot(aes(x = height)) +
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 6 rows containing non-finite outside the scale range
#> (`stat_bin()`).

Created on 2024-08-08 with reprex v2.1.0

@Bisaloo
Member

Bisaloo commented Aug 12, 2024

> If everything is valid and we get a message similar to a warning (or just a neutral printed message), I think this will still allow the pipeline to continue, right?

Yes, you're correct: the pipeline can continue. One problem with long pipelines, however, is that you get flooded with messages and they are no longer helpful.
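Both points can be checked in base R: `message()` signals a condition that does not interrupt evaluation, so the value keeps flowing through the pipe, and `suppressMessages()` silences every message emitted inside an expression. The `noisy_step()` helper below is hypothetical and stands in for a validation step; the native pipe `|>` assumes R >= 4.1.

```r
# Hypothetical pipeline step that reports success and passes data on.
noisy_step <- function(x, label) {
  message(label, ": all tagged variables are valid.")
  x
}

df <- data.frame(value = 1:3)

# Each step prints a message, yet the pipeline runs to completion.
out <- df |>
  noisy_step("step 1") |>
  noisy_step("step 2") |>
  transform(doubled = value * 2)

# Wrapping the whole pipeline silences every message at once.
quiet_out <- suppressMessages(
  df |>
    noisy_step("step 1") |>
    noisy_step("step 2") |>
    transform(doubled = value * 2)
)
```

This lets callers opt out of the messages at the pipeline level, without the package needing an extra argument, although it also hides any unrelated messages raised inside the wrapped expression.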

@chartgerink, any opinions on this? It is relevant for datatagr as well.

@chartgerink
Member

I don't see any problems with that 👍 Thanks for the suggestion @avallecam 😄
