curation_notes.md

SCWAReD Workset Curation and Hosting

Random sites to explore

Possible directory structure for hosted worksets

+ workset title
    + docs
        + index.html
        + introduction.md
        + project_report.md
        + images
    + worksets
        + workset.csv
        + workset.json
        + README-worksets.md
    + datasets
        + dataset1
            + dataset1.csv
            + README-dataset1.md
        + dataset2
            + dataset2.csv
            + README-dataset2.md
        + dataset3
            + dataset3.json
            + README-dataset3.md
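
As a rough illustration, the layout above could be scaffolded with a short script. This is only a sketch: the `scaffold` function and the `LAYOUT` list are hypothetical names, the file names are copied from the tree above, and the files are created empty as placeholders.

```python
from pathlib import Path

# Relative paths copied from the directory sketch above.
# A trailing "/" marks a directory; everything else is an empty placeholder file.
LAYOUT = [
    "docs/index.html",
    "docs/introduction.md",
    "docs/project_report.md",
    "docs/images/",
    "worksets/workset.csv",
    "worksets/workset.json",
    "worksets/README-worksets.md",
    "datasets/dataset1/dataset1.csv",
    "datasets/dataset1/README-dataset1.md",
    "datasets/dataset2/dataset2.csv",
    "datasets/dataset2/README-dataset2.md",
    "datasets/dataset3/dataset3.json",
    "datasets/dataset3/README-dataset3.md",
]


def scaffold(root, layout=LAYOUT):
    """Create the directories and empty placeholder files under `root`."""
    root = Path(root)
    created = []
    for entry in layout:
        path = root / entry
        if entry.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch(exist_ok=True)  # leaves existing files untouched
        created.append(path)
    return created
```

Calling `scaffold("workset-title")` would create the skeleton for one package; existing files are not overwritten, so it should be safe to re-run.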

See https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site for configuring a repository directory to serve pages at github.io.

Next steps

Workset file format

  • Fields we may want to use:

    • briefDescription
    • fullDescription
    • creator // name, email, affiliation. How to model in JSON?
    • created
    • publisher? HTRC?
    • title
    • description cf. briefDescription, fullDescription
    • languages
    • contributor? HTRC staff?
    • subject
    • extent
    • visibility
  • What does this message mean when downloading a workset: “The downloaded workset will contain only 906 volumes out of the total(919 volumes).”?

  • The workset CSV includes titles; the JSON does not. Titles would be useful for someone browsing the workset in the GitHub repository. Could we include the titles in the JSON-LD? We could ship both the JSON-LD and the CSV, but that seems confusing for users: two formats representing the same workset, each with different data. With titles in the JSON-LD, that one file would be complete: metadata about the workset plus the list of IDs with titles.
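
One possible answer to the two modelling questions above (how to represent `creator` in JSON, and how to get titles into the JSON file) is sketched below. Everything here is an assumption for illustration: the CSV column names `id` and `title`, the `volumes` key, the reading of `extent` as a volume count, and the `build_workset` helper are hypothetical and do not reflect the actual HTRC export format. The sample values are placeholders.

```python
import csv
import io
import json


def build_workset(metadata, csv_text, id_col="id", title_col="title"):
    """Combine workset metadata with (id, title) pairs read from CSV text.

    The column names are assumptions; adjust them to the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    volumes = [{"id": row[id_col], "title": row[title_col]} for row in reader]
    workset = dict(metadata)           # shallow copy, input is left unchanged
    workset["extent"] = len(volumes)   # assumption: extent = number of volumes
    workset["volumes"] = volumes
    return workset


# Placeholder metadata; creator is modelled as a nested object.
metadata = {
    "title": "Example workset",
    "briefDescription": "Placeholder description",
    "creator": {
        "name": "Example Scholar",
        "email": "scholar@example.org",
        "affiliation": "Example University",
    },
    "created": "2021-01-01",
    "languages": ["eng"],
    "visibility": "public",
}

# Placeholder CSV content standing in for the downloaded workset CSV.
csv_text = "id,title\nvol.001,First Example Title\nvol.002,Second Example Title\n"

workset = build_workset(metadata, csv_text)
print(json.dumps(workset, indent=2))
```

Modelling `creator` as a nested object keeps name, email, and affiliation together; a list of such objects would be a natural extension if a workset has several creators.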

Derived datasets

General idea: Come up with two or three derived datasets that we can generate for all SCWAReD projects. That way we know we can deliver that part of each package, and there will be some consistency among the SCWAReD packages. If scholars generate additional datasets, of course, we will use those as well. Also, we can re-use README.md files across all the packages.
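
The README re-use idea could be as simple as filling in a shared template per dataset. The following is a minimal sketch under that assumption; the template wording, the field names, and the `render_readme` helper are all hypothetical.

```python
from string import Template

# Shared README template; the wording and fields are placeholders.
README_TEMPLATE = Template(
    "# $dataset_name\n"
    "\n"
    "Derived dataset for the $project workset.\n"
    "\n"
    "- Type: $dataset_type\n"
    "- File: $filename\n"
)


def render_readme(project, dataset_name, dataset_type, filename):
    """Return README text for one derived dataset in one package."""
    return README_TEMPLATE.substitute(
        project=project,
        dataset_name=dataset_name,
        dataset_type=dataset_type,
        filename=filename,
    )


readme_text = render_readme(
    project="Example Project",
    dataset_name="dataset1",
    dataset_type="genre data",
    filename="dataset1.csv",
)
print(readme_text)
```

`Template.substitute` raises `KeyError` if a field is missing, which would catch an incomplete README before it is written into a package.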

Possibilities:

  • named entity data
  • geographic data w/ coordinates
  • genre data
  • sentiment data

Issues:

  • Get scholar approval to include these in packages.