RDF Dataset Description #110
There are a lot of topics in your email. I'm not sure I understand all of them correctly. (Please forgive me if I get something wrong.)

I think (I could easily be wrong) that at one point you suggest that one request will return the RDF info for all datasets in a given ERDDAP. In general, I think that is a bad idea. Many ERDDAPs have a large number of datasets (3,000 to 30,000), so this response would be huge. If you envision some external system polling this frequently (every 15 minutes) to maintain all the metadata from a given ERDDAP, that seems like a very inefficient system that will take up lots of ERDDAP resources (especially given that most datasets won't have changed in that time period). It would be better to build an external system that subscribes to all of an ERDDAP's datasets so that it can then request metadata just for datasets that change, and immediately after they have changed. Subscriptions (plus subsequent requests for the related metadata file for a dataset) are by far the most efficient (fastest, with minimal data transfer) way to detect changes to a given dataset.

I understand that there will always be new ways of formatting each dataset's metadata and that different users prefer different formats. That's a fundamental feature of ERDDAP. I see that you would find your RDF format variants useful. Okay. So I see the value in making new response format(s) for the erddap/info/ or erddap/metadata/ system in ERDDAP. These files could then be requested by clients when the dataset has changed. So that sounds like a good addition to ERDDAP.

I don't understand all of your fancy header options. What do they provide that simple, direct requests (e.g., give me the .ttl file for this dataset) don't?

So I pushed back on some of your ideas. I suspect my suggestions aren't what you want to hear. Please tell me more and give me use-case examples, so that I understand what you want, and why, and why your approach is needed.

Best wishes.
Hey there,
I've been working on an RDF-based way to describe ERDDAP datasets using DCAT, for a European project (FAIR-EASE).

The main idea is to create an external client that harvests a list of ERDDAP servers (or data providers in general) and builds a database (in our case a triplestore) containing the useful information about every public dataset: its description, where to find it, details such as the subsetting URL, etc.

We could then query this triplestore to find the dataset best suited to our needs (based on its description) and get its URL (plus, ideally, the associated subsetting URL, so we retrieve only the minimal amount of data).

For instance, if we are looking for a dataset covering a certain spatial region and time range, the client would ask the triplestore for every dataset matching these constraints (across the multiple data provider servers) and return the corresponding URLs and, if possible, the best-suited subsetting URLs.
I'm using an external RDF library (Apache Jena) to generate an in-memory graph and serialize it.
See: https://jena.apache.org/tutorials/rdf_api.html
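The fork itself uses Jena (a Java library), but the underlying idea (keep an in-memory set of triples, then serialize it on demand) can be sketched in a few lines of plain Python. Everything below is illustrative only, not the Jena API and not the fork's actual model:

```python
# Minimal illustration of an in-memory RDF graph: a set of
# (subject, predicate, object) triples plus a naive N-Triples serializer.
# This is a sketch of the concept only; the real fork uses Apache Jena.

def serialize_ntriples(triples):
    """Serialize (s, p, o) triples to N-Triples. Objects starting
    with 'http' are written as IRIs, everything else as literals."""
    lines = []
    for s, p, o in sorted(triples):
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
ds = "https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day"

# Placeholder metadata values, for illustration only.
graph = {
    (ds, DCT + "title", "Illustrative dataset title"),
    (ds, DCAT + "landingPage", ds + ".html"),
}

print(serialize_ntriples(graph))
```

Jena's `Model` plays the role of the `graph` set here, and its `write` methods take the place of the serializer, with all the RDF formats listed below supported out of the box.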
This is not intended to replace the `.das` or `.dds` formats; rather, it lets a user (or bot) get an RDF description for every dataset.

Currently, the RDF model looks like this:
We can now generate these file formats:

- `.jsonld`: JSON-LD
- `.n3`: Notation3
- `.nt`: N-Triples
- `.nq`: N-Quads
- `.rdfxml`: RDF/XML
- `.trig`: TriG
- `.ttl`: Turtle

For example, for the dataset https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day.html
if you ask for the RDF model in Turtle, http://localhost:8080/erddap/griddap/erdMWcflh1day.ttl would return the dataset's DCAT description in Turtle.
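The actual response isn't reproduced here. As a rough, hand-written illustration of what a DCAT description in Turtle can look like (the property choices and every value below are placeholders, not the fork's real output):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day>
    a dcat:Dataset ;
    dct:title "Illustrative dataset title" ;
    dct:description "Illustrative description." ;
    dcat:landingPage <https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day.html> ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:accessURL <https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day> ;
        dct:format "application/x-netcdf"
    ] .
```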
Another important part is being able, from a single URL, to access the RDF description of every available and accessible dataset, while keeping some control over the number of results.
For that, I've created two URLs. The first,

`/{warName}/info/catalog.{RDF format}?page={..}&itemsPerPage={itemsPerPage}`

lists and wraps the RDF descriptions of the first `{itemsPerPage}` datasets (sorted by the `latestModifiedDate`) inside an RDF `dcat:Catalog`.

I also created another URL (even if it's technically the same),

`/{warName}/info/catalog.{RDF format}`

which contains and lists every available

`/{warName}/info/catalog.{RDF format}?page={..}&itemsPerPage={..}`

URL.

With this, from a single static URL, a user/bot can navigate by itself and get every dataset without any prior knowledge of the server, retrieving only the minimal information it needs.
Another cool feature would be, depending on the headers of the request, to redirect it to the corresponding URL (a kind of content negotiation):

```shell
curl -L -X GET https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day.html -H "Accept: */*"
```

-> returns the default HTML page

```shell
curl -L -X GET https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day.html -H "Accept: text/turtle"
```

-> gets redirected to https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWcflh1day.ttl
-> returns the Turtle content
I have already done this for the default RDF formats, and created these redirections:

- `/tabledap|griddap/{datasetid}.{format}` with an RDF Accept header -> `/tabledap|griddap/{datasetid}.{corresponding RDF format}`
- `/info/index.html` with an RDF Accept header -> `/info/catalog.{corresponding RDF format}`
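The header-based redirection can be sketched as a lookup from Accept media types to ERDDAP file-type extensions (the media types below are the standard RDF registrations; the lookup itself is a simplification of what the fork does and ignores q-values and wildcards):

```python
# Map RDF media types from an Accept header to the matching ERDDAP
# extension; return None when no RDF type is requested, in which case
# the server serves the default page. Simplified sketch only.

RDF_MEDIA_TYPES = {
    "application/ld+json": ".jsonld",
    "text/n3": ".n3",
    "application/n-triples": ".nt",
    "application/n-quads": ".nq",
    "application/rdf+xml": ".rdfxml",
    "application/trig": ".trig",
    "text/turtle": ".ttl",
}

def rdf_extension_for(accept_header):
    """Return the extension for the first matching RDF media type."""
    for media_type in accept_header.split(","):
        ext = RDF_MEDIA_TYPES.get(media_type.strip().split(";")[0])
        if ext is not None:
            return ext
    return None

print(rdf_extension_for("text/turtle"))             # .ttl
print(rdf_extension_for("text/html, text/turtle"))  # .ttl
print(rdf_extension_for("*/*"))                     # None
```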
All changes can be found in this ERDDAP fork : https://github.com/vliz-be-opsci/FAIR-EASE-erddap
There's also a complete, runnable Docker build environment inside, but that is not the main topic.
I don't think we should see this as "the final" or optimal solution, but I really think it can be a pretty good starting point for thinking about the next step toward a more FAIR ERDDAP.