Specifying CrawlDB in Config

Kevin Yan edited this page Mar 13, 2021 · 6 revisions

General note on config file modifications:

The config file in question is sparkler-default.yaml. However, there are 3 sparkler-default.yaml config files in use:

Changes to the config file should be made across all 3 files for consistency.
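One way to catch drift between the copies is to compare their settings key by key. The following is a minimal sketch, not part of Sparkler: the function name `find_inconsistent_keys` and the labels `copy_a`, `copy_b`, `copy_c` are hypothetical, and it assumes each file has already been loaded into a flat dictionary of settings.

```python
def find_inconsistent_keys(configs):
    """Return {key: {label: value}} for every key whose value differs.

    `configs` maps a label for each config copy to its settings,
    given as a flat dict (an assumption made for this sketch).
    A key missing from one copy is reported with the value None.
    """
    all_keys = set()
    for settings in configs.values():
        all_keys.update(settings)

    mismatches = {}
    for key in sorted(all_keys):
        values = {label: settings.get(key) for label, settings in configs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Hypothetical labels standing in for the three sparkler-default.yaml copies.
configs = {
    "copy_a": {"crawldb.backend": "solr",
               "solr.uri": "http://localhost:8983/solr/crawldb"},
    "copy_b": {"crawldb.backend": "elasticsearch",
               "solr.uri": "http://localhost:8983/solr/crawldb"},
    "copy_c": {"crawldb.backend": "solr",
               "solr.uri": "http://localhost:8983/solr/crawldb"},
}
mismatches = find_inconsistent_keys(configs)
```

Here `mismatches` contains only 'crawldb.backend', because 'solr.uri' is identical in all three copies.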

Specifying which crawldb to use

The section of the config file pertaining to crawldb is set up as follows (subject to change):

  crawldb.backend: solr

  solr.uri: http://localhost:8983/solr/crawldb
  elasticsearch.uri: http://localhost:9200

The 'crawldb.backend' field specifies which crawldb to use. Its value must match the prefix of one of the '*.uri' fields (for example, 'solr' for 'solr.uri'). The following specifies elasticsearch as the crawldb to use:

  crawldb.backend: elasticsearch

  solr.uri: http://localhost:8983/solr/crawldb
  elasticsearch.uri: http://localhost:9200
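The matching rule between 'crawldb.backend' and the '*.uri' fields can be illustrated with a short check. This is only a sketch of the rule as described here, not Sparkler's own loading code: `resolve_crawldb_uri` is a hypothetical helper, and the config is assumed to be available as a flat dictionary keyed by the field names shown above.

```python
def resolve_crawldb_uri(config):
    """Return the URI of the backend named by 'crawldb.backend'.

    Raises ValueError if the backend is unset or has no
    matching '<backend>.uri' entry in the config.
    """
    backend = config.get("crawldb.backend")
    if not backend:
        raise ValueError("'crawldb.backend' is not set")

    uri_key = f"{backend}.uri"
    if uri_key not in config:
        # List the backends that do have a '*.uri' entry to aid debugging.
        known = sorted(k[:-len(".uri")] for k in config if k.endswith(".uri"))
        raise ValueError(f"no '{uri_key}' entry; known backends: {known}")
    return config[uri_key]


# Same values as the elasticsearch example above.
config = {
    "crawldb.backend": "elasticsearch",
    "solr.uri": "http://localhost:8983/solr/crawldb",
    "elasticsearch.uri": "http://localhost:9200",
}
crawldb_uri = resolve_crawldb_uri(config)
```

With this config, `crawldb_uri` is the value of 'elasticsearch.uri'. Setting 'crawldb.backend' to a name with no corresponding '*.uri' field raises an error instead of silently falling back to another backend.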

Adding a crawldb to the config file

To add a crawldb to this config file, add a '<name>.uri' field with its URI and set 'crawldb.backend' to that name. The following example uses a hypothetical crawldb called 'testdb'.

  crawldb.backend: testdb

  solr.uri: http://localhost:8983/solr/crawldb
  elasticsearch.uri: http://localhost:9200
  testdb.uri: http://localhost:9999  # replace http://localhost:9999 with the appropriate URI