From 4e251ea4e3a5f9ba4384b21325843515e88a1648 Mon Sep 17 00:00:00 2001
From: Luca Favatella
Date: Wed, 7 Jun 2017 13:21:28 +0100
Subject: [PATCH] Document required encoding of query parameters of search

## Solr

A note in the documented changes of Solr 4.1.0 regarding portability of Solr across Web containers points out that ["Query strings passed in via the URL need to be properly-%-escaped, UTF-8 encoded bytes, otherwise Solr refuses to handle the request"](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L3376-L3381).

A note in the documented changes of Solr 4.5.0 mentions that the encoding of query parameters can be set with the [`ie` parameter](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L1995-L1997) (e.g. [`ie=iso-8859-1`](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/test/org/apache/solr/servlet/SolrRequestParserTest.java#L249)), that the encoding of a POST request body can be set with the [`Content-Type` header](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L1997-L1998) (e.g. [`application/x-www-form-urlencoded; charset=iso-8859-1`](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/test/org/apache/solr/servlet/SolrRequestParserTest.java#L251)), and that [UTF-8 is the default encoding](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L1997).
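
As a rough illustration of the two notes above, the following sketch uses Python's standard `urllib.parse` to show how the same search term percent-encodes differently under UTF-8 and under ISO-8859-1. The term `café` and the parameter name `q` are assumptions made for the example and are not taken from the Solr sources linked here. Only the `ie` parameter comes from the change notes.

```python
from urllib.parse import quote, urlencode

# Hypothetical search term containing one non-ASCII character.
term = "café"

# Default case: percent-escaped UTF-8 bytes ("é" is the two bytes C3 A9).
utf8_value = quote(term, encoding="utf-8")

# Non-default case: ISO-8859-1 bytes ("é" is the single byte E9).
latin1_value = quote(term, encoding="iso-8859-1")

# Full query strings. The parameter name "q" is an assumption for this sketch;
# "ie" declares the non-default input encoding, as in the 4.5.0 change note.
utf8_query = urlencode({"q": term}, encoding="utf-8")
latin1_query = urlencode({"q": term, "ie": "iso-8859-1"}, encoding="iso-8859-1")

print(utf8_value)    # caf%C3%A9
print(latin1_value)  # caf%E9
print(utf8_query)    # q=caf%C3%A9
print(latin1_query)  # q=caf%E9&ie=iso-8859-1
```

A client that sends the ISO-8859-1 form without declaring it would have its bytes read as UTF-8, which is the situation the documentation change is meant to prevent.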

As of Solr 4.10.4, UTF-8 is still the [default](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L345-L348) encoding for both [query parameters](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L248) and [POST request body](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L602-L606).

## Riak Search

[The version of yokozuna in Riak KV 2.2.3 is 2.1.10](https://github.com/basho/riak/blob/riak-2.2.3/rebar.config#L24), [which integrates Solr 4.10.4](https://github.com/basho/yokozuna/blob/2.1.10/tools/grab-solr.sh#L21) (see also https://github.com/basho/yokozuna/pull/709/commits/7f0d464b9190ee6db115aa4bfcd38f6407791e4a), whose documentation is available [online](https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf).

[Yokozuna 2.1.10 depends on riak_kv 2.1.7](https://github.com/basho/yokozuna/blob/2.1.10/rebar.config#L14), which [via](https://github.com/basho/riak_kv/blob/2.1.7/rebar.config#L38) [riak_api 2.1.6 depends on basho/webmachine 1.10.8-basho1](https://github.com/basho/riak_api/blob/2.1.6/rebar.config#L6). Webmachine contains, e.g., module [`wrq`](https://github.com/basho/webmachine/blob/1.10.8-basho1/src/wrq.erl) and [depends on mochiweb v2.9.0p2](https://github.com/basho/webmachine/blob/1.10.8-basho1/rebar.config#L9), which contains, e.g., module [`mochiweb_util`](https://github.com/basho/mochiweb/blob/v2.9.0p2/src/mochiweb_util.erl).

When receiving a [search request](https://docs.basho.com/riak/kv/2.2.3/developing/api/http/search-query/#request), yokozuna [calls the `search` function](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_wm_search.erl#L58), which [extracts](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_wm_search.erl#L125) [the](https://github.com/basho/webmachine/blob/1.10.8-basho1/src/wrq.erl#L111) [query](https://github.com/basho/webmachine/blob/1.10.8-basho1/src/wrq.erl#L68-L70) ([percent-decoded, but with no further decoding, e.g. of Unicode](https://github.com/basho/mochiweb/blob/v2.9.0p2/src/mochiweb_util.erl#L202-L203)), then [appends some parameters related to distributed search](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L323), then [percent-encodes the parameters (with no further encoding, e.g. of Unicode)](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L330) and [contacts Solr via a POST request](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L334), [setting the `Content-Type` header to `application/x-www-form-urlencoded`](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L332). As this `Content-Type` header specifies no charset, Solr interprets the POST body as UTF-8.
---
 content/riak/kv/2.2.3/developing/api/http/search-query.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content/riak/kv/2.2.3/developing/api/http/search-query.md b/content/riak/kv/2.2.3/developing/api/http/search-query.md
index 1ce6ecec77..f5fa5c7b5e 100644
--- a/content/riak/kv/2.2.3/developing/api/http/search-query.md
+++ b/content/riak/kv/2.2.3/developing/api/http/search-query.md
@@ -25,6 +25,8 @@ GET /search/query/
 ## Optional Query Parameters
+Query parameters must be UTF-8 encoded.
+
 * `wt` --- The [response writer](https://cwiki.apache.org/confluence/display/solr/Response+Writers) to be used when returning the Search payload. The currently