-
The warning "No index available for this query" is the explanation. Without an index, Mango reads the entire database on every query. That's fine for prototyping and ad hoc queries, but not for your use case. Build an index (the "_index" endpoint) for this field and you'll see a huge improvement.
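For reference, here is a minimal sketch of what creating such an index over HTTP could look like. The server URL, the database name `iot_test`, and the index name are hypothetical; the field name `randomString` is taken from the original post. Authentication is left out, so a real install will likely need credentials added. Keep in mind that a JSON index speeds up equality and range selectors; a `$regex` condition is still evaluated against every candidate document.

```python
import json
import urllib.request


def build_index_request(base_url, db, field):
    """Build (but do not send) a POST to /{db}/_index for a single field."""
    body = {
        "index": {"fields": [field]},
        "name": f"{field}-index",  # hypothetical index name
        "type": "json",
    }
    return urllib.request.Request(
        url=f"{base_url}/{db}/_index",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_index(base_url, db, field):
    """Send the request and return the decoded JSON response."""
    req = build_index_request(base_url, db, field)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a local test server this would be called as `create_index("http://localhost:5984", "iot_test", "randomString")`.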
-
Hi Todd! Good to hear from you. Putting all the work of the actual DBMS aside for a moment, I was curious about the latency floor that we could expect from the
Now, CouchDB's regular expression operator hasn't seen a lot of optimization work. In particular, it does not compile the expression, which one would expect to be useful when executing the RE 200k times in a loop! I tried doing that and was a little surprised to find that the compiled regular expression still took 2500 +/- 200 milliseconds to execute. Not too much improvement.
On the other hand, your use case doesn't actually need the full might of PCRE. Erlang does have highly optimized code for matching parts of binary strings, so I tried using
Of course that still leaves 70% of the execution time outside of the regular expression execution, presumably taken up with the process of running a table scan on all the document data. IIRC CouchDB doesn't do any predicate pushdown, which probably drives up the overhead a lot for a query like this that scans a lot of data to return a small result set.
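To make the regex-versus-substring point concrete, here is a rough sketch of the same kind of comparison. It is written in Python, not the Erlang code path CouchDB actually runs, so it only illustrates the idea and its timings say nothing about CouchDB itself. The helper names, string length, and document count are all assumptions for illustration.

```python
import random
import re
import string
import time


def make_values(n, seed=0, length=32):
    """Generate n pseudo-random lowercase strings (stand-ins for a document field)."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_lowercase, k=length)) for _ in range(n)]


def count_compiled(values, needle):
    """Count matches using a regular expression compiled once, outside the loop."""
    rx = re.compile(re.escape(needle))
    return sum(1 for v in values if rx.search(v))


def count_substring(values, needle):
    """Count matches using a plain substring test, with no regex engine involved."""
    return sum(1 for v in values if needle in v)


def compare(n=200_000, needle="abc"):
    """Time both approaches over n strings and return (regex_seconds, substring_seconds)."""
    values = make_values(n)
    t0 = time.perf_counter()
    count_compiled(values, needle)
    t1 = time.perf_counter()
    count_substring(values, needle)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

Both counters return the same number for a literal search string; only the matching machinery differs, which is the point being made about not needing full PCRE for a plain "contains" test.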
-
@kocolosk wrote
Is this internal to CouchDB, or is it something we can declare in our queries for testing?
-
Is the size shown here accurate in terms of disk footprint? I am trying to understand the impact of sharding (single node) for an IoT-like deployment. Same question for memory.
-
Not directly applicable, but I made a little demo of what adding string/array/date manipulation functions to the Mango indexer could look like, which would help speed up this query: https://gist.github.com/janl/e5469f6f08c9be0405f31451889d5030
-
My team has selected CouchDB for our persistence tier because we have strong requirements for highly scalable data replication in an IoT-style deployment. The IoT device has very limited hardware (disk/RAM), so we also want to use the DB for queries (not just as a data transport).
We have created a very simple application to test the speed of LIKE-style queries. Note that both test runs were done on a development machine using the standard installs of CouchDB and MongoDB on a Mac, so the results reflect vanilla, out-of-the-box configurations.
To validate the scenarios, we created a 200K-document DB (example document looks like this):
We want to be able to do very basic pattern matches (basically the SQL equivalent of %SEARCH_STRING%), but the performance we are seeing with CouchDB (Mango) is extremely slow (9 seconds) compared with MongoDB (400 ms). I see that the community typically suggests Lucene, but its overhead does not fit the footprint requirements of this IoT scenario.
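For readers who cannot see the screenshots, a `%SEARCH_STRING%` match in Mango is typically written as an unanchored `$regex` selector. The sketch below only builds the JSON body for a `POST /{db}/_find` request; the helper names and the `limit` value are hypothetical, and `randomString` is the field mentioned later in this post. The search text is escaped with Python's `re.escape`, on the assumption that those escapes are also understood by CouchDB's regex engine.

```python
import json
import re


def like_selector(field, search):
    """Mango selector matching documents whose field contains `search` anywhere."""
    # An unanchored regex behaves like SQL's LIKE '%search%'.
    return {field: {"$regex": re.escape(search)}}


def build_find_body(field, search, limit=25):
    """JSON-serializable body for POST /{db}/_find."""
    return {"selector": like_selector(field, search), "limit": limit}


def to_json(field, search, limit=25):
    """Serialize the body the way it would be sent over HTTP."""
    return json.dumps(build_find_body(field, search, limit))
```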
CouchDB Results
![image](https://user-images.githubusercontent.com/66785012/141385781-962b4924-1cc0-44cc-8259-b73d9ce3cdd1.png)
![image](https://user-images.githubusercontent.com/66785012/141325567-75c3bc62-cb0f-462a-ab25-5c996175238e.png)
(updated scenario since the original screenshot had the wrong query)
If we compare this against a very similar query in MongoDB, the query completes in milliseconds. (Note that randomString is randomly generated, so the values are not the same in both DBs, but the scripts that populate the DBs are essentially the same.)
![image](https://user-images.githubusercontent.com/66785012/141292883-7bf38a2c-da8b-4b09-8df9-41d1bb6cb1ba.png)