Tenantized queries / collections. Scope queries by top level hierarchy. #570

joepio · 2023-02-01T09:31:56Z

Running a query is currently always done at a Server level. This makes sense for typical single-tenant deployments, but for server like atomicdata.dev, this can lead to suboptimal performance. Say you query all Folders. If a high percentage of these are private, the server will have to iterate over many items and check their rights. This is slow.

I think we should also index information about either the hierarchy or the drive. Let's explore some options.

Store the drive in index rows

Most (but not all) resources are scoped to a Drive. We could add a drive option to every QueryFilter. Most Atoms we index will have a drive. We'll need to derive the Drive either from the URL or by iterating over the parents. In every query filter, we store the drive, which causes it to be part of the index.

Similar to #288

Index parent

It would be nice if we could query the store using every single parent from a Resource. Let's say we have this nested structure:

a > b > c > d

We may want to find all Article resources inside b, which also includes resources that have their parent set to c. If we only find the direct children we'll not find d. We could find d like so:

Find all children using the existing indexes where parent: b. This will find c. Now, we also need to find all resources where parent: c. This will return d. Since this is a recursive process, we'll also try finding the children of d, which do not exist in this example. This entire process now took 3 lookups. Two with results, and one for the leafs. I think this should be pretty fast on its own, but it does more than n lookups, where n is the amount of leaves / results. If we need to do this for every single changed atom that is updated, things will be slow!

Now let's do an update. Let's say we're adding article d, and the Query looking for articles in b should be updated. How do we achieve this?

When we make the relevant change (e.g. change a parent from x to c), we create a new Atom. This Atom is turned into an IndexAtom. This new IndexAtom now has an array of parents. To do this, we need to know all the parents for this atom, the entire hierarchy. We find the direct parent, the parent above that, etc. With this list of parents, we can try all the QueryFilters, and check if the parents match.

Add hierarchy index

This is a new sled::Tree that stores hierarchical data. In here, each key is a Parent + (sub)Child. There is no value. Each row simply represents that that Child subject has the Parent as (in)direct parent.

Whenever we do add_atom_to_index (or remove), we check if the property is parent. If that's the case, we find all parents and then update these indexes to add the children to the list.

For every resource, we might have various index items. A child with 5 parents will have 5 index items. That's not necessarily a problem, as the items itself will be small.

I think we'll want two new methods on Store: get_parents and get_children. These should return an iterator, as the amount could be big.

The text was updated successfully, but these errors were encountered:

joepio added the performance Speed improvements label Feb 1, 2023

joepio mentioned this issue Mar 13, 2023

Sudomain support, register endpoint #505

Draft

19 tasks

joepio mentioned this issue Feb 1, 2024

Multi-tenancy #288

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tenantized queries / collections. Scope queries by top level hierarchy. #570

Tenantized queries / collections. Scope queries by top level hierarchy. #570

joepio commented Feb 1, 2023 •

edited

Loading

Tenantized queries / collections. Scope queries by top level hierarchy. #570

Tenantized queries / collections. Scope queries by top level hierarchy. #570

Comments

joepio commented Feb 1, 2023 • edited Loading

Store the drive in index rows

Index parent

Add hierarchy index

joepio commented Feb 1, 2023 •

edited

Loading