Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tenantized queries / collections. Scope queries by top level hierarchy. #570

Open
joepio opened this issue Feb 1, 2023 · 0 comments
Open
Labels
performance Speed improvements

Comments

@joepio
Copy link
Member

joepio commented Feb 1, 2023

Running a query is currently always done at a Server level. This makes sense for typical single-tenant deployments, but for server like atomicdata.dev, this can lead to suboptimal performance. Say you query all Folders. If a high percentage of these are private, the server will have to iterate over many items and check their rights. This is slow.

I think we should also index information about either the hierarchy or the drive. Let's explore some options.

Store the drive in index rows

Most (but not all) resources are scoped to a Drive. We could add a drive option to every QueryFilter. Most Atoms we index will have a drive. We'll need to derive the Drive either from the URL or by iterating over the parents. In every query filter, we store the drive, which causes it to be part of the index.

Similar to #288

Index parent

It would be nice if we could query the store using every single parent from a Resource. Let's say we have this nested structure:

a > b > c > d

We may want to find all Article resources inside b, which also includes resources that have their parent set to c. If we only find the direct children we'll not find d. We could find d like so:

Find all children using the existing indexes where parent: b. This will find c. Now, we also need to find all resources where parent: c. This will return d. Since this is a recursive process, we'll also try finding the children of d, which do not exist in this example. This entire process now took 3 lookups. Two with results, and one for the leafs. I think this should be pretty fast on its own, but it does more than n lookups, where n is the amount of leaves / results. If we need to do this for every single changed atom that is updated, things will be slow!

Now let's do an update. Let's say we're adding article d, and the Query looking for articles in b should be updated. How do we achieve this?

When we make the relevant change (e.g. change a parent from x to c), we create a new Atom. This Atom is turned into an IndexAtom. This new IndexAtom now has an array of parents. To do this, we need to know all the parents for this atom, the entire hierarchy. We find the direct parent, the parent above that, etc. With this list of parents, we can try all the QueryFilters, and check if the parents match.

Add hierarchy index

This is a new sled::Tree that stores hierarchical data. In here, each key is a Parent + (sub)Child. There is no value. Each row simply represents that that Child subject has the Parent as (in)direct parent.

Whenever we do add_atom_to_index (or remove), we check if the property is parent. If that's the case, we find all parents and then update these indexes to add the children to the list.

For every resource, we might have various index items. A child with 5 parents will have 5 index items. That's not necessarily a problem, as the items itself will be small.

I think we'll want two new methods on Store: get_parents and get_children. These should return an iterator, as the amount could be big.

@joepio joepio added the performance Speed improvements label Feb 1, 2023
@joepio joepio mentioned this issue Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Speed improvements
Projects
None yet
Development

No branches or pull requests

1 participant