Coin and document effective ACL URI #325

Open
bblfish opened this issue Oct 18, 2021 · 15 comments

@bblfish
Contributor

bblfish commented Oct 18, 2021

There are some very good reasons to have default ACLs (WAC). There are also very good reasons for each resource to have its own ACL (ACP). How can these two points of view come together?
Simply by allowing LDPRs to distinguish between two ACL resources:

  1. A link to the effective Access Control Resource
  2. A link to the resource's potential Access Control Resource

Indeed the current WAC spec states that

a separate link relation type targeting the effective ACL resource is allowed, but no behaviour is defined by this specification.

Please specify this officially!

Trellis has defined http://www.trellisldp.org/ns/trellis#effectiveAcl. It should be possible to have one that we all agree on in a solid namespace.

Problems solved

The problems are:

  • If only 1) is given, there is no way to create subspaces on a server with different access control rules, as there is no way to work out where the resource's own ACL would be.
  • If only 2) is given but the default ACL needs to be discovered (as per current WAC), then the cost of finding the effective ACL is 2n+1 requests, where n is the number of folders between the resource and the one holding the effective ACL. This is unworkable.
  • If every resource has to have 2) as its effective ACL resource, as ACP does, then we have a problem of wastage and maintenance. This duplicates information, as per issue 206 of the authorization panel. In all those cases where a default set of rules would do for most content, every resource ends up with a copy of pretty much the same ACL. This makes updating them a problem, fills intermediate caches with duplicate information, and slows down access-controlled requests by requiring essentially one more request than needed.

The answer to all of these problems is simply to coin a new Link relation, as discussed in last week's authorization panel meeting:
https://github.com/solid/authorization-panel/blob/draft-minutes/meetings/2021-10-13.md

@csarven
Member

csarven commented Oct 18, 2021

Can we move any new information in this issue to solid/web-access-control-spec#99 and then close this issue? or is the request in this repo to have one that can be used by any access control system?

@bblfish
Contributor Author

bblfish commented Oct 18, 2021

I would like this to be open until the name is decided on.

When I came to the solid specification meeting a couple of weeks ago, I mentioned to @timbl that there was a very simple issue that could start bringing WAC and ACP together. Whoever wants to adopt this is welcome to do that. But because it could be adopted by both, and it just needs an agreement on the name, I think it would help for it to have wider visibility.

@acoburn
Member

acoburn commented Oct 18, 2021

w.r.t the Trellis implementation (since it is mentioned in the issue), I encountered a number of challenges with support for "effective ACL" links. Ultimately, I concluded that it is not a great feature, and if I had more time, I would remove that feature from the Trellis codebase altogether. In short, please don't use its existence in Trellis as justification for adding it to Solid.

@bblfish
Contributor Author

bblfish commented Oct 18, 2021

@acoburn you gave some reasons in the 2n+1 thread about why you think that saving 2n+1 requests may be problematic, but I did not find those reasons convincing.

Currently because the "acl" link header points to the non-existent ACL in WAC, every client will need to discover the effective ACL with 2n+1 requests. That is a crazy waste of resources. If one wanted to make WAC fail one would propose an architecture like that.

@acoburn
Member

acoburn commented Oct 18, 2021

@bblfish I have very little interest in arguing this point. We have different perspectives, and that is fine. I am simply trying to clarify that if your argument starts by pointing at prior art in Trellis, I would urge caution. My implementation experience writing client applications that would interact with that property led me to the conclusion that this is not a great idea.

@bblfish
Contributor Author

bblfish commented Oct 19, 2021

Well there is a default currently and that is to use http://www.trellisldp.org/ns/trellis#effectiveAcl. The WAC spec should mention that, since its aim is to describe existing implementations. I am ok to implement that.

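@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix trellis: <http://www.trellisldp.org/ns/trellis#> .

# Assumed bindings: the bare ":" terms are RDFS terms and "t:" is the Trellis namespace.
trellis:effectiveAcl a rdf:Property ;
    rdfs:comment "The ACL that currently controls the resource in question."@en ;
    rdfs:isDefinedBy trellis: ;
    rdfs:label "effective ACL"@en ;
    vs:term_status "working-draft" .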

Do any clients rely on that?

@bblfish
Contributor Author

bblfish commented Oct 19, 2021

Evidence I'd accept in this case is actual performance measurements of applications.

The simple maths and experience writing web applications for 25 years is quite enough for me. Requests to web servers are expensive. Here is a helpful page on Latency numbers every programmer should know from 11 years ago. Remarkably on that page you will find that if you humanise computer time with 1 clock cycle being a human heart beat, then sending a packet to the Netherlands from California and back would take 4 years. Furthermore, the speed of light is a hard limit, so it ain't going to get any faster.

When I worked at AltaVista the further under 1 second we could get responses the more queries we would get. And we would do all to avoid unnecessary requests.

Imagine you have a simple setup, say a Web Science server storing billions of pictures from satellites peering into deep space. Imagine it has a very simple set of default rules. Perhaps the syntax for such rules could even evolve a little so that they get tied to tags on the documents, allowing a lot of flexibility. But we'd have a root default for all the content.

If you keep that in mind, you will see that the WAC Link system seems very broken: you have a Link: <doc.acl>; rel="acl" header that will point for most resources to a non-existent resource on all servers that have pretty simple access control rules. So for every request to a resource that is access controlled, the client will first end up getting a link to a non-existent acl.

The problems are numerous. To start with, how is that going to improve the intuitiveness of working with ACLs if most links are broken?

The fix for this is extremely simple, has already been proposed and is deployed.

@bblfish
Contributor Author

bblfish commented Oct 19, 2021

And we've verified experimentally that dozens of requests per second are not an issue:

Does that study deal with clients needing to do access control? I could not find one word in the first pdf about "access control" or related key words I tried.

In any case, having a link that will in many situations always fail does not seem to me to follow the intuition we had about the Link: <doc.acl>; rel="acl" header, nor to be a good design. Until recently the wiki based spec stated:

If such a resource does not exist, a client SHOULD NOT search for, or interact with, the inherited ACLs from an upstream container.

So it is not even entrenched behaviour, and could easily be fixed now.

@bblfish
Contributor Author

bblfish commented Oct 19, 2021

Does that study deal with clients needing to do access control?

No, with an even more complex Linked Data case that involves much more than 2n+1 and we can pull it off just fine.

But you are not writing apps for end users there. You are writing a crawler or something like that. The AltaVista crawler could take 3 months to collect the whole web around 1999, and I am sure Google crawlers go on for the whole year. Also those crawlers take into account dead links: too many dead links on a page and it is shelved at the bottom of the index.

But the apps we are talking of here are not crawlers. They need quick feedback to users, whose data will be pointing all over the web of linked data. I remember @timbl complaining that 303 redirects were slowing down the tabulator, and there we had 1 redirect. Here we have 2n+1 extra requests needed!

The fact that it is a 404 is seen as a feature in Solid; a signal that there is no resource-specific ACL and that the agent is able to create one at that place if they so desire.

I have nothing against 404s. But if, as in my Web Science example, you have 100 billion resources on a server, each with an acl link pointing to a non-existent resource, then it may be quite understandable if developers come to the conclusion that "acl" stands for "a corrupt link".

This is also about follow your nose. The name matters here: "potentialACLocation" would clue developers in. And indeed for the potential Link I think having a name identified by a URL would be very important, as that would give developers at least one link to follow when they find that the resource linked to does not exist. So we should really have

Link: </default.acl>; rel="acl"
Link: </2021/foo/bar/baz/2093supernovae.acl>; \
      rel="http://www.w3.org/ns/auth/acl#potentialAccessControlResource"
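A minimal sketch of how a client might read these two headers, assuming the relation names shown in the example (the potentialAccessControlResource relation is only a proposal in this thread, not a defined one). The helper names are hypothetical, and the regular expression only handles the simple `<target>; rel="..."` form used here, not the full Link header grammar.

```python
import re

ACL_REL = "acl"
# Proposed in this thread only; not a defined or registered relation.
POTENTIAL_REL = "http://www.w3.org/ns/auth/acl#potentialAccessControlResource"

# Matches <target>; rel="value" where rel is the first parameter.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_links(header_values):
    """Map each rel value to its target for a list of Link header values."""
    links = {}
    for value in header_values:
        for target, rels in _LINK_RE.findall(value):
            # A rel attribute may carry several space-separated relation types.
            for rel in rels.split():
                links[rel] = target
    return links


def acl_targets(header_values):
    """Return (effective, potential) ACL targets.

    If no separate potential link is given, the two are assumed to coincide.
    """
    links = parse_links(header_values)
    effective = links.get(ACL_REL)
    potential = links.get(POTENTIAL_REL, effective)
    return effective, potential


headers = [
    '</default.acl>; rel="acl"',
    '</2021/foo/bar/baz/2093supernovae.acl>; '
    'rel="http://www.w3.org/ns/auth/acl#potentialAccessControlResource"',
]
effective, potential = acl_targets(headers)
```

With such headers the client can fetch the effective ACL directly and still knows where a resource-specific ACL could be created.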

we need server implementers in favor

+1 I am in favour of two links, as the author of Reactive-Solid, for the reasons given at the top.

But really you need client implementors that are dealing with access control rules, because they will be the ones noticing the slowness in their applications. As it happens I am also working this month on a client implementation to demonstrate access control.

@bblfish
Contributor Author

bblfish commented Oct 20, 2021

And you're at liberty to add them; perhaps this can help gather the evidence about what performance difference it makes for actual end user applications.

Ok, so that is where I was trying to get us to: to help find a first compromise between WAC and ACP; to bring them together so that we don't end up having two access control systems.

The idea is that we should keep the "acl" relation working the obvious way, pointing to the "Active Control List", and have a new relation point to the Access Control Potential (ACP).

more here: #326

@bblfish
Contributor Author

bblfish commented Oct 22, 2021

@RubenVerborgh wrote:

Let's not do premature performance optimization; we should first demonstrate that there is a problem.

I wonder if you have misunderstood what the 2n+1 problem is. I went through it in extreme detail in WAC issue 99, carefully laying out each HTTP request needed there. You may not have read it.

The reason I wonder is because in WAC issue 97 you are pushing for a change to the spec giving efficiency reasons that are much much smaller than the one described here and much more difficult to evaluate empirically.

Your problem in WAC-97 is that the auth system may need to make 1 more request between the Solid back-end and the LDP server.

Note that:

  1. even though the access control layer and backend need not be tightly coupled, they are a lot more coupled than a hyper-app on the web and a random solid server. For example it is quite easy to imagine, as @kjetilk did in his comment, that one could have a caching system there reducing the need for communication.
  2. connectivity between an Auth system and the backend can be made to be excellent. One cannot rely on this on the wide web.

I am not arguing with your reasons there. I am just pointing out that if they apply to the backend, they apply even more strongly to app/server communication.

In that issue you bring up the following principle in support of your change:

Specifically, if an access control system A is protecting an interface I, then we want to eliminate or at least reduce the number of requests from A to I.

It is good to see that we agree on the basic principle!
The difference is just that I am looking at it from the Point of View of a client App. The client app needs to know what the access control rules are before choosing how to authenticate, whereas the Guard needs to check the identity matches the rules. These are dual roles.

In your first comment to WAC issue 97 you wrote

Not to mention the possibility of a DDOS with PUT /a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a.

I calculate that there are 40 directories there, so that would require the client to make 2*40+1 = 81 HTTP requests if the WAC resource were located at the root. This is a denial of service attack on the client and on the server. The client DoS is important because that is where the user experience lies: clients are what people investing in Solid see first and it is what they will use to evaluate Solid.

The nice thing is that by agreeing to a simple convention (e.g. as proposed in 326) we can reduce those 81 connections down to 1.
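As a rough illustration of the count, here is a small simulation under an assumed request model: the resource's own Link header arrives with the original response and is not counted, fetching an ACL costs one request, and every container climbed costs two (one to read its Link header, one to fetch its ACL). The `.acl` naming and the function names are hypothetical stand-ins for what a client would learn from Link headers.

```python
def parent(path):
    """Return the parent container of a path, or None for the root '/'."""
    if path == "/":
        return None
    trimmed = path.rstrip("/")
    return trimmed[: trimmed.rfind("/") + 1]


def discover_by_walking(resource, existing_acls):
    """Walk up the hierarchy until an existing ACL is found.

    Returns (acl_url, request_count) under the assumed model above.
    """
    requests = 1  # GET on the resource's own ACL
    current = resource
    while current + ".acl" not in existing_acls:
        current = parent(current)
        if current is None:
            raise LookupError("no ACL found up to the root")
        requests += 2  # HEAD on the container + GET on its ACL
    return current + ".acl", requests


def discover_by_link(effective_acl_url):
    """With a link to the effective ACL, one GET is enough."""
    return effective_acl_url, 1


# The 40-level path from the quoted PUT example, with only a root ACL.
deep_path = "/" + "a/" * 39 + "a"
walked_acl, walked_requests = discover_by_walking(deep_path, {"/.acl"})
linked_acl, linked_requests = discover_by_link("/.acl")
print(walked_requests, linked_requests)
```

Under these assumptions the walk costs 2n+1 requests for n levels climbed, while the linked lookup stays at one regardless of depth.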

@bblfish
Contributor Author

bblfish commented Oct 22, 2021

81 requests is still peanuts.

80 unnecessary requests when 1 would do is not peanuts!!!
Indeed, you yourself called that a Denial of Service attack here.

The moment we do encounter it, we add the link and it's solved.

I am encountering it as I am implementing a server and a client for Solid.
If you don't care now, can you please not obstruct the discussion?
Especially as it is so easy to fix. Either:

  1. Have the acl Link relation point to the potential acl resource (as the current WAC spec now defines) and coin a new link relation for the active acl. The spec actually mentions such a link, it just does not define it, which is not that good for interoperability. Even @acoburn's http://www.trellisldp.org/ns/trellis#effectiveAcl would do.
  2. Have the acl link point to the active Access Control Resource and use another link to point to the potential one when they differ. I suggested acp:AccessControl in spec solid/specification#326

I favour 2, because links to potentially non-existent resources would be better given as dereferenceable URLs, which would allow developers to learn what they were meant to point to if they ever find one on the web. It would also I think be a lot closer to what we have understood the "acl" link relation to mean over the years. So my guess is that the current WAC spec does not define well entrenched usage right now.

@bblfish
Contributor Author

bblfish commented Oct 22, 2021

@RubenVerborgh wrote above

The actually occurring cases where you a) need an ACL b) that is deep, are very slim.

The "actually occurring cases" are of course limited because of the problems such as the one I am pointing out here. A car with a flat tyre cannot actually drive far or fast. At that point it can potentially drive far and fast: once the tyre is fixed it will be able to drive far and fast.

For my use cases, ACLs will be needed a lot and there is no need to limit the depth of the tree artificially.

I used to be wary of default ACLs, but I think they have some serious advantages and a lot of potential (once this problem is fixed).

Here is an interesting use case of a web server with a default ACL at the root that most resources could link to. Consider a situation where access rules depend on the metadata of the resources. One example of such metadata would be resources being tagged as over_18, under_16, under_13, under_6, etc. The rules would then state things like "to access a resource tagged over_18 you need to present proof of being an adult".
Each of these resources could link to the default active acl. As a result clients coming across a new resource would only need to fetch that default ACL once. All other requests to that server would be retrievable from the client's cache. That is good web architecture: we use the caching architecture of the web as it was designed to be used. As a result clients would not just reduce 2n+1 down to 1, but for k requests the server would have a saving of (2n+1)*k since it would never need to fetch or search for the root resource again.
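To make the caching argument concrete, here is a minimal sketch of a client-side cache keyed on the ACL URL. The class and the `fetch` callback are hypothetical, and real HTTP caching (expiry, validation) is deliberately ignored.

```python
class AclCache:
    """Remembers ACL documents by URL and counts actual network fetches."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable: url -> ACL document
        self._store = {}
        self.network_fetches = 0

    def get(self, acl_url):
        if acl_url not in self._store:
            self._store[acl_url] = self._fetch(acl_url)
            self.network_fetches += 1
        return self._store[acl_url]


def fake_fetch(url):
    # Stand-in for an HTTP GET; returns a placeholder document.
    return "rules from " + url


cache = AclCache(fake_fetch)
# 1000 different resources whose Link headers all name the same default ACL.
for _ in range(1000):
    cache.get("/default.acl")
print(cache.network_fetches)
```

Because every resource names the same effective ACL, only the first lookup reaches the network; with per-resource discovery each of the 1000 resources would start its own walk.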

Note that the idea of having public metadata and protected resources was recently mentioned by @phochste in the issue on metadata mechanisms, where he wrote that

in academic libraries we have the inverse use-case where the metadata of a non-RDF resource might be public available, but the non-RDF resource itself can be protected

@bblfish
Contributor Author

bblfish commented Mar 18, 2023

We could use wac:accessControl, i.e. http://www.w3.org/ns/auth/acl#accessControl, to link to the effective access control resource. That would require @timbl to make a little change in the definition of that relation, perhaps.

There may be a good reason anyway to make changes to the URL of the wac ontology as per solid/vocab#86. Following up on that idea, I will use the https relation for a demonstration:

Link: </default.ac>; rel="https://www.w3.org/ns/auth/acl#accessControl"

to show how this helps reduce client requests.

@bblfish
Contributor Author

bblfish commented May 11, 2023

I implemented default access control rules in Reactive-Solid. Because of a minor bug in a library, I have not used the URL as the rel name but the plain string effectiveAccessControl instead (see commit).

While implementing the client I moved to a slightly different proposal described in #531 effectiveAccessControl
