
Issue-6642 Add http.send request attribute to ignore headers for caching key #6675

Merged


rudrakhp
Contributor

@rudrakhp rudrakhp commented Apr 4, 2024

Why the changes in this PR are needed?

We need to support excluding certain headers from the http.send query cache key.

What are the changes in this PR?

  • A new attribute, cache_ignored_headers, in the http.send built-in request object. It enables policy authors to define an exclusion list of headers to ignore when caching.
  • Modify how the HTTP request executor builds the cache key.
  • Unit tests for these changes.

Notes to assist PR review:

  • If the request attribute cache_ignored_headers changes but the final computed cache key does not, results are still served from the cache. This attribute itself is always excluded from the cache key.
  • If a change in the exclusion list changes the cache key (a header is added or removed), a new cache key is generated, which leads to a cache miss.
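A minimal sketch of the cache-key behaviour described in these notes, assuming the request is modelled as a plain Go `map[string]interface{}` with `map[string]string` headers. OPA itself operates on `ast.Object`, and `cacheKey` below is a hypothetical helper, not the PR's `getKeyFromRequest`. Ignored headers are removed from a copy of the headers, `cache_ignored_headers` itself is dropped from the key, and the result is serialised with `encoding/json`, which orders map keys deterministically.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheKey is a hypothetical helper: it builds a cache key from a request
// modelled as map[string]interface{}, dropping the headers listed in
// "cache_ignored_headers" as well as that attribute itself.
func cacheKey(req map[string]interface{}) (string, error) {
	// Copy the top level so the caller's request is not modified.
	key := make(map[string]interface{}, len(req))
	for k, v := range req {
		key[k] = v
	}
	// The exclusion list itself never contributes to the key.
	delete(key, "cache_ignored_headers")

	ignored, _ := req["cache_ignored_headers"].([]string)
	headers, ok := req["headers"].(map[string]string)
	if ok && len(ignored) > 0 {
		// Filter a copy of the headers so req["headers"] stays intact
		// for the request that is actually sent downstream.
		filtered := make(map[string]string, len(headers))
		for h, v := range headers {
			filtered[h] = v
		}
		for _, h := range ignored {
			delete(filtered, h)
		}
		key["headers"] = filtered
	}
	// encoding/json sorts map keys, so equal maps serialise identically.
	b, err := json.Marshal(key)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	r1 := map[string]interface{}{
		"url":                   "https://example.com",
		"headers":               map[string]string{"h1": "v1", "h2": "v2"},
		"cache_ignored_headers": []string{"h2"},
	}
	// Different exclusion list, same effective headers: same key (hit).
	r2 := map[string]interface{}{
		"url":                   "https://example.com",
		"headers":               map[string]string{"h1": "v1", "h2": "v2"},
		"cache_ignored_headers": []string{"h2", "h3"},
	}
	// The exclusion list no longer removes h2: different key (miss).
	r3 := map[string]interface{}{
		"url":                   "https://example.com",
		"headers":               map[string]string{"h1": "v1", "h2": "v2"},
		"cache_ignored_headers": []string{},
	}
	k1, _ := cacheKey(r1)
	k2, _ := cacheKey(r2)
	k3, _ := cacheKey(r3)
	fmt.Println(k1 == k2) // true
	fmt.Println(k1 == k3) // false
}
```

With this design, changing only the exclusion list keeps the key stable, while any change in the set of headers that remain after exclusion produces a new key.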

Further comments:

Resolves #6642

@rudrakhp rudrakhp force-pushed the cache_ignored_headers branch 2 times, most recently from 8a0052e to 81c75e9 on April 4, 2024 12:44
Member

@ashutosh-narkar ashutosh-narkar left a comment

Thanks for working on this. Here we are excluding certain headers from the key calculation. Is there a reason for that level of granularity? Why not exclude certain request object params themselves from the key calculation?

@rudrakhp
Contributor Author

rudrakhp commented Apr 5, 2024

Thanks for working on this. Here we are excluding certain headers from the key calculation. Is there a reason for that level of granularity? Why not exclude certain request object params themselves from the key calculation?

I would have to explore whether there is a straightforward way to delete a value by path reference in the request ast.Object, since we would need that to ignore an arbitrary request attribute in the cache key. Let me get back to you on this next week. Meanwhile, please do share any pointers you have. Thanks!
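The alternative raised here, ignoring an arbitrary request attribute rather than only headers, needs a way to delete a value by path. A minimal sketch on a plain `map[string]interface{}` (an assumption; OPA's `ast.Object` API differs), using a hypothetical `deletePath` helper that copies each map along the path so the original request is left unchanged:

```go
package main

import "fmt"

// deletePath is a hypothetical helper: it returns a copy of obj with the
// value at path removed. Maps along the path are copied (copy-on-write),
// so obj itself is never modified. A missing path segment is a no-op.
func deletePath(obj map[string]interface{}, path []string) map[string]interface{} {
	out := make(map[string]interface{}, len(obj))
	for k, v := range obj {
		out[k] = v
	}
	if len(path) == 0 {
		return out
	}
	if len(path) == 1 {
		delete(out, path[0])
		return out
	}
	child, ok := obj[path[0]].(map[string]interface{})
	if !ok {
		return out // path does not exist; nothing to delete
	}
	out[path[0]] = deletePath(child, path[1:])
	return out
}

func main() {
	req := map[string]interface{}{
		"url": "https://example.com",
		"headers": map[string]interface{}{
			"h1":       "v1",
			"trace-id": "abc",
		},
	}
	key := deletePath(req, []string{"headers", "trace-id"})
	fmt.Println(len(key["headers"].(map[string]interface{}))) // 1
	fmt.Println(len(req["headers"].(map[string]interface{}))) // 2
}
```

Copying only the maps on the path keeps the cost proportional to the path depth rather than to the size of the whole request.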

@ashutosh-narkar
Member

@rudrakhp just checking to see if you got a chance to explore the option of excluding certain request object params.

@rudrakhp
Contributor Author

@ashutosh-narkar I was out last week; I will raise an updated PR this week.


stale bot commented May 17, 2024

This pull request has been automatically marked as stale because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label May 17, 2024
Member

@ashutosh-narkar ashutosh-narkar left a comment

@rudrakhp the changes seem fine. More testing would be helpful. Also, we need to update the docs for the builtin.

func getKeyFromRequest(req ast.Object) (ast.Object, error) {
var cacheIgnoredHeaders []string
var allHeaders map[string]interface{}
cacheIgnoredHeadersTerm := req.Get(ast.StringTerm("cache_ignored_headers"))
Member

Nit: We can do an early exit here

if cacheIgnoredHeadersTerm == nil {
    return nil, nil
}

Member

// new copy so headers in request object doesn't change

Did you mean deep copy?

Contributor Author

I guess "new" and "copy" are redundant; I will update the comment to:

// deep copy so changes to key do not reflect in the request object
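Why a deep copy matters here: in Go, copying a map only copies its reference, so deleting ignored headers from a shallow copy of the request also removes them from the request that is later sent downstream. A minimal sketch using plain maps (an assumption; the PR itself calls `Copy()` on the request object during key generation), with a hypothetical `deepCopy` helper:

```go
package main

import "fmt"

// deepCopy is a hypothetical helper that recursively copies nested
// map[string]interface{} values; all other values are copied as-is.
func deepCopy(m map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(m))
	for k, v := range m {
		if child, ok := v.(map[string]interface{}); ok {
			out[k] = deepCopy(child)
		} else {
			out[k] = v
		}
	}
	return out
}

func newRequest() map[string]interface{} {
	return map[string]interface{}{
		"url":     "https://example.com",
		"headers": map[string]interface{}{"h1": "v1", "h2": "v2"},
	}
}

func main() {
	// Shallow copy: the nested headers map is shared with the request.
	req := newRequest()
	shallow := map[string]interface{}{}
	for k, v := range req {
		shallow[k] = v
	}
	delete(shallow["headers"].(map[string]interface{}), "h2")
	fmt.Println(len(req["headers"].(map[string]interface{}))) // 1: h2 is lost downstream

	// Deep copy: the request keeps all of its headers.
	req = newRequest()
	key := deepCopy(req)
	delete(key["headers"].(map[string]interface{}), "h2")
	fmt.Println(len(req["headers"].(map[string]interface{}))) // 2
}
```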

expectedReqCount: 1,
},
{
note: "http.send GET cache miss different headers (force_cache enabled)",
Member

I'm not sure which scenario in the changes this test case is trying to exercise.

Contributor Author

@rudrakhp rudrakhp May 26, 2024

This clarifies how the cache behaves when cache_ignored_headers is not set. No existing test case covered this cache miss caused by different header values.

note: "http.send GET different cache_ignored_headers but still cached (force_cache enabled)",
ruleTemplate: `p = x {
r1 = http.send({"method": "get", "url": "%URL%", "force_json_decode": true, "headers": {"h1": "v1", "h2": "v2"}, "force_cache": true, "force_cache_duration_seconds": 300, "cache_ignored_headers": ["h2"]})
r2 = http.send({"method": "get", "url": "%URL%", "force_json_decode": true, "headers": {"h1": "v1", "h2": "v2"}, "force_cache": true, "force_cache_duration_seconds": 300, "cache_ignored_headers": ["h2", "h3"]}) # cached
Member

Just for testing, we could actually include an h3 header in the headers object.

Member

A test for the scenario where the values of cache_ignored_headers and headers differ, resulting in a cache miss, would also be helpful.

Member

One more case I can think of:

R1: {"headers": {"h1": "v1"}}
R2: {"headers": {"h1": "v1", "h2": "v2"}, "cache_ignored_headers": ["h2"]}

So here R1 and R2 are equivalent, correct?

Member

Another one:

R1: {"headers": {"h1": "v1"}, "cache_ignored_headers": []}
R2: {"headers": {"h1": "v1", "h2": "v2"}, "cache_ignored_headers": ["h2"]}

So here R1 and R2 are equivalent, correct?

Contributor Author

Done

@stale stale bot removed the inactive label May 18, 2024
@ashutosh-narkar
Member

@rudrakhp this would be a good one to get in the next release. Let us know if you need any help. Thanks.

@rudrakhp
Contributor Author

@ashutosh-narkar I just realised that the request object was a pointer, so it was being modified along with the key, which means the downstream request would also be sent without the ignored headers. I have made a couple of new changes to address that; please do review. Thanks for the comments!

@rudrakhp rudrakhp force-pushed the cache_ignored_headers branch 2 times, most recently from 85d4af0 to 8c78f07 on May 26, 2024 19:52

netlify bot commented May 26, 2024

Deploy Preview for openpolicyagent ready!

🔨 Latest commit: 54a5835
🔍 Latest deploy log: https://app.netlify.com/sites/openpolicyagent/deploys/6658a82ed49ad8000858d4b7
😎 Deploy Preview: https://deploy-preview-6675--openpolicyagent.netlify.app

@rudrakhp rudrakhp force-pushed the cache_ignored_headers branch from 8c78f07 to 66508d1 on May 26, 2024 19:54
@@ -729,13 +772,13 @@ func newHTTPSendCache() *httpSendCache {
 }

 func valueHash(v util.T) int {
-	return v.(ast.Value).Hash()
+	return ast.StringTerm(v.(ast.Value).String()).Hash()
Contributor Author

@rudrakhp rudrakhp May 26, 2024

@ashutosh-narkar FYI, I needed to change the hash and eq functions of the hashmap to bring parity between inter-query and intra-query cache key generation. The inter query cache already relies on the String() representation of Value for cache key.
Relying on the raw Value instead of its String() representation always leads to an intra-query cache miss, because the Hash changes when Copy() of the request object is performed during key generation.
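A rough analogy for this hash/eq change, assuming a cache modelled as a Go map rather than OPA's internal hashmap: keying entries by object identity (here a pointer) misses once the key has been copied, while keying by a canonical string representation, as described for the inter-query cache, lets a copy with equal contents hit the same entry. The `request` type and `canonical` helper are hypothetical; `canonical` relies on `encoding/json`, which emits struct fields in declaration order and map keys in sorted order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a hypothetical stand-in for the http.send request object.
type request struct {
	URL     string            `json:"url"`
	Headers map[string]string `json:"headers"`
}

// canonical returns a deterministic string for r: json.Marshal emits
// struct fields in declaration order and map keys in sorted order.
func canonical(r *request) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	orig := &request{URL: "https://example.com", Headers: map[string]string{"h1": "v1"}}
	// A copy with identical contents but a different identity.
	cp := &request{URL: orig.URL, Headers: map[string]string{"h1": "v1"}}

	byIdentity := map[*request]string{orig: "cached response"}
	byString := map[string]string{canonical(orig): "cached response"}

	_, hitIdentity := byIdentity[cp]
	_, hitString := byString[canonical(cp)]
	fmt.Println(hitIdentity, hitString) // false true
}
```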

Member

Thanks for the explanation.

The inter query cache already relies on the String() representation of Value for cache key.

Can you point to where this code is?

Contributor Author

@ashutosh-narkar please refer to topdown/cache/cache.go#L240, where the String() representation of the input key (of type ast.Value) is used as the cache key.

@rudrakhp rudrakhp force-pushed the cache_ignored_headers branch from 66508d1 to 06c7372 on May 26, 2024 20:06
| `caching_mode` | no | `string` | Controls the format in which items are inserted into the inter-query cache. Allowed modes are `serialized` and `deserialized`. In the `serialized` mode, items will be serialized before inserting into the cache. This mode is helpful if memory conservation is preferred over higher latency during cache lookup. This is the default mode. In the `deserialized` mode, an item will be inserted in the cache without any serialization. This means when items are fetched from the cache, there won't be a need to decode them. This mode helps to make the cache lookup faster at the expense of more memory consumption. If this mode is enabled, the configured `caching.inter_query_builtin_cache.max_size_bytes` value will be ignored. This means an unlimited cache size will be assumed. |
| `raise_error` | no | `bool` | If `raise_error` is set, `http.send` will return an error that can halt policy evaluation when used in conjunction with the `strict-builtin-errors` option. Default: `true`. |
| `max_retry_attempts` | no | `number` | Number of times to retry an HTTP request when a network error is encountered. If provided, retries are performed with an exponential backoff delay. Default: `0`. |
| `cache_ignored_headers` | no | `list` | List of header keys from the `headers` parameter that should not be considered when interacting with the cache. Default is `nil`, meaning all headers will be considered. **Important:** Note that if a cache entry exists with a subset/superset of headers that are considered in this request, it will lead to a cache miss. |
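The trade-off described for `caching_mode` can be illustrated with a toy cache (hypothetical types, not OPA's `topdown/cache` API): in serialized mode the value is stored as JSON bytes and decoded on every lookup; in deserialized mode the value is stored as-is, so lookups skip decoding at the cost of keeping the full in-memory structure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toyCache is a hypothetical cache illustrating the two caching modes.
type toyCache struct {
	serialized bool
	items      map[string]interface{}
}

func newToyCache(serialized bool) *toyCache {
	return &toyCache{serialized: serialized, items: map[string]interface{}{}}
}

// Insert stores v as JSON bytes in serialized mode, or as-is otherwise.
func (c *toyCache) Insert(key string, v interface{}) error {
	if c.serialized {
		b, err := json.Marshal(v)
		if err != nil {
			return err
		}
		c.items[key] = b
		return nil
	}
	c.items[key] = v
	return nil
}

// Get returns the cached value, decoding it first in serialized mode.
func (c *toyCache) Get(key string) (interface{}, bool, error) {
	item, ok := c.items[key]
	if !ok {
		return nil, false, nil
	}
	if !c.serialized {
		return item, true, nil
	}
	var v interface{}
	if err := json.Unmarshal(item.([]byte), &v); err != nil {
		return nil, false, err
	}
	return v, true, nil
}

func main() {
	for _, serialized := range []bool{true, false} {
		c := newToyCache(serialized)
		_ = c.Insert("k", map[string]interface{}{"status_code": 200})
		v, ok, err := c.Get("k")
		fmt.Println(serialized, ok, err, v)
	}
}
```

Note that a JSON round trip turns numbers into `float64`, one reason a real implementation needs a well-defined encoding for cached values.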
Member

Important: Note that if a cache entry exists with a subset/superset of headers that are considered in this request, it will lead to a cache miss.

Can you provide an example of this please?

Contributor Author

Added a unit test for this here.

@rudrakhp rudrakhp force-pushed the cache_ignored_headers branch 2 times, most recently from 4fd9a4a to 4cf2338 on May 29, 2024 18:02
note: "http.send GET cache miss different headers in cache key",
ruleTemplate: `p = x {
r1 = http.send({"method": "get", "url": "%URL%", "force_json_decode": true, "headers": {"h1": "v1", "h2": "v2", "h3": "v3"}, "cache_ignored_headers": ["h2"]})
r2 = http.send({"method": "get", "url": "%URL%", "force_json_decode": true, "headers": {"h1": "v1", "h2": "v21"}, "cache_ignored_headers": ["h2"]}) # cached
Member

"# cached": this is a miss, correct?

Member

@ashutosh-narkar ashutosh-narkar left a comment

The changes look good @rudrakhp. Can you please squash your commits?

@rudrakhp rudrakhp force-pushed the cache_ignored_headers branch from 4cf2338 to a74670b on May 30, 2024 16:23
@rudrakhp rudrakhp force-pushed the cache_ignored_headers branch from a74670b to 54a5835 on May 30, 2024 16:24
Member

@ashutosh-narkar ashutosh-narkar left a comment

LGTM

@ashutosh-narkar ashutosh-narkar merged commit eeb6338 into open-policy-agent:main May 30, 2024
28 checks passed
Development

Successfully merging this pull request may close these issues.

Interquery http.send cache - Add provision to exclude/include headers from cache key
2 participants