-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(connector): resolve $ref
in json schema
#20446
Conversation
…hao/use-jsonschema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has checked 5568 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
2348 | 1 | 3219 | 0 |
Click to see the invalid file list
- src/connector/codec/tests/integration_tests/json.rs
Use this command to fix any missing license headers
```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
</details>
let mut schema = match self.schema_cache.get(&ref_no_fragment) { | ||
Some(cached_schema) => cached_schema.clone(), | ||
None => { | ||
if url_schema == "http" || url_schema == "https" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we want to support schema reference from http/fs?
Should we support retrieve references from schema registry instead? https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html#schema-references-in-json-schemas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this feature a customer request? Please provide more context, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User only requested schema ref resolution without an external link.
I thought the HTTP support can be a general approve to support schema registry. But I just realized there is a version segment after the subject. I'll change this pr to only support schema resolution without external resource for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we do allow users using http and file. So it won't hurt to add those support.
It's just for the schema registry, we need to investigate more into how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though schemas are identified by URIs, those identifiers are not necessarily network-addressable. They are just identifiers. Generally, implementations don't make HTTP requests (
https://
) or read from the file system (file://
) to fetch schemas. Instead, they provide a way to load schemas into an internal schema database. When a schema is referenced by it's URI identifier, the schema is retrieved from the internal schema database.
https://json-schema.org/understanding-json-schema/structuring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they provide a way to load schemas into an internal schema database
It's true a for a deref library to focus on derefing instead of fetching (in fact it's referencing do). But if we want do do so we will have to ask users to provide all the URL in the schema ref or parse the schema to get all the $ref recurrently before ref.
let mut schema = match self.schema_cache.get(&ref_no_fragment) { | ||
Some(cached_schema) => cached_schema.clone(), | ||
None => { | ||
if url_schema == "http" || url_schema == "https" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though schemas are identified by URIs, those identifiers are not necessarily network-addressable. They are just identifiers. Generally, implementations don't make HTTP requests (
https://
) or read from the file system (file://
) to fetch schemas. Instead, they provide a way to load schemas into an internal schema database. When a schema is referenced by it's URI identifier, the schema is retrieved from the internal schema database.
https://json-schema.org/understanding-json-schema/structuring
#[derive(Debug)] | ||
pub struct JsonRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledge that $id
and $anchor
are currently ignored and unsupported. To be honest I feel like this shall be supported by the json schema library rather than by a hack in the caller that replaces the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check the jsonschema library. It's more like designed for validation and is vary hard to use it for dereffing the schema file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we move the impl to a standalone crate? I feel that the kernel should not touch the logic of JsonRef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We may need to hack it to support confluent schema registry. The confluent schema registry should be handled differently. IIUC they can use reference name to retrieve a reference instead of URI.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
close #20351
This PR only resolve the
$ref
in a json schema. Other fields like$anchor
$id
etc was not resolved or checked.This PR does not handle the json schema used by confluent schema registry.
Checklist
Documentation
Release note