[feature] Remove artifact namespace constraint #11505

Open
tferi opened this issue Jan 9, 2025 · 2 comments

Comments

@tferi

tferi commented Jan 9, 2025

Feature Area

/area sdk

What feature would you like to see?

Context:

def validate_schema_title(schema_title: str) -> None:

Python DSL users should be able to use custom namespaces for their artifacts. The currently enforced system and google namespaces seem arbitrary, and I see no reason to restrict artifacts to them. For instance, TFX runs perfectly happily on Kubeflow with artifacts from the tfx namespace; it does so by using its own PipelineSpec compiler rather than the KFP one.

What is the use case or pain point?

We're migrating a TFX pipeline to Kubeflow, but other systems integrate with this pipeline's execution and expect certain components to produce artifacts of certain types (tfx.Whatever). The native Kubeflow compiler does not allow components to have artifacts from namespaces other than google or system.

Also, companies other than Google may have perfectly legitimate use cases to introduce their own namespace.

Is there a workaround currently?

Monkey patch the KFP codebase at runtime.

type_utils.validate_schema_title = lambda schema_title: None
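
A scoped variant of the workaround above, as a minimal sketch. It does not import KFP: `type_utils` here is a stand-in object built with `types.SimpleNamespace`, and the body of its `validate_schema_title` is an assumption based on the behavior described in this issue (only the system and google namespaces are accepted). The helper names `accepts` and `accepts_when_patched` are hypothetical. The point is that `unittest.mock.patch.object` restores the original validator on exit, so the patch does not leak into the rest of the process.

```python
import types
from unittest import mock


def _strict_validate_schema_title(schema_title: str) -> None:
    # Assumed behavior, per this issue: reject namespaces other than system/google.
    namespace, _, name = schema_title.partition(".")
    if not name or namespace not in {"system", "google"}:
        raise TypeError(f"Unsupported artifact schema_title: {schema_title!r}")


# Stand-in for the real module; hypothetical, for illustration only.
type_utils = types.SimpleNamespace(
    validate_schema_title=_strict_validate_schema_title
)


def accepts(schema_title: str) -> bool:
    """Return True if the currently installed validator accepts the title."""
    try:
        type_utils.validate_schema_title(schema_title)
    except TypeError:
        return False
    return True


def accepts_when_patched(schema_title: str) -> bool:
    """Check the title with the validator replaced by a no-op, then restore it."""
    with mock.patch.object(
        type_utils, "validate_schema_title", lambda schema_title: None
    ):
        return accepts(schema_title)
```

With the real SDK, the first argument to `patch.object` would be the imported module that defines the validator; the `with` block would then wrap only the code that declares or compiles the affected components.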



@chensun
Member

chensun commented Jan 10, 2025

If the request is to allow tfx.* schema titles, I think that's okay and likely a simple change. But if it's for the more generic case, that would be more complicated.

When we send the pipeline spec to the API, any artifact with a schema_title is validated by the backend service, and the title must be preregistered in the system. Custom schema registration is possible via the Vertex Metadata API (https://cloud.google.com/vertex-ai/docs/ml-metadata/custom-schemas), but it's not implemented in open source Kubeflow Pipelines, and registration is a separate step that cannot be done during pipeline submission.

I personally think the metadata schema support is a half-baked solution, and its added value is debatable.

@tferi
Author

tferi commented Jan 13, 2025

We are also using our own custom namespace in the TFX pipeline that we'd like to migrate, so the request is not just to allow tfx.*.

We use the artifact types as part of the interface between the pipeline and whatever executes it. For instance, when the pipeline is finished, another system may look for an artifact of type namespace.type produced by a given component to discover the results it needs.
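
To make the interface described above concrete, here is a minimal sketch of a downstream system selecting results by artifact type and producing component. Everything in it is hypothetical: the `ArtifactRecord` shape, the `acme.EvaluationReport` type, and the component names are made up for illustration and do not correspond to any real metadata store API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactRecord:
    # Hypothetical record shape; a real metadata store exposes its own types.
    schema_title: str  # e.g. "acme.EvaluationReport" (made-up namespace and type)
    producer: str      # name of the component that produced the artifact
    uri: str


def find_artifacts(records, schema_title, producer=None):
    """Return records of the given type, optionally limited to one producer."""
    return [
        r for r in records
        if r.schema_title == schema_title
        and (producer is None or r.producer == producer)
    ]


records = [
    ArtifactRecord("system.Model", "trainer", "gs://bucket/model"),
    ArtifactRecord("acme.EvaluationReport", "evaluator", "gs://bucket/report"),
    ArtifactRecord("acme.EvaluationReport", "validator", "gs://bucket/report-2"),
]

reports = find_artifacts(records, "acme.EvaluationReport", producer="evaluator")
```

A lookup like this only works if the type string survives compilation unchanged, which is why rewriting custom types into the system namespace is not an option for us.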

We're using Vertex Pipelines, and the error message we got on submission told us very clearly that we were referencing an unknown artifact type, so we started registering the types as you described above. In case this is not enough for other users, would it be appropriate to print an info-level message when the namespace is not (tfx)|(system)|(google), telling the user that custom artifact types must be registered with the runtime?
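
A rough sketch of what that relaxed check could look like. This is not the SDK's current code: the function body, the `KNOWN_NAMESPACES` set, and the logger name are assumptions made for illustration. It still requires the `<namespace>.<name>` shape, but logs an info-level message instead of raising when the namespace is not one of the three named in this thread.

```python
import logging

logger = logging.getLogger("artifact_namespace_sketch")

# Namespaces named in this thread; treated here as "known" by assumption.
KNOWN_NAMESPACES = frozenset({"system", "google", "tfx"})


def validate_schema_title(schema_title: str) -> None:
    """Require `<namespace>.<name>`; log instead of failing on unknown namespaces."""
    parts = schema_title.split(".")
    if len(parts) != 2 or not all(parts):
        raise TypeError(
            "Artifact schema_title must look like '<namespace>.<name>'. "
            f"Got: {schema_title!r}"
        )
    namespace = parts[0]
    if namespace not in KNOWN_NAMESPACES:
        logger.info(
            "Artifact type %r uses custom namespace %r; custom artifact types "
            "must be registered with the runtime before the pipeline is submitted.",
            schema_title,
            namespace,
        )
```

Malformed titles still fail fast at compile time, while a custom namespace produces only a hint; the runtime remains the authority on whether the type is actually registered.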
