[SDK] improve PVC error message #2491

Open
mahdikhashan opened this issue Jan 16, 2025 · 7 comments · May be fixed by #2496
Labels: area/api, good first issue, help wanted, kind/bug

Comments

@mahdikhashan
Contributor

What happened?

While trying to run a notebook locally with the recent LLM hyperparameter-optimization Python SDK, I hit the error below:

File ~/miniconda3/envs/llm-hp-optimization-katib-nb/lib/python3.9/site-packages/kubeflow/katib/api/katib_client.py:580, in KatibClient.tune(self, name, model_provider_parameters, dataset_provider_parameters, trainer_parameters, storage_config, objective, base_image, parameters, namespace, env_per_trial, algorithm_name, algorithm_settings, objective_metric_name, additional_metric_names, objective_type, objective_goal, max_trial_count, parallel_trial_count, max_failed_trial_count, resources_per_trial, retain_trials, packages_to_install, pip_index_url, metrics_collector_config)
    578             break
    579     else:
--> 580         raise RuntimeError(f"failed to create PVC. Error: {e}")
    582 if isinstance(model_provider_parameters, HuggingFaceModelParams):
    583     mp = "hf"

RuntimeError: failed to create PVC. Error: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': '2abbe5b3-07d0-4254-b710-67520a09c45b', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'f421b4f1-b00a-449d-981b-fd500b3697db', 'X-Kubernetes-Pf-Prioritylevel-Uid': '225da36a-6099-4970-8c35-95547fa53796', 'Date': 'Thu, 16 Jan 2025 13:48:00 GMT', 'Content-Length': '948'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"PersistentVolumeClaim \"Llama-3.2-fine-tune\" is invalid: metadata.name: Invalid value: \"Llama-3.2-fine-tune\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","reason":"Invalid","details":{"name":"Llama-3.2-fine-tune","kind":"PersistentVolumeClaim","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"Llama-3.2-fine-tune\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","field":"metadata.name"}]},"code":422}
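
The API server rejects the name because of the uppercase "L" in "Llama-3.2-fine-tune". A minimal sketch of a client-side check that could catch this before any request is sent is shown below. The regex is copied from the error message above and the 253-character limit is the Kubernetes limit for DNS subdomain names; the helper names `is_valid_k8s_name` and `check_name` are hypothetical and are not part of the Katib SDK.

```python
import re

# Pattern copied from the API server's error message (RFC 1123 subdomain).
_RFC1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
# Kubernetes limits DNS subdomain names to 253 characters.
_MAX_NAME_LENGTH = 253


def is_valid_k8s_name(name: str) -> bool:
    """Return True if `name` is a lowercase RFC 1123 subdomain."""
    return (
        len(name) <= _MAX_NAME_LENGTH
        and _RFC1123_SUBDOMAIN.fullmatch(name) is not None
    )


def check_name(name: str) -> None:
    """Hypothetical pre-flight check: fail early with a readable message."""
    if not is_valid_k8s_name(name):
        raise ValueError(
            f"Invalid name {name!r}: it must consist of lowercase alphanumeric "
            "characters, '-' or '.', and must start and end with an "
            "alphanumeric character."
        )
```

With this sketch, `check_name("Llama-3.2-fine-tune")` raises a `ValueError`, while `check_name("llama-3.2-fine-tune")` passes.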

What did you expect to happen?

I expect the error to be clearer for the user, preferably without returning the raw body of the response. The correct usage of the API should then be documented.

The error body, formatted as JSON:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "PersistentVolumeClaim \"Llama-3.2-fine-tune\" is invalid: metadata.name: Invalid value: \"Llama-3.2-fine-tune\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')",
  "reason": "Invalid",
  "details": {
    "name": "Llama-3.2-fine-tune",
    "kind": "PersistentVolumeClaim",
    "causes": [
      {
        "reason": "FieldValueInvalid",
        "message": "Invalid value: \"Llama-3.2-fine-tune\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')",
        "field": "metadata.name"
      }
    ]
  },
  "code": 422
}
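
One possible way to make the message more readable is to pull the field-level causes out of the response body instead of printing it whole. The sketch below assumes the body is the JSON `Status` object shown above, passed in as a string (the Kubernetes Python client's `ApiException` exposes it as the `body` attribute). The function name `summarize_status_body` is hypothetical, and this is not necessarily how the linked pull request addresses the issue.

```python
import json


def summarize_status_body(body) -> str:
    """Reduce a Kubernetes Status JSON body to a short, readable message.

    Falls back to the raw body when it cannot be parsed.
    """
    try:
        status = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON (or not a string): return the input unchanged.
        return str(body)
    if not isinstance(status, dict):
        return str(body)

    # Prefer the field-level causes, e.g. "metadata.name: Invalid value: ...".
    causes = (status.get("details") or {}).get("causes") or []
    if causes:
        return "; ".join(
            f"{cause.get('field', '<unknown field>')}: {cause.get('message', '')}"
            for cause in causes
        )

    # Otherwise use the top-level message if there is one.
    return status.get("message") or str(body)
```

The SDK could then raise something like `RuntimeError(f"failed to create PVC: {summarize_status_body(e.body)}")`, which keeps the field name and the validation rule but drops the headers and the duplicated text.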

Environment

Kubernetes version:

$ kubectl version

Katib controller version:

$ kubectl get pods -n kubeflow -l katib.kubeflow.org/component=controller -o jsonpath="{.items[*].spec.containers[*].image}"

Katib Python SDK version:

$ pip show kubeflow-katib

Impacted by this bug?

Give it a 👍 We prioritize the issues with most 👍

@mahdikhashan
Contributor Author

This happened in #2480.

@mahdikhashan
Contributor Author

@Electronic-Waste Hey, would you kindly set the correct labels for this issue and assign it to me?

@mahdikhashan changed the title from "[SDK] improve PVC error exception" to "[SDK] improve PVC error message" on Jan 16, 2025
@helenxie-bit
Contributor

Hi, I encountered this issue as well. It would be helpful if the error message could be improved for better readability!

/remove-label lifecycle/needs-triage
/good-first-issue
/area api
/assign @mahdikhashan

By the way, feel free to assign yourself to any issue you’re interested in using /assign in the future.


@helenxie-bit:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.


@google-oss-prow bot added the area/api, good first issue, and help wanted labels and removed the lifecycle/needs-triage label on Jan 17, 2025
@Electronic-Waste
Member

Hi @mahdikhashan. In the Kubeflow community, feel free to create an issue and assign it to yourself. You can also pick issues labeled good-first-issue or help-wanted (if they are not assigned to someone else) to get started.

Thanks for your interest in contributing to Kubeflow!

/remove-label kind/bug


@Electronic-Waste: The label(s) /remove-label kind/bug cannot be applied. These labels are supported: tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, lifecycle/needs-triage. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?


@mahdikhashan
Contributor Author

Thank you both for your responses, @helenxie-bit @Electronic-Waste.

@mahdikhashan linked a pull request on Jan 17, 2025 that will close this issue
3 participants