Empty dataset error kills the workflow #240

Closed
2 tasks
aakankshaduggal opened this issue Aug 9, 2024 · 3 comments · Fixed by #272
Labels
bug Something isn't working

Comments

@aakankshaduggal
Member

When the qna.yaml is not appropriate or the wrong model is used, generation fails and raises this error:
instructlab.sdg.pipeline.EmptyDatasetError: Pipeline stopped: Empty dataset after running pipe

Proposed solution:

  • This should fail gracefully and continue processing when there are multiple leaf nodes.
  • The error should carry more information, so that the user knows what steps to follow to rectify the issue.
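
A minimal sketch of the first point, under stated assumptions: `EmptyDatasetError` is redefined locally as a stand-in for `instructlab.sdg.pipeline.EmptyDatasetError`, and `run_pipeline` and `generate_for_leaf_nodes` are hypothetical names used only for illustration, not the library's actual functions. The idea is to catch the error per leaf node, log a warning, and move on to the next one.

```python
import logging

logger = logging.getLogger(__name__)


class EmptyDatasetError(Exception):
    """Local stand-in for instructlab.sdg.pipeline.EmptyDatasetError (assumption)."""


def run_pipeline(samples):
    """Hypothetical pipeline: keeps non-blank samples and raises if none remain."""
    kept = [s for s in samples if s.strip()]
    if not kept:
        raise EmptyDatasetError("Pipeline stopped: Empty dataset after running pipe")
    return kept


def generate_for_leaf_nodes(leaf_nodes):
    """Run the pipeline once per leaf node, skipping nodes that produce nothing."""
    results, skipped = {}, []
    for name, samples in leaf_nodes.items():
        try:
            results[name] = run_pipeline(samples)
        except EmptyDatasetError as exc:
            # One bad leaf node should not stop the remaining ones.
            logger.warning("Skipping leaf node %s: %s", name, exc)
            skipped.append(name)
    return results, skipped
```

Returning the skipped names alongside the results lets the caller report at the end which leaf nodes produced no data, rather than losing that information in the log.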
nathan-weinberg added the bug label Aug 20, 2024
@relyt0925
Contributor

+1 to this: I noticed this as well

relyt0925 added a commit to relyt0925/sdg that referenced this issue Sep 12, 2024
Previously, an EmptyDatasetError was raised when the dataset was empty after running the pipeline. This change logs a warning and continues processing instead, allowing the function to handle empty datasets more gracefully. Fixes instructlab#240

Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
relyt0925 added a commit to relyt0925/sdg that referenced this issue Sep 12, 2024
Previously, an EmptyDatasetError was raised when the dataset was empty after running the sdg pipeline of a leaf node. This change logs a warning and continues processing instead, allowing the function to handle empty datasets more gracefully and process other leaf nodes in the taxonomy. Fixes instructlab#240

Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
relyt0925 added a commit to relyt0925/sdg that referenced this issue Sep 15, 2024
Previously, an EmptyDatasetError was raised when the dataset was empty after running the sdg pipeline of a leaf node. This change logs a warning and continues processing instead, allowing the function to handle empty datasets more gracefully and process other leaf nodes in the taxonomy. Fixes instructlab#240

Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
@marceloleitner

It is getting better with this patch, but it would be nicer if the message hinted at possible reasons. Something like "please ensure the number of examples is enough" or "please make sure it follows the guidelines at HTTP". You will know better.

What I know is that I just spent a day debugging this issue. I could only understand the reason after I found the issue that led to this MR, #240.
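
To illustrate the kind of hint meant here, a rough sketch follows. The function name, the hint wording, and the example path are all hypothetical; the causes listed are only the ones mentioned in this thread, not an authoritative list from the library.

```python
# Hypothetical hint list; the wording comes from this thread, not from the library.
POSSIBLE_CAUSES = (
    "the qna.yaml may not have enough examples",
    "the qna.yaml may not follow the guidelines",
    "the model may not be the right one for this pipeline",
)


def empty_dataset_message(leaf_node_path):
    """Build a warning that names the leaf node and lists possible causes."""
    lines = [
        f"Empty dataset after running the pipeline for {leaf_node_path}. "
        "Possible causes:"
    ]
    lines.extend(f"  - {cause}" for cause in POSSIBLE_CAUSES)
    return "\n".join(lines)


if __name__ == "__main__":
    # The path below is a made-up example.
    print(empty_dataset_message("knowledge/example/qna.yaml"))
```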

@bbrowning
Contributor

@marceloleitner Those are reasonable suggestions, although I'd ask that they go in a separate issue. They are less about handling an empty dataset without crashing and more a request for better logging when something fails during generation, so that the user gets more indication of the potential causes of that type of failure.

relyt0925 added a commit to relyt0925/sdg that referenced this issue Oct 1, 2024
Previously, an EmptyDatasetError was raised when the dataset was empty after running the sdg pipeline of a leaf node. This change logs a warning and continues processing instead, allowing the function to handle empty datasets more gracefully and process other leaf nodes in the taxonomy. Fixes instructlab#240

Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
5 participants