Empty dataset error kills the workflow #240
Comments
+1 to this: I noticed this as well.
Previously, an EmptyDatasetError was raised when the dataset was empty after running the pipeline. This change logs a warning and continues processing instead, allowing the function to handle empty datasets more gracefully. Fixes instructlab#240 Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
Previously, an EmptyDatasetError was raised when the dataset was empty after running the sdg pipeline of a leaf node. This change logs a warning and continues processing instead, allowing the function to handle empty datasets more gracefully and process other leaf nodes in the taxonomy. Fixes instructlab#240 Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>
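The behavior described in this commit message can be illustrated with a minimal sketch. This is not the actual instructlab code: `run_pipeline`, `generate_for_taxonomy`, and the shape of `leaf_nodes` are hypothetical names invented for illustration, and `EmptyDatasetError` is a local stand-in for `instructlab.sdg.pipeline.EmptyDatasetError`. The real patch may check for the empty dataset differently; the sketch only shows the idea of logging a warning and moving on to the next leaf node.

```python
import logging

logger = logging.getLogger(__name__)


class EmptyDatasetError(Exception):
    """Local stand-in for instructlab.sdg.pipeline.EmptyDatasetError."""


def run_pipeline(samples):
    """Hypothetical pipeline: keeps only samples with a non-empty 'answer'."""
    kept = [s for s in samples if s.get("answer")]
    if not kept:
        raise EmptyDatasetError("Pipeline stopped: Empty dataset after running pipe")
    return kept


def generate_for_taxonomy(leaf_nodes):
    """Run the pipeline per leaf node, skipping nodes that yield no data.

    `leaf_nodes` is assumed to map a leaf node name to its list of samples.
    """
    results = {}
    for name, samples in leaf_nodes.items():
        try:
            results[name] = run_pipeline(samples)
        except EmptyDatasetError:
            # Warn and continue so one bad leaf node does not stop the others.
            logger.warning("Empty dataset for leaf node %s; skipping it", name)
            continue
    return results
```

With this structure, a leaf node that produces no data is absent from the returned dictionary, while the remaining leaf nodes are still processed.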
This patch is an improvement, but it would be nicer if the message gave some hint about possible causes, such as "please ensure the number of examples is enough" or "please make sure it follows the guidelines at HTTP". You will know better what to suggest. What I know is that I just spent a day debugging this issue, and I could only understand the cause after I found the issue that led to this MR, #240.
@marceloleitner Those are reasonable suggestions, although I'd ask that they go in a separate issue. This change is about handling an empty dataset without crashing. What you describe is a request for better logging when something fails during generation, so that the user gets more indication of the potential causes of that type of failure.
When the qna.yaml is not appropriate or the wrong model is used, generation fails and throws an error:
instructlab.sdg.pipeline.EmptyDatasetError: Pipeline stopped: Empty dataset after running pipe
Proposed solution: