Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning instead of Error when read datapackage without resources #265

Closed
Rafnuss opened this issue Sep 12, 2024 · 5 comments
Closed

Warning instead of Error when read datapackage without resources #265

Rafnuss opened this issue Sep 12, 2024 · 5 comments

Comments

@Rafnuss
Copy link

Rafnuss commented Sep 12, 2024

The same way it's possible to create an empty package (i.e., without resources), I would expect that it's possible to read a datapacakage.json that doesn't include ressources.

I understand that it would not be a valid datapackage according to https://datapackage.org/standard/data-package/#resources:

The resources property is REQUIRED, with at least one resource.

So, throwing a warning message would be required. I'd still love to be able to build a datapackage from an existing datapackage.json and then add the resources later.

pkg <- frictionless::create_package()

frictionless::read_package("https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json")
#> Error in `frictionless::read_package()`:
#> ! `file`
#>   'https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json'
#>   must have a resources property containing at least one resource.

Created on 2024-09-12 with reprex v2.1.1

@Rafnuss
Copy link
Author

Rafnuss commented Sep 12, 2024

Or, maybe better allowing to create a package from a datapackage.json as descriptor:

pkg <- frictionless::create_package("https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json")

Intead of my current solution:

pkg <- create_package(jsonlite::fromJSON("https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json", simplifyVector = F))

@peterdesmet
Copy link
Member

peterdesmet commented Sep 13, 2024

Thanks for the suggestion!

I initially liked the idea of implementing it as part of create_package(), since it only takes a call to the internal function read_descriptor(). But working on it further I noticed a lot of the tests for read_package() and create_package() had to be duplicated, indicating a wrong scope.

I now think reading from a descriptor file should remain the scope of read_package(), not create_package():

  • It's clearer from the name (might be hard for users to understand that create_package() can also read
  • It's clearer from the argument file (vs. descriptor)
  • It gives a reason for read_package() to exist vs create_package() which is mainly used to start from scratch or alter your list.

Regarding the error on missing resources: it was originally there to be in line with the spec. But since then, every read_package() is passed to create_package() internally (mainly to add class), meaning that they will always have a resources property, since that is added by create_package() if missing. That is sufficient for all other functions. The fact that resources contains no resources (yet) is indeed too strict if you're starting. That should probably be updated in the spec.

So my suggestion is to relax the error to a warning. In the long term (if the specs change), we might want to remove the warning too.

@peterdesmet
Copy link
Member

On second thought, it makes sense that a valid Data Package has at least one resource (and that it is defined like that in the spec). But an implementor like frictionless-r should allow to read packages that are invalid for this reason, so the user can correct it.

I will retain the error when writing packages:

library(frictionless)
x <- create_package()
write_package(x, directory = ".")
#> Error in `write_package()`:
#> ! `package` must have resources.
#> ℹ Use `add_resource()` to add resources.

Created on 2024-09-13 with reprex v2.1.0

This stops the creation of invalid packages and keeps the paradigm that reading and writing without editing should not affect a package. By relaxing the condition from error to warning when reading, a package without resources will silently get the resources property with create_package() and would look differently if we write it again. The above error stops that.

peterdesmet added a commit that referenced this issue Sep 13, 2024
@peterdesmet
Copy link
Member

This has been implemented at #266. @Rafnuss you can test with:

library(devtools)
#> Loading required package: usethis
devtools::install_github("frictionlessdata/frictionless-r#266")
#> Using GitHub PAT from the git credential store.
#> Downloading GitHub repo frictionlessdata/frictionless-r@warn_for_resources
#> 
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file ‘/private/var/folders/dl/xs4s859d7077r33qyzfd6bx80000gn/T/RtmpQ3WKiQ/remotes8e9e13d414f7/frictionlessdata-frictionless-r-e6f7100/DESCRIPTION’ ... OK
#> * preparing ‘frictionless’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> Omitted ‘LazyData’ from DESCRIPTION
#> * building ‘frictionless_1.2.0.9000.tar.gz’
library(frictionless)
#> 
#> Attaching package: 'frictionless'
#> The following object is masked from 'package:usethis':
#> 
#>     create_package

file <- "https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json"
frictionless::read_package(file)
#> Warning: `file`
#> 'https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json'
#> should have a resources property containing at least one resource.
#> ℹ Use `add_resource()` to add resources.
#> A Data Package with 0 resources.
#> Use `unclass()` to print the Data Package as a list.

Created on 2024-09-13 with reprex v2.1.0

@Rafnuss
Copy link
Author

Rafnuss commented Sep 13, 2024

Great, thanks! That's works for me. (I haven't done much testing)

@Rafnuss Rafnuss closed this as completed Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants