s3_list_parts and s3_list_multipart_uploads return lists with empty elements #608
Comments
Hi @jogrue, thanks for identifying this. I will look into it :D |
I believe I have found the issue, it looks like an operation name issue. It should be an easy fix.
list_multipart_uploads <- function(Bucket, Delimiter = NULL, EncodingType = NULL, KeyMarker = NULL,
                                   MaxUploads = NULL, Prefix = NULL, UploadIdMarker = NULL,
                                   ExpectedBucketOwner = NULL) {
  op <- paws.common:::new_operation(
    name = "ListMultipartUploadsRequest", http_method = "GET",
    http_path = "/{Bucket}?uploads", paginator = list()
  )
  input <- paws.storage:::.s3$list_multipart_uploads_input(
    Bucket = Bucket,
    Delimiter = Delimiter, EncodingType = EncodingType, KeyMarker = KeyMarker,
    MaxUploads = MaxUploads, Prefix = Prefix, UploadIdMarker = UploadIdMarker,
    ExpectedBucketOwner = ExpectedBucketOwner
  )
  output <- paws.storage:::.s3$list_multipart_uploads_output()
  config <- paws.common:::get_config()
  svc <- paws.storage:::.s3$service(config)
  request <- paws.common:::new_request(svc, op, input, output)
  response <- paws.common:::send_request(request)
  return(response)
}
list_parts <- function(Bucket, Key, MaxParts = NULL, PartNumberMarker = NULL,
                       UploadId, RequestPayer = NULL, ExpectedBucketOwner = NULL,
                       SSECustomerAlgorithm = NULL, SSECustomerKey = NULL, SSECustomerKeyMD5 = NULL) {
  op <- paws.common:::new_operation(
    name = "ListPartsRequest", http_method = "GET",
    http_path = "/{Bucket}/{Key+}", paginator = list()
  )
  input <- paws.storage:::.s3$list_parts_input(
    Bucket = Bucket, Key = Key,
    MaxParts = MaxParts, PartNumberMarker = PartNumberMarker,
    UploadId = UploadId, RequestPayer = RequestPayer, ExpectedBucketOwner = ExpectedBucketOwner,
    SSECustomerAlgorithm = SSECustomerAlgorithm, SSECustomerKey = SSECustomerKey,
    SSECustomerKeyMD5 = SSECustomerKeyMD5
  )
  output <- paws.storage:::.s3$list_parts_output()
  config <- paws.common:::get_config()
  svc <- paws.storage:::.s3$service(config)
  request <- paws.common:::new_request(svc, op, input, output)
  response <- paws.common:::send_request(request)
  return(response)
}
bucket <- "MyTestBucket"
key <- "dummy.csv"
s3 <- paws::s3()
upload_id <- s3$create_multipart_upload(
  Bucket = bucket, Key = key
)$UploadId
list_parts(
  Bucket = bucket, Key = key, UploadId = upload_id
)
list_multipart_uploads(
  Bucket = bucket, Prefix = key
)
s3$abort_multipart_upload(
  Bucket = bucket, Key = key, UploadId = upload_id
) |
@jogrue This fix will require a paws regen, so it should be on CRAN in the next couple of weeks or so. If you want to upload files in multiple parts, have a look at the package s3fs. It is an R package inspired by Python’s s3fs, but its API and implementation have been developed to follow R’s fs. If you want to develop your own method, have a look at the private function .s3_upload_multipart_file; it should provide some insight into how to develop your own variant :) |
@jogrue you should be able to try out the fix on my dev fork. Please try it out and let me know how you get on :)
# Install the regenerated paws.storage (dev)
remotes::install_github("dyfanjones/paws/cran/paws.storage", ref = "regen_paws")
bucket <- "MyBucket"
key <- "dummy.csv"
s3 <- paws::s3()
upload_id <- s3$create_multipart_upload(
  Bucket = bucket, Key = key
)$UploadId
s3$list_parts(
  Bucket = bucket, Key = key, UploadId = upload_id
)
s3$list_multipart_uploads(
  Bucket = bucket, Prefix = key
)
s3$abort_multipart_upload(
  Bucket = bucket, Key = key, UploadId = upload_id
) |
Hi @DyfanJones! Thanks a lot, I got it to work now. Only thing I had to do: The parts list returned as part of Basically, each So with something like this, it now worked for me
format_part <- function(Part) {
  new_part <- Part[c('ETag', 'ChecksumCRC32', 'ChecksumCRC32C',
                     'ChecksumSHA1', 'ChecksumSHA256', 'PartNumber')]
  return(new_part)
}
Parts <- lapply(ret$Parts, format_part)
Thanks again! Will also take a closer look at s3fs. |
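The filtering step in the comment above can be tried without any AWS calls. This is only a sketch: `ret` is a mocked stand-in with made-up values, not real `s3$list_parts()` output, and `format_part_present` is a hypothetical variant of `format_part` that keeps only the fields actually present, so absent checksum fields are not carried along as empty entries.

```r
# Mocked stand-in for the list returned by s3$list_parts(); all values are made up.
ret <- list(
  Parts = list(
    list(PartNumber = 1L, LastModified = Sys.time(), ETag = "\"etag-1\"", Size = 2000),
    list(PartNumber = 2L, LastModified = Sys.time(), ETag = "\"etag-2\"", Size = 1500)
  )
)

# Fields to keep, taken from the format_part() helper in the comment above.
keep <- c("ETag", "ChecksumCRC32", "ChecksumCRC32C",
          "ChecksumSHA1", "ChecksumSHA256", "PartNumber")

# Hypothetical variant: keep only the wanted fields that exist in this part.
format_part_present <- function(Part) {
  Part[intersect(keep, names(Part))]
}

Parts <- lapply(ret$Parts, format_part_present)
```

With the mocked input, each element of `Parts` retains only `ETag` and `PartNumber`, since the other fields in `keep` are absent and `LastModified` and `Size` are dropped.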
Cool that is good to know :) I will merge PR so that it can be picked up in the regen. I believe the behaviour of returning paws
Boto3
Side Note: I believe you can do the multipart upload without the need to call
# file you want to upload in multiparts
dummy_csv <- "dummy.csv"
write.csv(iris, dummy_csv)
# s3 bucket and key to upload to
bucket <- "MyBucket"
key <- "dummy.csv"
size <- file.size(dummy_csv)
# size of each multipart in bytes
# (note: S3 requires every part except the last to be at least 5 MiB,
# so use a larger value for real multi-part uploads)
max_batch <- 2000
num_parts <- ceiling(size / max_batch)
# set up a connection to file
con <- file(dummy_csv, open = "rb")
s3 <- paws::s3()
# setup multipart upload
upload_id <- s3$create_multipart_upload(
  Bucket = bucket, Key = key
)$UploadId
kwargs <- list(
  Bucket = bucket,
  Key = key,
  UploadId = upload_id
)
# upload file in multiparts
parts <- lapply(seq_len(num_parts), function(i) {
  body <- readBin(con, what = "raw", n = max_batch)
  # these assignments only change the function's local copy of kwargs
  kwargs$Body <- body
  kwargs$PartNumber <- i
  etag <- do.call(s3$upload_part, kwargs)$ETag
  return(list(ETag = etag, PartNumber = i))
})
kwargs$MultipartUpload <- list(Parts = parts)
kwargs$Body <- NULL
kwargs$PartNumber <- NULL
# complete multipart upload
do.call(s3$complete_multipart_upload, kwargs)
# close file connection
close(con)
Hope this helps |
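The chunking logic in the example above can be checked locally before involving S3. The sketch below makes no AWS calls; it assumes a temporary file and the same 2000-byte chunk size, and only verifies that reading `num_parts` chunks with `readBin` covers the whole file.

```r
# Write a small test file to a temporary location (assumption: any file works here).
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp)

size <- file.size(tmp)
max_batch <- 2000                        # chunk size in bytes, as in the example above
num_parts <- ceiling(size / max_batch)

# Read the file sequentially in fixed-size chunks from one open connection.
con <- file(tmp, open = "rb")
chunks <- lapply(seq_len(num_parts), function(i) {
  readBin(con, what = "raw", n = max_batch)
})
close(con)

chunk_sizes <- lengths(chunks)           # bytes actually read per chunk
reassembled <- do.call(c, chunks)        # concatenate the raw vectors again
```

Every chunk except possibly the last should be exactly `max_batch` bytes, and `reassembled` should match the original file byte for byte.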
@jogrue closing ticket as paws 0.3.0 fixed issue |
Hi everyone,
I was trying to implement the multipart upload for larger files but s3's list_parts function returns a list where every entry is empty. Also the Parts list is empty. The list_multipart_uploads function also returns a list with empty elements.

Unfortunately, I did not get to do a lot of debugging. Was running this on a computer with R 4.2.2 and paws 0.2.0. Does this work for anyone else? I ran the same commands using the aws CLI (same credentials), and there I got results.

I also stumbled across this bug #501 (and fix by @DyfanJones and @davidkretch here: #503), not sure if this got anything to do with my issue