s3_list_parts and s3_list_multipart_uploads return lists with empty elements #608

Closed
jogrue opened this issue Apr 24, 2023 · 7 comments
Labels
bug 🐞 Something isn't working

Comments

@jogrue

jogrue commented Apr 24, 2023

Hi everyone,

I was trying to implement the multipart upload for larger files but s3's list_parts function returns a list where every entry is empty. Also the Parts list is empty. The list_multipart_uploads function also returns a list with empty elements.

Unfortunately, I did not get to do a lot of debugging. Was running this on a computer with R 4.2.2 and paws 0.2.0. Does this work for anyone else? I ran the same commands using the aws CLI (same credentials), and there I got results.

I also stumbled across bug #501 (and the fix by @DyfanJones and @davidkretch in #503); I am not sure whether it has anything to do with my issue.

@DyfanJones DyfanJones added the bug 🐞 Something isn't working label Apr 24, 2023
@DyfanJones
Member

Hi @jogrue, thanks for identifying this. I will look into it :D

@DyfanJones
Member

DyfanJones commented Apr 24, 2023

I believe I have found the issue: it looks like a problem with the operation name, so it should be an easy fix.

list_multipart_uploads <- function(Bucket, Delimiter = NULL, EncodingType = NULL, KeyMarker = NULL,
                                   MaxUploads = NULL, Prefix = NULL, UploadIdMarker = NULL,
                                   ExpectedBucketOwner = NULL) {
  op <- paws.common:::new_operation(
    name = "ListMultipartUploadsRequest", http_method = "GET",
    http_path = "/{Bucket}?uploads", paginator = list()
  )
  input <- paws.storage:::.s3$list_multipart_uploads_input(
    Bucket = Bucket,
    Delimiter = Delimiter, EncodingType = EncodingType, KeyMarker = KeyMarker,
    MaxUploads = MaxUploads, Prefix = Prefix, UploadIdMarker = UploadIdMarker,
    ExpectedBucketOwner = ExpectedBucketOwner
  )
  output <- paws.storage:::.s3$list_multipart_uploads_output()
  config <- paws.common:::get_config()
  svc <- paws.storage:::.s3$service(config)
  request <- paws.common:::new_request(svc, op, input, output)
  response <- paws.common:::send_request(request)
  return(response)
}

list_parts <- function(Bucket, Key, MaxParts = NULL, PartNumberMarker = NULL,
                       UploadId, RequestPayer = NULL, ExpectedBucketOwner = NULL,
                       SSECustomerAlgorithm = NULL, SSECustomerKey = NULL, SSECustomerKeyMD5 = NULL) {
  op <- paws.common:::new_operation(
    name = "ListPartsRequest", http_method = "GET",
    http_path = "/{Bucket}/{Key+}", paginator = list()
  )
  input <- paws.storage:::.s3$list_parts_input(
    Bucket = Bucket, Key = Key,
    MaxParts = MaxParts, PartNumberMarker = PartNumberMarker,
    UploadId = UploadId, RequestPayer = RequestPayer, ExpectedBucketOwner = ExpectedBucketOwner,
    SSECustomerAlgorithm = SSECustomerAlgorithm, SSECustomerKey = SSECustomerKey,
    SSECustomerKeyMD5 = SSECustomerKeyMD5
  )
  output <- paws.storage:::.s3$list_parts_output()
  config <- paws.common:::get_config()
  svc <- paws.storage:::.s3$service(config)
  request <- paws.common:::new_request(svc, op, input, output)
  response <- paws.common:::send_request(request)
  return(response)
}

bucket <- "MyTestBucket"
key <- "dummy.csv"

s3 <- paws::s3()

upload_id <- s3$create_multipart_upload(
  Bucket = bucket, Key = key
)$UploadId

list_parts(
  Bucket = bucket, Key = key, UploadId = upload_id
)

list_multipart_uploads(
  Bucket = bucket, Prefix = key
)

s3$abort_multipart_upload(
  Bucket = bucket, Key = key, UploadId = upload_id
)

@DyfanJones
Member

@jogrue This fix will require a paws regen, so it should be on CRAN in the next couple of weeks or so.

If you want to upload files in multiple parts, have a look at the package s3fs. It is an R package inspired by Python’s s3fs, but its API and implementation have been developed to follow R’s fs.

If you want to develop your own method, have a look at the private function .s3_upload_multipart_file; it should provide some insight into how to develop your own variant :)

@DyfanJones
Member

@jogrue you should be able to try out the fix on my dev fork. Please try out and let me know how you get on :)

# Install the regenerated paws.storage (dev)
remotes::install_github("dyfanjones/paws/cran/paws.storage", ref = "regen_paws")
bucket <- "MyBucket"
key <- "dummy.csv"

s3 <- paws::s3()

upload_id <- s3$create_multipart_upload(
  Bucket = bucket, Key = key
)$UploadId

s3$list_parts(
  Bucket = bucket, Key = key, UploadId = upload_id
)

s3$list_multipart_uploads(
  Bucket = bucket, Prefix = key
)

s3$abort_multipart_upload(
  Bucket = bucket, Key = key, UploadId = upload_id
)

@jogrue
Author

jogrue commented Apr 25, 2023

Hi @DyfanJones!

Thanks a lot, I got it to work now.

Only thing I had to do: The parts list returned as part of list_parts did not adhere to the format expected in complete_multipart_upload. I got an error about the LastModified element.

Basically, each Part in the returned Parts list consists of PartNumber, LastModified, ETag, Size, ChecksumCRC32, ChecksumCRC32C, ChecksumSHA1, ChecksumSHA256. According to the documentation, the parts in the Parts list supplied to complete_multipart_upload should not have LastModified and Size.

With something like this, it now works for me:

format_part <- function(Part) {
  new_part <- Part[c('ETag', 'ChecksumCRC32', 'ChecksumCRC32C',
                     'ChecksumSHA1', 'ChecksumSHA256', 'PartNumber')]
  return(new_part)
}

Parts <- lapply(ret$Parts, format_part)
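
For anyone who wants to try this reshaping step without touching S3, here is a minimal sketch on a mocked response. The `mock_ret` object is made up for illustration and only mimics the field names listed above; `keep_fields` and `to_completed_part` are hypothetical names, not part of paws. Using `intersect()` is an assumption on my side to avoid NA-named entries if a field happens to be missing from a part.

```r
# Mocked list_parts-style response (illustrative values only, not real S3 output)
mock_ret <- list(
  Parts = list(
    list(
      PartNumber = 1L,
      LastModified = as.POSIXct("2015-01-01", tz = "UTC"),
      ETag = "\"etag-1\"",
      Size = 5242880
    ),
    list(
      PartNumber = 2L,
      LastModified = as.POSIXct("2015-01-01", tz = "UTC"),
      ETag = "\"etag-2\"",
      Size = 1024
    )
  )
)

# Fields kept for complete_multipart_upload (LastModified and Size are dropped)
keep_fields <- c("ETag", "ChecksumCRC32", "ChecksumCRC32C",
                 "ChecksumSHA1", "ChecksumSHA256", "PartNumber")

# Keep only the wanted fields that are actually present in the part
to_completed_part <- function(part) {
  part[intersect(keep_fields, names(part))]
}

completed_parts <- lapply(mock_ret$Parts, to_completed_part)
str(completed_parts)
```

The resulting list could then be passed as `MultipartUpload = list(Parts = completed_parts)`, in the same way as in the rest of this thread.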

Thanks again! Will also take a closer look at s3fs.

@DyfanJones
Member

Cool, that is good to know :) I will merge the PR so that it can be picked up in the regen.

I believe the behaviour of returning PartNumber, LastModified, ETag, Size, ChecksumCRC32, ChecksumCRC32C, ChecksumSHA1, ChecksumSHA256 for Parts in s3_list_parts is correct. This appears to be the same response syntax as in boto3.

paws s3_list_parts response syntax for Parts:

  Parts = list(
    list(
      PartNumber = 123,
      LastModified = as.POSIXct(
        "2015-01-01"
      ),
      ETag = "string",
      Size = 123,
      ChecksumCRC32 = "string",
      ChecksumCRC32C = "string",
      ChecksumSHA1 = "string",
      ChecksumSHA256 = "string"
    )
  )

Boto3 client.list_parts response syntax for Parts:

'Parts': [
    {
        'PartNumber': 123,
        'LastModified': datetime(2015, 1, 1),
        'ETag': 'string',
        'Size': 123,
        'ChecksumCRC32': 'string',
        'ChecksumCRC32C': 'string',
        'ChecksumSHA1': 'string',
        'ChecksumSHA256': 'string'
    }
]

Side note: I believe you can do the multipart upload without needing to call list_parts. Here is a quick example based on what s3fs does 😄

# file you want to upload in multiparts
dummy_csv <- "dummy.csv"
write.csv(iris, dummy_csv)

# s3 bucket and key to upload to
bucket <- "MyBucket"
key <- "dummy.csv"

size <- file.size(dummy_csv)

# size of each part in bytes; S3 requires every part except the last to be at least 5 MiB
max_batch <- 5 * 1024^2
num_parts <- ceiling(size / max_batch)

# set up a connection to file
con <- file(dummy_csv, open = "rb")

s3 <- paws::s3()

# setup multipart upload
upload_id <- s3$create_multipart_upload(
  Bucket = bucket, Key = key
)$UploadId

kwargs <- list(
  Bucket = bucket,
  Key = key,
  UploadId = upload_id
)

# upload file in multiparts
parts <- lapply(seq_len(num_parts), function(i) {
  body <- readBin(con, what = "raw", n = max_batch)
  # kwargs is modified on a local copy, so each call gets its own Body and PartNumber
  kwargs$Body <- body
  kwargs$PartNumber <- i
  etag <- do.call(s3$upload_part, kwargs)$ETag
  return(list(ETag = etag, PartNumber = i))
})

kwargs$MultipartUpload <- list(Parts = parts)
kwargs$Body <- NULL
kwargs$PartNumber <- NULL

# complete multipart upload
do.call(s3$complete_multipart_upload, kwargs)

# close file connection
close(con)
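
The chunking part of the example above can be checked locally, without any S3 calls. The following is only a sketch: `part_size` is deliberately tiny so that a small file yields several chunks, whereas a real multipart upload needs every part except the last to be at least 5 MiB. The variable names (`tmp`, `part_size`, `chunks`) are made up for this illustration.

```r
# Write a small temporary file to split (stand-in for a large upload)
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp)

size <- file.size(tmp)
part_size <- 2000 # tiny on purpose, far below what S3 would accept
num_parts <- ceiling(size / part_size)

# Read the file sequentially in chunks of at most part_size bytes
con <- file(tmp, open = "rb")
chunks <- lapply(seq_len(num_parts), function(i) {
  readBin(con, what = "raw", n = part_size)
})
close(con)

# Size of each chunk; only the last one may be shorter than part_size
chunk_sizes <- lengths(chunks)

# Reassembling the chunks should give back the original bytes
original <- readBin(tmp, what = "raw", n = size)
reassembled <- do.call(c, chunks)
identical(original, reassembled)
```

Because the connection keeps its position between `readBin()` calls, each iteration picks up where the previous one stopped, which is why the parts come out in order.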

Hope this helps
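
One thing the example leaves out is cleanup when a part fails to upload. A possible pattern, sketched here with stub functions so it runs without AWS, is to wrap the upload in `tryCatch()` and abort on error. `with_abort_on_error` is a hypothetical helper written for this illustration; it is not part of paws or s3fs.

```r
# Hypothetical helper: run `upload()` and call `abort()` if it throws,
# then re-signal the original error so the caller still sees it.
with_abort_on_error <- function(upload, abort) {
  tryCatch(
    upload(),
    error = function(e) {
      abort()
      stop(e)
    }
  )
}

# Demo with stubs: the "upload" fails and the "abort" just records that it ran
aborted <- FALSE
result <- try(
  with_abort_on_error(
    upload = function() stop("simulated upload_part failure"),
    abort = function() aborted <<- TRUE
  ),
  silent = TRUE
)

inherits(result, "try-error")
aborted
```

In real use, `abort` would presumably wrap the `s3$abort_multipart_upload(Bucket = bucket, Key = key, UploadId = upload_id)` call shown earlier in this thread, and `upload` would contain the `upload_part` loop plus `complete_multipart_upload`.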

@DyfanJones DyfanJones added this to the paws 0.3.0 milestone Jun 14, 2023
@DyfanJones
Member

@jogrue closing this ticket as paws 0.3.0 fixed the issue.
