Progress bar when read/write #261

Open
matthewgson opened this issue Jul 22, 2021 · 1 comment

Comments

@matthewgson

Thank you for creating this awesome package. It has been my go-to package whenever I save big files to disk.
I would like to see a progress bar when I read or write a big file. Is there a plan to add a simple progress bar option for reading and writing fst files in the future?

@MarcusKlik
Collaborator

MarcusKlik commented Nov 16, 2022

Hi @matthewgson, thanks for your request!

A progress bar would be very nice, but the call into the fstlib C++ library doesn't return until the complete file has been read. We could add a hook that fstlib calls to update a progress bar, but that seems like overkill (and would add more dependencies to the fst package).

If you want feedback when reading very large files, you could read the file in chunks and update a progress bar after each chunk. Would that work for you?

library(dplyr)
library(fst)
library(progress)

# read an fst file in chunks and show a progress bar
read_fst_progress <- function(path, columns = NULL) {

  nr_of_rows <- metadata_fst(path)$nrOfRows

  # determine chunks
  nr_of_chunks <- 100
  chunk_size <- 1 + (nr_of_rows - 1) %/% nr_of_chunks  # round up to take partial chunks into account
  nr_of_chunks <- 1 + (nr_of_rows - 1) %/% chunk_size  # drop chunks that would start past the last row

  pb <- progress_bar$new(total = nr_of_chunks)

  lapply(seq_len(nr_of_chunks), function(chunk) {

    Sys.sleep(0.1)  # only here to make the bar visible in this demo, remove for real use

    y <- read_fst(
      path,
      columns = columns,
      from = 1 + (chunk - 1) * chunk_size,
      to = min(chunk * chunk_size, nr_of_rows)
    )

    pb$tick()  # update the bar once the chunk has been read
    y
  }) %>%
    bind_rows
}

# write sample fst file
tmp_file <- tempfile(fileext = ".fst")
nr_of_rows <- 1e6
data.frame(
  X = sample(1:100, nr_of_rows, replace = TRUE),
  Y = LETTERS[sample(1:26, nr_of_rows, replace = TRUE)]
) %>%
  write_fst(tmp_file)

y <- read_fst_progress(tmp_file)

#> [===========================================================>------------------------------------------]  59%
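
The chunk arithmetic used above can be checked on its own, without touching an fst file. Below is a minimal sketch in base R; `chunk_bounds` is a hypothetical helper name used for illustration and is not part of fst. It assumes at least one row, and it recomputes the number of chunks after rounding the chunk size up, so that no chunk starts past the last row (which matters when the row count is not a multiple of 100).

```r
# hypothetical helper (not part of fst): from/to row range for each chunk
chunk_bounds <- function(nr_of_rows, nr_of_chunks = 100) {
  stopifnot(nr_of_rows >= 1, nr_of_chunks >= 1)

  # round the chunk size up so a partial last chunk is included
  chunk_size <- 1 + (nr_of_rows - 1) %/% nr_of_chunks

  # number of chunks actually needed with that chunk size
  nr_of_chunks <- 1 + (nr_of_rows - 1) %/% chunk_size

  from <- 1 + (seq_len(nr_of_chunks) - 1) * chunk_size
  to <- pmin(from + chunk_size - 1, nr_of_rows)  # clamp the last chunk

  data.frame(from = from, to = to)
}

bounds <- chunk_bounds(150)
head(bounds, 3)
tail(bounds, 1)
```

Each row of `bounds` could then be passed as the `from` and `to` arguments of `read_fst`, with one progress bar tick per row.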

@MarcusKlik MarcusKlik added this to the fst v0.9.10 milestone Nov 16, 2022
@MarcusKlik MarcusKlik self-assigned this Nov 16, 2022