Commit

Remove the Python parts, for now
Will expect to add them back later.
gaborcsardi committed May 1, 2024
1 parent 4ff6f63 commit 15e8322
Showing 11 changed files with 24 additions and 579 deletions.
37 changes: 0 additions & 37 deletions Makefile

This file was deleted.

40 changes: 24 additions & 16 deletions README.md
@@ -6,29 +6,37 @@
[![](http://cranlogs.r-pkg.org/badges/miniparquet)](https://dgrtwo.shinyapps.io/cranview/)
<!-- badges: end -->

`miniparquet` is a reader for a common subset of Parquet files. miniparquet only supports rectangular-shaped data structures (no nested tables) and only the Snappy compression scheme. miniparquet has no (zero, none, 0) [external dependencies](https://research.swtch.com/deps) and is very lightweight. It compiles in seconds to a binary size of under 1 MB.
`miniparquet` is a reader for a common subset of Parquet files.
miniparquet only supports rectangular-shaped data structures
(no nested tables) and only the Snappy compression scheme.
miniparquet has no (zero, none, 0)
[external dependencies](https://research.swtch.com/deps) and is very
lightweight. It compiles in seconds to a binary size of under 1 MB.

## Installation
Miniparquet comes as C++ library, a Python package and a R package. Install the R package like so:

`devtools::install_github("hannesmuehleisen/miniparquet")`

The C++ library can be built by typing `make`.

The Python package is installed using `python setup.py install`
Install the R package from CRAN:

```r
install.packages("miniparquet")
```

## Usage
Use the R package like so: `df <- miniparquet::parquet_read("example.parquet")`

Folders of similar-structured Parquet files (e.g. produced by Spark) can be read like this:

`df <- data.table::rbindlist(lapply(Sys.glob("some-folder/part-*.parquet"), miniparquet::parquet_read))`

If you find a file that should be supported but isn't, please open an issue here with a link to the file.
Call `parquet_read()` to read a Parquet file:
```r
df <- miniparquet::parquet_read("example.parquet")
```

Use the Python package like so: `miniparquet.read('example.parquet')`. You can convert the result to a Pandas dataframe like so: `pandas.DataFrame.from_dict(miniparquet.read('example.parquet'))`
Folders of similar-structured Parquet files (e.g. produced by Spark)
can be read like this:

```r
df <- data.table::rbindlist(lapply(
Sys.glob("some-folder/part-*.parquet"),
miniparquet::parquet_read
))
```

## Performance
`miniparquet` is quite fast, on my laptop (I7-4578U) it can read compressed Parquet files at over 200 MB/s using only a single thread. Previously, there was a comparision with the arrow package here, but it appeared that results were caused by a bug which is fixed.
If you find a file that should be supported but isn't, please open an
issue here with a link to the file.
59 changes: 0 additions & 59 deletions bench.cpp

This file was deleted.

1 change: 0 additions & 1 deletion dependencies.R

This file was deleted.

18 changes: 0 additions & 18 deletions dump.py

This file was deleted.

123 changes: 0 additions & 123 deletions pq2csv.cpp

This file was deleted.

85 changes: 0 additions & 85 deletions roundingdiff.py

This file was deleted.
