Hi @KRJackLee - rapidcsv was mainly designed to be easy to use and to enable rapid development. It provides simple high-level access for reading and modifying CSV data, and this comes with some performance impact: the whole CSV file is read into a vector of vectors of strings, so as you can imagine it is not especially fast. The number you shared is probably reasonable for rapidcsv. I downloaded a random large CSV file from https://www.stats.govt.nz/large-datasets/csv-files-for-download/ - unzipped it was 818 MB (6 columns, 34959673 rows), and it took rapidcsv 7 seconds to read it (O2 optimization level with clang) on my MacBook Pro (M2 Pro). There are currently no performance-improving flags to use with rapidcsv. I once wanted to add a read-only mode, which could improve performance quite a bit, but I have not really had the need for it myself, so I haven't looked into it yet. Anyway, for maximum performance a custom-written parser will be faster, or some library that leaves more of the handling up to the application.
-
I measured how long it takes to read a CSV file. The data is about 430 MB (1,050,000 x 9), and reading it takes 81.633379 seconds. Is this normal performance for rapidcsv? If not, does the way I am using rapidcsv need improvement?
I am using MSVC from VS 2022 + CMake, built with Ninja, on Windows 10.
Hardware is Intel i7-10870H @2.2GHz, 32 GB memory in a laptop.