Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid dynamic parse method dispatch for faster access #311

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

marshall-lee
Copy link
Contributor

@marshall-lee marshall-lee commented Jul 4, 2024

On some benchmarks it seems to make a difference:

  • quoted from benchmark/parse.yaml
  • quote_char_nil from benchmark/parse_quote_char_nil.yaml
N_ROWS=5000 rake benchmark:parse benchmark:parse_liberal_parsing benchmark:parse_quote_char_nil benchmark:parse_strip
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
            unquoted     22.147      22.131 i/s -     100.000 times in 4.515187s 4.518589s
              quoted     11.517      12.997 i/s -     100.000 times in 8.682986s 7.694298s
               mixed     14.097      13.964 i/s -     100.000 times in 7.093660s 7.161389s
     include_col_sep      5.214       5.188 i/s -     100.000 times in 19.178537s 19.277059s
     include_row_sep      5.195       5.101 i/s -     100.000 times in 19.250419s 19.605061s
        encode_utf-8     16.030      15.984 i/s -     100.000 times in 6.238449s 6.256427s
         encode_sjis     16.546      16.376 i/s -     100.000 times in 6.043603s 6.106603s

Comparison:
                         unquoted
           csv 3.3.0:        22.1 i/s
              master:        22.1 i/s - 1.00x  slower

                           quoted
              master:        13.0 i/s
           csv 3.3.0:        11.5 i/s - 1.13x  slower

                            mixed
           csv 3.3.0:        14.1 i/s
              master:        14.0 i/s - 1.01x  slower

                  include_col_sep
           csv 3.3.0:         5.2 i/s
              master:         5.2 i/s - 1.01x  slower

                  include_row_sep
           csv 3.3.0:         5.2 i/s
              master:         5.1 i/s - 1.02x  slower

                     encode_utf-8
           csv 3.3.0:        16.0 i/s
              master:        16.0 i/s - 1.00x  slower

                      encode_sjis
           csv 3.3.0:        16.5 i/s
              master:        16.4 i/s - 1.01x  slower

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_liberal_parsing.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                                   csv 3.3.0      master
                         unquoted      8.132       8.250 i/s -     100.000 times in 12.297793s 12.121689s
         unquoted_backslash_quote      3.868       3.866 i/s -     100.000 times in 25.849956s 25.869413s
                           quoted      3.642       3.638 i/s -     100.000 times in 27.454032s 27.484247s
quoted_double_quote_outside_quote      2.277       2.202 i/s -     100.000 times in 43.921488s 45.419138s
           quoted_backslash_quote      1.801       1.803 i/s -     100.000 times in 55.522265s 55.464641s
                  include_col_sep      3.644       3.633 i/s -     100.000 times in 27.440353s 27.528626s
                  include_row_sep      3.629       3.614 i/s -     100.000 times in 27.559354s 27.670274s
                     encode_utf-8      8.149       8.136 i/s -     100.000 times in 12.270936s 12.290646s
                      encode_sjis      8.527       8.425 i/s -     100.000 times in 11.727969s 11.868855s

Comparison:
                                      unquoted
                           master:         8.2 i/s
                        csv 3.3.0:         8.1 i/s - 1.01x  slower

                      unquoted_backslash_quote
                        csv 3.3.0:         3.9 i/s
                           master:         3.9 i/s - 1.00x  slower

                                        quoted
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

             quoted_double_quote_outside_quote
                        csv 3.3.0:         2.3 i/s
                           master:         2.2 i/s - 1.03x  slower

                        quoted_backslash_quote
                           master:         1.8 i/s
                        csv 3.3.0:         1.8 i/s - 1.00x  slower

                               include_col_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                               include_row_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                                  encode_utf-8
                        csv 3.3.0:         8.1 i/s
                           master:         8.1 i/s - 1.00x  slower

                                   encode_sjis
                        csv 3.3.0:         8.5 i/s
                           master:         8.4 i/s - 1.01x  slower

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_quote_char_nil.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
  without_quote_char     22.840      22.844 i/s -     100.000 times in 4.378284s 4.377488s
      quote_char_nil     32.370      43.729 i/s -     100.000 times in 3.089285s 2.286831s
       col_sep_space     12.135      12.106 i/s -     100.000 times in 8.240368s 8.260030s

Comparison:
               without_quote_char
              master:        22.8 i/s
           csv 3.3.0:        22.8 i/s - 1.00x  slower

                   quote_char_nil
              master:        43.7 i/s
           csv 3.3.0:        32.4 i/s - 1.35x  slower

                    col_sep_space
           csv 3.3.0:        12.1 i/s
              master:        12.1 i/s - 1.00x  slower

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_strip.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
             default     13.132      13.043 i/s -     100.000 times in 7.615051s 7.667227s
      no_quote_strip      8.955       8.957 i/s -     100.000 times in 11.167272s 11.164189s

Comparison:
                          default
           csv 3.3.0:        13.1 i/s
              master:        13.0 i/s - 1.01x  slower

                   no_quote_strip
              master:         9.0 i/s
           csv 3.3.0:         9.0 i/s - 1.00x  slower

Comment on lines 4 to 6
csv: 3.0.1
- gems:
csv: 3.0.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not remove old versions because we use them as base line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou Done

else
parse_quotable_loose(&block)
end
@parse_method.call(&block)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use :parse_XXX and __send__ instead of method(:parse_XXX) and call?

Suggested change
@parse_method.call(&block)
__send__(@parse_method, &block)

Comment on lines 801 to 807
@parse_method = if @quote_character.nil?
method(:parse_no_quote)
elsif @liberal_parsing || @strip
method(:parse_quotable_robust)
else
method(:parse_quotable_loose)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the existing style?

Suggested change
@parse_method = if @quote_character.nil?
method(:parse_no_quote)
elsif @liberal_parsing || @strip
method(:parse_quotable_robust)
else
method(:parse_quotable_loose)
end
if @quote_character.nil?
@parse_method = :parse_no_quote
elsif @liberal_parsing or @strip
@parse_method = :parse_quotable_robust
else
@parse_method = :parse_quotable_loose
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed or idea

@@ -987,7 +984,7 @@ def parse_quotable_loose(&block)
quoted_fields = []
elsif line.include?(@cr) or line.include?(@lf)
@scanner.keep_back
@need_robust_parsing = true
@parse_method = method(:parse_quotable_robust)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@parse_method = method(:parse_quotable_robust)
@parse_method = :parse_quotable_robust

@@ -1011,7 +1008,7 @@ def parse_quotable_loose(&block)
row[i] = column[1..-2]
else
@scanner.keep_back
@need_robust_parsing = true
@parse_method = method(:parse_quotable_robust)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@parse_method = method(:parse_quotable_robust)
@parse_method = :parse_quotable_robust

@kou kou changed the title Save @parse_method for faster access Avoid dynamic parse method dispatch for faster access Jul 5, 2024
@marshall-lee marshall-lee requested a review from kou July 6, 2024 20:44
@kou
Copy link
Member

kou commented Jul 6, 2024

Could you benchmark the __send__ approach too?
I think that __send__ is faster than method.call but we should measure them.

@marshall-lee
Copy link
Contributor Author

@kou they appear to perform the same but I agree that __send__ is simpler.

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
            unquoted     22.147      22.131 i/s -     100.000 times in 4.515187s 4.518589s
              quoted     11.517      12.997 i/s -     100.000 times in 8.682986s 7.694298s
               mixed     14.097      13.964 i/s -     100.000 times in 7.093660s 7.161389s
     include_col_sep      5.214       5.188 i/s -     100.000 times in 19.178537s 19.277059s
     include_row_sep      5.195       5.101 i/s -     100.000 times in 19.250419s 19.605061s
        encode_utf-8     16.030      15.984 i/s -     100.000 times in 6.238449s 6.256427s
         encode_sjis     16.546      16.376 i/s -     100.000 times in 6.043603s 6.106603s

Comparison:
                         unquoted
           csv 3.3.0:        22.1 i/s
              master:        22.1 i/s - 1.00x  slower

                           quoted
              master:        13.0 i/s
           csv 3.3.0:        11.5 i/s - 1.13x  slower

                            mixed
           csv 3.3.0:        14.1 i/s
              master:        14.0 i/s - 1.01x  slower

                  include_col_sep
           csv 3.3.0:         5.2 i/s
              master:         5.2 i/s - 1.01x  slower

                  include_row_sep
           csv 3.3.0:         5.2 i/s
              master:         5.1 i/s - 1.02x  slower

                     encode_utf-8
           csv 3.3.0:        16.0 i/s
              master:        16.0 i/s - 1.00x  slower

                      encode_sjis
           csv 3.3.0:        16.5 i/s
              master:        16.4 i/s - 1.01x  slower

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_liberal_parsing.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                                   csv 3.3.0      master
                         unquoted      8.132       8.250 i/s -     100.000 times in 12.297793s 12.121689s
         unquoted_backslash_quote      3.868       3.866 i/s -     100.000 times in 25.849956s 25.869413s
                           quoted      3.642       3.638 i/s -     100.000 times in 27.454032s 27.484247s
quoted_double_quote_outside_quote      2.277       2.202 i/s -     100.000 times in 43.921488s 45.419138s
           quoted_backslash_quote      1.801       1.803 i/s -     100.000 times in 55.522265s 55.464641s
                  include_col_sep      3.644       3.633 i/s -     100.000 times in 27.440353s 27.528626s
                  include_row_sep      3.629       3.614 i/s -     100.000 times in 27.559354s 27.670274s
                     encode_utf-8      8.149       8.136 i/s -     100.000 times in 12.270936s 12.290646s
                      encode_sjis      8.527       8.425 i/s -     100.000 times in 11.727969s 11.868855s

Comparison:
                                      unquoted
                           master:         8.2 i/s
                        csv 3.3.0:         8.1 i/s - 1.01x  slower

                      unquoted_backslash_quote
                        csv 3.3.0:         3.9 i/s
                           master:         3.9 i/s - 1.00x  slower

                                        quoted
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

             quoted_double_quote_outside_quote
                        csv 3.3.0:         2.3 i/s
                           master:         2.2 i/s - 1.03x  slower

                        quoted_backslash_quote
                           master:         1.8 i/s
                        csv 3.3.0:         1.8 i/s - 1.00x  slower

                               include_col_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                               include_row_sep
                        csv 3.3.0:         3.6 i/s
                           master:         3.6 i/s - 1.00x  slower

                                  encode_utf-8
                        csv 3.3.0:         8.1 i/s
                           master:         8.1 i/s - 1.00x  slower

                                   encode_sjis
                        csv 3.3.0:         8.5 i/s
                           master:         8.4 i/s - 1.01x  slower

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_quote_char_nil.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
  without_quote_char     22.840      22.844 i/s -     100.000 times in 4.378284s 4.377488s
      quote_char_nil     32.370      43.729 i/s -     100.000 times in 3.089285s 2.286831s
       col_sep_space     12.135      12.106 i/s -     100.000 times in 8.240368s 8.260030s

Comparison:
               without_quote_char
              master:        22.8 i/s
           csv 3.3.0:        22.8 i/s - 1.00x  slower

                   quote_char_nil
              master:        43.7 i/s
           csv 3.3.0:        32.4 i/s - 1.35x  slower

                    col_sep_space
           csv 3.3.0:        12.1 i/s
              master:        12.1 i/s - 1.00x  slower

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_strip.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
             default     13.132      13.043 i/s -     100.000 times in 7.615051s 7.667227s
      no_quote_strip      8.955       8.957 i/s -     100.000 times in 11.167272s 11.164189s

Comparison:
                          default
           csv 3.3.0:        13.1 i/s
              master:        13.0 i/s - 1.01x  slower

                   no_quote_strip
              master:         9.0 i/s
           csv 3.3.0:         9.0 i/s - 1.00x  slower

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the description to the latest information before we merge this?

Comment on lines 801 to 807
@parse_method = if @quote_character.nil?
:parse_no_quote
elsif @liberal_parsing or @strip
:parse_quotable_robust
else
:parse_quotable_loose
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@parse_method = if @quote_character.nil?
:parse_no_quote
elsif @liberal_parsing or @strip
:parse_quotable_robust
else
:parse_quotable_loose
end
if @quote_character.nil?
@parse_method = :parse_no_quote
elsif @liberal_parsing or @strip
@parse_method = :parse_quotable_robust
else
@parse_method = :parse_quotable_loose
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kou kou merged commit e75132e into ruby:master Jul 11, 2024
46 checks passed
@kou
Copy link
Member

kou commented Jul 11, 2024

Thanks.

@marshall-lee marshall-lee deleted the optimize/parse_method branch July 12, 2024 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants