-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow CSV.open with StringIO argument #302
Conversation
lib/csv.rb
Outdated
pos = filename_or_io.pos | ||
f = StringIO.new(filename_or_io.read) | ||
filename_or_io.seek(pos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use StringIO#string
here?
pos = filename_or_io.pos | |
f = StringIO.new(filename_or_io.read) | |
filename_or_io.seek(pos) | |
f = StringIO.new(filename_or_io.string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd then run into an issue similar to #98, which I confirmed with another test:
def test_foreach_stringio_with_bom
s = StringIO.new
s << "\ufeff" # BOM
s << @data
s.rewind
s.read(3)
rows = CSV.foreach(s, col_sep: "\t", row_sep: "\r\n").to_a
assert_equal(@rows, rows)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the most relevant counter-argument is, CSV.new
also uses read
, so using read
here is also a matter of consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Could you also add a test for the BOM case?
How about using StringIO#set_encoding_by_bom
for it?
f = StringIO.new(filename_or_io.string)
unless f.set_encoding_by_bom
f.set_encoding(filename_or_io.external_encoding)
end
CSV.new
uses already opened IO. So it must not ignore the current position. So we must use read
.
But CSV.open
opens a new IO. So it must use all contents instead of contents after the current position.
Could you check the current behavior of CSV.open
with sought real IO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But CSV.open opens a new IO. So it must use all contents instead of contents after the current position.
You're right, CSV.open
reads the whole file content. I changed open
again to use string
instead of read
.
Could you also add a test for the BOM case?
Sure thing. I added it, and together with using the **file_opts
in the new StringIO
, it's handled correctly.
How about using StringIO#set_encoding_by_bom for it?
I couldn't see any effect of doing it. It seems like including the file_opts
is enough. Or is there a case I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't see any effect of doing it.
Ah, b706d91 is a trick for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything you still feel is missing, after I've added file_opts
with the trick from b706d91 ?
Could you rebase on main? Our CI on master broken... and I've fixed it now. |
Thank you, I've rebase it now. |
Thanks. But sorry... The master CI was broken... I've fixed it. Could you rebase on master again? |
No problem, I rebased it now. |
I'm assuming the test failures are related to https://bugs.ruby-lang.org/issues/20526, right? |
Hmm. It should be fixed by 47bf76f . |
Rebased it, and removed a now unneeded check when closing the IO within |
Ah, the failed test is an added test. Does this work? diff --git a/lib/csv.rb b/lib/csv.rb
index 7511ced..d2d1c57 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -1591,7 +1591,10 @@ class CSV
# wrap a File opened with the remaining +args+ with no newline
# decorator
file_opts = {}
- may_enable_bom_deletection_automatically(mode, options, file_opts)
+ may_enable_bom_deletection_automatically(filename_or_io,
+ mode,
+ options,
+ file_opts)
file_opts.merge!(options)
unless file_opts.key?(:newline)
file_opts[:universal_newline] ||= false
@@ -1900,10 +1903,15 @@ class CSV
private_constant :ON_WINDOWS
private
- def may_enable_bom_deletection_automatically(mode, options, file_opts)
- # "bom|utf-8" may be buggy on Windows:
- # https://bugs.ruby-lang.org/issues/20526
- return if ON_WINDOWS
+ def may_enable_bom_deletection_automatically(filename_or_io,
+ mode,
+ options,
+ file_opts)
+ unless filename_or_io.is_a?(StringIO)
+ # "bom|utf-8" may be buggy on Windows:
+ # https://bugs.ruby-lang.org/issues/20526
+ return if ON_WINDOWS
+ end
return unless Encoding.default_external == Encoding::UTF_8
return if options.key?(:encoding)
return if options.key?(:external_encoding) |
Sorry, I was away for a while. I saw that support to |
…nstead of reusing it
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
This check had been added before we started copying the StringIO argument passed to `#open`, as part of instantiating CSV. Right now we can call `csv.close` even if the underlying IO is a StringIO, without any unexpected side effect.
test/csv/interface/test_read.rb
Outdated
if RUBY_VERSION >= "2.7" | ||
# Support to StringIO was dropped for Ruby 2.6 and earlier: | ||
# https://github.com/ruby/stringio/pull/47 | ||
def test_foreach_stringio | ||
string_io = StringIO.new(@data) | ||
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | ||
assert_equal(@rows, rows) | ||
end | ||
|
||
def test_foreach_stringio_with_bom | ||
string_io = StringIO.new("\ufeff#{@data}") # U+FEFF ZERO WIDTH NO-BREAK SPACE | ||
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | ||
assert_equal(@rows, rows) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does test_foreach_stringio
work with Ruby 2.7? If so, could you omit only test_foreach_stringio_with_bom
?
if RUBY_VERSION >= "2.7" | |
# Support to StringIO was dropped for Ruby 2.6 and earlier: | |
# https://github.com/ruby/stringio/pull/47 | |
def test_foreach_stringio | |
string_io = StringIO.new(@data) | |
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
assert_equal(@rows, rows) | |
end | |
def test_foreach_stringio_with_bom | |
string_io = StringIO.new("\ufeff#{@data}") # U+FEFF ZERO WIDTH NO-BREAK SPACE | |
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
assert_equal(@rows, rows) | |
end | |
end | |
def test_foreach_stringio | |
string_io = StringIO.new(@data) | |
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
assert_equal(@rows, rows) | |
end | |
def test_foreach_stringio_with_bom | |
if RUBY_VERSION < "2.7" | |
# Support to StringIO was dropped for Ruby 2.6 and earlier without BOM support: | |
# https://github.com/ruby/stringio/pull/47 | |
omit("StringIO's BOM support isn't available with Ruby < 2.7") | |
end | |
string_io = StringIO.new("\ufeff#{@data}") # U+FEFF ZERO WIDTH NO-BREAK SPACE | |
rows = CSV.foreach(string_io, col_sep: "\t", row_sep: "\r\n").to_a | |
assert_equal(@rows, rows) | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does if I also initialize StringIO
without additional options. I just pushed another commit like that. It's unfortunate having to run an additional RUBY_VERSION
check at runtime, but at least that's only when calling open
.
lib/csv.rb
Outdated
f = if RUBY_VERSION >= "2.7" | ||
StringIO.new(filename_or_io.string, **file_opts) | ||
else | ||
StringIO.new(filename_or_io.string) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we always use an empty file_opts
for Ruby 2.7 or earlier instead?
diff --git a/lib/csv.rb b/lib/csv.rb
index 5f75894..6f94bd4 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -1605,11 +1605,7 @@ class CSV
options.delete_if {|k, _| /newline\z/.match?(k)}
if filename_or_io.is_a?(StringIO)
- f = if RUBY_VERSION >= "2.7"
- StringIO.new(filename_or_io.string, **file_opts)
- else
- StringIO.new(filename_or_io.string)
- end
+ f = StringIO.new(filename_or_io.string, **file_opts)
else
begin
f = File.open(filename_or_io, mode, **file_opts)
@@ -1911,7 +1907,11 @@ class CSV
mode,
options,
file_opts)
- unless filename_or_io.is_a?(StringIO)
+ if filename_or_io.is_a?(StringIO)
+ # Support to StringIO was dropped for Ruby 2.6 and earlier without BOM support:
+ # https://github.com/ruby/stringio/pull/47
+ return if RUBY_VERSION < "3.0"
+ else
# "bom|utf-8" may be buggy on Windows:
# https://bugs.ruby-lang.org/issues/20526
return if ON_WINDOWS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work because CSV#open
still adds keys to file_opts
here:
Lines 1586 to 1589 in bb93c28
file_opts.merge!(options) | |
unless file_opts.key?(:newline) | |
file_opts[:universal_newline] ||= false | |
end |
For Ruby < 2.7, if file_opts
is not empty, we get TypeError: no implicit conversion of Hash into String
.
I like that something fails if options
is given to CSV#open
when it's not supported, as opposed to simply ignoring it, the way I did it, but then the error should be ideally more clear than what the old StringIO
returns. I can go that route, by rescuing the TypeError
, and raising another exception with a clearer message. In any case, the file_opts[:universal_newline] ||= false
would still have to be removed for < 2.7
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work because
CSV#open
still adds keys tofile_opts
here:Lines 1586 to 1589 in bb93c28
file_opts.merge!(options) unless file_opts.key?(:newline) file_opts[:universal_newline] ||= false end
Oh, sorry.
For Ruby < 2.7, if
file_opts
is not empty, we getTypeError: no implicit conversion of Hash into String
.I like that something fails if
options
is given toCSV#open
when it's not supported, as opposed to simply ignoring it, the way I did it, but then the error should be ideally more clear than what the oldStringIO
returns. I can go that route, by rescuing theTypeError
, and raising another exception with a clearer message.
I agree with you but can we use raise ... unless file_opts.empty?
for it instead of rescuing the TypeError
?
In any case, the
file_opts[:universal_newline] ||= false
would still have to be removed for< 2.7
.
I agree with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out passing an empty file_opts
doesn't work either. I've noticed this quirk from the interpreter in Ruby 2.6:
StringIO.new("bla", **{}) # => #<StringIO:0x0000563108bb88b8>
opts = {}
StringIO.new("bla", **opts) # => TypeError (no implicit conversion of Hash into String)
To address that, I've changed the StringIO
initialization in Ruby < 2.7, so that it:
- ignores options that are meant only for
CSV.new
- raises an
ArgumentError
if anything else is passed asoptions
toCSV.open
- initializes
StringIO
without keyword arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's use the approach.
Thanks. |
Fix #300