remove Kconv.toutf8 conversion #16

Open
wants to merge 1 commit into
base: master
1 change: 0 additions & 1 deletion lib/ogpr/fetcher/html_fetcher.rb
@@ -17,7 +17,6 @@ def fetch(headers = {})
acceptable_content!(head.headers[:content_type])

res = send_request(:get, @uri, headers)
Kconv.toutf8(res.to_str)
Owner

Summary: in my opinion, this behavior (converting string encodings in this gem) should be configurable for various use cases, rather than simply removing this line.

Please read the following for details. 🙏🏼


First, let's check the String value definition in the OGP spec.
https://ogp.me/#string 👀

As you can see in the official docs, a String value is described as "A sequence of Unicode characters" (Unicode, but not specifically UTF-8).
So I think this gem should follow the String value spec as far as possible.

Based on this thought, and just for my personal use,
I decided to convert those web contents (meta tags) into UTF-8 encoding.
(I think this is the root cause of the encoding issues in this gem, and it was my bad decision. 😢)

However, web contents (especially meta tag values in HTML files, in this context) can be in various encodings, as you know.
After merging your PR, users of this library would have to handle OGP string encodings without any additional information (such as which string encoding each website used).
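
To make that concern concrete, here is a minimal sketch of what a caller might face with an unconverted body. The `EncodingDemo` module and its method names are hypothetical, used only for illustration, and are not part of this gem. It assumes a page served as Shift_JIS whose bytes arrive without a usable charset label.

```ruby
# Hypothetical illustration; nothing here is part of ogpr.
module EncodingDemo
  TITLE = "\u65E5\u672C\u8A9E" # the UTF-8 string "日本語"

  # Bytes as a Shift_JIS page might deliver them, with no charset label attached.
  def self.unlabeled_sjis_bytes
    TITLE.encode(Encoding::Shift_JIS).b
  end

  # Treating those bytes as UTF-8 yields a string with an invalid encoding.
  def self.misread_as_utf8
    unlabeled_sjis_bytes.force_encoding(Encoding::UTF_8)
  end

  # Knowing the real source encoding allows a lossless conversion to UTF-8.
  def self.decoded_with_hint
    unlabeled_sjis_bytes
      .force_encoding(Encoding::Shift_JIS)
      .encode(Encoding::UTF_8)
  end
end

puts EncodingDemo.misread_as_utf8.valid_encoding?
puts EncodingDemo.decoded_with_hint == EncodingDemo::TITLE
```

Without the hint about the source encoding, the caller only has the raw bytes to work with.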

For the above reason, I don't think that simply removing the encoding conversion, as this PR does, is the best way. 🤔

So, as a result, and as I wrote at the top of this comment,
my opinion is that this behavior (converting string encodings in this gem) should be configurable for various cases.
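
One possible shape for such a setting, as a rough sketch only: the `EncodingPolicy` module, the `policy:` values, and the `source_encoding:` parameter below are all hypothetical names and not an existing ogpr API. It also swaps `Kconv.toutf8` (which guesses the input encoding) for core `String#encode` with an explicitly supplied source encoding, so it is not a drop-in equivalent.

```ruby
# Hypothetical sketch; none of these names exist in ogpr.
module EncodingPolicy
  # policy:          :utf8 converts the body to UTF-8, :raw returns it untouched.
  # source_encoding: the encoding the bytes are assumed to be in, for example
  #                  taken from the Content-Type charset (assumption).
  def self.apply(body, policy: :utf8, source_encoding: body.encoding)
    case policy
    when :raw
      body
    when :utf8
      body.dup
          .force_encoding(source_encoding)
          .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
          .scrub # also replaces invalid bytes when the source was already UTF-8
    else
      raise ArgumentError, "unknown encoding policy: #{policy.inspect}"
    end
  end
end

# Usage: bytes from a Shift_JIS page, first untouched, then converted.
sjis_body = "\u65E5\u672C\u8A9E".encode(Encoding::Shift_JIS).b
raw_body  = EncodingPolicy.apply(sjis_body, policy: :raw)
utf8_body = EncodingPolicy.apply(sjis_body, policy: :utf8, source_encoding: "Shift_JIS")
puts raw_body.encoding
puts utf8_body.encoding
```

With a switch like this, callers who want the original bytes could opt out, while the default could keep the current UTF-8 behavior.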

Owner

For now, I have simply opened a GitHub issue for this encoding problem: #17

rescue => e
raise e
end