Files beginning with the Swedish characters å, ä, ö are renamed on upload #37

Open
havet opened this issue Jan 31, 2016 · 13 comments

@havet
Contributor

havet commented Jan 31, 2016

Files whose names begin with the Swedish characters å, ä, ö (a with a ring, a with a trema, i.e. two dots, and o with a trema) lose the first letter when uploaded. The letters are preserved if they appear anywhere else in the file name.

Apparently the PHP function 'basename' decodes the names correctly, except for the first letter.

Using the proposed function 'latinhtmlspecialchars' plus my function 'getfilename' (a simple search for the last '/') gives exactly the same result.

Any clue?
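
For reference, PHP's `basename()` is documented as locale-aware, so its handling of a leading multibyte character depends on the locale in effect. Below is a minimal sketch of a locale-independent alternative, along the lines of the 'getfilename' idea above. The name `plain_basename` is hypothetical and not part of the project. Since the reporter saw the same result with his own slash search, this may well not be the whole story; it only takes the locale out of the picture.

```php
<?php
// Hypothetical helper (not from the project): return the last path component
// without relying on the locale-aware basename().
function plain_basename(string $path): string
{
    // Drop trailing slashes so "dir/älg/" yields "älg".
    $path = rtrim($path, '/');

    // The byte "/" (0x2F) never occurs inside a UTF-8 multibyte sequence,
    // so a plain byte search is safe for UTF-8 and for single-byte charsets.
    $pos = strrpos($path, '/');

    return $pos === false ? $path : substr($path, $pos + 1);
}

echo plain_basename('/srv/uploads/älg.txt'), "\n"; // älg.txt
```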

@havet
Contributor Author

havet commented Jan 31, 2016

See 'Sort issue' added by madsenfr for the function 'latinhtmlspecialchars'.
You might test on files named e.g.
ål (eel)
älg (moose)
ödla (lizard)

@havet
Contributor Author

havet commented Feb 13, 2016

Fixed! Made a pull request.
Added character conversion on:

  1. Listing files
  2. Uploading files

Now file names with national characters are displayed properly and are preserved on uploading.

@NewEraCracker
Contributor

Also refer to issue #42 for display of those chars.

I am thinking of:

  1. Converting them from os_charset to charset for display (implemented and being tested).
  2. During the upload (where possible), converting backwards (from charset to os_charset).

Let me know what you think about that.

PS: If fixing #42 succeeds the easy way, then this should be a piece of cake as well.
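
The two-way conversion proposed above could be sketched roughly as follows. This is not the code from the referenced commits: the helper names are hypothetical, the config values are only illustrative, and it assumes the iconv extension is available and the source file is saved as UTF-8.

```php
<?php
// Illustrative values; the option names follow the thread, the rest is assumed.
$_CONFIG = array('charset' => 'UTF-8', 'os_charset' => 'CP1252');

// Hypothetical helper: convert $name between two charsets, falling back to
// the unchanged name if iconv cannot convert it.
function convert_name(string $name, string $from, string $to): string
{
    if (strcasecmp($from, $to) === 0) {
        return $name; // nothing to do
    }
    $converted = @iconv($from, $to, $name);
    return $converted === false ? $name : $converted;
}

// 1. Listing: file system name -> page charset.
function name_for_display(string $osName, array $config): string
{
    return convert_name($osName, $config['os_charset'], $config['charset']);
}

// 2. Upload: name received from the browser -> file system charset.
function name_for_os(string $pageName, array $config): string
{
    return convert_name($pageName, $config['charset'], $config['os_charset']);
}

$onDisk = name_for_os('ödla.txt', $_CONFIG);
echo bin2hex($onDisk), "\n";                    // f6646c612e747874
echo name_for_display($onDisk, $_CONFIG), "\n"; // ödla.txt
```

Falling back to the unconverted name on failure is a design choice made for this sketch; rejecting the upload would be an equally reasonable policy.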

NewEraCracker added a commit to NewEraCracker/encode-explorer that referenced this issue Apr 5, 2016
Will make easier the implementation of marekrei#40 and later fixing of marekrei#37
@havet
Contributor Author

havet commented Apr 6, 2016

Hi,
I've tested NewEraCracker's solution to #42
and it works for listing, but not for uploading. "älggräs.txt" gets garbled when uploaded from Windows.
In his solution I have to set $_CONFIG['os_charset'] = "CP1252";
I suppose it would be better to detect the OS, rather than having it set in the PHP script.
Strangely enough, the files are displayed properly on Windows, Linux, an Android device and an iPad using NewEraCracker's solution. But uploading corrupts the filenames, whatever the OS. The server is Linux-Apache. I'm lost!

@NewEraCracker
Contributor

I am planning to dig into the upload issue over the next few days (and hopefully fix it by finding a generic solution that handles both Swedish and Russian characters, depending on the chosen encoding).

@NewEraCracker
Contributor

I think upload should now work fine on Linux-Apache, as both (and PHP too) use UTF-8 there: NewEraCracker@7300afa

Let me know what happens when encode-explorer is running on Windows.

Thanks.

@havet
Contributor Author

havet commented Apr 6, 2016

I've tested on a Linux-Apache server from Windows and Linux: filenames properly encoded (UTF-8) are displayed correctly, but uploaded files get garbled names in both cases.
NB: I have a working solution for Swedish. I first tried converting CP1252 to UTF-8 and back with iconv, but didn't succeed. Then I suddenly recognized the errors as typical of ISO-8859-1 taken for UTF-8, so I now convert to/from ISO-8859-1.
NB: You cannot test on files originally uploaded with Encode Explorer: they will display correctly if you set the encoding to ISO-8859-1 in the settings (instead of UTF-8)! But I found that solution a bit off: it's better to keep UTF-8 in the PHP file. Uploading/downloading files with WinSCP transforms the filenames in the very same way as my solution does (I haven't checked the code, only the result).
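
To make the symptom concrete, here is a small sketch of one common form of this mix-up, where the UTF-8 bytes of a name are read as ISO-8859-1. Whether this is the exact direction that occurred on the server above is an assumption. It also assumes the source file is saved as UTF-8 and that the iconv extension is loaded.

```php
<?php
// Assumes this source file is saved as UTF-8.
$original = 'älggräs.txt';

// Misreading: treat the UTF-8 bytes as ISO-8859-1 and re-encode them as UTF-8.
$garbled = iconv('ISO-8859-1', 'UTF-8', $original);
echo $garbled, "\n"; // Ã¤lggrÃ¤s.txt

// Applying the conversion in the opposite direction restores the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $garbled);
var_dump($repaired === $original); // bool(true)
```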

@NewEraCracker
Contributor

I have committed the following change: NewEraCracker@433e5b0

Hopefully, it will now play nice.

@havet
Contributor Author

havet commented Apr 10, 2016

I've just tested uploading from Windows: it doesn't work with Swedish characters. And in a very intriguing way: the first character is omitted if the file name begins with one of the characters å ä ö Å Ä Ö, but (amazing!) those characters work if they are placed somewhere else in the file name. Strange, isn't it? I tested the word "Älggräs" (an old name for meadowsweet).

@NewEraCracker
Contributor

This should have dealt with the problem: NewEraCracker@d285d07

@havet
Contributor Author

havet commented Apr 11, 2016

Congratulations! It works like a charm for Swedish: from Windows as well as from Linux/Android. Both uploading and viewing work as expected. Maybe there should be a comment with a hint to the settings for Windows:
// for Windows use:
//$_CONFIG['os_charset'] = "CP1252";

@NewEraCracker
Contributor

Already there: https://github.com/NewEraCracker/encode-explorer/blob/feature/index.php#L126

This is a little "complicated" because it depends on OS regional settings. I also took care to make everyone happy as far as possible.

PS: Thanks a lot for your feedback! It was very helpful! 😃
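
On the earlier suggestion of detecting the OS instead of hard-coding the charset: a rough sketch of what a platform-based default might look like is below. `guess_os_charset` is a hypothetical name and this is not the project's code. Because the right value depends on regional settings, the guess can only serve as a fallback, and CP1252 is only right for Western European Windows installations.

```php
<?php
// Hypothetical default, not the project's code: guess os_charset from the
// platform and let an explicit setting override it.
function guess_os_charset(string $os = PHP_OS): string
{
    // PHP_OS starts with "WIN" on Windows builds (e.g. "WINNT"). Checking
    // only the prefix avoids matching "Darwin".
    $isWindows = strtoupper(substr($os, 0, 3)) === 'WIN';

    // Assumption: Western European Windows; other regions need other values.
    return $isWindows ? 'CP1252' : 'UTF-8';
}

$_CONFIG = array();
if (empty($_CONFIG['os_charset'])) {
    $_CONFIG['os_charset'] = guess_os_charset();
}
echo $_CONFIG['os_charset'], "\n";
```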

@havet
Contributor Author

havet commented Apr 12, 2016

Fine! Now I know 3 ways of solving this problem, all of which work for me:

  1. Change the encoding of index.php to iso-8859-1 (iso-8859-5 for Cyrillic)
  2. Translate the encoding with iconv to/from iso-8859-1 (see "Non english filenames in windows apache", #42) as in my fix:
    https://github.com/havet/encode-explorer/tree/character-conversion
  3. Your solution.

The first solution isn't any good: the PHP file should be UTF-8 encoded so as not to cause other problems with e.g. the translations.

The second is equivalent to your solution, except that it has to be implemented as an option, as far as I can see. Further study of encodings reveals that iso-8859-1 should be replaced by the more correct IANA name windows-1252 (I suppose it would have to be windows-1251 for Cyrillic), making it even more similar to your solution.
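
A quick way to see why both names behave identically for Swedish, yet windows-1252 is the safer choice, is to compare the bytes each produces. This sketch assumes a UTF-8 source file and an iconv build that recognizes both charset names.

```php
<?php
// å, ä, ö map to the same single bytes in both encodings.
foreach (array('ISO-8859-1', 'WINDOWS-1252') as $charset) {
    echo $charset, ': ', bin2hex(iconv('UTF-8', $charset, 'åäö')), "\n"; // e5e4f6
}

// The euro sign exists in windows-1252 (byte 0x80) but has no ISO-8859-1
// equivalent, so the second conversion fails and iconv returns false.
var_dump(bin2hex(iconv('UTF-8', 'WINDOWS-1252', '€'))); // string(2) "80"
var_dump(@iconv('UTF-8', 'ISO-8859-1', '€'));
```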

BTW Maybe a link/reference to some page on encoding standards would be fine in the comment preceding the charset option. This might help anyone using some other encoding than Western European or Cyrillic.

NewEraCracker added a commit to NewEraCracker/encode-explorer that referenced this issue Jul 31, 2016
Encoding translation is now more transparent and handled in a logical,
albeit hackish, way.

Thanks to @havet and @kofbox for their feedback.
Will be followed by a commit that handles the basename() issue.

Tested and working when encode-explorer is running on Windows platform.