X-Git-Url: http://git.vanrenterghem.biz/git.ikiwiki.info.git/blobdiff_plain/1fc3f034191d3eec78b4d5da343e282092a221be..refs/heads/debian/master:/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn
diff --git a/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn b/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn
index a454d7da5..5a55fcce5 100644
--- a/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn
+++ b/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn
@@ -5,7 +5,7 @@ I hope it's a bug, not a feature and you fix it soon :) --[[Paweł|ptecza]]
 
 > ikiwiki only allows a very limited set of characters raw in page names,
 > this is done as a deny-by-default security thing. All other characters
-> need to be encoded in __code__ format, where "code" is the character
+> need to be encoded in `__code__` format, where "code" is the character
 > number. This is normally done for you, but if you're adding a page
 > manually, you need to handle it yourself. --[[Joey]]
 
@@ -48,6 +48,11 @@ I hope it's a bug, not a feature and you fix it soon :) --[[Paweł|ptecza]]
 >>>>>> What's your locale? I have both pl\_PL (ISO-8859-2) and pl\_PL.UTF-8,
 >>>>>> but I use pl\_PL. Is it wrong? --[[Paweł|ptecza]]
 
+>>>>>>> IkiWiki assumes UTF-8 throughout, so escaped filename characters
+>>>>>>> should be `__x____y____z__` where x, y, z are the bytes of the
+>>>>>>> UTF-8 encoding of the character. I don't know how to achieve that
+>>>>>>> from a non-UTF-8 locale. --[[smcv]]
+
 >>>> Now, as to UTF7, in retrospect, using a standard encoding might be a
 >>>> better idea than coming up with my own encoding for filenames. Can
 >>>> you provide a pointer to a description to modified-UTF7? --[[Joey]]
@@ -58,4 +63,38 @@ I hope it's a bug, not a feature and you fix it soon :) --[[Paweł|ptecza]]
 >>>>> There is a Perl [Unicode::IMAPUtf7](http://search.cpan.org/~fabpot/Unicode-IMAPUtf7-2.01/lib/Unicode/IMAPUtf7.pm)
 >>>>> module at the CPAN, but probably it hasn't been debianized yet :( --[[Paweł|ptecza]]
 
+> Note: [libencode-imaputf7-perl][1] has made it into debian.
+>
+>> "IMAP UTF-7" uses & as an escape character, which seems like a recipe
+>> for shell injection vulnerabilities... so I would not recommend it
+>> for this particular use. --[[smcv]]
+
+> I would value some clarification, in the ikiwiki setup file I have
+>
+> wiki_file_chars: -[:alnum:][\p{Arabic}()]+/.:_
+>
+> Ikiwiki doesn't seem to produce any errors on the commandline for this, but
+> when I attempt to create a new post with Arabic characters from the web I get the following error :
+>
+> Error: Cannot decode string with wide characters at /usr/lib/x86_64-linux-gnu/perl/5.20/Encode.pm line 215.
+>
+> Should the modified regexp not be sufficient?
+> Ikiwiki 3.20140815.
+> --[[mhameed]]
+
+>> This seems like a bug: in principle non-ASCII in `wiki_file_chars` should work,
+>> in practice it does not. I would suggest either using the default
+>> `wiki_file_chars`, or digging into the code to find what is wrong.
+>> Solving this sort of bug usually requires having a clear picture of
+>> which "strings" are bytestrings, and which "strings" are Unicode. --[[smcv]]
+
+>>> mhameed confirmed on IRC that anarcat's [[patch]] from
+>>> [[bugs/garbled_non-ascii_characters_in_body_in_web_interface]] fixes this.
+>>> --[[smcv]]
+
+>>>> Merged that patch. Not marking this page as done, because the todo
+>>>> about using a standard encoding still stands (although I'm not at
+>>>> all sure there's an encoding that would be better). --[[smcv]]
+
 [[wishlist]]
+[1]: https://packages.debian.org/search?suite=all&section=all&arch=any&searchon=names&keywords=libencode-imaputf7-perl
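To make smcv's description of the escaping concrete (each byte of a character's UTF-8 encoding written as its own `__N__` group), here is a minimal Python sketch. It is not ikiwiki's code. Several details are assumptions: N is written in decimal (following Joey's "character number" wording), only ASCII letters and digits plus `-+/.:` are treated as safe (an approximation of the default `wiki_file_chars`), `_` is itself escaped so decoding stays unambiguous, ikiwiki's own treatment of spaces is not modelled, and the helper names `escape_filename` and `unescape_filename` are hypothetical.

```python
import re

# Assumption: approximates the default wiki_file_chars "-[:alnum:]+/.:_",
# but treats [:alnum:] as ASCII-only and leaves "_" out so that every "_"
# in an escaped name belongs to an __N__ group.
_SAFE = re.compile(r"[A-Za-z0-9\-+/.:]")

# An escape group: two underscores, a decimal byte value, two underscores.
_ESCAPE = re.compile(r"__(\d+)__")


def escape_filename(name: str) -> str:
    """Hypothetical helper: escape unsafe characters as __N__ per UTF-8 byte."""
    out = []
    for ch in name:
        if _SAFE.fullmatch(ch):
            out.append(ch)
        else:
            # One __N__ group for every byte of the character's UTF-8 encoding.
            out.extend(f"__{b}__" for b in ch.encode("utf-8"))
    return "".join(out)


def unescape_filename(name: str) -> str:
    """Hypothetical helper: collect the escaped bytes, then decode as UTF-8."""
    buf = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(name):
        # Text between escape groups consists only of safe ASCII characters.
        buf += name[pos:m.start()].encode("ascii")
        buf.append(int(m.group(1)))
        pos = m.end()
    buf += name[pos:].encode("ascii")
    return buf.decode("utf-8")
```

For example, `escape_filename("Paweł")` gives `Pawe__197____130__`, because ł (U+0142) is the two bytes 197 and 130 in UTF-8. Escaping per byte rather than per code point keeps the result independent of the locale, which is the property smcv's note relies on.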