X-Git-Url: http://git.vanrenterghem.biz/git.ikiwiki.info.git/blobdiff_plain/8adf73b8e9cc71bd3036c6874a8e7bbbaeca2293..c2f784d2ad08d1656e5cd7ee2054cf092bb56ce6:/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn

diff --git a/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn b/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn
index 78d39fc88..5a55fcce5 100644
--- a/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn
+++ b/doc/todo/should_use_a_standard_encoding_for_utf_chars_in_filenames.mdwn
@@ -5,7 +5,7 @@ I hope it's a bug, not a feature and you fix it soon :) --[[Paweł|ptecza]]
 
 > ikiwiki only allows a very limited set of characters raw in page names,
 > this is done as a deny-by-default security thing. All other characters
-> need to be encoded in __code__ format, where "code" is the character
+> need to be encoded in `__code__` format, where "code" is the character
 > number. This is normally done for you, but if you're adding a page
 > manually, you need to handle it yourself. --[[Joey]]
 
@@ -48,6 +48,11 @@ I hope it's a bug, not a feature and you fix it soon :) --[[Paweł|ptecza]]
 >>>>>> What's your locale? I have both pl\_PL (ISO-8859-2) and pl\_PL.UTF-8,
 >>>>>> but I use pl\_PL. Is it wrong? --[[Paweł|ptecza]]
 
+>>>>>>> IkiWiki assumes UTF-8 throughout, so escaped filename characters
+>>>>>>> should be `__x____y____z__` where x, y, z are the bytes of the
+>>>>>>> UTF-8 encoding of the character. I don't know how to achieve that
+>>>>>>> from a non-UTF-8 locale. --[[smcv]]
+
 >>>> Now, as to UTF7, in retrospect, using a standard encoding might be a
 >>>> better idea than coming up with my own encoding for filenames. Can
 >>>> you provide a pointer to a description to modified-UTF7? --[[Joey]]
@@ -60,6 +65,10 @@ I hope it's a bug, not a feature and you fix it soon :) --[[Paweł|ptecza]]
 
 > Note: [libencode-imaputf7-perl][1] has made it into debian.
 >
+>> "IMAP UTF-7" uses & as an escape character, which seems like a recipe
+>> for shell injection vulnerabilities... so I would not recommend it
+>> for this particular use. --[[smcv]]
+
 > I would value some clarification, in the ikiwiki setup file I have
 >
 > wiki_file_chars: -[:alnum:][\p{Arabic}()]+/.:_
@@ -73,5 +82,19 @@ I hope it's a bug, not a feature and you fix it soon :) --[[Paweł|ptecza]]
 > Ikiwiki 3.20140815.
 > --[[mhameed]]
 
+>> This seems like a bug: in principle non-ASCII in `wiki_file_chars` should work,
+>> in practice it does not. I would suggest either using the default
+>> `wiki_file_chars`, or digging into the code to find what is wrong.
+>> Solving this sort of bug usually requires having a clear picture of
+>> which "strings" are bytestrings, and which "strings" are Unicode. --[[smcv]]
+
+>>> mhameed confirmed on IRC that anarcat's [[patch]] from
+>>> [[bugs/garbled_non-ascii_characters_in_body_in_web_interface]] fixes this.
+>>> --[[smcv]]
+
+>>>> Merged that patch. Not marking this page as done, because the todo
+>>>> about using a standard encoding still stands (although I'm not at
+>>>> all sure there's an encoding that would be better). --[[smcv]]
+
 [[wishlist]]
 [1]: https://packages.debian.org/search?suite=all&section=all&arch=any&searchon=names&keywords=libencode-imaputf7-perl
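
For readers adding pages by hand, the `__x____y____z__` form described in smcv's comment can be illustrated with a short Python sketch. This is not ikiwiki's own code (ikiwiki is written in Perl). The function name `escape_page_name`, the `ALLOWED` set, and the choice to write each byte as a decimal number are assumptions made for illustration; the thread only says that "code" is the character number and that x, y, z are the bytes of the UTF-8 encoding.

```python
import string

# Hypothetical allow-list for this sketch only; it is NOT ikiwiki's
# actual wiki_file_chars setting. Underscore and space are deliberately
# left out so that every "__" in the output belongs to an escape group.
ALLOWED = set(string.ascii_letters + string.digits + "-+/.:")


def escape_page_name(name: str) -> str:
    """Escape every character outside ALLOWED as one __N__ group per UTF-8 byte."""
    out = []
    for ch in name:
        if ch in ALLOWED:
            out.append(ch)
        else:
            # Assumption: N is the decimal value of each UTF-8 byte.
            out.extend(f"__{byte}__" for byte in ch.encode("utf-8"))
    return "".join(out)


print(escape_page_name("Paweł"))  # prints Pawe__197____130__
```

Here "ł" (U+0142) is the two UTF-8 bytes 0xC5 0x82, so it becomes two escape groups. Because the sketch starts from a Python `str` and encodes explicitly, the result does not depend on the locale, which is the difficulty smcv mentions for non-UTF-8 locales.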