]> git.vanrenterghem.biz Git - git.ikiwiki.info.git/blobdiff - doc/bugs/garbled_non-ascii_characters_in_body_in_web_interface.mdwn
(no commit message)
[git.ikiwiki.info.git] / doc / bugs / garbled_non-ascii_characters_in_body_in_web_interface.mdwn
index ad7b27cfee5ff3dc5710d6e83b0e96f4ce76d935..e80c52ba68f9102d7e759037383da9cca3dc011c 100644 (file)
@@ -17,6 +17,68 @@ anarcat@marcos:ikiwiki$ echo "P�re" | hd
 00000007
 ~~~~
 
+> I don't know what that is, but it isn't the usual double-UTF-8 encoding:
+>
+>     >>> u'è'.encode('utf-8')
+>     '\xc3\xa8'
+>     >>> u'è'.encode('utf-8').decode('latin-1').encode('utf-8')
+>     '\xc3\x83\xc2\xa8'
+>
+> A packet capture of the incorrect HTTP request/response headers and body
+> might be enlightening? --[[smcv]]
+>
+> > Here are the headers according to chromium:
+> > 
+> > ~~~~
+> > GET /ikiwiki.cgi?do=edit&page=wishlist HTTP/1.1
+> > Host: anarc.at
+> > Connection: keep-alive
+> > Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
+> > User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36
+> > Referer: http://anarc.at/wishlist/
+> > Accept-Encoding: gzip,deflate,sdch
+> > Accept-Language: fr,en-US;q=0.8,en;q=0.6
+> > Cookie: openid_provider=openid; ikiwiki_session_anarcat=XXXXXXXXXXXXXXXXXXXXXXX
+> > 
+> > HTTP/1.1 200 OK
+> > Date: Mon, 08 Sep 2014 21:22:24 GMT
+> > Server: Apache/2.4.10 (Debian)
+> > Set-Cookie: ikiwiki_session_anarcat=XXXXXXXXXXXXXXXXXXXXXXX; path=/; HttpOnly
+> > Vary: Accept-Encoding
+> > Content-Encoding: gzip
+> > Content-Length: 4093
+> > Keep-Alive: timeout=5, max=100
+> > Connection: Keep-Alive
+> > Content-Type: text/html; charset=utf-8
+> > ~~~~
+> > 
+> > ... which seem fairly normal... getting more data than this is a little inconvenient since the data is gzip-encoded and i'm kind of lazy extracting that from the stream. Chromium does seem to auto-detect it as utf8 according to the menus however... not sure what's going on here. I would focus on the following error however, since it's clearly emanating from the CGI... --[[anarcat]]
+
+Clicking on the Cancel button yields the following warning:
+
+~~~~
+Error: Cannot decode string with wide characters at /usr/lib/x86_64-linux-gnu/perl/5.20/Encode.pm line 215.
+~~~~
+
+> Looks as though you might be able to get a Python-style backtrace for this
+> by setting `$Carp::Verbose = 1`.
+>
+> The error is that we're taking some string (which string? only a backtrace
+> would tell you) that is already flagged as Unicode, and trying to decode
+> it from byte-blob to Unicode again, analogous to this Python:
+>
+>     some_bytes.decode('utf-8').decode('utf-8')
+>
+> --[[smcv]]
+
+The apache logs yield:
+
+~~~~
+[Mon Sep 08 16:17:43.995827 2014] [cgi:error] [pid 2609] [client 192.168.0.3:47445] AH01215: Died at /usr/share/perl5/IkiWiki/CGI.pm line 467., referer: http://anarc.at/ikiwiki.cgi?do=edit&page=wishlist
+~~~~
+
+Interestingly enough, I can't reproduce the bug here (at least in this page). Also, editing the page through git works fine.
+
 I had put ikiwiki on hold during the last upgrade, so it was upgraded separately. The bug happens both with 3.20140613 and 3.20140831. The major thing that happened today is the upgrade from perl 5.18 to 5.20. Here's the output of `egrep '[0-9] (remove|purge|install|upgrade)' /var/log/dpkg.log | pastebinit -b paste.debian.net` to give an idea of what was upgraded today:
 
 http://paste.debian.net/plain/119944