I would not be comfortable with merging this into headinganchors and enabling it by
-default for two reasons:
+default for two main reasons:
* it adds a new dependency on [[!cpan Text::Unidecode]]
* Text::Unidecode specifically documents its transliteration as not being stable
Google Translate) hopefully works? (Use a small browser window to make it
clearer where it goes)
+> Can we assume Ikiwiki generates HTML5 all the time? I thought that was still a
+> setting off by default... --[[anarcat]]
+
+>> ikiwiki always generates HTML5, since 3.20150107. The `html5` option has
+>> been repurposed to control whether we generate new-in-HTML5 semantic
+>> markup like `<section>` and `<nav>` (`html5` enabled), or HTML4 equivalents
+>> like `<div>` with a class (`html5` disabled). The default is still off,
+>> although I should probably either toggle it to on or remove the option
+>> altogether in the next release. --s
+
So perhaps we could try this Unicode-aware version of what Pandoc documents:
* Remove footnote links if any (this might have to be heuristic, or we could
an unused identifier
(Or to provide better uniqueness, we could parse the document looking for any existing
-ID, then generate IDs avoiding collisions with any of them.)
+ID, then append `-1`, `-2` to each generated ID until there is no collision.)
This would give us, for example, `## Visiting 北京` → `id="visiting-北京"`
-(where Text::Unidecode would instead transliterate, resulting in `id="visiting-bei-jing"`).
+(whereas Text::Unidecode would instead transliterate, resulting in
+`id="visiting-bei-jing"`).
To use these IDs in fragments, I would be inclined to rely on browsers
supporting [IRIs](https://tools.ietf.org/html/rfc3987): `<a href="#visiting-北京">`.
--[[smcv]]
+> I guess this makes sense. I just wonder how well this is actually supported in all
+> browsers.. I looked around and suspect this will work in more recent browsers, but,
+> as an example, https://caniuse.com/ doesn't have that feature listed in their
+> tables. :) -- [[anarcat]]
+
+>> That might well indicate that all major browsers have always supported it so
+>> there is no need to check. I don't see any particular reason why a browser vendor
+>> would not want to accept arbitrary non-whitespace as a valid anchor.
+>>
+>> In practice, minor or old browsers are probably insecure anyway, so I don't care
+>> too much about supporting them perfectly... --s
+
+> After thinking more about this, I don't feel that IRIs are a good
+> solution. Sure, there are machine-readable ways of encoding
+> non-ASCII characters in URLs. But that's not the point here: the
+> point here is to have *human* readable URLs. In the example I give
+> in the plugin documentation, I mention the french word "liberté"
+> which can easily be transliterated to "liberte". By using the
+> RFC3987 scheme, we could use unicode directly in the links (`a
+> href="#liberté"`), but the actual URL would be encoded as
+> `#libert%e9`, which is really not as pretty.
+>
+> I understand you not wanting to introduce another dependency. And I
+> also worry about the transliteration not being stable across
+> releases. After all, it might not even be stable across Unicode
+> releases either! But I'm ready to live with that inconvenience for
+> the user-friendliness of the resulting URLs. --[[anarcat]]
+
+----
+
+Documentation says:
+
+> _Also note that all heading attributes are overriden with the ID tag. If this
+> is not desirable, we'd need to fire up a full HTML::Parser or do some more
+> regex magic to preserve the attributes other than id= which we want to keep._
+
+I think this is a bug, particularly if you are using Pandoc's
+[header attributes](http://pandoc.org/MANUAL.html#extension-header_attributes)
+or similar.
+
+> It's not a bug, it's a limitation. :) But sure, it's a thing. It's an issue in
+> headinganchors as well of course. -- [[anarcat]]
+
+>> No, current/historical headinganchors has a different bug: it ignores headings
+>> that have any attributes, and does not generate anchors for them. That gives it
+>> degraded functionality, but no information loss. I think that's less bad. --s
+
+I think we should try to use an existing ID before generating our own, with the
+generation step as a fallback, just like Pandoc does. If a htmlize layer like
+Text::MultiMarkdown or Pandoc is generating worse IDs than this plugin, the
+the right solution to that is to send a bug report / feature request to
+make its IDs as good as this plugin's, or turn off ID generation in the
+htmlize layer, or stop using Text::MultiMarkdown.
+
+--[[smcv]]
+
+> Agreed. However, the situation I was in was that multimarkdown *and* the
+> headinganchors plugins had issues I had to fix. So it was better and easier
+> for me to just override whatever attributes were there for testing and
+> fixing this in the short term... -- [[anarcat]]
+
+> To bounce on this again: my problem with keeping existing IDs is
+> that it basically makes headinganchors fail to do anything if
+> something else adds the anchors. So I understand where you're coming
+> from with this, but that "bug" was introduced on purpose, to
+> actually fix a problem I was having.
+>
+> So I understand you might not want to *replace* headinganchors
+> completely with this module, but could we at least merge it in so I
+> wouldn't have to carry this patch around forever? :) Or what's our
+> way forward here?
+>
+> Thanks! -- [[anarcat]]
+
----
<pre>Some long scrollable text
.
.
.</pre>
+
+> This works for me on ` Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0` on Debian stretch, FWIW. --[[anarcat]]