The u32 page is excellent, but I wonder if documenting the procedure here
would be worthwhile. Who knows, the remote site might disappear. There are
also some variations on the approach that might be useful:

* using a Python script and the DOM library to extract the page names from
  Special:Allpages (such as
  <http://www.staff.ncl.ac.uk/jon.dowland/unix/docs/get_pagenames.py>)
* or, querying the MySQL back-end to get the names
* using WWW::MediaWiki for importing/exporting pages from the wiki, instead

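One more variation on the page-name step, different from the DOM-scraping script linked above: ask the MediaWiki web API for the list instead of parsing Special:Allpages. The sketch below only parses a response. It assumes the wiki has `api.php` enabled and that `action=query&list=allpages&format=json` returns the usual `query.allpages` structure; the sample JSON is made up for illustration.

```ruby
require "json"

# Extract page titles from one api.php "allpages" response (JSON text).
# Assumption: the response has the shape {"query": {"allpages": [...]}}.
def page_names(json_text)
  data = JSON.parse(json_text)
  pages = data.fetch("query", {}).fetch("allpages", [])
  pages.map { |entry| entry["title"] }
end

# Made-up sample response, for illustration only.
sample = <<~JSON
  {"query": {"allpages": [
    {"pageid": 1, "ns": 0, "title": "Main Page"},
    {"pageid": 7, "ns": 0, "title": "Git/git-svn"}
  ]}}
JSON

puts page_names(sample)
```

A large wiki returns the list in batches, so a real script would have to keep requesting until the API stops returning a continuation marker; that part is left out here.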
Also, some detail on converting MediaWiki transclusion to ikiwiki inlines...

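As a starting point for the transclusion detail, here is a minimal sketch that rewrites parameterless `{{Name}}` and `{{:Name}}` transclusions as `\[[!inline pages="..." raw="yes"]]` directives. The helper name, the `Template` directory for template pages, and the space-to-underscore mapping are all assumptions about how the pages were imported, not something the u32 procedure prescribes.

```ruby
# Matches {{Name}} and {{:Name}} only; anything with parameters ("|") or a
# parser function ("#") is left untouched for manual conversion.
TRANSCLUSION = /\{\{\s*(:?)([^{}|#]+?)\s*\}\}/

# template_dir is a hypothetical location for imported Template: pages.
def transclusions_to_inlines(text, template_dir = "Template")
  text.gsub(TRANSCLUSION) do
    colon, name = $1, $2
    # Assumption: imported page names use underscores instead of spaces.
    page = name.strip.tr(" ", "_")
    # {{Name}} refers to the Template namespace, {{:Name}} to a normal page.
    page = "#{template_dir}/#{page}" if colon.empty?
    # "\[" is just "[" in Ruby; the backslash keeps ikiwiki from treating
    # this source line as a directive when the page is rendered.
    "\[[!inline pages=\"#{page}\" raw=\"yes\"]]"
  end
end

puts transclusions_to_inlines("Intro {{:Main Page}} and {{Stub}}, but {{Infobox|x=1}}.")
```

Magic words such as `{{PAGENAME}}` look like transclusions to this pattern and would need to be filtered out first; templates that take parameters have no direct inline equivalent and are deliberately skipped.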
> "Who knows, the remote site might disappear." Right now, it appears to
> have done just that. -- [[users/Jon]]

The iki-fast-load Ruby script from the u32 page is given below:

    # This script is called on the final sorted, de-spammed revision

    # It doesn't currently check for no-op revisions...  I believe
    # that git fast-import will dutifully load them even though nothing
    # happened.  I don't care to solve this by adding a file cache
    # to this script.  You can run iki-diff-next.rb to highlight any
    # empty revisions that need to be removed.

    # This turns each node into an equivalent file.
    # It does not convert spaces to underscores in file names.
    # This would break wikilinks.
    # I suppose you could fix this with mod_speling or mod_rewrite.

    # It replaces nodes in the Image: namespace with the files themselves.

    require 'node-callback'

    # pipe is the stream to receive the git-fast-import commands
    # f.putfrom is true if this branch has existing commits on it, false if not.
    def format_git_commit(pipe, f)
      # Need to escape backslashes and double-quotes for git?
      # No, git breaks when I do this.
      # For the filename "path with \\", git sez: bad default revision 'HEAD'
      # filename = '"' + filename.gsub('\\', '\\\\\\\\').gsub('"', '\\"') + '"'

      # In the calls below, length must be the size in bytes!!
      # String#bytesize counts bytes rather than characters, so the lengths
      # stay correct for UTF-8 text under Ruby 1.9 and later.
      pipe.puts "commit #{f.branch}"
      pipe.puts "committer #{f.username} <#{f.email}> #{f.timestamp.rfc2822}"
      pipe.puts "data #{f.message.bytesize}\n#{f.message}\n"
      pipe.puts "from #{f.branch}^0" if f.putfrom
      pipe.puts "M 644 inline #{f.filename}"
      pipe.puts "data #{f.content.bytesize}\n#{f.content}\n"
    end

> Would be nice to know where you could get "node-callbacks"... this thing is useless without it. --[[users/simonraven]]

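Until node-callback turns up, the commit-formatting step can at least be tried on its own. This is a minimal sketch, not the original script: `Revision` is a hypothetical stand-in for whatever object node-callback passes in, with only the fields that `format_git_commit` reads above, and the sample values are made up. It uses `String#bytesize` so that the `data` lengths are counted in bytes, as the comments in the script require.

```ruby
require "stringio"
require "time" # provides Time#rfc2822

# Hypothetical stand-in for the revision object supplied by node-callback.
Revision = Struct.new(:branch, :username, :email, :timestamp,
                      :message, :filename, :content, :putfrom)

# Write one commit in git fast-import stream format to pipe.
def format_git_commit(pipe, rev)
  pipe.puts "commit #{rev.branch}"
  pipe.puts "committer #{rev.username} <#{rev.email}> #{rev.timestamp.rfc2822}"
  # The data command takes an exact byte count, hence bytesize.
  pipe.puts "data #{rev.message.bytesize}\n#{rev.message}\n"
  pipe.puts "from #{rev.branch}^0" if rev.putfrom
  pipe.puts "M 644 inline #{rev.filename}"
  pipe.puts "data #{rev.content.bytesize}\n#{rev.content}\n"
end

# Made-up sample revision containing a non-ASCII character.
rev = Revision.new("refs/heads/master", "Example User", "user@example.org",
                   Time.at(0).utc, "Import caf\u00e9 page", "cafe.mediawiki",
                   "A caf\u00e9 page.\n", false)

out = StringIO.new
format_git_commit(out, rev)
puts out.string
```

Because the committer date is written in RFC 2822 form, the stream would presumably have to be fed to `git fast-import --date-format=rfc2822` rather than to the default raw date parser. Reading the revision XML, which is what node-callback did, is not covered here.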
Mediawiki.pm - A plugin which supports mediawiki format.

    # By Scott Bronson.  Licensed under the GPLv2+ License.
    # Extends Ikiwiki to be able to handle Mediawiki markup.

    # To use the Mediawiki Plugin:
    # - Install Text::MediawikiFormat
    # - Turn on prefix_directives in your setup file.
    #   (TODO: we probably don't need to do this anymore?)
    #     prefix_directives => 1,
    # - Add this plugin on Ikiwiki's path (perl -V, look for @INC)
    #     cp mediawiki.pm something/IkiWiki/Plugin
    # - And enable it in your setup file
    #     add_plugins => [qw{mediawiki}],
    # - Finally, turn off the link plugin in setup (this is important)
    #     disable_plugins => [qw{link}],
    # - Rebuild everything (actually, this should be automatic right?)
    # - Now all files with a .mediawiki extension should be rendered properly.

    package IkiWiki::Plugin::mediawiki;

    # This is a gross hack...  We disable the link plugin so that our
    # linkify routine is always called.  Then we call the link plugin
    # directly for all non-mediawiki pages.  Ouch...  Hopefully Ikiwiki
    # will be updated soon to support multiple link plugins.
    require IkiWiki::Plugin::link;

    # Even if T:MwF is not installed, we can still handle all the linking.
    # The user will just see Mediawiki markup rather than formatted markup.
    eval q{use Text::MediawikiFormat ()};
    my $markup_disabled = $@;

    # Work around a UTF8 bug in Text::MediawikiFormat
    # http://rt.cpan.org/Public/Bug/Display.html?id=26880
    unless($markup_disabled) {

        *{'Text::MediawikiFormat::uri_escape'} = \&URI::Escape::uri_escape_utf8;

    my %metaheaders;    # keeps track of redirects for pagetemplate.
    my %tags;           # keeps track of tags for pagetemplate.

        hook(type => "checkconfig", id => "mediawiki", call => \&checkconfig);
        hook(type => "scan", id => "mediawiki", call => \&scan);
        hook(type => "linkify", id => "mediawiki", call => \&linkify);
        hook(type => "htmlize", id => "mediawiki", call => \&htmlize);
        hook(type => "pagetemplate", id => "mediawiki", call => \&pagetemplate);

        return IkiWiki::Plugin::link::checkconfig(@_);

    my $link_regexp = qr{
        \[\[(?=[^!])            # beginning of link
        ([^\n\r\]#|<>]+)        # 1: page to link to

        \#                      # '#', beginning of anchor
        ([^|\]]+)               # 2: anchor text

        ([^\]\|]*)              # 3: link text

        ([a-zA-Z]*)             # optional trailing alphas

    # Convert spaces in the passed-in string into underscores.
    # If passed in undef, returns undef without throwing errors.
    sub underscorize
    {
        my $var = shift;
        $var =~ tr{ }{_} if $var;
        return $var;
    }

    # Underscorize, strip leading and trailing space, and scrunch
    # multiple runs of spaces into one underscore.
    sub scrunch
    {
        my $var = shift;
        if($var) {
            $var =~ s/^\s+|\s+$//g;    # strip leading and trailing space
            $var =~ s/\s+/ /g;         # squash multiple spaces to one
            $var =~ tr{ }{_};          # turn the remaining spaces into underscores
        }
        return $var;
    }

    # Translates Mediawiki paths into Ikiwiki paths.
    # It needs to be pretty careful because Mediawiki and Ikiwiki handle
    # relative vs. absolute exactly opposite from each other.

        my $path = scrunch(shift);

        # always start from root unless we're doing relative shenanigans.
        $page = "/" unless $path =~ /^(?:\/|\.\.)/;

        for(split(/\//, "$page/$path")) {

                push @result, $_ if $_ ne "";

        # temporary hack working around http://ikiwiki.info/bugs/Can__39__t_create_root_page/index.html?updated
        # put this back the way it was once this bug is fixed upstream.
        # This is actually a major problem because now Mediawiki pages can't link from /Git/git-svn to /git-svn.  And upstream appears to be uninterested in fixing this bug.  :(
        # return "/" . join("/", @result);
        return join("/", @result);

    # Figures out the human-readable text for a wikilink

        my($page, $inlink, $anchor, $title, $trailing) = @_;
        my $link = translate_path($page,$inlink);

        # translate_path always produces an absolute link.
        # get rid of the leading slash before we display this link.

            $out = IkiWiki::pagetitle($title);

            $link = $inlink if $inlink =~ /^\s*\//;
            $out = $anchor ? "$link#$anchor" : $link;
            if(defined $title && $title eq "") {
                # a bare pipe appeared in the link...
                # user wants to strip namespace and trailing parens.
                $out =~ s/^[A-Za-z0-9_-]*://;
                $out =~ s/\s*\(.*\)\s*$//;

            # A trailing slash suppresses the leading slash
            $out =~ s#^/(.*)/$#$1#;

        $out .= $trailing if defined $trailing;

        if (exists $config{tagbase} && defined $config{tagbase}) {
            $tag=$config{tagbase}."/".$tag;

    # Pass a URL and optional text associated with it.  This call turns
    # it into fully-formatted HTML the same way Mediawiki would.
    # Counter is used to number untitled links sequentially on the page.
    # It should be set to 1 when you start parsing a new page.  This call
    # increments it automatically.
    sub generate_external_link

        # Mediawiki trims off trailing commas.
        # And apparently it does entity substitution first.
        # Since we can't, we'll fake it.

        # trim any leading and trailing whitespace
        $url =~ s/^\s+|\s+$//g;

        # url properly terminates on > but must special-case &gt;

        $url =~ s{(\&(?:gt|lt)\;.*)$}{ $trailer = $1, ''; }eg;

        # Trim some potential trailing chars, put them outside the link.

        $url =~ s{([,)]+)$}{ $tmptrail .= $1, ''; }eg;
        $trailer = $tmptrail . $trailer;

            $text = "[$$counter]";

            $text =~ s/^\s+|\s+$//g;

        return "<a href='$url' title='$title'>$text</a>$trailer";

    # Called to handle bookmarks like \[[#heading]] or \[[#a| text ]]
    sub generate_fragment_link

        $url = scrunch($url);

        if(defined($text) && $text ne "") {
            $text = scrunch($text);

        $url = underscorize($url);

        # For some reason Mediawiki puts blank titles on all its fragment links.
        # I don't see why we would duplicate that behavior here.
        return "<a href='$url'>$text</a>";

    sub generate_internal_link

        my($page, $inlink, $anchor, $title, $trailing, $proc) = @_;

        # Ikiwiki's link plugin wrecks this line when displaying on the site.
        # Until the code highlighter plugin can turn off link finding,
        # always escape double brackets in double quotes: \[[
        if($inlink eq '..') {
            # Mediawiki doesn't touch links like \[[..#hi|ho]].
            return "\[[" . $inlink . ($anchor?"#$anchor":"") .
                ($title?"|$title":"") . "]]" . $trailing;

        my($linkpage, $linktext);
        if($inlink =~ /^ (:?) \s* Category (\s* \: \s*) ([^\]]*) $/x) {
            # Handle category links

            $linkpage = IkiWiki::linkpage(translate_path($page, $inlink));

                # Produce a link but don't add this page to the given category.
                $linkpage = tagpage($linkpage);
                $linktext = ($title ? '' : "Category$sep") .
                    linktext($page, $inlink, $anchor, $title, $trailing);
                $tags{$page}{$linkpage} = 1;

                # Add this page to the given category but don't produce a link.
                $tags{$page}{$linkpage} = 1;
                &$proc(tagpage($linkpage), $linktext, $anchor);

            # It's just a regular link
            $linkpage = IkiWiki::linkpage(translate_path($page, $inlink));
            $linktext = linktext($page, $inlink, $anchor, $title, $trailing);

        return &$proc($linkpage, $linktext, $anchor);

        my $page=$params{page};
        my $destpage=$params{destpage};
        my $content=$params{content};

        return "" if $page ne $destpage;

        if($content !~ /^ \s* \#REDIRECT \s* \[\[ ( [^\]]+ ) \]\]/x) {
            # this page isn't a redirect, render it normally.

        # The rest of this function is copied from the redir clause
        # in meta::preprocess and actually handles the redirect.

        $value =~ s/^\s+|\s+$//g;

        if ($value !~ /^\w+:\/\//) {

            my ($redir_page, $redir_anchor) = split /\#/, $value;

            add_depends($page, $redir_page);
            my $link=bestlink($page, underscorize(translate_path($page,$redir_page)));
            if (! length $link) {
                return "<b>Redirect Error:</b> <nowiki>\[[$redir_page]] not found.</nowiki>";

            $value=urlto($link, $page);
            $value.='#'.$redir_anchor if defined $redir_anchor;

                # redir cycle detection
                $pagestate{$page}{mediawiki}{redir}=$link;

                while (exists $pagestate{$at}{mediawiki}{redir}) {

                        return "<b>Redirect Error:</b> cycle found on <nowiki>\[[$at]]</nowiki>";

                    $at=$pagestate{$at}{mediawiki}{redir};

            # it's an external link
            $value = encode_entities($value);

        my $redir="<meta http-equiv=\"refresh\" content=\"0; URL=$value\" />";
        $redir=scrub($redir) if !$safe;
        push @{$metaheaders{$page}}, $redir;

        return "Redirecting to $value ...";

    # Feed this routine a string containing <nowiki>...</nowiki> sections,
    # this routine calls your callback for every section not within nowikis,
    # collecting its return values and returning the rewritten string.

        for(split(/(<nowiki[^>]*>.*?<\/nowiki\s*>)/s, $content)) {
            $result .= ($state ? $_ : &$proc($_));

    # Converts all links in the page, wiki and otherwise.

        my $page=$params{page};
        my $destpage=$params{destpage};
        my $content=$params{content};

        my $file=$pagesources{$page};
        my $type=pagetype($file);

        if($type ne 'mediawiki') {
            return IkiWiki::Plugin::link::linkify(@_);

        my $redir = check_redirect(%params);
        return $redir if defined $redir;

        # this code was copied from MediawikiFormat.pm.
        # Heavily changed because MF.pm screws up escaping when it does
        # this awful hack: $uricCheat =~ tr/://d;
        my $schemas = [qw(http https ftp mailto gopher)];
        my $re = join "|", map {qr/\Q$_\E/} @$schemas;
        my $schemes = qr/(?:$re)/;
        # And this is copied from URI:
        my $reserved = q(;/?@&=+$,);   # NOTE: no colon or [] !
        my $uric = quotemeta($reserved) . $URI::unreserved . "%#";

        my $result = skip_nowiki($content, sub {

            #s/<(a[\s>\/])/&lt;$1/ig;
            # Disabled because this appears to screw up the aggregate plugin.
            # I guess we'll rely on Iki to post-sanitize this sort of stuff.

            # Replace external links, http://blah or [http://blah]
            s{\b($schemes:[$uric][:$uric]+)|\[($schemes:[$uric][:$uric]+)([^\]]*?)\]}{
                generate_external_link($1||$2, $3, \$counter)

            # Handle links that only contain fragments.
            s{ \[\[ \s* (\#[^|\]'"<>&;]+) (?:\| ([^\]'"<>&;]*))? \]\] }{
                generate_fragment_link($1, $2)

            # Match all internal links

                generate_internal_link($page, $1, $2, $3, $4, sub {
                    my($linkpage, $linktext, $anchor) = @_;
                    return htmllink($page, $destpage, $linkpage,
                        linktext => $linktext,
                        anchor => underscorize(scrunch($anchor)));

    # Find all WikiLinks in the page.

        my $page=$params{page};
        my $content=$params{content};

        my $file=$pagesources{$page};
        my $type=pagetype($file);

        if($type ne 'mediawiki') {
            return IkiWiki::Plugin::link::scan(@_);

        skip_nowiki($content, sub {

            while(/$link_regexp/g) {
                generate_internal_link($page, $1, '', '', '', sub {
                    my($linkpage, $linktext, $anchor) = @_;
                    push @{$links{$page}}, $linkpage;

    # Convert the page to HTML.

        my $page = $params{page};
        my $content = $params{content};

        return $content if $markup_disabled;

        # Do a little preprocessing to babysit Text::MediawikiFormat
        # If a line begins with tabs, T:MwF won't convert it into preformatted blocks.
        $content =~ s/^\t/ /mg;

        my $ret = Text::MediawikiFormat::format($content, {

            allowed_tags => [#HTML
                # MediawikiFormat default
                qw(b big blockquote br caption center cite code dd
                   div dl dt em font h1 h2 h3 h4 h5 h6 hr i li ol p
                   pre rb rp rt ruby s samp small strike strong sub
                   sup table td th tr tt u ul var),

                qw(del ins),   # These should have been added all along.
                qw(span),      # Mediawiki allows span but that's rather scary...?
                qw(a),         # this is unfortunate; should handle links after rendering the page.

                qw(title align lang dir width height bgcolor),

                qw(cite), # BLOCKQUOTE, Q
                qw(size face color), # FONT
                # For various lists, mostly deprecated but safe
                qw(type start value compact),

                qw(summary width border frame rules cellspacing
                   cellpadding valign char charoff colgroup col
                   span abbr axis headers scope rowspan colspan),
                qw(id class name style), # For CSS

    # This is only needed to support the check_redirect call.

        my $page = $params{page};
        my $destpage = $params{destpage};
        my $template = $params{template};

        # handle metaheaders for redirects
        if (exists $metaheaders{$page} && $template->query(name => "meta")) {
            # avoid duplicate meta lines

            $template->param(meta => join("\n", grep { (! $seen{$_}) && ($seen{$_}=1) } @{$metaheaders{$page}}));

        $template->param(tags => [

                link => htmllink($page, $destpage, tagpage($_), rel => "tag")
            }, sort keys %{$tags{$page}}
        ]) if exists $tags{$page} && %{$tags{$page}} && $template->query(name => "tags");

        # It's an rss/atom template. Add any categories.
        if ($template->query(name => "categories")) {
            if (exists $tags{$page} && %{$tags{$page}}) {
                $template->param(categories => [map { category => $_ },
                    sort keys %{$tags{$page}}]);