The u32 page is excellent, but I wonder if documenting the procedure here
would be worthwhile. Who knows, the remote site might disappear. There are
also some variations on the approach that might be useful:

* Using a Python script and the DOM library to extract the page names from
  Special:Allpages (such as
  <http://www.staff.ncl.ac.uk/jon.dowland/unix/docs/get_pagenames.py>)
* Querying the MySQL back-end to get the names
* Using WWW::MediaWiki for importing/exporting pages from the wiki, instead

Also, some detail on converting MediaWiki transclusion to ikiwiki inlines would help.
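
The first variation can be sketched as follows. This version is in Ruby (the language of the import script below) rather than Python, and it uses a regular expression instead of a DOM library. It rests on one assumption: each Special:Allpages entry is an `<a>` element that carries a `title` attribute. That markup depends on the MediaWiki version and skin, and navigation or sidebar links also carry titles, so a real wiki would need extra filtering. `allpages_titles` is a hypothetical name.

```ruby
# Hypothetical sketch: collect page titles from a saved Special:Allpages page.
# Assumes every entry looks like <a href="..." title="Page name">.
ENTITIES = { "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
             "&quot;" => '"', "&#39;" => "'" }.freeze

def allpages_titles(html)
  html.scan(/<a\s[^>]*\btitle="([^"]*)"/)
      .map { |(title)| title.gsub(/&(?:amp|lt|gt|quot|#39);/, ENTITIES) } # decode entities
      .uniq
end

sample = '<ul><li><a href="/wiki/Main_Page" title="Main Page">Main Page</a></li>' \
         '<li><a href="/wiki/Q%26A" title="Q&amp;A">Q&amp;A</a></li></ul>'
p allpages_titles(sample)  # => ["Main Page", "Q&A"]
```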

> "Who knows, the remote site might disappear." Right now, it appears to
> have done just that. -- [[users/Jon]]

The iki-fast-load Ruby script from the u32 page is given below:

    # This script is called on the final sorted, de-spammed revision
    #
    # It doesn't currently check for no-op revisions...  I believe
    # that git-fast-import will dutifully load them even though nothing
    # happened.  I don't care to solve this by adding a file cache
    # to this script.  You can run iki-diff-next.rb to highlight any
    # empty revisions that need to be removed.
    #
    # This turns each node into an equivalent file.
    #   It does not convert spaces to underscores in file names.
    #   This would break wikilinks.
    #   I suppose you could fix this with mod_speling or mod_rewrite.
    #
    # It replaces nodes in the Image: namespace with the files themselves.

    require 'time'   # provides Time#rfc2822
    require 'node-callback'

    # pipe is the stream to receive the git-fast-import commands
    # f.putfrom is true if this branch has existing commits on it, false if not.
    def format_git_commit(pipe, f)
      # Need to escape backslashes and double-quotes for git?
      #   No, git breaks when I do this.
      #   For the filename "path with \\", git sez: bad default revision 'HEAD'
      # filename = '"' + filename.gsub('\\', '\\\\\\\\').gsub('"', '\\"') + '"'

      # In the calls below, length must be the size in bytes!!
      # String#bytesize counts bytes on Ruby 1.8.7 and 1.9+, whereas
      # String#length counts characters on 1.9+ and undercounts UTF-8 text.
      # The committer date is RFC 2822, so run git fast-import --date-format=rfc2822.
      pipe.puts "commit #{f.branch}"
      pipe.puts "committer #{f.username} <#{f.email}> #{f.timestamp.rfc2822}"
      pipe.puts "data #{f.message.bytesize}\n#{f.message}\n"
      pipe.puts "from #{f.branch}^0" if f.putfrom
      pipe.puts "M 644 inline #{f.filename}"
      pipe.puts "data #{f.content.bytesize}\n#{f.content}\n"
    end
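
The byte count in each `data` header is where UTF-8 page names and content bite. The self-contained sketch below is not part of the u32 script. It writes one commit in git fast-import's stream format to a `StringIO` so the output can be inspected. `fast_import_data`, `fast_import_commit` and the sample author and branch values are hypothetical.

```ruby
require "stringio"
require "time"

# Hypothetical helper: emit a fast-import "data" command.  The count is the
# payload size in bytes, which differs from String#length for UTF-8 text.
def fast_import_data(io, payload)
  io.write("data #{payload.bytesize}\n")
  io.write(payload)
  io.write("\n") # optional LF after the payload
end

# Emit a single commit that adds (or replaces) one file.
def fast_import_commit(io, branch:, name:, email:, time:, message:, path:, content:)
  io.write("commit #{branch}\n")
  io.write("committer #{name} <#{email}> #{time.rfc2822}\n")
  fast_import_data(io, message)
  io.write("M 644 inline #{path}\n")
  fast_import_data(io, content)
end

stream = StringIO.new
fast_import_commit(stream,
                   branch: "refs/heads/master", name: "Jon", email: "jon@example.com",
                   time: Time.utc(2008, 1, 1), message: "Import Café",
                   path: "Café.mediawiki", content: "naïve text")
puts stream.string
```

Because the committer dates come from `Time#rfc2822`, the stream has to be fed to `git fast-import --date-format=rfc2822`. The default `raw` format expects seconds since the epoch plus a UTC offset instead.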

Mediawiki.pm - a plugin that supports the MediaWiki format:

    # By Scott Bronson.  Licensed under the GPLv2+ License.
    # Extends Ikiwiki to be able to handle Mediawiki markup.
    #
    # To use the Mediawiki Plugin:
    # - Install Text::MediawikiFormat
    # - Turn on prefix_directives in your setup file.
    #     (TODO: we probably don't need to do this anymore?)
    #         prefix_directives => 1,
    # - Add this plugin on Ikiwiki's path (perl -V, look for @INC)
    #         cp mediawiki.pm something/IkiWiki/Plugin
    # - And enable it in your setup file
    #         add_plugins => [qw{mediawiki}],
    # - Finally, turn off the link plugin in setup (this is important)
    #         disable_plugins => [qw{link}],
    # - Rebuild everything (actually, this should be automatic, right?)
    # - Now all files with a .mediawiki extension should be rendered properly.

    package IkiWiki::Plugin::mediawiki;

    # This is a gross hack...  We disable the link plugin so that our
    # linkify routine is always called.  Then we call the link plugin
    # directly for all non-mediawiki pages.  Ouch...  Hopefully Ikiwiki
    # will be updated soon to support multiple link plugins.
    require IkiWiki::Plugin::link;

    # Even if T:MwF is not installed, we can still handle all the linking.
    # The user will just see Mediawiki markup rather than formatted markup.
    eval q{use Text::MediawikiFormat ()};
    my $markup_disabled = $@;

    # Work around a UTF8 bug in Text::MediawikiFormat
    # http://rt.cpan.org/Public/Bug/Display.html?id=26880
    unless($markup_disabled) {
        *{'Text::MediawikiFormat::uri_escape'} = \&URI::Escape::uri_escape_utf8;

    my %metaheaders;    # keeps track of redirects for pagetemplate.
    my %tags;           # keeps track of tags for pagetemplate.

        hook(type => "checkconfig", id => "mediawiki", call => \&checkconfig);
        hook(type => "scan", id => "mediawiki", call => \&scan);
        hook(type => "linkify", id => "mediawiki", call => \&linkify);
        hook(type => "htmlize", id => "mediawiki", call => \&htmlize);
        hook(type => "pagetemplate", id => "mediawiki", call => \&pagetemplate);

        return IkiWiki::Plugin::link::checkconfig(@_);

    my $link_regexp = qr{
        \[\[(?=[^!])            # beginning of link
        ([^\n\r\]#|<>]+)        # 1: page to link to
            \#                  # '#', beginning of anchor
            ([^|\]]+)           # 2: anchor text
            ([^\]\|]*)          # 3: link text
        ([a-zA-Z]*)             # optional trailing alphas

    # Convert spaces in the passed-in string into underscores.
    # If passed in undef, returns undef without throwing errors.
        $var =~ tr{ }{_} if $var;

    # Underscorize, strip leading and trailing space, and scrunch
    # multiple runs of spaces into one underscore.
        $var =~ s/^\s+|\s+$//g;     # strip leading and trailing space
        $var =~ s/\s+/ /g;          # squash multiple spaces to one

    # Translates Mediawiki paths into Ikiwiki paths.
    # It needs to be pretty careful because Mediawiki and Ikiwiki handle
    # relative vs. absolute exactly opposite from each other.
        my $path = scrunch(shift);

        # always start from root unless we're doing relative shenanigans.
        $page = "/" unless $path =~ /^(?:\/|\.\.)/;

        for(split(/\//, "$page/$path")) {
                push @result, $_ if $_ ne "";

        # temporary hack working around http://ikiwiki.info/bugs/Can__39__t_create_root_page/index.html?updated
        # put this back the way it was once this bug is fixed upstream.
        # This is actually a major problem because now Mediawiki pages can't
        # link from /Git/git-svn to /git-svn.  And upstream appears to be
        # uninterested in fixing this bug.  :(
        # return "/" . join("/", @result);
        return join("/", @result);
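
The path handling above is the subtle part of the plugin, so here is a Ruby re-expression of `scrunch` and `translate_path` as a reading aid (Ruby to match the import script). It is a sketch under two assumptions. First, `scrunch` finishes by converting spaces to underscores, as its comment says, although that line is not shown. Second, the elided loop body pops a component on `..`. Like the "temporary hack" version, it returns paths without a leading slash.

```ruby
# Hypothetical Ruby sketch of scrunch and translate_path.

# Strip outer whitespace, squash inner runs of whitespace, then underscorize.
def scrunch(str)
  return nil if str.nil?
  str.strip.gsub(/\s+/, " ").tr(" ", "_")
end

# Plain targets ("Foo") resolve from the wiki root; targets starting with
# "/" (a subpage) or ".." resolve from the linking page.
def translate_path(page, target)
  path = scrunch(target)
  base = path.start_with?("/", "..") ? page : "/"
  result = []
  "#{base}/#{path}".split("/").each do |part|
    if part == ".."
      result.pop               # assumed handling of ".."
    elsif !part.empty?
      result.push(part)        # skip empty pieces from "//"
    end
  end
  result.join("/")
end

puts translate_path("Git/git-svn", "/Usage")      # Git/git-svn/Usage
puts translate_path("Git/git-svn", "../Branches") # Git/Branches
puts translate_path("Git/git-svn", "Main Page")   # Main_Page
```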

    # Figures out the human-readable text for a wikilink
        my($page, $inlink, $anchor, $title, $trailing) = @_;
        my $link = translate_path($page,$inlink);

        # translate_path always produces an absolute link.
        # get rid of the leading slash before we display this link.
            $out = IkiWiki::pagetitle($title);
            $link = $inlink if $inlink =~ /^\s*\//;
            $out = $anchor ? "$link#$anchor" : $link;
            if(defined $title && $title eq "") {
                # a bare pipe appeared in the link...
                # user wants to strip namespace and trailing parens.
                $out =~ s/^[A-Za-z0-9_-]*://;
                $out =~ s/\s*\(.*\)\s*$//;
            # A trailing slash suppresses the leading slash
            $out =~ s#^/(.*)/$#$1#;
        $out .= $trailing if defined $trailing;
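
The bare-pipe branch of `linktext` implements what MediaWiki calls the pipe trick: `[[Help:Contents (old)|]]` is displayed as `Contents`. The hedged Ruby sketch below transcribes the two display rewrites from the Perl above into standalone helpers. The names `pipe_trick` and `strip_subpage_slashes` are hypothetical.

```ruby
# Hypothetical helpers transcribing two regexes from linktext.

# Pipe trick: drop a leading namespace and a trailing parenthetical.
def pipe_trick(target)
  out = target.sub(/\A[A-Za-z0-9_-]*:/, "")  # "Help:Contents (old)" -> "Contents (old)"
  out.sub(/\s*\(.*\)\s*\z/, "")              # "Contents (old)" -> "Contents"
end

# A target wrapped in slashes, like [[/Subpage/]], is shown without them.
def strip_subpage_slashes(text)
  text.sub(%r{\A/(.*)/\z}, '\1')
end

puts pipe_trick("Help:Contents (old)")   # Contents
puts strip_subpage_slashes("/Subpage/")  # Subpage
```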

        if (exists $config{tagbase} && defined $config{tagbase}) {
            $tag=$config{tagbase}."/".$tag;

    # Pass a URL and optional text associated with it.  This call turns
    # it into fully-formatted HTML the same way Mediawiki would.
    # Counter is used to number untitled links sequentially on the page.
    # It should be set to 1 when you start parsing a new page.  This call
    # increments it automatically.
    sub generate_external_link
        # Mediawiki trims off trailing commas.
        # And apparently it does entity substitution first.
        # Since we can't, we'll fake it.

        # trim any leading and trailing whitespace
        $url =~ s/^\s+|\s+$//g;

        # url properly terminates on > but must special-case &gt; (and &lt;)
        $url =~ s{(\&(?:gt|lt)\;.*)$}{ $trailer = $1, ''; }eg;

        # Trim some potential trailing chars, put them outside the link.
        $url =~ s{([,)]+)$}{ $tmptrail .= $1, ''; }eg;
        $trailer = $tmptrail . $trailer;

            $text = "[$$counter]";
            $text =~ s/^\s+|\s+$//g;

        return "<a href='$url' title='$title'>$text</a>$trailer";
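
The clean-up order in `generate_external_link` is easier to follow with concrete inputs. The Ruby sketch below is hypothetical. It fills the elided branches with an assumption that matches MediaWiki's visible behaviour: a bare URL (no text, `nil`) displays the URL itself, while `[url]` (empty text) gets the next number. It leaves out the `title` attribute and, like the Perl, does no HTML escaping.

```ruby
# Hypothetical Ruby sketch of generate_external_link's clean-up steps.
# counter is a one-element array, used as a mutable box like \$counter.
def external_link(url, text, counter)
  url = url.strip
  trailer = ""

  # Everything from an &gt; or &lt; entity onward stays outside the link.
  if (m = url.match(/&(?:gt|lt);.*\z/))
    trailer = m[0]
    url = url[0...m.begin(0)]
  end

  # Trailing commas and closing parens also go outside the link.
  if (m = url.match(/[,)]+\z/))
    trailer = m[0] + trailer
    url = url[0...m.begin(0)]
  end

  if text.nil?                  # assumed: bare URL shows itself
    text = url
  elsif text.strip.empty?       # assumed: [url] is numbered [1], [2], ...
    text = "[#{counter[0]}]"
    counter[0] += 1
  else
    text = text.strip
  end
  "<a href='#{url}'>#{text}</a>#{trailer}"
end

counter = [1]
puts external_link("http://example.com/a),", "", counter)
# <a href='http://example.com/a'>[1]</a>),
```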

    # Called to handle bookmarks like [[#heading]] or [[#a| text ]]
    sub generate_fragment_link
        $url = scrunch($url);

        if(defined($text) && $text ne "") {
            $text = scrunch($text);

        $url = underscorize($url);

        # For some reason Mediawiki puts blank titles on all its fragment links.
        # I don't see why we would duplicate that behavior here.
        return "<a href='$url'>$text</a>";

    sub generate_internal_link
        my($page, $inlink, $anchor, $title, $trailing, $proc) = @_;

        # Ikiwiki's link plugin wrecks this line when displaying on the site.
        # Until the code highlighter plugin can turn off link finding,
        # always escape double brackets in double quotes: [[
        if($inlink eq '..') {
            # Mediawiki doesn't touch links like [[..#hi|ho]].
            return "[[" . $inlink . ($anchor?"#$anchor":"") .
                ($title?"|$title":"") . "]]" . $trailing;

        my($linkpage, $linktext);
        if($inlink =~ /^ (:?) \s* Category (\s* \: \s*) ([^\]]*) $/x) {
            # Handle category links
            $linkpage = IkiWiki::linkpage(translate_path($page, $inlink));
                # Produce a link but don't add this page to the given category.
                $linkpage = tagpage($linkpage);
                $linktext = ($title ? '' : "Category$sep") .
                    linktext($page, $inlink, $anchor, $title, $trailing);
                $tags{$page}{$linkpage} = 1;
                # Add this page to the given category but don't produce a link.
                $tags{$page}{$linkpage} = 1;
                &$proc(tagpage($linkpage), $linktext, $anchor);
            # It's just a regular link
            $linkpage = IkiWiki::linkpage(translate_path($page, $inlink));
            $linktext = linktext($page, $inlink, $anchor, $title, $trailing);

        return &$proc($linkpage, $linktext, $anchor);

        my $page=$params{page};
        my $destpage=$params{destpage};
        my $content=$params{content};

        return "" if $page ne $destpage;

        if($content !~ /^ \s* \#REDIRECT \s* \[\[ ( [^\]]+ ) \]\]/x) {
            # this page isn't a redirect, render it normally.

        # The rest of this function is copied from the redir clause
        # in meta::preprocess and actually handles the redirect.
        $value =~ s/^\s+|\s+$//g;

        if ($value !~ /^\w+:\/\//) {
            my ($redir_page, $redir_anchor) = split /\#/, $value;

            add_depends($page, $redir_page);
            my $link=bestlink($page, underscorize(translate_path($page,$redir_page)));
            if (! length $link) {
                return "<b>Redirect Error:</b> <nowiki>[[$redir_page]] not found.</nowiki>";

            $value=urlto($link, $page);
            $value.='#'.$redir_anchor if defined $redir_anchor;

            # redir cycle detection
            $pagestate{$page}{mediawiki}{redir}=$link;
            while (exists $pagestate{$at}{mediawiki}{redir}) {
                    return "<b>Redirect Error:</b> cycle found on <nowiki>[[$at]]</nowiki>";
                $at=$pagestate{$at}{mediawiki}{redir};
            # it's an external link
            $value = encode_entities($value);

        my $redir="<meta http-equiv=\"refresh\" content=\"0; URL=$value\" />";
        $redir=scrub($redir) if !$safe;
        push @{$metaheaders{$page}}, $redir;

        return "Redirecting to $value ...";
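
The redirect handler makes two decisions: is this page a `#REDIRECT`, and does following the chain of redirects loop back on itself? Here is a hedged Ruby sketch of both. The pattern is transcribed from the Perl, so it is case-sensitive and anchored at the start of the page. The body of the cycle-detection loop is elided in the excerpt, so `find_cycle` is one plausible reading of it, with a plain hash standing in for `$pagestate{...}{mediawiki}{redir}`. Both method names are hypothetical.

```ruby
# Hypothetical Ruby sketch of check_redirect's two tests.
REDIRECT_RE = /\A\s*\#REDIRECT\s*\[\[([^\]]+)\]\]/

# Returns the (whitespace-trimmed) redirect target, or nil for a normal page.
def redirect_target(content)
  m = content.match(REDIRECT_RE)
  m && m[1].strip
end

# redirs maps each page to the page it redirects to.  Walks the chain from
# page and returns the first page visited twice, or nil if the chain ends.
def find_cycle(redirs, page)
  seen = {}
  at = page
  while redirs.key?(at)
    return at if seen[at]
    seen[at] = true
    at = redirs[at]
  end
  nil
end

puts redirect_target("#REDIRECT [[Main Page]]")            # Main Page
p find_cycle({ "A" => "B", "B" => "C", "C" => "A" }, "A")  # "A"
```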

    # Feed this routine a string containing <nowiki>...</nowiki> sections.
    # It calls your callback for every section not within nowikis,
    # collects the return values, and returns the rewritten string.
        for(split(/(<nowiki[^>]*>.*?<\/nowiki\s*>)/s, $content)) {
            $result .= ($state ? $_ : &$proc($_));
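
`skip_nowiki` relies on a property that Perl's `split` shares with Ruby's `String#split`: when the pattern contains a capturing group, the separators are kept. The pieces therefore alternate between ordinary text and `<nowiki>` sections, starting with text (possibly an empty string). Here is a hypothetical Ruby version that uses the index parity instead of a toggled `$state`. Ruby's `/m` flag plays the role of Perl's `/s`, letting `.` match newlines.

```ruby
# Hypothetical Ruby version of skip_nowiki: yields only the text outside
# <nowiki>...</nowiki> sections and reassembles the string.
NOWIKI_RE = %r{(<nowiki[^>]*>.*?</nowiki\s*>)}m

def skip_nowiki(content)
  content.split(NOWIKI_RE).each_with_index.map { |piece, i|
    i.odd? ? piece : yield(piece)   # odd indexes are the captured nowiki sections
  }.join
end

puts skip_nowiki("a [[x]] <nowiki>[[y]]</nowiki> b") { |s| s.upcase }
# A [[X]] <nowiki>[[y]]</nowiki> B
```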

    # Converts all links in the page, wiki and otherwise.
        my $page=$params{page};
        my $destpage=$params{destpage};
        my $content=$params{content};

        my $file=$pagesources{$page};
        my $type=pagetype($file);

        if($type ne 'mediawiki') {
            return IkiWiki::Plugin::link::linkify(@_);

        my $redir = check_redirect(%params);
        return $redir if defined $redir;

        # this code was copied from MediawikiFormat.pm.
        # Heavily changed because MF.pm screws up escaping when it does
        # this awful hack: $uricCheat =~ tr/://d;
        my $schemas = [qw(http https ftp mailto gopher)];
        my $re = join "|", map {qr/\Q$_\E/} @$schemas;
        my $schemes = qr/(?:$re)/;
        # And this is copied from URI:
        my $reserved = q(;/?@&=+$,);    # NOTE: no colon or [] !
        my $uric = quotemeta($reserved) . $URI::unreserved . "%#";

        my $result = skip_nowiki($content, sub {
            #s/<(a[\s>\/])/&lt;$1/ig;
            # Disabled because this appears to screw up the aggregate plugin.
            # I guess we'll rely on Iki to post-sanitize this sort of stuff.

            # Replace external links, http://blah or [http://blah]
            s{\b($schemes:[$uric][:$uric]+)|\[($schemes:[$uric][:$uric]+)([^\]]*?)\]}{
                generate_external_link($1||$2, $3, \$counter)

            # Handle links that only contain fragments.
            s{ \[\[ \s* (\#[^|\]'"<>&;]+) (?:\| ([^\]'"<>&;]*))? \]\] }{
                generate_fragment_link($1, $2)

            # Match all internal links
                generate_internal_link($page, $1, $2, $3, $4, sub {
                    my($linkpage, $linktext, $anchor) = @_;
                    return htmllink($page, $destpage, $linkpage,
                        linktext => $linktext,
                        anchor => underscorize(scrunch($anchor)));

    # Find all WikiLinks in the page.
        my $page=$params{page};
        my $content=$params{content};

        my $file=$pagesources{$page};
        my $type=pagetype($file);

        if($type ne 'mediawiki') {
            return IkiWiki::Plugin::link::scan(@_);

        skip_nowiki($content, sub {
            while(/$link_regexp/g) {
                generate_internal_link($page, $1, '', '', '', sub {
                    my($linkpage, $linktext, $anchor) = @_;
                    push @{$links{$page}}, $linkpage;

    # Convert the page to HTML.
        my $page = $params{page};
        my $content = $params{content};

        return $content if $markup_disabled;

        # Do a little preprocessing to babysit Text::MediawikiFormat
        # If a line begins with tabs, T:MwF won't convert it into preformatted blocks.
        $content =~ s/^\t/ /mg;

        my $ret = Text::MediawikiFormat::format($content, {
            allowed_tags => [#HTML
                # MediawikiFormat default
                qw(b big blockquote br caption center cite code dd
                   div dl dt em font h1 h2 h3 h4 h5 h6 hr i li ol p
                   pre rb rp rt ruby s samp small strike strong sub
                   sup table td th tr tt u ul var),
                qw(del ins),    # These should have been added all along.
                qw(span),       # Mediawiki allows span but that's rather scary...?
                qw(a),          # this is unfortunate; should handle links after rendering the page.
                qw(title align lang dir width height bgcolor),
                qw(cite),                   # BLOCKQUOTE, Q
                qw(size face color),        # FONT
                # For various lists, mostly deprecated but safe
                qw(type start value compact),
                qw(summary width border frame rules cellspacing
                   cellpadding valign char charoff colgroup col
                   span abbr axis headers scope rowspan colspan),
                qw(id class name style),    # For CSS

    # This is only needed to support the check_redirect call.
        my $page = $params{page};
        my $destpage = $params{destpage};
        my $template = $params{template};

        # handle metaheaders for redirects
        if (exists $metaheaders{$page} && $template->query(name => "meta")) {
            # avoid duplicate meta lines
            $template->param(meta => join("\n", grep { (! $seen{$_}) && ($seen{$_}=1) } @{$metaheaders{$page}}));

        $template->param(tags => [
                link => htmllink($page, $destpage, tagpage($_), rel => "tag")
            }, sort keys %{$tags{$page}}
        ]) if exists $tags{$page} && %{$tags{$page}} && $template->query(name => "tags");

        # It's an rss/atom template. Add any categories.
        if ($template->query(name => "categories")) {
            if (exists $tags{$page} && %{$tags{$page}}) {
                $template->param(categories => [map { category => $_ },
                    sort keys %{$tags{$page}}]);