The u32 page is excellent, but I wonder if documenting the procedure here
would be worthwhile. Who knows, the remote site might disappear. There are
also some variations on the approach that might be useful:

* using a Python script and the DOM library to extract the page names from
  Special:Allpages (such as
  <http://www.staff.ncl.ac.uk/jon.dowland/unix/docs/get_pagenames.py>)
* or querying the MySQL back-end to get the names
* using WWW::MediaWiki for importing/exporting pages from the wiki, instead

Some detail on converting Mediawiki transclusion to ikiwiki inlines would
also be useful.
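
On the transclusion point, here is a minimal sketch of one possible conversion. It assumes that a page transcluded as `{{:Page name}}` should become an ikiwiki `inline` directive with `raw="yes"`, and that page files use underscores where titles have spaces. The helper name `transclusions_to_inlines` is made up for illustration; none of this comes from the u32 procedure.

```ruby
# Hypothetical helper (not part of the u32 procedure): rewrite Mediawiki
# page transclusions such as {{:Page name}} as ikiwiki inline directives.
# Template calls ({{Name}}) and transclusions with parameters
# ({{:Page|arg}}) are left untouched.
def transclusions_to_inlines(text)
  text.gsub(/\{\{:\s*([^{}|]+?)\s*\}\}/) do
    # Assumption: page files use underscores where titles have spaces.
    page = Regexp.last_match(1).tr(' ', '_')
    # "\[" is a plain "[" in a double-quoted string; writing it this way
    # keeps ikiwiki from expanding the directive if this snippet is
    # pasted onto a wiki page.
    "\[[!inline pages=\"#{page}\" raw=\"yes\"]]"
  end
end

puts transclusions_to_inlines("Intro\n{{:Shared footer}}\n{{Infobox}}")
```

Templates are deliberately skipped because they usually take parameters, which an inline cannot express; those would need converting by hand or with ikiwiki's own template support.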

> "Who knows, the remote site might disappear." Right now, it appears to
> have done just that. -- [[users/Jon]]

I have managed to recover most of the site using the Internet Archive. What
I was unable to retrieve I have rewritten. You can find a copy of the code
at <http://github.com/mithro/media2iki>.

> This is excellent news. However, I'm still keen on there being a
> comprehensive and up-to-date set of instructions on *this* site. I wouldn't
> suggest importing that material into ikiwiki like-for-like (not least for
> [[licensing|freesoftware]] reasons), but it's excellent to have it available
> for reference, especially since it (currently) is the only set of
> instructions that gives you the whole history.

> The `mediawiki.pm` that was at u32.net is licensed GPL-2. I'd like to see it
> cleaned up and added to IkiWiki proper (although I haven't requested this
> yet, I suspect the way it (ab)uses linkify would disqualify it at present).

> I've imported Scott's initial `mediawiki.pm` into a repository at
> <http://github.com/jmtd/mediawiki.pm> as a start.

The iki-fast-load Ruby script from the u32 page is given below:

# This script is called on the final sorted, de-spammed revision

# It doesn't currently check for no-op revisions... I believe
# that git-fast-import will dutifully load them even though nothing
# happened. I don't care to solve this by adding a file cache
# to this script. You can run iki-diff-next.rb to highlight any
# empty revisions that need to be removed.

# This turns each node into an equivalent file.
# It does not convert spaces to underscores in file names.
# This would break wikilinks.
# I suppose you could fix this with mod_speling or mod_rewrite.

# It replaces nodes in the Image: namespace with the files themselves.

require 'node-callback'
require 'time' # for Time#rfc2822, used below

# pipe is the stream to receive the git-fast-import commands.
# f.putfrom is true if this branch has existing commits on it, false if not.
def format_git_commit(pipe, f)
  # Need to escape backslashes and double-quotes for git?
  # No, git breaks when I do this.
  # For the filename "path with \\", git sez: bad default revision 'HEAD'
  # filename = '"' + filename.gsub('\\', '\\\\\\\\').gsub('"', '\\"') + '"'

  # In the calls below, the data length must be the size in bytes!!
  # String#bytesize counts bytes, so multi-byte UTF-8 text is sized
  # correctly on Ruby 1.9 as well, where String#length counts characters.
  pipe.puts "commit #{f.branch}"
  pipe.puts "committer #{f.username} <#{f.email}> #{f.timestamp.rfc2822}"
  pipe.puts "data #{f.message.bytesize}\n#{f.message}\n"
  pipe.puts "from #{f.branch}^0" if f.putfrom
  pipe.puts "M 644 inline #{f.filename}"
  pipe.puts "data #{f.content.bytesize}\n#{f.content}\n"
end

> Would be nice to know where you could get "node-callbacks"... this thing
> is useless without it. --[[users/simonraven]]
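
Since `node-callback` is not available here, the following is a self-contained sketch of the text that `format_git_commit` feeds to `git fast-import`. The `Revision` struct is a hypothetical stand-in for the `f` object, with field names that simply mirror the accessors used in the script, and `commit_stanza` is an illustrative name. The sketch sizes each `data` block with `String#bytesize`, because fast-import counts bytes rather than characters. The RFC 2822 date assumes the importer is run as `git fast-import --date-format=rfc2822`.

```ruby
require 'time'

# Hypothetical stand-in for the object that node-callback would supply;
# the field names mirror the accessors used by format_git_commit.
Revision = Struct.new(:branch, :username, :email, :timestamp,
                      :message, :filename, :content, :putfrom)

# Build one commit in git fast-import syntax and return it as a string.
def commit_stanza(rev)
  lines = []
  lines << "commit #{rev.branch}"
  lines << "committer #{rev.username} <#{rev.email}> #{rev.timestamp.rfc2822}"
  # "data <n>" must give the payload size in bytes, not characters.
  lines << "data #{rev.message.bytesize}\n#{rev.message}"
  # Only name a parent when the branch already has commits on it.
  lines << "from #{rev.branch}^0" if rev.putfrom
  lines << "M 644 inline #{rev.filename}"
  lines << "data #{rev.content.bytesize}\n#{rev.content}"
  lines.join("\n") + "\n"
end

rev = Revision.new('refs/heads/master', 'Example User', 'user@example.com',
                   Time.utc(2009, 1, 1), 'Import café page',
                   'Cafe.mediawiki', "Bonjour\n", false)
puts commit_stanza(rev)
```

The commit message here is 16 characters but 17 bytes, which is exactly the case where `length` would produce a stream that fast-import misreads.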

Mediawiki.pm - A plugin which supports mediawiki format.

# By Scott Bronson. Licensed under the GPLv2+ License.
# Extends Ikiwiki to be able to handle Mediawiki markup.
#
# To use the Mediawiki Plugin:
# - Install Text::MediawikiFormat
# - Turn on prefix_directives in your setup file.
#   (TODO: we probably don't need to do this anymore?)
#     prefix_directives => 1,
# - Add this plugin on Ikiwiki's path (perl -V, look for @INC)
#     cp mediawiki.pm something/IkiWiki/Plugin
# - And enable it in your setup file
#     add_plugins => [qw{mediawiki}],
# - Finally, turn off the link plugin in setup (this is important)
#     disable_plugins => [qw{link}],
# - Rebuild everything (actually, this should be automatic right?)
# - Now all files with a .mediawiki extension should be rendered properly.

package IkiWiki::Plugin::mediawiki;

# This is a gross hack... We disable the link plugin so that our
# linkify routine is always called. Then we call the link plugin
# directly for all non-mediawiki pages. Ouch... Hopefully Ikiwiki
# will be updated soon to support multiple link plugins.
require IkiWiki::Plugin::link;

# Even if T:MwF is not installed, we can still handle all the linking.
# The user will just see Mediawiki markup rather than formatted markup.
eval q{use Text::MediawikiFormat ()};
my $markup_disabled = $@;

# Work around a UTF8 bug in Text::MediawikiFormat
# http://rt.cpan.org/Public/Bug/Display.html?id=26880
unless($markup_disabled) {
*{'Text::MediawikiFormat::uri_escape'} = \&URI::Escape::uri_escape_utf8;

my %metaheaders; # keeps track of redirects for pagetemplate.
my %tags; # keeps track of tags for pagetemplate.

hook(type => "checkconfig", id => "mediawiki", call => \&checkconfig);
hook(type => "scan", id => "mediawiki", call => \&scan);
hook(type => "linkify", id => "mediawiki", call => \&linkify);
hook(type => "htmlize", id => "mediawiki", call => \&htmlize);
hook(type => "pagetemplate", id => "mediawiki", call => \&pagetemplate);

return IkiWiki::Plugin::link::checkconfig(@_);

my $link_regexp = qr{
\[\[(?=[^!]) # beginning of link
([^\n\r\]#|<>]+) # 1: page to link to
\# # '#', beginning of anchor
([^|\]]+) # 2: anchor text
([^\]\|]*) # 3: link text
([a-zA-Z]*) # optional trailing alphas

# Convert spaces in the passed-in string into underscores.
# If passed in undef, returns undef without throwing errors.
$var =~ tr{ }{_} if $var;

# Underscorize, strip leading and trailing space, and scrunch
# multiple runs of spaces into one underscore.
$var =~ s/^\s+|\s+$//g; # strip leading and trailing space
$var =~ s/\s+/ /g; # squash multiple spaces to one

# Translates Mediawiki paths into Ikiwiki paths.
# It needs to be pretty careful because Mediawiki and Ikiwiki handle
# relative vs. absolute exactly opposite from each other.
my $path = scrunch(shift);

# always start from root unless we're doing relative shenanigans.
$page = "/" unless $path =~ /^(?:\/|\.\.)/;

for(split(/\//, "$page/$path")) {
push @result, $_ if $_ ne "";

# temporary hack working around http://ikiwiki.info/bugs/Can__39__t_create_root_page/index.html?updated
# put this back the way it was once this bug is fixed upstream.
# This is actually a major problem because now Mediawiki pages can't link from /Git/git-svn to /git-svn. And upstream appears to be uninterested in fixing this bug. :(
# return "/" . join("/", @result);
return join("/", @result);

# Figures out the human-readable text for a wikilink
my($page, $inlink, $anchor, $title, $trailing) = @_;
my $link = translate_path($page,$inlink);

# translate_path always produces an absolute link.
# get rid of the leading slash before we display this link.
$out = IkiWiki::pagetitle($title);
$link = $inlink if $inlink =~ /^\s*\//;
$out = $anchor ? "$link#$anchor" : $link;
if(defined $title && $title eq "") {
# a bare pipe appeared in the link...
# user wants to strip namespace and trailing parens.
$out =~ s/^[A-Za-z0-9_-]*://;
$out =~ s/\s*\(.*\)\s*$//;
# A trailing slash suppresses the leading slash
$out =~ s#^/(.*)/$#$1#;
$out .= $trailing if defined $trailing;

if (exists $config{tagbase} && defined $config{tagbase}) {
$tag=$config{tagbase}."/".$tag;

# Pass a URL and optional text associated with it. This call turns
# it into fully-formatted HTML the same way Mediawiki would.
# Counter is used to number untitled links sequentially on the page.
# It should be set to 1 when you start parsing a new page. This call
# increments it automatically.
sub generate_external_link
# Mediawiki trims off trailing commas.
# And apparently it does entity substitution first.
# Since we can't, we'll fake it.

# trim any leading and trailing whitespace
$url =~ s/^\s+|\s+$//g;

# url properly terminates on > but must special-case &gt;
$url =~ s{(\&(?:gt|lt)\;.*)$}{ $trailer = $1, ''; }eg;

# Trim some potential trailing chars, put them outside the link.
$url =~ s{([,)]+)$}{ $tmptrail .= $1, ''; }eg;
$trailer = $tmptrail . $trailer;
$text = "[$$counter]";
$text =~ s/^\s+|\s+$//g;
return "<a href='$url' title='$title'>$text</a>$trailer";

# Called to handle bookmarks like \[[#heading]] or \[[#a| text ]]
sub generate_fragment_link
$url = scrunch($url);

if(defined($text) && $text ne "") {
$text = scrunch($text);
$url = underscorize($url);

# For some reason Mediawiki puts blank titles on all its fragment links.
# I don't see why we would duplicate that behavior here.
return "<a href='$url'>$text</a>";

sub generate_internal_link
my($page, $inlink, $anchor, $title, $trailing, $proc) = @_;

# Ikiwiki's link plugin wrecks this line when displaying on the site.
# Until the code highlighter plugin can turn off link finding,
# always escape double brackets in double quotes: \[[
if($inlink eq '..') {
# Mediawiki doesn't touch links like \[[..#hi|ho]].
return "\[[" . $inlink . ($anchor?"#$anchor":"") .
($title?"|$title":"") . "]]" . $trailing;

my($linkpage, $linktext);
if($inlink =~ /^ (:?) \s* Category (\s* \: \s*) ([^\]]*) $/x) {
# Handle category links
$linkpage = IkiWiki::linkpage(translate_path($page, $inlink));
# Produce a link but don't add this page to the given category.
$linkpage = tagpage($linkpage);
$linktext = ($title ? '' : "Category$sep") .
linktext($page, $inlink, $anchor, $title, $trailing);
$tags{$page}{$linkpage} = 1;
# Add this page to the given category but don't produce a link.
$tags{$page}{$linkpage} = 1;
&$proc(tagpage($linkpage), $linktext, $anchor);
# It's just a regular link
$linkpage = IkiWiki::linkpage(translate_path($page, $inlink));
$linktext = linktext($page, $inlink, $anchor, $title, $trailing);

return &$proc($linkpage, $linktext, $anchor);

my $page=$params{page};
my $destpage=$params{destpage};
my $content=$params{content};

return "" if $page ne $destpage;

if($content !~ /^ \s* \#REDIRECT \s* \[\[ ( [^\]]+ ) \]\]/x) {
# this page isn't a redirect, render it normally.

# The rest of this function is copied from the redir clause
# in meta::preprocess and actually handles the redirect.
$value =~ s/^\s+|\s+$//g;

if ($value !~ /^\w+:\/\//) {
my ($redir_page, $redir_anchor) = split /\#/, $value;

add_depends($page, $redir_page);
my $link=bestlink($page, underscorize(translate_path($page,$redir_page)));
if (! length $link) {
return "<b>Redirect Error:</b> <nowiki>\[[$redir_page]] not found.</nowiki>";

$value=urlto($link, $page);
$value.='#'.$redir_anchor if defined $redir_anchor;

# redir cycle detection
$pagestate{$page}{mediawiki}{redir}=$link;
while (exists $pagestate{$at}{mediawiki}{redir}) {
return "<b>Redirect Error:</b> cycle found on <nowiki>\[[$at]]</nowiki>";
$at=$pagestate{$at}{mediawiki}{redir};

# it's an external link
$value = encode_entities($value);

my $redir="<meta http-equiv=\"refresh\" content=\"0; URL=$value\" />";
$redir=scrub($redir) if !$safe;
push @{$metaheaders{$page}}, $redir;

return "Redirecting to $value ...";

# Feed this routine a string containing <nowiki>...</nowiki> sections,
# this routine calls your callback for every section not within nowikis,
# collecting its return values and returning the rewritten string.
for(split(/(<nowiki[^>]*>.*?<\/nowiki\s*>)/s, $content)) {
$result .= ($state ? $_ : &$proc($_));

# Converts all links in the page, wiki and otherwise.
my $page=$params{page};
my $destpage=$params{destpage};
my $content=$params{content};

my $file=$pagesources{$page};
my $type=pagetype($file);

if($type ne 'mediawiki') {
return IkiWiki::Plugin::link::linkify(@_);

my $redir = check_redirect(%params);
return $redir if defined $redir;

# this code was copied from MediawikiFormat.pm.
# Heavily changed because MF.pm screws up escaping when it does
# this awful hack: $uricCheat =~ tr/://d;
my $schemas = [qw(http https ftp mailto gopher)];
my $re = join "|", map {qr/\Q$_\E/} @$schemas;
my $schemes = qr/(?:$re)/;
# And this is copied from URI:
my $reserved = q(;/?@&=+$,); # NOTE: no colon or [] !
my $uric = quotemeta($reserved) . $URI::unreserved . "%#";

my $result = skip_nowiki($content, sub {
#s/<(a[\s>\/])/&lt;$1/ig;
# Disabled because this appears to screw up the aggregate plugin.
# I guess we'll rely on Iki to post-sanitize this sort of stuff.

# Replace external links, http://blah or [http://blah]
s{\b($schemes:[$uric][:$uric]+)|\[($schemes:[$uric][:$uric]+)([^\]]*?)\]}{
generate_external_link($1||$2, $3, \$counter)

# Handle links that only contain fragments.
s{ \[\[ \s* (\#[^|\]'"<>&;]+) (?:\| ([^\]'"<>&;]*))? \]\] }{
generate_fragment_link($1, $2)

# Match all internal links
generate_internal_link($page, $1, $2, $3, $4, sub {
my($linkpage, $linktext, $anchor) = @_;
return htmllink($page, $destpage, $linkpage,
linktext => $linktext,
anchor => underscorize(scrunch($anchor)));

# Find all WikiLinks in the page.
my $page=$params{page};
my $content=$params{content};

my $file=$pagesources{$page};
my $type=pagetype($file);

if($type ne 'mediawiki') {
return IkiWiki::Plugin::link::scan(@_);

skip_nowiki($content, sub {
while(/$link_regexp/g) {
generate_internal_link($page, $1, '', '', '', sub {
my($linkpage, $linktext, $anchor) = @_;
push @{$links{$page}}, $linkpage;

# Convert the page to HTML.
my $page = $params{page};
my $content = $params{content};

return $content if $markup_disabled;

# Do a little preprocessing to babysit Text::MediawikiFormat
# If a line begins with tabs, T:MwF won't convert it into preformatted blocks.
$content =~ s/^\t/ /mg;

my $ret = Text::MediawikiFormat::format($content, {
allowed_tags => [#HTML
# MediawikiFormat default
qw(b big blockquote br caption center cite code dd
div dl dt em font h1 h2 h3 h4 h5 h6 hr i li ol p
pre rb rp rt ruby s samp small strike strong sub
sup table td th tr tt u ul var),
qw(del ins), # These should have been added all along.
qw(span), # Mediawiki allows span but that's rather scary...?
qw(a), # this is unfortunate; should handle links after rendering the page.
qw(title align lang dir width height bgcolor),
qw(cite), # BLOCKQUOTE, Q
qw(size face color), # FONT
# For various lists, mostly deprecated but safe
qw(type start value compact),
qw(summary width border frame rules cellspacing
cellpadding valign char charoff colgroup col
span abbr axis headers scope rowspan colspan),
qw(id class name style), # For CSS

# This is only needed to support the check_redirect call.
my $page = $params{page};
my $destpage = $params{destpage};
my $template = $params{template};

# handle metaheaders for redirects
if (exists $metaheaders{$page} && $template->query(name => "meta")) {
# avoid duplicate meta lines
$template->param(meta => join("\n", grep { (! $seen{$_}) && ($seen{$_}=1) } @{$metaheaders{$page}}));

$template->param(tags => [
link => htmllink($page, $destpage, tagpage($_), rel => "tag")
}, sort keys %{$tags{$page}}
]) if exists $tags{$page} && %{$tags{$page}} && $template->query(name => "tags");

# It's an rss/atom template. Add any categories.
if ($template->query(name => "categories")) {
if (exists $tags{$page} && %{$tags{$page}}) {
$template->param(categories => [map { category => $_ },
sort keys %{$tags{$page}}]);
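
The path translation in the plugin is the part most likely to surprise people, so here is a rough Ruby sketch of how I read it. Plain link targets are treated as absolute, while targets starting with `/` or `..` are resolved against the linking page. The handling of `..` segments (dropping the previous path component) is an assumption, since that part of the loop is not visible in the listing above. The result has no leading slash, matching the root-page workaround in the Perl.

```ruby
# Collapse whitespace roughly the way the plugin's scrunch() does:
# trim both ends and squash runs of whitespace to a single space.
def scrunch(text)
  text.strip.gsub(/\s+/, ' ')
end

# Resolve a Mediawiki link target against the page containing the link.
def translate_path(page, path)
  path = scrunch(path)
  # Plain names are absolute in Mediawiki; only "/sub" and "../x" are relative.
  base = path.start_with?('/', '..') ? page : ''
  parts = []
  "#{base}/#{path}".split('/').each do |segment|
    if segment == '..'
      parts.pop # assumption: ".." drops the previous component
    elsif !segment.empty?
      parts << segment
    end
  end
  parts.join('/')
end

puts translate_path('Git/git-svn', 'Other  page ') # Other page
puts translate_path('Git', '/git-svn')             # Git/git-svn
puts translate_path('Git/git-svn', '../tips')      # Git/tips
```

Because the leading slash is dropped, the first result is handed to ikiwiki as a relative name, which is the linking problem the comment in `translate_path` complains about.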

Hello. Got ikiwiki running and I'm planning to convert my personal
Mediawiki wiki to ikiwiki so I can take offline copies around. If anyone
has an old copy of the instructions, or any advice on where to start, I'd be
glad to hear it. Otherwise I'm just going to chronicle my journey on the
page. --[[users/Chadius]]

> Today I saw that someone is working to import Wikipedia into git.
> <http://www.gossamer-threads.com/lists/wiki/foundation/181163>
> Since Wikipedia uses Mediawiki, perhaps his importer will work
> on Mediawiki in general. It seems to produce output that could be
> used by the [[plugins/contrib/mediawiki]] plugin, if the filenames
> were fixed to use the right extension. --[[Joey]]
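
On fixing the filenames, here is a minimal sketch. It assumes the importer leaves one extensionless file per page somewhere under a working directory; that layout and the helper name `add_mediawiki_extension` are guesses for illustration, not something the importer is known to do. Each such file is renamed so that the plugin's `.mediawiki` page type picks it up.

```ruby
require 'fileutils'

# Hypothetical helper: give every extensionless regular file under dir
# a .mediawiki extension so ikiwiki treats it as a Mediawiki page.
# Returns the sorted list of new paths.
def add_mediawiki_extension(dir)
  pages = Dir.glob(File.join(dir, '**', '*')).select do |path|
    File.file?(path) && File.extname(path).empty?
  end
  pages.sort.map do |path|
    target = "#{path}.mediawiki"
    FileUtils.mv(path, target)
    target
  end
end

# Example (directory name is made up):
#   add_mediawiki_extension('wikipedia-import')
```

Two caveats: a page whose title contains a dot (say "Ruby 1.9") looks like it already has an extension and is skipped, and inside a git checkout you would probably want `git mv` so the renames are recorded.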