20100428 - I just wrote a simple Ruby script that connects to a MySQL server and then recreates the pages and their revision histories with Grit. It also does one simple conversion, turning equals-sign titles into pound-sign titles. Enjoy!

<http://github.com/docunext/mediawiki2gitikiwiki>
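
For readers who only need the title conversion, here is a minimal sketch of what turning equals-sign titles into pound-sign titles could look like. It is not taken from the script linked above: the helper name `equals_to_pounds` is made up for illustration, and it assumes the goal is to map MediaWiki headings such as `== Title ==` onto Markdown headings of the same depth.

```ruby
# Hypothetical helper, not part of mediawiki2gitikiwiki.
# Rewrites MediaWiki headings ("== Title ==") as Markdown headings
# ("## Title"), keeping the heading depth. Other lines are left alone.
def equals_to_pounds(text)
  text.gsub(/^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$/) do
    "#{'#' * $1.length} #{$2}"
  end
end

sample = "= Top =\n== Section ==\nBody text where a = b.\n=== Sub-section ===\n"
puts equals_to_pounds(sample)
```

Only lines that both start and end with the same run of equals signs are rewritten, so ordinary text containing `=` is not affected.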
I wrote a script that downloads the latest revision of every page of a mediawiki site. In short, it does a good part of what the migration requires: it downloads the goods (i.e. the latest version of every page, automatically) and commits the resulting structure. A good few pieces are still missing for a complete conversion to ikiwiki, but it's a pretty good start. It only talks to mediawiki through HTTP, so no special access is necessary. The downside is that, for performance reasons, it does not attempt to download every revision. The code is here: <http://anarcat.ath.cx/software/mediawikigitdump.git/>. See the header of the file for more details and todos. -- [[users/Anarcat]]
The u32 page is excellent, but I wonder if documenting the procedure here
would be worthwhile. Who knows, the remote site might disappear. But also
there are some variations on the approach that might be useful:

* using a python script and the dom library to extract the page names from
  Special:Allpages (such as
  <http://www.staff.ncl.ac.uk/jon.dowland/unix/docs/get_pagenames.py>)
* or querying the mysql back-end to get the names
* using WWW::MediaWiki for importing/exporting pages from the wiki, instead

Also, some detail on converting mediawiki transclusion to ikiwiki inlines...
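
As a starting point for the transclusion question, here is a rough sketch that covers only the simplest case: a whole-page transclusion written as `{{:Page Name}}`, rewritten as an ikiwiki `inline` directive with `raw="yes"`. The helper name is hypothetical, the space-to-underscore mapping is an assumption about how the page files are named, and the `[[!inline ...]]` spelling assumes `prefix_directives` is enabled. Templates with parameters are deliberately left untouched.

```ruby
# Hypothetical helper: rewrite simple MediaWiki page transclusions
# ("{{:Page Name}}") as ikiwiki inline directives.
# Assumptions: page files use underscores instead of spaces, and the wiki
# has prefix_directives enabled (hence the "!" in the directive).
def transclusion_to_inline(text)
  text.gsub(/\{\{:([^{}|]+)\}\}/) do
    page = $1.strip.tr(' ', '_')
    %([[!inline pages="#{page}" raw="yes"]])
  end
end

puts transclusion_to_inline("Intro\n{{:Main Page}}\n{{Infobox|a=b}}\n")
```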
> "Who knows, the remote site might disappear.". Right now, it appears to
> have done just that. -- [[users/Jon]]

I have managed to recover most of the site using the Internet Archive. What
I was unable to retrieve I have rewritten. You can find a copy of the code
at <http://github.com/mithro/media2iki>
> This is excellent news. However, I'm still keen on there being a
> comprehensive and up-to-date set of instructions on *this* site. I wouldn't
> suggest importing that material into ikiwiki like-for-like (not least for
> [[licensing|freesoftware]] reasons), but it's excellent to have it available
> for reference, especially since it (currently) is the only set of
> instructions that gives you the whole history.

> The `mediawiki.pm` that was at u32.net is licensed GPL-2. I'd like to see it
> cleaned up and added to IkiWiki proper (although I haven't requested this
> yet, I suspect the way it (ab)uses linkify would disqualify it at present).

> I've imported Scott's initial `mediawiki.pm` into a repository at
> <http://github.com/jmtd/mediawiki.pm> as a start.
The iki-fast-load Ruby script from the u32 page is given below:
# This script is called on the final sorted, de-spammed revision

# It doesn't currently check for no-op revisions...  I believe
# that git fast-import will dutifully load them even though nothing
# happened.  I don't care to solve this by adding a file cache
# to this script.  You can run iki-diff-next.rb to highlight any
# empty revisions that need to be removed.

# This turns each node into an equivalent file.
# It does not convert spaces to underscores in file names.
# This would break wikilinks.
# I suppose you could fix this with mod_speling or mod_rewrite.

# It replaces nodes in the Image: namespace with the files themselves.

require 'node-callback'

# pipe is the stream to receive the git-fast-import commands
# putfrom is true if this branch has existing commits on it, false if not.
def format_git_commit(pipe, f)
  # Need to escape backslashes and double-quotes for git?
  # No, git breaks when I do this.
  # For the filename "path with \\", git sez: bad default revision 'HEAD'
  # filename = '"' + filename.gsub('\\', '\\\\\\\\').gsub('"', '\\"') + '"'

  # In the calls below, length must be the size in bytes!!
  # String#bytesize (not String#length, which counts characters on
  # Ruby 1.9 and later) gives the byte count for UTF-8 text.
  pipe.puts "commit #{f.branch}"
  pipe.puts "committer #{f.username} <#{f.email}> #{f.timestamp.rfc2822}"
  pipe.puts "data #{f.message.bytesize}\n#{f.message}\n"
  pipe.puts "from #{f.branch}^0" if f.putfrom
  pipe.puts "M 644 inline #{f.filename}"
  pipe.puts "data #{f.content.bytesize}\n#{f.content}\n"
end
> Would be nice to know where you could get "node-callbacks"... this thing is useless without it. --[[users/simonraven]]
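
Since `node-callback` is not available here, the following is a self-contained sketch of the same commit-formatting step, written to show the byte-count requirement that the script's comments stress. The `Revision` struct and the sample values are hypothetical stand-ins for the script's `f` object. The sketch also uses git fast-import's default `raw` date format (epoch seconds plus a UTC offset); the RFC 2822 dates written by the original script would need `git fast-import --date-format=rfc2822`.

```ruby
require 'stringio'

# Hypothetical stand-in for the "f" object that node-callback would supply.
Revision = Struct.new(:branch, :username, :email, :timestamp,
                      :message, :filename, :content, :putfrom)

# Build a fast-import "data" block. The count must be in bytes, not
# characters, so multi-byte UTF-8 text needs String#bytesize.
def data_block(str)
  "data #{str.bytesize}\n#{str}\n"
end

# Write one commit containing one file to the fast-import stream.
def write_commit(pipe, rev)
  pipe.write "commit #{rev.branch}\n"
  # "raw" date format: seconds since the epoch, then a UTC offset.
  pipe.write "committer #{rev.username} <#{rev.email}> #{rev.timestamp.to_i} +0000\n"
  pipe.write data_block(rev.message)
  pipe.write "from #{rev.branch}^0\n" if rev.putfrom
  pipe.write "M 644 inline #{rev.filename}\n"
  pipe.write data_block(rev.content)
end

out = StringIO.new(+'')
rev = Revision.new('refs/heads/master', 'Example User', 'user@example.org',
                   Time.at(0), 'Import Main_Page', 'Main_Page.mediawiki',
                   "h\u00e9llo\n", false)
write_commit(out, rev)
puts out.string
```

In real use the `pipe` argument would be something like `IO.popen('git fast-import', 'w')` run inside the target repository, rather than a `StringIO`.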
Mediawiki.pm - A plugin which supports mediawiki format.

# By Scott Bronson.  Licensed under the GPLv2+ License.
# Extends Ikiwiki to be able to handle Mediawiki markup.
#
# To use the Mediawiki Plugin:
# - Install Text::MediawikiFormat
# - Turn on prefix_directives in your setup file.
#     (TODO: we probably don't need to do this anymore?)
#        prefix_directives => 1,
# - Add this plugin on Ikiwiki's path (perl -V, look for @INC)
#        cp mediawiki.pm something/IkiWiki/Plugin
# - And enable it in your setup file
#        add_plugins => [qw{mediawiki}],
# - Finally, turn off the link plugin in setup (this is important)
#        disable_plugins => [qw{link}],
# - Rebuild everything (actually, this should be automatic right?)
# - Now all files with a .mediawiki extension should be rendered properly.
package IkiWiki::Plugin::mediawiki;

# This is a gross hack...  We disable the link plugin so that our
# linkify routine is always called.  Then we call the link plugin
# directly for all non-mediawiki pages.  Ouch...  Hopefully Ikiwiki
# will be updated soon to support multiple link plugins.
require IkiWiki::Plugin::link;

# Even if T:MwF is not installed, we can still handle all the linking.
# The user will just see Mediawiki markup rather than formatted markup.
eval q{use Text::MediawikiFormat ()};
my $markup_disabled = $@;

# Work around a UTF8 bug in Text::MediawikiFormat
# http://rt.cpan.org/Public/Bug/Display.html?id=26880
unless($markup_disabled) {
*{'Text::MediawikiFormat::uri_escape'} = \&URI::Escape::uri_escape_utf8;

my %metaheaders;    # keeps track of redirects for pagetemplate.
my %tags;           # keeps track of tags for pagetemplate.

hook(type => "checkconfig", id => "mediawiki", call => \&checkconfig);
hook(type => "scan", id => "mediawiki", call => \&scan);
hook(type => "linkify", id => "mediawiki", call => \&linkify);
hook(type => "htmlize", id => "mediawiki", call => \&htmlize);
hook(type => "pagetemplate", id => "mediawiki", call => \&pagetemplate);

return IkiWiki::Plugin::link::checkconfig(@_);
my $link_regexp = qr{
\[\[(?=[^!])        # beginning of link
([^\n\r\]#|<>]+)    # 1: page to link to
\#                  # '#', beginning of anchor
([^|\]]+)           # 2: anchor text
([^\]\|]*)          # 3: link text
([a-zA-Z]*)         # optional trailing alphas

# Convert spaces in the passed-in string into underscores.
# If passed in undef, returns undef without throwing errors.
$var =~ tr{ }{_} if $var;

# Underscorize, strip leading and trailing space, and scrunch
# multiple runs of spaces into one underscore.
$var =~ s/^\s+|\s+$//g;   # strip leading and trailing space
$var =~ s/\s+/ /g;        # squash multiple spaces to one
# Translates Mediawiki paths into Ikiwiki paths.
# It needs to be pretty careful because Mediawiki and Ikiwiki handle
# relative vs. absolute exactly opposite from each other.
my $path = scrunch(shift);

# always start from root unless we're doing relative shenanigans.
$page = "/" unless $path =~ /^(?:\/|\.\.)/;

for(split(/\//, "$page/$path")) {
push @result, $_ if $_ ne "";

# temporary hack working around http://ikiwiki.info/bugs/Can__39__t_create_root_page/index.html?updated
# put this back the way it was once this bug is fixed upstream.
# This is actually a major problem because now Mediawiki pages can't link from /Git/git-svn to /git-svn.  And upstream appears to be uninterested in fixing this bug.  :(
# return "/" . join("/", @result);
return join("/", @result);
# Figures out the human-readable text for a wikilink
my($page, $inlink, $anchor, $title, $trailing) = @_;
my $link = translate_path($page,$inlink);

# translate_path always produces an absolute link.
# get rid of the leading slash before we display this link.
$out = IkiWiki::pagetitle($title);
$link = $inlink if $inlink =~ /^\s*\//;
$out = $anchor ? "$link#$anchor" : $link;
if(defined $title && $title eq "") {
# a bare pipe appeared in the link...
# user wants to strip namespace and trailing parens.
$out =~ s/^[A-Za-z0-9_-]*://;
$out =~ s/\s*\(.*\)\s*$//;
# A trailing slash suppresses the leading slash
$out =~ s#^/(.*)/$#$1#;
$out .= $trailing if defined $trailing;

if (exists $config{tagbase} && defined $config{tagbase}) {
$tag=$config{tagbase}."/".$tag;
# Pass a URL and optional text associated with it.  This call turns
# it into fully-formatted HTML the same way Mediawiki would.
# Counter is used to number untitled links sequentially on the page.
# It should be set to 1 when you start parsing a new page.  This call
# increments it automatically.
sub generate_external_link
# Mediawiki trims off trailing commas.
# And apparently it does entity substitution first.
# Since we can't, we'll fake it.

# trim any leading and trailing whitespace
$url =~ s/^\s+|\s+$//g;

# url properly terminates on > but must special-case &gt;
$url =~ s{(\&(?:gt|lt)\;.*)$}{ $trailer = $1, ''; }eg;

# Trim some potential trailing chars, put them outside the link.
$url =~ s{([,)]+)$}{ $tmptrail .= $1, ''; }eg;
$trailer = $tmptrail . $trailer;

$text = "[$$counter]";
$text =~ s/^\s+|\s+$//g;

return "<a href='$url' title='$title'>$text</a>$trailer";
# Called to handle bookmarks like \[[#heading]] or \[[#a| text ]]
sub generate_fragment_link
$url = scrunch($url);

if(defined($text) && $text ne "") {
$text = scrunch($text);
$url = underscorize($url);

# For some reason Mediawiki puts blank titles on all its fragment links.
# I don't see why we would duplicate that behavior here.
return "<a href='$url'>$text</a>";
sub generate_internal_link
my($page, $inlink, $anchor, $title, $trailing, $proc) = @_;

# Ikiwiki's link plugin wrecks this line when displaying on the site.
# Until the code highlighter plugin can turn off link finding,
# always escape double brackets in double quotes: \[[
if($inlink eq '..') {
# Mediawiki doesn't touch links like \[[..#hi|ho]].
return "\[[" . $inlink . ($anchor?"#$anchor":"") .
($title?"|$title":"") . "]]" . $trailing;

my($linkpage, $linktext);
if($inlink =~ /^ (:?) \s* Category (\s* \: \s*) ([^\]]*) $/x) {
# Handle category links
$linkpage = IkiWiki::linkpage(translate_path($page, $inlink));
# Produce a link but don't add this page to the given category.
$linkpage = tagpage($linkpage);
$linktext = ($title ? '' : "Category$sep") .
linktext($page, $inlink, $anchor, $title, $trailing);
$tags{$page}{$linkpage} = 1;
# Add this page to the given category but don't produce a link.
$tags{$page}{$linkpage} = 1;
&$proc(tagpage($linkpage), $linktext, $anchor);

# It's just a regular link
$linkpage = IkiWiki::linkpage(translate_path($page, $inlink));
$linktext = linktext($page, $inlink, $anchor, $title, $trailing);

return &$proc($linkpage, $linktext, $anchor);
my $page=$params{page};
my $destpage=$params{destpage};
my $content=$params{content};

return "" if $page ne $destpage;

if($content !~ /^ \s* \#REDIRECT \s* \[\[ ( [^\]]+ ) \]\]/x) {
# this page isn't a redirect, render it normally.

# The rest of this function is copied from the redir clause
# in meta::preprocess and actually handles the redirect.
$value =~ s/^\s+|\s+$//g;

if ($value !~ /^\w+:\/\//) {
my ($redir_page, $redir_anchor) = split /\#/, $value;

add_depends($page, $redir_page);
my $link=bestlink($page, underscorize(translate_path($page,$redir_page)));
if (! length $link) {
return "<b>Redirect Error:</b> <nowiki>\[[$redir_page]] not found.</nowiki>";

$value=urlto($link, $page);
$value.='#'.$redir_anchor if defined $redir_anchor;

# redir cycle detection
$pagestate{$page}{mediawiki}{redir}=$link;
while (exists $pagestate{$at}{mediawiki}{redir}) {
return "<b>Redirect Error:</b> cycle found on <nowiki>\[[$at]]</nowiki>";
$at=$pagestate{$at}{mediawiki}{redir};

# it's an external link
$value = encode_entities($value);

my $redir="<meta http-equiv=\"refresh\" content=\"0; URL=$value\" />";
$redir=scrub($redir) if !$safe;
push @{$metaheaders{$page}}, $redir;

return "Redirecting to $value ...";
# Feed this routine a string containing <nowiki>...</nowiki> sections,
# this routine calls your callback for every section not within nowikis,
# collecting its return values and returning the rewritten string.
for(split(/(<nowiki[^>]*>.*?<\/nowiki\s*>)/s, $content)) {
$result .= ($state ? $_ : &$proc($_));
# Converts all links in the page, wiki and otherwise.
my $page=$params{page};
my $destpage=$params{destpage};
my $content=$params{content};

my $file=$pagesources{$page};
my $type=pagetype($file);

if($type ne 'mediawiki') {
return IkiWiki::Plugin::link::linkify(@_);

my $redir = check_redirect(%params);
return $redir if defined $redir;

# this code was copied from MediawikiFormat.pm.
# Heavily changed because MF.pm screws up escaping when it does
# this awful hack: $uricCheat =~ tr/://d;
my $schemas = [qw(http https ftp mailto gopher)];
my $re = join "|", map {qr/\Q$_\E/} @$schemas;
my $schemes = qr/(?:$re)/;
# And this is copied from URI:
my $reserved = q(;/?@&=+$,);   # NOTE: no colon or [] !
my $uric = quotemeta($reserved) . $URI::unreserved . "%#";

my $result = skip_nowiki($content, sub {
#s/<(a[\s>\/])/&lt;$1/ig;
# Disabled because this appears to screw up the aggregate plugin.
# I guess we'll rely on Iki to post-sanitize this sort of stuff.

# Replace external links, http://blah or [http://blah]
s{\b($schemes:[$uric][:$uric]+)|\[($schemes:[$uric][:$uric]+)([^\]]*?)\]}{
generate_external_link($1||$2, $3, \$counter)

# Handle links that only contain fragments.
s{ \[\[ \s* (\#[^|\]'"<>&;]+) (?:\| ([^\]'"<>&;]*))? \]\] }{
generate_fragment_link($1, $2)

# Match all internal links
generate_internal_link($page, $1, $2, $3, $4, sub {
my($linkpage, $linktext, $anchor) = @_;
return htmllink($page, $destpage, $linkpage,
linktext => $linktext,
anchor => underscorize(scrunch($anchor)));
# Find all WikiLinks in the page.
my $page=$params{page};
my $content=$params{content};

my $file=$pagesources{$page};
my $type=pagetype($file);

if($type ne 'mediawiki') {
return IkiWiki::Plugin::link::scan(@_);

skip_nowiki($content, sub {
while(/$link_regexp/g) {
generate_internal_link($page, $1, '', '', '', sub {
my($linkpage, $linktext, $anchor) = @_;
push @{$links{$page}}, $linkpage;
# Convert the page to HTML.
my $page = $params{page};
my $content = $params{content};

return $content if $markup_disabled;

# Do a little preprocessing to babysit Text::MediawikiFormat
# If a line begins with tabs, T:MwF won't convert it into preformatted blocks.
$content =~ s/^\t/ /mg;

my $ret = Text::MediawikiFormat::format($content, {
allowed_tags => [#HTML
# MediawikiFormat default
qw(b big blockquote br caption center cite code dd
div dl dt em font h1 h2 h3 h4 h5 h6 hr i li ol p
pre rb rp rt ruby s samp small strike strong sub
sup table td th tr tt u ul var),
qw(del ins),   # These should have been added all along.
qw(span),      # Mediawiki allows span but that's rather scary...?
qw(a),         # this is unfortunate; should handle links after rendering the page.
qw(title align lang dir width height bgcolor),
qw(cite), # BLOCKQUOTE, Q
qw(size face color), # FONT
# For various lists, mostly deprecated but safe
qw(type start value compact),
qw(summary width border frame rules cellspacing
cellpadding valign char charoff colgroup col
span abbr axis headers scope rowspan colspan),
qw(id class name style), # For CSS
# This is only needed to support the check_redirect call.
my $page = $params{page};
my $destpage = $params{destpage};
my $template = $params{template};

# handle metaheaders for redirects
if (exists $metaheaders{$page} && $template->query(name => "meta")) {
# avoid duplicate meta lines
$template->param(meta => join("\n", grep { (! $seen{$_}) && ($seen{$_}=1) } @{$metaheaders{$page}}));

$template->param(tags => [
link => htmllink($page, $destpage, tagpage($_), rel => "tag")
}, sort keys %{$tags{$page}}
]) if exists $tags{$page} && %{$tags{$page}} && $template->query(name => "tags");

# It's an rss/atom template. Add any categories.
if ($template->query(name => "categories")) {
if (exists $tags{$page} && %{$tags{$page}}) {
$template->param(categories => [map { category => $_ },
sort keys %{$tags{$page}}]);
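
To make the plugin's `skip_nowiki` idea easier to experiment with outside ikiwiki, here is a small port of it to Ruby (the language of the fast-load script above, not the plugin's own Perl). It is a sketch of the same technique, splitting on `<nowiki>` sections with a capturing pattern so that the sections themselves are kept, and it takes a block where the Perl version takes a callback. The link rewriting in the usage lines is only a placeholder, not the plugin's real link handling.

```ruby
# Ruby sketch of the plugin's skip_nowiki helper (the original is Perl).
# Splitting on a capturing group keeps the <nowiki> sections in the result,
# always at odd indexes; only the even-indexed pieces go through the block.
def skip_nowiki(content)
  parts = content.split(/(<nowiki[^>]*>.*?<\/nowiki\s*>)/m)
  parts.each_with_index.map { |part, i| i.odd? ? part : yield(part) }.join
end

text = "See [[Main Page]] but not <nowiki>[[Literal]]</nowiki> here."
# Placeholder transformation: wrap link targets in a bare <a> element.
puts skip_nowiki(text) { |chunk| chunk.gsub(/\[\[([^\]|]+)\]\]/) { "<a>#{$1}</a>" } }
# prints: See <a>Main Page</a> but not <nowiki>[[Literal]]</nowiki> here.
```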
Hello. Got ikiwiki running and I'm planning to convert my personal
Mediawiki wiki to ikiwiki so I can take offline copies around. If anyone
has an old copy of the instructions, or any advice on where to start, I'd be
glad to hear it. Otherwise I'm just going to chronicle my journey on the
page. --[[users/Chadius]]
> Today I saw that someone is working to import wikipedia into git.
> <http://www.gossamer-threads.com/lists/wiki/foundation/181163>
> Since wikipedia uses mediawiki, perhaps his importer will work
> on mediawiki in general. It seems to produce output that could be
> used by the [[plugins/contrib/mediawiki]] plugin, if the filenames
> were fixed to use the right extension. --[[Joey]]
>> Here's another I found while browsing around, starting from the link you gave, Joey:<br />
>> <http://github.com/scy/levitation><br />
>> As I don't run mediawiki anymore, but I still have my xz/gzip-compressed XML dumps,
>> it's certainly easier for me to do it this way; also, a file or a set of files is easier to lug
>> around on some medium than a full mysqld or postgres master and the relevant databases.
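
On fixing the filenames to use the right extension, as mentioned above: a minimal sketch, assuming the importer leaves one extension-less file per page in a checked-out working tree. The helper name is hypothetical, and the sketch would also rename uploaded images or other attachments, so those need filtering out first. Hidden directories such as `.git` are skipped because `Dir.glob` ignores dot entries by default.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: append ".mediawiki" to every regular file under dir
# that does not already end with it. Returns the new paths, sorted.
# Caveat: this renames *all* files, including images or attachments.
def add_mediawiki_extension(dir, ext = '.mediawiki')
  files = Dir.glob(File.join(dir, '**', '*')).select { |p| File.file?(p) }
  files.reject { |p| p.end_with?(ext) }.sort.map do |path|
    File.rename(path, path + ext)
    path + ext
  end
end

# Try it on a throwaway directory.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'Git'))
  File.write(File.join(dir, 'Main Page'), "= Main =\n")
  File.write(File.join(dir, 'Git', 'git-svn'), "text\n")
  puts add_mediawiki_extension(dir).map { |p| p.sub(dir + '/', '') }
end
```

If the tree is already committed, running `git mv` per file instead of `File.rename` would keep the rename visible in the history.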