X-Git-Url: http://git.vanrenterghem.biz/git.ikiwiki.info.git/blobdiff_plain/c9dc60328c4d3464f71769e635e31d72a3756c86..ae1857b43cf55a393a507b8434f172cbdb29d5b0:/doc/tips/convert_mediawiki_to_ikiwiki.mdwn

diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
index 4e32e8949..8d1d52b49 100644
--- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
+++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
@@ -144,7 +144,12 @@ into an ikiwiki tag name using a script such as
             sys.stdout.write(re.sub(pattern, manglecat, line))
         else: sys.stdout.write(line)
 
-## Step 4: Mediawiki plugin
+## Step 4: Mediawiki plugin or Converting to Markdown
+
+You can use a plugin to make ikiwiki support Mediawiki syntax, or you can
+convert pages to a format ikiwiki understands.
+
+### Step 4a: Mediawiki plugin
 
 The [[plugins/contrib/mediawiki]] plugin can be used by ikiwiki to interpret
 most of the Mediawiki syntax.
@@ -155,15 +160,127 @@ The following things are not working:
 * tables
 * spaces and other funky characters ("?") in page names
 
+### Step 4b: Converting pages
+
+#### Converting to Markdown
+
+There is a Python script for converting from the Mediawiki format to Markdown in [[mithro]]'s conversion repository at <http://github.com/mithro/media2iki>. *WARNING:* While the script tries to preserve everything is can, Markdown syntax is not as flexible as Mediawiki so the conversion is lossy!
+
+    # The script needs the mwlib library to work
+    # If you don't have easy_install installed, apt-get install python-setuptools
+    sudo easy_install mwlib
+
+    # Get the repository
+    git clone git://github.com/mithro/media2iki.git
+    cd media2iki
+
+    # Do a conversion
+    python mediawiki2markdown.py --no-strict --no-debugger <my mediawiki file> > output.md
+
+
+[[mithro]] doesn't frequent this page, so please report issues on the [github issue tracker](https://github.com/mithro/media2iki/issues).
+
 ## Scripts
 
-[[sabr]] used to explain how to [import MediaWiki content into
-git](http://u32.net/Mediawiki_Conversion/index.html?updated), including full
-edit history, but as of 2009/10/16 that site is not available. A copy of the
-information found on this website is stored at <http://github.com/mithro/media2iki>.
+There is a repository of tools for converting MediaWiki to Git based Markdown wiki formats (such as ikiwiki and github wikis) at <http://github.com/mithro/media2iki>. It also includes a standalone tool for converting from the Mediawiki format to Markdown. [[mithro]] doesn't frequent this page, so please report issues on the [github issue tracker](https://github.com/mithro/media2iki/issues).
 
 [[Albert]] wrote a ruby script to convert from mediawiki's database to ikiwiki at <https://github.com/docunext/mediawiki2gitikiwiki>
 
+[[scy]] wrote a python script to convert from mediawiki XML dumps to git repositories at <https://github.com/scy/levitation>.
+
 [[Anarcat]] wrote a python script to convert from a mediawiki website to ikiwiki at <http://anarcat.ath.cx/software/mediawikigitdump.git/>. The script doesn't need any special access or privileges and communicates with the documented API (so it's a bit slower, but allows you to mirror sites you are not managing, like parts of Wikipedia). The script can also incrementally import new changes from a running site, through RecentChanges inspection. It also supports mithro's new Mediawiki2markdown converter.
 
-[[scy]] wrote a python script to convert from mediawiki XML dumps to git repositories at <https://github.com/scy/levitation>.
+> Some assembly is required to get Mediawiki2markdown and its mwlib
+> gitmodule available in the right place for it to use.. perhaps you could
+> automate that? --[[Joey]]
+
+> > You mean a debian package? :) media2iki is actually a submodule, so you need to go through extra steps to install it. mwlib being the most annoying part... I have fixed my script so it looks for media2iki directly in the submodule and improved the install instructions in the README file, but I'm not sure I can do much more short of starting to package the whole thing... --[[anarcat]]
+
+>>> You may have forgotten to push that, I don't see those changes.
+>>> Packaging the python library might be a good 1st step.
+>>> --[[Joey]]
+
+> Also, when I try to run it with -t on www.amateur-radio-wiki.net, it
+> fails on some html in the page named "4_metres". On archiveteam.org,
+> it fails trying to write to a page filename starting with "/", --[[Joey]]
+
+> > can you show me exactly which commandline arguments you're using? also, I have made improvements over the converter too, also available here: <http://anarcat.ath.cx/software/media2iki.git/> -- [[anarcat]]
+
+>>> Not using your new converter, just the installation I did earlier
+>>> today:
+>>> --[[Joey]]
+
+<pre>
+fetching page 4 metres from http://www.amateur-radio-wiki.net//index.php?action=raw&title=4+metres into 4_metres.mdwn
+Unknown tag TagNode tagname='div' vlist={'style': {u'float': u'left', u'border': u'2px solid #aaa', u'margin-left': u'20px'}}->'div' div
+Traceback (most recent call last):
+  File "./mediawikigitdump.py", line 298, in <module>
+    fetch_allpages(namespace)
+  File "./mediawikigitdump.py", line 82, in fetch_allpages
+    fetch_page(page.getAttribute('title'))
+  File "./mediawikigitdump.py", line 187, in fetch_page
+    c.parse(urllib.urlopen(url).read())
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 285, in parse
+    self.parse_node(ast)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 76, in parse_node
+    f(node)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 88, in on_article
+    self.parse_children(node)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 83, in parse_children
+    self.parse_node(child)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 76, in parse_node
+    f(node)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 413, in on_section
+    self.parse_node(child)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 76, in parse_node
+    f(node)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 83, in parse_children
+    self.parse_node(child)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 76, in parse_node
+    f(node)
+  File "/home/joey/tmp/mediawikigitdump/mediawiki2markdown.py", line 474, in on_tagnode
+    assert not options.STRICT
+AssertionError
+zsh: exit 1 ./mediawikigitdump.py -v -t http://www.amateur-radio-wiki.net/
+</pre>
+
+<pre>
+joey@wren:~/tmp/mediawikigitdump>./mediawikigitdump.py -v -t http://archiveteam.org
+fetching page list from namespace 0 ()
+found 222 pages
+fetching page /Sites using MediaWiki (English) from http://archiveteam.org/index.php?action=raw&title=%2FSites+using+MediaWiki+%28English%29 into /Sites_using_MediaWiki_(English).mdwn
+Traceback (most recent call last):
+  File "./mediawikigitdump.py", line 298, in <module>
+    fetch_allpages(namespace)
+  File "./mediawikigitdump.py", line 82, in fetch_allpages
+    fetch_page(page.getAttribute('title'))
+  File "./mediawikigitdump.py", line 188, in fetch_page
+    f = open(filename, 'w')
+IOError: [Errno 13] Permission denied: u'/Sites_using_MediaWiki_(English).mdwn'
+zsh: exit 1 ./mediawikigitdump.py -v -t http://archiveteam.org
+</pre>
+
+> > > > > I have updated my script to call the parser without strict mode and to trim leading slashes (and /../, for that matter...) -- [[anarcat]]
+
+> > > > > > Getting this error with the new version on any site I try (when using -t only): `TypeError: argument 1 must be string or read-only character buffer, not None`
+> > > > > > bisecting, commit 55941a3bd89d43d09b0c126c9088eee0076b5ea2 broke it.
+> > > > > > --[[Joey]]
+
+> > > > > > > I can't reproduce here, can you try with -v or -d to try to trace down the problem? -- [[anarcat]]
+
+<pre>
+fetching page list from namespace 0 ()
+found 473 pages
+fetching page 0 - 9 from http://www.amateur-radio-wiki.net/index.php?action=raw&title=0+-+9 into 0_-_9.mdwn
+Traceback (most recent call last):
+  File "./mediawikigitdump.py", line 304, in <module>
+    main()
+  File "./mediawikigitdump.py", line 301, in main
+    fetch_allpages(options.namespace)
+  File "./mediawikigitdump.py", line 74, in fetch_allpages
+    fetch_page(page.getAttribute('title'))
+  File "./mediawikigitdump.py", line 180, in fetch_page
+    f.write(options.convert(urllib.urlopen(url).read()))
+TypeError: argument 1 must be string or read-only character buffer, not None
+zsh: exit 1 ./mediawikigitdump.py -v -d -t http://www.amateur-radio-wiki.net/
+</pre>