-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathMediawiki_Conversion.html
136 lines (111 loc) · 7.1 KB
/
Mediawiki_Conversion.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>Mediawiki Conversion</title>
<link rel="stylesheet" href="sp_data/url.html" type="text/css" target="_top">
<link rel="stylesheet" href="sp_data/url_002.html" type="text/css" target="_top">
</head><body>
<div class="header">
<span class="title">Mediawiki Conversion</span>
</div>
<div id="content">
<p>This page describes how to transfer Mediawiki content to
Git, preserving full history complete with renames.
It also shows how to purge spam and other unwanted edits in the process.</p>
<p>Quick Links: <a href="Mediawiki_Conversion/Git_Import.html" target="_top">Git Load</a></p>
<a name="Introduction" target="_top"></a><h2>Introduction</h2>
<p>The procedure described here moves the files into Git
but it does not convert the Mediawiki markup or change the
content of the edits at all. You can do one of two things to fix this:</p>
<ul>
<li>Install the <a href="Mediawiki_Conversion/../Mediawiki_Plugin.html" target="_top">Mediawiki Plugin</a> to allow Ikiwiki to display Mediawiki markup right in your wiki.</li>
<li>Write a script to bulk-convert all mediawiki files to Markdown or whatever markup you desire.</li>
</ul>
<p>Finially, you might want to scan over the TODO list at the bottom of the page.
There are a few features that we don't support yet. Patches welcome!</p>
<a name="Edit Courageously" target="_top"></a><h5>Edit Courageously</h5>
<p>Thes pages are just some coarse notes that I took while going through
this process myself. I'm sure they're crap.
Please edit these pages directly to ask questions, make comments,
clarify things, whatever.</p>
<a name="Prerequisites" target="_top"></a><h2>Prerequisites</h2>
<p>We'll use:</p>
<ul>
<li>Ruby. I used REXML because it's a pretty nice API for working with XML.</li>
<li>LibXML's Ruby bindings because it's a lot faster than REXML</li>
<li>xsltproc because it's even faster</li>
<li>A few GB of disk space (depending on wiki size) for the conversion.</li>
</ul>
<p>On Debian / Ubuntu:</p>
<pre> $ sudo apt-get install ruby libxml-ruby xsltproc
</pre>
<p>If you want to display Mediawiki markup in Ikiwiki, you should
install the <a href="Mediawiki_Conversion/../Mediawiki_Plugin.html" target="_top">Mediawiki Plugin</a> before following these instructions and make sure it works
to your satisfaction.</p>
<a name="Mark Existing Wiki Read-Only" target="_top"></a><h3>Mark Existing Wiki Read-Only</h3>
<p>First mark your Mediawiki read-only so nobody loses edits into an obsolete wiki.</p>
<ul>
<li>Add to LocalSettings.php: <tt>$wgReadOnlyFile = "read-only"</tt></li>
<li>Create a file named "read-only" in the same directory as LocalSettings.php and fill it with your edit message. Something like, <em>this wiki is being moved and despammed. Editing disabled.</em></li>
<li>That message only shows up when you try to edit a page. If you want
to put a banner at the top of every page, first edit your
Mediawiki:Sitenotice special page. Here's an example:</li>
</ul>
<pre> <a title="http://wiki.u32.net/index.php?title=Mediawiki:Sitenotice&action=edit" href="http://wiki.u32.net/index.php%3Ftitle%3DMediawiki:Sitenotice%26action%3Dedit" target="_top">http://wiki.u32.net/index.php?title=Mediawiki:Sitenotice&action=edit</a>
</pre>
<a name="The Process" target="_top"></a><h2>The Process</h2>
<p>If you have a small site and don't want to perform any despamming,
you should be able to do the conversion in 20 minutes to an hour. If
you have a large site with a lot of spam, despamming can take hours or
even days. Definitely comment anywhere the procedure isn't clear.</p>
<p>Follow these steps in order.</p>
<ol>
<li><a href="Mediawiki_Conversion/Mediawiki_Export.html" target="_top">Export</a> -- Ask Mediawiki to produce all edits in hundred-megabyte XML files.</li>
<li><a href="Mediawiki_Conversion/Mediawiki_Translate.html" target="_top">Translate</a> -- Convert page moves, namespaces, etc from Mediawiki conventions to Git/Ikiwiki.</li>
<li><a href="Mediawiki_Conversion/Mediawiki_Despam.html" target="_top">Despam</a> -- (optional) Delete all history that you don't care to preserve.</li>
<li><a href="Mediawiki_Conversion/Git_Import.html" target="_top">Import</a> -- Sort, reformat, and import into your Git repository.</li>
</ol>
<a name="The Files" target="_top"></a><h2>The Files</h2>
<p>Remember that these scripts were written as one-time hacks! They are
uuuugly. Well, only iki-fast-load is really ugly, the rest are just
expedient.</p>
<ul>
<li><a href="iki-reformat.xsl" target="_top">iki-reformat.xsl</a></li>
<li><a href="iki-sort.xsl" target="_top">iki-sort.xsl</a></li>
<li><a href="iki-diff-next.rb" target="_top">iki-diff-next.rb</a></li>
<li><a href="iki-fast-load.rb" target="_top">iki-fast-load.rb</a></li>
<li><a href="iki-find-empty-edits.rb" target="_top">iki-find-empty-edits.rb</a></li>
<li><a href="iki-gather-revs.rb" target="_top">iki-gather-revs.rb</a></li>
<li><a href="iki-scatter-revs.rb" target="_top">iki-scatter-revs.rb</a></li>
<li><a href="rename-helper.sh" target="_top">rename-helper.sh</a></li>
</ul>
<a name="Thanks" target="_top"></a><h2>Thanks</h2>
<p>Special thanks to Loye Young who reverted spam off my old wiki a whole bunch of times.</p>
<a name="TODO" target="_top"></a><h2>TODO</h2>
<ul>
<li>Push as much of the conversion as possible into the final git repo.
Ferinstance, converting Main Page to Main_Page should be done after all
the files are imported into Git. Same with doing all the namespace
conversions (Category: -> /Category/, User: -> /users/, etc).</li>
<li>Finish the rename-helper script. Hell, renames should probably be done after git-import too.</li>
<li>Make scripts executable and agnostic. (in other words, write xslt command, get rid of .rb, .xslt, etc extensions)</li>
<li>Standardize the arguments. i.e. src should always come before dst.
The scripts should probably assume they should start working in the cwd
-- don't force them to accept 'node' as an argument (unless that
argument represents the source or the destination...)?</li>
<li>Talk:Blah pages should be turned into Blah/Discussion pages.</li>
<li>User:Blah pages should be turned into users/Blah.</li>
<li>Factor renaming "page name" -> "page_name" out of iki-fast-load and into its own script.</li>
<li>Categories should be turned into tags.</li>
<li>Tables? Transclusion? Thumbnails?</li>
<li>Write a script demonstrating how to convert all .mediawiki files to .mdwn or other (for people who don't want to run the <a href="Mediawiki_Conversion/../Mediawiki_Plugin.html" target="_top">mediawiki plugin</a>)</li>
</ul>
<a name="comments" target="_top"></a><h2>comments</h2>
<p>> I'd like to pull together some docs on ikiwiki.info about conversions.. could I copy this page to there? --<span class="createlink"><a href="http://u32.net/cgi-bin/ikiwiki.cgi%3Fpage%3Djoey%26from%3DMediawiki_Conversion%26do%3Dcreate" target="_top">?</a>Joey</span></p>
</div>
<div id="footer">
<div id="pageinfo">
</div>
<!-- from home -->
</div>
</td></tr></tbody></table>
</body></html>