#!/usr/bin/env ruby
# This script turns an XML file of revisions into a directory full
# of files, one revision per file.
#
# It doesn't currently check for duplicates: if two revisions of the
# same page share a timestamp (down to the second), the later one is
# appended to the same file. The result is not well-formed XML, but we
# can detect and correct that when reading the files back.
#
# Once you have the directory full of revision files, you can use
# mv, rm, grep etc. See iki-gather-revs.rb for more.
# We set the modification date of each file to the date of each edit.
# That way you can use commands like "find -newer" to work with dates.
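=begin
Because each file's modification time is set to its edit time, date-based
selection works from Ruby as well as from "find -newer". The sketch below
is an illustration only: revisions_newer_than is a hypothetical helper,
not part of media2iki, and it assumes the output directory holds nothing
but revision files.

```ruby
# Hypothetical helper (not part of media2iki): return the regular files
# under +dir+ whose modification time is later than +cutoff+ (a Time).
def revisions_newer_than(dir, cutoff)
  Dir.glob('**/*', base: dir)                  # paths relative to dir
     .map { |rel| File.join(dir, rel) }
     .select { |path| File.file?(path) && File.mtime(path) > cutoff }
     .sort
end
```

For example, revisions_newer_than('revs', Time.utc(2009, 1, 1)) would list
every revision edited after the start of 2009, where 'revs' stands in for
whatever output directory was passed to this script.
=end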
require 'rubygems'
require 'node-callback'
require 'time'
require 'fileutils' # for mkdir_p
def parse_revision(dir, node)
  elements  = node.elements
  title     = elements["title"].text
  timestamp = Time.parse(elements["timestamp"].text)

  filename = File.join(dir, title + '-' + timestamp.strftime("%Y%m%d-%H%M%S"))
  FileUtils.mkdir_p File.dirname(filename)

  File.open(filename, "a") { |f|
    # node.write is supposed to re-indent the document.
    # Too bad it doesn't. I should have used libxml.
    formatter = REXML::Formatters::Default.new
    formatter.write(node, f)
    f << "\n"
  }

  # Make each file have the same modification date as its edit time.
  File.utime(timestamp, timestamp, filename)
  puts "Wrote #{filename}"
end
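=begin
The naming scheme in parse_revision can be tried on its own, without an
XML dump. This is a minimal sketch under stated assumptions:
revision_path is a hypothetical helper that mirrors the File.join and
strftime calls above, and the title and timestamp are made-up sample
values.

```ruby
require 'time'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper mirroring parse_revision's naming scheme:
# returns [path, timestamp] for a page title and a timestamp string.
def revision_path(dir, title, timestamp_text)
  timestamp = Time.parse(timestamp_text)
  path = File.join(dir, "#{title}-#{timestamp.strftime('%Y%m%d-%H%M%S')}")
  [path, timestamp]
end

Dir.mktmpdir do |dir|
  path, timestamp = revision_path(dir, 'Talk/Main Page', '2009-03-14T15:09:26Z')
  FileUtils.mkdir_p(File.dirname(path))   # a "/" in the title becomes a subdirectory
  File.write(path, "<revision/>\n")
  File.utime(timestamp, timestamp, path)  # mtime = edit time
  puts File.basename(path)                # Main Page-20090314-150926
  puts File.mtime(path).utc.iso8601       # 2009-03-14T15:09:26Z
end
```

A title containing "/" ends up in a subdirectory, which is why
parse_revision calls FileUtils.mkdir_p on the file's dirname before
opening it.
=end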
# abort prints the message to stderr and exits with a non-zero status.
abort "You must supply the name of the file to parse!" unless (infile = ARGV[0])
abort "You must supply the name of the directory to fill!" unless (outdir = ARGV[1])
parse_node(infile, 'mediawiki/revision',
  proc { |rev| parse_revision(outdir, rev) },
  {:compress_whitespace => %w{revision contributor}}
)
puts "done."
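=begin
The header comment says that files holding more than one revision can be
detected and corrected when they are read back. One possible way to do
that is sketched below. It is not necessarily what iki-gather-revs.rb
does: split_revisions is a hypothetical helper, and it assumes each file
contains only bare <revision> elements with no XML declaration.

```ruby
require 'rexml/document'

# Hypothetical helper: return every <revision> element found in +text+.
# Wrapping the text in a synthetic <revisions> root gives the parser a
# single root element even when several revisions were appended.
def split_revisions(text)
  doc = REXML::Document.new("<revisions>#{text}</revisions>")
  doc.root.get_elements('revision')
end

sample = "<revision><title>Foo</title></revision>\n" \
         "<revision><title>Foo</title></revision>\n"
revisions = split_revisions(sample)
puts "#{revisions.size} revisions in one file" if revisions.size > 1
```

A count above one marks a file where two edits shared a timestamp; each
returned element could then be written back out to a file of its own.
=end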