Parse HipChat JSON to wiki and HTML

This little project:

Reads the files in a HipChat export (must be unzipped, delete rooms that should not be included)
Creates wiki markup formatted tables with the contents
- Deletes empty messages, welcome messages and messages from JIRA and GitHub
- Replaces @name-references with [~name] -- assumes names in HipChat and Confluence are the same
- Replaces /code with {noformat}...{noformat}
Writes one .room-file for each room, to be inserted as wiki markup in Confluence
- The result format is
```
h2. <room name>

| date | @user | text |
| <blank if equal to above> | <ditto> | ... |
```
- The text tables are split at each 1000 rows, to stop Confluence from choking...
- What looks like URLs are converted into [links]
Also creates one large HTML file containing tables with the above structure for each room (+ a TOC)

Structure

The project is written in Scala using Akka, because I can :)

Communication outline:

This is obviously overengineered compared to the task at hand... But why not. I have tried to make it fairly simple to change, e.g.

read from API rather than file export
parse XML (or whatevet) rather than JSON
produce HTML rather than wiki markup
write to an API rather than file
etc

There are several "pipes" that eventually merge:

The main class tells the RoomlistParser to parse the 'rooms/list.json' file, which in turn sends definitions to the RoomDb. On receipt of the LastRoom, it is ready to service the filter chain.
The main class tells the UserParser to parse the 'users/list.json' file, which in turn sends definitions to the UserDb. On receipt of the LastUser, it is ready to service the filter chain.
The main class tells the RoomDirReader to list the subdirectories, sending a message to the RoomFileReaderfor each directory, which in turn reads alle JSON files in that directory (chronologically), and sends the room definition to the filter chain.
The first part of the filter chain is the RoomFilter, which by default (configurable) excludes private rooms (found by asking RoomDb), and looks up the real name of the room (directory names are somewhat mangled). Then the rooms are sent to the UserFilter, which looks up users in the UserDb to get full names / mention names. Next is the MessageFilter which excludes welcome messages, empty messages and messages from JIRA and GitHub.
The result from filtering is sent to WikiRoomFormatter to create formatted contents, followed by RoomWriter to place the final result in output files (one for each room).

Usage

sbt assembly
java -jar <....>/hipchatparse.jar baseDir [resultDir] (default for resultDir is baseDir)

Authors

lre = Lars Reed, Mesan AS

Notes

History

v1 2015.08.31 lre Initial version
v2 2015.09.02 lre Reads room directory
v3 2015.09.06 lre Includes HTML

Caveats

Overwrites the result files without warning
The URL-matching regexp is slow. Added a timeout to parsing to avoid hangs / aborted processing
There is a timeout to abort if the communication fails, however, it will not abort if an actor hangs while processing a message

TODO

yes... e.g.

Akka-level tests...
alternative @-mentions

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
project		project
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt
hipchatparse.png		hipchatparse.png
scalastyle-config.xml		scalastyle-config.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Parse HipChat JSON to wiki and HTML

Structure

Usage

Authors

Notes

History

Caveats

TODO

About

Releases

Packages

Languages

License

larsreed/hipchatparse

Folders and files

Latest commit

History

Repository files navigation

Parse HipChat JSON to wiki and HTML

Structure

Usage

Authors

Notes

History

Caveats

TODO

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages