docs/CHANGES.TXT

0.89 (TBD)
-----------------
- Fix: Fix broken links in README
- Fix: Timing in DVB, sub duration check for timeout.
- New: Added support for SCC and CCD encoder formats
- New: Added support to output captions to MCC file (#733).
- New: Add support for censoring words ("Kid Friendly") (#1139)
- New: Extend support of capitalization for all BITMAP and 608 subtitles (#1214)
- New: Added an option to disable timestamps for WebVTT (In response to issue #1127)
- Fix: Change inet_ntop to inet_ntoa for Windows XP compatibility
- Fix: Added italics, underline, and color rendering support for -out=spupng with EIA608/teletext
- Fix: ccx_demuxer_mxf.c: Parse framerate from MXF captions to fix caption timings.
- Fix: hardsubx_decoder.c: Fix memory leaks using Leptonica API.
- Fix: linux/Makefile.am: added some sources to enable rpms to be created.
- Fix: Crash when using -sc (sentence case) option (#1115)
- Fix: Segmentation fault on VOB #1128
- Fix: Hang while processing video #1121
- Fix: lib_ccx.c: Initialize fatal error logging function before first usage in init_libraries
- Fix: A few (minor) memory leaks around the code.
- Fix: General code clean up / reformatting
- Fix: Fix multiple definitions with new -fno-common default in GCC 10
- Fix: Mac now builds reproducibly again without errors on the date command (#1230)
- Fix: Allow all oem modes with tesseract v4 (#1264)
- Doc: Updated ccextractor.cnf.sample.
- Update: Updated LibPNG to 1.6.37
- Remove: Python API (since no one cares about it and it's unmaintained)
- Remove: -cf. Just use FFmpeg if you want an ES from a TS or PS; CCExtractor is a bad tool
  for this.
- Fix: Segmentation fault on Windows
- Update: Updated libGPAC to 1.0.1
- Fix: Segmentation fault with unsupported and multitrack file reports
- Fix: Write subtitle header to multitrack outputs
- Fix: Write multitrack files to the output file directory

0.88 (2019-05-21)
-----------------
- New: More tapping points for debug image in ccextractor.
- New: Add support for tesseract 4.0
- Optimize: Remove multiple RGB to grey conversion in OCR.
- Fix: Update UTF8Proc to 2.2.0
- Fix: Update LibPNG to 1.6.35
- Fix: Update Protobuf-c to 1.3.1
- Fix: Warn instead of fatal when a 0xFF marker is missing
- Fix: Segfault in general_loop.c due to null pointer dereference (case of no encoder)
- Fix: Enable printing hdtv stats to console.
- Fix: Many typos in comments and output messages
- Fix: Ignore Visual Studio temporary project files
- New: Add support for non-Latin characters in stdout
- Fix: Check whether stream is empty
- New: Add support for EIA-608 inside .mkv
- New: Add support for DVB inside .mkv
- Fix: Added -latrusmap: map Latin symbols to Cyrillic ones in special cases
       of Russian Teletext files (issue #1086)
- Fix: Several OCR crashes 

0.87 (2018-10-23)
-----------------
- New: Upgrade libGPAC to 0.7.1.
- New: mp4 tx3g & multitrack subtitles.
- New: Guide to update dependencies (docs/Updating_Dependencies.txt).
- New: Add LICENSE File (#959).
- New: Display quantisation mode in info box (#954).
- New: Add instruction required to build ccextractor with HARDSUBX support (#946).
- New: Added version no. of libraries to --version.
- New: Added -quant (OCR quantization function).
- New: Python API now compatible with Python 3.
- Fix: linux/builddebug: Added non-local directories to the include search path so we don't
       require a locally compiled tesseract or leptonica.
- Fix: Correct -HARDSUBX Bug In CMake, allow build with hardsubx using cmake (#966).
- Fix: possible segfaults in hardsubx_classifier.c due to strdup (#963).
- Fix: Improve the start and end timestamps of extracted burned in captions (#962).
- Fix: Update COMPILATION.md (#960).
- Fix: Fixed crash with "-out=report" and "-out=null".
- Fix: -nocf not working with OCR'ing (#958).
- Fix: segfault in add_cc_sub_text and initialize to NULL in init_encoder (#950).
- Fix: ccx_decoders_common.c: Copy data type when creating a copy of the subtitle structure.
- Fix: Implicit declaration of these functions throws warning during build (#948).
- Fix: ccx_decoders_common.c: Properly release allocated resources on free_subtitle().
- Fix: Added a datatype member to struct cc_subtitle - needed so we can properly free all
       memory when void *data points to a structure that has its own pointers.
- Fix: dvb_subtitle_decoder.c: When combining image regions verify that the offset is
       never negative.
- Fix: Updated travis.yml to fix osx build (#947).
- Fix: Add utf8proc src file to cmake, updated header file (#944).
- Fix: Added required pointers on freep() calls.
- Fix: Removed dvb_debug_traces_to_stdout and used the usual dbg_print instead.
- Fix: Additional debug traces for DVB.
- Fix: Fix minor memory leak in ocr.c.
- Fix: Fix issue with displaying utf8proc version.
- Fix: Fix failing cmake due to liblept/tesseract header files.
- Fix: Added missing \n in params.c.
- Fix: builddebug: Use -fsanitize=address -fno-omit-frame-pointer.
- Fix: ccx_decoders_common.c: Removed trivial memory leak.
- Fix: ccx_encoders_srt.c: Made sure a pointer is non-NULL before dereferencing.
- Fix: dvb_subtitle_decoder.c: Initialize pointer members to NULL when creating a structure.
- Fix: lib_ccx.c: Initialize (memset 0) structure cc_subtitle after memory allocation.
- Fix: Added verboseness to error/warnings in dvb_subtitle_decoder.c.
- Fix: dvb_subtitle_decoder.c: Work on passing invalid streams errors upstream (plus some
       warning messages) so we can eventually recover from this situation instead of crashing.
- Fix: telxcc.c: Currently setting a colour doesn't necessarily add a space even though the
       specifications mandate it. (#930).
- Fix: dvb_subtitle_decoder.c: Fix null pointer dereference when region==NULL in write_dvb_sub.
- Fix: DVB Teletext subtitle incomplete.
- Fix: replace all 0xA characters within startbox with 0x20.
- Fix: DVB Teletext subtitle incomplete (#922).
- Fix: Add missing return value to one of the returns in process_tx3g().
- Fix: Typos and other minor bugs.
- Fix: Tidy CMakeLists & vcxproj (#920).
- Fix: Added m2ts and -mxf to help screen.
- Fix: Added MKV to demuxer_print_cfg.
- Fix: Added MXF to demuxer_print_cfg.
- Fix: "Out of order packets" error had wrong print() parameters.
- Fix: Updated Python documentation.
- Fix: Fix incorrect path in XML (#904).
- Fix: linux build script (non-debug): Don't hide warnings from compiler.
- Fix: linux build script (debug): Display which step of the build script we're in.
- Fix: Make the build reproducible (#976).
- Fix: Remove instance of o1 and o2 from help.
- Fix: Colors of DVB subtitles with depth 2 broken due to a missing break.
- Fix: CEA-708: Caption loss due to CW command (#991).
- Fix: CEA-708: Update patch for windows priority with functions (#990).

0.86 (2018-01-09)
-----------------
- New: Preliminary MXF support
- New: Added a histogram in one-minute increments of the number of lines in a subtitle.
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (mac).
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (linux).
- New: Added .rpm package generation script.
- New: Added build/installation script for .pkg.tar.xz (Arch Linux).
- New: Added tarball generation script.
- New: Added --analyzevideo. If present the video stream will be processed even if the
  subtitles are in a different stream. This is useful when we want video information
  (resolution, frame type, etc). -vides now implies this option too. 
  [Note: Tentative - some possibly breaking changes were made for this, so if you
  use it, validate results]
- New: Added a GUI in the main CCExtractor binary (separate from the external GUIs 
  such as CCExtractorGUI).
- New: A Python binding extension so it's possible to use CCExtractor's tools from
  Python.
- New: Added -nospupngocr (don't OCR bitmaps when generating spupng, faster)
- New: Add support for file split on keyframe (-segmentonkeyonly)
- New: Added WebVTT output from Matroska.
- New: Support for source-specific multicast.
- New: FreeType-based text renderer (-out=spupng with teletext/EIA608).
- New: Upgrade library UTF8proc
- New: Upgrade library win_iconv
- New: Upgrade library zlib 
- New: Upgrade library LibPNG 
- New: Added Travis CI support 
- New: Made error messages clearer, less ambiguous
- Fix: Prevent the OCR being initialized more than once (happened on multiprogram and
  PAT changes)
- Fix: Makefiles, build scripts, etc... everything updated and corrected for all
  platforms.
- Fix: Proper line ending for .srt files from bitmaps.
- Fix: OCR corrections using grayscale before extracting texts. 
- Fix: End timestamps in transcripts from DVB.
- Fix: Forcing -noru to cause deduplication in ISDB
- Fix: TS: Skip NULL packets 
- Fix: When NAL decoding fails, don't dump the whole decoded thing, limit to 160 bytes.
- Fix: Modify Autoconf scripts to generate tarball for mac from `/package_creators/tarball.sh` 
  and include GUI files in tarball
- Fix: Started work on libGPAC upgrade.
- Fix: DVB subtitle not extracted if there's no display segment
- Fix: Heap corruption in add_ocrtext2str
- Fix: Bug that sometimes caused -out=spupng to crash
- Fix: Checks for text before newlines on DVB subtitles 
- Fix: OCR issue caused by separated dvb subtitle regions 
- Fix: DVB crash on specific condition (!rect->ocr_text)
- Fix: DVB bug (Multiple-line subtitle; Missing last line)
- Fix: --sentencecap for teletext samples
- Fix: Crash when image passed into OCR is empty
- Fix: Temporarily wrapped the Python API, not production ready yet
- Fix: -delay option in DVB


0.85b (2017-01-26)
------------------
- Fix: Base Windows binary (without OCR) compiled without DLL dependencies.

0.85 (2017-01-23)
-----------------
- New: Added FFMPEG 3.0 to Windows build - last one that is XP compatible.
- New: Major improvements in CEA-608 to WebVTT (styles, etc).
- New: Return a non-zero return code if no subtitles are found.
- New: Windows build files updated to Visual Studio 2015, new target platform is 140_xp.
- New: Added basic support of Tesseract 4.0.0.
- New: Added build script for .deb.
- New: Updated -debugdvbsub parameter to get the most relevant DVB traces for debugging.
- New: SMPTE-TT files are now compatible with Adobe Premiere.
- New: Updated libpng.
- New: Added 3rd party (Tracy from archive.org) static linux build script.
- New: Add chapter extraction for MP4 files.
- New: Return code 10 if no captions are found at all.
- Fix: Teletext duplicate lines in certain cases.
- Fix: Improved teletext timing.
- Fix: DVB timing is finally good.
- Fix: A few minor memory leaks.
- Fix: tesseract library file included in mac build command.
- Fix: Bad WTV timings in some cases.
- Fix: Mac build script.
- Fix: Memory optimization in HARDSUBX edit_distance.
- Fix: SubStation Alpha subtitles in bitmap.
- Fix: lept msg severity in linux.
- Fix: SSA, SPUPNG and VTT timing and skipping of subtitles for SAMI and TTML.
- Fix: SMPTE-TT : Added support for font color.
- Fix: SAMI unnecessary empty subtitle when extracting DVB subs.
- Fix: Skip the packet if the adaptation field length is broken.
- Fix: 708 - lots of work done in the decoder. Implementation of more commands. Better timing.


0.84 (2016-12-16)
-----------------
- New: In Windows, both with and without-OCR binaries are bundled, since the OCR one causes problems due to 
  dependencies in some systems. So unless you need the OCR just use the non-OCR version.
- New: Added -sbs (sentence by sentence) for DVB output. Each frame in the output file contains a complete
  sentence (experimental).
- New: Added -curlposturl. If used each output frame will be sent with libcurl by doing a POST to that URL.
- Fix: More code consistency checking in function names.
- Fix: linux build script now tries to verify dependencies.
- Fix: Mac build script was missing a directory.


0.83 (2016-12-13)
-----------------
- Fix: Duplicate lines in mp4 (specifically affects itunes).
- Fix: Timing in .mp4, timing now calculated for each CC pair instead of per atom.
- Fix: Typos everywhere in the documentation and source code.
- Fix: CMakeLists for build in cmake.
- Fix: -unixts option.
- Fix: FPS switching messages.
- Fix: Removed ugly debug statement with local path in HardsubX.
- Fix: Changed platform target to v120_xp in Visual Studio (so XP is supported again).
- Fix: Added detail in many error messages.
- Fix: Memory leaks in videos with XDS.
- Fix: Makefile compatibility issues with Raspberry pi.
- Fix: missing separation between WebVTT header and body. 
- Fix: Stupid bug in M2TS that prevented it from working.
- Fix: OCR libraries dependencies for the release version in Windows.
- Fix: non-buffered reading from pipes.
- Fix: --stream option with stdin.
- New: terminate_asap to buffered_read_opt
- New: Added some TV-show specific spelling dictionaries.
- New: Updated GPAC library.
- New: ASS/SSA.
- New: Capture sigterm to do some clean up before terminating.
- New: Work on 708: Changed DefineWindow behavior, only clear text of an existing window if style has changed.

0.82 (2016-08-15)
-----------------
- New: HardsubX - Burned in subtitle extraction subsystem.
- New: Color Detection in DVB Subtitles
- Fix: Corrected sentence capitalization
- Fix: Skipping redundant bytes at the end of tx3g atom in MP4
- Fix: Illegal SRT files being created from DVB subtitles
- Fix: Incorrect Progress Display

0.81 (2016-06-13)
-----------------
- New: --version parameter for extensive version information (version number, compile date, executable hash, git commit (if appropriate))
- New: Add -sem (semaphore) to create a .sem file when an output file is open and delete it when it's closed.
- New: Add --append parameter. This will prevent overwriting of existing files.
- New: File Rotation support added. The user has to send a USR1 signal to rotate.
- Fix: Issues with files <1 Mb
- Fix: Preview of generated transcript.
- Fix: Statistics were not generated anymore.
- Fix: Correcting display of sub mode and info in transcripts.
- Fix: Teletext page number displayed in -UCLA.
- Fix: Removal of excessive XDS notices about aspect ratio info.
- Fix: Force Flushing of file buffers works for all files now.
- Fix: mp4 void atoms that were causing some .mp4 files to fail.
- Fix: Memory usage caused by EPG processing was high due to many non-dynamic buffers.
- Fix: Project files for Visual Studio now include OCR support in Windows.

0.80 (2016-04-24)
-----------------
- Fix: "Premature end of file" (one of the scenarios)
- Fix: XDS data is always parsed again (needed to extract information such as program name)
- Fix: Teletext parsing: @ was incorrectly exported as * - X/26 packet specifications in ETS 300 706 v1.2.1 now better followed
- Fix: Teletext parsing: Latin G2 subsets and accented characters not displaying properly
- Fix: Timing in -ucla
- Fix: Timing in ISDB (some instances)
- Fix: "mfra" mp4 box weight changed to 1 (this helps with correct file format detection)
- Fix: Fix for TARGET File is null. 
- Fix: Fixed SegFaults while parsing parameters (if mandatory parameter is not present in -outinterval, -codec or -nocodec)
- Fix: Crash when input is too small
- Fix: Update some URLs in code (references to docs)
- Fix: -delay now updates final timestamp in ISDB, too
- Fix: Removed minor compiler warnings
- Fix: Visual Studio solution files working again
- Fix: ffmpeg integration working again
- New: Added --forceflush (-ff). If used, output file descriptors will be flushed immediately after being written to
- New: Hexdump XDS packets that we cannot parse (shouldn't be many of those anyway)
- New: If input file cannot be opened, provide a decent human readable explanation
- New: GXF support

0.79 (2016-01-09)
-----------------
- Support for Grid Format (g608)
- Show correct number of teletext packets processed
- Removed Segfault on incorrect mp4 detection
- Remove xml header from transcript format
- Help message updated for Teletext
- Added --help and -h for help message
- Added --nohtmlescape option
- Added --noscte20 option

0.78 (2015-12-12)
-----------------
- Support to extract Closed Caption from MultiProgram at once.
- CEA-708: exporting to SAMI (.smi), Transcript (.txt), Timed Transcript (ttxt) and SubRip (.srt).
- CEA-708: 16 bit charset support (tested on Korean).
- CEA-708: Roll Up captions handling.
- Changed TCP connection protocol (BIN data is now wrapped in packets, added EPG support and keep-alive packets).
- TCP connection password prompt is removed. To set connection password use -tcppassword argument instead.
- Support ISDB Closed Caption.
- Added a new output format, simplexml (used internally by a CCExtractor user, may or may not be useful for
  anyone else).

0.77 (2015-06-20)
-----------------
- Fixed bug in capitalization code ('I' was not being capitalized).
- GUI should now run in Windows 8 (using the included .Net runtime, since
  3.5 cannot be installed in Windows 8 apparently).
- Fixed Mac build script, binary is now compiled with support for
  files over 2 GB.
- Fixed bug in PMT code, damaged PMT sections could make CCExtractor
  crash.

0.76 (2015-03-28)
-----------------
- Added basic M2TS support
- Added EPG support - you can now export the Program Guide to XML
- Some bug fixes

0.75 (2015-01-15)
-----------------
- Fixed issue with teletext output to formats other than srt.
- CCExtractor can be used as library if compiled using cmake
- By default the Windows version adds BOM to generated UTF files (this is
  because it's needed to open the files correctly) while all other
  builds don't add it (because it messes with text processing tools). 
  You can use -bom and -nobom to change the behaviour.
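
The BOM trade-off described above can be illustrated from the consumer side. The following is a minimal Python sketch, assuming the output file is UTF-8; the helper name strip_utf8_bom is hypothetical and is not part of CCExtractor. It removes a leading UTF-8 BOM (bytes EF BB BF) so that text processing tools see only the subtitle data.

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 byte order mark, if one is present.

    Hypothetical helper for illustration only; it handles UTF-8 and leaves
    any other encoding untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A file written with a BOM starts with the three bytes EF BB BF.
with_bom = codecs.BOM_UTF8 + "1\n00:00:01,000 --> 00:00:02,000\nHELLO\n".encode("utf-8")
without_bom = strip_utf8_bom(with_bom)
```

Reading the file in Python with encoding="utf-8-sig" has the same effect without a helper.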

0.74 (2014-09-24)
-----------------
- Fixed issue with -o1 -o2 and -12 parameters (where it would write output only in the o2 file)
- Fixed UCLA parameter issue. Now the UCLA parameter settings can't be overwritten anymore by later parameters that affect the custom transcript
- Switched order around for TLT and TT page number in custom transcript to match UCLA settings
- Added nobom parameter, for when files are processed by tools that can't handle the BOM. If using this, files might not be readable under Windows.
- Segfault fix when no input files were given
- No more bin output when sending to server + possibility to send TT to server for processing
- Windows: Added the Microsoft redistributable MSVCR120.DLL to both the installation package and the application zip.

0.73 - GSOC (2014-08-19)
------------------------
- Added support of BIN format for Teletext
- Added start of librarization. This will allow in the future for other programs to use encoder/decoder functions and more.

0.72 - GSOC (2014-08-12)
------------------------
- Fix for WTV files with incorrect timing
- Added support for fps change using data from AVC video track in a H264 TS file.
- Added FFmpeg support to enable all encapsulators and decoders provided by FFmpeg

0.71 - GSOC (2014-07-31)
------------------------
- Added feature to receive captions in BIN format according to CCExtractor's own
  protocol over TCP (-tcp port [-tcppassword password])
- Added ability to send captions to the server described above or to the
  online repository (-sendto host[:port])
- Added -stdin parameter for reading input stream from standard input
- Compilation in Cygwin using linux/Makefile
- Fix for .bin files when not using latin1 charset
- Correction of mp4 timing when one timestamp gives the timing of two atoms

0.70 - GSOC (2014-07-06)
------------------------
This is the first release that is part of Google's Summer of Code.
Anshul, Ruslan and Willem joined CCExtractor to work on a number of things
over the summer, and their work is already reaching the mainstream 
version of CCExtractor.

- Added a huge dictionary submitted by Matt Stockard.
- Added DVB subtitles decoder, spupng in output
- Added support for cdt2 media atoms in QT video files. Now multiple atoms in
  a single sample sequence are supported.
- Changed Makefile.
- Fixed some bugs.
- Added feature to print info about file's subtitles and streams (-out=report).
- Support Long PMT.
- Support Configuration file.
	- There is a sample configuration file in doc/ folder with name
	  ccextractor.cnf.sample
	- For now, only a file named ccextractor.cnf kept beside the ccextractor
	  executable is supported
	- For details of which options can be set using the configuration file,
	  please look at the sample file.

- Added options for custom transcript output:
	New parameter (-customtxt format), where the format must be like this: 1100100 (7 digits).
	These digits indicate whether each of the following is enabled in the (timed) transcript:
		- Display start time
		- Display end time
		- Display caption mode
		- Display caption channel
		- Use a relative timestamp (relative to the sample)
		- Display XDS info
		- Use colors
	Examples:
		0000101 is the default setting for transcripts
		1110101 is the default for timed transcripts
		1111001 is the default setting for -ucla
	Make sure you use this parameter after others that might affect these 
	settings (-out, -ucla, -xds, -txt, -ttxt, ...)
- Fixed Negative timing Bug
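
The 7-digit -customtxt format described above can be read as one on/off flag per listed item. The following Python sketch decodes such a string, assuming the digits appear in the same order as the list in this entry. The field names are labels invented for the illustration and are not identifiers from the CCExtractor source.

```python
# Illustrative labels for the seven positions, in the order the changelog
# lists them. These names are assumptions, not CCExtractor identifiers.
CUSTOMTXT_FIELDS = (
    "start_time",
    "end_time",
    "caption_mode",
    "caption_channel",
    "relative_timestamp",
    "xds_info",
    "colors",
)


def parse_customtxt(fmt):
    """Map a 7-digit string such as '1100100' to a dict of named booleans."""
    if len(fmt) != len(CUSTOMTXT_FIELDS) or set(fmt) - {"0", "1"}:
        raise ValueError("expected exactly 7 digits, each 0 or 1: %r" % (fmt,))
    return {name: digit == "1" for name, digit in zip(CUSTOMTXT_FIELDS, fmt)}


def enabled_fields(fmt):
    """List the names of the positions that are switched on."""
    return [name for name, on in parse_customtxt(fmt).items() if on]


# The three defaults quoted in the changelog entry.
for label, fmt in (("transcript", "0000101"),
                   ("timed transcript", "1110101"),
                   ("-ucla", "1111001")):
    print(label, fmt, enabled_fields(fmt))
```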

0.69 (2014-04-05)
-----------------
- A few patches from Christopher Small, including proper support
  for multiple multicast clients listening on the same port.
- GUI: Fixed teletext preview.
- GUI: Added a small indicator of data being received when reading from
  UDP.
- GUI: Added UTF-8 support to preview Window (used for teletext).
- Fixes in Makefile and build script, compilation in linux and OSX failed
  if another libpng was found in the system.
- WTV support directly in CCExtractor (no need for wtvccdump any more).
- Started refactoring and clean-up.
- Fix: MPEG clock rollover (happens every 26 hours) caused a time
  discontinuity.
- Windows GUI: Started work on HDHomeRun support. For now it just looks 
  for HDHomeRun devices. Lots of other things will arrive in the next
  versions.
- Windows GUI: Some code refactoring, since the HDHomeRun support makes
  the code large enough to require more than one source file :-)
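
The 26-hour figure in the clock rollover entry above follows from MPEG timestamps (PTS) being 33-bit counters that tick at 90 kHz. The Python sketch below reproduces that arithmetic and shows one common way to unwrap such a counter. It illustrates the general technique only and is not the fix that was applied in CCExtractor; the function names and the half-range threshold are choices made for this example.

```python
PTS_BITS = 33
PTS_MODULUS = 1 << PTS_BITS   # number of distinct counter values
PTS_CLOCK_HZ = 90000          # PTS ticks per second


def rollover_period_hours():
    """Hours until a 33-bit, 90 kHz counter wraps back to zero."""
    return PTS_MODULUS / PTS_CLOCK_HZ / 3600.0


def unwrap_pts(raw_values):
    """Convert raw 33-bit PTS values into a continuous timeline.

    A backwards jump of more than half the counter range is treated as a
    rollover. That threshold is an assumption made for this sketch.
    """
    unwrapped = []
    offset = 0
    previous = None
    for raw in raw_values:
        if previous is not None and previous - raw > PTS_MODULUS // 2:
            offset += PTS_MODULUS
        unwrapped.append(raw + offset)
        previous = raw
    return unwrapped


print("rollover every %.2f hours" % rollover_period_hours())
```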

0.68 (2013-12-24)
-----------------
- A couple of shared variables between 608 decoders were causing
  problems when both fields were processed at the same time with
  -12, fixed.
- Added BOM for UTF-8 files.
- Corrected a few extended characters in the UTF-8 encoding,
  probably never used in real world captioning but since we got
  a good test sample file...
- Color and fonts in PAC commands were ignored, fixed (Heleen Buus).
- Added a new output format, spupng. It consists of one .png file
  for each subtitle frame and one .xml with all the timing 
  (Heleen Buus).
- Some fixes (Chris Small).

0.67 (2013-10-09)
-----------------
- Padding bytes were being discarded early in the process in 0.66,
  which is convenient for debugging, but it messes with timing in
  .raw, which depends on padding. Fixed.
- MythTV's branch had a fixed size buffer that was sometimes not
  large enough. Made dynamic.
- Better support for PAT changing mid-stream.
- Removed quotes in Start in .smi (format fix).
- Added multicast support (Chris Small)
- Added ability to select IP address to bind in UDP (Chris Small)
- Fixes in -unixts and -delay for teletext.
- Added -autodash : When two people are talking, add a dash as
  needed (this is based on subtitle position). Only in .srt and
  with -trim. Quite experimental, feedback appreciated.
- Added -latin1 to select Latin 1 as encoding. Default is now
  UTF-8 (-utf8 still exists but it's not needed).
- Added -ru1, which emulates a (non-existing in real life) 1 line
  roll-up mode. 


0.66 (2013-07-01)
-----------------
- Fixed bug in auto detection code that triggered a message 
  about file being out of sync.
- Added -investigate_packets
  The PMT is used to select the most promising elementary stream
  to get captions from. Sometimes captions are where you least
  expect it so -datapid allows you to select an elementary stream
  manually, in case the CC location is not obvious from the PMT
  contents. To assist looking for the right stream, the parameter
  "-investigate_packets" will have CCExtractor look inside each
  stream, looking for CC markers, and report the streams that 
  are likely to contain CC data even if it can't be determined from
  their PMT entry.
- Added -datastreamtype to manually select a stream based on
  its type instead of its PID. Useful if your recording program
  always hides the caption under the stream type. 
- Added -streamtype so if an elementary stream is selected manually
  for processing, the streamtype can be selected too. This can be 
  needed if you process, for example, a stream that is declared as
  "private MPEG" in the PMT, so CCExtractor can't tell what it is.
  Usually you'll want -streamtype 2 (MPEG video) or -streamtype 6
  (MPEG private data).
- PMT content listing improved, it now shows the stream type for
  more types.
- Fixes in roll-up, cursor was being moved to column 1 if a 
  RU2, RU3 or RU4 was received even if already in roll-up mode.
- Added -autoprogram. If a multiprogram TS is processed and 
  -autoprogram is used, CCExtractor will analyze all PMTs and use
  the first program that has a suitable data stream.
- Timed transcript (ttxt) now also exports the caption mode 
  (roll-up, paint-on, etc.) next to each line, as it's useful to 
  detect things like commercials.
- Content Advisory information from XDS is now decoded if it's
  transmitted in "US TV parental guidelines" or "MPA". 
  Other encodings such as Canada's are not supported yet due
  to lack of samples.
- Copy Management information from XDS is now decoded.
- Added -xds. If present and export format is timed transcript
  (only), XDS information will be saved to file (same file as the
  transcript, with XDS being clearly marked). Note that for now
  all XDS data is exported even if it doesn't change, so the 
  transcript file will be significantly larger.
- Added some PaintOn support, at least enough to prevent it 
  from breaking things when the other modes are used.
- Removed afd_data() warning. AFD doesn't carry any caption related
  data. AFD still detected in code in case we want to do something
  with it later anyway.
- Ported last changes from Petr Kutalek's telxcc. Current version
  is 2.4.4.
- In teletext mode when exporting to transcript (not .srt), an effort
  is made to detect and merge line duplicates. This is done by using
  the Levenshtein distance, which is the number of changes required
  to convert one string to another. To simplify things, strings are
  compared up to the length of the shortest one.
  There are 3 parameters that can be used to tweak the thresholds:
      -deblev: Enable debug so the calculated distance for each pair of
       strings is displayed. The output includes both strings, the
       calculated distance, the maximum allowed distance, and whether
       the strings are ultimately considered equivalent or not, i.e.
       the calculated distance is less than or equal to the max allowed.
      -levdistmincnt value: Minimum distance we always allow
       regardless of the length of the strings. Default 2. This means
       that if the calculated distance is 0, 1 or 2, we consider the
       strings to be equivalent.
      -levdistmaxpct value: Maximum distance we allow, as a
       percentage of the shortest string length. Default 10%. For
       example, consider a comparison of one string of 30 characters
       and one of 60 characters. We want to determine whether the
       first 30 characters of the longer string are more or less the
       same as the shorter string, i.e. whether the longer string
       is the shorter one plus new characters and maybe some
       corrections. Since the shorter string is 30 characters and
       the default percentage is 10%, we would allow a distance of
       up to 3 between the first 30 characters.
- Added -lf : Use UNIX line terminator (LF) instead of Windows (CRLF).	   
- Added -noautotimeref: Prevent UTC reference from being auto set from
  the stream data.
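
The teletext duplicate-line check described above (Levenshtein distance over the length of the shortest string, with -levdistmincnt and -levdistmaxpct as thresholds) can be sketched in Python as follows. This is an illustration written from the changelog text, not a port of the CCExtractor code. In particular, combining the two thresholds by taking the larger one, and truncating the percentage to an integer, are assumptions.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def lines_equivalent(s1, s2, min_cnt=2, max_pct=10):
    """Compare two lines up to the length of the shortest one.

    min_cnt and max_pct mirror the documented defaults of -levdistmincnt
    and -levdistmaxpct. How the two limits are combined is an assumption.
    Returns (equivalent, distance, max_allowed).
    """
    shortest = min(len(s1), len(s2))
    distance = levenshtein(s1[:shortest], s2[:shortest])
    max_allowed = max(min_cnt, shortest * max_pct // 100)
    return distance <= max_allowed, distance, max_allowed


# The 30 vs 60 character case from the entry: up to 3 changes are allowed.
short_line = "a" * 30
long_line = "a" * 27 + "bbb" + "c" * 30
print(lines_equivalent(short_line, long_line))
```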

0.65 (2013-03-14)
-----------------
- Minor GUI changes for teletext
- Added end timestamps in timed transcripts
- Added support for SMPTE (patch by John Kemp)
- Initial support for MPEG2 video tracks inside MP4 files (thanks a
  lot to GPAC's Jean who assisted in analyzing the sample and 
  doing the required changes in GPAC).
- Improved MP4 auto detection
- Support for PCR if PTS is not available (needed for some teletext
  samples, and probably useful for everything else).
- Support for UDP streaming - finally. Use "-udp $port" to have
  CCExtractor listen for a stream. I've only been able to test it
  with a European HDHomeRun, but it should work fine with any other
  tuner.
- Refactored PMT / PAT processing in transport streams, now allows to
  display their contents (-parsePAT and -parsePMT) which makes
  troubleshooting easier.
  
0.64 (2012-10-29)
-----------------
- Changed Window GUI size (larger).
- Added Teletext options to GUI.
- Added -teletext to force teletext mode even if not detected
- Added -noteletext to disable teletext detection. This can be needed
  for streams that have both 608 data and teletext packets if you
  need to process the 608 data (if teletext is detected it will
  take precedence otherwise).
- Added -datapid to force a specific elementary stream to be used for
  data (bypassing detections).
- Added -ru2 and -ru3 to limit the number of visible lines in roll-up
  captions (bypassing whatever the broadcast says).
- Added support for a .hex (hexadecimal) dump of data. 
- Added support for wtv in Windows. This is done by using a new program
  (wtvccdump.exe) and a new DirectShow filter (CCExtractorDump.dll) that
  process the .wtv using DirectShow's filters and export the line 21 data
  to a .hex file. The GUI calls wtvccdump.exe as needed.
- Added --nogoptime to force PTS timing even when CCExtractor would
  use GOP timing otherwise.

0.63 (2012-08-17)
-----------------
- Teletext support added, by integrating Petr Kutalek's telxcc. Integration is
  still quite basic (there's equivalent code from both CCExtractor and 
  telxcc) and some clean up is needed, but it works. Petr has announced that 
  he's abandoning telxcc so further development will happen directly in 
  CCExtractor.
- Some bug fixes, as usual.

0.62 (2012-05-23)
-----------------
- Corrected Mac build "script" (needed to add GPAC includes). Thanks to the
  Mac users that sent this.
- Hauppauge mode now uses PES timing, needed for files that don't have
  caption data during all the video (such as in commercial breaks).
- Added -mp4 and -in:mp4 to force the input to be processed as MP4. 
- CC608 data embedded in a separate stream (as opposed to in the video
  stream itself) in MP4 files is now supported (not heavily tested). 
  This should be rather useful since closed captioned files from iTunes
  use this format.
- More CEA-708 work. The debugger is now able to dump the "TV" contents for
  the first time. Also, a .srt can be generated, however timing is not quite 
  good yet (still need to figure out why). 
- Added -svc (or --service) to select the CEA-708 services to be processed.
  For example, -svc 1,2 will process the primary and secondary language
  services. Valid values are 1-63, where 1 is the primary language, 2 is
  the secondary language (this is part of the specification) and 3-63 are
  provider defined.
- Rajesh Hingorani sent a fix for the MPEG decoder that fixes garbled output
  on certain samples (we had none like this in our test collection). Thanks,
  Rajesh.
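
The -svc value rules quoted above (a comma-separated list, valid numbers 1-63, 1 primary, 2 secondary, 3-63 provider defined) can be sketched as a small validator. The Python below is an illustration of those rules only. The function names are hypothetical, and whether the real option accepts any other syntax is not covered here.

```python
def parse_services(spec):
    """Parse a value such as '1,2' into a list of CEA-708 service numbers.

    Hypothetical helper: enforces the 1-63 range given in the changelog
    and drops repeated numbers while keeping the original order.
    """
    services = []
    for item in spec.split(","):
        number = int(item)  # raises ValueError for non-numeric items
        if not 1 <= number <= 63:
            raise ValueError("service number out of range 1-63: %d" % number)
        if number not in services:
            services.append(number)
    return services


def describe_service(number):
    """Name the role the changelog assigns to a service number."""
    if number == 1:
        return "primary language"
    if number == 2:
        return "secondary language"
    return "provider defined"


for service in parse_services("1,2,7"):
    print(service, describe_service(service))
```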

0.61 (2012-03-08)
-----------------
- Fix: GCC 3.4.4 can now build CCExtractor.
- Fix: Damaged TS packets (those that come with 'error in transport' bit
  on) are now skipped.
- Fix: Part of the changes for MP4 support (CC packet buffering in
  particular) broke processing of other files, causing at least some very
  annoying character duplication. We hope we've fixed it without breaking
  anything, but please report any problems.
- Some non-interesting cleanup.

0.60 (unreleased)
-----------------
- Add: MP4 support, using GPAC (a media library). Integration is currently
  "enough so it works", but needs some more work. There's some duplicate
  code, the stream must be a file (no streaming), etc.
- Fix: The Windows version was writing text files with double \r.
- Fix: Closed caption blocks with no data could cause a crash.
- Fix: -noru (to generate files without duplicate lines in 
  roll-up) was broken, with complete lines being missing.
- Fix: bin format not working as input. 

0.59 (2011-10-07)
-----------------
- More AVC/H.264 work. pic_order_cnt_type != 0 will be processed now. 
- Fix: Roll-up captions with interruptions for Text (with ResumeTextDisplay
  in the middle of the caption data) were missing complete lines.
- Added a timed text transcript output format, probably only useful for
  roll-up captions. Use --timedtranscript or -ttxt. Output is like this:

00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN
00:01:29,522 | HOPEFULS INTRODUCE THEMSELVES TO
00:01:30,623 | PRIMARY VOTERS.

- XDS parser. Not complete (no point in dealing with V-Chip stuff for
  example), but enough to extract program and station information.
- Input streams can now come from standard input using - (just a hyphen)
  as parameter.
- Added a new output format called 'null' (use -null or -out=null). This
  format means "Don't produce any file", and is useful to have CCExtractor
  process the stream (for XDS messages, debugging, etc) without actually
  generating anything.
- Updated Windows GUI.
- Added -quiet => If used, CCExtractor will not write any message.
- Added -stdout => If used, the captions will be sent to stdout (console)
  instead of file. Combined with -, CCExtractor can work as a filter in
  a larger process, receiving the stream from stdin and sending the
  captions to stdout. 
- Some code clean up, minor refactoring.
- Teletext detection (not yet processing).
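
For anyone post-processing the timed transcript output shown above, each line
can be split into a time and a text part. The sketch below is in Python and
assumes every line follows exactly the "HH:MM:SS,mmm | text" layout of the
sample; parse_ttxt_line is a hypothetical helper, not part of CCExtractor.

```python
def parse_ttxt_line(line):
    """Split 'HH:MM:SS,mmm | text' into (milliseconds, text)."""
    stamp, text = line.split(" | ", 1)
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)
    return total_ms, text


# Example using the first line of the sample output above.
ms, text = parse_ttxt_line("00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN")
print(ms, text)
```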

0.58 (2011-08-21)
-----------------
- Implemented new PTS based mode to order the caption information
  of AVC/H.264 data streams.  The old pic_order_cnt_lsb based method
  is still available via the -poc or --usepicorder command switches.
- Removed a couple of those annoying "Impossible!" error messages 
  that appear when processing some (possibly broken, unsure) files.
- Added -nots --notypesettings to prevent italics and underline 
  codes from being displayed.
- Note to those not liking the paragraph symbol being used for the 
  music note: Submit a VALID replacement in latin-1.
- Added preliminary support for multiple program TS files. The 
  parameter --program-number (or -pn) will let you choose which
  program number to process. If no number is passed and the TS 
  file contains more than one, CCExtractor will display a list of
  found programs and terminate.
- Added support (basic, because I only received one sample) for some
  Hauppauge cards that save CC data in their own format. Use the
  parameter -haup to enable it (CCExtractor will display a notice 
  if it thinks that it's processing a Hauppauge capture anyway).
- Fixed bug in roll-up.
- More AVC work, now TS files from echostar that provided garbled
  output are processed OK.
- Updated Windows GUI.

0.57 (2010-12-16)
-----------------
- Bug fixes in the Windows version. Some debug code was unintentionally
  left in the released version. 

0.56 (2010-12-09)
-----------------
- H264 support
- Other minor changes a lot less important

0.55 (2009-08-09)
-----------------
- Replace pattern matching code with improved parser for MPEG-2 elementary
  streams.
- Fix parsing of ReplayTV 5000 captions.
- Add ability to decode SCTE 20 encoded captions.
- Make decoding of TS files more error tolerant.
- Start implementation of EIA-708 decoding (not active yet).
- Add -gt / --goptime switch to use GOP timing instead of PTS timing.
- Start implementation of AVC/H.264 decoding (not active yet).
- Fixed: The basic problem is that when 24fps movie film gets converted to 30fps NTSC,
  every 4th frame is repeated. Some pics have 3 fields of CC data, with the field 3 CC
  data belonging to the same channel as field 1. The following pics have the fields
  reversed because of the odd number of fields. I used top_field_first to tell when the
  channels are reversed. See Table 6-1 of the SCTE 20 [Paul Fernquist]

0.54 (2009-04-16)
-----------------
- Add -nosync and -fullbin switches for debugging purposes.
- Remove -lg (--largegops) switch.
- Improve synchronization of captions for source files with
  jumps in their time information or gaps in the caption
  information.
- [R. Abarca] Changed Mac script, it now compiles/links 
  everything from the /src directory. 
- It's now possible to have CCExtractor add credits 
  automatically.
- Added a feature to add start and end messages (for credits).
  See help screen for details.

0.53 (2009-02-24)
-----------------
- Force generated RCWT files to have the same length as source file.
- Fix documentation for -startat / -endat switches.
- Make -startat / -endat work with all output formats.
- Fix sync check for raw/rcwt files.
- Improve timing of dvr-ms NTSC captions.
- Add -in=bin switch to read CCExtractor's own binary format.
- Fix problem with short input files (smaller than 1MB).
- Clean up regular and debug output.
- Add -out=bin switch to write RCWT data.
- Remove -bo/--bufferoutput switch and functionality.
- [Volker] Added new generic binary format (RCWT
  for Raw Captions With Time). This new format
  allows one file to contain all the available
  closed caption data instead of just one stream.
- Added --no_progress_bar to disable status 
  information (mostly used when debugging, as the
  progress information is annoying in the middle
  of debug logs).
- The Windows GUI was reported to freeze in some 
  conditions. Fixed.
- The Windows GUI is now targeted for .NET 2.0 
  instead of 3.5. This allows Windows 2000 to run
  it (there's no .NET 3.5 for Windows 2000), as
  requested by a couple of key users.

0.51 (unreleased)
-----------------
- Removed -autopad and -goppad, no longer needed.
- In preparation for a new binary format we have 
  renamed the current .bin to .raw. Raw files 
  have only CC data (with no header, timing, etc.).
- The input file format (when forced) is now
  specified with 
    	-in=format
  such as -in=ts, -in=raw, -in=ps ...
  The old switches (-ts, -ps, etc.) still work.
  The only exception is -bin which has been removed
  (reserved for the new binary format). Use
  -in=raw to process a raw file. 
- Removed -d, which used a DVD format when producing
  a raw file. This has been merged into a new
  output type "dvdraw". So now instead of using
  -raw -d as before, use -out=dvdraw if you need
  this.
- Removed --noff
- Added gui_mode_reports for frontend communications,
  see related file.
- Windows GUI rewritten. Source code now included, 
  too.
- [Volker] Dish Network clean-up

0.50 (2008-12-12)
-----------------
- [Volker] Fix in DVR-MS NTSC timing
- [Volker] More clean-up
- Minor fixes

0.49 (2008-12-10)
-----------------
- [Volker] Major MPEG parser rework. Code much
  cleaner now. 
- Some stations transmit broken roll-up captions,
  and for some reason don't send CRs but RUs...
  Added work-around code to make captions readable.
- Started work on EIA-708 (DTV). Right now you can
  add -debug-708 to get a dump of the 708 data. 
  An actually useful decoder will come soon.
- Some of the changes MIGHT HAVE BROKEN MythTV's
  code. I don't use MythTV myself so I rely on
  other people's samples and reports. If MythTV
  is broken please let me know.
- Added new debug options.
- [Volker] Added support for DVR-MS NTSC files.
- Other minor bug fixes and changes.

0.46 (2008-11-24)
-----------------
- Added support for live streaming, CCExtractor
  can now process files that are being recorded
  at the same time.
  
- [Volker] Added a new DVR-MS loop - this is 
  completely new, DVR-MS specific code, so we no
  longer use the generic MPEG code for DVR-MS. 
  DVR-MS should (or will be eventually at least)
  be as reliable as TS.
  Note: For now, it's only ATSC recordings, not
  NTSC (analog) recordings.

0.45 (2008-11-14)
-----------------
- Added auto-detection of DVR-MS files.
- Added -asf to force DVR-MS mode.
- Added some specific support for DVR-MS
  files. This format used to work
  correctly in 0.34 (pure luck) but the
  MPEG code rework broke it. It should
  work as it used to.
- Updated Windows GUI to support the
  new options.
- Added      -lg --largegops
  From the help screen:
  Each Group-of-Picture comes with timing 
  information. When this info is too separate 
  (for example because there are a lot of 
  frames in a GOP) ccextractor may prefer not 
  to use GOP timing. Use this option if you 
  need ccextractor to use GOP timing in large
  GOPs.

0.44 (2008-09-10)
-----------------
- Added an option to the GUI to process
  individual files in batch, i.e. call
  ccextractor once per file. Use it if you
  want to process several unrelated files
  in one go.
- Added an option to prevent duplicate
  lines in roll-up captions.
- Several minor bug fixes.
- Updated the GUI to add the new options.

0.43 (2008-06-20)
-----------------
- Fixed a bug in the read loop (no less)
  that caused some files to fail when 
  reading without buffering (which is 
  the default in the Linux build).
- Several improvements in the GUI, such as
  saving current options as default.

0.42 (2008-06-17)
-----------------
- The option switch "-transcript" has been
  changed to "--transcript". Also, "-txt"
  has been added as the short alias.
- Windows GUI
- Updated help screen

0.41 (2008-06-15)
-----------------
- Default output is now .srt instead of .bin,
  use -raw if you need the data dump instead of
  .srt. 
- Added -trim, which removes blank spaces at 
  the left and right of each line in .srt.
  Note that those spaces are there to help
  deaf people know if the person talking is
  at the left or the right of the screen, i.e.
  they aren't useless. But if they annoy
  you, go ahead...

0.40 (2008-05-20)
-----------------
- Fixed a bug in the sanity check function 
  that caused the Myth branch to abort. 
- Fixed the OSX build script, it needed a
  new #define to work.

0.39 (2008-05-11)
-----------------
- Added -transcript. If used, the output will
  have no time information. Also, if in roll-up
  mode there will be no repeated lines.
- Lots of changes in the MPEG parser, most of
  them submitted by Volker Quetschke. 
- Fixed a bug in the CC decoder that could cause
  the first line not to be cleared in roll-up
  mode. 
- CCExtractor can now follow number sequences in
  file names, by suffixing the name with +.
  For example,
  
  DVD0001.VOB+ 

  means DVD0001.VOB, DVD0002.VOB, etc. This works
  for all files, so part001.ts+ does what you
  could expect.
- Added -90090 which changes the clock frequency
  from the MPEG standard 90000 to 90090. It 
  *could* (remains to be seen) help if there are
  timing issues. 
- Better support for Tivo files.
- By default ccextractor now considers the whole
  input file list as one large file, instead of
  several, independent, video files. This has
  been changed because most programs (for example
  DVDDecrypt) just cut the files by size. 
  If you need the old behaviour (because you 
  actually edited the video files and want to
  join the subs), use -ve.
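
The + suffix described above can be pictured as incrementing a zero-padded
counter in the file name. The Python sketch below is an illustration only and
assumes the number sits directly before the extension, as in the examples
above; this changelog does not specify which digits CCExtractor actually
increments, and next_in_sequence is a hypothetical name.

```python
import os
import re


def next_in_sequence(name):
    """Return the next file name in a numbered sequence, keeping zero padding.

    Assumption: the counter is the run of digits right before the extension.
    Returns None when the name has no such counter.
    """
    stem, ext = os.path.splitext(name)
    match = re.search(r"(\d+)$", stem)
    if match is None:
        return None
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return stem[:match.start()] + bumped + ext


# Example: the two file names mentioned in the entry above.
print(next_in_sequence("DVD0001.VOB"))
print(next_in_sequence("part001.ts"))
```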


0.36 (unreleased)
-----------------
- Fixed bug in SMI, nbsp was missing a ;.
- Footer for SAMI files was incorrect (<body> and
  <sami> tags were being opened again instead of
  being closed).
- Displayed memory is now written to disk at end
  of stream even if there is no command requesting
  so (may prevent losing the last screen-full).
- Important change that could break scripts, but
  that has been made because the old behaviour was
  annoying to most people: _1 and _2 at the end
  of the output file names are now added ONLY if
  -12 is used (i.e. when there are two output 
  files to produce). So

  ccextractor -srt sopranos.mpg

  now produces sopranos.srt instead of sopranos_1.srt.
  If you use -12, i.e.

  ccextractor -srt -12 sopranos.mpg

  You get

  sopranos_1.srt and
  sopranos_2.srt

  as usual.
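
Scripts that need to predict the output file names can follow the rule above.
The Python sketch below is an illustration only; output_names is a
hypothetical helper and the both_fields flag simply stands for "-12 was used".

```python
import os


def output_names(input_name, extension, both_fields=False):
    """Return the output file names expected for one input file.

    both_fields=True models the -12 case, where _1 and _2 are appended.
    """
    base, _ = os.path.splitext(input_name)
    if both_fields:
        return [f"{base}_1.{extension}", f"{base}_2.{extension}"]
    return [f"{base}.{extension}"]


# Example: the two sopranos.mpg invocations from the entry above.
print(output_names("sopranos.mpg", "srt"))
print(output_names("sopranos.mpg", "srt", both_fields=True))
```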


0.35 (unreleased)
-----------------
- Added --defaultcolor to the help screen. Code
  was already in 0.34 but the documentation wasn't
  updated.
- Buffer is larger now, since I've found a sample
  where 256 Kb isn't enough for a PES (go figure).
- At the end of the process, a ratio between
  video length and time to process is displayed.

0.34 (2007-06-03)
-----------------
- Added some basic letter case and capitalization
  support. For captions that broadcast in ALL
  UPPERCASE (most of them), ccextractor can now
  do the first part of the job.

  --sentencecap or -sc will tell ccextractor to
  follow the typical capitalization rules, such
  as capitalizing months, days of week, etc.

  So from
             YOU BETTER RESPECT
             THIS ROBE, ALAN

  You get

             You better respect
             this robe, alan.

  --capfile or -caf also enables the case
  processing part and adds an extra list of
  words in the specified file, for example:

  --capfile names.txt

  where names.txt is just a plain text file
  with the proper spelling for some words,
  such as
  
  Alan
  Tony

  So you get 

             You better respect
             this robe, Alan.

  Which is the correct spelling. You can
  have a different spelling file per TV
  show, or a large file with a lot of
  words, etc.
- ccextractor has been reported to 
  compile and run on Mac with a minor
  change in the build script, so I've
  created a mac directory with the
  modified script. I haven't tested it
  myself.
- Windows build comes with a File Version
  Number (0.0.0.34 in this version) in case
  you want to check for version info.
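
The two capitalization steps described above (sentence case first, then a
list of correctly spelled words) can be sketched as follows. This Python
snippet is a simplified illustration, not CCExtractor's implementation: it
only capitalizes sentence starts and applies a caller-supplied word list, and
it leaves out the built-in rules for months and days of the week.
sentence_case is a hypothetical name.

```python
import re


def sentence_case(text, proper_words=()):
    """Lowercase text, capitalize sentence starts, then fix listed words."""
    lowered = text.lower()
    # Capitalize the first letter of the text and the first letter that
    # follows a sentence-ending '.', '!' or '?'.
    result = re.sub(
        r"(^\s*|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )
    # Replace each listed word, matched case-insensitively as a whole word,
    # with the spelling given in the list (as a --capfile file would supply).
    for word in proper_words:
        result = re.sub(
            r"\b" + re.escape(word) + r"\b",
            lambda m, w=word: w,
            result,
            flags=re.IGNORECASE,
        )
    return result


# Example: the caption from the entry above, with the names.txt word list.
print(sentence_case("YOU BETTER RESPECT\nTHIS ROBE, ALAN", ["Alan", "Tony"]))
```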

0.33 (unreleased)
-----------------
- Added -scr or --screenfuls, to select the
  number of screenfuls ccextractor should
  write before exiting. A screenful is 
  a change of screen contents caused by
  a CC command (not new characters). In
  practice, this means that for .srt each
  group of lines is a screenful, except when
  using -dru (which produces a lot of 
  groups of lines because each new character
  produces a new group).
- Completed tables for all encodings.
- Fixed bug in .srt related to milliseconds
  in time lines. 
- Font colors are back for .srt (apparently
  some programs do support them after all).
  Use -nofc or --nofontcolor if you don't
  want these tags.

0.32 (unreleased)
-----------------
- Added -delay ms, which adds (or subtracts)
  a number of milliseconds to all times in 
  .srt/.sami files. For example,
  
         -delay 400

  causes all subtitles to appear 400 ms later
  than they would normally do, and

         -delay -400

  causes all subtitles to appear 400 ms before
  they would normally do.
- Added -startat and -endat, which let you
  select just a portion of data to be processed,
  such as from minute 3 to minute 5. Check
  help screen for exact syntax.
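
The effect of -delay on a single .srt time can be sketched as below. This is
an illustration in Python, not CCExtractor's code; shift_timestamp is a
hypothetical name, and clamping negative results to zero is an assumption,
since this changelog does not say how times before zero are handled.

```python
def shift_timestamp(stamp, delay_ms):
    """Shift an .srt-style 'HH:MM:SS,mmm' time by delay_ms (may be negative)."""
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)
    total = max(total + delay_ms, 0)  # assumption: never go below zero
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# Example: the -delay 400 and -delay -400 cases from the entry above.
print(shift_timestamp("00:01:25,485", 400))
print(shift_timestamp("00:01:25,485", -400))
```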

0.31 (unreleased)
-----------------
- Added -dru (direct rollup), which causes
  roll-up captions to be written as
  they would on TV instead of line by line.
  This makes .srt/.sami files a lot longer,
  and ugly too (each line is written many
  times, two characters at a time).

0.30 (2007-05-24)
-----------------
- Fix in extended char decoding, I wasn't
  replacing the previous char.
- When a sequence code was found before
  having a PTS, reported time was 
  undefined. 

0.29 (unreleased)
-----------------
- Minor bug fix.

0.28 (unreleased)
-----------------
- Fixed a buffering related issue. Short version,
  the first 2 Mb in non-TS mode were being
  discarded.
- .srt no longer has <font> tags. No player
  seems to process them so my guess is that
  they are not part of the .srt "standard"
  even if McPoodle adds them.

0.27 (unreleased)
-----------------
- Modified sanitizing code, it's less aggressive
  now. Ideally it should mean that characters
  won't be missed anymore. We'll see.

0.26 (unreleased)
-----------------
- Added -gp (or -goppad) to make ccextractor use
  GOP timing. Try it for non TS files where 
  subs start OK but desync as the video advances.

0.25 (unreleased)
-----------------
- Format detection is not perfect yet. I've added
  -nomyth to prevent the MythTV code path from being
  called. I've seen apparently correct files that
  make MythTV's MPEG decoder choke. So, if it
  doesn't work correctly automatically: Try 
  -nomyth and -myth. Hopefully one of the two
  options will work.


0.24 (unreleased)
-----------------
- Fixed a bug that caused dvr-ms (Windows Media Center)
  files to be incorrectly processed (letters out of
  order all the time).
- Reworked input buffer code, faster now.
- Completed MythTV's MPEG decoder for Program Streams,
  which results in better processing of some specific
  files. 
- Automatic file format detection for all kinds of
  files and closed caption storage method. No need to
  tell ccextractor anything about your file (but you
  still can).


0.22 (2007-05-15)
-----------------
- Added text mode handling into decoder, which gets rid 
  of junk when text mode data is present.
- Added support for certain (possibly non standard
  compliant) DVDs that add more caption blocks in a 
  user data block than they should (such as Red October).
- Fix in roll-up init code that caused the previous popup
  captions not to be written to disk.
- Other minor bug fixes.


0.20 (2007-05-07)
-----------------
- Unicode should be decent now.
- Added support for Hauppauge PVR 250 cards, and (possibly)
  many others (bttv) with the same closed caption recording 
  format.
  This is the result of hacking MythTV's MPEG parser into
  CCExtractor. Integration is not very good (to put it
  mildly) but it seems to work. Depending on the feedback I
  may continue working on this or just leave it 'as is'
  (good enough). 
  If you want to process a file generated by one of these
  analog cards, use -myth. This is essential as it will
  make the program take a totally different code path.
- Added .SAMI generation. I'm sure this can be improved,
  though. If you have a good CSS for .SAMI files let me
  know.

0.19 (2007-05-03)
-----------------
- Work on Dish Network streams, timing was completely broken. 
  It's fixed now at least for the samples I have, if it's not
  completely fixed let me know. Credit for this goes to
  Jack Ha who sent me a couple of samples and a first 
  implementation of a semi-working fix.
- Added support for several input files (see help screen for
  details).
- Added Unicode and Latin-1 encoding.
  

0.17 (2007-04-29)
-----------------
- Extraction to .srt is almost complete - works correctly for
  pop-up and roll-up captions, possibly not yet for paint-on
  (mostly because I don't have any sample with paint-on captions
  so I can't test).
- Minor bug fixes.
- Automatic TS/non-TS mode detection.

0.14 (2007-04-25)
-----------------
- Work on handling special cases related to the MPEG reference
  clock: Roll over, jumps, etc.
- Modified padding code a bit: In particular, padding occurs
  on B-Frames now.
- Started work on CC data parsing (use -608 to see output).
- Added built-in input buffering.
- Major code reorganization.
- Added a decent progress indicator.
- Added TS header synchronization (so the input file no longer
  needs to start with a TS header).
- Minor bug fixes.

0.07 (2007-04-19)
-----------------
- Added MPEG reference clock parsing.
- Added auto padding in TS. Does miracles with timing.
- Added video information (as extracted from sequence header).
- Some code clean-up.
- FF sanity check enabled by default.