-
Notifications
You must be signed in to change notification settings - Fork 12
/
ChangeLog
437 lines (298 loc) · 14.8 KB
/
ChangeLog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
--------------------------------------------------------------------
Thot 0.90 26th May 2004
=> First working code. Models estimated from alignment vectors
--------------------------------------------------------------------
Thot 0.91 1st Jun 2004
=> Estimation from alignment matrices
--------------------------------------------------------------------
Thot 0.92 6th Jul 2004
- Added a tool to perform operations between alignment matrices
- First version of the pML estimation
- Basic C++ phrase-model library implemented
--------------------------------------------------------------------
Thot 0.93 22th Jan 2005
=> Several changes from the previous version:
- Ported to gcc 3.3.2
- Faster pML estimation
- Added bilingual segmentation functionality
- Sum between alignment matrices
- Estimation of phrase length models
- Bug fixes
--------------------------------------------------------------------
Thot 0.94 5th May 2005
=> Modifications in the pML estimation (normalize by segmentation
size)
- bug fixes
--------------------------------------------------------------------
Thot 0.95 27th Jul 2005
- Code reorganization (modularity increased)
- Some documentation added
- Changes in command-line options
- Output in pharaoh-v123 input format
--------------------------------------------------------------------
Thot 0.96 2nd Sep 2005
- Use of GNU tools: automake and autoconf
- Ported to Windows using cygwin
- Ported to gcc 4.0.0
- bug fixes
--------------------------------------------------------------------
Thot 0.97a 28th Sep 2005
- PhraseCounts now implemented as a trie
- PhraseTModel implemented as a composition of
PhraseCounts and PhraseDic classes
- bug fixes
--------------------------------------------------------------------
Thot 0.97b 19th Oct 2005
- Deep reorganization of the PhraseTModel class
- Changes in PhraseDict class statistics
- Added the scripts getModelStats.sh and convGizaAligFile.pl
-------------------------------------------------------------------
Thot 0.97c 28th Nov 2005
- PhraseDict reimplemented as a trie
- The -pc option now prints the count of (f,e) and the counts
of e
- The -ms option allows to impose the maximum phrase-length
constraint over the source phrase
--------------------------------------------------------------------
Thot 0.98 17th Jan 2006
- PhraseExtractionCell is reimplemented using a trie structure.
This change reduces significantly the memory required by the
pseudo_ml estimation process
- "-mc" option (used during pml estimation) now is given in
millions of segmentations
- The maximum sentence length is now equal to 101 words
--------------------------------------------------------------------
Thot 0.99 29th Jan 2006
- Both, pml-estimation and the extraction of the best phrase
alignments now do not require extra memory to work. Because
of this, it is not necessary to prune the segmentations
- "-mc" option is no longer used unless the "configure" script
is executed with the "--disable-iterators" option
--------------------------------------------------------------------
Thot 1.0.0 Beta 25th May 2006
- thot is able to train models of an arbitrary size accessing
hard disk
- Both "alig_op" and "thot" programs are able to handle large
files (>2 GB)
- PhraseDict and PhraseCounts are implemented using the
TrieOfWords data structure
- PhraseTable inherits from the abstract base class
BasePhraseTable
- thot program is now a shell-script which internally calls the
binary "thot_ker"
- The name of the "alignManip" tool is changed to "alig_op"
- The verbosity of "alig_op" has been increased
- The name of the file extensions is changed:
.alig -> .A3.final
.model -> .ttable
.segmLengthTable -> .seglentable
- A new class is created in order to load, print and access
segmentation-length tables (SegLenTable)
- Some functionality implemented by the PhraseModel_From_WBA
class is moved to the PhraseModel class
- awkInputString class is removed from awkInputStream.h and
awkInputStream.cc files
- Ported to FreeBsd
--------------------------------------------------------------------
Thot 1.1.0 Beta 04th Jul 2006
- Fixed a bug in the Bitset class
- Simple tests can be executed by typing "make check"
- Portability increased, the package was compiled
(but not deeply tested) in the following platforms:
* [x86] Linux 2.6 (Debian 3.1)
* [x86] NetBSD (2.0.2)
* [AMD64] Linux 2.6 (Fedora Core 3 on AMD64 Opteron)
* [Alpha] Linux 2.2 (Debian 3.0)
* [Power5 - OpenPower 720] SuSE Enterprise Server 9
* [Sparc - R220] Sun Solaris (9)
* [x86] OpenBSD 3.8
* [PPC - G5] MacOS X 10.2 SERVER Edition
* [x86] Solaris 9
--------------------------------------------------------------------
Thot 2.0.0 Beta 14th Feb 2007
- Implemented a parallel version of Thot
- Incorporated a new output format for the translation tables
- Added a utility to prune the translation tables
- Improvements in the implementation of the data structures
that store the bilingual phrases
- Incorporated a shell script to filter translation tables
given test corpora to be translated
- Fixed a bug when reading the alignments in GIZA++ format
- The directory for temporaries used by thot can be given
as a parameter
- The function "extendModel" is added to the
PhraseModel_From_WBA class. "extendModel" allows to
incrementally add new phrase pairs to the models given an
alignment file in .A3.final format
- The output of the conv_giza_alig_file script is improved.
In addition, the direction of the alignments can
be inverted
- Solved a bug in the shell script invert_ttable
- Added the shell script gen_seglentable_poisson which
allows to generate segmentation-length tables by means
of Poisson distributions
- Incorporated an interface for the PhraseModel class by
means of the BasePhraseModel abstract class
--------------------------------------------------------------------
Thot 2.0.1 Beta 8th Jun 2007
- Solved a portability issue with awk which could cause
execution errors using thot and pbs_thot
--------------------------------------------------------------------
Thot 2.0.2 Beta 17th Jun 2007
- Added a short tutorial for the Thot toolkit (including some
brief notes about how to combine Thot with Pharaoh and
Moses)
- Thot incorporates a new parameter: -c, which allows to
prune the translation table given a cutoff value for
the count of the source phrases
- Solved a bug in alig_op when using the -e option
- "-no-invert" parameter of alig_op has changed. The new name
is "-no-transpose"
- Changed the name of the "flip_ef" utility, now its name is
"flip_phr"
- Changes in the return values of some functions, using
constants defined in the file ErrorDefs.h
--------------------------------------------------------------------
Thot 2.1.0 Beta
- MAJOR CHANGE: NbestTableNode no longer works with probabilities
but with log-probabilities. This affects to the functions
getNbestTransFor_s_() and getNbestTransFor_t_()
- New option for thot and pbs_thot: -lex. The -lex option
incorporates two new score components in the phrase table
- The library of phrase models provides faster access to the model
probabilities
- Better implementation of the OrderedVector class
- The tool "filter_ttable" has been incorporated to the
distribution. "filter_ttable" is a new tool to filter phrase
models in order to apply them in machine translation tasks
- The tool "score_phr_ibm" has been incorporated to the
distribution. "score_phr_ibm" adds lexical scores for the
phrase pairs of a given phrase table. The lexical scores are
calculated as the score for the viterbi alignment of IBM 1 model.
- "make check" is replaced by "make installcheck" since the tool
checks are to be executed after installing the package
- Source and target sentences or phrases are no longer noted using
the letters 'e' and 'f' respectively, which are replaced by 's'
and 't'. The modifications affects the name of variables and
functions
Functions that have changed their names:
a) The scoring functions defined in BasePhraseModel: pf_e_, pe_f_,
have changed its names: ps_t_, pt_s_ ...
b) Functions to obtain lists of translations
The old functions cited above are maintained for backward
compatibility
Functions that have altered the order in which the source and
target sentences are given:
a) The functions defined in the "print_func" module
b) Some function members declared in the "PhraseModel_From_WBA" and
the "PhraseExtractionTable" classes. Three functions which were
modified are declared as public functions:
PhraseModel_From_WBA::obtainBestAlignment
PhraseExtractionTable::extractConsistentPhrases
PhraseExtractionTable::obtainPossibleSegmentations
- Some data types and functions names have been changed in order to
simplify and clarify the code:
InvNbestTransTable -> PhraseNbestTransTable
TransTableNodeData and InvTransTableNodeData -> PhraseTableNodeData
BasePhraseModel::getNbestInvTransFor_t_() -> BasePhraseModel::getNbestTransFor_t_()
PhraseDict::getInvTranslationsFor_t_() -> PhraseDict::getTranslationsFor_t_()
- Solved a bug in ps_t_ function defined in "PhraseModel.cc"
- Solved a bug in the alig_op tool. The bug could appear when
using the -f option, specifically when the prefix for the output
file contained directory components
- The robustness of the alig_op tool has been increased due to a
better management of the temporary files that are used internally
--------------------------------------------------------------------
Thot 2.1.1 Beta
- Solved a portability issue with i/o functions
- Bug fixed in the pbs_thot tool (reported by Soren Harder)
- Faster compilation
- Thot package now splits the library libphrase_model.a into two
files:
* libphrase_model.a
* libnlp_common.a
libphrase_model.a depends on libnlp_common.a, so both libraries
are needed to access the functionality provided by the package.
--------------------------------------------------------------------
Thot 2.2.0 Beta
- Solved a memory management bug in the PhraseExtractionTable class
- Solved portability issues with 64 bit processors
- Solved possible portability problems with locales
- Added the utility "train_giza", which allows to easily train
IBM models using the GIZA and the mkcls tools
- Added a simple class diagram created with the program "dia" (see
doc/class_diag.dia)
- The hierarchy of phrase-based model classes has been modified.
Two new classes have been added: BaseCountPhraseModel and
BaseIncrPhraseModel.
The following classes have changed their names:
* PhraseModel -> _incrPhraseModel
* PhraseModel_From_WBA -> WbaIncrPhraseModel
* PhraseModelWBAParameters -> WbaIncrPhraseModelParams
- Some function names in BasePhraseModel.h and BaseIncrPhraseModel.h
have changed:
* pt_s_(const Vector<string> ...) -> strPt_s_(.)
* logpt_s_(const Vector<string> ...) -> strLogpt_s_(.)
* ps_t_(const Vector<string> ...) -> strPs_t_(.)
* logps_t_(const Vector<string> ...) -> strLogps_t_(.)
* getTransFor_s_(const Vector<string> ...) -> strGetTransFor_s_(.)
* getTransFor_t_(const Vector<string> ...) -> strGetTransFor_t_(.)
* getNbestTransFor_s_(const Vector<string> ...) -> strGetNbestTransFor_s_(.)
* getNbestTransFor_t_(const Vector<string> ...) -> strGetNbestTransFor_t_(.)
* addTableEntry(const Vector<string> ...) -> strAddTableEntry(.)
* incrCountsOfEntry(const Vector<string> ...) -> strIncrCountsOfEntry(.)
--------------------------------------------------------------------
Thot 2.2.1 Beta
- Solved some minor bugs
- Some portions of the code have been refactored
- Added the train_phr_model tool
--------------------------------------------------------------------
Thot 3.0.0 Beta
=> The project has been restarted. Backward compatibility with
previous versions of Thot is not ensured.
=> Lots of new features have been added:
- Phrase-based statistical machine translation decoder
- Computer-aided translation (post-edition and interactive
machine translation)
- Incremental estimation of all of the models involved in the
translation process
- Client-server implementation of the translation functionality
- Single word alignment model estimation using the incremental
EM algorithm
- Scalable implementation of the different estimation algorithms
using Map-Reduce
- Integration with the CasMaCat Workbench (http://www.casmacat.eu)
- ...
--------------------------------------------------------------------
Thot 3.1.0 Beta
- Incorporation of a whole set of pre/post-processing tools
- Portability increased (Thot has been successfully compiled in
many different platforms, including Mac OS X, FreeBSD, OpenBSD,
NetBSD, etc.)
- Improved checking of runtime errors in all of the tools involved
in the translation pipeline
- Early detection of bugs and portability problems using built-in
checks
- Improvements in tools to carry out translation experiments and
incorporation of new ones
- Translation can now be executed in parallel using clusters or
multi-processor systems by means of the thot_decoder tool
- The Thot manual has been extended and revised
- Multiple minor improvements and bug fixes
--------------------------------------------------------------------
Thot 3.2.0 Beta
- Ability to force translations for specific phrases of the source
sentence to be translated
- Capability to access language and translation models from disk
- Added functionality to use multiple language and translation
models
- Added tool to create empty models useful to perform online
learning from scratch
- Portability fixes
- Improved checking of runtime errors in all of the tools involved
in the translation pipeline
- Added new built-in checks for early detection of bugs and
portability problems
- The Thot manual has been extended and revised
- Multiple minor improvements and bug fixes
--------------------------------------------------------------------