forked from laurikari/tre
-
-
Notifications
You must be signed in to change notification settings - Fork 2
/
ChangeLog.old
815 lines (541 loc) · 28.2 KB
/
ChangeLog.old
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
Fri Dec 10 21:15:14 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.7.2.
Sat Dec 4 12:04:29 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_expand_ast): Bugfix. If a back reference
occurred after {m,n} in a regexp, its position was not updated
causing incorrect match results.
* lib/tre-compile.c (tre_make_trans): Bugfix. If a back reference
was immediately followed by $ or ^, it triggered an assertion
failure when compiling the regexp.
* tre/retest.c: Added regression tests to catch the above bugs.
* lib/tre-compile.c (tre_version): Changed to return a better
human-readable string instead of just the version number.
* src/agrep.c (tre_agrep_handle_file): Bugfix. The read buffer
must be reset when starting to read a new file because the read
loop may bail out without reading the full file.
Sun Nov 21 18:22:26 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.7.1.
Sat Nov 20 10:10:12 2004 Ville Laurikari <vl@iki.fi>
* src/agrep.c: Added the --delimiter-after command line option.
It can be used to output the record delimiter after the matching
record when a custom delimiter regex has been given instead of
before the matching record, which is the default.
* src/agrep.c: Added the --color (and --colour) command line
option. It highlights the matching part of the text with a color
code from the GREP_COLOR environment variable, or red by default.
* src/agrep.c: Made some changes which hopefully make agrep faster
in certain conditions.
* win32/tre.def: Added reguexec.
Sun Nov 7 17:26:54 2004 Ville Laurikari <vl@iki.fi>
* Makefile.am: Fixed to include all files under the python
directory to distributions.
* lib/*: Divided tre-compile.c to several smaller files, to make
things easier to maintain.
* doc/agrep.1.in: Added this man page for agrep.
Fri Sep 10 21:47:23 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.7.0.
Sat Sep 4 14:55:00 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_parse): Added support for the \x1B and
\x{263a} extensions for entering 8 bit and wide characters in
hexadecimal.
* tests/retest.c: Added tests for the above.
* python/{tre-python.c, setup.py.in, example.py}, configure.ac:
Added Python language bindings contributed by Nikolai SAOUKH.
Thanks!
* lib/regex.c (tre_have_backrefs, tre_have_approx): Added these
functions to query from a compiled regexp whether it uses back
references or approximate matching, respectively.
Sun Aug 29 19:30:01 2004 Ville Laurikari <vl@iki.fi>
* Added the reguexec() function. It can be used to match regexps
over arbitrary data structures, since characters are fed to the
matcher loop one by one with a user specified function. Unless
the backtracking matcher is used, the user specified function does
not even need to keep the whole string in memory at once.
* tests/test-str-source.c: Test program for the above.
Tue Aug 3 12:59:15 2004 Ville Laurikari <vl@iki.fi>
* lib/regex.h: Added the REG_APPROX_MATCHER and
REG_BACKTRACKING_MATCHER execution flags to force using the
approximate matcher and backtracking matcher, respectively.
* tests/retest.c: Rewrote to run tests with different pmatch[] and
nmatch arguments, different compilation flags and different
matcher loops.
* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed to work
correctly in multibyte mode. Before the approximate matcher did
not find matches if there were characters more than one byte long
in the string.
Sun Aug 1 19:42:59 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_parse): Added support for \Q and \E for
turning REG_LITERAL on for parts of a regexp.
Mon Jul 5 16:22:11 2004 Ville Laurikari <vl@iki.fi>
* configure.ac: Fixed to prepend "-lgnugetopt" to LIBS if
gnugetopt is needed for getopt_long().
Sat Jul 3 12:47:51 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c, lib/regex.h: Added a new compilation flag
REG_RIGHT_ASSOC. It can be used to change concatenation
associativity from left associative (the default) to right
associative.
* lib/tre-compile.c (tre_parse): Added support for (?inr-inr)
and (?inr-inr:regex) extensions which work like in Perl.
The (?inr-inr) extension allows turning the REG_ICASE,
REG_NEWLINE, and REG_RIGHT_ASSOC flags on and off for chosen parts
of a regexp, and the (?:regex) extension can be used to
parenthesize a subexpression without capturing a submatch for it.
* lib/Makefile.am, lib/regexec.c, tests/Makefile.am: Fixed to
compile with --disable-approx.
* lib/tre-match-approx.c: Bugfix. There was a place where the
tags array was modified even if it was empty, causing semi-random
crashes.
Mon Jun 28 16:51:38 2004 Ville Laurikari <vl@iki.fi>
* configure.ac: Added AC_SYS_LARGEFILE so that large files work
with agrep.
* configure.ac: Added a call to AC_FUNC_ALLOCA unless
--without-alloca is used.
* tests/retest.c: Changed to run all tests with all matcher
backends when possible. This makes the tests a lot more
comprehensive, and already caught a crashbug in the approximate
matcher.
* tre/win32/tre-config.h: Added version information so compilation
on Windows will work again.
Wed May 26 20:11:15 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.8.
Tue May 25 20:34:12 2004 Ville Laurikari <vl@iki.fi>
* configure.ac: Define _GNU_SOURCE so all GNU extensions get
detected (such as iswblank()).
* lib/tre-internal.h: Removed an "#undef TRE_USE_SYSTEM_WCTYPE"
which was not meant to be left in the release version.
Mon May 10 19:45:11 2004 Ville Laurikari <vl@iki.fi>
* m4/ax_check_funcs_comp.m4, m4/ax_check_sign.m4: Fixed to use
"tr [a-z] [A-Z]" which works with Solaris /bin/tr.
Sun May 9 19:37:10 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.7.
Sat May 8 21:50:49 2004 Ville Laurikari <vl@iki.fi>
* src/agrep.c: Added the command line option -y. It does nothing,
but is needed for compatibility with the non-free agrep.
* lib/tre-compile.c (tre_parse): Fixed a bug which caused memory
to be used exponentially with the number of macros (e.g. \s or \d)
in a regexp.
Sat May 1 11:47:27 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-match-utils.h (tre_neg_char_classes_match): Fixed to
handle null bytes in multibyte strings (when string length
explicitly given). The previous version did not advance in the
input string if a null byte was encountered, effectively leaving
the matchers in an infinite loop.
Sat Apr 24 12:58:19 2004 Ville Laurikari <vl@iki.fi>
* configure.ac: Added --with-libutf8 and --without-libutf8 and
checks for libutf8. Now libutf8 is searched for if mbrtowc is not
found elsewhere. This means wide character support can be used on
any platform where libutf8 works.
* m4/ax_check_funcs_comp.m4 (AX_CHECK_FUNCS_COMP): New macro
working very much like AC_CHECK_FUNCS, but can be used to check
for the existence of functions which are renamed with macros
(libutf8 does this).
* m4/vl_*.m4: Renamed to most of these to ax_*.m4.
* lib/tre-compile.c: Removed wide L"..." string constants, which
may not work with libutf8. Replaced with 8 bit "..." strings and
code to convert to wide character strings when needed.
* lib/tre-compile.c (tre_config): New function to check which
optional features have been compiled into the library. Useful
especially when linking dynamically with libtre.
* lib/tre-compile.c (tre_version): New function to get the version
of the library. This is just a convenience function for
tre_config().
Thu Apr 15 07:36:27 2004 Ville Laurikari <vl@iki.fi>
* Changed to use iswalpha(), iswalnum(), etc. if iswctype() and
wctype() are not available. Now wide character support should
work on systems where wctype() and/or iswctype() are not
available, but the other functions are (old FreeBSD versions).
* configure.ac: Changed accordingly (iswctype and wctype no longer
a requirement for wchar support).
* tests/Makefile.am: Use LTLIBINTL for linking the test programs.
Now the tests should compile on hosts which have a separate
libintl installed and a non-GNU C library.
* src/agrep.c: Fixed not to always print the filenames.
Sun Apr 4 21:02:57 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_expand_ast): Fixed yet more bugs. Sigh.
Sun Mar 21 16:39:58 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.6.
Sun Mar 21 14:08:39 2004 Ville Laurikari <vl@iki.fi>
* src/agrep.c: Added the command line option -H (--with-filename)
to always print the filename for each match.
Sun Mar 21 12:24:11 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_expand_ast): Fixed bugs which occurred
sometimes when *, +, or ? repeats were used after {m,n} repeats in
a regexp.
* tests/retest.c: Added some regression tests which catch the bug
fixed above.
* tre.pc.in: Include @LIBINTL@ in the Libs field.
Fri Mar 5 23:49:38 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.5.
Fri Mar 5 23:16:57 2004 Ville Laurikari <vl@iki.fi>
* tests/retest.c: Changed to run all regexec tests also with a
NULL pmatch[] array.
* lib/tre-match-*.c: Fixed bugs related to NULL pmatch[] arrays.
Fri Mar 5 20:40:25 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_expand_ast): Fixed a bug which caused too
large indexes to be used for states if more than one (non-nested)
{m,n} repeats were used in a regexp.
* doc/tre-syntax.html: Merged in additions from Dominick Meglio,
thank you!
* tre/m4/ac_libtool_tags.m4: Backport of AC_LIBTOOL_TAGS from
Libtool 1.6 to Libtool 1.5.x.
* configure.ac: Added AC_LIBTOOL_TAGS([]) so TRE can be compiled
without working C++ and Fortran compilers.
Mon Jan 5 13:34:19 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Fixed
bugs that caused crashes if REG_NOSUB was used.
Fri Jan 2 16:53:10 2004 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.4.
Fri Jan 2 16:14:37 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Fixed to
compile if TRE_DEBUG is defined but TRE_WCHAR is not. Thanks to
Dominick Meglio for pointing this out.
Fri Jan 2 09:53:02 2004 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_copy_ast): Bugfix. Did not recurse when
handling an iteration node, and everything under the node was not
genuinely copied but only referenced. This caused things like
"(a+){5}" not to work correctly; this example was basically
equivalent to "a*" but in a very very slow way.
Mon Dec 22 10:31:16 2003 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.3.
Sun Dec 14 11:22:58 2003 Ville Laurikari <vl@iki.fi>
* Bugfix. Back references did not work if REG_NOSUB was used when
compiling the regexp.
Tue Dec 2 16:06:37 2003 Ville Laurikari <vl@iki.fi>
* Small fixes and changes here and there to avoid compiler
warnings.
* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Fixed to
compile if TRE_WCHAR is not defined.
* lib/tre-mem.c (tre_mem_alloc_impl): Bugfix. When a new block was
allocated with malloc() the next tre_mem_alloc() returned a
possibly unaligned pointer.
Sun Nov 23 21:52:58 2003 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.2.
Sun Nov 23 18:40:57 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Bugfix.
If the TNFA has a loop with an empty back reference, the matcher
went to an infinite loop. This happened e.g. with the regexp
"()(\1)*".
* lib/regexec.c (tre_match_approx): Fixed to return error if the
regexp has back references.
* lib/tre-compile.c (tre_parse): Bugfix in parsing empty
expressions and missing closing parentheses.
Sun Nov 16 20:27:17 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed a bug which
caused non-optimal matches to be returned in some cases.
* tests/retest.c: Added a couple of tests.
Sat Nov 15 15:21:23 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_expand_ast): Fixed to handle nested
repeats correctly.
Wed Nov 12 21:45:44 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c: Fixed to compile if REG_LITERAL is not
defined.
* lib/tre-match-backtrack.c: Fixed to compile without wide
character support.
Thu Nov 6 22:23:03 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-parallel.c: Bugfix. If pmatch[] was null, the
matcher loop referred past an array, sometimes crashing.
Mon Nov 3 17:41:37 2003 Ville Laurikari <vl@iki.fi>
* Released tre-0.6.0.
Sun Nov 2 20:27:37 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c (tre_parse): Implemented support for
REG_LITERAL. If REG_LITERAL is used, the entire regexp is
interpreted as a literal word.
Tue Oct 14 21:20:35 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-compile.c: Changed the parser to use a context object
which contains all parse state instead of passing each state
variable separately. Fixed bug that caused `have_backrefs' to be
reset if macros were used (this caused the wrong matcher to be
used and back references not to work).
Mon Sep 29 21:43:35 2003 Ville Laurikari <vl@iki.fi>
* Added a "tre_" prefix to all functions that did not yet have it.
* lib/tre-compile.c: Separated regexp compilation from regcomp.c
to this file. Now all actual functionality is implemeted in
lib/tre-*.c, and lib/reg*.c have the POSIX API wrappers.
* Implemented new syntax to control approximate matching
parameters dynamically during matching. Thanks to Bill Yerazunis
for the suggestions!
Wed Sep 3 19:41:40 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Bugfix. Now
matching back references works correctly in wide character mode.
Thu Jul 3 19:47:33 2003 Ville Laurikari <vl@iki.fi>
* configure.ac: Made --disable-system-abi the default.
* configure.ac: alloca() is no longer required. Unless
--without-alloca is specified, alloca() will be used if found.
Fri May 15 09:39:34 2003 Ville Laurikari <vl@iki.fi>
* tre/tre-match-approx.c (tre_tnfa_run_approx): Bugfix in handling
insertions. If an insertion was found that had better cost than a
previous path, the tag values were not copied resulting in
incorrect match and submatch positions being reported.
Wed May 14 21:14:47 2003 Ville Laurikari <vl@iki.fi>
* Released tre-0.5.3.
Wed May 14 20:11:43 2003 Ville Laurikari <vl@iki.fi>
* tre/tre-mem.c (tre_mem_alloc_impl): Bugfix. The returned
pointer was not always properly aligned.
Tue May 13 19:55:30 2003 Ville Laurikari <vl@iki.fi>
* configure.ac, lib/regex.h, lib/tre-internal.h: Fixed to compile
if --disable-system-abi is used.
* lib/Makefile.am: Fixed to use $(LTLIBINTL), so gettext is found
on systems where it is not in libc (e.g. FreeBSD has it in
libintl).
Thanks to Dominick Meglio <codemstr@ptdprolog.net> for the above!
Thu May 8 21:23:30 2003 Ville Laurikari <vl@iki.fi>
* win32/config.h, win32/tre-config.h: Updated and fixed
compilation errors on Windows. Enabled wide character and
multibyte support.
* win32/tre.dsp, win32/retest.dsp: Link against msvcprt.lib to get
wide character functions.
* src/retest.c: Don't try to call setlocale() on Windows (it seems
to crash).
* lib/regcomp.c (parse_re): Fixed bugs in the regexp parser when
wide character support is not used. Also fixed some references
past the end of the input string.
* lib/regex.h: regcomp, regexec, regerror, and regfree weren't
defined if TRE_WCHAR was not defined. Fixed.
Tue Apr 15 22:37:48 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed bugs. A
match starting earlier was sometimes preferred over a match with a
smaller cost, and insertions were not handled correctly.
* src/agrep.c: Implemented the -B (best match) mode. It scans the
input files twice; first to find out what is the cost of the best
matching record(s), and another time to output all records that
match with that cost.
* test/test-approx.c: Added some simple test cases.
Sun Apr 13 13:53:00 2003 Ville Laurikari <vl@iki.fi>
* doc/tre-api.html, doc/tre-syntax.html: Beginnings of
API and regexp syntax documentation.
* lib/tre-mem.c: Changed to allocate blocks bigger than
TRE_MEM_BLOCK_SIZE if the requested amount is large. This fixes
REG_ESPACE problems when trying to compile large regexps,
especially ones with a lot of "|".
* Released tre-0.5.2.
Mon Apr 7 18:39:05 2003 Ville Laurikari <vl@iki.fi>
* lib/regcomp.c, lib/tre-match-parallel.c: Added support for
non-greedy repetition operators "*?", "+?", "??", and "{m,n}?".
They work similarly to the ones in Perl.
* tests/retest.c: Added tests for minimal repetition operators.
* tre.pc.in: Added pkgconfig file.
Tue Apr 1 20:20:37 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-parallel.c, lib/tre-match-approx.c: Fixed
alignment bugs when allocating pointers from a buffer.
Sat Mar 15 20:35:11 2003 Ville Laurikari <vl@iki.fi>
* lib/regcomp.c (parse_re): Fixed to allow the empty regexp.
These were already allowed inside parentheses (e.g. "(a|)"), but
e.g. "a|" caused REG_EPAREN to be returned. Now "", "a|", "|a",
"*", "?", etc. work as expected.
Thu Mar 13 19:49:20 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-backtrack.c: Bugfix. Stopped too early when
scanning asciiz strings.
* configure.in: System ABI support: added checks for absolute path
to regex.h and a field in the system defined regex_t suitable for
storing a pointer to a TNFA. TRE is now configured by default to
be compatible with the system regex ABI, unless
--disable-system-abi is used.
* lib/regex.h, lib/tre-config.h.in: System ABI support: if
TRE_USE_SYSTEM_REGEX_H if defined, include system regex.h instead
of defining everything here.
* lib/regcomp.c, lib/regexec.c, lib/tre-config.h.in:
System ABI support: use the configured field in regex_t struct for
getting and setting the pointer to the TFNA.
Thu Feb 27 19:32:43 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-*.[ch]: Fixed several references past the end of
the input string.
* lib/tre-match-approx.c: Fixed bugs in submatch tracking.
* configure.in: Added flag --disable-agrep to disable building and
installing agrep.
* Released tre-0.5.1.
Sun Feb 23 14:43:06 2003 Ville Laurikari <vl@iki.fi>
* Released tre-0.5.0.
Fri Feb 21 19:10:30 2003 Ville Laurikari <vl@iki.fi>
* win32/: New directory, contains project and workspace files for
compiling TRE and `retest' for Windows with MS Visual C++.
Original version contributed by Aymeric Moizard <jack@atosc.org>,
thank you!
Sun Feb 16 12:55:13 2003 Ville Laurikari <vl@iki.fi>
* lib/regcomp.c: Rewrote code that adds tags in the AST for
submatch addressing. Changes include:
- Submatch boundaries now all have a tag with offset zero. This
makes it possible to get correct submatches for approximate
matches.
- Removed marker and boundary tags. Now nested submatches are
tracked and that information is used to reset submatches from
old repetitions.
- Bounded iterations are now expanded after adding tags instead
of at parse time. This makes the code a lot cleaner.
* lib/regexec.c, lib/tre-internal.h: Related changes (no more
marker and boundary tags).
* tests/test-approx.c: Small test program for approximate
matcher.
Mon Jan 13 20:28:52 2003 Ville Laurikari <vl@iki.fi>
* lib/tre-match-approx.c: Now returns submatches of approximate
matches in the `pmatch[]' array of the `regamatch_t' struct.
Sun Jan 12 13:52:37 2003 Ville Laurikari <vl@iki.fi>
* lib/regexec.c, lib/regex.h: Changed API of approximate matching
functions. This API is easier to extend without having to change
the applications using the API at all.
* src/agrep.c: New command line option --show-cost (-s) to prefix
the cost of the match found to each output line.
* tests/retest.c: Added tests for back referencing.
* lib/*: Rearranged stuff. Split all three matchers (parallel,
approximate, backtracking) into separate files. Put tre-mem into
its own files.
Mon Jan 6 21:18:12 2003 Ville Laurikari <vl@iki.fi>
* utils/autogen.sh: Fixed (must run aclocal before automake).
* m4/vl_check_sign.m4, m4/vl_decl_wchar_max.m4, configure.in:
Updated for new autoconf style (AC_TRY_COMPILE ->
AC_COMPILE_IFELSE, AC_ERROR -> AC_MSG_ERROR, etc.)
* lib/regcomp.c, lib/regexec.c, lib/regexec-bt.c, lib/tre.h:
Implemented support for back references. A backtracking routine
implemented in `regex-bt.c' is used instead of the parallel
matcher if back references are used is the regexp.
Fri Nov 29 20:57:52 2002 Ville Laurikari <vl@iki.fi>
* configure.in: New options --disable-wchar and
--disable-multibyte that disable wchar_t support (and requirement)
and multibyte character support, respectively.
* lib/regcomp.c, lib/regex.h, lib/regexec.c, lib/tre.h: Related
changes.
Sun Oct 20 21:55:56 2002 Ville Laurikari <vl@iki.fi>
* configure.in: Check getopt_long support.
* lib/regcomp.c (ast_compute_tag_info): Merged into ast_add_tags
and removed this function.
* lib/regcomp.c (ast_add_tags, parse_re, parse_bound): Bugfixes.
Range repetition did not work correctly when applied to a
parenthesized subexpression. For example, "a{5,6}" worked correctly,
but "(a|b){5,6}" did not.
Sun Oct 20 20:27:24 2002 Ville Laurikari <vl@iki.fi>
* Changed the name of the package from "libtre" to just "TRE".
* lib/regexec.c (tnfa_execute_approx): Implemented approximate
regexp matching.
* lib/regex.h (regaexec, reganexec, regawexec, regawnexec):
Added approximate matching API.
* src/agrep.c: First version of agrep (approximate grep). Uses
the new approximate matching feature in the matcher library.
* lib/regexec.c (tnfa_execute): Added a loop to quickly skip over
characters that cannot possibly be the first character of a
match.
* lib/regcomp.c (regwncomp): Related changes.
Sat Aug 3 23:42:29 2002 Ville Laurikari <vl@iki.fi>
* Moved the library part from src/ to lib/, and changed the name of
macros/ to m4/.
* lib/regex.c: Split into `regcomp.c', `regexec.c', and
`regerror.c'.
* lib/regerror.c, lib/regex.h: Threw away regwerror() since it was
pretty useless.
* lib/regerror.c: Internationalized. The error messages returned
by regerror() are now localized through gettext() if found.
Note that libintl is *not* included in the TRE package.
* po/fi.po: Finnish translation.
* lib/regcomp.c (parse_re): Fixed bugs (there were references to
before the start of the regexp string).
* lib/regcomp.c (parse_re): Fixed bugs in parsing BREs.
* tests/retest.c: Added test cases for the BRE stuff I fixed.
* lib/regexec.c (tnfa_execute): Fixed to work when the length of
the input string is given (e.g. with regnexec()).
Sun Jul 28 00:19:04 2002 Ville Laurikari <vl@iki.fi>
* lib/Makefile.am: Changed to install the header file `regex.h' to
$(includedir)/tre to avoid accidental inclusion with
"#include <regex.h>".
Mon Jun 24 19:34:50 2002 Ville Laurikari <vl@iki.fi>
* src/regex.c (ast_compute_nfl): Bugfix, did not mark an iteration
node as nullable if the minimum number of iterations was above
zero and the child was nullable. As a result, e.g. "(a*)+" did not
match the empty string.
* src/regex.c (ast_compute_tag_info): Bugfix, the tree was
traversed in the wrong order resulting in incorrect num_tags
counts for nodes in some cases. The results ranged from missing
submatches to segfaults.
* src/regex.c (make_transitions): Bugfix, if a transition between
two states was already handled then the code aborted the loop when
it should have just skipped to the next iteration. The result was
that sometimes some transitions were not added to the NFA and
matches were not found.
* src/regex.c (parse_bracket_items): Bugfix, referred one or two
characters past the end of the string in several places. E.g.
compiling the regex "[a-" could cause a segfault.
* src/regex.c (fill_pmatch): Bugfix. If the marker boundary tag
number is bigger than tnfa->num_tags, the marker boundary tag
does not exist (or rather it does, but is the same as the match
end point). The code here still used marker_boundary even if it
was bigger than num_tags, causing either segfaults, missing
submatches, or no symptoms.
* src/regex.c (tnfa_execute): Bugfix. When matches were found,
the first tag value was checked to be smaller than for the
previous match. Firstly, this was a redundant and useless check.
Secondly, it caused a segfault if REG_NOSUB was used when
compiling the regexp since there are no tags in that case and the
array is NULL.
* tests/retest.c: Added tests for all of the above. Thanks to
Glenn Fowler for running into the bugs and providing the test
cases.
* Released libtre-0.3.2.
Wed Mar 27 21:48:48 2002 Ville Laurikari <vl@iki.fi>
* src/regex.c: Added support for new zero-width assertions \b, \B,
\<, and \>. Fixed a bug in ^ and $.
* src/regex.c (parse_bound): Bugfix, had forgotten to handle
boundaries of the form "{12,}" altogether.
* src/regex.c (ast_add_tags): Bugfix, set the direction of the
current tag to MAXIMIZE at ADDTAGS_POST_CATENATION, but should not
have.
* src/regex.c (parse_re): A `)' is now interpreted as an ordinary
character in the absence of a matched `('.
* src/regex.c (regwncomp): Bugfix, did not set preg->re_nsub to
the number of parenthesized subexpressions.
* tests/retest.c: Added tests for all of the above.
* src/regex.c: Fixed to be completely thread safe. A single
compiled regexp can now be used simultaneously in several
contexts, e.g. in main() and a signal handler, or multiple
threads.
Wed Mar 20 19:50:37 2002 Ville Laurikari <vl@iki.fi>
* src/regex.c: Added support for Perl-compatible syntax
extensions: \t, \n, \r, \f, \a, \e, \w, \W, \s, \S, \d, \D.
* src/regex.c: Now expands character classes when using 8 bit
character sets so that iswctype() calls are avoided during
matching.
Sun Mar 3 17:50:14 2002 Ville Laurikari <vl@iki.fi>
* macros/*: Updated all macros (they were renamed from AC_* to
VL_*).
* macros/vl_check_sign.m4, macros/vl_decl_wchar_max.m4: Added.
* src/regex.c: Memory management cleanups. Much of the small
memory blocks, like AST nodes, are now allocated in large blocks
instead of one by one using the `tre_mem_t' allocator. This got
rid of hundreds of lines of confusing memory management code.
Sat Feb 16 23:47:01 2002 Ville Laurikari <vl@iki.fi>
* macros/ac_prog_cc_warnings.m4: Updated to version 1.3.
* macros/ac_decl_wchar_max.m4: Added this macro for checking
whether WCHAR_MAX is defined, and defining it if it isn't.
* configure.in: Added some checks for wide character stuff.
* src/regex.c: Added `tre_' prefix to all local type names to
avoid conflicts.
Mon Feb 11 21:48:03 2002 Ville Laurikari <vl@iki.fi>
* src/regex.c (parse_bound, parse_re): Enabled support for
bound expressions. The iterated atom is duplicated by parsing it
many times -- this seemed to be the simplest way to do it.
* src/Makefile.am: Changed library name from `libregx' to
`libtre'.
* tests/retest.c: Added tests for bound expressions.
Sun Feb 10 18:47:23 2002 Ville Laurikari <vl@iki.fi>
* src/regex.c (set_union): Bugfix, had `set1[s2].neg_classes' where
should have been `set2[s2].neg_classes' (this caused crashes).
* src/regex.c (ast_to_efree_tnfa): Bugfix, didn't check for
infinite maximum iteration count before making transitions for
them.
* src/regex.c: Added interfaces regncomp() and regwncomp() which
look only at the first `n' characters of the regexp pattern. Null
characters are allowed in the regexps when using these functions.
Sat Feb 9 22:39:08 2002 Ville Laurikari <vl@iki.fi>
* src/regex.c: Added wide character interface: regwcomp(),
regwexec(), and regwerror(). They work exactly like regcomp(),
regexec() and regerror() except that the strings are
`wchar_t *'. Also added support for multibyte character sets.
Fixed a lot of bugs (memory leaks, crashes) here and there.
* tests/retest.c: Added tests for multibyte character sets and
regcomp() error reporting.
* tests/randtest.c: Makes random strings and tries to compile them
with regcomp(). This can be used to find memory leaks and crashes
in the regexp compiler.
Sun Jan 27 21:42:06 2002 Ville Laurikari <vl@iki.fi>
* src/regex.c: Added support for bracket expressions
(e.g. "[abc]"). Multicharacter collating elements are not
supported, neither are equivalence classes.
* test: Renamed this directory to `tests'.
Sun Dec 2 20:20:12 2001 Ville Laurikari <vl@iki.fi>
* First public release.