-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathREADME
571 lines (435 loc) · 25.1 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
= Lingo - A full-featured automatic indexing system
* {Version}[rdoc-label:label-VERSION]
* {Description}[rdoc-label:label-DESCRIPTION]
* {Introduction}[rdoc-label:label-Introduction]
* {Attendees}[rdoc-label:label-Attendees]
* {Filters}[rdoc-label:label-Filters]
* {Markup}[rdoc-label:label-Markup]
* {Inline annotation}[rdoc-label:label-Inline+annotation]
* {Plugins}[rdoc-label:label-Plugins]
* {Server}[rdoc-label:label-Server]
* {JSON endpoint}[rdoc-label:label-JSON+endpoint]
* {Raw endpoint}[rdoc-label:label-Raw+endpoint]
* {Deployment}[rdoc-label:label-Deployment]
* {Example}[rdoc-label:label-EXAMPLE]
* {Installation and Usage}[rdoc-label:label-INSTALLATION+AND+USAGE]
* {Dictionary and configuration file lookup}[rdoc-label:label-Dictionary+and+configuration+file+lookup]
* {File formats}[rdoc-label:label-FILE+FORMATS]
* {Configuration}[rdoc-label:label-Configuration]
* {Language definition}[rdoc-label:label-Language+definition]
* {Dictionaries}[rdoc-label:label-Dictionaries]
* {Encoding word classes and gender information}[rdoc-label:label-Encoding+word+classes+and+gender+information]
* {Lexicalizing multiword expressions}[rdoc-label:label-Lexicalizing+multiword+expressions]
* {Lexicalizing compounds}[rdoc-label:label-Lexicalizing+compounds]
* {Issues and Contributions}[rdoc-label:label-ISSUES+AND+CONTRIBUTIONS]
* {Links}[rdoc-label:label-LINKS]
* {Literature}[rdoc-label:label-LITERATURE]
* {Background and theoretical foundations}[rdoc-label:label-Background+and+theoretical+foundations]
* {Research publications}[rdoc-label:label-Research+publications]
* {Credits}[rdoc-label:label-CREDITS]
* {Authors}[rdoc-label:label-Authors]
* {Contributors}[rdoc-label:label-Contributors]
* {License and Copyright}[rdoc-label:label-LICENSE+AND+COPYRIGHT]
== VERSION
This documentation refers to Lingo version 1.10.2
== DESCRIPTION
Lingo is an open source indexing system for research and teachings. The main
functions of Lingo are:
* identification of (i.e. reduction to) basic word form by means of dictionaries
and suffix lists
* algorithmic decomposition
* dictionary-based synonymization and identification of phrases
* generic identification of phrases/word sequences based on patterns of word
classes
=== Introduction
Lingo allows flexible and extendable linguistic analysis of text files. Here
is a minimal configuration example to analyse this README file:
meeting:
attendees:
- text_reader: { files: 'README' }
- debugger: { eval: 'true', ceval: 'cmd!=:EOL', prompt: '<debug>: ' }
Lingo is told to invite two attendees and wants them to talk to each other,
hence the name Lingo (= the technical language).
The first attendee is the text_reader[rdoc-ref:Lingo::Attendee::TextReader].
It can read files and communicates their content to other attendees. For this
purpose, the +text_reader+ is given an output channel. Everything that the
+text_reader+ has to say is steered through this channel. It will do nothing
further until Lingo tells the first attendee to speak. Then the +text_reader+
will open the file +README+ (as per the +files+ parameter) and pass the content
to the other attendees via its output channel.
The second attendee, debugger[rdoc-refLingo::Attendee::Debugger], does nothing
else than to put everything on the console (standard error) that comes into its
input channel. If you write the Lingo configuration which is shown above as an
example into the file <tt>readme.cfg</tt> and then run <tt>lingo -c readme -l en</tt>,
the result will look something like this:
<debug>: *FILE('README')
<debug>: "= Lingo - [...]"
...
<debug>: "Lingo allows flexible and extendable linguistic analysis [...]"
<debug>: "is a minimal configuration example to analyse this README [...]"
...
<debug>: *EOF('README')
What we see are lines beginning with an asterisk (<tt>*</tt>) and lines without.
That's because Lingo distinguishes between commands and data. The +text_reader+
did not only read the content of the file, but also communicated through the
commands when a file began and when it ended. This can (and will) be an
important piece of information for other attendees that will be added later.
To try out Lingo's functionality without installing it first, have a look at
{Lingo Web}[http://ixtrieve.fh-koeln.de/lingoweb]. There you can enter some
text and see the debug output Lingo generated -- including tokenization, word
identification, decomposition, etc.
=== Attendees
Available attendees that can be used for solving a specific problem (for more
information see each attendee's documentation):
+text_reader+:: Reads files (or standard input) and puts their content into
the channels line by line. (see Lingo::Attendee::TextReader)
+tokenizer+:: Dissects lines into defined character strings, i.e. tokens.
(see Lingo::Attendee::Tokenizer)
+abbreviator+:: Identifies abbreviations and produces the long form if
listed in a dictionary. (see Lingo::Attendee::Abbreviator)
+word_searcher+:: Identifies tokens and turns them into words for further
processing. To this end, it consults the dictionaries.
(see Lingo::Attendee::WordSearcher)
+stemmer+:: Identifies tokens not identified by the +word_searcher+ by
means of stemming. (see Lingo::Attendee::Stemmer)
+decomposer+:: Tests any tokens not identified by the +word_searcher+ for
being compounds. (see Lingo::Attendee::Decomposer)
+synonymer+:: Extends words with their synonyms. (see
Lingo::Attendee::Synonymer)
+vector_filter+:: Filters out everything and lets through only those tokens
that are considered useful for indexing. (see
Lingo::Attendee::VectorFilter)
+object_filter+:: Similar to the +vector_filter+. (see
Lingo::Attendee::ObjectFilter)
+text_writer+:: Writes anything that it receives into a file (or to
standard output). (see Lingo::Attendee::TextWriter)
+formatter+:: Similar to the +text_writer+, but allows for custom output
formats. (see Lingo::Attendee::Formatter)
+debugger+:: Shows everything for debugging. (see
Lingo::Attendee::Debugger)
+variator+:: Tries to correct spelling errors and the like. (see
Lingo::Attendee::Variator)
+multi_worder+:: Identifies phrases (word sequences) based on a multiword
dictionary. (see Lingo::Attendee::MultiWorder)
+sequencer+:: Identifies phrases (word sequences) based on patterns of
word classes. (see Lingo::Attendee::Sequencer)
Furthermore, it may be useful to have a look at the configuration files
<tt>lingo.cfg</tt> and <tt>en.lang</tt>.
=== Filters
Lingo is able to read HTML, XML, and PDF in addition to plain text.
_Examples_:
Read any file, guessing the correct type automatically:
- text_reader: { files: $(files), filter: true }
Read HTML files specifically (accordingly for XML):
- text_reader: { files: $(files), filter: 'html' }
Read PDF files, either with the pdf-reader[http://rubygems.org/gems/pdf-reader]
gem (default):
- text_reader: { files: $(files), filter: 'pdf' }
or with the pdftotext[http://en.wikipedia.org/wiki/Pdftotext] command line tool:
- text_reader: { files: $(files), filter: 'pdftotext' }
=== Markup
Lingo is able to, in a limited form, parse HTML/XML and
MediaWiki[http://mediawiki.org/wiki/Help:Formatting] markup.
_Examples_:
Identify HTML/XML tags in the input stream:
- tokenizer: { tags: true }
Identify MediaWiki markup in the input stream:
- tokenizer: { wiki: true }
=== Inline annotation
Lingo is able to annotate input text inline, instead of printing results out
of context to external files.
_Example_:
# read files
- text_reader: { files: $(files) }
# keep whitespace
- tokenizer: { space: true }
# do processing...
- word_searcher: { source: sys-dic, mode: first }
# insert formatted results (e.g. "[[Name::lingo|Lingo]] finds [[Noun::word|words]].")
- formatter: { ext: out, format: '[[%3$s::%2$s|%1$s]]', map: { e: Name, s: Noun } }
=== Plugins
Lingo has a plugin system that allows you to implement additional features
(e.g. add new attendees) or modify existing ones. Just create a file named
+lingo_plugin.rb+ in your Gem's +lib+ directory or any directory that's in
<tt>$LOAD_PATH</tt>. You can also define an environment variable
+LINGO_PLUGIN_PATH+ (by default <tt>~/.lingo/plugins</tt>) with additional
directories to load plugins from (<tt>*.rb</tt>).
A dedicated API to support writing and integrating plugins will be added in
the future.
=== Server
Lingo comes with a server daemon Lingo::Srv that exposes an HTTP interface to
Lingo's functionality. The configuration needs to ensure that input is read
from standard input (<tt>files: STDIN</tt> on +text_reader+) and output is
written to standard output (<tt>ext: STDOUT</tt> on +text_writer+).
_Example_: Start Lingo server on port 6789 with language configuration +en+
and default configuration file; server options come before <tt>--</tt>, Lingo
options come after.
> lingosrv -p 6789 -- -l en
You can also pass Lingo options through the +LINGO_SRV_OPTS+ environment
variable (e.g., <tt>LINGO_SRV_OPTS='-l en -c /path/to/your/srv.cfg'</tt>).
==== JSON endpoint
_Example_: Ask the server about "Lingo server"; returns JSON data (output
formatted for clarity).
> curl 'http://localhost:6789/?q=Lingo+server'
{
"Lingo server" : [
" <Lingo = [(lingo/s), (lingo/e)]>",
" <server = [(server/s)]>"
]
}
_Example_: Ask the server about "Lingo" and "server"; returns JSON data (output
formatted for clarity).
> curl -g 'http://localhost:6789/?q[]=Lingo&q[]=server'
{
"[\"Lingo\", \"server\"]" : {
"Lingo" : [
" <Lingo = [(lingo/s), (lingo/e)]>"
],
"server" : [
" <server = [(server/s)]>"
]
}
}
==== Raw endpoint
_Example_: Ask the server about "Lingo server"; returns raw Lingo response.
> curl --data 'Lingo server' http://localhost:6789/raw
<Lingo = [(lingo/s), (lingo/e)]>
<server = [(server/s)]>
_Example_: Ask the server about this file; returns raw Lingo response (output
truncated for clarity).
> curl --data @README -H 'Content-Type: text/plain' http://localhost:6789/raw
:=/OTHR:
<Lingo = [(lingo/s), (lingo/e)]>
<-|?>
<A|?>
<full-featured|COM = [(full-featured/k), (full/s+), (full/a+), (full/v+), (featured/a+)]>
<automatic = [(automatic/s), (automatic/a)]>
<indexing = [(index/v)]>
<system = [(system/s)]>
[...]
==== Deployment
Lingo::Srv can be started directly through the provided command-line executable
+lingosrv+ (see above) or through any other Rack[http://rack.github.com/]
-compatible deployment option; a +rackup+ file is included (see <tt>lingoctl
rackup srv</tt>).
_Example_: To deploy Lingo::Srv with Passenger[http://phusionpassenger.com/]
on Apache, create a symlink in the DocumentRoot pointing to the app's
<tt>public/</tt> directory; adjust the paths according to your environment
(you can use current_gem[http://blackwinter.github.com/current_gem] to
create a stable gem path):
/var/www
|
+-- lingo-srv -> /usr/lib/ruby/gems/2.1.0/gems/lingo-x.y.z/lib/lingo/srv/public
Then put the following snippet in Apache's VirtualHost configuration:
<VirtualHost *:80>
...
RackBaseURI /lingo-srv
<Directory /var/www/lingo-srv>
Options -MultiViews
SetEnv LINGO_SRV_OPTS "-l en" # <-- Optionally set Lingo options
</Directory>
</VirtualHost>
In order to provide your own +rackup+ file and Lingo configuration, create a
directory with those files:
/srv/lingo-srv
|
+-- config.ru
|
+-- lingosrv.cfg
And then point Passenger at it:
<VirtualHost *:80>
...
RackBaseURI /lingo-srv
<Directory /var/www/lingo-srv>
Options -MultiViews
PassengerAppRoot /srv/lingo-srv # <-- Add this line
</Directory>
</VirtualHost>
Restart Apache and test the result (output formatted for clarity):
> curl http://localhost/lingo-srv/about
{
"Lingo::Srv" : {
"version" : "x.y.z"
}
}
== EXAMPLE
TODO: Full-fledged example to show off Lingo's features and provide a basis
for further discussion.
== INSTALLATION AND USAGE
Since version 1.8.0, Lingo is available as a
RubyGem[http://rubygems.org/gems/lingo]. So a simple <tt>gem install lingo</tt>
will install Lingo and its dependencies. You might want to run that command
with administrator privileges, depending on your environment. Then you can call
the +lingo+ executable to process your text files. See <tt>lingo --help</tt>
for available options.
Please note that Lingo requires Ruby version 2.1 or higher to run
(2.6[http://ruby-lang.org/en/downloads/] is the currently recommended
version).
Since Lingo depends on native extensions, you need to make sure that
development files for your Ruby version are installed. On Debian-based
Linux platforms, they are included in the package <tt>ruby-dev</tt>;
other distributions may have a similarly named package. On Windows, those
development files are currently not required.
On JRuby, install gdbm[https://rubygems.org/gems/gdbm] for efficient database
operations: <tt>gem install gdbm</tt>.
=== Dictionary and configuration file lookup
Lingo will search different locations to find dictionaries and configuration
files. By default, these are the current working directory, your personal Lingo
directory (<tt>~/.lingo</tt>) and the installation directory (in that order).
You can control this lookup path by either moving files up the chain (using
the +lingoctl+ executable) or by setting various environment variables.
With +lingoctl+ you can copy dictionaries and configuration files from your
personal Lingo directory or the installation directory to the current working
directory so you can modify them and they will take precedence over the
original ones. See <tt>lingoctl --help</tt> for usage information.
In order to change the search path itself, you can define the
+LINGO_PATH+ environment variable as a whole or its individual parts
+LINGO_CURR+ (the local Lingo directory), +LINGO_HOME+ (your personal
Lingo directory), and +LINGO_BASE+ (the system-wide Lingo directory).
Inside of any of these directories, dictionaries and configuration files are
typically organized in the following directory structure:
<tt>config/</tt>:: Configuration files (<tt>*.cfg</tt>).
<tt>dict/</tt>:: Dictionary source files (<tt>*.txt</tt>) in
language-specific subdirectories (+de/+, +en/+, ...).
<tt>lang/</tt>:: Language definition files (<tt>*.lang</tt>).
<tt>store/</tt>:: Compiled dictionaries, generated from source files.
But for compatibility reasons these naming conventions are not enforced.
== FILE FORMATS
Lingo uses three different types of files to determine its behaviour:
{configuration files}[rdoc-label:label-Configuration] control the details of the
indexing process; {language definitions}[rdoc-label:label-Language+definition]
specify grammar rules and dictionaries available for indexing;
dictionaries[rdoc-label:label-Dictionaries], finally, hold the
vocabulary used in indexing the input text and producing the results.
=== Configuration
Configuration files are defined in the YAML[http://yaml.org/] syntax. They
specify the attendees[rdoc-label:label-Attendees] to call in order and the
options to provide them with. The first attendee in any indexing process is
the text_reader[rdoc-ref:Lingo::Attendee::TextReader], who reads the input
text and passes it on to the other attendees. Every attendee transforms or
extends the input stream and automatically sends everything down to the next
attendee. This process may be customized by explicitly specifying the input
and/or output channels of individual attendees with the +in+ and +out+ options.
_Example_:
# input is taken from the previous attendee,
# output is sent to the named channel "syn"
- synonymer: { skip: '?,t', source: sys-syn, out: syn }
# input is taken from the named channel "syn",
# output is sent to the next attendee
- vector_filter: { in: syn, lexicals: y, sort: term_abs }
# input is taken from the previous attendee,
# output is sent to the next attendee
- text_writer: { ext: syn, sep: "\n" }
# input is taken from the named channel "syn"
# (ignoring the output of the previous attendee),
# output is sent to the next attendee
- vector_filter: { in: syn, lexicals: m }
# input is taken from the previous attendee,
# output is sent to the next attendee
- text_writer: { ext: mul, sep: "\n" }
=== Language definition
Language definitions, like {configuration files}[rdoc-label:label-Configuration],
are defined in the YAML[http://yaml.org/] syntax. They specify the
dictionaries[rdoc-label:label-Dictionaries] to be used as well as the grammar
rules according to which the input shall be processed. These settings do not
necessarily have to coincide with an existing language, they are
application-specific.
=== Dictionaries
Dictionaries come in different varieties and encode the knowledge about the
vocabulary used for indexing and analysis.
Supported dictionary formats:
+SingleWord+:: One word (projection) per line. E.g. <tt>open source</tt>. (see
Lingo::Database::Source::SingleWord)
+MultiValue+:: Multiple words per line (separated with a unique symbol), all of
which are interpreted as belonging to a single equivalence class.
E.g. <tt>fax;telefax;facsimile</tt>. (see
Lingo::Database::Source::MultiValue)
+MultiKey+:: Similar to +MultiValue+, except that the first word will be
treated as the preferred term (descriptor). E.g.
<tt>fax;telefax;facsimile</tt>. (see
Lingo::Database::Source::MultiKey)
+KeyValue+:: One word and its associated projection per line, separated with
a unique symbol. E.g. <tt>abfrage*query</tt>. (see
Lingo::Database::Source::KeyValue)
+WordClass+:: Similar to +KeyValue+, except that the projection may consist of
multiple lexicalizations, each with its own word class and
(optional) gender information. E.g. <tt>abort,abort #s|v</tt>,
which is equivalent to <tt>abort,abort #s abort #v</tt>. (see
Lingo::Database::Source::WordClass)
==== Encoding word classes and gender information
TODO...
==== Lexicalizing multiword expressions
TODO...
==== Lexicalizing compounds
TODO...
== ISSUES AND CONTRIBUTIONS
If you find bugs or want to suggest new features, please report them
on GitHub[http://github.com/lex-lingo/lingo/issues]. Include your Ruby
version (<tt>ruby --version</tt>) and the version of Lingo you are using
(<tt>lingo --version</tt>).
If you want to contribute to Lingo, please fork the project
on GitHub[http://github.com/lex-lingo/lingo] and submit a
{pull request}[http://github.com/lex-lingo/lingo/pulls]
(bonus points for topic branches).
To make sure that Lingo's tests pass, install hen[http://blackwinter.github.com/hen]
(typically <tt>gem install hen</tt>) and all development dependencies (either with
<tt>gem install --development lingo</tt> or manually; see <tt>rake gem:dependencies</tt>).
Then run <tt>rake test</tt> for the basic tests or <tt>rake test:all</tt> for
the full test suite.
== LINKS
Website:: http://lex-lingo.de
Demo:: http://lex-lingo.de/lingoweb
Documentation:: http://lex-lingo.de/doc
Source code:: https://github.com/lex-lingo/lingo
RubyGem:: https://rubygems.org/gems/lingo
Bug tracker:: https://github.com/lex-lingo/lingo/issues
Travis CI:: https://travis-ci.org/lex-lingo/lingo
== LITERATURE
=== Background and theoretical foundations
* Gödert, W.; Lepsky, K.; Nagelschmidt, M.: <em>{Informationserschließung und Automatisches Indexieren: ein Lehr- und Arbeitsbuch}[http://dx.doi.org/10.1007/978-3-642-23513-9]</em>. (German) Berlin etc.: Springer, 2012.
* Lepsky, K.; Vorhauer, J.: <em>{Lingo – ein open source System für die automatische Indexierung deutschsprachiger Dokumente}[http://dx.doi.org/10.1515/ABITECH.2006.26.1.18]</em>. (German) In: ABI Technik 26 (1), 2006. pp 18-29.
* Nohr, H.: <em>{Grundlagen der automatischen Indexierung: ein Lehrbuch}[http://logos-verlag.de/cgi-bin/buch/isbn/0121]</em>. (German) Berlin: Logos, 2005.
* Hausser, R.: <em>{Grundlagen der Computerlinguistik. Mensch-Maschine-Kommunikation in natürlicher Sprache}[http://zbmath.org/?q=an:0956.68141]</em>. (German) Berlin etc.: Springer, 2000.
* Allen, J.: <em>{Natural language understanding}[http://zbmath.org/?q=an:0851.68106]</em>. (English) Redwood City, CA: Benjamin/Cummings, 1995.
* Grishman, R.: <em>{Computational linguistics: an introduction}[http://cambridge.org/9780521310383]</em>. (English) Cambridge: Cambridge Univ. Press, 1986.
* Salton, G.; McGill, M.: <em>{Introduction to modern information retrieval}[http://zbmath.org/?q=an:0523.68084]</em>. (English) New York etc.: McGraw-Hill, 1983.
* Porter, M.: <em>{An algorithm for suffix stripping}[http://tartarus.org/~martin/PorterStemmer/]</em>. (English) In: Program 14 (3), 1980. pp 130-137.
=== Research publications
* Busch, D.: <em>{Domänenspezifische hybride automatischeIndexierung von bibliographischen Metadaten}[https://www.b-i-t-online.de/heft/2019-06-fachbeitrag-busch.pdf]</em>. (German) In: b.i.t.online 22 (6), 2019. pp 465-469.
* Grün, S.: <em>Mehrwortbegriffe und Latent Semantic Analysis: Bewertung automatisch extrahierter Mehrwortgruppen mit LSA</em>. (German) Düsseldorf: Heinrich-Heine-Universität Düsseldorf, 2017.
* Siebenkäs, A.; Markscheffel, B.: <em>{Conception of a workflow for the semi-automatic construction of a thesaurus for the German printing industry}[https://zenodo.org/record/17945]</em>. (English) In: Re:inventing Information Science in the Networked Society. Proceedings of the 14th International Symposium on Information Science (ISI 2015), Zadar, Croatia, 19th-21st May 2015. Eds.: F. Pehar, C. Schlögl, C. Wolff. Glückstadt: Verlag Werner Hülsbusch, 2015. pp 217-229.
* Grün, S.: <em>Bildung von Komposita-Indextermen auf der Basis einer algorithmischen Mehrwortgruppenanalyse mit Lingo</em>. (German) Köln: Fachhochschule Köln, 2015.
* Bredack, J.; Lepsky, K.: <em>{Automatische Extraktion von Fachterminologie aus Volltexten}[http://dx.doi.org/10.1515/abitech-2014-0002]</em>. (German) In: ABI Technik 34 (1), 2014. pp 2-12.
* Bredack, J.: <em>{Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten}[http://ixtrieve.fh-koeln.de/lehre/bredack-2013.pdf]</em>. (German) Köln: Fachhochschule Köln, 2013.
* Maylein, L.; Langenstein, A.: <em>{Neues vom Relevanz-Ranking im HEIDI-Katalog der Universitätsbibliothek Heidelberg}[http://b-i-t-online.de/heft/2013-03-fachbeitrag-maylein.pdf]</em>. (German) In: b.i.t.online 16 (3), 2013. pp 190-200.
* Gödert, W.: <em>{Detecting multiword phrases in mathematical text corpora}[http://arxiv.org/abs/1210.0852]</em>. (English) arXiv:1210.0852 [cs.CL], 2012.
* Jersek, T.: <em>{Automatische DDC-Klassifizierung mit Lingo: Vorgehensweise und Ergebnisse}[http://www.citeulike.org/user/klaus-lepsky/article/12476139]</em>. (German) Köln: Fachhochschule Köln, 2012.
* Glaesener, L.: <em>{Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen}[http://www.citeulike.org/user/klaus-lepsky/article/12476133]</em>. (German) Köln: Fachhochschule Köln, 2012.
* Schiffer, R.: <em>{Automatisches Indexieren technischer Kongressschriften}[http://ixtrieve.fh-koeln.de/lehre/schiffer-2007.pdf]</em>. (German) Köln: Fachhochschule Köln, 2007.
== CREDITS
Lingo is based on a collective development by Klaus Lepsky and John Vorhauer.
=== Authors
* John Vorhauer <mailto:lingo@vorhauer.de>
* Jens Wille <mailto:jens.wille@gmail.com>
=== Contributors
* Klaus Lepsky <mailto:klaus@lepsky.de>
* Jan-Helge Jacobs <mailto:plancton@web.de>
* Thomas Müller <mailto:thomas.mueller@gesis.org>
* Yulia Dorokhova <mailto:jdorokhova@hse.ru>
== LICENSE AND COPYRIGHT
Copyright (C) 2005-2007 John Vorhauer
Copyright (C) 2007-2019 John Vorhauer, Jens Wille
Lingo is free software: you can redistribute it and/or modify it under the
terms of the GNU Affero General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.
Lingo is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
details.
You should have received a copy of the GNU Affero General Public License along
with Lingo. If not, see <http://www.gnu.org/licenses/>.