libleipzig -- wortschatz.uni-leipzig.de binding

libleipzig-python wraps the web services of the Deutscher Wortschatz project at the University of Leipzig. Deutscher Wortschatz is a database of German text corpora that can be used to analyze words and place them in context. libleipzig currently supports all public service calls. These require no authentication and are free of charge for private or scientific purposes.

Attention!

libleipzig prefetches all service interfaces on initial load. This process requires an Internet connection.

Subsequent imports reuse the cached service definitions (WSDL files), which are kept indefinitely.
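The "fetch once, reuse afterwards" behaviour described above can be sketched as follows. This is not libleipzig's actual caching code: load_cached, fake_fetch and the file name are hypothetical names chosen for illustration, and the fake download stands in for a real WSDL request.

```python
import os
import tempfile


def load_cached(path, fetch):
    """Return the text stored at `path`; call `fetch()` only if it is missing."""
    if os.path.exists(path):
        with open(path, "r") as handle:
            return handle.read()
    text = fetch()  # first use only: in real use this step needs the network
    with open(path, "w") as handle:
        handle.write(text)
    return text


# Hypothetical stand-in for downloading a WSDL file; it records each call.
calls = []


def fake_fetch():
    calls.append(1)
    return "<definitions/>"


cache_file = os.path.join(tempfile.mkdtemp(), "Baseform.wsdl")
first = load_cached(cache_file, fake_fetch)   # fetches and writes the file
second = load_cached(cache_file, fake_fetch)  # served from disk, no fetch
```

After both calls, fake_fetch has run exactly once, which is why only the initial load needs an Internet connection.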

Example

>>> from libleipzig import * # might take some time initially
>>> r = Baseform(u"Schlangen")
>>> r # doctest: +NORMALIZE_WHITESPACE
[(Grundform: u'Schlange', Wortart: u'N'),
 (Grundform: u'Schlangen', Wortart: u'S')]
>>> r[0].Grundform
u'Schlange'
>>> help(Baseform) # doctest: +NORMALIZE_WHITESPACE
Help on function Baseform in module libleipzig.protocol:
Baseform(*vectors)
    Baseform(Wort) -> Grundform, Wortart
        Return the lemmatized (base) form.
>>>

Dependencies

Changelog

1.1
  • Bumped suds version to 0.3.9.
  • Fixed numerous unicode issues and pointed out potential pitfalls.
  • Fixed caching to be persistent but lazy.
  • Upgraded virtual environment to incremental build steps.
  • Pushed tests into installed package.

Reference

RightNeighbours(Wort, Limit)

Returns: Wort, Nachbar, Signifikanz

Return statistically significant right neighbours (words co-occurring immediately to the right of the input word).

Kreuzwortraetsel(Wort, Wortlaenge, Limit)

Returns: Wort

Return words that match the pattern Wort. The percentage sign (%) acts as a wildcard.
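To get a feeling for such patterns without calling the service, the matching can be imitated locally. The sketch below rests on two assumptions that this document does not confirm: that % stands for any run of characters (as in SQL LIKE), and that Wortlaenge fixes the total length of the word. matches_pattern is a hypothetical helper, not part of libleipzig.

```python
import re


def matches_pattern(word, pattern, length):
    """Check `word` against `pattern`; each % is assumed to match any run of characters."""
    if len(word) != length:
        return False
    # Escape the literal parts and put ".*" where the % signs were.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.match(regex + r"\Z", word) is not None


print(matches_pattern(u"Schlange", u"Sch%e", 8))   # starts with Sch, ends with e, 8 letters
print(matches_pattern(u"Schlangen", u"Sch%e", 9))  # ends with n, so no match
```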

Baseform(Wort)

Returns: Grundform, Wortart

Return the lemmatized (base) form.

Similarity(Wort, Limit)

Returns: Wort, Verwandter, Signifikanz

Return automatically computed, contextually similar words for the input word. Such words may be antonyms, hyperonyms, synonyms, cohyponyms or others. Because of the large amount of data, a query to this service may take a long time.

LeftNeighbours(Wort, Limit)

Returns: Nachbar, Wort, Signifikanz

Return statistically significant left neighbours (words co-occurring immediately to the left of the input word).

RightCollocationFinder(Wort, Wortart, Limit)

Returns: Wort, Kollokation, Wortart

Attempt to find linguistic collocations that occur to the right of the word. The Wortart parameter must be one of A, V, N or S, meaning adjective, verb, noun and stopword, respectively. It restricts the type of words found.
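Since only four Wortart codes are accepted, it can help to check the code before sending a query. The following is a minimal sketch; WORTART and check_wortart are hypothetical names and not part of libleipzig.

```python
# The four codes listed above, with their meanings.
WORTART = {
    "A": "adjective",
    "V": "verb",
    "N": "noun",
    "S": "stopword",
}


def check_wortart(code):
    """Return `code` unchanged if it is a known Wortart code, else raise ValueError."""
    if code not in WORTART:
        raise ValueError("Wortart must be one of A, V, N, S; got %r" % (code,))
    return code


print(WORTART[check_wortart("N")])
```

The checked code could then be passed on as the Wortart argument of either collocation finder.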

Frequencies(Wort)

Returns: Anzahl, Frequenzklasse

Return the frequency and frequency class. Frequency classes are computed in relation to the most frequent word in the corpus. The higher the class, the rarer the word.
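The relation between the two values can be illustrated with a small calculation. It assumes the base-2 definition commonly given for Wortschatz frequency classes (the most frequent word is roughly 2 to the power of the class times more frequent than the input word). The rounding used by the service is not stated here, so the rounding below is a guess, and frequency_class is a hypothetical helper.

```python
import math


def frequency_class(top_count, word_count):
    """Approximate class: log2 of (count of most frequent word / count of this word)."""
    ratio = float(top_count) / word_count
    # Round to the nearest integer; the service's exact rounding is an assumption.
    return int(math.floor(math.log(ratio, 2) + 0.5))


# Made-up counts: a word seen 125 times where the top word was seen 1000 times.
print(frequency_class(1000, 125))
```

Under this definition the most frequent word itself falls into class 0, and each step up in class means the word is about half as frequent.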

Thesaurus(Wort, Limit)

Returns: Synonym

Return synonyms, like the Synonyms service. Unlike Synonyms, this service lemmatizes the input word first and thus returns more synonyms.
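The "lemmatize first, then look up" idea can be imitated on the client side. The sketch below is only an illustration of that order of steps, not how the service works internally. The two lookup functions are passed in as parameters, and the stand-ins used here return plain strings and made-up data rather than real service results.

```python
def thesaurus_like(word, baseform, synonyms, limit=10):
    """Lemmatize `word` with `baseform`, then collect synonyms for each base form."""
    found = []
    for lemma in baseform(word):
        for synonym in synonyms(lemma, limit):
            if synonym not in found:  # keep the first occurrence only
                found.append(synonym)
    return found[:limit]


# Hypothetical stand-ins with made-up data, used instead of the real services.
def fake_baseform(word):
    return {u"Schlangen": [u"Schlange"]}.get(word, [word])


def fake_synonyms(word, limit):
    table = {u"Schlange": [u"Natter", u"Reptil"]}
    return table.get(word, [])[:limit]


direct = fake_synonyms(u"Schlangen", 10)  # inflected form: nothing found
via_lemma = thesaurus_like(u"Schlangen", fake_baseform, fake_synonyms)
```

With the stand-in data, the direct lookup of the inflected form finds nothing, while the lookup through the base form does.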

Synonyms(Wort, Limit)

Returns: Synonym

Return synonyms. In other words, this is a thesaurus.

Sentences(Wort, Limit)

Returns: Satz

Return sample sentences containing the input word.

LeftCollocationFinder(Wort, Wortart, Limit)

Returns: Kollokation, Wortart, Wort

Attempt to find linguistic collocations that occur to the left of the word. The Wortart parameter must be one of A, V, N or S, meaning adjective, verb, noun and stopword, respectively. It restricts the type of words found.

Wordforms(Word, Limit)

Returns: Form

Return all other word forms of the same lemma.

Sachgebiet(Wort)

Returns: Sachgebiet

Return the subject areas (categories) the word is assigned to.

Cooccurrences(Wort, Mindestsignifikanz, Limit)

Returns: Wort, Kookkurrenz, Signifikanz

Return statistically significant co-occurrences.
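The roles of Mindestsignifikanz and Limit can be illustrated on already fetched rows. The sketch assumes that Signifikanz is numeric and that larger values mean stronger co-occurrence. Row, strongest and the sample values are hypothetical and made up for illustration; they do not come from the service.

```python
from collections import namedtuple

# Hypothetical row type mirroring the three returned fields.
Row = namedtuple("Row", ["Wort", "Kookkurrenz", "Signifikanz"])


def strongest(rows, minimum, limit):
    """Keep rows with Signifikanz >= minimum, strongest first, at most `limit` rows."""
    kept = [row for row in rows if row.Signifikanz >= minimum]
    kept.sort(key=lambda row: row.Signifikanz, reverse=True)
    return kept[:limit]


# Made-up sample rows, not real service output.
rows = [
    Row(u"Schlange", u"Kasse", 12.0),
    Row(u"Schlange", u"lange", 40.0),
    Row(u"Schlange", u"Biss", 3.0),
]
top = strongest(rows, 10, 5)
```

Here the row with significance 3.0 falls below the minimum and is dropped, and the remaining two are ordered by descending significance.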