lexnlp.extract.es package
Subpackages
Submodules
lexnlp.extract.es.copyrights module
- class lexnlp.extract.es.copyrights.CopyrightEsParser
Bases: lexnlp.extract.common.copyrights.copyright_en_style_parser.CopyrightEnStyleParser
- classmethod extract_phrases_with_coords(sentence: str) → List[Tuple[str, int, int]]
- static init_parser()
- line_processor = <lexnlp.utils.lines_processing.line_processor.LineProcessor object>
- lexnlp.extract.es.copyrights.get_copyright_annotations(text: str, return_sources=False) → Generator[[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation, None], None]
- lexnlp.extract.es.copyrights.get_copyright_list(text: str, return_sources=False) → List[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation]
- lexnlp.extract.es.copyrights.get_copyrights(text: str, return_sources=False) → Generator[[dict, None], None]
lexnlp.extract.es.courts module
Court extraction for Spanish.
This module implements extraction functionality for courts in Spain, including formal names, abbreviations, and aliases.
- lexnlp.extract.es.courts.get_court_annotations(text: str, language: str = None) → Generator[[dict, None], None]
- lexnlp.extract.es.courts.get_courts(text: str, court_config_list: List[Tuple[int, str, int, List[Tuple[str, str, bool, int]]]], priority: bool = False, text_languages: List[str] = None) → Generator[[Tuple[Tuple, Tuple], Any], Any]
See lexnlp/extract/en/tests/test_courts.py.
- lexnlp.extract.es.courts.setup_es_parser()
lexnlp.extract.es.dates module
Date extraction for Spanish. The date parser is based on the dateparser package.
- class lexnlp.extract.es.dates.ESDateParser(text=None, language='en', dateparser_settings=None, enable_classifier_check=None, classifier_model=None, classifier_threshold=None)
Bases: lexnlp.extract.common.dates.DateParser
- DATEPARSER_SETTINGS = {'DATE_ORDER': 'DMY', 'PREFER_DAY_OF_MONTH': 'first', 'STRICT_PARSING': False}
- ENABLE_CLASSIFIER_CHECK = False
- SEQUENTIAL_DATES_RE = regex.Regex('(?P<text>(?P<day>\\d{1,2}) de (?P<month>septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|nov|oct|sep|set)(?:, | y | de (?P<year>\\d{4})))', flags=regex.I | regex.M | regex.V0)
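The pattern in SEQUENTIAL_DATES_RE can be tried in isolation. The following is a minimal sketch, assuming the standard-library re module behaves like the third-party regex module for this pattern; the sample sentence and variable names are illustrative and not part of lexnlp. It shows that in a run of dates only the last one carries a year group, while the earlier ones match with year unset. How ESDateParser fills in the missing years is not described in this reference.

```python
import re

# Month alternation copied from SEQUENTIAL_DATES_RE.
MONTHS = (
    "septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|"
    "enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|"
    "nov|oct|sep|set"
)

# Assumption: stdlib `re` stands in for the `regex` package used by lexnlp.
sequential_dates_re = re.compile(
    r"(?P<text>(?P<day>\d{1,2}) de (?P<month>" + MONTHS + r")"
    r"(?:, | y | de (?P<year>\d{4})))",
    re.I | re.M,
)

# Illustrative input: three dates sharing one trailing year.
sample = "el 3 de marzo, 5 de abril y 7 de mayo de 2020"

# Collect (day, month, year) for each match; year is None when absent.
matches = [
    (m.group("day"), m.group("month"), m.group("year"))
    for m in sequential_dates_re.finditer(sample)
]
print(matches)
```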
- WEIRD_DATES_NORM = [(regex.Regex('(\\d+º\\s?de (?:septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|nov|oct|sep|set)(?: de \\d{4})?)', flags=regex.I | regex.M | regex.V0), <function ESDateParser.<lambda>>)]
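WEIRD_DATES_NORM pairs a pattern for ordinal-style dates (for example "1º de enero de 2020") with a normalizing function whose body is not shown in this reference. The sketch below reuses the documented pattern with the standard-library re module; strip_ordinal is a hypothetical stand-in for the undocumented lambda and simply removes the ordinal marker. It is an assumption about what such a normalizer might do, not the library's behavior.

```python
import re

# Month alternation copied from WEIRD_DATES_NORM.
MONTHS = (
    "septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|"
    "enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|"
    "nov|oct|sep|set"
)

# Assumption: stdlib `re` stands in for the `regex` package used by lexnlp.
weird_date_re = re.compile(
    r"(\d+º\s?de (?:" + MONTHS + r")(?: de \d{4})?)",
    re.I | re.M,
)


def strip_ordinal(match):
    """Hypothetical normalizer: replace the ordinal marker with one space."""
    return re.sub(r"º\s?", " ", match.group(1), count=1)


# Illustrative input, not taken from lexnlp.
sample = "Firmado el 1º de enero de 2020 en Madrid."

found = weird_date_re.findall(sample)
normalized = weird_date_re.sub(strip_ordinal, sample)
print(found)
print(normalized)
```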
- get_extra_dates()
Add custom search logic. Uses self.TEXT, self.LANGUAGE and self.DATES; updates self.DATES.
Returns: None
lexnlp.extract.es.definitions module
- class lexnlp.extract.es.definitions.SpanishParsingMethods
Bases: object
The class contains methods that all share the signature def method_name(phrase: str) -> List[DefinitionMatch]. The methods are used to find definition “candidates”.
- static match_es_def_by_hereafter(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]
Parameters: phrase – las instrucciones de uso o instalación del software o todas las descripciones de uso del mismo (de aquí en adelante, la “Documentación”);
Returns: {name: ‘Documentación’, probability: 100, …}
- static match_es_def_by_reffered(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]
Parameters: phrase – En este acuerdo, el término “Software” se refiere a: (i) el programa informático que acompaña a este Acuerdo y todos sus componentes;
Returns: definitions (objects)
- static match_first_word_is(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]
Parameters: phrase – El tabaquismo es la adicción al tabaco, provocada principalmente.
Returns: definitions (objects)
- reg_first_word_is = re.compile('^.+?(?=es\\s+\\w+\\W+\\w+|está\\s+\\w+\\W+\\w+)')
- reg_hereafter = re.compile('(?<=(en adelante[,\\s]))[\\w\\s*\\"*]+')
- reg_reffered = re.compile('^.+(?=se refiere)')
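The three class-level patterns are plain re expressions, so they can be exercised directly. The sketch below recompiles them as listed and applies each to a shortened version of the example phrase from the matching method. It only shows what the raw patterns capture; how the match_* methods turn these captures into PatternFound objects is not covered here. The sample phrases use straight double quotes, since reg_hereafter accepts only that quote character.

```python
import re

# Patterns copied from the SpanishParsingMethods attributes.
reg_first_word_is = re.compile('^.+?(?=es\\s+\\w+\\W+\\w+|está\\s+\\w+\\W+\\w+)')
reg_hereafter = re.compile('(?<=(en adelante[,\\s]))[\\w\\s*\\"*]+')
reg_reffered = re.compile('^.+(?=se refiere)')

# "X es ...": everything before the first "es <word> <word>" is the candidate.
first_word = reg_first_word_is.search(
    "El tabaquismo es la adicción al tabaco, provocada principalmente."
).group(0).strip()

# "(de aquí en adelante, ...)": the text following "en adelante," is the candidate.
hereafter = reg_hereafter.search(
    'todas las descripciones de uso del mismo (de aquí en adelante, la "Documentación");'
).group(0).strip()

# "... se refiere a ...": everything before "se refiere" is the candidate.
reffered = reg_reffered.search(
    'En este acuerdo, el término "Software" se refiere a: (i) el programa informático'
).group(0).strip()

print(first_word)
print(hereafter)
print(reffered)
```

The raw captures still include surrounding words such as "la" or "el término", so some further trimming is needed to reach a bare term like "Documentación".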
- lexnlp.extract.es.definitions.get_definition_annotations(text: str, language=None) → Generator[[lexnlp.extract.common.annotations.definition_annotation.DefinitionAnnotation, None], None]
- lexnlp.extract.es.definitions.get_definition_list(text: str, language=None) → List[lexnlp.extract.common.annotations.definition_annotation.DefinitionAnnotation]
- lexnlp.extract.es.definitions.get_definitions(text: str, language=None) → Generator[[dict, None], None]
- lexnlp.extract.es.definitions.make_es_definitions_parser()
lexnlp.extract.es.language_tokens module
lexnlp.extract.es.regulations module
- class lexnlp.extract.es.regulations.RegulationsParser(regulations_dataframe: pandas.core.frame.DataFrame = None)
Bases: object
Parses Spanish regulations (acts, institutions and so on). For example, “la emisión de instrumentos inscritos en el Registro Nacional de Valores, colocados” boils down to ‘Registro Nacional de Valores’. The parser expects trigger words such as ‘registro’, ‘comisión’, ‘comision’ or ‘ley del’ to open the phrase that follows.
- get_annotations_as_dictionaries() → List
- load_trigger_words() → None
- match_start_trigger(phrase: str) → None
Parameters: phrase – mediante la emisión de instrumentos inscritos en el Registro Nacional de Valores, colocados
Returns: {name: ‘Registro Nacional de Valores’, probability: 100, …}
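The trigger-word idea described for RegulationsParser can be illustrated with a small standalone sketch. Everything in it is hypothetical: TRIGGER_WORDS holds only the four words quoted in the class description (the real list comes from load_trigger_words()), and the pattern and find_regulation_names helper are assumptions made for illustration, not lexnlp's implementation. The sketch treats a trigger word plus the capitalized words that follow it as the regulation name.

```python
import re

# Trigger words quoted in the class description; the real list is loaded by
# load_trigger_words() and may differ.
TRIGGER_WORDS = ("registro", "comisión", "comision", "ley del")

# Hypothetical pattern: a case-insensitive trigger followed by one or more
# capitalized words, optionally joined by "de", "del" or "de la".
trigger_re = re.compile(
    r"\b(?i:" + "|".join(TRIGGER_WORDS) + r")"
    r"(?:\s+(?:(?:de la|del|de)\s+)?[A-ZÁÉÍÓÚÑ]\w*)+"
)


def find_regulation_names(phrase):
    """Return every trigger-led capitalized phrase found in `phrase`."""
    return [m.group(0) for m in trigger_re.finditer(phrase)]


# Example phrase from the match_start_trigger documentation.
phrase = (
    "mediante la emisión de instrumentos inscritos en el "
    "Registro Nacional de Valores, colocados"
)
names = find_regulation_names(phrase)
print(names)
```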
- parse(text: str, locale: str = None) → List[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation]
- setup_regexes() → None
- trim_annotations() → None
- lexnlp.extract.es.regulations.get_regulation_annotations(text: str, language: str = None) → Generator[[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation, None], None]
- lexnlp.extract.es.regulations.get_regulation_list(text: str, language: str = None) → List[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation]
- lexnlp.extract.es.regulations.get_regulations(text: str, language: str = None) → Generator[[dict, None], None]
- lexnlp.extract.es.regulations.make_de_regulations_parser()