hashformers.segmenter package

Submodules

hashformers.segmenter.segmenter module

class hashformers.segmenter.segmenter.BaseWordSegmenter(segmenter=None, reranker=None, ensembler=None)

Bases: hashformers.segmenter.base_segmenter.BaseSegmenter

A general-purpose word segmentation API.

segment(word_list: List[str], segmenter_run: Optional[Any] = None, preprocessing_kwargs: dict = {}, segmenter_kwargs: dict = {}, ensembler_kwargs: dict = {}, reranker_kwargs: dict = {}, use_reranker: bool = True, use_ensembler: bool = True, return_ranks: bool = False) → Any
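
The constructor arguments (segmenter, reranker, ensembler) and the use_reranker / use_ensembler flags suggest a three-stage flow. The sketch below is one plausible reading of those names, not the library's implementation: it does not import hashformers, and toy_segmenter, toy_reranker, toy_ensembler and the score convention (lower is better) are all hypothetical stand-ins invented for illustration.

```python
from typing import Dict, List

# Hypothetical type: candidate segmentation -> score (lower is better).
Candidates = Dict[str, float]


def toy_segmenter(word: str) -> Candidates:
    """Propose every two-way split of `word`; balanced splits score lower."""
    return {
        f"{word[:i]} {word[i:]}": float(abs(len(word) - 2 * i))
        for i in range(1, len(word))
    }


def toy_reranker(candidates: Candidates) -> Candidates:
    """Rescore candidates by counting tokens missing from a tiny vocabulary."""
    vocab = {"ice", "cold", "new", "york"}  # assumption: illustrative only
    return {
        cand: float(sum(tok not in vocab for tok in cand.split()))
        for cand in candidates
    }


def toy_ensembler(first: Candidates, second: Candidates) -> Candidates:
    """Combine two score tables by summing the scores of each candidate."""
    return {cand: first[cand] + second[cand] for cand in first}


def segment(
    word_list: List[str],
    use_reranker: bool = True,
    use_ensembler: bool = True,
) -> List[str]:
    """Return the best-scoring candidate for each word."""
    results = []
    for word in word_list:
        scores = toy_segmenter(word)
        if not scores:  # single-character or empty input: nothing to split
            results.append(word)
            continue
        if use_reranker:
            reranked = toy_reranker(scores)
            # With the ensembler off, the reranker scores are used on their own.
            scores = toy_ensembler(scores, reranked) if use_ensembler else reranked
        results.append(min(scores, key=scores.get))
    return results
```

With these toy components, segment(["coldice"]) returns ["cold ice"], while segment(["coldice"], use_reranker=False) returns ["col dice"], because the first stage alone cannot tell the two balanced splits apart.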

class hashformers.segmenter.segmenter.TweetSegmenter(matcher=None, word_segmenter=None)

Bases: hashformers.segmenter.base_segmenter.BaseSegmenter

build_hashtag_container(tweets: str, preprocessing_kwargs: dict = {}, segmenter_kwargs: dict = {})
compile_dict(hashtags, segmentations, hashtag_token=None, lower=False, separator=' ', hashtag_character='#')
extract_hashtags(tweets)
replace_hashtags(tweet, regex_pattern, replacement_dict)
segment(tweets: List[str], regex_flag: Any = 0, preprocessing_kwargs: dict = {}, segmenter_kwargs: dict = {})
segmented_tweet_generator(tweets, hashtags, hashtag_set, replacement_dict, flag=0)
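
The method names above (extract_hashtags, compile_dict, replace_hashtags) suggest an extract, segment, substitute workflow. The following is a minimal standalone sketch of that idea using only the standard re module. It is not the library's code: the function names find_hashtags, build_replacements and apply_replacements, and the simplified hashtag pattern, are assumptions made for illustration.

```python
import re
from typing import Dict, List

# Assumption: a simplified pattern; real Twitter hashtag rules are more involved.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def find_hashtags(tweet: str) -> List[str]:
    """Return the hashtags in `tweet` without the leading '#'."""
    return HASHTAG_PATTERN.findall(tweet)


def build_replacements(
    hashtags: List[str],
    segmentations: List[str],
    hashtag_character: str = "#",
) -> Dict[str, str]:
    """Map each full hashtag (with '#') to its segmented form."""
    return {
        hashtag_character + tag: seg
        for tag, seg in zip(hashtags, segmentations)
    }


def apply_replacements(tweet: str, replacements: Dict[str, str]) -> str:
    """Substitute every known hashtag in `tweet` with its segmentation."""
    if not replacements:
        return tweet
    # Longest keys first, plus a lookahead, so "#ab" never matches inside "#abc".
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        "(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: replacements[m.group(0)], tweet)


tweet = "Loving the #icecold weather in #newyork"
tags = find_hashtags(tweet)
# In practice the segmentations would come from a word segmenter;
# here they are hard-coded.
mapping = build_replacements(tags, ["ice cold", "new york"])
segmented = apply_replacements(tweet, mapping)
```

Here segmented holds "Loving the ice cold weather in new york". Building a single alternation pattern keeps the substitution to one pass over each tweet.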

class hashformers.segmenter.segmenter.TwitterTextMatcher

Bases: object

class hashformers.segmenter.segmenter.WordSegmenterCascade(cascade_nodes)

Bases: hashformers.segmenter.base_segmenter.BaseSegmenter

generate_pipeline(word_list)
segment(word_list, **kwargs)
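
The page does not describe how cascade_nodes are combined. One common reading of "cascade" is that each node consumes the previous node's output. The sketch below illustrates only that general pattern: run_cascade, split_camel_case and split_digits are hypothetical names, and nothing here is taken from the hashformers source.

```python
import re
from typing import Callable, List

# Hypothetical node type: takes a list of strings, returns a list of strings.
Node = Callable[[List[str]], List[str]]


def split_camel_case(words: List[str]) -> List[str]:
    """Insert a space wherever a lowercase letter is followed by an uppercase one."""
    return [re.sub(r"(?<=[a-z])(?=[A-Z])", " ", w) for w in words]


def split_digits(words: List[str]) -> List[str]:
    """Insert a space wherever a letter is followed by a digit."""
    return [re.sub(r"(?<=[A-Za-z])(?=\d)", " ", w) for w in words]


def run_cascade(nodes: List[Node], word_list: List[str]) -> List[str]:
    """Feed the word list through each node in order."""
    current = list(word_list)
    for node in nodes:
        current = node(current)
    return current


result = run_cascade([split_camel_case, split_digits], ["summerSale2024"])
```

In this sketch result is ["summer Sale 2024"]: the first node separates "summer" from "Sale2024", and the second separates the trailing digits.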

Module contents