hashformers.segmenter package
Submodules
hashformers.segmenter.segmenter module
- class hashformers.segmenter.segmenter.BaseWordSegmenter(segmenter=None, reranker=None, ensembler=None)
Bases: hashformers.segmenter.base_segmenter.BaseSegmenter
A general-purpose word segmentation API.
- segment(word_list: List[str], segmenter_run: Optional[Any] = None, preprocessing_kwargs: dict = {}, segmenter_kwargs: dict = {}, ensembler_kwargs: dict = {}, reranker_kwargs: dict = {}, use_reranker: bool = True, use_ensembler: bool = True, return_ranks: bool = False) -> Any
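The constructor arguments (segmenter, reranker, ensembler) and the use_reranker, use_ensembler and return_ranks flags suggest a staged pipeline. The sketch below is one plausible reading of that flow, not the library's implementation: ToyWordSegmenter, toy_segmenter and toy_reranker are hypothetical stand-ins, and the order in which the stages are combined is an assumption.

```python
from typing import Callable, List, Optional

# Assumption: each stage works on, per input word, a list of candidate
# segmentations ordered best-first. The real library may use other types.
Candidates = List[List[str]]


class ToyWordSegmenter:
    """Hypothetical stand-in that mirrors the documented constructor arguments."""

    def __init__(
        self,
        segmenter: Callable[[List[str]], Candidates],
        reranker: Optional[Callable[[Candidates], Candidates]] = None,
        ensembler: Optional[Callable[[Candidates, Candidates], Candidates]] = None,
    ) -> None:
        self.segmenter = segmenter
        self.reranker = reranker
        self.ensembler = ensembler

    def segment(
        self,
        word_list: List[str],
        use_reranker: bool = True,
        use_ensembler: bool = True,
        return_ranks: bool = False,
    ):
        # Stage 1: generate candidates for every word.
        segmenter_run = self.segmenter(word_list)
        ranks = segmenter_run
        # Stage 2 (optional): reorder the candidates.
        if use_reranker and self.reranker is not None:
            reranker_run = self.reranker(segmenter_run)
            ranks = reranker_run
            # Stage 3 (optional): combine both rankings.
            if use_ensembler and self.ensembler is not None:
                ranks = self.ensembler(segmenter_run, reranker_run)
        if return_ranks:
            return ranks
        # Default: keep only the top candidate for each word.
        return [candidates[0] for candidates in ranks]


def toy_segmenter(words: List[str]) -> Candidates:
    # Two candidates per word: unsplit, then split after three characters.
    return [[w, w[:3] + " " + w[3:]] for w in words]


def toy_reranker(run: Candidates) -> Candidates:
    # Prefer candidates that contain more spaces.
    return [sorted(c, key=lambda s: -s.count(" ")) for c in run]
```

With these stand-ins, ToyWordSegmenter(toy_segmenter, reranker=toy_reranker).segment(["icecream"]) picks the reranker's top candidate, while passing use_reranker=False falls back to the segmenter's own ordering.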
- class hashformers.segmenter.segmenter.TweetSegmenter(matcher=None, word_segmenter=None)
Bases: hashformers.segmenter.base_segmenter.BaseSegmenter
- build_hashtag_container(tweets: str, preprocessing_kwargs: dict = {}, segmenter_kwargs: dict = {})
- compile_dict(hashtags, segmentations, hashtag_token=None, lower=False, separator=' ', hashtag_character='#')
- extract_hashtags(tweets)
- replace_hashtags(tweet, regex_pattern, replacement_dict)
- segment(tweets: List[str], regex_flag: Any = 0, preprocessing_kwargs: dict = {}, segmenter_kwargs: dict = {})
- segmented_tweet_generator(tweets, hashtags, hashtag_set, replacement_dict, flag=0)
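The method names above suggest an extract, map, replace flow for hashtags inside tweets. The following is a minimal standard-library sketch of that idea, assuming a simple `#\w+` hashtag pattern. It does not import hashformers: the function bodies are illustrative guesses that borrow the documented names and a subset of their parameters, and build_pattern is a hypothetical helper that is not part of the listing.

```python
import re
from typing import Dict, List

# Assumption: a simple stand-in pattern. TwitterTextMatcher may match differently.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(tweets: List[str]) -> List[List[str]]:
    """Return the hashtags (without the leading '#') found in each tweet."""
    return [HASHTAG_PATTERN.findall(tweet) for tweet in tweets]


def compile_dict(
    hashtags: List[str],
    segmentations: List[str],
    lower: bool = False,
    separator: str = " ",
    hashtag_character: str = "#",
) -> Dict[str, str]:
    """Map each hashtag, with its prefix character, to its segmented form."""
    replacement_dict = {}
    for tag, segmentation in zip(hashtags, segmentations):
        value = separator.join(segmentation.split())
        if lower:
            value = value.lower()
        replacement_dict[hashtag_character + tag] = value
    return replacement_dict


def build_pattern(replacement_dict: Dict[str, str]) -> "re.Pattern[str]":
    """Hypothetical helper: one regex that matches any known hashtag."""
    # Longest keys first, and a lookahead so '#ice' does not match inside '#icecream'.
    keys = sorted(replacement_dict, key=len, reverse=True)
    return re.compile("(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)")


def replace_hashtags(tweet: str, regex_pattern, replacement_dict: Dict[str, str]) -> str:
    """Substitute every known hashtag in the tweet with its segmentation."""
    return regex_pattern.sub(
        lambda match: replacement_dict.get(match.group(0), match.group(0)), tweet
    )
```

A usage sketch: build the mapping with compile_dict(["icecream"], ["ice cream"]), compile it with build_pattern, then call replace_hashtags on each tweet. Hashtags that are absent from the mapping are left untouched.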
- class hashformers.segmenter.segmenter.TwitterTextMatcher
Bases: object