Module stylotool.src.freestylo.EpiphoraAnnotation

Classes

class EpiphoraAnnotation (text: freestylo.TextObject.TextObject, min_length=2, conj=['and', 'or', 'but', 'nor'], punct_pos='PUNCT')

This class is used to find epiphora candidates in a text. It uses the TextObject class to store the text and its annotations.

Constructor for the EpiphoraAnnotation class.

Parameters

text : TextObject
The text to be analyzed.
min_length : int, optional
The minimum length of the epiphora candidates.
conj : list, optional
A list of conjunctions that should be considered when looking for epiphora.
punct_pos : str, optional
The part of speech tag for punctuation.
Expand source code
class EpiphoraAnnotation:
    """
    This class is used to find epiphora candidates in a text.
    It uses the TextObject class to store the text and its annotations.
    """
    def __init__(self, text : TextObject, min_length=2, conj = ["and", "or", "but", "nor"], punct_pos="PUNCT"):
        """
        Constructor for the EpiphoraAnnotation class.

        Parameters
        ----------
        text : TextObject
            The text to be analyzed.
        min_length : int, optional
            The minimum length of the epiphora candidates.
        conj : list, optional
            A list of conjunctions that should be considered when looking for epiphora.
        punct_pos : str, optional
            The part of speech tag for punctuation.
        """

        self.text = text
        self.candidates = []
        self.min_length = min_length
        self.conj = conj
        self.punct_pos = punct_pos

    def split_in_phrases(self):
        """
        This method splits the text into phrases.

        Returns
        -------
        list
            A list of lists, each containing the start and end index of a phrase.
        """
            
        phrases = []
        current_start = 0
        for i, token in enumerate(self.text.tokens):
            if token in self.conj or self.text.pos[i] == self.punct_pos:
                if i-current_start > 2:
                    phrases.append([current_start, i])
                    current_start = i+1
        phrases.append([current_start, len(self.text.tokens)])
        return phrases


    def find_candidates(self):
        """
        This method finds epiphora candidates in the text.
        """
        candidates = []
        current_candidate = EpiphoraCandidate([], "")
        phrases = self.split_in_phrases()
        for phrase in phrases:
            word = self.text.tokens[phrase[1]-1]
            if word != current_candidate.word:
                if len(current_candidate.ids) >= self.min_length:
                    candidates.append(current_candidate)
                current_candidate = EpiphoraCandidate([phrase], word)
            else:
                current_candidate.ids.append(phrase)
        self.candidates = candidates

    def serialize(self) -> list:
        """
        This method serializes the epiphora candidates.

        Returns
        -------
        list
            A list of dictionaries, each containing the ids, length, and word of an epiphora candidate.
        """
        candidates = []
        for c in self.candidates:
            candidates.append({
                "ids": c.ids,
                "length": c.length,
                "word": c.word})
        return candidates

Methods

def find_candidates(self)

This method finds epiphora candidates in the text.

def serialize(self) ‑> list

This method serializes the epiphora candidates.

Returns

list
A list of dictionaries, each containing the ids, length, and word of an epiphora candidate.
def split_in_phrases(self)

This method splits the text into phrases.

Returns

list
A list of lists, each containing the start and end index of a phrase.
class EpiphoraCandidate (ids, word)

This class represents an epiphora candidate.

Constructor for the EpiphoraCandidate class.

Parameters

ids : list
A list of token ids that form the candidate.
word : str
The word that the candidate ends with.
Expand source code
class EpiphoraCandidate():
    """
    This class represents an epiphora candidate.
    """
    def __init__(self, ids, word):
        """
        Constructor for the EpiphoraCandidate class.

        Parameters
        ----------
        ids : list
            A list of token ids that form the candidate.
        word : str
            The word that the candidate ends with.
        """
        self.ids = ids
        self.word = word

    @property
    def score(self):
        """
        This property returns the score of the candidate.
        """
        return len(self.ids)

Instance variables

prop score

This property returns the score of the candidate.

Expand source code
@property
def score(self):
    """
    This property returns the score of the candidate.
    """
    return len(self.ids)