Custom text recoding using regular expressions.
Inputs:
Segmentation covering the text that should be recoded
JSON Message controlling the list of substitutions
Outputs:
Segmentation covering the recoded text
The Recode widget creates a modified copy of the input segmentation. The modifications applied are defined by substitutions, namely pairs composed of a regular expression and a replacement string. The interface of the Recode widget is available in two versions, according to whether or not the Advanced Settings checkbox is selected.
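As a rough illustration of what a single substitution does, the following Python sketch applies one regular expression and one replacement string to every segment and returns a modified copy. It relies on Python's standard `re` module; the function name `recode_segments` and the representation of segments as plain strings are assumptions made for this example, not the widget's actual implementation.

```python
import re

def recode_segments(segments, pattern, replacement):
    """Return a new list in which each segment has been recoded.

    `segments` is assumed to be a list of plain strings (hypothetical
    stand-in for a segmentation); the input list is left untouched.
    """
    regex = re.compile(pattern)
    return [regex.sub(replacement, segment) for segment in segments]

original = ["colour and flavour", "no match here"]
recoded = recode_segments(original, r"our\b", "or")
print(recoded)  # ['color and flavor', 'no match here']
```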
The basic version of the widget is limited to the application of a single substitution.
The advanced interface allows the user to define several substitutions and to determine the order in which they should be applied.
The Substitutions section allows the user to select the substitutions applied to each successive input segment and to determine their application order. The list of substitutions appears at the top of the window, showing for each one its regular expression, its replacement string, and the options associated with the regular expression.
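The application order matters because each substitution operates on the result of the previous one. A minimal sketch in Python, assuming substitutions are represented as (pattern, replacement) pairs and using a hypothetical helper `apply_substitutions`:

```python
import re

def apply_substitutions(text, substitutions):
    """Apply (pattern, replacement) pairs one after another, in list order."""
    for pattern, replacement in substitutions:
        text = re.sub(pattern, replacement, text)
    return text

# The same two substitutions, in two different orders.
first_order = [(r"a", "b"), (r"b", "c")]
second_order = [(r"b", "c"), (r"a", "b")]

result_first = apply_substitutions("ab", first_order)
result_second = apply_substitutions("ab", second_order)
print(result_first)   # cc
print(result_second)  # bc
```

In the first order, the "b" produced by the first substitution is rewritten again by the second one; in the second order it is not.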
The buttons on the left side of the Substitutions section allow the user to modify their selection by:
- changing the order in which the substitutions are applied: Move Up, Move Down
- deleting individual substitutions from the list: Remove
- clearing the list of all substitutions: Clear All
- importing a list of substitutions in JSON format and adding it to the previously selected substitutions: Import List
- exporting the list of substitutions to a JSON file: Export List
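To give an idea of what exporting and importing a list of substitutions could look like, here is a small Python sketch using the standard `json` module. The key names (`regex`, `replacement`, `ignore_case`) are hypothetical and chosen only for illustration; the format actually expected by the widget may differ.

```python
import json

# Hypothetical list of substitutions; the key names are illustrative only.
substitutions = [
    {"regex": r"ize\b", "replacement": "ise", "ignore_case": True},
    {"regex": r"\s+", "replacement": " ", "ignore_case": False},
]

# Roughly what an export might produce: a JSON string (or file content).
exported = json.dumps(substitutions, indent=2)

# Roughly what an import might do: parse the JSON back into a list.
imported = json.loads(exported)

# Imported substitutions are added to the existing list, not substituted for it.
current = [{"regex": "colour", "replacement": "color", "ignore_case": False}]
current.extend(imported)
print(len(current))  # 3
```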
Define a new substitution by using RegEx.
Define the replacement string.
Control the application of the corresponding options to the regular expression: Ignore case, Unicode dependent, Multiline, Dot matches.
Add a new substitution to the list.
Copy every annotation of the input segmentation to the output segmentation.
By clicking Send, changes are communicated to the output of the widget. Alternatively, tick Send automatically and changes will be communicated to the output at every modification.
Information about the number of segments in the output segmentation, or the reasons why no segmentation is emitted.
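The effect of the four regular expression options mentioned above (Ignore case, Unicode dependent, Multiline, Dot matches) can be illustrated with the corresponding flags of Python's standard `re` module. The mapping between the widget's checkboxes and these flags is an assumption made for this sketch. Note that in Python 3, Unicode-dependent matching is the default for text patterns, so the example contrasts it with `re.ASCII`.

```python
import re

text = "Hello\nhello world"

# Ignore case: "hello" also matches "Hello".
ignore_case = re.findall(r"hello", text, flags=re.IGNORECASE)
print(ignore_case)  # ['Hello', 'hello']

# Multiline: ^ matches at the start of every line, not only at the start of the text.
single_line = re.findall(r"^\w+", text)
multi_line = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(single_line)  # ['Hello']
print(multi_line)   # ['Hello', 'hello']

# Dot matches all: . also matches the newline character.
dot_default = re.search(r"Hello.hello", text)
dot_all = re.search(r"Hello.hello", text, flags=re.DOTALL)
print(dot_default is None)   # True
print(dot_all is not None)   # True

# Unicode dependent: \w covers accented letters unless restricted to ASCII.
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", flags=re.ASCII)
print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
```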
For the purpose of this example, we decided to adapt our American spelling for readers who prefer British English. We focused mainly on the spelling of the affix -ize (or -ise, which is preferred in British English). We used the Text Field widget to input several words spelled the American way and then segmented the resulting text into words. Lastly, we used the advanced interface of the Recode widget to replace all occurrences of the affix -ize with the affix -ise.
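The same workflow can be approximated outside the widget with a few lines of Python. This is only a sketch: the input words, the word-segmentation pattern `\w+`, and the substitution pattern `ize$` are assumptions chosen for illustration, not the exact settings used in the example above.

```python
import re

# Text as it might be typed into a text field (illustrative input).
text = "organize realize recognize"

# Segment the text into words.
words = re.findall(r"\w+", text)

# Replace a word-final -ize with -ise in each word segment.
recoded = [re.sub(r"ize$", "ise", word) for word in words]
print(recoded)  # ['organise', 'realise', 'recognise']
```

A pattern this simple also rewrites words in which -ize is not an affix (for instance, it would turn "size" into "sise"), so a real list of substitutions would probably need a more restrictive regular expression.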