|
Arabic module
Author: Taha Zerrouki
Contact: taha dot zerrouki at gmail dot com
Copyright: Arabtechies, Arabeyes, Taha Zerrouki
License: GPL
Date: 2010/03/01
Version: 0.1
|
|||
is letter functions | |||
---|---|---|---|
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
general letter functions | |||
integer |
|
||
unicode |
|
||
unicode |
|
||
Has letter functions | |||
Boolean |
|
||
word and text functions | |||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
Char functions | |||
unicode char |
|
||
unicode char |
|
||
unicode char |
|
||
unicode char |
|
||
Strip functions | |||
unicode. |
|
||
unicode. |
|
||
unicode. |
|
||
unicode. |
|
||
unicode. |
|
||
unicode. |
|
||
unicode. |
|
||
couple of unicode |
|
||
unicode |
|
||
Boolean |
|
||
Boolean |
|
||
Boolean |
|
||
unicode. |
|
||
Boolean / int |
|
||
list. |
|
|
|||
COMMA =
|
|||
SEMICOLON =
|
|||
QUESTION =
|
|||
HAMZA =
|
|||
ALEF_MADDA =
|
|||
ALEF_HAMZA_ABOVE =
|
|||
WAW_HAMZA =
|
|||
ALEF_HAMZA_BELOW =
|
|||
YEH_HAMZA =
|
|||
ALEF =
|
|||
BEH =
|
|||
TEH_MARBUTA =
|
|||
TEH =
|
|||
THEH =
|
|||
JEEM =
|
|||
HAH =
|
|||
KHAH =
|
|||
DAL =
|
|||
THAL =
|
|||
REH =
|
|||
ZAIN =
|
|||
SEEN =
|
|||
SHEEN =
|
|||
SAD =
|
|||
DAD =
|
|||
TAH =
|
|||
ZAH =
|
|||
AIN =
|
|||
GHAIN =
|
|||
TATWEEL =
|
|||
FEH =
|
|||
QAF =
|
|||
KAF =
|
|||
LAM =
|
|||
MEEM =
|
|||
NOON =
|
|||
HEH =
|
|||
WAW =
|
|||
ALEF_MAKSURA =
|
|||
YEH =
|
|||
MADDA_ABOVE =
|
|||
HAMZA_ABOVE =
|
|||
HAMZA_BELOW =
|
|||
ZERO =
|
|||
ONE =
|
|||
TWO =
|
|||
THREE =
|
|||
FOUR =
|
|||
FIVE =
|
|||
SIX =
|
|||
SEVEN =
|
|||
EIGHT =
|
|||
NINE =
|
|||
PERCENT =
|
|||
DECIMAL =
|
|||
THOUSANDS =
|
|||
STAR =
|
|||
MINI_ALEF =
|
|||
ALEF_WASLA =
|
|||
FULL_STOP =
|
|||
BYTE_ORDER_MARK =
|
|||
FATHATAN =
|
|||
DAMMATAN =
|
|||
KASRATAN =
|
|||
FATHA =
|
|||
DAMMA =
|
|||
KASRA =
|
|||
SHADDA =
|
|||
SUKUN =
|
|||
SMALL_ALEF =
|
|||
SMALL_WAW =
|
|||
SMALL_YEH =
|
|||
LAM_ALEF =
|
|||
LAM_ALEF_HAMZA_ABOVE =
|
|||
LAM_ALEF_HAMZA_BELOW =
|
|||
LAM_ALEF_MADDA_ABOVE =
|
|||
SIMPLE_LAM_ALEF =
|
|||
SIMPLE_LAM_ALEF_HAMZA_ABOVE =
|
|||
SIMPLE_LAM_ALEF_HAMZA_BELOW =
|
|||
SIMPLE_LAM_ALEF_MADDA_ABOVE =
|
|||
LETTERS =
|
|||
TASHKEEL =
|
|||
HARAKAT =
|
|||
SHORTHARAKAT =
|
|||
TANWIN =
|
|||
NOT_DEF_HARAKA =
|
|||
LIGUATURES =
|
|||
HAMZAT =
|
|||
ALEFAT =
|
|||
WEAK =
|
|||
YEHLIKE =
|
|||
WAWLIKE =
|
|||
TEHLIKE =
|
|||
SMALL =
|
|||
MOON =
|
|||
SUN =
|
|||
ALPHABETIC_ORDER =
|
|||
NAMES =
|
|||
HARAKAT_PATTERN = re.compile(r'
|
|||
LASTHARAKA_PATTERN = re.compile(r'
|
|||
SHORTHARAKAT_PATTERN = re.compile(r'
|
|||
TASHKEEL_PATTERN = re.compile(r'
|
|||
HAMZAT_PATTERN = re.compile(r'
|
|||
ALEFAT_PATTERN = re.compile(r'
|
|||
LIGUATURES_PATTERN = re.compile(r'
|
|||
TOKEN_PATTERN = re.compile(r'
|
|||
TOKEN_REPLACE = re.compile(r'
|
|||
__package__ =
|
|
Checks for Arabic Sukun Mark.
|
Checks for Arabic Shadda Mark.
|
Checks for Arabic Tatweel letter modifier.
|
Checks for Arabic Tanwin Marks (FATHATAN, DAMMATAN, KASRATAN).
|
Checks for Arabic Tashkeel Marks (
|
Checks for Arabic Harakat Marks (FATHA, DAMMA, KASRA, SUKUN, TANWIN).
|
Checks for Arabic short Harakat Marks (FATHA, DAMMA, KASRA, SUKUN).
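The mark checks above amount to membership tests against small groups of code points. A minimal sketch, assuming the constants hold the usual Unicode Arabic diacritics; the snake_case function names are hypothetical and are not taken from this module:

```python
# Sketch only: code points come from the Unicode Arabic block; the function
# names below are illustrative, not the module's own.
FATHATAN, DAMMATAN, KASRATAN = "\u064b", "\u064c", "\u064d"
FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"

TANWIN = (FATHATAN, DAMMATAN, KASRATAN)
SHORTHARAKAT = (FATHA, DAMMA, KASRA, SUKUN)
HARAKAT = SHORTHARAKAT + TANWIN  # Shadda is deliberately not a haraka here


def is_tanwin(char):
    """True for FATHATAN, DAMMATAN or KASRATAN."""
    return char in TANWIN


def is_shortharaka(char):
    """True for FATHA, DAMMA, KASRA or SUKUN."""
    return char in SHORTHARAKAT


def is_haraka(char):
    """True for any short haraka or tanwin mark."""
    return char in HARAKAT
```

Tuples keep the grouping readable; a frozenset would serve equally well if lookup speed mattered.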
|
Checks for Arabic Ligatures like LamAlef. (LAM_ALEF, LAM_ALEF_HAMZA_ABOVE, LAM_ALEF_HAMZA_BELOW, LAM_ALEF_MADDA_ABOVE)
|
Checks for Arabic Hamza forms. HAMZAT are (HAMZA, WAW_HAMZA, YEH_HAMZA, HAMZA_ABOVE, HAMZA_BELOW, ALEF_HAMZA_BELOW, ALEF_HAMZA_ABOVE )
|
Checks for Arabic Alef forms. ALEFAT = (ALEF, ALEF_MADDA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_WASLA, ALEF_MAKSURA )
|
Checks for Arabic Yeh forms. Yeh forms : YEH, YEH_HAMZA, SMALL_YEH, ALEF_MAKSURA
|
Checks for Arabic Waw like forms. Waw forms : WAW, WAW_HAMZA, SMALL_WAW
|
Checks for Arabic Teh forms. Teh forms : TEH, TEH_MARBUTA
|
Checks for Arabic Small letters. SMALL Letters : SMALL ALEF, SMALL WAW, SMALL YEH
|
Checks for Arabic Weak letters. Weak Letters : ALEF, WAW, YEH, ALEF_MAKSURA
|
Checks for Arabic Moon letters. Moon Letters :
|
Checks for Arabic Sun letters. Sun Letters :
|
Return the Arabic letter order, between 1 and 29. Alef is 1, Yeh is 28, Hamza is 29. Teh Marbuta has the same order as Teh, 3.
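The ordering described above can be sketched as a lookup table built from the 28-letter alphabet. The function name `letter_order` and the choice to return 0 for any other character are assumptions made for this illustration:

```python
# Sketch only: the 28 basic letters in alphabetical order, written as code points.
ALPHABET = (
    "\u0627\u0628\u062a\u062b\u062c\u062d\u062e"  # alef beh teh theh jeem hah khah
    "\u062f\u0630\u0631\u0632\u0633\u0634\u0635"  # dal thal reh zain seen sheen sad
    "\u0636\u0637\u0638\u0639\u063a\u0641\u0642"  # dad tah zah ain ghain feh qaf
    "\u0643\u0644\u0645\u0646\u0647\u0648\u064a"  # kaf lam meem noon heh waw yeh
)
TEH, TEH_MARBUTA, HAMZA = "\u062a", "\u0629", "\u0621"

ORDER = {letter: rank for rank, letter in enumerate(ALPHABET, start=1)}
ORDER[TEH_MARBUTA] = ORDER[TEH]  # Teh Marbuta shares rank 3 with Teh
ORDER[HAMZA] = 29                # Hamza comes after Yeh


def letter_order(char):
    """Return the rank of char, or 0 if it is not in the table (assumption)."""
    return ORDER.get(char, 0)
```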
|
Return the Arabic letter name, in Arabic.
|
Return a list of Arabic characters, from ، (U+060C) to ْ (U+0652).
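A range like this can be produced directly from the two end code points. This is a sketch under the assumption that every code point in between is included, whether or not it is an assigned letter; `arabic_range` is a hypothetical name:

```python
# Sketch only: bounds are ARABIC COMMA (U+060C) and ARABIC SUKUN (U+0652).
FIRST, LAST = 0x060C, 0x0652


def arabic_range():
    """Return every character from U+060C to U+0652 inclusive."""
    return [chr(code) for code in range(FIRST, LAST + 1)]
```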
|
Checks if the Arabic word contains a Shadda.
|
Checks if the Arabic word is vocalized. The word must not contain spaces or punctuation.
|
Checks if the Arabic text is vocalized. The text can contain several words and spaces.
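Both vocalization checks, and the Shadda check before them, reduce to scanning for diacritic code points. A minimal sketch, assuming "vocalized" means "contains at least one Tashkeel mark"; the names are hypothetical, and the sketch does not enforce the no-spaces precondition stated for single words:

```python
# Sketch only: Tashkeel taken as the eight marks U+064B..U+0652.
SHADDA = "\u0651"
TASHKEEL = frozenset("\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652")


def has_shadda(word):
    """True if the word contains a Shadda."""
    return SHADDA in word


def is_vocalized(text):
    """True if any character of the text is a Tashkeel mark (assumption)."""
    return any(char in TASHKEEL for char in text)
```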
|
Checks that a string uses only characters of the standard Arabic Unicode block. An Arabic string can contain spaces, digits and punctuation, but only standard Arabic characters, not extended Arabic.
|
Checks for Arabic Unicode block characters.
|
Checks for a valid Arabic word. An Arabic word does not contain spaces, digits or punctuation. To avoid some spelling errors, TEH_MARBUTA must be at the end.
|
Return the first char
|
Return the second char
|
Return the last letter. Example: zerrouki; 'i' is the last.
|
Return the second-to-last letter. Example: zerrouki; 'k' is the second-to-last.
|
Strip Harakat from an Arabic word, except Shadda. The stripped marks are :
Example:
>>> text = u"الْعَرَبِيّةُ"
>>> stripHarakat(text)
العربيّة
|
Strip the last Haraka from an Arabic word, except Shadda. The stripped marks are :
Example:
>>> text = u"الْعَرَبِيّةُ"
>>> stripLastHaraka(text)
الْعَرَبِيّة
|
Strip vowels from a text, including Shadda. The stripped marks are :
Example:
>>> text = u"الْعَرَبِيّةُ"
>>> stripTashkeel(text)
العربية
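The three stripping behaviours described above differ only in which marks are removed and where. A sketch using regular expressions, assuming Harakat means U+064B to U+0650 plus Sukun (U+0652) and Tashkeel adds Shadda (U+0651); the pattern and function names are hypothetical and are not the module's own compiled patterns:

```python
import re

# Sketch only: character classes are written from Unicode code points.
HARAKAT_RE = re.compile("[\u064b-\u0650\u0652]")        # all harakat, no Shadda
LAST_HARAKA_RE = re.compile("[\u064b-\u0650\u0652]$")   # one haraka at the end
TASHKEEL_RE = re.compile("[\u064b-\u0652]")             # harakat plus Shadda


def strip_harakat(text):
    """Remove every haraka but keep Shadda."""
    return HARAKAT_RE.sub("", text)


def strip_last_haraka(text):
    """Remove a single trailing haraka, if present."""
    return LAST_HARAKA_RE.sub("", text)


def strip_tashkeel(text):
    """Remove every haraka and every Shadda."""
    return TASHKEEL_RE.sub("", text)
```

Keeping Shadda out of the first two patterns matters because Shadda marks consonant doubling, which changes the word, whereas the other marks only record short vowels.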
|
Strip tatweel from a text and return the resulting text.
Example:
>>> text = u"العـــــربية"
>>> stripTatweel(text)
العربية
|
Strip Shadda from a text and return the resulting text.
Example:
>>> text = u"الشّمسيّة"
>>> stripShadda(text)
الشمسية
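Removing Tatweel or Shadda is a single-character deletion, so plain `str.replace` is enough. A minimal sketch with hypothetical function names:

```python
# Sketch only: TATWEEL is U+0640 (kashida), SHADDA is U+0651.
TATWEEL = "\u0640"
SHADDA = "\u0651"


def strip_tatweel(text):
    """Delete every Tatweel (elongation) character."""
    return text.replace(TATWEEL, "")


def strip_shadda(text):
    """Delete every Shadda mark."""
    return text.replace(SHADDA, "")
```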
|
Normalize Lam Alef ligatures into two letters (LAM and ALEF), and return the resulting text. Some systems represent the Lam Alef ligature as a single letter; this function converts it into two letters. The ligatures converted into LAM and ALEF are :
Example:
>>> text = u"لانها لالء الاسلام"
>>> normalizeLigature(text)
لانها لالئ الاسلام
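A sketch of the ligature normalization, assuming the ligatures are the four isolated Lam Alef presentation forms and that, as the description says, each becomes plain LAM followed by plain ALEF. The name `normalize_ligature` and the pattern are hypothetical:

```python
import re

LAM, ALEF = "\u0644", "\u0627"

# Sketch only: isolated presentation forms for Lam Alef (U+FEFB), with Hamza
# above (U+FEF7), with Hamza below (U+FEF9) and with Madda above (U+FEF5).
LIGATURES_RE = re.compile("[\ufefb\ufef7\ufef9\ufef5]")


def normalize_ligature(text):
    """Replace each Lam Alef ligature with the two letters LAM + ALEF."""
    return LIGATURES_RE.sub(LAM + ALEF, text)
```

Under this assumption the Hamza or Madda carried by a ligature is dropped; a variant that preserved it would map each ligature to LAM plus the matching Alef form instead.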
|
Standardize the Hamzat into one form of Hamza, and replace Madda by Hamza and Alef. Replace the LamAlefs by simplified letters.
Example:
>>> text = u"سئل أحد الأئمة"
>>> normalizeHamza(text)
سءل ءحد الءءمة
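The Hamza standardization can be sketched in two steps: expand ALEF_MADDA into HAMZA plus ALEF, then collapse every Hamza form listed under HAMZAT into the bare HAMZA. The function name is hypothetical, and the Lam Alef simplification mentioned above is left out of this sketch:

```python
import re

HAMZA, ALEF, ALEF_MADDA = "\u0621", "\u0627", "\u0622"

# Sketch only: HAMZA, ALEF_HAMZA_ABOVE, WAW_HAMZA, ALEF_HAMZA_BELOW,
# YEH_HAMZA, HAMZA_ABOVE and HAMZA_BELOW, written as code points.
HAMZAT_RE = re.compile("[\u0621\u0623\u0624\u0625\u0626\u0654\u0655]")


def normalize_hamza(word):
    """Map Madda to HAMZA + ALEF, then every Hamza form to bare HAMZA."""
    word = word.replace(ALEF_MADDA, HAMZA + ALEF)
    return HAMZAT_RE.sub(HAMZA, word)
```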
|
Separate the letters from the vowels in an Arabic word. If a letter has no haraka, the undefined haraka (NOT_DEF_HARAKA) is assigned to it. Returns (letters, vowels).
|
Join the letters with the marks. The lengths of letters and marks must be equal. Returns the word.
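The two operations above are inverses of each other, which a short sketch makes concrete. Several points here are assumptions: Tatweel stands in for the undefined haraka, Shadda is kept in the letter string rather than treated as a mark, and a length mismatch raises `ValueError`. The names `separate_marks` and `join_marks` are hypothetical:

```python
# Sketch only: the placeholder value and Shadda handling are assumptions.
HARAKAT = frozenset("\u064b\u064c\u064d\u064e\u064f\u0650\u0652")
NOT_DEF_HARAKA = "\u0640"  # Tatweel used as the "no haraka" placeholder


def separate_marks(word):
    """Split a word into (letters, marks), two strings of equal length."""
    letters, marks = [], []
    for char in word:
        if char in HARAKAT and letters:
            marks[-1] = char          # attach the mark to the previous letter
        else:
            letters.append(char)
            marks.append(NOT_DEF_HARAKA)
    return "".join(letters), "".join(marks)


def join_marks(letters, marks):
    """Interleave letters and marks, skipping placeholder marks."""
    if len(letters) != len(marks):
        raise ValueError("letters and marks must have the same length")
    return "".join(
        letter + ("" if mark == NOT_DEF_HARAKA else mark)
        for letter, mark in zip(letters, marks)
    )
```

For words with at most one haraka per letter, `join_marks(*separate_marks(word))` gives back the original word.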
|
If the two words have the same letters and the same harakat, this function returns True. The two words can be fully or partially vocalized.
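One way to read this comparison: the letters must match exactly, and wherever both words carry a haraka on the same letter, the two harakat must agree. The sketch below follows that reading and is not the module's actual algorithm; Shadda is treated as part of the letter sequence, and all names are hypothetical:

```python
# Sketch only: a missing haraka is compatible with any haraka (assumption).
HARAKAT = frozenset("\u064b\u064c\u064d\u064e\u064f\u0650\u0652")


def split_marks(word):
    """Return (letters, marks) lists; marks hold None where no haraka is given."""
    letters, marks = [], []
    for char in word:
        if char in HARAKAT and letters:
            marks[-1] = char
        else:
            letters.append(char)
            marks.append(None)
    return letters, marks


def vocalized_like(word1, word2):
    """True if letters are equal and no pair of given harakat conflicts."""
    letters1, marks1 = split_marks(word1)
    letters2, marks2 = split_marks(word2)
    if letters1 != letters2:
        return False
    return all(
        m1 is None or m2 is None or m1 == m2
        for m1, m2 in zip(marks1, marks2)
    )
```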
|
Checks if word1 matches a wazn (pattern). The letters must be equal, except that the FEH, AIN and LAM letters of the wazn act as generic letters. The two words can be fully or partially vocalized.
|
If the two words have the same letters and the same harakat, this function returns True. The first word is partially vocalized, the second is fully vocalized. If the partially vocalized word contains a Shadda, it must be at the same place in the fully vocalized word.
|
Reduce the Tashkeel, by deleting evident cases.
|
If the two words have the same letters and the same harakat, this function returns True. The two words can be fully or partially vocalized.
|
Tokenize text into words
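A tokenizer for vocalized Arabic has to keep diacritics attached to their word, because combining marks are not matched by `\w` in Python regular expressions. The sketch below handles that by adding the Tashkeel range to the word class; the pattern is a hypothetical stand-in for TOKEN_PATTERN, whose actual definition is not shown here, and emitting punctuation as separate tokens is a design assumption:

```python
import re

# Sketch only: a word is a run of word characters and Tashkeel marks
# (U+064B..U+0652); any other non-space character is its own token.
TOKEN_RE = re.compile("[\\w\u064b-\u0652]+|[^\\w\\s]")


def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return TOKEN_RE.findall(text)
```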
|
|
MOON
|
SUN
|
ALPHABETIC_ORDER
|
NAMES
|
HARAKAT_PATTERN
|
LASTHARAKA_PATTERN
|
SHORTHARAKAT_PATTERN
|
TASHKEEL_PATTERN
|
HAMZAT_PATTERN
|
ALEFAT_PATTERN
|
LIGUATURES_PATTERN
|
TOKEN_PATTERN
|
Generated by Epydoc 3.0.1 on Sun Feb 22 15:24:46 2015 | http://epydoc.sourceforge.net |