Package biblio :: Package webquery :: Module utils

Module utils


Various utilities.
Functions
 
normalize_isbn(isbn)
Remove formatting from an ISBN, making it suitable for web-queries.
 
parse_single_name(name_str)
Clean up an individual name into a more consistent format.
 
parse_names(name_str)
Clean up a list of names into a more consistent format.
 
parse_editing_info(name_str)
Detect whether names are editors and return the names with any editing information removed.
 
parse_publisher(pub_str)
Parse a string of publisher information.
 
_doctest()
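normalize_isbn has no detailed entry on this page. The sketch below shows one plausible reading of "remove formatting": keep only digits and an 'X' check character. The name normalize_isbn_sketch and the exact character set kept are assumptions, not the module's actual implementation.

```python
import re

# Assumption: "formatting" means hyphens, spaces, an "ISBN" prefix and
# similar noise; only digits and an X check character are kept.
_NON_ISBN_CHARS = re.compile(r'[^0-9Xx]')


def normalize_isbn_sketch(isbn):
    """Hypothetical stand-in for normalize_isbn: strip separators from an ISBN."""
    # Remove every character that cannot appear in a bare ISBN,
    # then upper-case a lower-case 'x' check digit.
    return _NON_ISBN_CHARS.sub('', isbn).upper()


print(normalize_isbn_sketch('0-596-52926-0'))      # 0596529260
print(normalize_isbn_sketch('978 0 596 52926 0'))  # 9780596529260
```

This does not validate the check digit; it only produces a compact string suitable for building a query URL.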
Variables
  EDITOR_PATS = [re.compile(r'(?iu)^edited by\s+'), re.compile(r...
  STRIP_PATS = [re.compile(r'(?iu)^by\s+'), re.compile(r'(?iu)\s...
  AND_PAT = re.compile(r'\s+and\s+')
  COLLAPSE_SPACE_RE = re.compile(r'\s+')
  PUBLISHER_RES = [re.compile(r'(?iu)^(?P<city>.*)\s*:\s*(?P<pub...
  p = '^(?P<pub>.*)\\.?$'
  x = '\\s*;.*$'
Function Details

parse_names(name_str)


Clean up a list of names into a more consistent format.

Xisbn data can be irregularly formatted and may unpredictably include ancillary information. This function attempts to clean up the author field and split it into a list of consistently formatted author names.

For example:

>>> n = parse_names("Leonard Richardson and Sam Ruby.")
>>> print(n[0].family == 'Richardson')
True
>>> print(n[0].given == 'Leonard')
True
>>> print(not n[0].other)
True
>>> n = parse_names("Stephen P. Schoenberger, Bali Pulendran")
>>> print(n[0].family == 'Schoenberger')
True
>>> print(n[0].given == 'Stephen')
True
>>> print(n[0].other == 'P.')
True
>>> n = parse_names("Madonna")
>>> print(not n[0].family)
True
>>> print(n[0].given == 'Madonna')
True
>>> print(not n[0].other)
True
Parameters:
  • name_str (string) - The "author" attribute from a Xisbn record in XML.
Returns:
A list of the authors in "reverse" format, e.g. "['Smith, A. B.', 'Jones, X. Y.']"
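The splitting described above can be sketched as follows. AND_PAT and COLLAPSE_SPACE_RE are copied from the module's variable list; the Name tuple, parse_names_sketch and parse_single_name_sketch are hypothetical helpers that only mirror the given/other/family attributes used in the doctests. The real function also applies STRIP_PATS and may split differently.

```python
import re
from collections import namedtuple

# Patterns as listed in this module's Variables section.
AND_PAT = re.compile(r'\s+and\s+')
COLLAPSE_SPACE_RE = re.compile(r'\s+')

# Hypothetical result type exposing the attributes the doctests check.
Name = namedtuple('Name', ['given', 'other', 'family'])


def parse_single_name_sketch(name_str):
    """Split one 'Given Middle Family' name into its parts."""
    parts = COLLAPSE_SPACE_RE.sub(' ', name_str).strip().split(' ')
    if len(parts) == 1:
        # A lone name such as "Madonna" is treated as a given name.
        return Name(parts[0], '', '')
    # First word is the given name, last is the family name,
    # and anything in between (e.g. initials) goes into 'other'.
    return Name(parts[0], ' '.join(parts[1:-1]), parts[-1])


def parse_names_sketch(name_str):
    """Split a list of names on 'and' and commas, then parse each one."""
    cleaned = name_str.strip()
    # Drop a single trailing full stop, e.g. "... and Sam Ruby."
    if cleaned.endswith('.'):
        cleaned = cleaned[:-1]
    # Treat " and " like a comma, then split on commas.
    pieces = AND_PAT.sub(',', cleaned).split(',')
    return [parse_single_name_sketch(p) for p in pieces if p.strip()]


print(parse_names_sketch("Stephen P. Schoenberger, Bali Pulendran"))
```

Because commas are treated as name separators, this sketch would mis-handle names already given in "Family, Given" order; handling that case needs extra heuristics.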

parse_editing_info(name_str)


Detect whether names are editors and return the names with any editing information removed.

Returns:
A pair: whether editing information was recognised, and the name string with that editing information removed.

For example:

>>> parse_editing_info("Leonard Richardson and Sam Ruby.")
(False, 'Leonard Richardson and Sam Ruby.')
>>> parse_editing_info("Ann Thomson.")
(False, 'Ann Thomson.')
>>> parse_editing_info("Stephen P. Schoenberger, Bali Pulendran, editors.")
(True, 'Stephen P. Schoenberger, Bali Pulendran')
>>> parse_editing_info("Madonna")
(False, 'Madonna')
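One way to implement this detection is to try each of the EDITOR_PATS (values copied from the Variables Details section) and report whether any of them removed text. The function name parse_editing_info_sketch and the choice to apply every pattern in turn are assumptions; the module may stop at the first match.

```python
import re

# EDITOR_PATS as listed in this module's Variables Details.
EDITOR_PATS = [
    re.compile(r'(?iu)^edited by\s+'),
    re.compile(r'(?iu)\s*, editors\.?$'),
    re.compile(r'(?iu)^editors,?\s*'),
]


def parse_editing_info_sketch(name_str):
    """Return (edited, name_str with any editor phrase removed)."""
    edited = False
    for pat in EDITOR_PATS:
        # subn returns the new string and the number of substitutions made.
        name_str, count = pat.subn('', name_str)
        edited = edited or count > 0
    return edited, name_str


print(parse_editing_info_sketch("Stephen P. Schoenberger, Bali Pulendran, editors."))
```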

parse_publisher(pub_str)


Parse a string of publisher information.

As with author names, publication details are often inconsistently set out, even in bibliographic data. This function attempts to parse out and normalise the details.

For example:

>>> parse_publisher('New York: Asia Pub. House, c1979.')
('Asia Pub. House', 'New York', '1979')
>>> parse_publisher('New York : LearningExpress, 1999.')
('LearningExpress', 'New York', '1999')
>>> parse_publisher('HarperTorch')
('HarperTorch', '', '')
>>> parse_publisher('Berkeley Heights, NJ: Enslow Publishers, c2000.')
('Enslow Publishers', 'Berkeley Heights, NJ', '2000')
Parameters:
  • pub_str (string) - text giving publisher details.
Returns:
A tuple of strings, being (<publisher>, <city of publication>, <year of publication>). If a value is not available, an empty string is returned in its place.
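The parsing can be sketched by trying each of the PUBLISHER_RES patterns (values from the Variables Details section) in order and taking the named groups from the first match. parse_publisher_sketch is a hypothetical name, and the whitespace stripping is an assumption needed because the greedy city group can keep a trailing space, as in 'New York : ...'.

```python
import re

# PUBLISHER_RES as listed in this module's Variables Details.
PUBLISHER_RES = [
    re.compile(r'(?iu)^(?P<city>.*)\s*:\s*(?P<pub>.*)\s*,\s*c?(?P<year>\d{4})\.?$'),
    re.compile(r'(?iu)^(?P<pub>.*)\.?$'),
]


def parse_publisher_sketch(pub_str):
    """Return (publisher, city, year), using '' for missing values."""
    text = pub_str.strip()
    for pat in PUBLISHER_RES:
        match = pat.match(text)
        if match:
            fields = match.groupdict()
            # Groups absent from a pattern (e.g. 'city' in the fallback)
            # become empty strings; strip stray whitespace from the rest.
            return tuple((fields.get(key) or '').strip()
                         for key in ('pub', 'city', 'year'))
    return text, '', ''


print(parse_publisher_sketch('New York : LearningExpress, 1999.'))
```

The first pattern handles the "City: Publisher, [c]Year" layout; the second is a catch-all that treats the whole string as the publisher.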

Variables Details

EDITOR_PATS

Value:
[re.compile(r'(?iu)^edited by\s+'),
 re.compile(r'(?iu)\s*, editors\.?$'),
 re.compile(r'(?iu)^editors,?\s*')]

STRIP_PATS

Value:
[re.compile(r'(?iu)^by\s+'),
 re.compile(r'(?iu)\s*;\s+with an introduction by .*$'),
 re.compile(r'(?iu)^\[\s*'),
 re.compile(r'(?iu)\s*\]$'),
 re.compile(r'(?iu)\.{3,}'),
 re.compile(r'(?iu)et[\. ]al\.'),
 re.compile(r'(?iu)\['),
 re.compile(r'(?iu)\]'),
...
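To illustrate what STRIP_PATS removes, the sketch below applies only the entries visible above (the full list is truncated) to a name string, in list order. The helper name strip_ancillary_sketch and the sequential application are assumptions about how the module uses these patterns.

```python
import re

# Only the STRIP_PATS entries shown above; the real list is longer.
VISIBLE_STRIP_PATS = [
    re.compile(r'(?iu)^by\s+'),
    re.compile(r'(?iu)\s*;\s+with an introduction by .*$'),
    re.compile(r'(?iu)^\[\s*'),
    re.compile(r'(?iu)\s*\]$'),
    re.compile(r'(?iu)\.{3,}'),
    re.compile(r'(?iu)et[\. ]al\.'),
    re.compile(r'(?iu)\['),
    re.compile(r'(?iu)\]'),
]


def strip_ancillary_sketch(name_str):
    """Remove 'by', introductions, brackets, ellipses and 'et al.'."""
    for pat in VISIBLE_STRIP_PATS:
        name_str = pat.sub('', name_str)
    return name_str.strip()


print(strip_ancillary_sketch("by Jane Austen ; with an introduction by Tony Tanner"))
```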

PUBLISHER_RES

Value:
[re.compile(r'(?iu)^(?P<city>.*)\s*:\s*(?P<pub>.*)\s*,\s*c?(?P<year>\d{4})\.?$'),
 re.compile(r'(?iu)^(?P<pub>.*)\.?$')]