Package biblio :: Package webquery :: Module worldcat

Module worldcat

Querying WorldCat for bibliographic information and normalising the results.
Classes
  WorldcatQuery
Functions
 
_doctest()
 
parse_authors(auth_str)
Clean up Worldcat author information into a more consistent format.
 
parse_metadata(mdata_xml)
Retrieve fields from the metadata, clean them up and return them in a sensible form.
 
parse_title(title)
Clean up Worldcat title information into a more consistent format.
Variables
  AND_PAT = re.compile(r'\s+and\s+')
  STRIP_PATS = [re.compile(r'(?iu)^((edited )?by\s+)'), re.compi...
  WORLDCAT_ROOTURL = 'http://xisbn.worldcat.org/webservices/xid/...
  x = '\\([^\\)]+\\)'
Function Details

parse_authors(auth_str)

 

Clean up Worldcat author information into a more consistent format.

Worldcat data can be irregularly formatted and may unpredictably include ancillary information. This function attempts to clean up the author field into a list of consistently formatted author names.

For example:

>>> parse_authors("Leonard Richardson and Sam Ruby.")
['Richardson, Leonard', 'Ruby, Sam']
>>> parse_authors("Ann Thomson.")
['Thomson, Ann']
>>> parse_authors("Stephen P. Schoenberger, Bali Pulendran, editors.")
['Schoenberger, Stephen P.', 'Pulendran, Bali']
>>> parse_authors("Madonna")
['Madonna']
Parameters:
  • auth_str (string) - The "author" attribute from a Worldcat record in XML.
Returns:
A list of the author names in "reverse" format (family name first), e.g. ['Smith, A. B.', 'Jones, X. Y.'].
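
The clean-up described above can be sketched roughly as follows. This is not the module's actual implementation, which is not shown on this page: the function names sketch_parse_authors and reverse_name are hypothetical, the pattern list is only the visible (truncated) part of STRIP_PATS, and the rule for dropping a trailing full stop is an assumption chosen to reproduce the documented examples.

```python
import re

# Copied from the visible part of the module's variables. The real
# STRIP_PATS list is truncated in the documentation, so this is a subset.
AND_PAT = re.compile(r'\s+and\s+')
VISIBLE_STRIP_PATS = [
    re.compile(r'(?iu)^((edited )?by\s+)'),
    re.compile(r'(?iu)\s*, editors\.?$'),
    re.compile(r'(?iu)^editors,?\s*'),
    re.compile(r'(?iu)\s*;\s+with an introduction by .*$'),
    re.compile(r'(?iu)^\[\s*'),
    re.compile(r'(?iu)\s*\]$'),
    re.compile(r'(?iu)\.{3,}'),
    re.compile(r'(?iu)et[\. ]al\.'),
]

# Assumption: a final full stop is noise unless it follows an initial,
# i.e. it is only removed when preceded by at least two word characters.
TRAILING_STOP_PAT = re.compile(r'(?<=\w\w)\.$')


def reverse_name(name):
    """Turn 'Given Names Family' into 'Family, Given Names' (hypothetical)."""
    parts = name.split()
    if len(parts) < 2:
        return name  # single names such as 'Madonna' are left unchanged
    return '%s, %s' % (parts[-1], ' '.join(parts[:-1]))


def sketch_parse_authors(auth_str):
    """Approximate the documented behaviour of parse_authors."""
    cleaned = auth_str.strip()
    # Remove ancillary text such as ', editors.' or a leading 'by '.
    for pat in VISIBLE_STRIP_PATS:
        cleaned = pat.sub('', cleaned).strip()
    cleaned = TRAILING_STOP_PAT.sub('', cleaned)
    # Treat ' and ' like a comma, then split into individual names.
    cleaned = AND_PAT.sub(', ', cleaned)
    names = [n.strip() for n in cleaned.split(',')]
    return [reverse_name(n) for n in names if n]


print(sketch_parse_authors("Stephen P. Schoenberger, Bali Pulendran, editors."))
```

One limitation of this sketch: because it splits on commas, input that is already in "Family, Given" form would be split incorrectly.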

parse_metadata(mdata_xml)

 
Retrieve fields from the metadata, clean them up and return them in a sensible form.
Parameters:
  • mdata_xml (string) - A Worldcat record in XML.
Returns:
A dictionary with the keys "year", "title" and "authors", parsed from the Worldcat record. If a field is missing or cannot be parsed, its key is omitted from the dictionary.
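
Because keys may be absent from the returned dictionary, callers should not index it directly. The following is a minimal sketch of defensive handling. The helper describe_record and the sample dictionary are hypothetical, and the type of the "year" value is not stated in this documentation, so it is formatted generically.

```python
def describe_record(record):
    """Format a parse_metadata-style dictionary, tolerating absent keys."""
    title = record.get('title', '[no title]')
    authors = '; '.join(record.get('authors', [])) or '[no authors]'
    year = record.get('year', 'n.d.')
    return '%s (%s). %s' % (authors, year, title)


# Hypothetical result for a record whose year could not be parsed,
# so the 'year' key is simply missing.
example = {
    'title': 'Example Title',
    'authors': ['Richardson, Leonard', 'Ruby, Sam'],
}
print(describe_record(example))
```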

parse_title(title)

 

Clean up Worldcat title information into a more consistent format.

Although this currently does nothing, in the future it will normalise titles, e.g. by stripping out subtitle and edition information.


Variables Details

STRIP_PATS

Value:
[re.compile(r'(?iu)^((edited )?by\s+)'),
 re.compile(r'(?iu)\s*, editors\.?$'),
 re.compile(r'(?iu)^editors,?\s*'),
 re.compile(r'(?iu)\s*;\s+with an introduction by .*$'),
 re.compile(r'(?iu)^\[\s*'),
 re.compile(r'(?iu)\s*\]$'),
 re.compile(r'(?iu)\.{3,}'),
 re.compile(r'(?iu)et[\. ]al\.'),
...

WORLDCAT_ROOTURL

Value:
'http://xisbn.worldcat.org/webservices/xid/isbn/'
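
The root URL ends in "isbn/", which suggests that a query address is formed by appending an ISBN. The sketch below illustrates that idea only. The helper build_query_url is hypothetical, this page does not show how WorldcatQuery actually builds its requests, and any query parameters the service may require are not covered. The sample ISBN is arbitrary.

```python
WORLDCAT_ROOTURL = 'http://xisbn.worldcat.org/webservices/xid/isbn/'


def build_query_url(isbn, root=WORLDCAT_ROOTURL):
    """Append a normalised ISBN to the root URL (hypothetical helper)."""
    # Keep only digits and a possible 'X' check character, dropping
    # hyphens and spaces.
    chars = ''.join(ch for ch in isbn.upper() if ch.isdigit() or ch == 'X')
    # Only the length is checked here; the check digit is not validated.
    if len(chars) not in (10, 13):
        raise ValueError('expected a 10- or 13-character ISBN, got %r' % isbn)
    return root + chars


print(build_query_url('0-306-40615-2'))
```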