Library Reference
This page documents how to use CiteURL in your Python projects.
The first step is to instantiate a Citator, which by default contains all of CiteURL's built-in Templates:
```python
from citeurl import Citator
citator = Citator()
```
After that, you can pass text to its list_citations() method, which returns a list of the citations it finds:
text = """
Federal law provides that courts should award prevailing civil rights plaintiffs reasonable attorneys fees, 42 USC § 1988(b), and, by discretion, expert fees, id. at (c). This is because the importance of civil rights litigation cannot be measured by a damages judgment. See Riverside v. Rivera, 477 U.S. 561 (1986). But Evans v. Jeff D. upheld a settlement where the plaintiffs got everything they wanted, on condition that they waive attorneys' fees. 475 U.S. 717 (1986). This ruling lets savvy defendants create a wedge between plaintiffs and their attorneys, discouraging civil rights suits and undermining the court's logic in Riverside, 477 U.S. at 574-78.
"""
citations = citator.list_citations(text)
Once you have a list of citations, you can get information about each one:
```python
print(citations[0].text)
# 42 USC § 1988(b)
print(citations[0].tokens)
# {'title': '42', 'section': '1988', 'subsection': '(b)'}
print(citations[0].URL)
# https://www.law.cornell.edu/uscode/text/42/1988#b
```
You can also use insert_links() to insert the citations back into the source text as HTML hyperlinks:
```python
from citeurl import insert_links
output = insert_links(citations, text)
```
Or, you can use list_authorities() to combine all the citations into a list of all the authorities cited in the text:
```python
from citeurl import list_authorities
authorities = list_authorities(citations)
for authority in authorities:
    auth_cites = authority.citations
    print(f"{authority} was cited {len(auth_cites)} time(s)")
# 42 USC § 1988 was cited 2 time(s)
# 477 U.S. 561 was cited 2 time(s)
# 475 U.S. 717 was cited 1 time(s)
```
Citator
CiteURL's main class: a collection of templates, plus the tools to apply them to a text and find every kind of citation they recognize.
Attributes:
Name | Type | Description |
---|---|---|
templates | list | A list of template objects that this citator will try to match against. |
generic_id | str | A common regex the citator will append to each template when it is loaded, to recognize a simple citation to the most-recently cited source. |
__init__(self, yaml_paths=[], defaults=True, generic_id='\\b(Ib)?[Ii]d\\.(<\\/(i|em|u)>)?')
Calls load_yaml one or more times to load the citator with templates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
defaults | bool | Whether to load CiteURL's default templates. | True |
yaml_paths | list | Paths to additional YAML files with templates that should be loaded to supplement or replace the defaults. | [] |
generic_id | str | A common regex to append to all templates, to recognize a simple citation to the most-recently cited source. Detects "id." or "ibid." by default. To disable, set to None. | '\\b(Ib)?[Ii]d\\.(<\\/(i\|em\|u)>)?' |
Source code in citeurl/__init__.py
```python
def __init__(
    self,
    yaml_paths: list=[],
    defaults: bool=True,
    generic_id: str=GENERIC_ID
):
    """
    Calls load_yaml one or more times, to load the citator with
    templates.
    Arguments:
        defaults: Whether to load CiteURL's default templates
        yaml_paths: paths to additional YAML files with templates that
            should be loaded to supplement or replace the defaults.
        generic_id: a common regex to append to all templates, to
            recognize a simple citation to the most-recently cited
            source. Detects "id." or "ibid." by default. To
            disable, set to None.
    """
    self.generic_id: str = generic_id
    self.templates: list = []
    if defaults:
        self.load_yaml(DEFAULT_YAML_PATH)
    for path in yaml_paths:
        self.load_yaml(path)
```
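With the default generic_id, a citator recognizes "id.", "Id.", "Ibid.", and an "Id." immediately followed by a closing </i>, </em>, or </u> tag (as happens when the citation is italicized in HTML). The sketch below is not part of CiteURL: it only compiles the documented default pattern with Python's re module and tries it on a few strings. The helper name find_generic_id is hypothetical.

```python
import re

# The documented default value of generic_id, written as a raw string.
GENERIC_ID = r'\b(Ib)?[Ii]d\.(<\/(i|em|u)>)?'
GENERIC_ID_RE = re.compile(GENERIC_ID)


def find_generic_id(text):
    """Return the first "id."-style match in text, or None (hypothetical helper)."""
    match = GENERIC_ID_RE.search(text)
    return match.group(0) if match else None


for sample in ['See id. at 5.', 'Ibid.', '<i>Id.</i> at 12', 'He did. Then']:
    print(repr(sample), '->', find_generic_id(sample))
```

The leading \b is what stops the "id." inside a word such as "did." from matching.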
insert_links(self, text, attrs={'class': 'citation'}, url_optional=False, link_detailed_ids=True, link_plain_ids=False, id_break_regex='L\\. ?Rev\\.|J\\. ?Law|\\. ?([Cc]ode|[Cc]onst)', id_break_indices=[])
Convenience method to return a copy of the given text, with citation hyperlinks inserted.
If you plan to do more than just insert links, it's better to get a list of citations with list_citations first, then insert those links with the module-wide insert_links function.
Source code in citeurl/__init__.py
```python
def insert_links(
    self,
    text: str,
    attrs: dict={'class': 'citation'},
    url_optional: bool=False,
    link_detailed_ids: bool=True,
    link_plain_ids: bool=False,
    id_break_regex: str=DEFAULT_ID_BREAKS,
    id_break_indices: list=[]) -> str:
    """
    Convenience method to return a copy of the given text, with
    citation hyperlinks inserted.
    If you plan to do more than just insert links, it's better to
    get a list of citations with list_citations first, then insert
    those links with the module-wide insert_links function.
    """
    citations = self.list_citations(
        text,
        id_break_regex=id_break_regex,
        id_break_indices=id_break_indices
    )
    return insert_links(
        citations,
        text,
        attrs=attrs,
        link_detailed_ids=link_detailed_ids,
        link_plain_ids=link_plain_ids,
        url_optional=url_optional
    )
```
list_authorities(self, text)
Convenience method to list all the authorities cited in a given text.
If you plan to do more than list authorities, it's better to get a list of citations with list_citations, then list the unique authorities with the module-wide list_authorities function.
Source code in citeurl/__init__.py
```python
def list_authorities(self, text: str) -> list:
    """
    Convenience method to list all the authorities cited in a
    given text.
    If you plan to do more than list authorities, it's better to
    get a list of citations with list_citations, then list the
    unique authorities with the module-wide list_authorities
    function.
    """
    citations = self.list_citations(text)
    return list_authorities(citations)
```
list_citations(self, text, id_forms=True, id_break_regex='L\\. ?Rev\\.|J\\. ?Law|\\. ?([Cc]ode|[Cc]onst)', id_break_indices=[])
Scan a text and return a list of all citations in it, in order of appearance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id_forms | bool | Whether to detect citations like "Id." and "Id. at 30." | True |
id_break_regex | str | A pattern to look for in the text. Any occurrence of the pattern will interrupt a chain of "id." citations as if it were another citation. | 'L\\. ?Rev\\.\|J\\. ?Law\|\\. ?([Cc]ode\|[Cc]onst)' |
id_break_indices | list | A list of positions in the text where "id." citations should be interrupted. | [] |
Returns:
Type | Description |
---|---|
list | A list of citation objects, in order of appearance in the text. |
Source code in citeurl/__init__.py
```python
def list_citations(self,
    text: str,
    id_forms: bool=True,
    id_break_regex: str=DEFAULT_ID_BREAKS,
    id_break_indices: list=[],
) -> list:
    """
    Scan a text and return a list of all citations in it, in
    order of appearance.
    Arguments:
        id_forms: Whether to detect citations like "Id." and
            "Id. at 30."
        id_break_regex: A pattern to look for in the text. Any
            occurrence of the pattern will interrupt a chain of
            "id." citations as if it were another citation.
        id_break_indices: A list of positions in the text
            where "id." citations should be interrupted
    Returns:
        A list of citation objects, in order of appearance in the
        text.
    """
    # First, get full citations:
    citations = []
    for template in self.templates:
        citations += template.get_citations(text)
    shortform_cites = []
    # Then, add shortforms
    for citation in citations:
        shortform_cites += citation._get_shortform_citations(text)
    citations += shortform_cites
    citations = _sort_and_remove_overlaps(citations)
    if not id_forms: # no need to proceed
        return citations
    # determine where to break chains of id. citations
    for citation in citations: # break at full or short citations
        id_break_indices.append(citation.span[0])
    if id_break_regex: #also break at specified regexes
        matches = re.compile(id_break_regex).finditer(text)
        for match in matches:
            id_break_indices.append(match.span()[0])
    id_break_indices = sorted(set(id_break_indices))
    # loop through all citations to find their id citations
    id_citations = []
    for citation in citations:
        # find the next id break point
        i = -1
        for index in id_break_indices:
            i += 1
            if index > citation.span[1]:
                end_point = index
                break
        else:
            end_point = None
        id_break_indices = id_break_indices[i:]
        # get each citation's id citations until the break point
        id_citations += citation._get_id_citations(
            text, end_point=end_point
        )
    return _sort_and_remove_overlaps(citations + id_citations)
```
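list_citations ends each chain of "id." citations at the first break point after the citation it refers to. Break points are the start of every full or short-form citation, plus the start of every match of id_break_regex; the default regex catches strings like "L. Rev." and "J. Law", which mark law review citations that the templates may not recognize. The stdlib-only sketch below reproduces that search with bisect instead of the manual loop above. The helper names and the cite_starts argument are hypothetical.

```python
import bisect
import re

# The documented default value of id_break_regex.
DEFAULT_ID_BREAKS = r'L\. ?Rev\.|J\. ?Law|\. ?([Cc]ode|[Cc]onst)'


def id_break_points(text, cite_starts, id_break_regex=DEFAULT_ID_BREAKS):
    """Sorted positions where a chain of "id." citations should stop.

    cite_starts is a hypothetical stand-in for the start offsets of the
    full and short-form citations already found in the text.
    """
    breaks = set(cite_starts)  # fresh set: the caller's list is not modified
    if id_break_regex:
        breaks.update(m.start() for m in re.finditer(id_break_regex, text))
    return sorted(breaks)


def next_break(breaks, cite_end):
    """First break position strictly after cite_end, or None."""
    i = bisect.bisect_right(breaks, cite_end)
    return breaks[i] if i < len(breaks) else None


text = '1 U.S. 1. Id. 2 Yale L. Rev. 3. Id.'
breaks = id_break_points(text, cite_starts=[0])
print(breaks)                 # [0, 21]: the citation start and "L. Rev."
print(next_break(breaks, 8))  # 21: an "id." chain after "1 U.S. 1" stops here
```

One detail of the listed code is worth knowing: list_citations appends to the id_break_indices list it receives, and that parameter's default is a single shared [], so break positions from one call appear to carry over into later calls that rely on the default. Passing a new list explicitly avoids this; the sketch builds a fresh set instead.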
load_yaml(self, path, use_generic_id=True)
Import templates from the specified YAML file into the citator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the YAML file to load. | required |
use_generic_id | bool | Whether to append the citator's generic_id citation format to the loaded templates. | True |
Source code in citeurl/__init__.py
```python
def load_yaml(self, path: str, use_generic_id: bool=True):
    """
    Import templates from the specified YAML file into the citator.
    Arguments:
        path: path to the YAML file to load
        use_generic_id: Whether to append the citator's generic_id
            citation format to the loaded templates.
    """
    yaml_text = Path(path).read_text()
    yaml_dict = safe_load(yaml_text)
    # read each item in the YAML into a new template
    for template_name, template_data in yaml_dict.items():
        # if regex is specified in singular form, convert it to a
        # list with one item, for sake of consistency with multiple-
        # regex templates.
        for key in ['regex', 'broadRegex']:
            if key in template_data:
                template_data[key + 'es'] = [template_data.pop(key)]
        # unrelated: if an individual regex is given as a list of
        # strings (convenient for reusing YAML anchors), concatenate
        # it to one string.
        for key in ['regexes', 'broadRegexes', 'idForms', 'shortForms']:
            if key not in template_data:
                continue
            for i, regex in enumerate(template_data[key]):
                if type(regex) is list:
                    template_data[key][i] = ''.join(regex)
        # make the template and add it to the citator, adding the
        # generic id-form citation if applicable
        new_template = Template(name=template_name, **template_data)
        if (
            use_generic_id and self.generic_id
            # weird bug: without this next check, the generic id
            # gets added upwards of 44 times even though the code
            # appears to run only once
            and self.generic_id not in new_template.idForms
        ):
            new_template.idForms.append(self.generic_id)
        self.templates.append(new_template)
```
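Before building each Template, load_yaml normalizes the YAML data in two steps: a singular regex or broadRegex becomes a one-item regexes or broadRegexes list, and any regex written as a list of strings (handy for reusing YAML anchors) is joined into one string. The standalone sketch below applies the same two steps to a plain dictionary like one yaml.safe_load could return. normalize_template_data and the sample template are hypothetical, and unlike the listed code the sketch works on a copy instead of modifying its input.

```python
def normalize_template_data(template_data):
    """Hypothetical helper mirroring load_yaml's normalization steps."""
    data = dict(template_data)  # shallow copy; the input dict is not modified
    # Step 1: a singular 'regex' or 'broadRegex' becomes a one-item list.
    for key in ['regex', 'broadRegex']:
        if key in data:
            data[key + 'es'] = [data.pop(key)]
    # Step 2: a regex written as a list of strings is joined into one string.
    for key in ['regexes', 'broadRegexes', 'idForms', 'shortForms']:
        if key in data:
            data[key] = [
                ''.join(item) if isinstance(item, list) else item
                for item in data[key]
            ]
    return data


# A hypothetical template whose regex is assembled from reusable pieces.
raw = {
    'regex': [r'(?P<volume>\d+) ', r'U\.S\. ', r'(?P<page>\d+)'],
    'idForms': [r'[Ii]d\. at (?P<pincite>\d+)'],
}
data = normalize_template_data(raw)
print(data['regexes'])
```

As for the "weird bug" noted in the listing, a likely cause is visible in Template.__init__ below: idForms defaults to a shared [], and the wrapping loop skips empty lists, so every template loaded without idForms keeps a reference to that same default list. Appending the generic id to one of them appends it to the list they all share, and without the not-in check it would be appended once per such template.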
lookup(self, query, broad=True)
Convenience method to get the first citation from the first matching template, or None.
This is meant for cases where false positives are not an issue, so it uses broadRegex and case-insensitive matching by default.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
broad |
bool |
Whether to use case-insensitive regex matching and, if available, each template's broadRegex. |
True |
query |
str |
The text to scan for a citation |
required |
Returns:
Type | Description |
---|---|
Citation |
A single citation object, or None |
Source code in citeurl/__init__.py
```python
def lookup(self, query: str, broad: bool=True) -> Citation:
    """
    Convenience method to get the first citation from the first
    matching template, or None.
    This is meant for cases where false positives are not an issue,
    so it uses broadRegex and case-insensitive matching by default.
    Arguments:
        broad: Whether to use case-insensitive regex matching and,
            if available, each template's broadRegex.
        query: The text to scan for a citation
    Returns:
        A single citation object, or None
    """
    for template in self.templates:
        citation = next(template.get_citations(query, broad=broad), None)
        if citation:
            return citation
    return None
```
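Citator.lookup returns the first citation from the first template that matches; with broad=True it matches case-insensitively and also tries each template's broadRegexes. The sketch below models that search order with plain re patterns, assuming (as the index arithmetic in get_citations suggests) that broad regexes are tried after the regular ones. MiniTemplate and its regexes are hypothetical, not CiteURL's built-in templates, and unlike the real method the sketch returns a (template name, matched text) pair rather than a Citation.

```python
import re
from typing import NamedTuple, Optional, Tuple


class MiniTemplate(NamedTuple):
    """Hypothetical stand-in for a Template: a name plus regex strings."""
    name: str
    regexes: Tuple[str, ...]
    broad_regexes: Tuple[str, ...] = ()


def lookup(templates, query, broad=True) -> Optional[Tuple[str, str]]:
    """Return (template name, matched text) for the first hit, or None."""
    flags = re.IGNORECASE if broad else 0
    for template in templates:
        # regular regexes first, then the broad ones when broad=True
        patterns = list(template.regexes)
        if broad:
            patterns += template.broad_regexes
        for pattern in patterns:
            match = re.search(pattern, query, flags)
            if match:
                return template.name, match.group(0)
    return None


TEMPLATES = [
    MiniTemplate(
        name='U.S. Reports',
        regexes=(r'(?P<volume>\d+) U\.S\. (?P<page>\d+)',),
        broad_regexes=(r'(?P<volume>\d+) US (?P<page>\d+)',),
    ),
]
print(lookup(TEMPLATES, '477 us 561'))
print(lookup(TEMPLATES, '477 us 561', broad=False))
```

The broad form is what lets loosely typed input such as "477 us 561" resolve, which is also why it is only suitable where false positives do not matter.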
Template
A pattern to recognize a single kind of citation and generate URLs from matches.
In most cases, it is more useful to use the Citator class to load templates from YAML files and apply them en masse, rather than use the Template class directly.
__init__(self, name, regexes, URL=None, broadRegexes=None, idForms=[], shortForms=[], defaults={}, operations=[], parent_citation=None, _is_id=False)
Template constructor. Primarily meant for use in loading YAML files and dynamically generating shortform templates, but can be run directly if needed.
Parameters:
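Name | Type | Description | Default |
---|---|---|---|
name | str | The name of this template. | required |
regexes | list | A list of one or more regexes that this template will match. Each regex should be provided as a string, and should include one or more named capture groups (i.e. "tokens") that will be used to generate the URL. | required |
URL | str or list | The template by which to generate URLs from citation matches. Placeholders in {curly braces} will be replaced by the value of the token with the same name, after that token has been processed by the template. The URL template can be provided either as a string or as a list of strings to concatenate. In the latter case, if a list item contains a placeholder for which no value is set, the list item will be skipped. | None |
broadRegexes | list | Optional additional regexes that are only used for broad matching, i.e. when a search is run with broad=True. | None |
defaults | dict | A dictionary of tokens and corresponding default values which should be set if the token's value is not otherwise set by a regex capture group. | {} |
operations | list | A list of operations to perform on the tokens, in sequence, to transform them from `captured_tokens` to `processed_tokens`, the tokens that are used for URL generation. Each operation must specify a `token` for its input. It will also be used as the output of the operation, unless `output` is specified. If the specified input token is not set, the operation will be skipped. The supported operations are `case`, `sub`, `lookup`, `optionalLookup`, `lpad`, and `numberFormat`. The `case` operation outputs the input token, set to the specified capitalization, either 'upper', 'lower', or 'title'. The `sub` operation performs a regex substitution. It requires a list of two strings; the first is the regex to match in the input token, and the second is the text to replace each match with. The `lookup` operation tries to match the input against a series of dictionary keys (using case-insensitive regex), and sets the output to the corresponding value. If the dictionary does not contain a matching key, the entire template match will retroactively fail. `optionalLookup` works the same way, except that failed lookups will not cause the template to fail, and will simply leave tokens unmodified. The `numberFormat` operation assumes that the input token is a number, either in digit form or Roman numerals. It outputs the same number, converted to the specified number format, either 'roman' or 'digit'. | [] |
shortForms | list | A list of regex templates to generate regexes that recognize short forms of a parent long-form citation that has appeared earlier in the text. Any named section in {curly braces} will be replaced by the value of the corresponding token from the parent citation. So if a template detects a longform citation to "372 U.S. 335" and has a shortform `{volume} {reporter} at (?P<pincite>\d+)`, it will generate the following regex: `372 U.S. at (?P<pincite>\d+)`. | [] |
idForms | list | Think "id.", not ID. Identical to shortForms, except that these regexes will only match until the next different citation or other interruption. | [] |
parent_citation | Citation | The citation, if any, that this template was created as a shortform of. This argument is for dynamically-generated templates, and there is usually no need to use it manually. | None |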
Source code in citeurl/__init__.py
```python
def __init__(self,
    name: str,
    regexes: list,
    URL=None,
    broadRegexes: list=None,
    idForms: list=[],
    shortForms: list=[],
    defaults: dict={},
    operations: list=[],
    parent_citation: Citation=None,
    _is_id=False
):
    """
    Template constructor. Primarily meant for use in loading YAML
    files and dynamically generating shortform templates, but can be
    run directly if needed.
    Arguments:
        name: The name of this template
        regexes: A list of one or more regexes that this template will
            match. Each regex should be provided as a string, and
            should include one or more named capture groups
            (i.e. "tokens") that will be used to generate the URL.
        URL: The template by which to generate URLs from citation
            matches. Placeholders in {curly braces} will be replaced
            by the value of the token with the same name, after that
            token has been processed by the template
            The URL template can be provided either as as a string
            or as a list of strings to concatenate. In the latter
            case, if a list item contains a placeholder for which
            no value is set, the list item will be skipped.
        defaults: A dictionary of tokens and corresponding default
            values which should be set if the token's value is not
            otherwise set by a regex capture group.
        operations: A list of operations to perform on the tokens,
            in sequence, to transform them from `captured_tokens` to
            `processed_tokens`, the tokens that are used for URL
            generation.
            Each operation must specify a `token` for its input. It
            will also be used as the output of the operation, unless
            `output` is specified. If the specified input token is
            not set, the operation will be skipped.
            The supported operations are `case`, `sub`, `lookup`,
            `optionalLookup`, `lpad`, and `numberFormat`.
            The `case` operation outputs the input token, set to the
            specified capitalization, either 'upper', 'lower', or
            'title'.
            The `sub` operation performs a regex substitution. It
            requires a list of two strings; the first is the regex
            to match in the input token, and the second is the text
            to replace each match with.
            The `lookup` operation tries to match the input against
            a series of dictionary keys (using case-insensitive
            regex), and set the output to the corresponding value.
            If the dictionary does not contain a matching key, the
            entire template match will retroactively fail.
            `optionalLookup` works the same way, except that failed
            lookups will not cause the template to fail, and will
            simply leave tokens unmodified.
            The `numberFormat` operation assumes that the input
            token is a number, either in digit form or Roman
            numerals. It outputs the same number, converted to the
            specified number format, either 'roman' or 'digit'.
        shortForms: A list of regex templates to generate regexes
            that recognize short-forms of a parent long-form
            citation that has appeared earlier in the text.
            Any named section in {curly braces} will be replaced by
            the value of the corresponding token from the parent
            citation. So if a template detects a longform citation to
            "372 U.S. 335" and has a shortform `{volume} {reporter}
            at (?P<pincite>\d+)`, it will generate the following
            regex: `372 U.S. at (?P<pincite>\d+)`.
        idForms: Think "id.", not ID. Identical to shortForms,
            except that these regexes will only match until the
            next different citation or other interruption.
        parent_citation: The citation, if any, that this template
            was created as a shortform of. This argument is
            for dynamically-generated templates, and there is usually
            no need to use it manually.
    """
    # Basic values
    self.name: str = name
    self.regexes: str = regexes
    self.is_id: bool = _is_id
    if URL:
        self.URL: str = URL if type(URL) is list else [URL]
    # Supplemental regexes
    self.broadRegexes: str = broadRegexes
    self.idForms: list = idForms
    self.shortForms: list = shortForms
    # String operators
    self.defaults: dict = defaults
    self.operations: list = operations
    # Extra data for shortform citations
    self.parent_citation: Citation = parent_citation
    # hack: prevent all regexes from matching mid-word
    for key in ['regexes', 'broadRegexes', 'idForms', 'shortForms']:
        regex_list = self.__dict__[key]
        if not regex_list:
            continue
        regex_list = list(map(lambda x: fr'(?<!\w){x}(?!\w)', regex_list))
        self.__dict__[key] = regex_list
    # dictionaries of compiled regexes
    self._compiled_regexes: dict = {}
    self._compiled_broadRegexes: dict = {}
```
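The operations and URL parameters together turn captured tokens into a link: operations run in order to produce the processed tokens, and then each URL part is filled in, skipping any part whose placeholder has no value. The sketch below is a hypothetical re-implementation of that pipeline covering only case, sub, lookup, and optionalLookup (lpad and numberFormat are omitted). Three assumptions are its own: each operation is a dict with a token key, an optional output key, and one key naming the operation; lookup keys must match the whole token; and the U.S. Code URL parts are illustrative, not CiteURL's built-in template.

```python
import re

# {token} placeholders; names must start with a letter or "_", so regex
# quantifiers such as {2} are not mistaken for placeholders.
PLACEHOLDER = re.compile(r'\{([A-Za-z_]\w*)\}')
CASES = {'upper': str.upper, 'lower': str.lower, 'title': str.title}


def process_tokens(captured, operations):
    """Apply a few token operations in sequence (hypothetical sketch)."""
    tokens = dict(captured)  # leave the captured tokens untouched
    for op in operations:
        value = tokens.get(op['token'])
        if value is None:
            continue  # input token not set: skip this operation
        output = op.get('output', op['token'])
        if 'case' in op:
            value = CASES[op['case']](value)
        elif 'sub' in op:
            pattern, replacement = op['sub']
            value = re.sub(pattern, replacement, value)
        elif 'lookup' in op or 'optionalLookup' in op:
            table = op['lookup'] if 'lookup' in op else op['optionalLookup']
            for key, replacement in table.items():
                # assumption: each key must match the whole token
                if re.fullmatch(key, value, flags=re.IGNORECASE):
                    value = replacement
                    break
            else:
                if 'lookup' in op:
                    raise KeyError(value)  # failed lookup: the match fails
                continue  # failed optionalLookup: leave tokens unmodified
        tokens[output] = value
    return tokens


def build_url(url_parts, tokens):
    """Concatenate URL parts, skipping any part with an unset placeholder."""
    if isinstance(url_parts, str):
        url_parts = [url_parts]
    pieces = []
    for part in url_parts:
        names = PLACEHOLDER.findall(part)
        if any(tokens.get(name) is None for name in names):
            continue
        pieces.append(PLACEHOLDER.sub(lambda m: str(tokens[m.group(1)]), part))
    return ''.join(pieces)


captured = {'title': '42', 'section': '1988', 'subsection': '(b)'}
operations = [{'token': 'subsection', 'sub': [r'\W', '']}]  # "(b)" -> "b"
url_parts = ['https://www.law.cornell.edu/uscode/text/{title}/{section}',
             '#{subsection}']
processed = process_tokens(captured, operations)
print(build_url(url_parts, processed))
```

Raising KeyError for a failed lookup mirrors how get_citations, listed below, skips any match whose Citation construction raises KeyError.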
get_citations(self, text, broad=False, span=(0,))
Generator to return all citations the template finds in text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to scan for a citation. | required |
broad | bool | Whether to use case-insensitive regex matching and, if available, the template's broadRegex. | False |
span | tuple | A tuple of one or two values determining the start and end index of where in the text to search for citations. Defaults to (0,) to scan the entire text. | (0,) |
Returns:
Type | Description |
---|---|
Iterable | Generator that yields each citation the template finds in the text. If there are no matches, it yields nothing. |
Source code in citeurl/__init__.py
```python
def get_citations(
    self,
    text: str,
    broad: bool=False,
    span: tuple=(0,)
) -> Iterable:
    """
    Generator to return all citations the template finds in text.
    Arguments:
        text: The text to scan for a citation
        broad: Whether to use case-insensitive regex matching and,
            if available, the template's broadRegex.
        span: A tuple of one or two values determining
            the start and end index of where in the text to search
            for citations. Defaults to (0,) to scan the entire text.
    Returns:
        Generator that yields each citation the template finds in the
        text, or None.
    """
    matches = []
    regex_count = len(self.regexes)
    if broad and self.broadRegexes:
        regex_count += len(self.broadRegexes)
    for index in range(regex_count):
        #print(f'scanning regex {index} for template {self}')
        matches += self._compiled_re(index, broad).finditer(text, *span)
    for match in matches:
        try:
            citation = Citation(match, self)
        # skip citations where lookup failed:
        except KeyError as e:
            citation = None
        if citation:
            yield citation
    return None
```
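The span argument is unpacked straight into Pattern.finditer(text, *span), so (0,) means "from position 0 to the end", and a two-item tuple also sets an end position. The stdlib-only sketch below shows the effect on a hypothetical U.S. Reports-style pattern; find_in_span is an illustrative helper, not part of CiteURL.

```python
import re

PATTERN = re.compile(r'(?P<volume>\d+) U\.S\. (?P<page>\d+)')
TEXT = 'See 477 U.S. 561 and 475 U.S. 717.'


def find_in_span(pattern, text, span=(0,)):
    """Unpack a (start,) or (start, end) span into finditer's pos/endpos."""
    return [m.group(0) for m in pattern.finditer(text, *span)]


print(find_in_span(PATTERN, TEXT))           # both citations
print(find_in_span(PATTERN, TEXT, (16,)))    # only the second citation
print(find_in_span(PATTERN, TEXT, (0, 16)))  # only the first citation
```

Because endpos makes the pattern treat the text as if it ended there, an end index that falls inside a citation can produce a truncated match: (0, 15) yields '477 U.S. 56'.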
lookup(self, text, broad=True, span=(0,))
Returns the first citation it finds in the text, or None.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to scan for a citation. | required |
broad | bool | Whether to use case-insensitive regex matching and, if available, the template's broadRegex. | True |
span | tuple | A tuple of one or two values determining the start and end index of where in the text to search for citations. Defaults to (0,) to scan the entire text. | (0,) |
Returns:
Type | Description |
---|---|
Citation | The first citation this template finds in the scanned text, or None. |
Source code in citeurl/__init__.py
```python
def lookup(
    self,
    text: str,
    broad: bool=True,
    span: tuple=(0,)
) -> Citation:
    """
    Returns the first citation it finds in the text, or None.
    Arguments:
        text: The text to scan for a citation.
        broad: Whether to use case-insensitive regex matching
            and, if available, the template's broadRegex.
        span: A tuple of one or two values determining
            the start and end index of where in the text to search
            for citations. Defaults to (0,) to scan the entire text.
    Returns:
        The first citation this template finds in the scanned text,
        or None.
    """
    try:
        return next(self.get_citations(text, broad=broad, span=span))
    except:
        return None
```
Citation
A single citation found in text.
Attributes:
Name | Type | Description |
---|---|---|
text | str | The text of the citation itself, like "42 USC § 1988(b)". |
span | tuple | The beginning and end positions of this citation in the source text. |
template | Template | The template which recognized this citation. |
tokens | dict | Dictionary of the named capture groups from the regex this citation matched. For "id." and "shortform" citations, this includes tokens carried over from the parent citation. |
processed_tokens | dict | Dictionary of tokens after they have been modified via the template's operations. |
URL | str | The URL where a user can read this citation online. |
authority | Authority | The Authority that this citation refers to. This attribute is not set until list_authorities() is run. |
__init__(self, match, template)
For internal use. There should be no need to create citations by means other than a Citator or Template object.
Source code in citeurl/__init__.py
```python
def __init__(self, match: re.Match, template):
    """
    For internal use. There should be no need to create citations
    by means other than a Citator or Template object.
    """
    self.span: tuple = match.span()
    self.template: Template = template
    self.text: str = match.group(0)
    # idForm and shortForm citations get values from parent citation
    # except where their regexes include space for those values
    if template.parent_citation:
        self.tokens: dict = dict(template.parent_citation.tokens)
        for key, val in match.groupdict().items():
            self.tokens[key] = val
    else:
        self.tokens: dict = match.groupdict()
    self.processed_tokens: dict = self.template._process_tokens(self.tokens)
    self.URL: str = self._get_url()
```
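Short-form and "id." citations come from templates generated from a parent citation: the parent's token values are filled into the shortForms patterns, and the resulting citation starts from a copy of the parent's tokens, overlaid with whatever its own regex captured (as in the constructor listed above). The following stdlib-only sketch walks through both steps with the Template docstring's own "372 U.S. 335" example. The helper names are hypothetical, and escaping the inserted values with re.escape is this sketch's choice, not something the listed code shows.

```python
import re

# {token} placeholders; names must start with a letter or "_", so regex
# quantifiers such as {2} are left alone.
PLACEHOLDER = re.compile(r'\{([A-Za-z_]\w*)\}')


def shortform_regex(shortform, parent_tokens):
    """Fill {token} placeholders with the parent's values, escaped so
    that they match literally (the escaping is this sketch's choice)."""
    return PLACEHOLDER.sub(
        lambda m: re.escape(parent_tokens[m.group(1)]), shortform
    )


def child_tokens(parent_tokens, match):
    """Copy the parent's tokens, then overlay the child's own groups."""
    tokens = dict(parent_tokens)
    tokens.update(match.groupdict())
    return tokens


parent = {'volume': '372', 'reporter': 'U.S.', 'page': '335'}
regex = shortform_regex(r'{volume} {reporter} at (?P<pincite>\d+)', parent)
print(regex)  # 372 U\.S\. at (?P<pincite>\d+)
match = re.search(regex, 'Gideon, 372 U.S. at 344.')
print(child_tokens(parent, match))
```

Because the overlay copies every group from match.groupdict(), an optional group that did not take part in the match would replace the inherited value with None, both here and in the listed constructor.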
get_link(self, attrs={'class': 'citation'})
Returns the citation as an HTML link element, with the given attributes.
Source code in citeurl/__init__.py
```python
def get_link(self, attrs: dict={'class': 'citation'}):
    """Return citation's link element, with given attributes"""
    if self.URL:
        attrs['href'] = self.URL
    else:
        del attrs['href']
    attr_str = ''
    for key, value in attrs.items():
        attr_str += ' %s="%s"' % (key, value)
    return '<a%s>%s</a>' % (attr_str, self.text)
```
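As listed, get_link writes href into the attrs dictionary it receives, so the key persists in the caller's dictionary (including the shared default), and del attrs['href'] raises KeyError whenever that dictionary has no href entry. Below is a defensive variant, offered as a sketch rather than CiteURL's API: make_link is a hypothetical name, it copies attrs before changing it, removes href with dict.pop and a default, and HTML-escapes attribute values with html.escape.

```python
import html


def make_link(text, url, attrs=None):
    """Build an <a> element for a citation without mutating attrs."""
    attrs = {'class': 'citation'} if attrs is None else dict(attrs)
    if url:
        attrs['href'] = url
    else:
        attrs.pop('href', None)  # unlike `del`, no KeyError if absent
    attr_str = ''.join(
        f' {key}="{html.escape(str(value), quote=True)}"'
        for key, value in attrs.items()
    )
    # The citation text is inserted as-is: it comes from the source
    # document and may already contain markup such as "</i>".
    return f'<a{attr_str}>{text}</a>'


print(make_link('42 USC § 1988(b)',
                'https://www.law.cornell.edu/uscode/text/42/1988#b'))
```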
Authority
A single source cited one or more times in a text.
Attributes:
Name | Type | Description |
---|---|---|
defining_tokens | dict | A dictionary of tokens that define this authority, such that any citations with incompatible token values will not match it. Note that this uses processed_tokens (those which have been modified by the template's operations). |
template | Template | The template which found all the citations to this authority. |
citations | list | The list of all the citations that refer to this authority. |
base_citation | Citation | A citation object representing the hypothetical generic citation to this authority. |
name | str | The text of base_citation. |
URL | str | The URL of base_citation. |
__init__(self, first_cite, allowed_differences=[])
Define an authority by providing a single long-form citation, and the list of tokens which, if present in the citation, should be discarded from the definition of the authority.
Generates a base_citation to represent the generic instance of this authority.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
first_cite | Citation | A long-form citation object representing the first and archetypal citation to this authority. The first_cite will be added as the first entry in the authority's citation list, and it will be used as the basis to generate the authority's base_citation. | required |
allowed_differences | list | A list of tokens whose values can differ among citations to the same authority. | [] |
Source code in citeurl/__init__.py
```python
def __init__(self, first_cite: Citation, allowed_differences: list=[]):
    """
    Define an authority by providing a single long-form citation,
    and the list of tokens which, if present in the citation, should
    be discarded from the definition of the authority.
    Generates a base_citation to represent the generic instance of
    this authority.
    Arguments:
        first_cite: A long-form citation object representing the
            first and archetypal citation to this authority. The
            first_cite will be added as the first entry in the
            authority's citation list, and it will be used as the
            basis to generate the authority's base_citation.
        allowed_differences: A list of tokens whose values can
            differ among citations to the same authority
    """
    long_cite = first_cite._original_cite()
    self.template: Template = long_cite.template
    self.citations: list = [first_cite]
    # List the token values that distinguish this authority from
    # others in the same template. This uses processed tokens, not
    # raw, so that a citation to "50 U.S. 5" will match
    # a citation to "50 U. S. 5", etc.
    self.defining_tokens: dict = {}
    for t in first_cite.processed_tokens:
        if (
            first_cite.processed_tokens[t] != None
            and t not in allowed_differences
        ):
            self.defining_tokens[t] = first_cite.processed_tokens[t]
    # Next, derive a base citation to represent this authority.
    # If the first_citation to this authority isn't a longform, use
    # whatever longform it's a child of.
    self.base_citation: Citation = None
    try:
        self.base_citation = self._derive_base_citation(long_cite)
    except TypeError:
        self.base_citation = first_cite
    # Set other instance variables
    self.name: str = self.base_citation.text
    self.URL: str = self.base_citation.URL
    # finally, give the first citation a reference to this authority
    first_cite.authority = self
```
include(self, citation)
Adds the citation to this authority's list of citations. Also sets the citation's authority attribute, referring back to this authority.
Source code in citeurl/__init__.py
```python
def include(self, citation):
    """Adds the citation to this authority's list of citations. Also,
    adds the `authority` tag to the citation, referring back to this
    authority."""
    self.citations.append(citation)
    citation.authority = self
```
matches(self, citation)
Checks whether a given citation matches the template and defining tokens of this authority.
Source code in citeurl/__init__.py
```python
def matches(self, citation) -> bool:
    """
    Checks whether a given citation matches the template and defining
    tokens of this authority.
    """
    if self.template.name != citation.template.name:
        return False
    for key, value in self.defining_tokens.items():
        if (key not in citation.processed_tokens
            or citation.processed_tokens[key] != value):
            return False
    return True
```
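An Authority is defined by its first citation's processed tokens, minus any token whose value is None or that is listed in allowed_differences; a later citation belongs to it if it comes from the same template and agrees on every defining token. This plain-Python sketch works on bare token dictionaries instead of Citation objects. The function names and the "U.S. Code" template name are hypothetical; the list of ignored tokens is the documented default of list_authorities.

```python
# The documented default of list_authorities' irrelevant_tokens.
NON_AUTHORITY_TOKENS = ['subsection', 'subdivision', 'clause',
                        'pincite', 'pincite_end', 'footnote']


def defining_tokens(processed_tokens, allowed_differences=NON_AUTHORITY_TOKENS):
    """Keep only the tokens that are set and not allowed to differ."""
    return {
        key: value
        for key, value in processed_tokens.items()
        if value is not None and key not in allowed_differences
    }


def matches(authority_template, authority_tokens, cite_template, cite_tokens):
    """Same template name, and every defining token has the same value."""
    if authority_template != cite_template:
        return False
    return all(
        key in cite_tokens and cite_tokens[key] == value
        for key, value in authority_tokens.items()
    )


first = {'title': '42', 'section': '1988', 'subsection': 'b'}
authority = defining_tokens(first)
print(authority)  # {'title': '42', 'section': '1988'}
```

Since subsection is an allowed difference, a later cite to 42 USC § 1988(c) still matches this authority, while one to § 1983 does not.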
insert_links()
Given a text and a list of citations found in it, return a text with an HTML hyperlink inserted for each citation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
citations | list | A list of citation objects found in the text. | required |
text | str | The text the citations were found in. | required |
attrs | dict | HTML tag attributes (like CSS class, rel, etc.) to give each inserted hyperlink. | {'class': 'citation'} |
link_detailed_ids | bool | Whether to insert hyperlinks for citations like "Id. at 30." | True |
link_plain_ids | bool | Whether to insert hyperlinks for simple repeat citations like "id." | False |
url_optional | bool | Whether to insert link elements for citations that do not have an associated URL. | False |
Returns:
Type | Description |
---|---|
str | The input text, with HTML links inserted for each citation. |
Source code in citeurl/__init__.py
```python
def insert_links(
    citations: list,
    text: str,
    attrs: dict={'class': 'citation'},
    link_detailed_ids: bool=True,
    link_plain_ids: bool=False,
    url_optional: bool=False
) -> str:
    """
    Given a text and a list of citations found in it, return a text
    with an HTML hyperlink inserted for each citation.
    Arguments:
        citations: A list of citation objects found in the text
        text: The text the citations were found in
        attrs: HTML tag attributes (like css class, rel, etc)
            to give each inserted hyperlink.
        link_detailed_ids: Whether to insert hyperlinks for citations
            like "Id. at 30."
        link_plain_ids: Whether to insert hyperlinks for simple repeat
            citations like "id."
        url_optional: Whether to insert link elements for citations that
            do not have an associated URL
    Returns:
        The input text, with HTML links inserted for each citation
    """
    offset = 0
    for citation in citations:
        # by default, skip citations without URLs
        if not citation.URL and not url_optional:
            continue
        if citation.template.is_id:
            # check whether the matched citation is from a template that
            # has any named capture groups. If it doesn't, it's a
            # "plain id." and should normally be skipped
            if not '(?P<' in citation.template._compiled_re().pattern:
                if not link_plain_ids:
                    continue
            elif not link_detailed_ids:
                continue
        link = citation.get_link(attrs=attrs)
        # insert each link into the proper place by offsetting later
        # citations by however many extra characters are added by each
        cite_start = citation.span[0] + offset
        cite_end = citation.span[1] + offset
        text = ''.join([text[:cite_start], link, text[cite_end:]])
        offset += len(link) - len(citation.text)
    return text
```
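The module-level insert_links replaces each citation's span with its link and keeps a running offset, because every inserted tag lengthens the text and shifts all later spans. The sketch below isolates that bookkeeping under hypothetical names: it takes ((start, end), link) pairs, sorted by start and non-overlapping, with spans measured against the original text.

```python
def insert_links_by_span(text, spans_and_links):
    """Replace each (start, end) span of the original text with its link."""
    offset = 0  # how far earlier insertions have shifted later spans
    for (start, end), link in spans_and_links:
        new_start, new_end = start + offset, end + offset
        text = text[:new_start] + link + text[new_end:]
        offset += len(link) - (end - start)
    return text


text = 'See 477 U.S. 561 and 475 U.S. 717.'
links = [
    ((4, 16), '<a href="#a">477 U.S. 561</a>'),
    ((21, 33), '<a href="#b">475 U.S. 717</a>'),
]
print(insert_links_by_span(text, links))
```

An alternative design is to walk the spans from last to first: each replacement then only changes text after the positions still to be processed, so no offset is needed.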
list_authorities()
Combine a list of citations into a list of authorities, each of which represents all the citations to a particular source.
As a side-effect, this also gives each citation an authority attribute referring to the proper authority.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
citations | list | The list of citations to combine. | required |
irrelevant_tokens | list | A list of tokens whose values may differ among citations to the same authority. | ['subsection', 'subdivision', 'clause', 'pincite', 'pincite_end', 'footnote'] |
Returns:
Type | Description |
---|---|
list | A list of authority objects, sorted by the number of citations that refer to each, from most to least. |
Source code in citeurl/__init__.py
```python
def list_authorities(
    citations: list,
    irrelevant_tokens: list=NON_AUTHORITY_TOKENS
) -> list:
    """
    Combine a list of citations into a list of authorities, each
    of which represents all the citations to a particular source.
    As a side-effect, this also gives each citation an `authority`
    attribute referring to the proper authority.
    Arguments:
        citations: The list of citations to combine
        irrelevant_tokens: A list of tokens whose values may
            differ among citations to the same authority.
    Returns:
        A list of authority objects, sorted by the number of citations
        that refer to each, from most to least.
    """
    authorities = []
    for citation in citations:
        for authority in authorities:
            if authority.matches(citation):
                authority.include(citation)
                break
        else:
            authorities.append(Authority(citation, irrelevant_tokens))
    def authority_sort_key(authority):
        return 0 - len(authority.citations)
    return sorted(authorities, key=authority_sort_key)
```
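list_authorities compares each citation against the authorities found so far, adds it to the first one that matches (the for/else starts a new authority only when none did), and then sorts by citation count. Because Python's sorted() is stable, authorities cited equally often stay in the order in which they were first cited. The sketch below reproduces that grouping on (template name, tokens) pairs loosely modeled on the quick-start text at the top of this page; same_authority is a hypothetical stand-in for Authority.matches, and the template names are illustrative.

```python
# Tokens allowed to differ, per the documented default of irrelevant_tokens.
IGNORED = {'subsection', 'subdivision', 'clause',
           'pincite', 'pincite_end', 'footnote'}


def same_authority(first, cite):
    """Hypothetical stand-in for Authority.matches on (template, tokens)."""
    first_template, first_tokens = first
    template, tokens = cite
    return template == first_template and all(
        tokens.get(key) == value
        for key, value in first_tokens.items()
        if value is not None and key not in IGNORED
    )


def group_by_authority(citations):
    """Group citations as list_authorities does, largest group first."""
    groups = []
    for cite in citations:
        for group in groups:
            if same_authority(group[0], cite):
                group.append(cite)
                break
        else:  # no existing group matched: start a new one
            groups.append([cite])
    # sorted() is stable, so equal-sized groups keep first-seen order
    return sorted(groups, key=len, reverse=True)


citations = [
    ('U.S. Code', {'title': '42', 'section': '1988', 'subsection': 'b'}),
    ('U.S. Code', {'title': '42', 'section': '1988', 'subsection': 'c'}),
    ('U.S. Reports', {'volume': '477', 'page': '561', 'pincite': None}),
    ('U.S. Reports', {'volume': '475', 'page': '717', 'pincite': None}),
    ('U.S. Reports', {'volume': '477', 'page': '561', 'pincite': '574'}),
]
for group in group_by_authority(citations):
    print(group[0][1], len(group))
```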