As stated above, a Form is a grouping of data about a morpheme, word, phrase or
sentence of the object language.
This subsection describes the types of data that comprise a Form. In the
context of the relational database model, these types of data are the columns of
the Form table.
The ID is a unique integer assigned by the RDBMS to each Form upon creation.
Knowing the ID of a Form comes in handy when you want to associate that Form
with a Collection. It is also useful when you want to access a Form quickly.
For example, enter "domain_name/form/view/11" (where
"domain_name" is your OLD application's domain name, e.g., "www.old.org") in
your browser's address bar to view the Form with ID 11.
You can also enter a comma-separated list of Form IDs to access several
Forms at once: "domain_name/form/view/11,12,13,14".
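If you generate these addresses in a script, the pattern above is easy to reproduce. The following is a minimal sketch that assumes only the "domain_name/form/view/ID,ID,..." path shown above. The helper name form_view_url is hypothetical and is not part of the OLD.

```python
def form_view_url(domain_name: str, form_ids: list[int]) -> str:
    """Build a URL that views one or more Forms by ID (hypothetical helper)."""
    if not form_ids:
        raise ValueError("at least one Form ID is required")
    # IDs are joined with commas and no spaces, as in the examples above.
    id_list = ",".join(str(form_id) for form_id in form_ids)
    return f"{domain_name}/form/view/{id_list}"
```

For example, form_view_url("www.old.org", [11, 12, 13, 14]) gives "www.old.org/form/view/11,12,13,14".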
The transcription is a textual representation of the sound of a Form.
A transcription is an object language string and as such it should be written
using the graphs of the input orthography. (If not specified in the current
user's settings, the input orthography is the default input orthography
specified in the application settings.) Before being stored in the database,
the transcription will be converted from the input orthography to the
storage orthography.
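The conversion between orthographies can be pictured as graph-by-graph substitution. The sketch below makes several assumptions that the text does not confirm:

- each orthography is an ordered list of graphs;
- the i-th graph of one orthography corresponds to the i-th graph of the other;
- the longest graph is matched first, so a digraph such as 'sh' is not split into 's' + 'h'.

The example orthographies are hypothetical, and this is not the OLD's own implementation.

```python
def convert(text: str, source: list[str], target: list[str]) -> str:
    """Convert text between two orthographies by longest-match graph substitution.

    Assumes source[i] corresponds to target[i]. Characters that are not graphs
    of the source orthography (spaces, punctuation) are copied unchanged.
    """
    if len(source) != len(target):
        raise ValueError("orthographies must have the same number of graphs")
    mapping = dict(zip(source, target))
    # Longer graphs are tried first; empty graphs are ignored to avoid looping.
    graphs = sorted((g for g in mapping if g), key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for graph in graphs:
            if text.startswith(graph, i):
                out.append(mapping[graph])
                i += len(graph)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

INPUT_ORTHOGRAPHY = ["a", "i", "s", "sh", "ts"]    # hypothetical
STORAGE_ORTHOGRAPHY = ["a", "i", "s", "š", "c"]    # hypothetical
```

With these orthographies, convert("tsisha", INPUT_ORTHOGRAPHY, STORAGE_ORTHOGRAPHY) yields "ciša". Swapping the last two arguments converts back, which only works because the correspondence here is one-to-one.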
The recommended position of the transcription on the broad/narrow
(phonetic/phonemic) spectrum should be stated in the help section of your
particular OLD application.
A multi-sentence discourse should not be entered as a single Form, but as
multiple Forms, grouped together and ordered into a Collection.
It is recommended that transcriptions contain only the following (strings of)
characters:
- graphs from the input orthography
- standard punctuation: " ' ( ) ! ? , ; :
(Future versions of the OLD may warn about, or even disallow, transcriptions
that violate these conditions.)
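Until such a check exists, the recommendation above can be tested with a short script. This is a hypothetical sketch, not OLD functionality. It assumes that whitespace is also allowed and that graphs are matched longest first.

```python
PUNCTUATION = set("\"'()!?,;:")   # the standard punctuation listed above

def invalid_characters(transcription: str, graphs: list[str]) -> list[tuple[int, str]]:
    """Return (position, character) for each character that is not part of a
    graph, not listed punctuation and not whitespace."""
    ordered = sorted((g for g in graphs if g), key=len, reverse=True)
    problems, i = [], 0
    while i < len(transcription):
        graph = next((g for g in ordered if transcription.startswith(g, i)), None)
        if graph is not None:
            i += len(graph)              # consume the longest matching graph
        else:
            char = transcription[i]
            if not (char.isspace() or char in PUNCTUATION):
                problems.append((i, char))
            i += 1
    return problems

GRAPHS = ["a", "i", "s", "sh", "ts"]     # hypothetical input orthography
```

With these graphs, invalid_characters("*tsixa", GRAPHS) flags the '*' at position 0 and the 'x' at position 4. The '*' shows that a grammaticality judgment has been typed into the transcription.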
The grammaticality indicates whether a particular object language Form is
grammatical (''), ungrammatical ('*') or questionable ('?'). (Either more
grammaticality options should be added as standard or administrators/researchers
should be able to customize the forced-choice grammaticality field.)
When adding a Form, do not indicate grammaticality judgments in the text of the
transcription. Instead, use the forced-choice select field to the left of the
transcription input field.
The morpheme break field contains a morphological analysis of the form. The
system is currently preset to expect '-' and '=' as morpheme delimiters and ' '
as the word delimiter, but this could/should be made customizable on an
application-specific basis.
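The delimiter convention can be illustrated by splitting a morpheme break value into words and morphemes. This sketch assumes the current defaults described above: '-' and '=' between morphemes, and a space between words. The constant and function names are hypothetical.

```python
import re

MORPHEME_DELIMITERS = "-="   # current defaults, per the text above
WORD_DELIMITER = " "

def split_morphemes(morpheme_break: str) -> list[list[str]]:
    """Split a morpheme break value into words, and each word into morphemes."""
    pattern = "[" + re.escape(MORPHEME_DELIMITERS) + "]"
    return [re.split(pattern, word)
            for word in morpheme_break.split(WORD_DELIMITER) if word]
```

Keeping the delimiters in constants mirrors the suggestion that they should eventually be customizable per application.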
The morpheme break field may or may not be specified as an object language
string field. Such specification is made by administrators in the application
settings page. If the morpheme break field is set up as an object language
string field, then morpheme break input will be converted to the storage
orthography for storage and converted to the output orthography for display
(just like the data in the transcription field). If the morpheme break field is
not set up as an object language string field, then no conversion will be
applied.
Whenever a Form is entered or updated, the OLD attempts to identify all of that
Form's morpheme-gloss pairs and searches for matches in existing Forms. If one
or more matches are found, then the morpheme and its gloss are displayed as HTML
links to the match(es). This allows users to see immediately the extent to which
their morphological analyses are consistent with the rest of the data in the
system.
To understand this morphological linking in detail, imagine an OLD application
containing the following two Forms:
ID             | 1
transcription  | chien
morpheme break | chien
morpheme gloss | dog
gloss          | dog, mutt

ID             | 2
transcription  | s
morpheme break | s
morpheme gloss | plrl
gloss          | plural marker
Now, when the following Form is entered,
transcription  | chiens
morpheme break | chien-s
morpheme gloss | dog-PL
gloss          | dogs
the system identifies two morpheme-gloss pairs: 'chien'-'dog' and 's'-'PL'. It
first searches the database for a Form with 'chien' as its morpheme break value
and 'dog' as its morpheme gloss value. It finds such a match in Form 1 and, as a
result, it displays both 'chien' and 'dog' in our newly entered Form as links to
Form 1. These links are displayed in blue to indicate a perfect match.
Its second search is for a Form with 's' as its morpheme break value and 'PL' as
its morpheme gloss value. A match is found in Form 2, but it is partial because
the morpheme gloss value of Form 2 is 'plrl' and not 'PL'. Therefore, 's' in
our new Form will be displayed as a green link (green to indicate a partial
match) to Form 2 and 'PL' will not be displayed as a link.
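The linking behaviour in this example can be sketched in a few lines. This is a minimal illustration under these assumptions:

- existing Forms are plain dicts with id, morpheme_break and morpheme_gloss keys;
- a partial match means that the morpheme matches but its gloss does not;
- both fields contain the same number of morphemes.

How the OLD handles other kinds of partial match (for example, a matching gloss with a different morpheme) is not covered. The names Link, link_pair and link_form are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Link:
    form_id: Optional[int] = None  # ID of the matched Form; None means "no link"
    color: Optional[str] = None    # 'blue' = perfect match, 'green' = partial match

def link_pair(morpheme, gloss, forms):
    """Return (morpheme link, gloss link) for one morpheme-gloss pair."""
    for form in forms:
        if form["morpheme_break"] == morpheme and form["morpheme_gloss"] == gloss:
            # Perfect match: morpheme and gloss both link to the Form, in blue.
            return Link(form["id"], "blue"), Link(form["id"], "blue")
    for form in forms:
        if form["morpheme_break"] == morpheme:
            # Partial match: only the morpheme links, in green; the gloss stays plain.
            return Link(form["id"], "green"), Link()
    return Link(), Link()

def link_form(morpheme_break, morpheme_gloss, forms):
    """Pair each morpheme with its gloss and link the pair (assumes equal counts)."""
    morphemes = re.split(r"[-= ]", morpheme_break)
    glosses = re.split(r"[-= ]", morpheme_gloss)
    return [(m, g) + link_pair(m, g, forms) for m, g in zip(morphemes, glosses)]

FORMS = [
    {"id": 1, "morpheme_break": "chien", "morpheme_gloss": "dog"},
    {"id": 2, "morpheme_break": "s", "morpheme_gloss": "plrl"},
]
```

Running link_form("chien-s", "dog-PL", FORMS) reproduces the example. 'chien' and 'dog' both get a blue link to Form 1, 's' gets a green link to Form 2, and 'PL' gets no link.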
The morpheme gloss field should contain a gloss in the metalanguage for each
object language morpheme listed in the morpheme break field. The same
delimiters should be used between the morpheme glosses as were used between the
morphemes in the morpheme break line.
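The requirement that both lines use the same delimiters can be checked mechanically. The hypothetical check below compares the sequence of delimiters ('-', '=' and space) in the two fields. It assumes that individual glosses contain no spaces, so a gloss such as '3.SG' is fine but 'big dog' for one morpheme is not.

```python
import re

DELIMITERS = re.compile(r"[-= ]")   # morpheme and word delimiters, per the text

def delimiters_match(morpheme_break: str, morpheme_gloss: str) -> bool:
    """True if both fields use the same delimiters in the same order."""
    return DELIMITERS.findall(morpheme_break) == DELIMITERS.findall(morpheme_gloss)
```

For example, 'chien-s' with 'dog-PL' passes, while 'chien-s' with 'dog=PL' does not.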
Researchers of an OLD application might want to work toward a consensus on how
morphemes should be glossed.
Morpheme glosses will be displayed as links to matching Forms in the manner
described above in the section on morpheme breaks.
The gloss is a translation of the Form into the metalanguage. When the Form
represents a spatio-temporally located utterance, whenever possible the gloss
should be something that the speaker offered, or would at least consent to, as a
translation.
The OLD allows multiple (up to four) glosses for a single Form. Each gloss has
its own gloss grammaticality field. This makes it possible to document a Form
as compatible with certain glosses but not with others. For example, a Form
for the French word 'banque' might have 'bank (financial institution)' as
its first gloss and '*riverbank' as its second gloss.
As discussed in the gloss section above, each gloss may have its own
grammaticality. This grammaticality indicates the acceptability of the Form
with a particular translation into the metalanguage. At present, the OLD allows
three choices: compatible (''), incompatible ('*') and questionable ('?').
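The relationship between a Form, its glosses and their grammaticalities can be modelled as a small data structure. This is a hypothetical sketch, not the OLD's actual database schema. It encodes only what is stated above: up to four glosses per Form, each with one of three grammaticality values.

```python
from dataclasses import dataclass, field

GRAMMATICALITIES = ("", "*", "?")   # compatible, incompatible, questionable
MAX_GLOSSES = 4

@dataclass
class Gloss:
    text: str
    grammaticality: str = ""

    def __post_init__(self):
        if self.grammaticality not in GRAMMATICALITIES:
            raise ValueError(f"unknown grammaticality: {self.grammaticality!r}")

    def __str__(self):
        # Display the judgment as a prefix, e.g. '*riverbank'.
        return f"{self.grammaticality}{self.text}"

@dataclass
class Form:
    transcription: str
    glosses: list[Gloss] = field(default_factory=list)

    def __post_init__(self):
        if len(self.glosses) > MAX_GLOSSES:
            raise ValueError(f"a Form may have at most {MAX_GLOSSES} glosses")

banque = Form("banque", [Gloss("bank (financial institution)"),
                         Gloss("riverbank", "*")])
```

Here [str(g) for g in banque.glosses] gives the two glosses of the 'banque' example, with the second one prefixed by '*'.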
The elicitation method refers to the means by which a particular Form was
obtained. Often, for example, it is useful to know whether the speaker
translated a metalanguage utterance of the elicitor, or described a visually
represented context or judged the grammaticality of an object language utterance
made by the elicitor, or whether the Form was obtained in some other manner.
The elicitation method field is a forced-choice user-populated field. That is,
researchers must choose from a list of possible elicitation methods, but that
list can be modified by researchers. By default there are no elicitation
methods predefined by an OLD application. Users must click on "database" in the
primary menu and then "tags" in the secondary menu in order to add (or possibly
update) the list of elicitation methods. (See the elicitation methods section.)
The intention behind forced-choice user-populated fields is to encourage intra-
and inter-user consistency.
Keywords provide users with a general-purpose way of tagging Forms. A single
Form may be associated with zero, one or many keywords. Keywords are defined by
users of the OLD application in question. Click on "database" in the primary
menu and then "tags" in the secondary menu to add new keywords.
The category refers to the syntactic or morphological category of the Form.
Like elicitation method, category is a forced-choice user-populated field which
is initially empty in a new OLD application. Researchers can add new categories
(e.g., S, N, V, A, Adv, etc.) by clicking on "database" in the primary menu and
"tags" in the secondary menu. (See the category section.)
The category string is a string representing the morpho-syntactic categories of
the morphemes within the Form. This string is generated by the system based on
the morpheme break and morpheme gloss data entered by the user. For example,
suppose that the following Form has just been entered:
transcription  | chiens
morpheme break | chien-s
morpheme gloss | dog-PL
gloss          | dogs
Suppose further that when the system searches for the morpheme-gloss pairs
'chien'-'dog' and 's'-'PL' it finds an exact match for each. In that scenario,
the categories of the 'chien'-'dog' and 's'-'PL' Forms (let's say they are 'N'
and 'Agr') will be used to generate the category string of 'chiens' and the
result will be 'N-Agr'.
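The generation of a category string can be sketched as follows. The sketch makes these assumptions:

- exact matches have already been collected into a dict from (morpheme, gloss) pairs to categories;
- both fields use the same delimiters;
- an unmatched pair is shown with a '?' placeholder.

The placeholder is a hypothetical choice; the OLD may handle unmatched morphemes differently.

```python
import re

def category_string(morpheme_break, morpheme_gloss, categories, unknown="?"):
    """Build a category string such as 'N-Agr', keeping the original delimiters.

    `categories` maps (morpheme, gloss) pairs to the category of the Form that
    matches them exactly; pairs without a match get the `unknown` placeholder.
    """
    # A capturing group makes re.split keep the delimiters: 'chien-s' -> ['chien', '-', 's'].
    morph_parts = re.split(r"([-= ])", morpheme_break)
    gloss_parts = re.split(r"([-= ])", morpheme_gloss)
    out = []
    for i, (morpheme, gloss) in enumerate(zip(morph_parts, gloss_parts)):
        if i % 2:                      # odd positions hold delimiters
            out.append(morpheme)
        else:
            out.append(categories.get((morpheme, gloss), unknown))
    return "".join(out)

CATEGORIES = {("chien", "dog"): "N", ("s", "PL"): "Agr"}
```

With these matches, category_string("chien-s", "dog-PL", CATEGORIES) returns 'N-Agr', as in the example above.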
When an OLD application contains many Forms whose morpheme break and morpheme
gloss fields are consistent with the system's own lexical Forms, many category
strings will be generated. When this is the case, users can search the category
strings to reveal high-level morpho-syntactic patterns.
The speaker is the individual who uttered the object language token that the
Form represents. To view the list of speakers documented in an OLD application,
click on "database" in the primary menu and "people" in the secondary menu.
Both administrators and researchers may add new speakers to the system (see the
speaker section).
The elicitor is the researcher who elicited the Form, that is, the person who
recorded and/or transcribed the utterance of a speaker. Entering an elicitor
involves choosing from a list of people registered as researchers for the OLD
application in question. To view the list of registered researchers of an OLD
application, click on "database" in the primary menu and "people" in the
secondary menu.
The enterer field is automatically populated with the name of the OLD researcher
who is adding the Form. In order to add a Form, a person must be logged in to
the OLD application.
The verifier of a Form is another (perhaps more experienced) researcher who has
already elicited a near-identical utterance and wants to indicate her agreement
about the accuracy of the first researcher's documentation of that utterance.
(In practice, the verifier field may be seldom used.)
The source field refers to the textual source of a Form, if applicable.
Researchers can add new sources by clicking on "database" in the primary menu
and "sources" in the secondary menu.
The date when the Form was elicited (if applicable) is documented in the date
elicited field in mm/dd/yyyy format.
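When preparing or checking date elicited values outside the OLD (for example, before a bulk import), the mm/dd/yyyy format maps directly onto Python's strptime codes. The helper name below is hypothetical.

```python
from datetime import date, datetime

def parse_date_elicited(value: str) -> date:
    """Parse a date elicited value written as mm/dd/yyyy; raise ValueError otherwise."""
    return datetime.strptime(value, "%m/%d/%Y").date()
```

A value in dd/mm/yyyy order, such as "15/03/2009", is rejected because 15 is not a valid month.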
The date and time when the Form was entered into the OLD application is
automatically generated upon entry by the system.
The date and time when the Form was last updated (modified) is automatically
generated by the system during each update.
The OLD allows you to perform powerful searches on your Form data. The
screenshot below shows the OLD Form search page.
Users can enter one or two search expressions and these expressions can be
coordinated via conjunction ('and'), disjunction ('or') or negated conjunction
('and not').
Each search expression consists of (i) a text input field, (ii) a search
type select field and (iii) a search location select field.
The text input field is where one enters the pattern that the result data must
match.
The search type select field indicates the way in which the search is to
be implemented. The search type options are 'as a phrase', 'all of these',
'any of these', 'as a reg exp' and 'exactly'. These will be discussed in more
detail below.
The search location select field lists options for where (i.e., which column of
the Form table) the system should look to match the pattern. The search
location options are 'transcription', 'gloss', 'morpheme break', 'morpheme
gloss', 'general comments', 'speaker comments', 'syntactic category string' and
'ID'.
The order by expression allows one to state the order in which the matching
results should be returned. Note that the order by expression will (probably)
not order your results according to the order of graphs in the orthography
specified in your OLD application settings. Instead, it compares characters by
their Unicode code points (in UTF-8 encoding) to determine order. If your
orthography contains characters outside the 26 letters of the basic English
alphabet, then the code points of those characters will likely be higher than
that of 'z', and the system will sort those characters after 'z'.
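The difference between code-point order and orthographic order is easy to demonstrate. Python's default string comparison is by code point, so it behaves like the order by expression described above. The orthography-aware sort key is a hypothetical sketch under these assumptions:

- the graphs are listed in their intended order;
- characters are precomposed (NFC);
- characters outside the orthography sort last.

```python
GRAPHS = ["a", "c", "e", "é", "t", "z"]   # hypothetical orthography, in its own order
words = ["zat", "éte", "ace"]

# Code-point order: 'é' (U+00E9) comes after 'z' (U+007A).
print(sorted(words))   # ['ace', 'zat', 'éte']

def orthography_sort_key(word: str, graphs: list[str]) -> list[int]:
    """Map a word to the positions of its graphs in `graphs`, longest match first."""
    rank = {g: i for i, g in enumerate(graphs) if g}
    ordered = sorted(rank, key=len, reverse=True)
    key, i = [], 0
    while i < len(word):
        graph = next((g for g in ordered if word.startswith(g, i)), None)
        if graph is None:
            # Characters outside the orthography sort after every graph.
            key.append(len(graphs) + ord(word[i]))
            i += 1
        else:
            key.append(rank[graph])
            i += len(graph)
    return key

# Orthographic order: 'é' follows 'e' as listed in GRAPHS.
print(sorted(words, key=lambda w: orthography_sort_key(w, GRAPHS)))   # ['ace', 'éte', 'zat']
```

Sorting in a script like this is one way to get orthographic order when exported results need it.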
Additional search filters can be added to your search by clicking on the '+'
button next to 'additional search filters'. Here one can further refine a
Form search by putting conditions on the speaker, elicitor, enterer, verifier,
source, grammaticality, gloss grammaticality, elicitation method, (syntactic)
category, keywords, date elicited, date entered and/or date modified.
One can repeat or make modifications to a previous search by clicking
on the "previous searches" button at the bottom of the Form search page. The
OLD keeps a record of each user's last ten searches and displays them when this
button is clicked. When a past search is clicked, the search page is displayed
again with the appropriate fields set, so that clicking "Search Forms" will repeat the
search. This functionality is good for repeating searches as well as for making
modifications to previous searches without having to re-enter all the search
criteria.
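Keeping only the last ten searches is a bounded history. The following is a minimal sketch using a fixed-length deque. The class name and the representation of a search as a dict of field values are hypothetical and do not describe how the OLD stores searches.

```python
from collections import deque

class SearchHistory:
    """Keep a user's most recent searches (hypothetical sketch)."""

    def __init__(self, size: int = 10):
        self._searches = deque(maxlen=size)

    def record(self, search: dict) -> None:
        # Appending to a full deque silently drops the oldest search.
        self._searches.append(dict(search))

    def recent(self) -> list[dict]:
        """Most recent search first, e.g. for pre-filling the search fields."""
        return list(reversed(self._searches))
```

After twelve searches are recorded, recent() returns only the last ten, newest first.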
The following subsections describe how to use each of the different search types
available in a search expression.
The 'as a phrase' search type option causes the system to return all Forms where
the search location contains the specified search term as a substring. Thus,
searching for the pattern 'the' as a phrase in the transcription field will
return all Forms where the string 'the' is in the transcription, e.g., 'the
dog', 'they left', 'another thing', etc.
The 'all of these' search type option causes the search pattern to be split by
whitespace into sub-patterns and returns all Forms where the chosen location
contains all of the sub-patterns. For example, searching for 'the and' with the
'all of these' search type option in the transcription field will return all
Forms where both 'the' and 'and' are in the transcription, e.g., 'the sand',
'the cat and the dog', 'Mandy hit her brother', etc.
The 'any of these' search type option causes the search pattern to be split by
whitespace into sub-patterns and returns all Forms where the chosen location
contains any of the sub-patterns. For example, searching for 'the and' with the
'any of these' search type option in the transcription field will return all
Forms where either 'the' or 'and' (or both) is in the transcription, e.g., 'the dog',
'Mandy ran', 'John and Mary smiled', 'other people', etc.
The 'exactly' search type option returns all Forms where the selected search
location contains nothing but the search pattern. For example, searching for
'dog' in the transcription with 'exactly' as the search type option will return
all Forms where the transcription value contains the string 'dog' and nothing
else.
The 'as a reg exp' search type option causes the search term to be interpreted
as a regular expression. Regular expressions use a certain syntax which allows
you to specify complex patterns. For example, using regular expressions one
could search for the word 'the' and avoid matching 'other' or 'they' or one
could search for strings that begin with 't'. Regular expression searches are
very powerful but require learning a bit of the regular expression syntax. See
the Regular Expressions section for more details.
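The five search types, and the coordination of two search expressions, can be summarized as matching rules applied to a single field value. This is a simplified, case-sensitive sketch for illustration only. The OLD performs these searches as database queries, so details such as case sensitivity and regular expression dialect may differ. The function names are hypothetical.

```python
import re

def matches(value: str, pattern: str, search_type: str) -> bool:
    """Decide whether one field value matches a pattern under a search type."""
    if search_type == "as a phrase":
        return pattern in value
    if search_type == "all of these":
        return all(part in value for part in pattern.split())
    if search_type == "any of these":
        return any(part in value for part in pattern.split())
    if search_type == "exactly":
        return value == pattern
    if search_type == "as a reg exp":
        return re.search(pattern, value) is not None
    raise ValueError(f"unknown search type: {search_type!r}")

def coordinate(first: bool, second: bool, operator: str) -> bool:
    """Combine the results of two search expressions."""
    if operator == "and":
        return first and second
    if operator == "or":
        return first or second
    if operator == "and not":
        return first and not second
    raise ValueError(f"unknown coordinator: {operator!r}")
```

The regular expression r"\bthe\b" matches 'the dog' but neither 'other' nor 'they left', and r"^t" matches any value that begins with 't'.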
The text input of a search expression whose location is transcription (or
morpheme break, depending on the system settings) will be converted from the
input orthography to the storage orthography before the query is performed.