ObjTables is a toolkit for using schemas to model collections of tables that represent complex datasets, combining the ease of use of Excel with the rigor and power of schemas.
ObjTables makes it easy to
The ObjTables toolkit includes five components:
ObjTables enables users to leverage Excel as a graphical interface for viewing and editing complex datasets. Excel-encoded datasets have the following features:
ObjTables supports multiple levels of validation of datasets:
ObjTables provides four user interfaces to the software tools:
We developed ObjTables to implement languages for describing whole-cell computational models and the data needed to build and verify them.
ObjTables was designed for uses cases where users need to quickly view, edit, validate, analyze, and share entire datasets.
ObjTables schemas can either be defined using the tabular format described below or using the ObjTables Python API. The tabular format is easy to use, and requires no programming. The tabular format supports most of the features of ObjTables, including a wide range of data types, relationships, transposed and embedded tables, single inheritance, and basic validation.
The ObjTables Python API is more flexible. In addition to the features of the tabular schema format, the Python API supports abstract classes and methods, custom data types, custom validation, custom Excel formatting, and more. The Python API can also be used to encapsulate methods for manipulating data inside schemas.
We recommend that developers begin with the tabular format, and transition to the Python format when more capabilities are needed. When more capabilities are required, we recommend that developers use the ObjTables software tools to convert their initial tabular-formatted schema into a Python module. This workflow provides a quick path to developing a custom schema.
Tabular-formatted schemas should begin with a single header row, which indicates that the schema is encoded in ObjTables format (!!ObjTables type="Schema" ...).
After the header row, the schema should contain a table with the columns below that defines the classes/models/tables and their attributes/columns. Each row in the table should define a single class or attribute.
Tabular-formatted schemas can be saved in comma-separated (.csv), tab-separated (.tsv), or Excel (.xlsx) format.
The following example illustrates a schema for an address book of people, the companies that they work for, and their addresses. The Address class will be embedded into the tables for the Company and Person classes. This schema design minimizes the number of data tables needed to represent an address book. The example is available in TSV format from GitHub .
!!ObjTables type='Schema' tableFormat='row' description='Table/model and column/attribute definitions' date='2020-03-10 21:34:50' objTablesVersion='0.0.8' | ||||||
---|---|---|---|---|---|---|
!Name | !Type | !Parent | !Format | !Verbose name | !Verbose name plural | !Description |
Company | Class | column | Company | Companies | ||
name | Attribute | Company | String(primary=True, unique=True) | Name | ||
url | Attribute | Company | Url | URL | ||
address | Attribute | Company | OneToOne('Address', related_name='company') | Address | ||
Person | Class | row | Person | People | ||
name | Attribute | Person | String(primary=True, unique=True) | Name | ||
type | Attribute | Person | Enum(['family', 'friend', 'business']) | Type | ||
company | Attribute | Person | ManyToOne('Company', related_name='employees') | Company | ||
email_address | Attribute | Person | Email address | |||
phone_number | Attribute | Person | String | Phone number | ||
address | Attribute | Person | OneToOne('Address', related_name='person') | Address | ||
Address | Class | multiple_cells | Address | Addresses | ||
street | Attribute | Address | String(primary=True, unique=True) | Street | ||
city | Attribute | Address | String | City | ||
state | Attribute | Address | String | State | ||
zip_code | Attribute | Address | String | Zip code | ||
country | Attribute | Address | String | Country |
Additional examples are available at GitHub :
Schemas can also be implemented using the ObjTables Python API. The API enables developers to define schemas with a similar syntax to object-relational mapping tools such as Django , Ruby on Rails , and SQLAlchemy .
Please see the Python documentation for more information.
The following example illustrates a schema for an address book. The schema includes three classes to represent people (Person), the companies that they work for (Company), and their addresses (Address). The example is available in Python format from GitHub .
import obj_tables class Address(obj_tables.Model): street = obj_tables.StringAttribute(primary=True, unique=True, verbose_name='Street') city = obj_tables.StringAttribute(verbose_name='City') state = obj_tables.StringAttribute(verbose_name='State') zip_code = obj_tables.StringAttribute(verbose_name='Zip code') country = obj_tables.StringAttribute(verbose_name='Country') class Meta(obj_tables.Model.Meta): table_format = obj_tables.TableFormat.multiple_cells attribute_order = ('street', 'city', 'state', 'zip_code', 'country',) verbose_name = 'Address' verbose_name_plural = 'Addresses' class Person(obj_tables.Model): name = obj_tables.StringAttribute(primary=True, unique=True, verbose_name='Name') type = obj_tables.EnumAttribute(['family', 'friend', 'business'], verbose_name='Type') company = obj_tables.ManyToOneAttribute('Company', related_name='employees', verbose_name='Company') email_address = obj_tables.EmailAttribute(verbose_name='Email address') phone_number = obj_tables.StringAttribute(verbose_name='Phone number') address = obj_tables.OneToOneAttribute('Address', related_name='person', verbose_name='Address') class Meta(obj_tables.Model.Meta): table_format = obj_tables.TableFormat.row attribute_order = ('name', 'type', 'company', 'email_address', 'phone_number', 'address',) verbose_name = 'Person' verbose_name_plural = 'People' class Company(obj_tables.Model): name = obj_tables.StringAttribute(primary=True, unique=True, verbose_name='Name') url = obj_tables.UrlAttribute(verbose_name='URL') address = obj_tables.OneToOneAttribute('Address', related_name='company', verbose_name='Address') class Meta(obj_tables.Model.Meta): table_format = obj_tables.TableFormat.column attribute_order = ('name', 'url', 'address',) verbose_name = 'Company' verbose_name_plural = 'Companies'
Additional examples are available at GitHub :
To encode complex datasets into a minimal number of tables, related classes can be encoded into a column or a series of columns of the table that represents their (row-formatted) parent classes. (For column-formatted parent classes, related classes can be encoded into a row or a series of rows.)
ObjTables supports numerous data types, and it is easy to implement additional types.
This includes powerful data types from several mathematics and science packages:
The following table lists the attribute types currently available in ObjTables. The first column indicates the names that should be used in conjunction with the tabular schema format. The second column indicates the Python class which implements each type, and which should be used to define schemas with the Python API. The third column indicates the data type used to encode each attribute into an Excel cell. The fourth column indicates the Python data type used to represent each attribute in the ObjTables software tools.
A few of the attributes have required arguments. Please see the Python documentation for more information.
Most of the attributes also have optional arguments that can be used to control how values of the attribute are validated. For example, the Integer and Float attributes have optional min and max arguments which can be used to indicate the minimum and maximum valid values. The String and LongString attributes have optional min_length and max_length attributes which can be used to indicate the minimum and maximum valid lengths of each value.
In particular, the following arguments can be used to configure attributes to function as primary keys for encoding relationships from instances of other classes into their tables. These attributes are necessary to encode relationships between entries in different tables.
Please see the Python documentation for detailed information about these required and optional arguments.
ObjTables can encode and decode datasets into and from the file formats outlined below. We recommend using Excel for viewing and editing datasets. We recommend using CSV or TSV to store datasets because these formats are conducive to revisioning with version control systems such as Git. We recommend using JSON for working with ObjTables in other programming languages beyond Python.
Each collection of tables must declare that the tables are encoded in ObjTables's tabular format by including the line !!!ObjTables ... before one of the tables.
Optionally, document declarations can capture key-value pairs of document-level metadata. Each key and value should be a string. At a minimum, we recommend using the keys below. The ObjTables software automatically generates these keys when datasets are exported to files.
Each table must declare the class that it represents by including a line that begins with !!ObjTables type='Data' id='<class_name>' ...). The value of id should be the name of a class in the schema.
Optionally, table declarations can capture key-value pairs of class/table-level metadata. Each key and value should be a string. At a minimum, we recommend using the keys below. The ObjTables software automatically generates these keys when datasets are exported to files.
Class instances can be encoded into tables as follows:
The value of each attribute of each instance should be encoded into cells as follows:
Comments about individual class instances can be encoded as follows:
Sets of tables that encode datasets can include additional tables, rows, and columns. This can be used to encapsulate additional metadata alongside datasets.
The following example illustrates an address book of the CEOs of several major technology companies, the companies that they lead, and their addresses. The example is encoded into the schema outlined above . To capture this information with a minimal number of tables, each CEO and company's address is encoded into a range of cells within the row/column which represents each CEO and each company. This makes the tables human-readable and enables Python to encapsulate related information into distinct objects. The example is available in merged-TSV format from GitHub .
!!!ObjTables objTablesVersion='0.0.8' date='2020-03-14 13:19:04' |
---|
!!ObjTables type='Data' tableFormat='column' id='Company' name='Companies' date='2020-03-14 13:19:04' objTablesVersion='0.0.8' | |||||
---|---|---|---|---|---|
!Name | Apple | Netflix | |||
!URL | https://www.apple.com/ | https://www.facebook.com/ | https://www.google.com/ | https://www.netflix.com/ | |
!Address | !Street | 10600 N Tantau Ave | 1 Hacker Way #15 | 1600 Amphitheatre Pkwy | 100 Winchester Cir |
!City | Cupertino | Menlo Park | Mountain View | Los Gatos | |
!State | CA | CA | CA | CA | |
!Zip code | 95014 | 94025 | 94043 | 95032 | |
!Country | US | US | US | US |
!!ObjTables type='Data' tableFormat='row' id='Person' name='People' date='2020-03-14 13:19:04' objTablesVersion='0.0.8' | |||||||||
---|---|---|---|---|---|---|---|---|---|
!Address | |||||||||
!Name | !Type | !Company | !Email address | !Phone number | !Street | !City | !State | !Zip code | !Country |
Mark Zuckerberg | family | zuck@fb.com | 650-543-4800 | 1 Hacker Way #15 | Menlo Park | CA | 94025 | US | |
Reed Hastings | business | Netflix | reed.hastings@netflix.com | 408-540-3700 | 100 Winchester Cir | Los Gatos | CA | 95032 | US |
Sundar Pichai | business | sundar@google.com | 650-253-0000 | 1600 Amphitheatre Pkwy | Mountain View | CA | 94043 | US | |
Tim Cook | business | Apple | tcook@apple.com | 408-996-1010 | 10600 N Tantau Ave | Cupertino | CA | 95014 | US |
To make it easy to use ObjTables with other programming languages, ObjTables can encode datasets into JSON or YAML documents, which can easily be parsed by a wide range of languages. ObjTables encodes datasets into JSON and YAML as described and illustrated below.
{ # dictionary of metadata about the document "_documentMetadata": { "objTablesVersion": "<ObjTables version>", "date": "<Date>", ... # additional document metadata }, # dictionary which maps the name of each class to a dictionary with metadata about each class "_classMetadata": { "<ClassName>": { "id": "<ClassName>", "name": "<Class verbose name>", ... # additional metadata for the class }, ... # dictionaries with metadata for additional classes }, # for each class, lists of dictionaries which represent the instances of the classes "<ClassName>": [ { "__type": "<ClassName>", "__id": <unique integer id>, "<attribute name>": <value of attribute>, "<*to-one attribute name>": <integer id of related object>, "<*to-many attribute name>": [<integer id of related object>, ...], ... # additional attributes of the instance of the class }, ... # additional instances of the class ], ... # lists of dictionaries which represent the instances of additional classes }
This Python code illustrates how to decode datasets encoded into this JSON/YAML format. We recommend following this example to create methods for other languages.
Below are several examples of ObjTables schemas and datasets in all of the supported formats.
Description | Schema | UML diagram | Dataset | Schema and dataset |
---|---|---|---|---|
Address book | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Financial transactions | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Kinetic models of biochemical reactions | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
SBtab: systems biology data and kinetic models | tsv, xlsx, py | svg | ||
Hynne model of yeast glycolysis | xlsx | |||
Jiang et al. model of pancreatic beta-cell insulin secretion | xlsx | |||
Related data | xlsx | |||
Noor et al. model of Escherichia coli metabolism | xlsx | |||
Related data | xlsx | |||
Sigurdsson et al. model of mouse metabolism | xlsx | |||
Teusink model of yeast glycolysis | xlsx | |||
Related data | xlsx | |||
Wortel et al. model of Escherichia coli metabolism | xlsx | |||
Related data | xlsx | |||
Standard free energies of reactions calculated by eQuilibrator | xlsx | |||
Yeast transcriptional regulatory network inferred by Chang et al. | xlsx |
One of the major goals of ObjTables is to make it easy to rigorously validate the syntax and semantics of complex datasets. ObjTables achieves this by making it easy to utilize the six levels of validation summarized below. The validations marked with ○ can optionally be disabled.
ObjTables provides several tools for working with schemas and datasets. The tools marked with ● are available through all of the user interfaces (the web application, REST API, command-line program, and Python API). Please see below for more information about using these interfaces. The tools marked with ○ are only available through the Python API. Please see the Python documentation for more information about these tools .
A Python API is available from PyPI . As described above, the Python API has significantly more capabilities than the web application, REST API, and command-line program.
The following example briefly introduces the API by illustrating how to use the API to programmatically create, manipulate, analyze, and export the same address book of tech CEOs described above . This tutorial and additional tutorials are available as interactive Jupyter notebooks at sandbox.karrlab.org .
import schema as address_book PersonType = address_book.Person.type.enum_class
apple = address_book.Company(name='Apple', url='https://www.apple.com/', address=address_book.Address(street='10600 N Tantau Ave', city='Cupertino', state='CA', zip_code='95014', country='US')) facebook = address_book.Company(name='Facebook', url='https://www.facebook.com/', address=address_book.Address(street='1 Hacker Way #15', city='Menlo Park', state='CA', zip_code='94025', country='US')) google = address_book.Company(name='Google', url='https://www.google.com/', address=address_book.Address(street='1600 Amphitheatre Pkwy', city='Mountain View', state='CA', zip_code='94043', country='US')) netflix = address_book.Company(name='Netflix', url='https://www.netflix.com/', address=address_book.Address(street='100 Winchester Cir', city='Los Gatos', state='CA', zip_code='95032', country='US')) companies = [apple, facebook, google, netflix]
cook = address_book.Person(name='Tim Cook', type=PersonType.business, company=apple, email_address='tcook@apple.com', phone_number='408-996-1010', address=apple.address) hastings = address_book.Person(name='Reed Hastings', type=PersonType.business, company=netflix, email_address='reed.hastings@netflix.com', phone_number='408-540-3700', address=netflix.address) pichai = address_book.Person(name='Sundar Pichai', type=PersonType.business, company=google, email_address='sundar@google.com', phone_number='650-253-0000', address=google.address) zuckerberg = address_book.Person(name='Mark Zuckerberg', type=PersonType.family, company=facebook, email_address='zuck@fb.com', phone_number='650-543-4800', address=facebook.address) ceos = [cook, hastings, pichai, zuckerberg]
assert facebook.url == 'https://www.facebook.com/'
facebook.url = 'https://about.fb.com/'
import obj_tables errors = obj_tables.Validator().run(companies + ceos) assert errors is None
import obj_tables.io import os import tempfile dirname = tempfile.mkdtemp() filename_xlsx = os.path.join(dirname, 'address_book.xlsx') obj_tables.io.Writer().run(filename_xlsx, companies + ceos, models=[address_book.Company, address_book.Person])
objects = obj_tables.io.Reader().run(filename_xlsx, models=[address_book.Company, address_book.Person], group_objects_by_model=False, ignore_sheet_order=True)
zuckerberg_copy = next(el for el in objects if isinstance(el, address_book.Person) and el.name == 'Mark Zuckerberg') assert zuckerberg_copy.is_equal(zuckerberg) assert zuckerberg_copy.difference(zuckerberg) == ''
We recommend that developers use the REST API to work with ObjTables in other programming languages:
The documentation for the REST API contains detailed information about the inputs and outputs of each endpoint.
ObjTables is under active development. Below are several open issues that we intend to address going forward. Please see the GitHub issue list for more information.
Below are several resources that we recommend for working with ObjTables-encoded datasets:
While ObjTables has many similarities to other toolkits, ObjTables's unique combination of features provides some advantages for some use cases. Below, we outline the advantages and disadvantages of ObjTables of over several other types of tools.
ObjTables was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, US and the Applied Mathematics and Computer Science, from Genomes to the Environment research unit at the Institut National de la Recherche Agronomique in Jouy en Josas, FR.