ObjTables is a toolkit for working with complex data as collections of tables which combines the ease of use and flexibility of Excel with the rigor and power of defined schemas.
ObjTables makes it easy to
The ObjTables toolkit includes five components:
ObjTables enables users to leverage Excel as a graphical user interface for viewing and editing complex datasets. Excel-encoded datasets have the following features:
ObjTables makes it easy to implement rigorous validations of datasets:
ObjTables provides four user interfaces to the software tools:
ObjTables is available open-source under the MIT license.
ObjTables was developed to implement languages for describing whole-cell computational models and the data needed to build and verify them.
ObjTables supports three formats for collections of tables, or datasets:
The first row of each table must declare the version of ObjTables used by the table and the model that the table represents (e.g., !!ObjTables ObjTablesVersion='2.0' TableID='<model_name>').
Optionally, the first row can contain additional key-value pairs that represent further metadata, such as a description of the table and the date when it was last updated (e.g., !!ObjTables ObjTablesVersion='2.0' TableID='<model_name>' Description='Model description').
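To make the structure of this first row concrete, here is a minimal sketch that splits it into a marker and a dictionary of key-value metadata. `parse_header_row` is a hypothetical helper written for illustration, not part of the ObjTables API, and it assumes that values are quoted the way a POSIX shell would quote them.

```python
import shlex


def parse_header_row(line):
    """Split a declaration row into its marker and a dict of metadata.

    Hypothetical helper for illustration; assumes shell-style quoting.
    """
    marker, *pairs = shlex.split(line)  # removes the quotes around values
    if not marker.startswith('!!'):
        raise ValueError('the first row must begin with !!')
    metadata = {}
    for pair in pairs:
        key, sep, value = pair.partition('=')
        if not sep:
            raise ValueError(f'malformed key-value pair: {pair!r}')
        metadata[key] = value
    return marker[2:], metadata


kind, metadata = parse_header_row(
    "!!ObjTables ObjTablesVersion='2.0' TableID='Parent' "
    "Description='Model description'"
)
print(kind)      # ObjTables
print(metadata)  # {'ObjTablesVersion': '2.0', 'TableID': 'Parent', 'Description': 'Model description'}
```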
The attributes of each model are represented by the columns of its table, each instance of the model is represented by a row, and the attribute values of each instance are encoded in the cells of that row. The first rows after the declaration row should define the attribute represented by each column; each of these heading cells should contain a ! followed by the name of an attribute in the schema (e.g., !Id).
To help users encode complex relationships into a minimal number of tables, relationships between models can be encoded into groups of columns and individual cells.
Optionally, a model can be encoded into a transposed table in which the columns represent instances of the model and the rows represent its attributes. This is useful for models that are intended to have only a single instance. Models can be encoded into transposed tables by setting their format to column rather than row.
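The relationship between the two layouts can be sketched in a few lines: transposing a column-format table yields the equivalent row-format table. `transpose_table` and the single-instance model with Id, Name, and Version attributes are hypothetical, chosen only for illustration.

```python
def transpose_table(rows):
    """Convert a column-format table (one attribute per row, one instance
    per column) into the equivalent row-format table."""
    width = max((len(row) for row in rows), default=0)
    # Pad short rows so that zip() does not silently drop cells.
    padded = [list(row) + [''] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


# Hypothetical single-instance model stored in column format.
column_format = [
    ['!Id', 'model_1'],
    ['!Name', 'Example model'],
    ['!Version', '1.0'],
]
header, *instances = transpose_table(column_format)
records = [dict(zip(header, values)) for values in instances]
print(records)  # [{'!Id': 'model_1', '!Name': 'Example model', '!Version': '1.0'}]
```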
Comments about individual model instances can be encoded by inserting rows whose first cell begins with %.
Excel workbooks can include additional worksheets. Worksheets that are outside the scope of the schema must have names that do not begin with !.
Tables can include additional columns. Columns that are outside of the scope of the schema must have names which do not begin with an !.
Tables can include additional empty rows and rows with comments. Rows that contain comments must begin with %. Each comment is associated with the next data row, except for comments below the final data row, which are associated with the final row.
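The rules above for extra worksheets, extra columns, empty rows, and comment rows can be illustrated with a small standalone sketch. `read_data_rows` and `schema_sheets` are hypothetical helpers, not the ObjTables reader; they assume the table has already been loaded as lists of strings with a single attribute heading row, and that worksheets within the scope of the schema are exactly those whose names begin with !.

```python
def read_data_rows(header, rows):
    """Apply the rules above to the data rows of a row-format table.

    header: the attribute heading row, e.g. ['!Id', '!Name', 'Notes']
    rows: the rows below it, each a list of cell strings
    Returns (record, comments) pairs; record maps attribute names
    (without the leading '!') to cell values.
    """
    # Columns whose headings do not begin with '!' are outside the schema.
    columns = [(i, heading[1:]) for i, heading in enumerate(header)
               if heading.startswith('!')]
    results = []
    pending = []  # comments waiting for the next data row
    for row in rows:
        if not any(cell.strip() for cell in row):
            continue  # skip empty rows
        if row[0].startswith('%'):
            pending.append(row[0][1:].strip())
            continue
        record = {name: row[i] if i < len(row) else '' for i, name in columns}
        results.append((record, pending))
        pending = []
    if pending and results:
        # Comments below the final data row belong to the final row.
        results[-1][1].extend(pending)
    return results


def schema_sheets(sheet_names):
    """Keep only worksheets whose names begin with '!'."""
    return [name for name in sheet_names if name.startswith('!')]


header = ['!Id', '!Name', 'Notes']
rows = [
    ['% parent of Jamie and Jimie', '', ''],
    ['jane_doe', 'Jane Doe', 'not part of the schema'],
    ['', '', ''],
    ['john_doe', 'John Doe', ''],
    ['% comment below the final row'],
]
results = read_data_rows(header, rows)
for record, comments in results:
    print(record, comments)
# {'Id': 'jane_doe', 'Name': 'Jane Doe'} ['parent of Jamie and Jimie']
# {'Id': 'john_doe', 'Name': 'John Doe'} ['comment below the final row']
```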
The following example illustrates how to encode parents, children, and their favorite video games into two tables according to the example schema below. The Game model, which is referenced by the FavoriteVideoGame attribute of the Child model, enables the favorite game of each child to be encoded as a group of columns within the Child table. The Game model also enables the Python representation of the data to encapsulate information about each child's favorite game in a separate class. This makes both the tables and the Python code more human-readable.
!!SBtab TableID='Parent' SBtabVersion='2.0' | |
---|---|
!Id | !Name |
jane_doe | Jane Doe |
john_doe | John Doe |
mary_roe | Mary Roe |
richard_roe | Richard Roe |
!!SBtab TableID='Child' SBtabVersion='2.0' | ||||||
---|---|---|---|---|---|---|
| | | | | !FavoriteVideoGame | | |
!Id | !Name | !Gender | !Parents | !Name | !Publisher | !Year |
jamie_doe | Jamie Doe | female | jane_doe, john_doe | Legend of Zelda | Nintendo | 1986 |
jimie_doe | Jimie Doe | male | jane_doe, john_doe | Super Mario Brothers | Nintendo | 1985 |
linda_roe | Linda Roe | female | mary_roe, richard_roe | Sonic the Hedgehog | Sega | 1991 |
mike_roe | Michael Roe | male | mary_roe, richard_roe | SimCity | Electronic Arts | 1989 |
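
Grouped columns such as !FavoriteVideoGame can be thought of as nesting: the columns under the group heading become the attributes of a separate object, which is also what allows the child's !Name and the game's !Name to coexist in one table. The sketch below is a hypothetical helper, not the ObjTables reader; it assumes the column span of each group is already known, because plain lists of cell values cannot record merged cells.

```python
def parse_grouped_row(header_row, groups, values):
    """Convert one data row into a nested dict.

    header_row: attribute headings, one per column
    groups: maps a group heading to the (start, stop) slice of columns it spans
    values: the cell values of one data row
    """
    record = {}
    grouped_columns = set()
    for group, (start, stop) in groups.items():
        # Columns under a group heading become attributes of a nested object.
        record[group.lstrip('!')] = {
            header_row[i].lstrip('!'): values[i] for i in range(start, stop)
        }
        grouped_columns.update(range(start, stop))
    for i, heading in enumerate(header_row):
        if i not in grouped_columns:
            record[heading.lstrip('!')] = values[i]
    return record


header_row = ['!Id', '!Name', '!Gender', '!Parents', '!Name', '!Publisher', '!Year']
groups = {'!FavoriteVideoGame': (4, 7)}
row = ['jamie_doe', 'Jamie Doe', 'female', 'jane_doe, john_doe',
       'Legend of Zelda', 'Nintendo', '1986']

child = parse_grouped_row(header_row, groups, row)
# In the example table, the parents of each child are listed as comma-separated ids.
parent_ids = [ref.strip() for ref in child['Parents'].split(',')]
print(child['Name'], child['FavoriteVideoGame']['Name'], parent_ids)
# Jamie Doe Legend of Zelda ['jane_doe', 'john_doe']
```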
Excel and Python files for the example are available here.
Additional examples are available at SBtab.net.
Schemas can be defined either in the tabular format described here or with the Python API. The tabular format is easier to use. The Python API enables methods for manipulating data to be encapsulated with schemas. This makes it easy to define custom validations, such as checking that a chemical reaction is element-balanced or that a network contains no cycles. Furthermore, the software tools can generate Python schemas from tabular-formatted schemas, which provides a convenient starting point for further development.
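As an example of the kind of custom validation a Python schema can encapsulate, the sketch below checks that a chemical reaction is element-balanced. It deliberately uses a plain dataclass rather than ObjTables' model base class, so `Reaction`, `parse_formula`, and the `validate` method are hypothetical names; the formula parser also assumes simple formulas without parentheses or charges.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


def parse_formula(formula):
    """Count the atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[element] += int(number) if number else 1
    return counts


@dataclass
class Reaction:
    """Hypothetical model: maps species formulas to stoichiometric
    coefficients (negative for reactants, positive for products)."""
    id: str
    participants: dict = field(default_factory=dict)

    def validate(self):
        """Return None if the elements balance, else an error message."""
        net = Counter()
        for formula, coefficient in self.participants.items():
            for element, count in parse_formula(formula).items():
                net[element] += coefficient * count
        imbalance = {element: n for element, n in net.items() if n != 0}
        if imbalance:
            return f'{self.id} is not element-balanced: {imbalance}'
        return None


glucose_oxidation = Reaction('glucose_oxidation',
                             {'C6H12O6': -1, 'O2': -6, 'CO2': 6, 'H2O': 6})
water_formation = Reaction('water_formation', {'H2': -1, 'O2': -1, 'H2O': 1})
print(glucose_oxidation.validate())  # None
print(water_formation.validate())    # water_formation is not element-balanced: {'O': -1}
```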
Tabular-formatted schemas should begin with a single header row which indicates that the dataset is encoded in an ObjTables schema (!!ObjTables TableID="DEFINITION" ...).
After the header row, the schema file should contain a table that defines the models/tables and their attributes/columns. Each row of this table should define a single model or attribute.
Tabular-formatted schemas can be saved in comma-separated (.csv), tab-separated (.tsv), or Excel (.xlsx) format.
The table should contain the following columns:
The following example illustrates a schema for encoding three models of parents, children, and their favorite video games into two tables of parents and children, with the favorite games of the children embedded into a group of columns within the children table.
!!SBtab TableID='DEFINITION' TableName='Table/model and column/attribute definitions' SBtabVersion='2.0' | ||||
---|---|---|---|---|
!Name | !Type | !Parent | !Format | !Description |
Parent | Table | | row | Represents a parent |
Id | Column | Parent | slug | Identifier |
Name | Column | Parent | string | |
Child | Table | | row | Represents a child |
Id | Column | Child | slug | Identifier |
Name | Column | Child | string | |
Gender | Column | Child | enum(['female', 'male']) | |
Parents | Column | Child | manyToMany('Parent', related_name='children') | |
FavoriteVideoGame | Column | Child | manyToOne('Game', related_name='children') | |
Game | Table | | multiple_cells | Represents a video game |
Name | Column | Game | string(unique=True) | |
Publisher | Column | Game | string | |
Year | Column | Game | integer | |
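
To show how the rows of a schema definition table group into models and their attributes, here is a minimal sketch that reads a tab-separated subset of the example schema with Python's csv module. `load_schema` is a hypothetical helper, not the ObjTables API; it assumes each table is defined before its columns and ignores the !Description column. For comma-separated files, formats that themselves contain commas, such as enum(['female', 'male']), would need to be quoted.

```python
import csv

# A tab-separated subset of the example schema (the Game model).
SCHEMA_TSV = '\n'.join([
    "!!SBtab TableID='DEFINITION' SBtabVersion='2.0'",
    '!Name\t!Type\t!Parent\t!Format\t!Description',
    'Game\tTable\t\tmultiple_cells\tRepresents a video game',
    'Name\tColumn\tGame\tstring(unique=True)\t',
    'Publisher\tColumn\tGame\tstring\t',
    'Year\tColumn\tGame\tinteger\t',
])


def load_schema(text, delimiter='\t'):
    """Group the rows of a schema definition table into models."""
    lines = text.splitlines()
    if lines and lines[0].startswith('!!'):
        lines = lines[1:]  # skip the declaration row
    models = {}
    for row in csv.DictReader(lines, delimiter=delimiter):
        if row['!Type'] == 'Table':
            models[row['!Name']] = {'format': row['!Format'], 'attributes': {}}
        elif row['!Type'] == 'Column':
            # Assumes the table is defined before its columns.
            models[row['!Parent']]['attributes'][row['!Name']] = row['!Format']
        else:
            raise ValueError(f"unknown !Type: {row['!Type']!r}")
    return models


models = load_schema(SCHEMA_TSV)
print(models['Game']['format'])            # multiple_cells
print(list(models['Game']['attributes']))  # ['Name', 'Publisher', 'Year']
```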
Excel and Python files for the example are available here.
Additional examples are available at SBtab.net.
Schemas can also be implemented as Python modules. The software can convert tabular-formatted schemas into Python modules. The Python module format provides more flexibility than the tabular format. For example, Python-formatted schemas can encapsulate methods into schemas, which can be used to implement custom validations.
ObjTables supports numerous datatypes and makes it easy to implement additional types. For example, SBtab extends ObjTables by adding a variety of types for genomics, systems biology, and synthetic biology research.
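As a rough illustration of what an additional type involves, the sketch below defines a hypothetical DNA-sequence type that converts the text of a cell into a Python value, validates it, and converts it back. `DnaSequenceType` and its methods are assumptions made for this example; they do not reproduce the interface of ObjTables' attribute classes or of SBtab's types.

```python
import re


class DnaSequenceType:
    """Hypothetical custom datatype for cells that hold DNA sequences.

    Illustrative only: not the interface of ObjTables' attribute classes.
    """

    ALPHABET = re.compile(r'[ACGT]*')

    def deserialize(self, cell):
        """Convert the text of a cell into a Python value."""
        return cell.strip().upper()

    def validate(self, value):
        """Return None if the value is valid, else an error message."""
        if self.ALPHABET.fullmatch(value) is None:
            return f'invalid DNA sequence: {value!r}'
        return None

    def serialize(self, value):
        """Convert a Python value back into the text of a cell."""
        return value


dna = DnaSequenceType()
sequence = dna.deserialize(' acgtta ')
print(sequence, dna.validate(sequence))  # ACGTTA None
print(dna.validate('ACGU'))              # invalid DNA sequence: 'ACGU'
```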
ObjTables includes a variety of methods for working with schemas and datasets:
ObjTables includes four user interfaces to the software tools described above.
A Python library is available from PyPI.
Python modules which implement ObjTables schemas make it easy to create datasets, parse files into structured Python representations, query and edit datasets, and save datasets to files.
The following example illustrates how to programmatically create, manipulate, analyze, and export the same dataset of parents and children described above.
```python
import parents_children  # the example's Python schema module
import obj_tables
import obj_tables.io

# Create parents
jane_doe = parents_children.Parent(id='jane_doe', name='Jane Doe')
john_doe = parents_children.Parent(id='john_doe', name='John Doe')
mary_roe = parents_children.Parent(id='mary_roe', name='Mary Roe')
richard_roe = parents_children.Parent(id='richard_roe', name='Richard Roe')

# Create children and their favorite video games
Gender = parents_children.Child.gender.enum_class
jamie_doe = parents_children.Child(id='jamie_doe', name='Jamie Doe', gender=Gender.female,
                                   parents=[jane_doe, john_doe])
jamie_doe.favorite_video_game = parents_children.Game(
    name='Legend of Zelda: Ocarina of Time', publisher='Nintendo', year=1998)
jimie_doe = parents_children.Child(id='jimie_doe', name='Jimie Doe', gender=Gender.male,
                                   parents=[jane_doe, john_doe])
jimie_doe.favorite_video_game = parents_children.Game(
    name='Super Mario Brothers', publisher='Nintendo', year=1985)
linda_roe = parents_children.Child(id='linda_roe', name='Linda Roe', gender=Gender.female,
                                   parents=[mary_roe, richard_roe])
linda_roe.favorite_video_game = parents_children.Game(
    name='Sonic the Hedgehog', publisher='Sega', year=1991)
mike_roe = parents_children.Child(id='mike_roe', name='Michael Roe', gender=Gender.male,
                                  parents=[mary_roe, richard_roe])
mike_roe.favorite_video_game = parents_children.Game(
    name='SimCity', publisher='Electronic Arts', year=1989)

# Query the dataset
mike_roe = mary_roe.children.get_one(id='mike_roe')
mikes_parents = mike_roe.parents
mikes_sisters = mikes_parents[0].children.get(gender=Gender.female)

# Edit the dataset
jamie_doe.favorite_video_game.name = 'Legend of Zelda'
jamie_doe.favorite_video_game.year = 1986

# Validate the dataset
objects = [jane_doe, john_doe, mary_roe, richard_roe,
           jamie_doe, jimie_doe, linda_roe, mike_roe]
errors = obj_tables.Validator().run(objects)
assert errors is None

# Read the dataset from an Excel workbook
filename = 'obj_tables/web_app/examples/parents_children.xlsx'
objects = obj_tables.io.Reader().run(filename, sbtab=True,
                                     models=[parents_children.Parent, parents_children.Child],
                                     group_objects_by_model=True)
parents = objects[parents_children.Parent]
jane_doe_2 = next(parent for parent in parents if parent.id == 'jane_doe')

# Write the dataset to a new Excel workbook
filename = 'obj_tables/web_app/examples/parents_children_copy.xlsx'
objects = [jane_doe, john_doe, mary_roe, richard_roe,
           jamie_doe, jimie_doe, linda_roe, mike_roe]
obj_tables.io.Writer().run(filename, objects,
                           models=[parents_children.Parent, parents_children.Child],
                           sbtab=True)

# Compare the parent read from the workbook with the parent created above
assert jane_doe.is_equal(jane_doe_2)
```
ObjTables was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, US and the Applied Mathematics and Computer Science, from Genomes to the Environment research unit at the Institut National de la Recherche Agronomique in Jouy en Josas, FR.