ObjTables: a toolkit for parsing and validating tables with relational schemas

ObjTables is a toolkit for using schemas to model collections of tables that represent complex datasets, combining the ease of use of Excel with the rigor and power of schemas.

ObjTables makes it easy to

  • Use collections of tables (e.g., an Excel workbook) as an interface for viewing and editing complex datasets that consist of multiple related objects that have multiple attributes,
  • Use complex data types (e.g., numbers, strings, numerical arrays, symbolic mathematical expressions, chemical structures, biological sequences, etc.) within tables,
  • Use embedded tables and grammars to encode relational information into columns and groups of columns of tables,
  • Define schemas for collections of tables,
  • Use schemas to parse collections of tables into Python data structures for further analysis,
  • Use schemas to validate the syntax and semantics of collections of tables,
  • Conduct operations on complex datasets, such as comparing and merging objects, and
  • Edit schemas and migrate a dataset to a new version of a schema.

The ObjTables toolkit includes five components:

  • Tabular format for collections of tables. This includes syntax for declaring which cells represent each table, instance, and attribute; declaring which entries represent metadata such as the date that a table was updated; and declaring which entries represent comments.
  • Tabular format for schemas for collections of tables. ObjTables schemas capture the format of each table, including the name and data type of each column, which cells represent relationships among the entries in the tables, and constraints on the value of each cell. ObjTables supports three modes of encoding relationships into cells in tables.
    • Columns for relationships among objects represented by entries in tables: Relationships from one (primary) object to other (related) objects can be captured by (a) incorporating a column that represents a unique key for each related object into the table that represents the related objects and (b) encoding the keys for the related objects as a comma-separated list into a column in the table that represents the primary objects.
    • Embedded tables for *-to-one relationships: To help users encode complex datasets into a minimal number of tables, ObjTables can also encode instances of related classes into groups of columns. ObjTables uses merged headings to distinguish these columns.
    • Embedded grammars for relationships: To help users encode complex datasets into a minimal number of tables, grammars can be used to encode instances of related classes into a single column. These grammars can be defined declaratively in EBNF format using Lark.
  • Python API for defining schemas: For more flexibility, the Python API can be used to incorporate custom data types into schemas and define custom validation procedures.
  • Numerous data types including types for mathematics, science, chemoinformatics, and genomics.
  • Software tools for parsing, validating, and manipulating datasets according to schemas. This includes tools for
    • Pretty printing datasets as Excel workbooks. This enables users to use Excel as a graphical interface for quickly browsing and editing datasets as described below.
    • Creating templates for datasets.
    • Analyzing, comparing, merging, and revisioning datasets.
    • Migrating datasets between versions of their schemas.
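To make the first relationship mode above concrete, here is a minimal plain-Python sketch of how a *-to-many relationship could be written into, and read back from, a single cell as a comma-separated list of primary keys. The function names are hypothetical helpers, not part of the ObjTables API; the exact separator and whitespace that ObjTables writes are assumptions, and keys that themselves contain commas are not handled.

```python
def encode_related_keys(keys):
    """Join the primary keys of related objects into a single cell value."""
    return ", ".join(keys)


def decode_related_keys(cell):
    """Split a cell back into primary keys; an empty cell means no related objects."""
    if cell is None or not cell.strip():
        return []
    return [key.strip() for key in cell.split(",")]


# Hypothetical example: a Company cell listing its employees by their primary keys
cell = encode_related_keys(["Alice", "Bob"])
print(cell)                       # Alice, Bob
print(decode_related_keys(cell))  # ['Alice', 'Bob']
```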

ObjTables enables users to leverage Excel as a graphical interface for viewing and editing complex datasets. Excel-encoded datasets have the following features:

  • Table of contents: Optionally, each dataset can include a table that describes the classes represented by the data tables, displays the number of instances of each class, and provides hyperlinks to the data tables.
  • Formatted class titles: Each table includes a title bar that describes the class. The title bars are formatted, frozen, and protected from editing.
  • Formatted attribute headings: Each table includes headings for each column and group of columns. The headings are formatted, auto-filtered, frozen, and protected from editing.
  • Inline help for attributes: ObjTables uses Excel comments to embed help information about each attribute into its heading.
  • Select menus for enumerations and relationships: ObjTables provides select menus for each attribute that encodes an enumeration, a one-to-one relationship, or a many-to-one relationship.
  • Instant validation: ObjTables uses Excel to validate several properties of attributes. Because of Excel's limitations, this validation is limited; the ObjTables software provides far more extensive validation. Furthermore, ObjTables makes it easy to implement domain-specific validation at multiple levels.
  • Hidden extra rows and columns: To help users focus on the attributes of their classes, ObjTables hides all empty rows and columns.
  • Protection against unintentional editing: To help users avoid mistakes, ObjTables protects worksheets.

ObjTables supports multiple levels of validation of datasets:

  • Attribute validation: Validations of individual attributes can be defined declaratively (e.g., String(min_length=8)). More complex validations can be defined by implementing schemas in Python or by implementing custom types of attributes.
  • Instance validation: Users can implement custom instance-level validations by creating a Python module that implements a schema and implementing the validate method of each class.
  • Class-level validation: Most attributes can be constrained to have unique values across all instances (e.g., String(unique=True)). Python modules that implement schemas can also capture tuples of attributes that must be unique across all instances of a class. See the Python documentation for more information.
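The class-level uniqueness checks described above can be illustrated with a short sketch. It is a hypothetical helper that operates on plain dicts standing in for model instances; it is not how ObjTables implements validation internally.

```python
from collections import Counter


def find_duplicates(instances, attrs):
    """Return the tuples of values of `attrs` that occur in more than one instance.

    Pass one attribute name to mimic unique=True, or several names to mimic
    one entry of Meta.unique_together. Instances are plain dicts here.
    """
    counts = Counter(tuple(inst[attr] for attr in attrs) for inst in instances)
    return [values for values, n in counts.items() if n > 1]


people = [
    {"name": "Alice", "phone_number": "555-0100"},
    {"name": "Bob", "phone_number": "555-0100"},
    {"name": "Alice", "phone_number": "555-0101"},
]
print(find_duplicates(people, ("name",)))                 # [('Alice',)]
print(find_duplicates(people, ("name", "phone_number")))  # []
```

A pair of attributes can therefore be unique together even when each attribute on its own repeats.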

ObjTables provides four user interfaces to the software tools:

  • Web app: The web app enables users to use ObjTables without having to install any software.
  • REST API: The REST API enables users to use ObjTables programmatically without having to install any software.
  • Command-line interface: The command-line interface enables users to use ObjTables without having to upload data to this website.
  • Python API: The Python API enables users to extend ObjTables with custom attributes and validation and use ObjTables to analyze complex datasets.

We developed ObjTables to implement languages for describing whole-cell computational models and the data needed to build and verify them.

Use cases

ObjTables was designed for use cases where users need to quickly view, edit, validate, analyze, and share entire datasets.

  • Data integration and modeling: ObjTables is ideal for data integration and modeling because ObjTables makes it easy to use (a) workbooks for building and editing datasets/models, (b) the ObjTables software to rigorously validate datasets/models and quickly correct errors, (c) programming languages to analyze datasets and simulate models, and (d) version control systems such as Git to revision datasets and models.
  • Domain-specific formats/languages for nascent domains: ObjTables makes it easy to quickly define a format/language for a domain, including the schema and software tools for interacting with datasets encoded in the schema. In particular, ObjTables is well-suited to building domain-specific formats/languages for nascent academic fields because ObjTables schemas and datasets are easy to create and iteratively edit.
  • Supplementary data of academic journal articles: ObjTables is an excellent medium for supplementary data for journal articles. ObjTables enables authors to publish their data in a format that can easily be (a) viewed by any reader (with just a workbook viewer such as Excel) generally without needing to install or learn any software and (b) parsed by readers into a data structure for further analysis. The latter is made easy by the ability to package schemas and datasets together into a single workbook.
  • Sharing data with collaborators and colleagues: For similar reasons, ObjTables is an excellent medium for sharing data with collaborators and colleagues. Because Excel is so common, in many cases, ObjTables datasets shared via email or with services such as Dropbox and Google Drive can be previewed in email clients and web browsers without users having to open any additional software.

Formats for schemas

ObjTables schemas can be defined either using the tabular format described below or using the ObjTables Python API. The tabular format is easy to use and requires no programming. The tabular format supports most of the features of ObjTables, including a wide range of data types, relationships, transposed and embedded tables, single inheritance, and basic validation.

The ObjTables Python API is more flexible. In addition to the features of the tabular schema format, the Python API supports abstract classes and methods, custom data types, custom validation, custom Excel formatting, and more. The Python API can also be used to encapsulate methods for manipulating data inside schemas.

We recommend that developers begin with the tabular format and transition to the Python API when more capabilities are needed. At that point, we recommend that developers use the ObjTables software tools to convert their initial tabular-formatted schema into a Python module. This workflow provides a quick path to developing a custom schema.

Tabular format for schemas

Tabular-formatted schemas should begin with a single header row, which indicates that the schema is encoded in ObjTables format (!!ObjTables type="Schema" ...).

After the header row, the schema should contain a table with the columns below that defines the classes/models/tables and their attributes/columns. Each row in the table should define a single class or attribute.

  • !Name: Name of the component (class or attribute).
    • Classes: A string that begins with a letter or underscore, and consists of letters, numbers, and underscores.
    • Attributes: A string that begins with a letter or underscore, and consists of letters, numbers, underscores, colons, forward carets, dots, dashes, square brackets, and spaces.
  • !Type: Type of the component (Class or Attribute).
  • !Parent:
    • Classes: Empty, or the name of the parent class. Because this is implemented using Python class inheritance, this must specify an acyclic inheritance graph.
    • Attributes: Name of the class that the attribute belongs to.
  • !Format:
    • Classes:
      • row: Encode the instances of the class as rows.
      • column: Encode the instances of the class as columns (i.e., transposed table).
      • multiple_cells: Encode the instances of the class within a group of columns in the tables of the one-to-many and one-to-one related classes.
      • cell: Encode the instances of the class within columns of the related classes, optionally, using a grammar. See the Python documentation for more information about working with grammars.
    • Attributes: One of the data types listed below (e.g., String, Float). Arguments for the data types should be described in parentheses (e.g., String(min_length=5)). These arguments enable users to customize how data types function and are validated. See the Python documentation for more information about the attribute types and their arguments.
  • !Verbose name (Optional): Verbose name of the component.
  • !Verbose name plural (Optional): Plural verbose name of the component.
  • !Description (Optional): Description of the component.
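The naming rules for !Name in the list above can be expressed as regular expressions. This sketch assumes that "letters" means ASCII letters and that "forward carets" means the > character; both are assumptions, and the helper names are hypothetical rather than part of ObjTables.

```python
import re

# Assumptions: "letters" = ASCII letters; "forward carets" = ">"
CLASS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
ATTRIBUTE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_:>.\-\[\] ]*")


def is_valid_class_name(name):
    """Letter or underscore first, then letters, digits, or underscores."""
    return CLASS_NAME.fullmatch(name) is not None


def is_valid_attribute_name(name):
    """Like class names, but also allows : > . - [ ] and spaces after the first character."""
    return ATTRIBUTE_NAME.fullmatch(name) is not None


print(is_valid_class_name("Person"))        # True
print(is_valid_class_name("2Person"))       # False
print(is_valid_attribute_name("Zip code"))  # True
```

Using fullmatch rather than match with a trailing $ avoids accidentally accepting a name that ends in a newline.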

Tabular-formatted schemas can be saved in comma-separated (.csv), tab-separated (.tsv), or Excel (.xlsx) format.

Example: Address book

The following example illustrates a schema for an address book of people, the companies that they work for, and their addresses. The Address class will be embedded into the tables for the Company and Person classes. This schema design minimizes the number of data tables needed to represent an address book. The example is available in TSV format from GitHub.

!!ObjTables type='Schema' tableFormat='row' description='Table/model and column/attribute definitions' date='2020-03-10 21:34:50' objTablesVersion='0.0.8'
!Name | !Type | !Parent | !Format | !Verbose name | !Verbose name plural | !Description

Company | Class |  | column | Company | Companies |
name | Attribute | Company | String(primary=True, unique=True) | Name |  |
url | Attribute | Company | Url | URL |  |
address | Attribute | Company | OneToOne('Address', related_name='company') | Address |  |

Person | Class |  | row | Person | People |
name | Attribute | Person | String(primary=True, unique=True) | Name |  |
type | Attribute | Person | Enum(['family', 'friend', 'business']) | Type |  |
company | Attribute | Person | ManyToOne('Company', related_name='employees') | Company |  |
email_address | Attribute | Person | Email | Email address |  |
phone_number | Attribute | Person | String | Phone number |  |
address | Attribute | Person | OneToOne('Address', related_name='person') | Address |  |

Address | Class |  | multiple_cells | Address | Addresses |
street | Attribute | Address | String(primary=True, unique=True) | Street |  |
city | Attribute | Address | String | City |  |
state | Attribute | Address | String | State |  |
zip_code | Attribute | Address | String | Zip code |  |
country | Attribute | Address | String | Country |  |
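Because a tabular schema is just a table, it can be read with standard tools. The following sketch, which is not the ObjTables schema reader, parses an abbreviated TSV copy of the address-book schema above with Python's csv module and groups the attribute rows under their classes, treating !Parent as the owning class for attribute rows. It assumes that class rows and attribute rows are distinguished only by !Type; the function name is hypothetical.

```python
import csv

# Abbreviated copy of the address-book schema above, as TSV text
SCHEMA_TSV = (
    "!!ObjTables type='Schema' tableFormat='row'\n"
    "!Name\t!Type\t!Parent\t!Format\t!Verbose name\t!Verbose name plural\t!Description\n"
    "\n"
    "Company\tClass\t\tcolumn\tCompany\tCompanies\t\n"
    "name\tAttribute\tCompany\tString(primary=True, unique=True)\tName\t\t\n"
    "url\tAttribute\tCompany\tUrl\tURL\t\t\n"
    "\n"
    "Address\tClass\t\tmultiple_cells\tAddress\tAddresses\t\n"
    "street\tAttribute\tAddress\tString(primary=True, unique=True)\tStreet\t\t\n"
)


def read_schema(text):
    """Group the rows of a tabular schema into {class name: class definition}."""
    lines = text.splitlines()[1:]                     # drop the !!ObjTables header row
    lines = [line for line in lines if line.strip()]  # drop blank separator rows
    classes = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        name, parent = row["!Name"], row["!Parent"]
        if row["!Type"] == "Class":
            cls = classes.setdefault(name, {"attributes": {}})
            cls["format"] = row["!Format"]
            cls["parent"] = parent or None  # an empty !Parent means no superclass
        elif row["!Type"] == "Attribute":
            # for attributes, !Parent names the class that owns the attribute
            classes.setdefault(parent, {"attributes": {}})["attributes"][name] = row["!Format"]
    return classes


schema = read_schema(SCHEMA_TSV)
print(schema["Company"]["format"])      # column
print(schema["Company"]["attributes"])  # {'name': 'String(primary=True, unique=True)', 'url': 'Url'}
```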

Additional examples

Additional examples are available at GitHub:

  • The amounts and dates of financial transactions
  • Kinetic models of biochemical reactions
  • SBtab: systems biology data and kinetic models

Implementing schemas in Python using the Python API

Schemas can also be implemented using the ObjTables Python API. The API enables developers to define schemas with a similar syntax to object-relational mapping tools such as Django, Ruby on Rails, and SQLAlchemy.

  • Each class should be implemented as a subclass of obj_tables.Model.
  • Each attribute of each class should be implemented as a class attribute whose value is an instance of a subclass of obj_tables.Attribute. Numerous subclasses of obj_tables.Attribute, which support a wide range of data types, are available. Please see below and the Python documentation for more information, including the arguments of these classes.
    • Each relationship between two classes should be implemented using the obj_tables.OneToOneAttribute, obj_tables.OneToManyAttribute, obj_tables.ManyToOneAttribute, or obj_tables.ManyToManyAttribute classes.
  • The meta-information about each class, such as its verbose name, should be implemented as a nested class named Meta that is a subclass of obj_tables.Model.Meta. This class can have the following class attributes:
    • unique_together: Tuple of tuples of the names of attributes whose combinations of values must be unique across all instances of the class.
    • verbose_name: Verbose name of the class.
    • verbose_name_plural: Plural verbose name of the class.
    • description: Description of the class.
    • table_format: Indicates how the class should be formatted: obj_tables.TableFormat.row (normal table), obj_tables.TableFormat.column (transposed table), obj_tables.TableFormat.multiple_cells (inline as a set of columns/rows within the tables of related classes), or obj_tables.TableFormat.cell (inline encoded into a column or row of the tables of the related classes).
    • attribute_order: List of the names of the attributes of the class in the order they should be printed in its table.
    • ordering: Defines the default sorting for the instances of the class. This should be a list of strings of the names of attributes, in the order in which the values of the attributes should be sorted. Optionally, the prefix - can be used to indicate that instances should be sorted in reverse order of the values of the attribute.
    • frozen_columns: Number of columns to freeze when the class is printed to an Excel worksheet.
    • merge: Indicates how semantically-equivalent instances of the class should be merged when two datasets are merged: obj_tables.ModelMerge.join (concatenate their *-to-many relationships) or obj_tables.ModelMerge.append (throw an error if datasets contain semantically equivalent instances).
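The ordering semantics, including the - prefix for descending order, can be sketched with Python's stable sort. This is a hypothetical illustration on plain dicts, not ObjTables' own sorting code.

```python
from operator import itemgetter


def sort_instances(instances, ordering):
    """Sort dicts by a Meta.ordering-style list, e.g. ['-type', 'name']."""
    result = list(instances)
    # Apply stable sorts from the least to the most significant attribute,
    # so that each attribute can have its own direction.
    for key in reversed(ordering):
        descending = key.startswith("-")
        attr = key[1:] if descending else key
        result.sort(key=itemgetter(attr), reverse=descending)
    return result


people = [
    {"name": "Carol", "type": "friend"},
    {"name": "Alice", "type": "family"},
    {"name": "Bob", "type": "friend"},
]
print([p["name"] for p in sort_instances(people, ["-type", "name"])])  # ['Bob', 'Carol', 'Alice']
```

Sorting by the least significant key first and relying on sort stability lets each key have its own direction without writing a custom comparison function.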

Please see the Python documentation for more information.

Example: Address book

The following example illustrates a schema for an address book. The schema includes three classes to represent people (Person), the companies that they work for (Company), and their addresses (Address). The example is available in Python format from GitHub.

import obj_tables

class Address(obj_tables.Model):
    street = obj_tables.StringAttribute(primary=True, unique=True, verbose_name='Street')
    city = obj_tables.StringAttribute(verbose_name='City')
    state = obj_tables.StringAttribute(verbose_name='State')
    zip_code = obj_tables.StringAttribute(verbose_name='Zip code')
    country = obj_tables.StringAttribute(verbose_name='Country')

    class Meta(obj_tables.Model.Meta):
        table_format = obj_tables.TableFormat.multiple_cells
        attribute_order = ('street', 'city', 'state', 'zip_code', 'country',)
        verbose_name = 'Address'
        verbose_name_plural = 'Addresses'


class Person(obj_tables.Model):
    name = obj_tables.StringAttribute(primary=True, unique=True, verbose_name='Name')
    type = obj_tables.EnumAttribute(['family', 'friend', 'business'], verbose_name='Type')
    company = obj_tables.ManyToOneAttribute('Company', related_name='employees', verbose_name='Company')
    email_address = obj_tables.EmailAttribute(verbose_name='Email address')
    phone_number = obj_tables.StringAttribute(verbose_name='Phone number')
    address = obj_tables.OneToOneAttribute('Address', related_name='person', verbose_name='Address')

    class Meta(obj_tables.Model.Meta):
        table_format = obj_tables.TableFormat.row
        attribute_order = ('name', 'type', 'company', 'email_address', 'phone_number', 'address',)
        verbose_name = 'Person'
        verbose_name_plural = 'People'


class Company(obj_tables.Model):
    name = obj_tables.StringAttribute(primary=True, unique=True, verbose_name='Name')
    url = obj_tables.UrlAttribute(verbose_name='URL')
    address = obj_tables.OneToOneAttribute('Address', related_name='company', verbose_name='Address')

    class Meta(obj_tables.Model.Meta):
        table_format = obj_tables.TableFormat.column
        attribute_order = ('name', 'url', 'address',)
        verbose_name = 'Company'
        verbose_name_plural = 'Companies'

Additional examples

Additional examples are available at GitHub:

  • The amounts and dates of financial transactions
  • Kinetic models of biochemical reactions
  • SBtab: systems biology data and kinetic models

Encoding multiple classes into a single table

To encode complex datasets into a minimal number of tables, related classes can be encoded into a column or a series of columns of the table that represents their (row-formatted) parent classes. (For column-formatted parent classes, related classes can be encoded into a row or a series of rows.)

  • Classes related by *-to-one attributes:
    • For row-formatted classes, classes related by *-to-one attributes can be encoded into a series of consecutive columns by setting the formats of the related classes to multiple_cells in the schema. When this feature is used, tables must include an additional row of column headings which indicate the groups of columns.
    • For column-formatted classes, classes related by *-to-one attributes can be encoded into a series of consecutive rows by setting the formats of the related classes to multiple_cells in the schema. When this feature is used, tables must include an additional column of row headings which indicate the groups of rows.
  • Classes related by a ToManyGrammar attribute:
    • For row-formatted classes, such related classes can be encoded into a column by setting the formats of the related classes to cell in the schema.
    • For column-formatted classes, such related classes can be encoded into a row by setting the formats of the related classes to cell in the schema.
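As a rough illustration of a multiple_cells layout for a row-formatted class, the following sketch flattens a *-to-one related object (here an address) into a group of consecutive columns and emits the two heading rows: a group heading above the columns and an attribute heading for each column. Empty strings stand in for the cells that Excel would merge; the function name is hypothetical, and the ! prefixes and exact heading text that ObjTables writes are omitted.

```python
def flatten_row_table(instances, attrs, embedded):
    """Lay out a row-formatted table whose `embedded` attributes point to
    multiple_cells-formatted objects.

    Returns [group headings, attribute headings, *data rows].
    """
    group_row, attr_row = [], []
    for attr in attrs:
        if attr in embedded:
            sub_attrs = embedded[attr]
            group_row += [attr] + [""] * (len(sub_attrs) - 1)  # "" = merged into the group heading
            attr_row += sub_attrs
        else:
            group_row.append("")
            attr_row.append(attr)
    data_rows = []
    for inst in instances:
        row = []
        for attr in attrs:
            if attr in embedded:
                related = inst.get(attr) or {}  # a missing related object leaves the group empty
                row += [related.get(sub, "") for sub in embedded[attr]]
            else:
                row.append(inst.get(attr, ""))
        data_rows.append(row)
    return [group_row, attr_row] + data_rows


people = [
    {"name": "Alice", "address": {"street": "1 Main St", "city": "Springfield"}},
    {"name": "Bob", "address": None},
]
for row in flatten_row_table(people, ["name", "address"], {"address": ["street", "city"]}):
    print(row)
```

The first printed row is the group heading (['', 'address', '']) and the second is the attribute heading (['name', 'street', 'city']).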

Data types

ObjTables supports numerous data types, and it is easy to implement additional types.

This includes powerful data types from several mathematics and science packages:

  • BioPython: sequence informatics
  • lark: grammars for describing complex relationships
  • numpy: numeric arrays
  • pandas: data tables
  • pint: units
  • sympy: symbolic math
  • uncertainties: values and their uncertainty

The following table lists the attribute types currently available in ObjTables. The first column indicates the names that should be used in conjunction with the tabular schema format. The second column indicates the Python class which implements each type, and which should be used to define schemas with the Python API. The third column indicates the data type used to encode each attribute into an Excel cell. The fourth column indicates the Python data type used to represent each attribute in the ObjTables software tools.

Format | Python class | Excel type | Python type | Description

Fundamental data types
Boolean | obj_tables.BooleanAttribute | Boolean | bool | Boolean
Enum | obj_tables.EnumAttribute | String | enum.Enum | Enumeration
Integer | obj_tables.IntegerAttribute | Number | int | Integer
PositiveInteger | obj_tables.PositiveIntegerAttribute | Number | int | Positive integer
Float | obj_tables.FloatAttribute | Number | float | Float
PositiveFloat | obj_tables.PositiveFloatAttribute | Number | float | Positive float
String | obj_tables.StringAttribute | Text | str | Short string
LongString | obj_tables.LongStringAttribute | Text | str | Long string
Regex | obj_tables.RegexAttribute | Text | str | String that matches a regular expression
Email | obj_tables.EmailAttribute | String | str | Email address
Url | obj_tables.UrlAttribute | String | str | Uniform resource locator (URL)
Date | obj_tables.DateAttribute | String | datetime.date | Date
DateTime | obj_tables.DateTimeAttribute | String | datetime.datetime | Date and time
Time | obj_tables.TimeAttribute | String | datetime.time | Time

Relationships
OneToOne | obj_tables.OneToOneAttribute | String | obj_tables.Model | One-to-one relationship
OneToMany | obj_tables.OneToManyAttribute | String | list of obj_tables.Model | One-to-many relationship
ManyToOne | obj_tables.ManyToOneAttribute | String | obj_tables.Model | Many-to-one relationship
ManyToMany | obj_tables.ManyToManyAttribute | String | list of obj_tables.Model | Many-to-many relationship

Grammars for domain-specific languages for encoding data and relationships into an individual cell in a table
grammar.ToManyGrammar | obj_tables.grammar.ToManyGrammarAttribute | String | list of obj_tables.Model | *-to-many relationship serialized/deserialized using a custom grammar

Mathematics
math.Array | obj_tables.math.ArrayAttribute | String | numpy.ndarray | Numerical array or matrix
math.Table | obj_tables.math.TableAttribute | String | pandas.DataFrame | Data table, optionally with row and column labels
math.ManyToOneExpression | obj_tables.math.ManyToOneExpressionAttribute | String | numpy.ndarray | Numerical array or matrix
math.OneToOneExpression | obj_tables.math.OneToOneExpressionAttribute | String | numpy.ndarray | Numerical array or matrix
math.SymbolicExpr | obj_tables.math.SymbolicExprAttribute | String | sympy.Expr | Symbolic expression
math.SymbolicSymbol | obj_tables.math.SymbolicSymbolAttribute | String | sympy.Symbol | Symbolic symbol

Science
sci.Unit | obj_tables.sci.UnitAttribute | String | pint.unit._Unit | Units
sci.Quantity | obj_tables.sci.QuantityAttribute | String | pint.quantity._Quantity | Magnitude and units
sci.UncertainFloat | obj_tables.sci.UncertainFloatAttribute | String | uncertainties.core.Variable | Float and its uncertainty
sci.Doi | obj_tables.sci.DoiAttribute | String | str | Digital Object Identifier
sci.Dois | obj_tables.sci.DoisAttribute | String | list of str | List of Digital Object Identifiers
sci.Identifier | obj_tables.sci.IdentifierAttribute | String | obj_tables.sci.Identifier | An identifier in a namespace registered with Identifiers.org
sci.Identifiers | obj_tables.sci.IdentifiersAttribute | String | list of obj_tables.sci.Identifier | List of identifiers in namespaces registered with Identifiers.org
sci.OntoTerm | obj_tables.sci.OntoTermAttribute | String | pronto.Term | Term in an ontology
sci.PubMedId | obj_tables.sci.PubMedIdAttribute | Number | int | PubMed identifier
sci.PubMedIds | obj_tables.sci.PubMedIdsAttribute | String | list of int | List of PubMed identifiers

Chemistry
chem.ChemicalStructure | obj_tables.chem.ChemicalStructureAttribute | String | wc_utils.utils.chem.Structure | Chemical structure (openbabel.OBMol, bpforms.BpForm, bcforms.BcForm)
chem.EmpiricalFormula | obj_tables.chem.EmpiricalFormulaAttribute | String | wc_utils.utils.chem.EmpiricalFormula | Empirical formula

Biology
bio.DnaSeq | obj_tables.bio.DnaSeqAttribute | String | Bio.Seq.Seq | DNA sequence
bio.ProteinSeq | obj_tables.bio.ProteinSeqAttribute | String | Bio.Seq.Seq | Protein sequence
bio.RnaSeq | obj_tables.bio.RnaSeqAttribute | String | Bio.Seq.Seq | RNA sequence
bio.Seq | obj_tables.bio.SeqAttribute | String | Bio.Seq.Seq | Sequence
bio.FeatureLoc | obj_tables.bio.FeatureLocAttribute | String | Bio.SeqFeature.FeatureLocation | Location of a sequence feature
bio.FreqPosMatrix | obj_tables.bio.FreqPosMatrixAttribute | String | Bio.motifs.matrix.FrequencyPositionMatrix | Frequency position matrix
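Every row of the table above follows the same naming pattern: a type name in the first column, optionally prefixed by a submodule such as math. or bio., maps to obj_tables.<prefix><type name>Attribute in the second column. The hypothetical helper below encodes that observed pattern; it is a convenience sketch, not an ObjTables function, and custom types need not follow it.

```python
def python_class_for(format_name):
    """Map a tabular type name (e.g. 'math.Array') to its Python class name
    using the naming pattern observed in the table of data types."""
    module, _, name = format_name.rpartition(".")
    prefix = "obj_tables." + (module + "." if module else "")
    return prefix + name + "Attribute"


print(python_class_for("String"))      # obj_tables.StringAttribute
print(python_class_for("math.Array"))  # obj_tables.math.ArrayAttribute
```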

Required arguments

A few of the attributes have required arguments. Please see the Python documentation for more information.

  • Enum attribute: The first argument must be a subclass of enum.Enum or a list of the enumerated values. For example, an enumerated attribute with two potential values 'A' and 'B' can be specified in the tabular schema format using the syntax Enum(['A', 'B']).
  • *To* attributes: The first argument must be a class or the name of a class. The constructor must also receive a keyword argument related_name whose value is a string that names a virtual property that will be added to the related class to represent pointers from the related objects back to the primary objects. For example, consider a class Child with an attribute parents that represents a many-to-many relationship to a class Parent, where Parent gains a virtual attribute children that represents the inverse relationship. The parents attribute can be specified in the tabular schema format using the syntax ManyToMany('Parent', related_name='children').
  • ToManyGrammar attributes: The first argument must be a class or the name of a class. The constructor must also receive a keyword argument grammar whose value is either a string that defines the grammar or the path to a file that defines the grammar. The grammar must be defined using Lark's EBNF syntax. In addition, Python developers can customize how data encoded into a grammar is transformed into class instances and attributes by (1) creating a subclass of obj_tables.ToManyGrammar and (2) setting its Transformer attribute to a subclass of obj_tables.ToManyGrammarTransformer. Please see the Python documentation for more information.
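The !Format cells of a tabular schema combine a type name with Python-style arguments, including the required arguments above. A hypothetical parser for such cells can use Python's ast module, which evaluates only literal arguments instead of executing the cell with eval. This is a sketch, not the parser ObjTables uses.

```python
import ast


def _dotted_name(node):
    """Turn Name/Attribute nodes such as `math.Array` back into a string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return _dotted_name(node.value) + "." + node.attr
    raise ValueError("expected a type name such as String or math.Array")


def parse_format(fmt):
    """Split a !Format cell into (type name, positional args, keyword args)
    without executing it; all arguments must be Python literals."""
    node = ast.parse(fmt.strip(), mode="eval").body
    if isinstance(node, ast.Call):
        args = [ast.literal_eval(arg) for arg in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return _dotted_name(node.func), args, kwargs
    return _dotted_name(node), [], {}  # bare type name, e.g. "Url"


print(parse_format("Enum(['A', 'B'])"))
# ('Enum', [['A', 'B']], {})
print(parse_format("ManyToMany('Parent', related_name='children')"))
# ('ManyToMany', ['Parent'], {'related_name': 'children'})
```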

Optional arguments

Most of the attributes also have optional arguments that can be used to control how values of the attribute are validated. For example, the Integer and Float attributes have optional min and max arguments, which can be used to indicate the minimum and maximum valid values. The String and LongString attributes have optional min_length and max_length arguments, which can be used to indicate the minimum and maximum valid lengths of each value.
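Range and length checks of this kind are straightforward; the sketch below mirrors the argument names min, max, min_length, and max_length as dictionary keys. The function and its error messages are hypothetical, and it assumes that the bounds are inclusive.

```python
def validate_value(value, constraints):
    """Return error messages for one value, given constraints keyed by the
    argument names 'min', 'max', 'min_length', and 'max_length'."""
    errors = []
    if "min" in constraints and value < constraints["min"]:
        errors.append(f"{value!r} is less than the minimum {constraints['min']!r}")
    if "max" in constraints and value > constraints["max"]:
        errors.append(f"{value!r} is greater than the maximum {constraints['max']!r}")
    if "min_length" in constraints and len(value) < constraints["min_length"]:
        errors.append(f"{value!r} is shorter than {constraints['min_length']} characters")
    if "max_length" in constraints and len(value) > constraints["max_length"]:
        errors.append(f"{value!r} is longer than {constraints['max_length']} characters")
    return errors


print(validate_value(12, {"min": 0, "max": 10}))  # ['12 is greater than the maximum 10']
print(validate_value("abc", {"min_length": 8}))   # ["'abc' is shorter than 8 characters"]
```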

In particular, the following arguments can be used to configure attributes to function as primary keys for encoding relationships from instances of other classes into their tables. These arguments are necessary to encode relationships between entries in different tables.

  • unique: Set this argument to True to indicate that ObjTables should constrain the values of the attribute to be unique across all instances of its parent class.
  • primary: Set this argument to True to indicate that the attribute is the primary key for its parent class. Each class can have only one primary attribute, and primary attributes should also have unique=True.
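A primary attribute is what lets a cell in one table refer to exactly one object in another table. The hypothetical sketch below indexes instances (plain dicts here) by their primary key and rejects duplicate keys, since a repeated key could not identify a single related object.

```python
def index_by_primary(instances, primary):
    """Map each instance's primary-key value to the instance."""
    index = {}
    for inst in instances:
        key = inst[primary]
        if key in index:
            # a repeated key could not identify a single related object
            raise ValueError(f"duplicate value {key!r} for primary attribute {primary!r}")
        index[key] = inst
    return index


companies = [
    {"name": "Acme", "url": "https://acme.example"},
    {"name": "Globex", "url": ""},
]
companies_by_name = index_by_primary(companies, "name")

# The Company cell of a Person row holds a primary key, which resolves to one object
alice = {"name": "Alice", "company": "Acme"}
print(companies_by_name[alice["company"]]["url"])  # https://acme.example
```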

More information about the arguments

Please see the Python documentation for detailed information about these required and optional arguments.

File formats for datasets

ObjTables can encode and decode datasets into and from the file formats outlined below. We recommend using Excel for viewing and editing datasets. We recommend using CSV or TSV to store datasets because these formats are conducive to revisioning with version control systems such as Git. We recommend using JSON for working with ObjTables in programming languages other than Python.

  • Tabular formats: Collections of tables in comma-separated (CSV), tab-separated (TSV), or Excel format.
    • Set of CSV or TSV files (.csv, .tsv)
      • Each tabular file represents the instances of a class (and possibly related instances of other classes).
      • The names of the files should follow the pattern *.{csv,tsv} (e.g., Name.csv).
      • As described below, the table in each file should be preceded by a line that begins with !!ObjTables id="TableName" ... that declares that ObjTables should process the table in the file. This declaration can also contain metadata about the table that follows.
      • ObjTables ignores files that do not contain this declaration. This can be used to store additional metadata alongside a dataset.
      • Sets of CSV and TSV files can be uploaded to the web application as a zip archive.
      • ObjTables uses the Excel dialect of the CSV and TSV formats.
    • Single text file which contains multiple CSV or TSV-formatted tables (.multi.csv, .multi.tsv)
      • This format is similar to a set of CSV or TSV files, except that the tables are concatenated into a single file.
      • The beginning of each table (and the end of the previous table) is indicated by the !!ObjTables id="TableName" ... declarations.
    • Excel workbook (.xlsx)
      • This format is also similar to a set of CSV or TSV files, except that each table is contained in a worksheet in a workbook rather than in a separate text file.
      • The title of each worksheet should be !! followed by the name of the class in the schema that is encoded into the worksheet (e.g., !!Name).
      • Optionally, workbooks can include a table of contents worksheet that summarizes and provides hyperlinks to the data tables. This should have the title !!_Table of contents.
      • Optionally, workbooks can include a worksheet that describes the schema of the dataset. This should have the title !!_Schema.
      • The ObjTables software tools can generate these optional table of contents and schema worksheets.
      • Workbooks can contain additional worksheets whose names do not begin with !!. These worksheets will be ignored by ObjTables. This can be used to store additional metadata alongside a dataset.
  • Data serialization formats: Serialization of ObjTables' internal data structures. See below for more information about using these formats in other languages.
    • JavaScript Object Notation (.json)
    • YAML Ain't Markup Language (.yml)
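The .multi.csv layout can be split back into individual tables by scanning for declaration lines. This is a simplified sketch, not the ObjTables reader: it assumes that each declaration begins its own line and carries an id='...' or id="..." entry, skips the document-level !!!ObjTables line, and does not handle quoted CSV fields that span several lines. The function name is hypothetical.

```python
import csv
import re

# Matches id='...' or id="..." inside a declaration line
DECLARATION_ID = re.compile(r"""\bid=(['"])(.*?)\1""")


def split_multi_csv(text):
    """Split the text of a .multi.csv file into {table id: list of CSV rows}."""
    groups = {}
    current = None
    for line in text.splitlines():
        if line.startswith("!!!ObjTables"):
            current = None  # document-level declaration, not the start of a table
        elif line.startswith("!!ObjTables"):
            match = DECLARATION_ID.search(line)
            current = match.group(2) if match else None
            if current is not None:
                groups[current] = []
        elif current is not None and line.strip():
            groups[current].append(line)  # blank lines between tables are skipped
    return {table_id: list(csv.reader(lines)) for table_id, lines in groups.items()}


text = (
    "!!!ObjTables objTablesVersion='0.0.8'\n"
    "!!ObjTables type='Data' id='Company' tableFormat='row'\n"
    "!name,!url\n"
    "Acme,https://acme.example\n"
    "\n"
    "!!ObjTables type='Data' id='Person' tableFormat='row'\n"
    "!name,!company\n"
    "Alice,Acme\n"
)
tables = split_multi_csv(text)
print(tables["Person"])  # [['!name', '!company'], ['Alice', 'Acme']]
```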

Tabular format for datasets

Dataset/document declaration: declaring that a collection of tables is encoded in ObjTables' tabular format

Each collection of tables must declare that the tables are encoded in ObjTables' tabular format by including the line !!!ObjTables ... before one of the tables.

Dataset/document-level metadata

Optionally, document declarations can capture key-value pairs of document-level metadata. Each key and value should be a string. At a minimum, we recommend using the keys below. The ObjTables software automatically generates these keys when datasets are exported to files.

  • id: Use this key to provide a unique identifier for the dataset.
  • schema: Use this key to annotate the name of the schema of the dataset. If this key is set, ObjTables will check that its value matches the name of the schema used to interpret the dataset.
  • date: Use this key to annotate the date that the dataset was created or updated.
  • objTablesVersion: Use this key to indicate the version of the ObjTables tabular format used to encode the dataset.

Class/table declaration: declaring the class represented by each table

Each table must declare the class that it represents by including a line of the form !!ObjTables type='Data' id='<class_name>' ..., where the value of id is the name of a class in the schema.

Class/table-level metadata

Optionally, table declarations can capture key-value pairs of class/table-level metadata. Each key and value should be a string. At a minimum, we recommend using the keys below. The ObjTables software automatically generates these keys when datasets are exported to files.

  • schema: Use this key to annotate the name of the schema of the table. If this key is set, ObjTables will check that its value matches the name of the schema used to interpret the table.
  • tableFormat: Use this key to indicate the format (row, column, multiple_cells, or cell) used to encode the class into the table. See above for more information about these formats. If this key is set, ObjTables will check that its value matches the format for the class defined in the schema.
  • name: Use this key to briefly describe the class.
  • description: Use this key to capture an extended description of the class.
  • date: Use this key to annotate the date that the table was created or updated.
  • objTablesVersion: Use this key to indicate the version of the ObjTables tabular format used to encode the class.
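Both kinds of declaration line consist of a marker (!!! for the document, !! for a class/table) followed by key='value' pairs. A minimal sketch of reading them, assuming that values are single-quoted and contain no quote characters (the ObjTables parser itself may accept other quoting). parse_declaration and check_class_declaration are hypothetical helpers, not part of the ObjTables API.

```python
import re

# key='value' pairs; assumes single-quoted values without embedded quotes
_PAIR = re.compile(r"(\w+)='([^']*)'")


def parse_declaration(line):
    """Return ('document' | 'class', metadata dict) for an ObjTables declaration line."""
    line = line.strip()
    if line.startswith('!!!ObjTables'):
        kind, rest = 'document', line[len('!!!ObjTables'):]
    elif line.startswith('!!ObjTables'):
        kind, rest = 'class', line[len('!!ObjTables'):]
    else:
        raise ValueError('not an ObjTables declaration: ' + line)
    return kind, dict(_PAIR.findall(rest))


def check_class_declaration(meta, class_names):
    """Raise ValueError unless the metadata declares a data table for a class in the schema."""
    if meta.get('type') != 'Data':
        raise ValueError("class declarations must include type='Data'")
    if meta.get('id') not in class_names:
        raise ValueError(f"unknown class: {meta.get('id')!r}")
```

For example, applying parse_declaration to the Company declaration in the address book example below yields the kind 'class' and a dictionary whose id is 'Company' and whose tableFormat is 'column'.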

Class attribute declaration: declaring the attribute represented by each entry of each table

  • row-formatted classes: After the class/table declaration, the cells in the first row of the table should declare the attributes represented by each column. Each cell should begin with ! followed by the name of the attribute as defined in the schema (e.g., !Id).
  • column-formatted classes: Similarly, after the class declaration, the cells in the first column of the table should declare the attributes represented by each row.
  • multiple_cells-formatted classes: After the declaration of the table of the parent class, the cells in the second row/column of the table should declare the attributes represented by each range of columns/rows.

Class instances

Class instances can be encoded into tables as follows:

  • row-formatted classes: Each instance is encoded into a row of the table.
  • column-formatted classes: Each instance is encoded into a column of the table.
  • multiple_cells-formatted classes: Each instance is encoded into a range of cells within the row/column of its parent instance.
  • cell-formatted classes: Each instance is encoded into a cell in the row/column of its parent instance.
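Because a column-formatted table is the transpose of a row-formatted one, both layouts can be read with the same logic. The sketch below is limited to the plain row and column formats (not multiple_cells or cell), keeps every value as a string, and assumes that every heading begins with ! and that all rows have the same length. read_table is a hypothetical helper, not part of the ObjTables API.

```python
def read_table(cells, table_format):
    """Convert a row- or column-formatted table into one dict per instance.

    cells: the table below its class declaration, as a list of rows of strings.
    table_format: 'row' or 'column' (multiple_cells and cell are not handled).
    """
    if table_format == 'column':
        # Headings run down the first column; transpose so they form the first row.
        cells = [list(col) for col in zip(*cells)]
    elif table_format != 'row':
        raise ValueError(f'unsupported table format: {table_format}')
    header, *instances = cells
    for heading in header:
        if not heading.startswith('!'):
            raise ValueError(f'attribute heading must begin with "!": {heading!r}')
    names = [heading[1:] for heading in header]  # '!Name' -> 'Name'
    return [dict(zip(names, values)) for values in instances]
```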

Attributes of class instances

The value of each attribute of each instance should be encoded into cells as follows:

  • Non-relational attributes: Each value should be serialized into a Boolean, number, or string that can be deserialized by the attribute's deserialize method. For example, dates should be serialized in the format YYYY-MM-DD, and times in the format hh:mm:ss. Please see the Python documentation for more information about how each attribute should be serialized.
  • Relational (*-to-one) attributes: Each value should be represented by the serialized value of the primary attribute of the related instance. For example, if the value of an attribute is an instance of a class CreditCard whose primary attribute is number, the value should be serialized as the number of the credit card. See above for more information about how to set the primary attribute of a class.
  • Relational (*-to-many) attributes: Each value should be represented as a comma-separated list of the serialized values of the primary attributes of the related instances. For example, if the value of an attribute is a list of instances of CreditCard, the value should be serialized as a comma-separated list of the numbers of the credit cards. See above for more information about how to set the primary attribute of a class.
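A minimal sketch of this relational encoding, using the CreditCard example: a *-to-one value is written as the related object's primary attribute (number), and a *-to-many value as a comma-separated list of those keys. The Customer class and the helper functions are hypothetical. The sketch writes ", " as the separator and strips whitespace when reading; it assumes that primary keys never contain commas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CreditCard:
    number: str  # primary attribute


@dataclass
class Customer:  # hypothetical class that refers to credit cards
    name: str
    primary_card: Optional[CreditCard] = None              # *-to-one
    cards: List[CreditCard] = field(default_factory=list)  # *-to-many


def serialize_to_one(card: Optional[CreditCard]) -> str:
    return '' if card is None else card.number


def serialize_to_many(cards: List[CreditCard]) -> str:
    return ', '.join(card.number for card in cards)


def deserialize_to_one(cell: str, index: Dict[str, CreditCard]) -> Optional[CreditCard]:
    key = cell.strip()
    return index[key] if key else None  # an empty cell means no related object


def deserialize_to_many(cell: str, index: Dict[str, CreditCard]) -> List[CreditCard]:
    # index maps each primary-attribute value to its instance (built from the CreditCard table)
    return [index[key.strip()] for key in cell.split(',') if key.strip()]
```

Deserialization needs the index of all CreditCard instances, which is why related tables must be read before relationships can be resolved.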

Comments about instances of classes

Comments about individual class instances can be encoded as follows:

  • row-formatted classes: Comments can be encoded as rows with a single cell in the first column which begins with %/ and ends with /%. Comments should be placed above the instance that the comment applies to.
  • column-formatted classes: Comments can similarly be encoded as columns with a single cell that begins with %/ and ends with /%.

Additional tables, rows, and columns not in the schema

Sets of tables that encode datasets can include additional tables, rows, and columns. This can be used to encapsulate additional metadata alongside datasets.

  • Additional tables: ObjTables ignores tabular files or worksheets which do not contain class/table declarations. ObjTables also ignores worksheets whose names do not begin with !!.
  • Additional columns (row-formatted tables) / rows (column-formatted tables): For row-formatted tables, ObjTables ignores columns whose headings do not begin with !. For column-formatted tables, ObjTables ignores rows whose headings do not begin with !.
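The rules for comments and for columns that are not in the schema amount to a filtering step before a row-formatted table is interpreted. A minimal sketch, assuming the table has already been split into cells with the heading row first; strip_unmodeled is a hypothetical helper. For column-formatted tables, the same function could be applied to the transposed table.

```python
def strip_unmodeled(rows):
    """Drop comment rows and columns whose headings do not begin with '!'.

    rows: row-formatted table as a list of lists of strings, heading row first.
    A comment row has a single non-empty cell, in the first column, that
    begins with '%/' and ends with '/%'.
    """
    def is_comment(row):
        first = row[0].strip() if row else ''
        return (first.startswith('%/') and first.endswith('/%')
                and all(not cell.strip() for cell in row[1:]))

    header, *body = rows
    keep = [i for i, heading in enumerate(header) if heading.startswith('!')]
    table = [[header[i] for i in keep]]
    for row in body:
        if is_comment(row):
            continue
        # pad short rows so that every kept column has a value
        table.append([row[i] if i < len(row) else '' for i in keep])
    return table
```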

Examples

Example: Address book

The following example illustrates an address book of the CEOs of several major technology companies, the companies that they lead, and their addresses. The example is encoded according to the schema outlined above. To capture this information with a minimal number of tables, the address of each CEO and each company is encoded into a range of cells within the row/column that represents that CEO or company. This keeps the tables human-readable and enables Python to encapsulate related information into distinct objects. The example is available in merged-TSV format from GitHub.

!!!ObjTables objTablesVersion='0.0.8' date='2020-03-14 13:19:04'
!!ObjTables type='Data' tableFormat='column' id='Company' name='Companies' date='2020-03-14 13:19:04' objTablesVersion='0.0.8'
!Name	Apple	Facebook	Google	Netflix
!URL	https://www.apple.com/	https://www.facebook.com/	https://www.google.com/	https://www.netflix.com/
!Address
!Street	10600 N Tantau Ave	1 Hacker Way #15	1600 Amphitheatre Pkwy	100 Winchester Cir
!City	Cupertino	Menlo Park	Mountain View	Los Gatos
!State	CA	CA	CA	CA
!Zip code	95014	94025	94043	95032
!Country	US	US	US	US
!!ObjTables type='Data' tableFormat='row' id='Person' name='People' date='2020-03-14 13:19:04' objTablesVersion='0.0.8'
!Address
!Name	!Type	!Company	!Email address	!Phone number	!Street	!City	!State	!Zip code	!Country
Mark Zuckerberg	family	Facebook	zuck@fb.com	650-543-4800	1 Hacker Way #15	Menlo Park	CA	94025	US
Reed Hastings	business	Netflix	reed.hastings@netflix.com	408-540-3700	100 Winchester Cir	Los Gatos	CA	95032	US
Sundar Pichai	business	Google	sundar@google.com	650-253-0000	1600 Amphitheatre Pkwy	Mountain View	CA	94043	US
Tim Cook	business	Apple	tcook@apple.com	408-996-1010	10600 N Tantau Ave	Cupertino	CA	95014	US

Additional examples

Additional examples are available on GitHub:

  • The amounts and dates of financial transactions
  • Kinetic models of biochemical reactions
  • SBtab: systems biology data and kinetic models

JSON/YAML format for datasets

To make it easy to use ObjTables with other programming languages, ObjTables can encode datasets into JSON or YAML documents, which can easily be parsed by a wide range of languages. ObjTables encodes datasets into JSON and YAML as described and illustrated below.

  • Classes and metadata: Datasets are encoded into dictionaries. These dictionaries contain the following key-value pairs:
    • Document metadata
      • Key: _documentMetadata.
      • Value: Dictionary of key-value pairs that represent the document-level metadata of the dataset.
    • Class metadata
      • Key: _classMetadata.
      • Value: Dictionary whose keys are the names of the classes in the schema for the dataset and whose values are dictionaries of key-value pairs that represent the metadata for each class.
    • Instances of classes
      • Keys: Names of the classes in the schema.
      • Values: Lists of dictionaries that represent each instance of each class in the dataset.
  • Instances of classes: Each object in a dataset is encoded into a dictionary that has the following keys and values:
    • Type
      • Key: __type
      • Value: Name of the class of the object.
    • Unique identifier (used for serializing and deserializing relationships)
      • Key: __id
      • Value: Unique integer-valued id for each object. The ObjTables software automatically generates these ids when it exports a dataset.
    • Attributes
      • Keys: Names of the attributes of the class of the object.
      • Values: Values of the attributes (which are encoded into Booleans, numbers, strings, dictionaries, and lists as appropriate for each type of ObjTables attribute).
  • Relationships between instances: Attributes which represent relationships between objects are encoded using the unique integer-valued ids generated for each object:
    • Each attribute which represents a *-to-one relationship is encoded into the unique integer id of the related object (or null if there is no related object).
    • Each attribute which represents a *-to-many relationship is encoded into a list of the unique integer ids of the related objects.
{
  # dictionary of metadata about the document
  "_documentMetadata":
  {
    "objTablesVersion": "<ObjTables version>",
    "date": "<Date>",
    ... # additional document metadata
  },

  # dictionary which maps the name of each class to a dictionary with metadata about each class
  "_classMetadata":
  {
    "<ClassName>":
    {
      "id": "<ClassName>",
      "name": "<Class verbose name>",
      ... # additional metadata for the class
    },
    ... # dictionaries with metadata for additional classes
  },

  # for each class, lists of dictionaries which represent the instances of the classes
  "<ClassName>":
  [
    {
      "__type": "<ClassName>",
      "__id": <unique integer id>,
      "<attribute name>": <value of attribute>,
      "<*to-one attribute name>": <integer id of related object>,
      "<*to-many attribute name>": [<integer id of related object>, ...],
      ... # additional attributes of the instance of the class
    },
    ... # additional instances of the class
  ],
  ... # lists of dictionaries which represent the instances of additional classes
}

This Python code illustrates how to decode datasets encoded into this JSON/YAML format. We recommend following this example to create methods for other languages.
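A minimal round-trip sketch of this layout, assuming a hypothetical Model base class in place of the classes generated from a schema. Because an integer attribute and a *-to-one reference look identical in JSON, the decoder takes the names of the relationship attributes of each class as an argument; in practice these would come from the schema. The encoder assumes that the list it is given includes every related object. encode_dataset and decode_dataset are hypothetical names, not the ObjTables API.

```python
import json
from types import SimpleNamespace


class Model:
    """Hypothetical minimal base class standing in for schema classes."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def encode_dataset(objects, metadata=None):
    """Encode Model instances into the JSON/YAML layout described above."""
    ids = {id(obj): i for i, obj in enumerate(objects, start=1)}  # __id values

    def encode_value(value):
        if isinstance(value, Model):                                # *-to-one
            return ids[id(value)]
        if isinstance(value, list) and all(isinstance(v, Model) for v in value):
            return [ids[id(v)] for v in value]                      # *-to-many
        return value                                                # scalars, None

    doc = {'_documentMetadata': dict(metadata or {}), '_classMetadata': {}}
    for obj in objects:
        name = type(obj).__name__
        doc['_classMetadata'].setdefault(name, {'id': name})
        entry = {'__type': name, '__id': ids[id(obj)]}
        entry.update({k: encode_value(v) for k, v in vars(obj).items()})
        doc.setdefault(name, []).append(entry)
    return doc


def decode_dataset(doc, relationship_attrs):
    """Rebuild linked objects; relationship_attrs maps class name -> set of attribute names."""
    raw_by_id, objs_by_id = {}, {}
    # Pass 1: create an empty object for every encoded instance.
    for key, instances in doc.items():
        if key.startswith('_'):
            continue  # skip _documentMetadata and _classMetadata
        for raw in instances:
            raw_by_id[raw['__id']] = raw
            objs_by_id[raw['__id']] = SimpleNamespace(type_name=raw['__type'])
    # Pass 2: copy attributes, replacing ids with the related objects.
    for obj_id, raw in raw_by_id.items():
        obj = objs_by_id[obj_id]
        related = relationship_attrs.get(raw['__type'], set())
        for attr, value in raw.items():
            if attr in ('__type', '__id'):
                continue
            if attr in related:
                if isinstance(value, list):
                    value = [objs_by_id[i] for i in value]  # *-to-many
                elif value is not None:
                    value = objs_by_id[value]               # *-to-one
            setattr(obj, attr, value)
    by_class = {}
    for obj in objs_by_id.values():
        by_class.setdefault(obj.type_name, []).append(obj)
    return doc.get('_documentMetadata', {}), by_class
```

The two passes matter: related objects may appear later in the document than the objects that refer to them, so every object is created before any reference is resolved.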

Examples of schemas and datasets

Below are several examples of ObjTables schemas and datasets in all of the supported formats.

Description Schema UML diagram Dataset Schema and dataset
Address book tsv, xlsx, py svg tsv, multi.tsv, xlsx, json, yml multi.tsv, xlsx
Financial transactions tsv, xlsx, py svg tsv, multi.tsv, xlsx, json, yml multi.tsv, xlsx
Kinetic models of biochemical reactions tsv, xlsx, py svg tsv, multi.tsv, xlsx, json, yml multi.tsv, xlsx
SBtab: systems biology data and kinetic models tsv, xlsx, py svg
Hynne model of yeast glycolysis xlsx
Jiang et al. model of pancreatic beta-cell insulin secretion xlsx
Related data xlsx
Noor et al. model of Escherichia coli metabolism xlsx
Related data xlsx
Sigurdsson et al. model of mouse metabolism xlsx
Teusink model of yeast glycolysis xlsx
Related data xlsx
Wortel et al. model of Escherichia coli metabolism xlsx
Related data xlsx
Standard free energies of reactions calculated by eQuilibrator xlsx
Yeast transcriptional regulatory network inferred by Chang et al. xlsx

Validation of ObjTables datasets

One of the major goals of ObjTables is to make it easy to rigorously validate the syntax and semantics of complex datasets. ObjTables achieves this by making it easy to utilize the six levels of validation summarized below. The validations marked with ○ can optionally be disabled.

  1. Syntactic validation: The ObjTables software performs the following checks:
    • A dataset is encoded in one of the tabular, JSON, or YAML formats.
    • The dataset contains all of the classes/tables and attributes/columns defined in the schema.
    • The dataset contains no extra classes/tables or attributes/columns.
    • Each class/table is oriented (normal, transposed) as defined in the schema.
    • The document and class metadata are syntactically valid lists of key-value pairs.
  2. Attribute validation
    • ObjTables checks that the value of each attribute is consistent with its type. For example, ObjTables checks that the value of each Integer attribute can be decoded to an integer.
    • ObjTables checks that the value of each attribute is consistent with any constraints defined in the schema. For example, if an Integer attribute is defined with min and max arguments, ObjTables checks that each value lies between these bounds.
    • For *-to-* relationship attributes, ObjTables checks that each related object is defined within the same dataset.
    • Python schema developers can implement additional validation by creating subclasses of obj_tables.Attribute and overriding their validate methods. For example, this can be used to check that an attribute that represents the participants of a biochemical reaction represents an element-balanced reaction.
  3. Object validation: Python schema developers can validate the semantic meaning of each object by overriding the validate method of each class. For example, this can be used to check that the values of two attributes that represent the chemical formula and molecular weight of a compound are consistent.
  4. Uniqueness validation:
    • For each attribute that is defined with the optional argument unique=True, ObjTables checks that the values of the attribute are unique across the dataset.
    • For each unique_together constraint of each class, ObjTables checks that the combinations of the values of the attributes defined in the constraint are unique across the dataset. For example, this can be used to check that objects that represent linear mathematical expressions (sets of other objects) represent unique expressions.
  5. Class validation: Python schema developers can validate the semantic meaning of all of the objects of a class by overriding the validate_unique method of each class. For example, this can be used to check that a class that represents the nodes of a graph and their edges does not contain cycles.
  6. Dataset validation: Python schema developers can validate the semantic meaning of an entire dataset by creating a subclass of obj_tables.Validator and overriding its run method. For example, this could be used to validate that a dataset that represents a model of the biochemistry of a cell captures the self-replicating behavior of that cell.
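The uniqueness checks (level 4) can be sketched by counting values across all instances of a class. This is an illustration of the idea in plain Python, not the ObjTables implementation, which derives these checks from unique=True and unique_together in the schema. The sketch assumes instances are dicts, compares list- or set-valued attributes (such as *-to-many relationships) as unordered sets, and assumes their elements are hashable; check_uniqueness is a hypothetical helper.

```python
from collections import Counter


def _hashable(value):
    # Treat list/set values (e.g. *-to-many relationships) as unordered sets.
    return frozenset(value) if isinstance(value, (list, set, frozenset)) else value


def check_uniqueness(instances, unique_attrs=(), unique_together=()):
    """Return a list of messages describing violated uniqueness constraints.

    instances: list of dicts, one per instance of a single class.
    unique_attrs: attributes declared with unique=True.
    unique_together: tuples of attributes whose combined values must be unique.
    """
    errors = []
    for attr in unique_attrs:
        counts = Counter(_hashable(inst.get(attr)) for inst in instances)
        errors += [f'{attr}={value!r} appears {n} times'
                   for value, n in counts.items() if n > 1]
    for attrs in unique_together:
        counts = Counter(tuple(_hashable(inst.get(a)) for a in attrs) for inst in instances)
        errors += [f'{attrs}={combo!r} appears {n} times'
                   for combo, n in counts.items() if n > 1]
    return errors
```

Comparing *-to-many values as sets is what lets the linear-expression example work: two expressions with the same terms in a different order are reported as duplicates.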

Software tools

ObjTables provides several tools for working with schemas and datasets. The tools marked with ● are available through all of the user interfaces (the web application, REST API, command-line program, and Python API); see below for more information about these interfaces. The tools marked with ○ are only available through the Python API; see the Python documentation for more information about these tools.

  • Generate a Python module that implements a tabular-formatted schema.
  • Generate a UML diagram for a schema.
  • Generate a template CSV, TSV, or Excel file(s) for a schema.
  • Programmatically construct or edit a dataset.
  • Merge multiple datasets into a single dataset.
  • Partition a dataset into multiple datasets.
  • Migrate a dataset between versions of a schema. See the Python documentation for more information.
  • Use Git to revision a dataset.
  • Validate that a dataset adheres to a schema and report any errors.
  • Normalize a dataset into a deterministically reproducible ordering.
  • Sort a dataset into a random order.
  • Pretty print a dataset.
  • Use a schema to determine if two datasets are semantically equivalent.
  • Use a schema to evaluate the difference in the semantic meaning of two datasets.
  • Parse a dataset into a Python data structure.
  • Use a schema to convert a dataset to an alternate format.
  • Use a schema to convert a dataset to a dictionary of pandas data frames.

User interfaces

Web app

A web app is available above.

REST API

A REST API is available at objtables.org/api.

Command-line interface

A command-line interface is available from PyPI.

Python API

A Python API is available from PyPI. As described above, the Python API has significantly more capabilities than the web application, REST API, and command-line program.

The following example briefly introduces the API by illustrating how to use it to programmatically create, manipulate, analyze, and export the same address book of tech CEOs described above. This tutorial and additional tutorials are available as interactive Jupyter notebooks at sandbox.karrlab.org.

Import schema

import schema as address_book
PersonType = address_book.Person.type.enum_class

Create companies

apple = address_book.Company(name='Apple',
                             url='https://www.apple.com/',
                             address=address_book.Address(street='10600 N Tantau Ave',
                                                          city='Cupertino',
                                                          state='CA',
                                                          zip_code='95014',
                                                          country='US'))
facebook = address_book.Company(name='Facebook',
                                url='https://www.facebook.com/',
                                address=address_book.Address(street='1 Hacker Way #15',
                                                             city='Menlo Park',
                                                             state='CA',
                                                             zip_code='94025',
                                                             country='US'))
google = address_book.Company(name='Google',
                              url='https://www.google.com/',
                              address=address_book.Address(street='1600 Amphitheatre Pkwy',
                                                           city='Mountain View',
                                                           state='CA',
                                                           zip_code='94043',
                                                           country='US'))
netflix = address_book.Company(name='Netflix',
                               url='https://www.netflix.com/',
                               address=address_book.Address(street='100 Winchester Cir',
                                                            city='Los Gatos',
                                                            state='CA',
                                                            zip_code='95032',
                                                            country='US'))
companies = [apple, facebook, google, netflix]

Create CEOs

cook = address_book.Person(name='Tim Cook',
                           type=PersonType.business,
                           company=apple,
                           email_address='tcook@apple.com',
                           phone_number='408-996-1010',
                           address=apple.address)
hastings = address_book.Person(name='Reed Hastings',
                               type=PersonType.business,
                               company=netflix,
                               email_address='reed.hastings@netflix.com',
                               phone_number='408-540-3700',
                               address=netflix.address)
pichai = address_book.Person(name='Sundar Pichai',
                             type=PersonType.business,
                             company=google,
                             email_address='sundar@google.com',
                             phone_number='650-253-0000',
                             address=google.address)
zuckerberg = address_book.Person(name='Mark Zuckerberg',
                                 type=PersonType.family,
                                 company=facebook,
                                 email_address='zuck@fb.com',
                                 phone_number='650-543-4800',
                                 address=facebook.address)

ceos = [cook, hastings, pichai, zuckerberg]

Get a property of a company

assert facebook.url == 'https://www.facebook.com/'

Edit a property of a company

facebook.url = 'https://about.fb.com/'

Validate address book

import obj_tables
errors = obj_tables.Validator().run(companies + ceos)
assert errors is None

Export address book to a file

import obj_tables.io
import os
import tempfile
dirname = tempfile.mkdtemp()
filename_xlsx = os.path.join(dirname, 'address_book.xlsx')
obj_tables.io.Writer().run(filename_xlsx, companies + ceos,
                           models=[address_book.Company, address_book.Person])

Import address book from a file

objects = obj_tables.io.Reader().run(filename_xlsx,
                                     models=[address_book.Company, address_book.Person],
                                     group_objects_by_model=False,
                                     ignore_sheet_order=True)

Check if two CEOs are semantically equivalent

zuckerberg_copy = next(el for el in objects if isinstance(el, address_book.Person) and el.name == 'Mark Zuckerberg')
assert zuckerberg_copy.is_equal(zuckerberg)
assert zuckerberg_copy.difference(zuckerberg) == ''

Working with ObjTables datasets with other programming languages

We recommend that developers use the REST API to work with ObjTables in other programming languages:

  • Visualizing a schema: Use the viz-schema endpoint to generate a UML diagram of a schema.
  • Generating a template for a dataset of a schema: Use the gen-template endpoint to generate a template for a dataset in CSV, TSV, merged CSV, merged TSV, or Excel format.
  • Validating that a dataset is consistent with a schema: Use the validate endpoint to check that a dataset adheres to a schema and identify any errors.
  • Parsing a dataset into a native data structure:
    1. Use the convert endpoint to encode a dataset into a JSON document. See above for more information about how ObjTables encodes datasets into JSON.
    2. Parse the JSON-encoded document into a native data structure. For example, use Python's json.loads function to parse a JSON-encoded document into a combination of lists, dictionaries, and scalars.
    3. Implement a method for decoding a dataset from this native data structure. This Python module illustrates how to decode datasets from native Python data structures. We recommend following this example to create methods for other languages.
    4. Use this method to decode the dataset from the native data structure that represents the JSON-encoded document.
  • Differencing datasets: Use the diff endpoint to check if two datasets are semantically equivalent and identify any differences.
  • Pretty printing a dataset: Use the normalize endpoint to pretty print a dataset in Excel format. This will print the classes and attributes in their canonical orders, highlight and freeze the headings of each table, embed help information about each column into comments, set up Excel validation for each cell, and protect each worksheet. Optionally, pretty printed datasets can include an additional worksheet with a table of contents that summarizes and provides hyperlinks to each class, as well as an additional worksheet that summarizes the schema. We recommend embedding schemas with datasets when sharing datasets with others or publishing datasets.
  • Converting a dataset to other formats: Use the convert endpoint to convert a dataset among CSV, TSV, merged CSV, merged TSV, Excel, JSON, and YAML formats.

The documentation for the REST API contains detailed information about the inputs and outputs of each endpoint.

Known limitations and future directions

ObjTables is under active development. Below are several open issues that we intend to address going forward. Please see the GitHub issue list for more information.

  • Currently, ObjTables has limited capabilities to convert schemas implemented with the Python API to the tabular format.
  • Currently, ObjTables has limited support for multiple inheritance.
  • Currently, ObjTables only supports one view per class.
  • Currently, ObjTables serializes Array and Table to individual cells. We are contemplating enabling these attributes to be serialized to separate files/worksheets.
  • Currently, ObjTables datasets can be exported to JSON and YAML. To facilitate use with other tools, we also aim to support exporting datasets to SQL databases such as SQLite.
  • Due to the limitations of the pint package, datasets that use Unit attributes cannot be pickled.
  • ObjTables is still somewhat memory and CPU-inefficient. In particular, the importing and exporting of datasets is inefficient. Going forward, we aim to improve the performance of ObjTables.

Examples, tutorials, documentation, and help

Installation instructions for the command-line program and Python API

Installation instructions for the command-line program and Python API are available at docs.karrlab.org. A Dockerfile for building an Ubuntu Linux image with ObjTables is available from the ObjTables Git repository.

Examples

Several example schemas and datasets are available above.

Tutorials for the Python API

A Jupyter notebook with interactive tutorials is available at sandbox.karrlab.org.

Documentation for the formats for schemas and datasets

Documentation for the formats for schemas and the formats for datasets is available above.

Documentation for the REST API

Documentation for the REST API is available at objtables.org/api.

Documentation for the command-line program

Documentation for the command-line program is available inline by running obj-tables --help.

Documentation for the Python API

An introduction to the Python API is available above. Detailed documentation is available at docs.karrlab.org.

Further help

Please contact the Karr Lab with any questions.

Useful related resources for working with ObjTables

Below are several resources that we recommend for working with ObjTables-encoded datasets:

  • Graphical workbook editors
    • Microsoft Excel: Leading workbook editor
    • LibreOffice: Open-source workbook editor for Linux, Mac, and Windows
    • WPS Office: Free workbook editor for Linux, Mac, and Windows
  • Python: Language for programmatically interacting with ObjTables-encoded datasets.

Comparison with other data modeling tools

While ObjTables has many similarities to other toolkits, ObjTables' unique combination of features provides advantages for some use cases. Below, we outline the advantages and disadvantages of ObjTables over several other types of tools.

  • Workbook editors with validation such as Excel: Workbook editors are user-friendly tools for quickly viewing and editing datasets, including relatively large datasets. However, workbook editors have limited data types, limited support for relationships, and limited support for validation. In addition, workbook editors do not separate workbooks from schemas such that schemas can be applied to multiple workbooks. As a result, tools for programmatically reading workbooks such as openpyxl cannot link related records across multiple worksheets into a connected object graph. ObjTables leverages the user-friendliness of workbook editors and adds the abilities to parse and validate workbooks with schemas that support many data types and rich validation at multiple levels. Consequently, ObjTables makes it easy to interact with datasets both as workbooks (with tools such as Excel) and as data structures (with languages such as Python).
  • Web frameworks with object-relational mapping (ORM) tools such as Django, Ruby on Rails, and SQLAlchemy: ORMs make it easy to define schemas and provide extensive support for validation. Together with web frameworks, ORMs can be used to build graphical web-based interfaces for viewing and editing datasets. However, these tools are focused on viewing and editing individual records rather than on viewing and editing entire datasets at once. As a result, it is difficult to use these tools to construct interfaces that enable users to quickly view and edit datasets with the ease of workbook tools such as Excel. In contrast, ObjTables leverages workbooks and workbook editors as an interface for quickly viewing and editing entire datasets. As a result, ObjTables makes it easier to create copies of datasets, revision datasets with version control systems such as Git, share datasets with others, and publish datasets as supplementary materials of journal articles. Furthermore, ObjTables minimizes the number of tables required to encode datasets by making it easy to nest tables. In addition, ObjTables leverages the fact that many more people are familiar with workbooks than with web frameworks and ORMs, which enables nearly anyone to view and edit ObjTables datasets. However, unlike web frameworks and ORMs, ObjTables does not separate models from views, and ObjTables is not suitable for very large datasets.
  • Schemas for data serialization formats such as JSON Schema and XML Schema: Schema systems for data serialization formats such as JSON and XML make it easy to validate complex datasets that are encoded with formats that can easily be copied, shared, and published. However, viewers and editors for data serialization formats focus on viewing and editing individual objects, which can be cumbersome for large datasets. ObjTables improves over these systems by making it possible to convert data encoded in JSON and YAML into workbooks, which are easier to view and edit with tools such as Excel. In addition, ObjTables leverages the fact that many more people are familiar with workbooks than with data serialization formats. However, ObjTables is not as mature as these systems, and ObjTables provides limited support for languages beyond Python.
  • Relational databases and querying tools such as MySQL Workbench for MySQL: Relational databases make it easy to define schemas and conduct arbitrary queries of datasets. However, querying tools such as MySQL Workbench provide cumbersome interfaces for viewing and editing large datasets, relational databases support limited data types, and it is difficult to validate relational databases without ORMs. Furthermore, using a relational database typically requires significant knowledge of its schema. ObjTables improves over these tools by leveraging workbooks and workbook editors for viewing and editing datasets, by supporting more data types, by using nested tables and grammars to simplify the representation of relationships, and by supporting validation.

Contributing to ObjTables

We welcome contributions to ObjTables! To contribute, please submit a GitHub pull request or contact us by email.

About ObjTables

Source code

ObjTables is available open-source from GitHub.

License

ObjTables is released under the MIT license.

Citing ObjTables

Coming soon!

Team

ObjTables was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, US and the Applied Mathematics and Computer Science, from Genomes to the Environment research unit at the Institut National de la Recherche Agronomique in Jouy en Josas, FR.

  • Jonathan Karr
  • Arthur Goldberg
  • Wolfram Liebermeister
  • Bilal Shaikh

Acknowledgments

ObjTables was supported by a National Institutes of Health P41 award, a National Institutes of Health MIRA R35 award, and a National Science Foundation INSPIRE award.

Questions/comments

Please contact the Karr Lab with any questions or comments.