odsparsator, a .ods parser.¶
Generate a JSON file from an OpenDocument Format .ods
file.
When used as a script, odsparsator
parses a .ods
file and generates a JSON
file using the odfdo
library.
When used as a library, odsparsator
parses a .ods
file and returns a Python
structure.
The resulting data follows the format of the reverse odsgenerator.py
script,
see https://github.com/jdum/odsgenerator
odsparsator
is a Python3
package, using the odfdo library. Current version requires Python >= 3.9, see prior versions for older environments.
Project: https://github.com/jdum/odsparsator
Author: jerome.dumonteil@gmail.com
License: MIT
Installation¶
Installation from Pypi (recommended):
pip install odsparsator
Installation from sources:
pip install .
CLI usage¶
odsparsator [-h] [--version] [options] input_file output_file
arguments¶
input_file
: input file, a .ods file.
output_file
: output file, JSON file generated from input.
Use odsparsator --help
for options:
options:
-h, --help show this help message and exit
--version show program's version number and exit
-m, --minimal keep only rows and cells, no styles, no formula, no column width
-a, --all-styles collect all styles from the input
-c, --color collect background color of cells
-k, --keep-styled keep styled cells with empty value
-s, --see-hidden parse also the hidden sheets
sample¶
$ odsparsator --minimal sample.ods sample_minimal.json
The result:
{
"body": [
{
"name": "first tab",
"table": [
["a", "b", "c"],
[10, 20, 30]
]
}
]
}
Without the –minimal option:
$ odsparsator sample.ods sample_with_styles.json
The result:
{
"body": [
{
"name": "first tab",
"table": [
{
"row": [
{
"value": "a",
"style": "bold_center_bg_gray_grid_06pt"
},
{
"value": "b",
"style": "bold_center_bg_gray_grid_06pt"
...
Usage from python code¶
from odsparsator import odsparsator
content = odsparsator.ods_to_python("sample1.ods")
Principle¶
A document is a list or dict containing tabs,
a tab is a list or dict containing rows,
a row is a list or dict containing cells.
A cell can be:
int, float or str,
a dict, with the following keys (only the ‘value’ key is mandatory):
value: int, float or str,
style: str or list of str, a style name or a list of style names,
text: str, a string representation of the value (for ODF readers who use it),
formula: str, content of the ‘table:formula’ attribute, some “of:” OpenFormula string,
colspanned: int, the number of spanned columns,
rowspanned: int, the number of spanned rows.
A row can be:
a list of cells,
a dict, with the following keys (only the ‘row’ key is mandatory):
row: a list of cells, see above,
style: str or list of str, a style name or a list of style names.
A tab can be:
a list of rows,
a dict, with the following keys (only the ‘table’ key is mandatory):
table: a list of rows,
width: a list containing the width of each column of the table
name: str, the name of the tab,
style: str or list of str, a style name or a list of style names.
A tab may have some post transformation:
a list of span areas, cell coordinates are defined in the tab after its creation using odfo method Table.set_span(), with either coordiante system: “A1:B3” or [0, 0, 2, 1].
A document can be:
a list of tabs,
a dict, with the following keys (only the ‘body’ key is mandatory):
body: a list of tabs,
styles: a list of dict of styles definitions,
defaults: a dict, for the defaults styles.
A style definition is a dict with 2 items:
the name of the style (optional, if not present the attribute style:name of the definition is used),
an XML definition of the ODF style, see list below.