Quick ms3 reference#
To run this notebook#
Install ms3 (pip install ms3).
Set DATA_PATH to the directory in which the folder dcml_corpora, which contains the data, should be created.
Read about Keys and IDs.
DATA_PATH = '~'
Setup#
import os
import ms3
from git import Repo
corpora_path = os.path.join(os.path.expanduser(DATA_PATH), 'dcml_corpora')
if os.path.isdir(corpora_path):
    repo = Repo(corpora_path)
else:
    repo = Repo.clone_from(url='https://github.com/DCMLab/dcml_corpora.git',
                           to_path=corpora_path,
                           multi_options=['--recurse-submodules', '--shallow-submodules'])
print(f"dcml_corpora @ commit {repo.commit().hexsha}")
dcml_corpora @ commit 9dcde40cba36d31b900ff12852cc557b8cca8221
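The path handling in the setup cell can be checked on its own before anything is cloned. The following is a minimal sketch using only the standard library; the helper name `resolve_corpora_path` is hypothetical and belongs neither to ms3 nor to GitPython.

```python
import os


def resolve_corpora_path(data_path, folder="dcml_corpora"):
    """Expand a leading '~' and append the folder name, as the setup cell does."""
    return os.path.join(os.path.expanduser(data_path), folder)


corpora_dir = resolve_corpora_path("~")
# True once the repository has been cloned, False before the first run
already_cloned = os.path.isdir(corpora_dir)
print(corpora_dir, already_cloned)
```

If `already_cloned` is False, the setup cell takes the branch that clones the repository.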
Parsing multiple scores at once#
The Corpus object#
Scores often come grouped into a corpus, so when we want to parse multiple scores, we create a Corpus object and pass it the directory containing the scores. ms3 will scan the directory and discover all scores and TSV files that can potentially be parsed:
tchaikovsky_path = os.path.join(corpora_path, 'tchaikovsky_seasons')
corpus = ms3.Corpus(tchaikovsky_path)
corpus
[default|all]
Corpus 'tchaikovsky_seasons'
----------------------------
Location: /home/hentsche/dcml_corpora/tchaikovsky_seasons
View: This view is called 'default'. It
- excludes fnames that are not contained in the metadata,
- filters out file extensions requiring conversion (such as .xml), and
- excludes review files and folders.
All 12 pieces are listed in 'metadata.tsv':
scores measures notes expanded events chords
detected detected detected detected detected detected
op37a01 1 1 1 1 1 1
op37a02 1 1 1 1 1 1
op37a03 1 1 1 1 1 1
op37a04 1 1 1 1 1 1
op37a05 1 1 1 1 1 1
op37a06 1 1 1 1 1 1
op37a07 1 1 1 1 1 1
op37a08 1 1 1 1 1 1
op37a09 1 1 1 1 1 1
op37a10 1 1 1 1 1 1
op37a11 1 1 1 1 1 1
op37a12 1 1 1 1 1 1
72/288 files are excluded from this view.
72 files have been excluded based on their subdir.
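To give an idea of what scanning a directory for parseable files involves, here is a simplified sketch using only the standard library. It is not how ms3 implements file discovery: the function name `discover_files`, the set of score extensions, and the demo folder layout are assumptions made for illustration.

```python
import os
import tempfile
from collections import defaultdict

# Assumed for illustration only; ms3 decides by itself which extensions count as scores.
SCORE_EXTENSIONS = {".mscx", ".mscz"}


def discover_files(directory):
    """Group score and TSV files below `directory` by file name without extension."""
    found = defaultdict(list)
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # visit subdirectories in a reproducible order
        for file in sorted(files):
            stem, ext = os.path.splitext(file)
            if ext.lower() in SCORE_EXTENSIONS or ext.lower() == ".tsv":
                rel_path = os.path.relpath(os.path.join(root, file), directory)
                found[stem].append(rel_path)
    return dict(found)


with tempfile.TemporaryDirectory() as tmp:
    # Made-up layout for the demonstration
    for rel in ("scores/piece01.mscx", "notes/piece01.tsv", "README.md"):
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    demo = discover_files(tmp)

print(demo)
```

Grouping by file name mirrors the table above, where each row is one piece and each column counts the files of one type detected for it.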
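When inspecting the Corpus object, we see the active view and a table of the files detected for each piece. A Parse object gives the same kind of overview for one or several corpora: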
corpora_path = '~/corelli'
corpora = ms3.Parse(corpora_path, level='c')
corpora
[default|all]
All corpora
-----------
View: This view is called 'default'. It
- excludes fnames that are not contained in the metadata,
- filters out file extensions requiring conversion (such as .xml), and
- excludes review files and folders.
has active scores measures notes expanded
metadata view detected detected detected detected
corpus
corelli yes default 149 149 149 149
1058/2995 files are excluded from this view.
1043 files have been excluded based on their subdir.
15 files have been excluded based on their file name.
From here we can use the following methods:
parse_scores() to parse all detected scores,
parse_tsv() to parse all detected TSV files (previously extracted from scores),
parse() to parse everything.
corpora.parse_scores()
corpora
[default|all]
All corpora
-----------
View: This view is called 'default'. It
- excludes fnames that are not contained in the metadata,
- filters out file extensions requiring conversion (such as .xml), and
- excludes review files and folders.
has active scores measures notes expanded
metadata view detected parsed detected detected detected
corpus
corelli yes default 149 149 149 149 149
1058/2995 files are excluded from this view.
1043 files have been excluded based on their subdir.
15 files have been excluded based on their file name.
Now we can extract the facets we need from the parsed scores, e.g. information on all measures from all scores:
corpora.get_facet('measures')
| corpus | fname | i | mc | mn | quarterbeats | duration_qb | keysig | timesig | act_dur | mc_offset | numbering_offset | dont_count | barline | breaks | repeats | next |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| corelli | op01n01a | 0 | 1 | 1 | 0 | 4.0 | -1 | 4/4 | 1 | 0 | <NA> | <NA> | NaN | NaN | firstMeasure | (2,) |
| | | 1 | 2 | 2 | 4 | 4.0 | -1 | 4/4 | 1 | 0 | <NA> | <NA> | NaN | NaN | <NA> | (3,) |
| | | 2 | 3 | 3 | 8 | 4.0 | -1 | 4/4 | 1 | 0 | <NA> | <NA> | NaN | NaN | <NA> | (4,) |
| | | 3 | 4 | 4 | 12 | 4.0 | -1 | 4/4 | 1 | 0 | <NA> | <NA> | NaN | NaN | <NA> | (5,) |
| | | 4 | 5 | 5 | 16 | 4.0 | -1 | 4/4 | 1 | 0 | <NA> | <NA> | NaN | NaN | <NA> | (6,) |
| | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| | op04n12c | 14 | 15 | 15 | 84 | 6.0 | 2 | 12/8 | 3/2 | 0 | <NA> | <NA> | <NA> | <NA> | NaN | (16,) |
| | | 15 | 16 | 16 | 90 | 6.0 | 2 | 12/8 | 3/2 | 0 | <NA> | <NA> | <NA> | <NA> | NaN | (17,) |
| | | 16 | 17 | 17 | 96 | 6.0 | 2 | 12/8 | 3/2 | 0 | <NA> | <NA> | <NA> | <NA> | NaN | (18,) |
| | | 17 | 18 | 18 | 102 | 6.0 | 2 | 12/8 | 3/2 | 0 | <NA> | <NA> | <NA> | <NA> | NaN | (19,) |
| | | 18 | 19 | 19 | 108 | 6.0 | 2 | 12/8 | 3/2 | 0 | <NA> | <NA> | <NA> | <NA> | end | (9, -1) |
4790 rows × 14 columns
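The displayed rows suggest how three of these columns relate: duration_qb looks like act_dur (a fraction of a whole note) expressed in quarter notes, and quarterbeats looks like the running total of all preceding durations. The sketch below reproduces the first rows under that reading. It is an observation on the data shown here, not a description of how ms3 computes the values, and it ignores repeats and alternative endings. Both function names are hypothetical.

```python
from fractions import Fraction


def duration_in_quarterbeats(act_dur):
    """Convert a measure length given as a fraction of a whole note to quarter notes."""
    return Fraction(act_dur) * 4


def measure_offsets(act_durs):
    """Return each measure's distance from the start of the piece in quarter notes."""
    offsets = []
    position = Fraction(0)
    for act_dur in act_durs:
        offsets.append(position)  # a measure starts where the previous ones end
        position += duration_in_quarterbeats(act_dur)
    return offsets


# act_dur of the first five measures of op01n01a as displayed above
offsets = measure_offsets(["1", "1", "1", "1", "1"])
print([int(o) for o in offsets])               # [0, 4, 8, 12, 16]
print(float(duration_in_quarterbeats("3/2")))  # 6.0, as in the 12/8 measures of op04n12c
```

Using `Fraction` rather than floats keeps values such as 3/2 exact, which matters once offsets are summed over many measures.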
Or we iterate through the corpora and print information on the first 10 notes:
for corpus_name, corpus_object in corpora:
    print(f"First ten notes of {corpus_name}:")
    display(corpus_object.get_facet('notes').iloc[:10])
First ten notes of corelli:
| fname | notes_i | mc | mn | quarterbeats | duration_qb | mc_onset | mn_onset | timesig | staff | voice | duration | nominal_duration | scalar | tied | tpc | midi | name | octave | chord_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| op01n01a | 0 | 1 | 1 | 0 | 1.0 | 0 | 0 | 4/4 | 3 | 1 | 1/4 | 1/4 | 1 | <NA> | -1 | 53 | F3 | 3 | 8 |
| | 1 | 1 | 1 | 0 | 1.0 | 0 | 0 | 4/4 | 4 | 1 | 1/4 | 1/4 | 1 | <NA> | -1 | 53 | F3 | 3 | 14 |
| | 2 | 1 | 1 | 0 | 1.0 | 0 | 0 | 4/4 | 2 | 1 | 1/4 | 1/4 | 1 | <NA> | 3 | 81 | A5 | 5 | 4 |
| | 3 | 1 | 1 | 0 | 1.0 | 0 | 0 | 4/4 | 1 | 1 | 1/4 | 1/4 | 1 | <NA> | 0 | 84 | C6 | 6 | 0 |
| | 4 | 1 | 1 | 1 | 1.0 | 1/4 | 1/4 | 4/4 | 3 | 1 | 1/4 | 1/4 | 1 | <NA> | 1 | 55 | G3 | 3 | 9 |
| | 5 | 1 | 1 | 1 | 1.0 | 1/4 | 1/4 | 4/4 | 4 | 1 | 1/4 | 1/4 | 1 | <NA> | 1 | 55 | G3 | 3 | 15 |
| | 6 | 1 | 1 | 1 | 1.0 | 1/4 | 1/4 | 4/4 | 2 | 1 | 1/4 | 1/4 | 1 | <NA> | 1 | 79 | G5 | 5 | 5 |
| | 7 | 1 | 1 | 1 | 1.0 | 1/4 | 1/4 | 4/4 | 1 | 1 | 1/4 | 1/4 | 1 | <NA> | -2 | 82 | Bb5 | 5 | 1 |
| | 8 | 1 | 1 | 2 | 0.5 | 1/2 | 1/2 | 4/4 | 3 | 1 | 1/8 | 1/8 | 1 | <NA> | 3 | 57 | A3 | 3 | 10 |
| | 9 | 1 | 1 | 2 | 1.5 | 1/2 | 1/2 | 4/4 | 4 | 1 | 3/8 | 1/4 | 3/2 | <NA> | 3 | 57 | A3 | 3 | 16 |
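The tpc, midi, name and octave columns describe the same pitch in different ways. In the rows above, tpc behaves like a position on the line of fifths with 0 = C (so -1 = F, 1 = G, -2 = Bb), and the octave follows the convention in which MIDI pitch 60 is C4. The sketch below derives the name column from tpc and midi under these assumptions; the two function names are hypothetical and not part of ms3.

```python
def tpc_to_name(tpc):
    """Spell a tonal pitch class given as a position on the line of fifths (0 = C)."""
    step = "FCGDAEB"[(tpc + 1) % 7]
    accidentals = (tpc + 1) // 7  # positive: sharps, negative: flats
    return step + ("#" * accidentals if accidentals > 0 else "b" * -accidentals)


def spelled_octave(tpc, midi):
    """Octave of the spelled note, with MIDI pitch 60 counted as C4."""
    accidentals = (tpc + 1) // 7
    # Undo the accidental first so that, e.g., B#3 (MIDI 60) stays in octave 3.
    return (midi - accidentals) // 12 - 1


# (tpc, midi) pairs taken from the rows above
for tpc, midi in [(-1, 53), (3, 81), (0, 84), (-2, 82)]:
    print(f"{tpc_to_name(tpc)}{spelled_octave(tpc, midi)}")  # F3, A5, C6, Bb5
```

Keeping tpc next to midi is what allows enharmonically equivalent notes, which share a MIDI pitch, to be told apart.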
The available facets are 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'.
We can request several at the same time:
corpora.get_facets(['labels', 'chords'])
| corpus | fname | facet | i | mc | mn | quarterbeats | duration_qb | mc_onset | mn_onset | timesig | staff | voice | harmony_layer | ... | thoroughbass_duration | thoroughbass_level_1 | thoroughbass_level_2 | slur | thoroughbass_level_3 | articulation | staff_text | system_text | placement | dynamics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| corelli | op01n01a | labels | 0 | 1 | 1 | 0 | 1.0 | 0 | 0 | 4/4 | 4 | 1 | 1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 1 | 1 | 1 | 1 | 1.0 | 1/4 | 1/4 | 4/4 | 4 | 1 | 1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 2 | 1 | 1 | 2 | 2.0 | 1/2 | 1/2 | 4/4 | 4 | 1 | 1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 3 | 2 | 2 | 4 | 0.5 | 0 | 0 | 4/4 | 4 | 1 | 1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 4 | 2 | 2 | 9/2 | 0.5 | 1/8 | 1/8 | 4/4 | 4 | 1 | 1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| | op04n12c | chords | 421 | 19 | 19 | 110 | 2.0 | 1/2 | 1/2 | 12/8 | 3 | 1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 422 | 19 | 19 | 108 | 1.0 | 0 | 0 | 12/8 | 4 | 1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 423 | 19 | 19 | 109 | 0.0 | 1/4 | 1/4 | 12/8 | 4 | 1 | NaN | ... | 1/4 | # | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 424 | 19 | 19 | 109 | 1.0 | 1/4 | 1/4 | 12/8 | 4 | 1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | | | 425 | 19 | 19 | 110 | 2.0 | 1/2 | 1/2 | 12/8 | 4 | 1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
95639 rows × 31 columns