---
title: Music30 dataset
keywords: fastai
sidebar: home_sidebar
summary: "Music30 dataset."
description: "Music30 dataset."
nb_path: "nbs/datasets/datasets.music30.ipynb"
---
{% raw %}

class Music30Dataset

Music30Dataset(root, process_method, min_session_length=2, min_item_support=2, num_slices=5, days_offset=0, days_shift=95, days_train=90, days_test=5) :: SessionDataset

Session data base class.

Args:
    min_session_length (int): Minimum number of items a session must contain to be kept.
    min_item_support (int): Minimum number of interactions an item must have to be kept.
    eval_sec (int): Number of seconds at the end of the data to hold out as validation data.

References:

1. https://github.com/Ethan-Yys/GRU4REC-pytorch-master/blob/master/preprocessing.py
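
The two filtering arguments can be illustrated with a small pandas sketch. This is not the class's actual code: the column names `SessionId`, `ItemId` and `Time`, the helper name `filter_sessions`, and the order of the two steps (items first, then sessions) are assumptions made for illustration.

```python
import pandas as pd


def filter_sessions(events: pd.DataFrame,
                    min_session_length: int = 2,
                    min_item_support: int = 2) -> pd.DataFrame:
    """Drop rare items, then drop sessions that became too short.

    Hypothetical helper: column names and step order are assumptions.
    """
    # Keep only items with at least `min_item_support` interactions.
    item_counts = events.groupby("ItemId").size()
    keep_items = item_counts[item_counts >= min_item_support].index
    events = events[events["ItemId"].isin(keep_items)]

    # Keep only sessions that still have at least `min_session_length` events.
    session_lengths = events.groupby("SessionId").size()
    keep_sessions = session_lengths[session_lengths >= min_session_length].index
    return events[events["SessionId"].isin(keep_sessions)]


# Toy log: item 30 appears once, so session 2 shrinks to one event and is dropped.
toy = pd.DataFrame({
    "SessionId": [1, 1, 1, 2, 2, 3, 3],
    "ItemId":    [10, 20, 10, 20, 30, 10, 20],
    "Time":      [0, 1, 2, 3, 4, 5, 6],
})
filtered = filter_sessions(toy)
```

A single pass like this does not guarantee that every item still meets the support threshold after short sessions are removed; whether the real implementation repeats the filter is not stated here.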
{% endraw %} {% raw %}
musicdata = Music30Dataset(root='/content/music30', process_method='last')
Downloading https://github.com/RecoHut-Datasets/30music/raw/v1/30music.zip
Extracting /content/music30/raw/30music.zip
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2953382
	Sessions: 190216
	Items: 452855
	Span: 2014-01-20 / 2015-01-20


Full train set
	Events: 2892862
	Sessions: 186627
	Items: 450895
Test set
	Events: 54606
	Sessions: 3468
	Items: 35100
Train set
	Events: 2847481
	Sessions: 183674
	Items: 449290
Validation set
	Events: 41785
	Sessions: 2852
	Items: 29293
Done!
{% endraw %} {% raw %}
musicdata = Music30Dataset(root='/content/music30', process_method='last')
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2149666
	Sessions: 165766
	Items: 139016
	Span: 2014-01-20 / 2015-01-20


Full train set
	Events: 2105847
	Sessions: 162634
	Items: 138861
Test set
	Events: 41871
	Sessions: 3091
	Items: 23508
Train set
	Events: 2073194
	Sessions: 160047
	Items: 138755
Validation set
	Events: 31937
	Sessions: 2564
	Items: 20210
Done!
{% endraw %} {% raw %}
!tree --du -h -C /content/music30
/content/music30
├── [157M]  processed
│   ├── [1.6M]  events_test.txt
│   ├── [ 78M]  events_train_full.txt
│   ├── [ 77M]  events_train_tr.txt
│   └── [1.2M]  events_train_valid.txt
└── [137M]  raw
    └── [137M]  30music-200ks.csv

 295M used in 2 directories, 5 files
{% endraw %} {% raw %}
!rm /content/music30/processed/*
musicdata = Music30Dataset(root='/content/music30', process_method='days_test')
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2149666
	Sessions: 165766
	Items: 139016
	Span: 2014-01-20 / 2015-01-20


Full train set
	Events: 2073194
	Sessions: 160047
	Items: 138755
Test set
	Events: 73532
	Sessions: 5652
	Items: 36423
Done!
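
A time-based hold-out like the one this run appears to produce can be sketched as follows. It assumes `Time` holds Unix timestamps in seconds and that a session belongs to the test set when its last event falls within the final `days_test` days. The function name `split_last_days` and the column names are hypothetical, and the class may split differently.

```python
import pandas as pd

SECONDS_PER_DAY = 86400


def split_last_days(events: pd.DataFrame, days_test: int = 5):
    """Hold out sessions ending within the last `days_test` days as the test set.

    Hypothetical helper: assumes `Time` is a Unix timestamp in seconds.
    """
    cutoff = events["Time"].max() - SECONDS_PER_DAY * days_test
    session_end = events.groupby("SessionId")["Time"].max()

    train_ids = session_end[session_end < cutoff].index
    test_ids = session_end[session_end >= cutoff].index
    train = events[events["SessionId"].isin(train_ids)]
    test = events[events["SessionId"].isin(test_ids)]

    # Drop test events whose item never occurs in training,
    # then drop test sessions left with fewer than two events.
    test = test[test["ItemId"].isin(train["ItemId"])]
    lengths = test.groupby("SessionId").size()
    test = test[test["SessionId"].isin(lengths[lengths >= 2].index)]
    return train, test


day = SECONDS_PER_DAY
toy = pd.DataFrame({
    "SessionId": [1, 1, 2, 2, 3, 3, 3],
    "ItemId":    [10, 20, 20, 10, 10, 20, 99],
    "Time":      [0, 60, 3 * day, 3 * day + 60,
                  9 * day, 9 * day + 60, 9 * day + 120],
})
train, test = split_last_days(toy, days_test=5)
```

In the toy data, session 3 ends inside the last five days and becomes the test set; its event on item 99 is removed because that item never appears in training.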
{% endraw %} {% raw %}
!tree --du -h -C /content/music30
/content/music30
├── [ 79M]  processed
│   ├── [2.7M]  events_test.txt
│   └── [ 77M]  events_train_full.txt
└── [137M]  raw
    └── [137M]  30music-200ks.csv

 217M used in 2 directories, 3 files
{% endraw %} {% raw %}
!rm /content/music30/processed/*
musicdata = Music30Dataset(root='/content/music30', process_method='slice')
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2149666
	Sessions: 165766
	Items: 139016
	Span: 2014-01-20 / 2015-01-20


Done!
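
The slicing arguments in the signature (`num_slices`, `days_offset`, `days_shift`, `days_train`, `days_test`) suggest a sliding-window scheme. The sketch below only computes window boundaries under that assumption: each slice starts `days_shift` days after the previous one, trains on `days_train` days and tests on the following `days_test` days. The function name `slice_windows` is hypothetical, and the class's real behaviour is not confirmed by the output above.

```python
from datetime import datetime, timedelta, timezone


def slice_windows(first_event: datetime, num_slices: int = 5, days_offset: int = 0,
                  days_shift: int = 95, days_train: int = 90, days_test: int = 5):
    """Return (train_start, train_end, test_end) for each slice.

    Hypothetical helper: the windowing rule is an assumption.
    """
    windows = []
    for slice_id in range(num_slices):
        start = first_event + timedelta(days=days_offset + slice_id * days_shift)
        middle = start + timedelta(days=days_train)  # train covers [start, middle)
        end = middle + timedelta(days=days_test)     # test covers [middle, end)
        windows.append((start, middle, end))
    return windows


# First event date taken from the "Span" line of the log above.
windows = slice_windows(datetime(2014, 1, 20, tzinfo=timezone.utc))
for start, middle, end in windows:
    print(start.date(), middle.date(), end.date())
```

Under this assumed scheme and the default arguments, the fifth window would start 380 days after the first event, which is later than the one-year span reported in the log.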
{% endraw %} {% raw %}
!tree --du -h -C /content/music30
/content/music30
├── [4.0K]  processed
└── [137M]  raw
    └── [137M]  30music-200ks.csv

 137M used in 2 directories, 1 file
{% endraw %}