---
title: AOTM dataset
keywords: fastai
sidebar: home_sidebar
summary: "AOTM dataset."
description: "AOTM dataset."
nb_path: "nbs/datasets/aotm.ipynb"
---
{% raw %}

class AOTMDataset[source]

AOTMDataset(root, process_method, min_session_length=2, min_item_support=2, num_slices=5, days_offset=0, days_shift=95, days_train=90, days_test=5) :: SessionDataset

Session data base class.

Args:

- min_session_length (int): Minimum number of items for a session to be valid.
- min_item_support (int): Minimum number of interactions for an item to be valid.
- eval_sec (int): Number of seconds, counted back from the end of the data, to hold out as validation data.

References:

1. https://github.com/Ethan-Yys/GRU4REC-pytorch-master/blob/master/preprocessing.py
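
The two filtering arguments above can be illustrated with a minimal sketch. It assumes the common session-based preprocessing order (drop short sessions, drop rarely seen items, then drop sessions that became too short); the actual order inside `AOTMDataset` is not shown on this page. The function name `filter_events` and the tuple layout are hypothetical, and plain Python is used here instead of a dataframe library to keep the example dependency-free.

```python
from collections import Counter


def filter_events(events, min_session_length=2, min_item_support=2):
    """Filter (session_id, item_id, timestamp) tuples.

    Hypothetical helper: the filtering order is an assumption,
    not taken from the AOTMDataset source.
    """

    def drop_short_sessions(rows):
        # Count events per session and keep only sufficiently long sessions.
        lengths = Counter(session for session, _, _ in rows)
        return [r for r in rows if lengths[r[0]] >= min_session_length]

    rows = drop_short_sessions(events)

    # Keep only items that occur often enough across all remaining sessions.
    support = Counter(item for _, item, _ in rows)
    rows = [r for r in rows if support[r[1]] >= min_item_support]

    # Removing items can shorten sessions, so apply the length filter again.
    return drop_short_sessions(rows)


events = [
    ("s1", "a", 1), ("s1", "b", 2), ("s1", "c", 3),
    ("s2", "a", 4), ("s2", "b", 5),
    ("s3", "d", 6),  # single-event session, removed by the length filter
]
filtered = filter_events(events)
print(filtered)
```

In this toy input, session `s3` is dropped for being too short and item `c` is dropped for appearing only once, which mirrors how the "Filtered data set" counts in the logs on this page are smaller than the "Loaded data set" counts.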
{% endraw %} {% raw %}
!rm -rf /content/aotm
aotmdata = AOTMDataset(root='/content/aotm', process_method='last')
Downloading https://github.com/RecoHut-Datasets/aotm/raw/v1/aotm.zip
Extracting /content/aotm/raw/aotm.zip
Processing...
Loaded data set
	Events: 1821241
	Sessions: 93313
	Items: 765790
	Span: 2016-01-02 / 2016-12-30


Filtered data set
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02 / 2016-12-30


Full train set
	Events: 1189593
	Sessions: 87410
	Items: 138815
Test set
	Events: 3345
	Sessions: 244
	Items: 3105
Train set
	Events: 1185992
	Sessions: 87145
	Items: 138814
Validation set
	Events: 3600
	Sessions: 265
	Items: 3363
Done!
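
The log above reports four sets: a full train set and a test set, then a train set and a validation set whose session counts add up to the full train set. One way to produce such a layout is a two-stage time-based split, sketched below. This is an assumption modeled on common session-based preprocessing scripts, not the confirmed logic of `process_method='last'`; the helper `split_by_time`, the one-day holdout, and the tuple layout are all hypothetical.

```python
DAY = 24 * 60 * 60  # seconds


def split_by_time(events, holdout_seconds, min_session_length=2):
    """Split (session_id, item_id, unix_ts) tuples into (train, heldout).

    Hypothetical helper: sessions whose last event falls inside the final
    `holdout_seconds` of the data are held out, the rest are used for training.
    """
    t_max = max(ts for _, _, ts in events)
    cutoff = t_max - holdout_seconds

    # Timestamp of the last event of every session.
    last_seen = {}
    for session, _, ts in events:
        last_seen[session] = max(last_seen.get(session, ts), ts)

    train = [e for e in events if last_seen[e[0]] < cutoff]
    train_items = {item for _, item, _ in train}

    # Held-out events must refer to items that were seen during training.
    heldout = [e for e in events
               if last_seen[e[0]] >= cutoff and e[1] in train_items]

    # Drop held-out sessions that became too short after the item filter.
    lengths = {}
    for session, _, _ in heldout:
        lengths[session] = lengths.get(session, 0) + 1
    heldout = [e for e in heldout if lengths[e[0]] >= min_session_length]
    return train, heldout


events = [
    ("s1", "a", 0), ("s1", "b", 10),
    ("s2", "b", 2 * DAY), ("s2", "a", 2 * DAY + 10),
    ("s3", "a", 4 * DAY), ("s3", "b", 4 * DAY + 10), ("s3", "z", 4 * DAY + 20),
    ("s4", "a", 4 * DAY + 30), ("s4", "z", 4 * DAY + 40),
]

# Stage 1: full train set vs. test set.
train_full, test = split_by_time(events, holdout_seconds=DAY)
# Stage 2: split the full train set again into train and validation.
train, valid = split_by_time(train_full, holdout_seconds=DAY)
```

In this toy data, item `z` never appears in training, so it is removed from the test set, and session `s4` is then dropped because only one of its events remains.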
{% endraw %} {% raw %}
!tree --du -h -C /content/aotm
/content/aotm
├── [ 83M]  processed
│   ├── [120K]  events_test.txt
│   ├── [ 42M]  events_train_full.txt
│   ├── [ 41M]  events_train_tr.txt
│   └── [129K]  events_train_valid.txt
└── [ 65M]  raw
    └── [ 65M]  playlists-aotm.csv

 149M used in 2 directories, 5 files
{% endraw %} {% raw %}
!rm -rf /content/aotm
aotmdata = AOTMDataset(root='/content/aotm', process_method='days_test')
Downloading https://github.com/RecoHut-Datasets/aotm/raw/v1/aotm.zip
Extracting /content/aotm/raw/aotm.zip
Processing...
Loaded data set
	Events: 1821241
	Sessions: 93313
	Items: 765790
	Span: 2016-01-02 / 2016-12-30


Filtered data set
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02 / 2016-12-30


Full train set
	Events: 1176744
	Sessions: 86474
	Items: 138786
Test set
	Events: 16138
	Sessions: 1179
	Items: 12841
Done!
{% endraw %} {% raw %}
!tree --du -h -C /content/aotm
/content/aotm
├── [ 42M]  processed
│   ├── [578K]  events_test.txt
│   └── [ 41M]  events_train_full.txt
└── [ 65M]  raw
    └── [ 65M]  playlists-aotm.csv

 107M used in 2 directories, 3 files
{% endraw %} {% raw %}
!rm -r /content/aotm/processed/*
aotmdata = AOTMDataset(root='/content/aotm', process_method='slice')
Processing...
Loaded data set
	Events: 1821241
	Sessions: 93313
	Items: 765790
	Span: 2016-01-02 / 2016-12-30


Filtered data set
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02 / 2016-12-30


Full data set 0
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02T23:00:00+00:00 / 2016-12-30T23:00:42+00:00
Slice data set 0
	Events: 315346
	Sessions: 23197
	Items: 95205
	Span: 2016-01-02 / 2016-04-01 / 2016-04-06
Train set 0
	Events: 298510
	Sessions: 21960
	Items: 92778
	Span: 2016-01-02 / 2016-04-01
Test set 0
	Events: 14333
	Sessions: 1213
	Items: 10851
	Span: 2016-04-01 / 2016-04-06 


Full data set 1
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02T23:00:00+00:00 / 2016-12-30T23:00:42+00:00
Slice data set 1
	Events: 310422
	Sessions: 22778
	Items: 94240
	Span: 2016-04-06 / 2016-07-05 / 2016-07-10
Train set 1
	Events: 294246
	Sessions: 21594
	Items: 91873
	Span: 2016-04-06 / 2016-07-05
Test set 1
	Events: 13733
	Sessions: 1163
	Items: 10441
	Span: 2016-07-05 / 2016-07-10 


Full data set 2
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02T23:00:00+00:00 / 2016-12-30T23:00:42+00:00
Slice data set 2
	Events: 309023
	Sessions: 22710
	Items: 94366
	Span: 2016-07-10 / 2016-10-08 / 2016-10-13
Train set 2
	Events: 292756
	Sessions: 21528
	Items: 91935
	Span: 2016-07-10 / 2016-10-08
Test set 2
	Events: 13747
	Sessions: 1155
	Items: 10553
	Span: 2016-10-08 / 2016-10-13 


Full data set 3
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02T23:00:00+00:00 / 2016-12-30T23:00:42+00:00
Slice data set 3
	Events: 258147
	Sessions: 18969
	Items: 86140
	Span: 2016-10-13 / 2017-01-11 / 2017-01-16
Train set 3
	Events: 258147
	Sessions: 18969
	Items: 86140
	Span: 2016-10-13 / 2017-01-11
Test set 3
	Events: 0
	Sessions: 0
	Items: 0
	Span: 2017-01-11 / 2017-01-16 


Full data set 4
	Events: 1192938
	Sessions: 87654
	Items: 138815
	Span: 2016-01-02T23:00:00+00:00 / 2016-12-30T23:00:42+00:00
Slice data set 4
	Events: 0
	Sessions: 0
	Items: 0
	Span: 2017-01-16 / 2017-04-16 / 2017-04-21
Train set 4
	Events: 0
	Sessions: 0
	Items: 0
	Span: 2017-01-16 / 2017-04-16
Test set 4
	Events: 0
	Sessions: 0
	Items: 0
	Span: 2017-04-16 / 2017-04-21 


Done!
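
The slice boundaries printed above are consistent with simple date arithmetic on the constructor defaults (`num_slices=5`, `days_offset=0`, `days_shift=95`, `days_train=90`, `days_test=5`). The sketch below reproduces them, assuming each slice starts `days_shift` days after the previous one, counted from the first day of the data. The function `slice_windows` is hypothetical and is not part of the `AOTMDataset` API.

```python
from datetime import date, timedelta


def slice_windows(first_day, num_slices=5, days_offset=0,
                  days_shift=95, days_train=90, days_test=5):
    """Return (start, train_end, test_end) dates for every slice.

    Hypothetical helper; the defaults mirror the AOTMDataset signature.
    """
    windows = []
    for k in range(num_slices):
        start = first_day + timedelta(days=days_offset + k * days_shift)
        train_end = start + timedelta(days=days_train)
        test_end = train_end + timedelta(days=days_test)
        windows.append((start, train_end, test_end))
    return windows


# First day of the data, taken from the "Span" line in the log above.
windows = slice_windows(date(2016, 1, 2))
for k, (start, train_end, test_end) in enumerate(windows):
    print(k, start, train_end, test_end)
# 0 2016-01-02 2016-04-01 2016-04-06
# 1 2016-04-06 2016-07-05 2016-07-10
# 2 2016-07-10 2016-10-08 2016-10-13
# 3 2016-10-13 2017-01-11 2017-01-16
# 4 2017-01-16 2017-04-16 2017-04-21
```

Because the data ends on 2016-12-30, the test window of slice 3 and the whole of slice 4 lie beyond the last event, which matches the zero counts in the log. With these defaults, `num_slices=3` would presumably keep every train and test set non-empty.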
{% endraw %} {% raw %}
!tree --du -h -C /content/aotm
/content/aotm
├── [ 41M]  processed
│   ├── [513K]  events_test.0.txt
│   ├── [491K]  events_test.1.txt
│   ├── [491K]  events_test.2.txt
│   ├── [  38]  events_test.3.txt
│   ├── [  38]  events_test.4.txt
│   ├── [ 10M]  events_train_full.0.txt
│   ├── [ 10M]  events_train_full.1.txt
│   ├── [ 10M]  events_train_full.2.txt
│   ├── [9.0M]  events_train_full.3.txt
│   └── [  38]  events_train_full.4.txt
└── [ 65M]  raw
    └── [ 65M]  playlists-aotm.csv

 107M used in 2 directories, 11 files
{% endraw %}