---
title: RetailRocket Dataset
keywords: fastai
sidebar: home_sidebar
summary: "RetailRocket dataset."
description: "RetailRocket dataset."
nb_path: "nbs/datasets/datasets.retailrocket.ipynb"
---

v1

{% raw %}

class RetailRocketDataset

RetailRocketDataset(root, shuffle=False, n_node=40727, is_train=True) :: SessionGraphDataset

References

1. COTREC session-based recommender model training. https://t.ly/cXTH.
{% endraw %} {% raw %}
root = '/content/retail_rocket'

train_data = RetailRocketDataset(root=root, shuffle=True, is_train=True)
test_data = RetailRocketDataset(root=root, shuffle=False, is_train=False)
Downloading https://github.com/RecoHut-Datasets/retail_rocket/raw/v1/all_train_seq.txt
Downloading https://github.com/RecoHut-Datasets/retail_rocket/raw/v1/train.txt
Using existing file all_train_seq.txt
Downloading https://github.com/RecoHut-Datasets/retail_rocket/raw/v1/test.txt
{% endraw %}

v2

{% raw %}

class RetailRocketDatasetv2

RetailRocketDatasetv2(root, process_method, min_date='2015-09-02', session_length=1800, min_session_length=2, min_item_support=5, num_slices=5, days_offset=0, days_shift=27, days_train=25, days_test=2) :: Dataset

Load and process the RetailRocket dataset.

Args:
    root (string): Root directory where the dataset should be saved.
    process_method (string): How to split the data into train and test sets. One of:
        last: last day => test set
        last_min_date: last day => test set, but from a minimal date onwards
        days_test: last N days => test set
        slice: create multiple train-test combinations with a sliding-window approach
    min_date (string, optional): Minimum date. Default: '2015-09-02'.
    session_length (int, optional): Session time length in seconds. Default: 30 * 60 (30 minutes).
    min_session_length (int, optional): Minimum number of items for a session to be valid. Default: 2.
    min_item_support (int, optional): Minimum number of interactions for an item to be valid. Default: 5.
    num_slices (int, optional): Number of slices to create. Default: 5.
    days_offset (int, optional): Offset in days from the first date in the data set. Default: 0.
    days_shift (int, optional): Number of days the training start date is shifted after creating one slice. Default: 27.
    days_train (int, optional): Days in the train set of each slice. Default: 25.
    days_test (int, optional): Days in the test set of each slice. Default: 2.
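
To make `session_length` concrete, here is a minimal sketch of inactivity-based sessionization. It assumes events are `(visitor_id, item_id, timestamp)` tuples with timestamps in seconds. The function name `assign_sessions` and the tuple layout are hypothetical and are not taken from `RetailRocketDatasetv2`, which may implement this step differently.

```python
def assign_sessions(events, session_length=1800):
    """Attach a session id to each (visitor_id, item_id, timestamp) event.

    Assumption: a new session starts when the visitor changes or when the
    same visitor has been inactive for more than `session_length` seconds.
    """
    # Sort by visitor, then by time, so gaps are measured per visitor.
    ordered = sorted(events, key=lambda e: (e[0], e[2]))

    sessions = []
    session_id = 0
    prev_visitor, prev_time = None, None
    for visitor, item, ts in ordered:
        if prev_visitor is not None:
            gap_too_long = ts - prev_time > session_length
            if visitor != prev_visitor or gap_too_long:
                session_id += 1
        sessions.append((session_id, visitor, item, ts))
        prev_visitor, prev_time = visitor, ts
    return sessions


# Hypothetical toy events: visitor "a" pauses for 40 minutes before item 3.
toy_events = [("a", 1, 0), ("a", 2, 600), ("a", 3, 3000), ("b", 1, 100)]
toy_sessions = assign_sessions(toy_events)
print([s[0] for s in toy_sessions])
```

With the default of 1800 seconds, the 40-minute pause splits visitor "a" into two sessions, and visitor "b" gets a session of its own.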

{% endraw %} {% raw %}
rr = RetailRocketDatasetv2(root='/content/retailrocket', process_method='last_min_date')
Processing...
1430622011
1442545187
Loaded data set
	Events: 2664312
	Sessions: 1755206
	Items: 234838
	Span: 2015-05-03 / 2015-09-18


Filtered data set
	Events: 1085763
	Sessions: 306919
	Items: 49070
	Span: 2015-05-03 / 2015-09-18


Filtered data set
	Events: 103032
	Sessions: 30705
	Items: 23246
	Span: 2015-09-01 / 2015-09-18


Full train set
	Events: 99363
	Sessions: 29631
	Items: 22866
Test set
	Events: 2925
	Sessions: 849
	Items: 1736
Train set
	Events: 95145
	Sessions: 28467
	Items: 22325
Validation set
	Events: 3295
	Sessions: 925
	Items: 1977
Done!
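
The two integers printed right after `Processing...` look like Unix timestamps in seconds for the first and last event. This is an inference from the log, not something the class documents. A quick standard-library check (the helper `to_utc_date` is illustrative only):

```python
from datetime import datetime, timezone


def to_utc_date(ts):
    """Format a Unix timestamp (seconds) as a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


first_ts, last_ts = 1430622011, 1442545187  # values copied from the log above
print(to_utc_date(first_ts), "/", to_utc_date(last_ts))
# 2015-05-03 / 2015-09-18
```

Both dates agree with the `Span` line of the loaded data set.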
{% endraw %} {% raw %}
rr = RetailRocketDatasetv2(root='/content/retailrocket', process_method='last')
Processing...
1430622011
1442545187
Loaded data set
	Events: 2664312
	Sessions: 1755206
	Items: 234838
	Span: 2015-05-03 / 2015-09-18


Filtered data set
	Events: 1085763
	Sessions: 306919
	Items: 49070
	Span: 2015-05-03 / 2015-09-18


Full train set
	Events: 1082094
	Sessions: 305845
	Items: 49062
Test set
	Events: 3627
	Sessions: 1065
	Items: 2190
Train set
	Events: 1077876
	Sessions: 304681
	Items: 49058
Validation set
	Events: 4194
	Sessions: 1162
	Items: 2606
Done!
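
Both runs report a "Filtered data set" that is much smaller than the loaded one. Given the `min_item_support` and `min_session_length` arguments, a plausible reading is that rare items and very short sessions are dropped. The sketch below illustrates that idea on `(session_id, item_id)` pairs. The function name `filter_events`, the row layout, and the single-pass order (items first, then sessions) are assumptions, not the class's confirmed behavior.

```python
from collections import Counter


def filter_events(rows, min_item_support=5, min_session_length=2):
    """Drop rare items, then drop sessions that became too short.

    `rows` is a list of (session_id, item_id) pairs (hypothetical layout).
    """
    # Keep only events whose item occurs at least `min_item_support` times.
    item_counts = Counter(item for _, item in rows)
    rows = [r for r in rows if item_counts[r[1]] >= min_item_support]

    # Keep only sessions that still have `min_session_length` or more events.
    session_counts = Counter(session for session, _ in rows)
    return [r for r in rows if session_counts[r[0]] >= min_session_length]


# Hypothetical toy data: only item "x" is frequent enough, and only
# session 2 still has two events once items "y" and "z" are removed.
toy_rows = [(1, "x"), (1, "y"), (2, "x"), (2, "x"), (3, "x"), (3, "z")]
print(filter_events(toy_rows, min_item_support=3, min_session_length=2))
```

Filtering items before sessions matters: removing a rare item can shorten a session below the minimum length, so the session check runs on the already reduced events.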
{% endraw %} {% raw %}
!tree --du -h -C /content/retailrocket
/content/retailrocket
├── [104M]  processed
│   ├── [4.1M]  events_buys.txt
│   ├── [456K]  events_test.0.txt
│   ├── [487K]  events_test.1.txt
│   ├── [494K]  events_test.2.txt
│   ├── [426K]  events_test.3.txt
│   ├── [363K]  events_test.4.txt
│   ├── [115K]  events_test.txt
│   ├── [6.6M]  events_train_full.0.txt
│   ├── [6.5M]  events_train_full.1.txt
│   ├── [6.4M]  events_train_full.2.txt
│   ├── [5.9M]  events_train_full.3.txt
│   ├── [5.1M]  events_train_full.4.txt
│   ├── [ 33M]  events_train_full.txt
│   ├── [ 33M]  events_train_tr.txt
│   └── [132K]  events_train_valid.txt
└── [ 90M]  raw
    └── [ 90M]  events.csv

 194M used in 2 directories, 16 files
{% endraw %}
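
The five numbered file pairs (`events_train_full.0.txt` to `events_train_full.4.txt`, with matching `events_test.N.txt`) line up with the default `num_slices=5`. As a rough illustration of how the sliding-window arguments could translate into date ranges, the sketch below assumes that slice `i` starts `days_offset + i * days_shift` days after the first date, trains for `days_train` days, and tests on the following `days_test` days. The helper `slice_windows` is hypothetical, and the actual boundaries used by `RetailRocketDatasetv2` may differ (for example, by working on timestamps rather than calendar dates).

```python
from datetime import date, timedelta


def slice_windows(first_day, num_slices=5, days_offset=0,
                  days_shift=27, days_train=25, days_test=2):
    """Return (train_start, train_end, test_end) dates for each slice.

    Train covers [train_start, train_end); test covers [train_end, test_end).
    """
    windows = []
    for i in range(num_slices):
        start = first_day + timedelta(days=days_offset + i * days_shift)
        middle = start + timedelta(days=days_train)
        end = middle + timedelta(days=days_test)
        windows.append((start, middle, end))
    return windows


# First date taken from the "Span" line of the loaded data set.
for i, (start, middle, end) in enumerate(slice_windows(date(2015, 5, 3))):
    print(f"slice {i}: train [{start}, {middle})  test [{middle}, {end})")
```

Under these assumptions the defaults tile the data without overlap, because `days_shift` equals `days_train + days_test`, and the last slice ends on 2015-09-15, inside the reported span.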