---
title: Bi-partite Dataset
keywords: fastai
sidebar: home_sidebar
summary: "Generate bi-partite graph dataset."
description: "Generate bi-partite graph dataset."
nb_path: "nbs/transforms/bipartite.ipynb"
---
{% raw %}

class BipartiteDataset[source]

BipartiteDataset(*args, **kwds) :: Dataset

An abstract class representing a `Dataset`.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite `__getitem__`, supporting fetching a data sample for a given key. Subclasses may also optionally overwrite `__len__`, which is expected to return the size of the dataset by many `torch.utils.data.Sampler` implementations and the default options of `torch.utils.data.DataLoader`.

Note: `torch.utils.data.DataLoader` by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
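
The map-style contract above can be sketched with a minimal subclass (a hypothetical `PairDataset`, not part of this library), showing the two methods every map-style dataset is expected to implement:

```python
from torch.utils.data import Dataset

class PairDataset(Dataset):
    """Hypothetical map-style dataset returning (user, item) pairs."""
    def __init__(self, users, items):
        self.users = users
        self.items = items

    def __getitem__(self, idx):
        # fetch one sample for the given integer key
        return self.users[idx], self.items[idx]

    def __len__(self):
        # consulted by samplers and the default DataLoader options
        return len(self.users)

ds = PairDataset([1, 1, 2], [1, 2, 1])
```

`BipartiteDataset` follows the same contract, adding negative-sampling machinery on top.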

{% endraw %} {% raw %}
{% endraw %} {% raw %}
import pandas as pd

train = pd.DataFrame(
    {'userId':[1,1,2,2,3,4,5],
     'itemId':[1,2,1,3,2,4,5],
     'rating':[4,5,2,5,3,2,4]}
)

train
|   | userId | itemId | rating |
|---|--------|--------|--------|
| 0 | 1      | 1      | 4      |
| 1 | 1      | 2      | 5      |
| 2 | 2      | 1      | 2      |
| 3 | 2      | 3      | 5      |
| 4 | 3      | 2      | 3      |
| 5 | 4      | 4      | 2      |
| 6 | 5      | 5      | 4      |
{% endraw %} {% raw %}
class Args:
    # default column names
    user_col = 'userId'
    item_col = 'itemId'
    feedback_col = 'rating'
    # params
    K = 1          # number of negative samples per positive interaction
    offset = 3.5   # rating threshold separating likes from dislikes
    # dataset dimensions
    num_u = 5      # number of users
    num_v = 5      # number of items
{% endraw %} {% raw %}
args = Args()
{% endraw %} {% raw %}
import numpy as np
import torch

def deg_dist(train, num_v):
    # item degrees raised to the 3/4 power (word2vec-style negative-sampling
    # weighting); item ids are shifted to 0-based indices
    uni, cou = np.unique(train[args.item_col].values - 1, return_counts=True)
    cou = cou**(0.75)
    deg = np.zeros(num_v)
    deg[uni] = cou
    return torch.tensor(deg)

neg_dist = deg_dist(train, args.num_v)
neg_dist
tensor([1.6818, 1.6818, 1.0000, 1.0000, 1.0000], dtype=torch.float64)
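
These weights are unnormalized; one plausible way to draw negatives from them (a sketch, not necessarily how `BipartiteDataset` does it internally) is to normalize and sample with `torch.multinomial`:

```python
import torch

# degree-based weights as computed above (items 0 and 1 appear twice)
neg_dist = torch.tensor([1.6818, 1.6818, 1.0000, 1.0000, 1.0000])

probs = neg_dist / neg_dist.sum()  # normalize into a distribution
# draw 3 negative item indices, with replacement, biased toward popular items
negatives = torch.multinomial(probs, num_samples=3, replacement=True)
```

Popular items (here items 0 and 1) are sampled more often, but the 0.75 exponent dampens the popularity bias relative to sampling by raw frequency.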
{% endraw %} {% raw %}
import warnings
warnings.filterwarnings('ignore')
{% endraw %} {% raw %}
training_dataset = BipartiteDataset(args, train, neg_dist, args.offset, args.num_u, args.num_v, args.K)
training_dataset.negs_gen_EP(1)
training_dataset.edge_4 = training_dataset.edge_4_tot[:,:,:]
negative sampling for next epochs...
complete ! 0.05542922019958496
{% endraw %} {% raw %}
train
|   | userId | itemId | rating |
|---|--------|--------|--------|
| 0 | 1      | 1      | 4      |
| 1 | 1      | 2      | 5      |
| 2 | 2      | 1      | 2      |
| 3 | 2      | 3      | 5      |
| 4 | 3      | 2      | 3      |
| 5 | 4      | 4      | 2      |
| 6 | 5      | 5      | 4      |
{% endraw %} {% raw %}
[(a, b, c, d) for a, b, c, d in zip(training_dataset.edge_1.tolist(),
                                    training_dataset.edge_2.tolist(),
                                    training_dataset.edge_3.tolist(),
                                    training_dataset.edge_4[:, 0, 0].tolist())]
[(0, 5, 0.5, 7),
 (0, 6, 1.5, 9),
 (1, 5, -1.5, 8),
 (1, 7, 1.5, 9),
 (2, 6, -0.5, 8),
 (3, 8, -1.5, 6),
 (4, 9, 0.5, 6)]
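
Reading the tuples against the raw frame suggests how the first three edge arrays are derived (a hedged reconstruction; `edge_4` holds the sampled negative item per edge and varies between runs):

```python
import pandas as pd

train = pd.DataFrame({'userId': [1, 1, 2, 2, 3, 4, 5],
                      'itemId': [1, 2, 1, 3, 2, 4, 5],
                      'rating': [4, 5, 2, 5, 3, 2, 4]})
num_u, offset = 5, 3.5

edge_1 = (train['userId'] - 1).tolist()          # 0-based user index
edge_2 = (train['itemId'] - 1 + num_u).tolist()  # item index shifted past the users
edge_3 = (train['rating'] - offset).tolist()     # signed feedback around the offset
```

So a positive `edge_3` marks a "like" (rating above the 3.5 offset) and a negative one a "dislike", while shifting item ids by `num_u` places users and items in one joint node index space, as expected for a bipartite graph.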
{% endraw %}