src.fairreckitlib.model.algorithms.implicit.implicit_recommender

This module contains the implicit recommender and creation functions.

Classes:

ImplicitRecommender: recommender implementation for the Implicit package.

Functions:

create_als: create AlternatingLeastSquares recommender (factory creation compatible).
create_bpr: create BayesianPersonalizedRanking recommender (factory creation compatible).
create_lmf: create LogisticMatrixFactorization recommender (factory creation compatible).

This program has been developed by students from the bachelor Computer Science at Utrecht University within the Software Project course. © Copyright Utrecht University (Department of Information and Computing Sciences)
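
The three creation functions listed above share one signature, `(name, params, **kwargs)`, which is what makes them usable from a factory. A minimal sketch of that idea, assuming a plain dictionary registry; `REGISTRY`, `create_from_registry` and `create_stub` are hypothetical names for illustration and are not part of fairreckitlib:

```python
from typing import Any, Callable, Dict


# Hypothetical stand-in for create_als / create_bpr / create_lmf: it has the
# same (name, params, **kwargs) signature, but returns a plain dict instead of
# an ImplicitRecommender so the sketch runs without the implicit package.
def create_stub(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    return {'name': name, 'params': params, 'num_threads': kwargs['num_threads']}


# Hypothetical registry keyed by algorithm name.
REGISTRY: Dict[str, Callable[..., Any]] = {
    'ALS': create_stub,
    'BPR': create_stub,
    'LMF': create_stub,
}


def create_from_registry(name: str, params: Dict[str, Any], **kwargs) -> Any:
    """Look up the creation function by name and forward all arguments."""
    return REGISTRY[name](name, params, **kwargs)


recommender = create_from_registry('BPR', {'factors': 10}, num_threads=1)
```

Because every creation function accepts the same arguments, the caller does not need to know which algorithm it is constructing.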

import time
from typing import Any, Dict, List

from implicit.als import AlternatingLeastSquares
from implicit.bpr import BayesianPersonalizedRanking
from implicit.lmf import LogisticMatrixFactorization
from implicit.recommender_base import RecommenderBase

import numpy as np
import pandas as pd
from scipy import sparse

from ..base_recommender import Recommender

class ImplicitRecommender(src.fairreckitlib.model.algorithms.base_recommender.Recommender):

Recommender implementation for the Implicit package.

ImplicitRecommender(algo: implicit.recommender_base.RecommenderBase, name: str, params: Dict[str, Any], **kwargs)

    def __init__(self, algo: RecommenderBase, name: str, params: Dict[str, Any], **kwargs):
        """Construct the implicit recommender.

        Args:
            algo: the implicit recommender algorithm.
            name: the name of the recommender.
            params: the parameters of the recommender.

        Keyword Args:
            num_threads(int): the max number of threads the recommender can use.
            rated_items_filter(bool): whether to filter already rated items when
                producing item recommendations.
        """
        Recommender.__init__(self, name, params, kwargs['num_threads'],
                             kwargs['rated_items_filter'])
        self.algo = algo

Construct the implicit recommender.

Args:
    algo: the implicit recommender algorithm.
    name: the name of the recommender.
    params: the parameters of the recommender.

Keyword Args:
    num_threads(int): the max number of threads the recommender can use.
    rated_items_filter(bool): whether to filter already rated items when producing item recommendations.

def on_train(self, train_set: scipy.sparse.csr_matrix) -> None:

    def on_train(self, train_set: sparse.csr_matrix) -> None:
        """Train the algorithm on the train set.

        The recommender should be trained with a csr matrix.

        Args:
            train_set: the set to train the recommender with.

        Raises:
            ArithmeticError: possibly raised by an algorithm on training.
            MemoryError: possibly raised by an algorithm on training.
            RuntimeError: possibly raised by an algorithm on training.
            TypeError: when the train set is not a csr matrix.
        """
        if not isinstance(train_set, sparse.csr_matrix):
            raise TypeError('Expected recommender to be trained with a csr matrix')

        # train without showing the progress bar
        self.algo.fit(train_set, show_progress=False)

Train the algorithm on the train set.

The recommender should be trained with a csr matrix.

Args:
    train_set: the set to train the recommender with.

Raises:
    ArithmeticError: possibly raised by an algorithm on training.
    MemoryError: possibly raised by an algorithm on training.
    RuntimeError: possibly raised by an algorithm on training.
    TypeError: when the train set is not a csr matrix.
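
Since `on_train` rejects anything that is not a `csr_matrix`, the caller has to convert its interaction data first. A minimal sketch, assuming the interactions are available as parallel lists of user indices, item indices and ratings; `to_user_item_csr` is a hypothetical helper, not part of fairreckitlib:

```python
import numpy as np
from scipy import sparse


def to_user_item_csr(users, items, ratings, num_users, num_items):
    """Build a user-by-item CSR matrix from parallel index/value sequences."""
    matrix = sparse.coo_matrix(
        (np.asarray(ratings, dtype=np.float32), (users, items)),
        shape=(num_users, num_items),
    )
    # COO is convenient to construct; convert to CSR for row (user) slicing.
    return matrix.tocsr()


train_set = to_user_item_csr(
    users=[0, 0, 1, 2],
    items=[1, 2, 0, 2],
    ratings=[1.0, 3.0, 2.0, 5.0],
    num_users=3,
    num_items=3,
)

# This is the same check on_train performs before fitting.
is_csr = isinstance(train_set, sparse.csr_matrix)
# A COO matrix holding the same data would fail that check.
coo_is_csr = isinstance(train_set.tocoo(), sparse.csr_matrix)
```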

def on_recommend(self, user: int, num_items: int) -> pandas.DataFrame:

    def on_recommend(self, user: int, num_items: int) -> pd.DataFrame:
        """Compute item recommendations for the specified user.

        Implicit recommenders use the stored CSR train set to produce item recommendations.

        Args:
            user: the user ID to compute recommendations for.
            num_items: the number of item recommendations to produce.

        Raises:
            ArithmeticError: possibly raised by a recommender on testing.
            MemoryError: possibly raised by a recommender on testing.
            RuntimeError: when the recommender is not trained yet.

        Returns:
            dataframe with the columns: 'item' and 'score'.
        """
        items, scores = self.algo.recommend(
            user,
            self.train_set.get_matrix()[user],
            N=num_items,
            filter_already_liked_items=self.rated_items_filter
        )

        return pd.DataFrame({'item': items, 'score': scores})

Compute item recommendations for the specified user.

Implicit recommenders use the stored CSR train set to produce item recommendations.

Args:
    user: the user ID to compute recommendations for.
    num_items: the number of item recommendations to produce.

Raises:
    ArithmeticError: possibly raised by a recommender on testing.
    MemoryError: possibly raised by a recommender on testing.
    RuntimeError: when the recommender is not trained yet.

Returns:
    dataframe with the columns: 'item' and 'score'.
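
The effect of filtering already rated items on the returned 'item' and 'score' columns can be illustrated without the implicit package. A minimal sketch, assuming dense NumPy factor arrays and dot-product scoring; `recommend_top_n` is a hypothetical helper and is not how implicit computes recommendations internally:

```python
import numpy as np
import pandas as pd


def recommend_top_n(user_factors, item_factors, liked_items, num_items, filter_liked):
    """Score all items for one user and return the best num_items of them."""
    scores = item_factors @ user_factors
    if filter_liked:
        # Push already rated items to the bottom so they are never selected.
        scores = scores.copy()
        scores[np.asarray(liked_items, dtype=np.int64)] = -np.inf
    order = np.argsort(-scores)[:num_items]
    return pd.DataFrame({'item': order, 'score': scores[order]})


item_factors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
user_factors = np.array([1.0, 2.0])

# The user already rated item 2, which also has the highest score.
unfiltered = recommend_top_n(user_factors, item_factors, [2], 2, filter_liked=False)
filtered = recommend_top_n(user_factors, item_factors, [2], 2, filter_liked=True)
```

With the filter off, item 2 leads the list; with the filter on, it is skipped and the next best items take its place.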

def on_recommend_batch(self, users: List[int], num_items: int) -> pandas.DataFrame:

    def on_recommend_batch(self, users: List[int], num_items: int) -> pd.DataFrame:
        """Compute the item recommendations for each of the specified users.

        Implicit recommenders use the stored CSR train set to produce item recommendations.
        Moreover, they allow for batching multiple users at the same time using multiple threads.

        Args:
            users: the user IDs to compute recommendations for.
            num_items: the number of item recommendations to produce.

        Raises:
            ArithmeticError: possibly raised by a recommender on testing.
            MemoryError: possibly raised by a recommender on testing.
            RuntimeError: when the recommender is not trained yet.

        Returns:
            dataframe with the columns: 'rank', 'user', 'item', 'score'.
        """
        items, scores = self.algo.recommend(
            users,
            self.train_set.get_matrix()[users],
            N=num_items,
            # apply the configured filter, as on_recommend does
            filter_already_liked_items=self.rated_items_filter
        )

        # DataFrame.append is not available in pandas 2.0 and later, so the
        # per-user frames are collected and concatenated once.
        frames = [
            pd.DataFrame({
                'rank': np.arange(1, 1 + num_items),
                'user': np.full(num_items, user),
                'item': items[i],
                'score': scores[i]
            })
            for i, user in enumerate(users)
        ]

        if not frames:
            return pd.DataFrame()

        return pd.concat(frames, ignore_index=True)

Compute the item recommendations for each of the specified users.

Implicit recommenders use the stored CSR train set to produce item recommendations. Moreover, they allow for batching multiple users at the same time using multiple threads.

Args:
    users: the user IDs to compute recommendations for.
    num_items: the number of item recommendations to produce.

Raises:
    ArithmeticError: possibly raised by a recommender on testing.
    MemoryError: possibly raised by a recommender on testing.
    RuntimeError: when the recommender is not trained yet.

Returns:
    dataframe with the columns: 'rank', 'user', 'item', 'score'.
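
The batch result is a long-format table with one row per (user, rank) pair. A minimal sketch of how the two-dimensional `items` and `scores` arrays map onto those four columns, assuming every user receives exactly the same number of items; `batch_to_frame` is a hypothetical helper that builds the frame in one step with `np.tile` and `np.repeat` rather than per user:

```python
import numpy as np
import pandas as pd


def batch_to_frame(users, items, scores):
    """Flatten (num_users, num_items) arrays into 'rank', 'user', 'item', 'score' rows."""
    items = np.asarray(items)
    scores = np.asarray(scores)
    num_users, num_items = items.shape
    return pd.DataFrame({
        # ranks 1..num_items, repeated for every user
        'rank': np.tile(np.arange(1, num_items + 1), num_users),
        # each user ID repeated once per recommended item
        'user': np.repeat(np.asarray(users), num_items),
        'item': items.ravel(),
        'score': scores.ravel(),
    })


frame = batch_to_frame(
    users=[7, 9],
    items=[[3, 1], [2, 5]],
    scores=[[0.9, 0.4], [0.8, 0.1]],
)
```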

def create_als(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.implicit.implicit_recommender.ImplicitRecommender:

def create_als(name: str, params: Dict[str, Any], **kwargs) -> ImplicitRecommender:
    """Create the AlternatingLeastSquares recommender.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of latent factors to compute.
            regularization(float): the regularization factor to use.
            use_native(bool): use native extensions to speed up model fitting.
            use_cg(bool): use a faster Conjugate Gradient solver to calculate factors.
            iterations(int): the number of ALS iterations to use when fitting data.
            calculate_training_loss(bool): whether to log the training loss at each iteration.
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the ImplicitRecommender wrapper of AlternatingLeastSquares.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = AlternatingLeastSquares(
        factors=params['factors'],
        regularization=params['regularization'],
        dtype=np.float32,
        use_native=params['use_native'],
        use_cg=params['use_cg'],
        iterations=params['iterations'],
        calculate_training_loss=params['calculate_training_loss'],
        num_threads=kwargs['num_threads'],
        random_state=params['random_seed']
    )

    return ImplicitRecommender(algo, name, params, **kwargs)

Create the AlternatingLeastSquares recommender.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of latent factors to compute.
        regularization(float): the regularization factor to use.
        use_native(bool): use native extensions to speed up model fitting.
        use_cg(bool): use a faster Conjugate Gradient solver to calculate factors.
        iterations(int): the number of ALS iterations to use when fitting data.
        calculate_training_loss(bool): whether to log the training loss at each iteration.
        random_seed(int): the random seed or None for the current time as seed.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the ImplicitRecommender wrapper of AlternatingLeastSquares.
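
All three creation functions replace a `random_seed` of None with the current time, and they do so in the caller's `params` dictionary. A minimal sketch of that step in isolation; `resolve_random_seed` is a hypothetical helper and the parameter values are illustrative, not recommended settings:

```python
import time
from typing import Any, Dict


def resolve_random_seed(params: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a missing seed with the current time, modifying params in place."""
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())
    return params


# Illustrative parameter dictionary with the keys create_als reads.
als_params = {
    'factors': 100,
    'regularization': 0.01,
    'use_native': True,
    'use_cg': True,
    'iterations': 15,
    'calculate_training_loss': False,
    'random_seed': None,
}

resolve_random_seed(als_params)

# A seed that is already set is left untouched.
fixed_params = resolve_random_seed({'random_seed': 42})
```

Because the dictionary is updated in place, the seed that was actually used remains visible to the caller afterwards, which helps when a run has to be reproduced. The same dictionary could then be passed on, for example as `create_als('ALS', als_params, num_threads=1, rated_items_filter=True)`, where both keyword arguments are required by the `ImplicitRecommender` constructor.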

def create_bpr(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.implicit.implicit_recommender.ImplicitRecommender:

def create_bpr(name: str, params: Dict[str, Any], **kwargs) -> ImplicitRecommender:
    """Create the BayesianPersonalizedRanking recommender.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of latent factors to compute.
            learning_rate(float): the learning rate to apply for SGD updates during training.
            regularization(float): the regularization factor to use.
            iterations(int): the number of training epochs to use when fitting the data.
            verify_negative_samples(bool): when sampling negative items, check if the randomly
                picked negative item has actually been liked by the user. This check increases
                the time needed to train but usually leads to better predictions.
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the ImplicitRecommender wrapper of BayesianPersonalizedRanking.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = BayesianPersonalizedRanking(
        factors=params['factors'],
        learning_rate=params['learning_rate'],
        regularization=params['regularization'],
        dtype=np.float32,
        iterations=params['iterations'],
        num_threads=kwargs['num_threads'],
        verify_negative_samples=params['verify_negative_samples'],
        random_state=params['random_seed']
    )

    return ImplicitRecommender(algo, name, params, **kwargs)

Create the BayesianPersonalizedRanking recommender.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of latent factors to compute.
        learning_rate(float): the learning rate to apply for SGD updates during training.
        regularization(float): the regularization factor to use.
        iterations(int): the number of training epochs to use when fitting the data.
        verify_negative_samples(bool): when sampling negative items, check if the randomly picked negative item has actually been liked by the user. This check increases the time needed to train but usually leads to better predictions.
        random_seed(int): the random seed or None for the current time as seed.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the ImplicitRecommender wrapper of BayesianPersonalizedRanking.
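
The idea behind `verify_negative_samples` can be shown with a small rejection-sampling routine. This is a simplified sketch under stated assumptions, not the implicit library's actual training loop; `sample_negative` and its `max_tries` parameter are hypothetical:

```python
import random


def sample_negative(liked, num_items, verify, rng, max_tries=100):
    """Draw a random item ID to use as a negative example for one user.

    With verify=True, a candidate the user has already liked is rejected and
    another one is drawn. With verify=False the first draw is returned as is,
    so it may occasionally be a liked item.
    """
    for _ in range(max_tries):
        candidate = rng.randrange(num_items)
        if not verify or candidate not in liked:
            return candidate
    # Give up when no valid negative was found within max_tries draws.
    return None


rng = random.Random(0)
liked = {0, 1, 2}

# With verification, only the items the user has not liked (3 and 4) come back.
verified = [sample_negative(liked, 5, True, rng) for _ in range(50)]
# Without verification, any of the five items can be returned.
unverified = [sample_negative(liked, 5, False, rng) for _ in range(50)]
```

The extra membership test on every draw is where the additional training time mentioned in the docstring comes from.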

def create_lmf(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.implicit.implicit_recommender.ImplicitRecommender:

def create_lmf(name: str, params: Dict[str, Any], **kwargs) -> ImplicitRecommender:
    """Create the LogisticMatrixFactorization recommender.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of latent factors to compute.
            learning_rate(float): the learning rate to apply for updates during training.
            regularization(float): the regularization factor to use.
            iterations(int): the number of training epochs to use when fitting the data.
            neg_prop(int): the proportion of negative samples.
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the ImplicitRecommender wrapper of LogisticMatrixFactorization.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = LogisticMatrixFactorization(
        factors=params['factors'],
        learning_rate=params['learning_rate'],
        regularization=params['regularization'],
        dtype=np.float32,
        iterations=params['iterations'],
        neg_prop=params['neg_prop'],
        num_threads=kwargs['num_threads'],
        random_state=params['random_seed']
    )

    return ImplicitRecommender(algo, name, params, **kwargs)

Create the LogisticMatrixFactorization recommender.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of latent factors to compute.
        learning_rate(float): the learning rate to apply for updates during training.
        regularization(float): the regularization factor to use.
        iterations(int): the number of training epochs to use when fitting the data.
        neg_prop(int): the proportion of negative samples.
        random_seed(int): the random seed or None for the current time as seed.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the ImplicitRecommender wrapper of LogisticMatrixFactorization.
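
For orientation, logistic matrix factorization in its general formulation models the probability of a user interacting with an item as a sigmoid of the factor dot product plus bias terms. A minimal sketch of that formula, assuming explicit scalar biases; `lmf_probability` is a hypothetical helper and does not reflect how implicit stores its factors:

```python
import numpy as np


def lmf_probability(user_factors, item_factors, user_bias=0.0, item_bias=0.0):
    """Return sigmoid(user . item + user_bias + item_bias) as a float."""
    logit = float(np.dot(user_factors, item_factors)) + user_bias + item_bias
    return 1.0 / (1.0 + np.exp(-logit))


# Orthogonal or zero factors give a logit of 0, so the probability is 0.5.
neutral = lmf_probability(np.zeros(3), np.ones(3))
# Aligned factors give a positive logit and a probability above 0.5.
aligned = lmf_probability(np.ones(2), np.ones(2))
# A negative bias lowers the probability for the same factors.
biased = lmf_probability(np.ones(2), np.ones(2), item_bias=-4.0)
```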