src.fairreckitlib.model.algorithms.implicit.implicit_recommender
This module contains the implicit recommender and creation functions.
Classes:
ImplicitRecommender: recommender implementation for implicit.
Functions:
create_als: create AlternatingLeastSquares recommender (factory creation compatible).
create_bpr: create BayesianPersonalizedRanking recommender (factory creation compatible).
create_lmf: create LogisticMatrixFactorization recommender (factory creation compatible).
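"Factory creation compatible" means that all three functions share the call shape (name, params, **kwargs), so a factory can look them up by name and call them uniformly. The sketch below illustrates that idea only: REGISTRY, create_dummy and create_from_registry are hypothetical names, and the real factory in fairreckitlib may be organised differently.

```python
from typing import Any, Callable, Dict


def create_dummy(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    """Hypothetical stand-in for a create_* function with the same call shape.

    It returns a plain dict instead of a real recommender (illustration only).
    """
    return {'name': name, 'params': params, 'num_threads': kwargs['num_threads']}


# Hypothetical registry that maps an algorithm name to its creation function.
REGISTRY: Dict[str, Callable[..., Any]] = {'Dummy': create_dummy}


def create_from_registry(name: str, params: Dict[str, Any], **kwargs) -> Any:
    """Look up the creation function by name and forward all arguments to it."""
    if name not in REGISTRY:
        raise KeyError('Unknown algorithm: ' + name)
    return REGISTRY[name](name, params, **kwargs)


model = create_from_registry('Dummy', {'factors': 8},
                             num_threads=1, rated_items_filter=True)
```

Because every creation function accepts **kwargs, extra keyword arguments such as rated_items_filter pass through without the factory having to know about them.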
This program has been developed by students from the bachelor Computer Science at Utrecht University within the Software Project course. © Copyright Utrecht University (Department of Information and Computing Sciences)
import time
from typing import Any, Dict, List

from implicit.als import AlternatingLeastSquares
from implicit.bpr import BayesianPersonalizedRanking
from implicit.lmf import LogisticMatrixFactorization
from implicit.recommender_base import RecommenderBase

import numpy as np
import pandas as pd
from scipy import sparse

from ..base_recommender import Recommender
class ImplicitRecommender(Recommender):
Recommender implementation for the Implicit package.
    def __init__(self, algo: RecommenderBase, name: str, params: Dict[str, Any], **kwargs):
        """Construct the implicit recommender.

        Args:
            algo: the implicit recommender algorithm.
            name: the name of the recommender.
            params: the parameters of the recommender.

        Keyword Args:
            num_threads(int): the max number of threads the recommender can use.
            rated_items_filter(bool): whether to filter already rated items when
                producing item recommendations.
        """
        Recommender.__init__(self, name, params, kwargs['num_threads'],
                             kwargs['rated_items_filter'])
        self.algo = algo
Construct the implicit recommender.
Args:
    algo: the implicit recommender algorithm.
    name: the name of the recommender.
    params: the parameters of the recommender.

Keyword Args:
    num_threads(int): the max number of threads the recommender can use.
    rated_items_filter(bool): whether to filter already rated items when producing item recommendations.
    def on_train(self, train_set: sparse.csr_matrix) -> None:
        """Train the algorithm on the train set.

        The recommender should be trained with a csr matrix.

        Args:
            train_set: the set to train the recommender with.

        Raises:
            ArithmeticError: possibly raised by an algorithm on training.
            MemoryError: possibly raised by an algorithm on training.
            RuntimeError: possibly raised by an algorithm on training.
            TypeError: when the train set is not a csr matrix.
        """
        if not isinstance(train_set, sparse.csr_matrix):
            raise TypeError('Expected recommender to be trained with a csr matrix')

        self.algo.fit(train_set, False)
Train the algorithm on the train set.
The recommender should be trained with a csr matrix.
Args: train_set: the set to train the recommender with.
Raises:
    ArithmeticError: possibly raised by an algorithm on training.
    MemoryError: possibly raised by an algorithm on training.
    RuntimeError: possibly raised by an algorithm on training.
    TypeError: when the train set is not a csr matrix.
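To make the CSR requirement concrete, the sketch below builds the three arrays of a compressed sparse row layout by hand and reads one user's row back. It is a pure-Python illustration under the assumption that each row holds one user's item interactions; build_csr and csr_row are hypothetical helpers, and in practice scipy.sparse.csr_matrix stores and slices the data.

```python
from typing import Dict, List, Tuple


def build_csr(rows: List[Dict[int, float]]) -> Tuple[List[int], List[int], List[float]]:
    """Build CSR arrays (indptr, indices, data) from one {item: value} dict per user."""
    indptr, indices, data = [0], [], []
    for row in rows:
        for item in sorted(row):
            indices.append(item)    # column (item) index of the stored value
            data.append(row[item])  # the stored value itself
        indptr.append(len(indices)) # where the next user's row starts
    return indptr, indices, data


def csr_row(indptr: List[int], indices: List[int], data: List[float],
            user: int) -> Dict[int, float]:
    """Return the stored {item: value} entries of a single user's row."""
    start, end = indptr[user], indptr[user + 1]
    return dict(zip(indices[start:end], data[start:end]))


# Three users; the second user has no interactions at all.
interactions = [{0: 1.0, 2: 3.0}, {}, {1: 2.0}]
indptr, indices, data = build_csr(interactions)
first_user = csr_row(indptr, indices, data, 0)
```

Row lookups only touch the slice between two consecutive indptr entries, which is why per-user access on a CSR matrix is cheap.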
    def on_recommend(self, user: int, num_items: int) -> pd.DataFrame:
        """Compute item recommendations for the specified user.

        Implicit recommenders use the stored CSR train set to produce item recommendations.

        Args:
            user: the user ID to compute recommendations for.
            num_items: the number of item recommendations to produce.

        Raises:
            ArithmeticError: possibly raised by a recommender on testing.
            MemoryError: possibly raised by a recommender on testing.
            RuntimeError: when the recommender is not trained yet.

        Returns:
            dataframe with the columns: 'item' and 'score'.
        """
        items, scores = self.algo.recommend(
            user,
            self.train_set.get_matrix()[user],
            N=num_items,
            filter_already_liked_items=self.rated_items_filter
        )

        return pd.DataFrame({ 'item': items, 'score': scores })
Compute item recommendations for the specified user.
Implicit recommenders use the stored CSR train set to produce item recommendations.
Args:
    user: the user ID to compute recommendations for.
    num_items: the number of item recommendations to produce.

Raises:
    ArithmeticError: possibly raised by a recommender on testing.
    MemoryError: possibly raised by a recommender on testing.
    RuntimeError: when the recommender is not trained yet.
Returns: dataframe with the columns: 'item' and 'score'.
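The effect of the rated items filter on a top-N request can be sketched without the implicit package. The example assumes a plain dot-product score between a user's latent factors and each item's latent factors; top_n and its arguments are hypothetical names, and this is not how implicit implements recommend internally.

```python
import heapq
from typing import List, Sequence, Set, Tuple


def top_n(user_factors: Sequence[float],
          item_factors: Sequence[Sequence[float]],
          liked: Set[int],
          num_items: int,
          filter_liked: bool) -> Tuple[List[int], List[float]]:
    """Score every item for one user and return the best num_items (items, scores)."""
    scored = []
    for item, factors in enumerate(item_factors):
        if filter_liked and item in liked:
            continue  # skip items the user already rated
        score = sum(u * v for u, v in zip(user_factors, factors))
        scored.append((score, item))

    best = heapq.nlargest(num_items, scored)  # highest scores first
    return [item for _, item in best], [score for score, _ in best]


user = [1.0, 0.5]
catalogue = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
filtered_items, filtered_scores = top_n(user, catalogue, {0}, 2, True)
all_items, all_scores = top_n(user, catalogue, {0}, 2, False)
```

With the filter on, item 0 never appears even though it has the highest score; with the filter off, it leads the list.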
    def on_recommend_batch(self, users: List[int], num_items: int) -> pd.DataFrame:
        """Compute the item recommendations for each of the specified users.

        Implicit recommenders use the stored CSR train set to produce item recommendations.
        Moreover, they allow for batching multiple users at the same time using multiple threads.

        Args:
            users: the user IDs to compute recommendations for.
            num_items: the number of item recommendations to produce.

        Raises:
            ArithmeticError: possibly raised by a recommender on testing.
            MemoryError: possibly raised by a recommender on testing.
            RuntimeError: when the recommender is not trained yet.

        Returns:
            dataframe with the columns: 'rank', 'user', 'item', 'score'.
        """
        items, scores = self.algo.recommend(
            users,
            self.train_set.get_matrix()[users],
            N=num_items,
            filter_already_liked_items=True
        )

        # Collect one frame per user; DataFrame.append was removed in pandas 2.0.
        frames = []
        for i, user in enumerate(users):
            frames.append(pd.DataFrame({
                'rank': np.arange(1, 1 + num_items),
                'user': np.full(num_items, user),
                'item': items[i],
                'score': scores[i]
            }))

        # pd.concat raises on an empty list, so keep the empty-frame result for no users.
        return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
Compute the item recommendations for each of the specified users.
Implicit recommenders use the stored CSR train set to produce item recommendations. Moreover, they allow for batching multiple users at the same time using multiple threads.
Args:
    users: the user IDs to compute recommendations for.
    num_items: the number of item recommendations to produce.

Raises:
    ArithmeticError: possibly raised by a recommender on testing.
    MemoryError: possibly raised by a recommender on testing.
    RuntimeError: when the recommender is not trained yet.
Returns: dataframe with the columns: 'rank', 'user', 'item', 'score'.
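The shape of the batch result can be illustrated with plain Python. The sketch assumes the batched call returns one list of items and one list of scores per user, already sorted best first; to_rank_rows is a hypothetical helper that flattens them into 'rank', 'user', 'item', 'score' rows.

```python
from typing import Any, Dict, List, Sequence


def to_rank_rows(users: Sequence[int],
                 items: Sequence[Sequence[int]],
                 scores: Sequence[Sequence[float]]) -> List[Dict[str, Any]]:
    """Flatten per-user recommendation lists into one row per (user, item) pair."""
    rows = []
    for user, user_items, user_scores in zip(users, items, scores):
        # Ranks start at 1 for each user and follow the order of the input lists.
        for rank, (item, score) in enumerate(zip(user_items, user_scores), start=1):
            rows.append({'rank': rank, 'user': user, 'item': item, 'score': score})
    return rows


rows = to_rank_rows([7, 9], [[3, 1], [2, 0]], [[0.9, 0.4], [0.8, 0.1]])
```

Passing such a list of dicts to pandas.DataFrame yields a table with the same four columns, built in a single step rather than by growing a dataframe inside the loop.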
def create_als(name: str, params: Dict[str, Any], **kwargs) -> ImplicitRecommender:
    """Create the AlternatingLeastSquares recommender.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of latent factors to compute.
            regularization(float): the regularization factor to use.
            use_native(bool): use native extensions to speed up model fitting.
            use_cg(bool): use a faster Conjugate Gradient solver to calculate factors.
            iterations(int): the number of ALS iterations to use when fitting data.
            calculate_training_loss(bool): whether to log out the training loss at each iteration.
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the ImplicitRecommender wrapper of AlternatingLeastSquares.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = AlternatingLeastSquares(
        factors=params['factors'],
        regularization=params['regularization'],
        dtype=np.float32,
        use_native=params['use_native'],
        use_cg=params['use_cg'],
        iterations=params['iterations'],
        calculate_training_loss=params['calculate_training_loss'],
        num_threads=kwargs['num_threads'],
        random_state=params['random_seed']
    )

    return ImplicitRecommender(algo, name, params, **kwargs)
Create the AlternatingLeastSquares recommender.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of latent factors to compute.
        regularization(float): the regularization factor to use.
        use_native(bool): use native extensions to speed up model fitting.
        use_cg(bool): use a faster Conjugate Gradient solver to calculate factors.
        iterations(int): the number of ALS iterations to use when fitting data.
        calculate_training_loss(bool): whether to log out the training loss at each iteration.
        random_seed(int): the random seed or None for the current time as seed.
Keyword Args: num_threads(int): the max number of threads the algorithm can use.
Returns: the ImplicitRecommender wrapper of AlternatingLeastSquares.
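All three creation functions resolve a random_seed of None to the current time. A minimal sketch of that rule follows; resolve_seed is a hypothetical helper, and unlike the creation functions, which write the seed back into the params dictionary they receive, it works on a copy so the caller's dictionary stays untouched.

```python
import time
from typing import Any, Dict


def resolve_seed(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params in which a None random_seed becomes the current time."""
    resolved = dict(params)  # shallow copy, the input dict is not modified
    if resolved.get('random_seed') is None:
        resolved['random_seed'] = int(time.time())
    return resolved


original = {'factors': 16, 'random_seed': None}
fixed = resolve_seed({'factors': 16, 'random_seed': 42})
timed = resolve_seed(original)
```

A fixed integer seed makes runs repeatable, whereas a time-based seed changes from one second to the next, so two models created in the same second receive the same seed.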
def create_bpr(name: str, params: Dict[str, Any], **kwargs) -> ImplicitRecommender:
    """Create the BayesianPersonalizedRanking recommender.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of latent factors to compute.
            learning_rate(float): the learning rate to apply for SGD updates during training.
            regularization(float): the regularization factor to use.
            iterations(int): the number of training epochs to use when fitting the data.
            verify_negative_samples(bool): when sampling negative items, check if the randomly
                picked negative item has actually been liked by the user. This check increases
                the time needed to train but usually leads to better predictions.
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the ImplicitRecommender wrapper of BayesianPersonalizedRanking.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = BayesianPersonalizedRanking(
        factors=params['factors'],
        learning_rate=params['learning_rate'],
        regularization=params['regularization'],
        dtype=np.float32,
        iterations=params['iterations'],
        num_threads=kwargs['num_threads'],
        verify_negative_samples=params['verify_negative_samples'],
        random_state=params['random_seed']
    )

    return ImplicitRecommender(algo, name, params, **kwargs)
Create the BayesianPersonalizedRanking recommender.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of latent factors to compute.
        learning_rate(float): the learning rate to apply for SGD updates during training.
        regularization(float): the regularization factor to use.
        iterations(int): the number of training epochs to use when fitting the data.
        verify_negative_samples(bool): when sampling negative items, check if the randomly picked negative item has actually been liked by the user. This check increases the time needed to train but usually leads to better predictions.
        random_seed(int): the random seed or None for the current time as seed.
Keyword Args: num_threads(int): the max number of threads the algorithm can use.
Returns: the ImplicitRecommender wrapper of BayesianPersonalizedRanking.
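The verify_negative_samples check can be pictured as follows. This is a simplified sketch, not the implicit training loop: sample_negative is a hypothetical helper that draws again when a candidate turns out to be a liked item, whereas the library may handle a rejected sample differently, for example by skipping it.

```python
import random
from typing import Set


def sample_negative(liked: Set[int], num_items: int, verify: bool,
                    rng: random.Random) -> int:
    """Draw a random item ID to use as a negative example for one user.

    With verify=True the draw is repeated until the item is not in liked.
    """
    if verify and len(liked) >= num_items:
        raise ValueError('No negative item exists: the user liked every item')

    while True:
        candidate = rng.randrange(num_items)
        if not verify or candidate not in liked:
            return candidate


rng = random.Random(0)
liked_items = {0, 1, 2}
verified = [sample_negative(liked_items, 5, True, rng) for _ in range(100)]
unverified = [sample_negative(liked_items, 5, False, rng) for _ in range(100)]
```

The extra membership test on every draw is the training cost the parameter description refers to; without it, some "negative" examples are in fact items the user liked.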
def create_lmf(name: str, params: Dict[str, Any], **kwargs) -> ImplicitRecommender:
    """Create the LogisticMatrixFactorization recommender.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of latent factors to compute.
            learning_rate(float): the learning rate to apply for updates during training.
            regularization(float): the regularization factor to use.
            iterations(int): the number of training epochs to use when fitting the data.
            neg_prop(int): the proportion of negative samples.
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the ImplicitRecommender wrapper of LogisticMatrixFactorization.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = LogisticMatrixFactorization(
        factors=params['factors'],
        learning_rate=params['learning_rate'],
        regularization=params['regularization'],
        dtype=np.float32,
        iterations=params['iterations'],
        neg_prop=params['neg_prop'],
        num_threads=kwargs['num_threads'],
        random_state=params['random_seed']
    )

    return ImplicitRecommender(algo, name, params, **kwargs)
Create the LogisticMatrixFactorization recommender.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of latent factors to compute.
        learning_rate(float): the learning rate to apply for updates during training.
        regularization(float): the regularization factor to use.
        iterations(int): the number of training epochs to use when fitting the data.
        neg_prop(int): the proportion of negative samples.
        random_seed(int): the random seed or None for the current time as seed.
Keyword Args: num_threads(int): the max number of threads the algorithm can use.
Returns: the ImplicitRecommender wrapper of LogisticMatrixFactorization.
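For orientation, logistic matrix factorization is generally described as passing the dot product of user and item latent factors, plus bias terms, through a logistic function to obtain an interaction probability. The sketch below shows only that formula; interaction_probability and its bias arguments are hypothetical, and how implicit stores or trains these terms is not covered here.

```python
import math
from typing import Sequence


def interaction_probability(user_factors: Sequence[float],
                            item_factors: Sequence[float],
                            user_bias: float = 0.0,
                            item_bias: float = 0.0) -> float:
    """Logistic probability that a user interacts with an item (illustrative)."""
    logit = sum(u * v for u, v in zip(user_factors, item_factors))
    logit += user_bias + item_bias
    return 1.0 / (1.0 + math.exp(-logit))


neutral = interaction_probability([0.0, 0.0], [1.0, 1.0])
likely = interaction_probability([1.0, 2.0], [1.0, 1.0])
unlikely = interaction_probability([1.0, 2.0], [-1.0, -1.0])
```

A zero dot product with zero biases gives a probability of one half, and larger positive or negative products push the probability toward one or zero.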