Conventional CF: Neighborhood Methods -> recommend items or users that are similar to one another
Latent Factor Methods -> recommend using latent (hidden) characteristics
Latent Factor Model
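Users and items are both represented with the same abstract features (latent factors).
- Movie: how much comedy, action, educational content, and so on it contains
- User: how much the user cares about comedy, action, educational content, and so on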
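As a minimal sketch of this idea, assume three hand-picked factors (comedy, action, educational). The factor values and the names `movie`, `user_a`, and `user_b` are hypothetical; real latent factors are learned from data and are usually not this interpretable.

```python
import numpy as np

# Hypothetical latent factors, in the order [comedy, action, educational]
movie = np.array([0.9, 0.1, 0.0])   # how much of each factor the movie contains
user_a = np.array([1.0, 0.2, 0.0])  # how much each factor matters to user A
user_b = np.array([0.0, 0.3, 1.0])  # how much each factor matters to user B

# Affinity is the inner product of the user and item vectors
score_a = float(user_a @ movie)
score_b = float(user_b @ movie)
```

User A, who cares mostly about comedy, scores this comedy-heavy movie higher than user B does.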
[Approaches]
- Matrix Factorization (e.g., SVD)
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)
- Neural Networks
Matrix Factorization
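- Represent both users and items as vectors in an abstract (latent) space
- Generate recommendations from similarities between these latent vectors
- Derive the latent space from user-item rating data
- Estimate user u's rating of item i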
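One common way to write this estimate is as an inner product. The symbols are assumptions, not from the original notes: user u has latent vector p_u and item i has latent vector q_i, both in a k-dimensional space.

```latex
\hat{r}_{ui} = p_u^{\top} q_i = \sum_{f=1}^{k} p_{uf}\, q_{if}
```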
* Recommendation with Matrix Factorization
- Item-based CF using the latent item vectors
- User-based CF using the latent user vectors
- Recommendation from the estimated ratings
* Matrix Factorization Methods
- Singular Value Decomposition (SVD)
- Gradient Descent
- Alternating Least Squares
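The hands-on part of this post uses only SVD. For comparison, here is a rough sketch of the gradient descent approach: stochastic gradient descent on the squared error over observed ratings, with L2 regularization. The function names, hyperparameters, and toy ratings are all illustrative assumptions, not taken from the course.

```python
import numpy as np

def factorize_sgd(ratings, n_users, n_items, dim=2, lr=0.01, reg=0.02,
                  epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, dim))
    Q = rng.normal(scale=0.1, size=(n_items, dim))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]          # error on this observed rating
            p_old = P[u].copy()            # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

def squared_error(ratings, P, Q):
    """Sum of squared errors over the observed ratings only."""
    return sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)

# Toy data: (user index, item index, rating)
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

P0, Q0 = factorize_sgd(toy, 3, 3, epochs=0)    # untrained (initial) factors
P, Q = factorize_sgd(toy, 3, 3, epochs=500)    # trained factors
```

Unlike a plain SVD of the zero-filled rating matrix, this objective only looks at observed ratings, so missing entries are not treated as ratings of 0.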
Hands-on Practice
Read Data: movies and ratings
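# Imports assumed by the code in this post; LA is taken to be numpy.linalg
import numpy as np
import pandas as pd
from numpy import linalg as LA
from scipy.linalg import sqrtm
from scipy.sparse import coo_matrix
movies = pd.read_csv('movielens/movies_w_imgurl.csv')
ratings = pd.read_csv('ratings-9_1.csv')
# Split on the 'type' column, keeping only the columns used below
train = ratings[ratings['type'] == 'train'][['userId', 'movieId', 'rating']]
test = ratings[ratings['type'] == 'test'][['userId', 'movieId', 'rating']]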
Convert Ratings to User-Item Sparse Matrix - build the rating matrix
- Create Index to Id Maps
# Map each movieId to a column index (and back)
movieIds = train.movieId.unique()
movieIdToIndex = {}
indexToMovieId = {}
colIdx = 0
for movieId in movieIds:
    movieIdToIndex[movieId] = colIdx
    indexToMovieId[colIdx] = movieId
    colIdx += 1
# Map each userId to a row index (and back)
userIds = train.userId.unique()
userIdToIndex = {}
indexToUserId = {}
rowIdx = 0
for userId in userIds:
    userIdToIndex[userId] = rowIdx
    indexToUserId[rowIdx] = userId
    rowIdx += 1
- Create User-Item Sparse Matrix
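rows = []
cols = []
vals = []
# Collect one (row, col, value) triple per training rating
for row in train.itertuples():
    rows.append(userIdToIndex[row.userId])
    cols.append(movieIdToIndex[row.movieId])
    vals.append(row.rating)
# rowIdx and colIdx now hold the number of users and movies
coomat = coo_matrix((vals, (rows, cols)), shape=(rowIdx, colIdx))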
matrix = coomat.todense()
matrix.shape
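To see what the conversion above produces, here is a small sketch with made-up triples (the values are illustrative only). It builds a 3x3 matrix the same way, from parallel lists of row indices, column indices, and values.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy ratings as parallel lists: row index, column index, value
toy_rows = [0, 0, 1, 2]
toy_cols = [0, 2, 1, 2]
toy_vals = [5.0, 3.0, 4.0, 1.0]

toy = coo_matrix((toy_vals, (toy_rows, toy_cols)), shape=(3, 3))
dense = toy.toarray()  # unrated cells become 0.0 in the dense form
```

Only four cells are stored in the sparse form. Once the matrix is densified, an unrated cell cannot be told apart from a rating of 0, which matters for the SVD step that follows.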
Singular Value Decomposition
U, s, V = LA.svd(coomat.toarray(), full_matrices = False)
- Define User and Item Feature Matrix
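dim = 100
# Square root of the top `dim` singular values, as a diagonal matrix
sqrtS = sqrtm(np.diag(s[0:dim]))
# Keep the first `dim` columns of U and of V.T (LA.svd returns V already transposed)
userFeatures = np.matmul(U[:, :dim], sqrtS)
itemFeatures = np.matmul(V.T[:, :dim], sqrtS.T)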
itemFeatures.shape
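The sketch below repeats the same construction on a made-up 4x4 rating matrix (all values and names are illustrative) to show why the square root of the singular values is split between the two sides. The product of the user and item features then reproduces the rank-k approximation of the rating matrix.

```python
import numpy as np

# Toy rating matrix (rows: users, columns: items); 0.0 marks "not rated"
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)

def features(k):
    """Return (user features, item features) using the top-k singular values."""
    sqrt_s = np.diag(np.sqrt(s[:k]))
    return U[:, :k] @ sqrt_s, Vt[:k, :].T @ sqrt_s

user_f, item_f = features(2)
approx = user_f @ item_f.T               # rank-2 approximation of R

full_user_f, full_item_f = features(len(s))
exact = full_user_f @ full_item_f.T      # keeping every factor recovers R
```

With all factors kept, the product equals R up to floating-point error. With k = 2, the Frobenius error equals the root of the sum of the squared discarded singular values.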
- Compute the Item Similarity Matrix
itemNorms = LA.norm(itemFeatures, ord = 2, axis=1)
normalizedItemFeatures = np.divide(itemFeatures.T, itemNorms).T
itemSims = pd.DataFrame(data = np.matmul(normalizedItemFeatures, normalizedItemFeatures.T), index = movieIds, columns=movieIds)
itemSims
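The step above is cosine similarity: each item vector is scaled to unit length, then all pairwise dot products are taken. A small sketch on made-up 2-D item vectors follows; the helper name `cosine_similarity_matrix` is hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / norms
    return normalized @ normalized.T

toy_items = np.array([[1.0, 0.0],
                      [2.0, 0.0],   # same direction as item 0, different length
                      [0.0, 3.0],   # orthogonal to item 0
                      [1.0, 1.0]])  # 45 degrees from item 0
sims = cosine_similarity_matrix(toy_items)
```

Items 0 and 1 get similarity 1 even though their lengths differ, because only the direction of the vector counts.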
- Check Samples
movieIdx = 6
rels = itemSims.iloc[movieIdx,:].sort_values(ascending=False).head(6)[1:]
displayMovies(movies, [indexToMovieId[movieIdx]])
displayMovies(movies, rels.index, rels.values)
User Rating Prediction
userId = 33
userRatings = train[train['userId'] == userId][['movieId', 'rating']]
userRatings
- Predict Ratings
recSimSums = itemSims.loc[userRatings['movieId'].values, :].sum().values
recWeightedRatingSums = np.matmul(itemSims.loc[userRatings['movieId'].values, :].T.values, userRatings['rating'].values)
recItemRatings = pd.DataFrame(data = np.divide(recWeightedRatingSums, recSimSums), index=itemSims.index)
recItemRatings.columns = ['pred']
recItemRatings
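The prediction above is a similarity-weighted average of the user's own ratings. Here is a toy sketch with a made-up 3-item similarity matrix and a user who rated items 0 and 2. All numbers and names are illustrative.

```python
import numpy as np

# Hypothetical item-item similarities for three items
toy_sims = np.array([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.3],
                     [0.1, 0.3, 1.0]])

rated_idx = np.array([0, 2])        # items the user has rated
rated_vals = np.array([5.0, 1.0])   # the ratings given to those items

weights = toy_sims[rated_idx, :]    # similarity of each rated item to every item
# Weighted sum of ratings divided by the sum of weights, per target item
pred = (weights.T @ rated_vals) / weights.sum(axis=0)
```

Item 1 is much closer to item 0 (rated 5) than to item 2 (rated 1), so its prediction lands near 3.9. Cosine similarities from SVD features can be negative, so in practice the denominator may be close to zero or negative; predictions in those cases should be treated with caution.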
- Compute Errors (MAE, RMSE)
userTestRatings = pd.DataFrame(data=test[test['userId'] == userId])
# Join predictions on movieId; test movies absent from train have no prediction and are dropped
temp = userTestRatings.join(recItemRatings, on='movieId').dropna(subset=['pred'])
mae = getMAE(temp['rating'], temp['pred'])
rmse = getRMSE(temp['rating'], temp['pred'])
print(f"MAE : {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
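`getMAE` and `getRMSE` are helpers that this post does not define. A minimal sketch follows, assuming they compute the standard mean absolute error and root mean squared error. The bodies are an assumption rather than the course's code, and the functions would need to be defined before the cell above runs.

```python
import numpy as np

def getMAE(actual, pred):
    """Mean absolute error between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(actual - pred)))

def getRMSE(actual, pred):
    """Root mean squared error between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

# Toy check: errors are 1 and 2
toy_mae = getMAE([4.0, 2.0], [3.0, 4.0])
toy_rmse = getRMSE([4.0, 2.0], [3.0, 4.0])
```

RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE on the same data.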
- Compare Logs and Recommendations
logs = userRatings.sort_values(by='rating', ascending=False).head(20)
recs = recItemRatings.sort_values(by='pred', ascending=False).head(20)
print("logs")
displayMovies(movies, logs['movieId'].values, logs['rating'].values)
print("recs")
displayMovies(movies, recs.index, recs['pred'].values)
Source: The RED: 현실 데이터를 활용한 추천시스템 구현 A to Z (Implementing Recommender Systems with Real-World Data, A to Z) by 이동주, CTO of 번개장터