

04-4. Matrix Factorization-Based Rating Prediction

Conventional CF: Neighborhood Methods -> recommend similar items or users

Latent Factor Methods -> recommend using latent (hidden) characteristics

Latent Factor Model

Represents users and items with the same set of abstract features (latent factors)

    - Movie: how much comedy, action, educational content, etc. it contains

    - User: how much weight they place on comedy, action, educational content, etc.
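As a rough illustration of this idea, here is a toy sketch with three hand-picked factors (comedy, action, educational). The factor values are invented for illustration and are not learned from data. Taking the dot product of a user vector with a movie vector gives a score for how closely the movie's content matches what the user cares about.

```python
import numpy as np

# Hypothetical latent factors, in the order [comedy, action, educational]
toyMovies = {
    "comedy_film": np.array([0.9, 0.1, 0.0]),
    "action_film": np.array([0.1, 0.9, 0.1]),
    "documentary": np.array([0.0, 0.1, 0.9]),
}
# Hypothetical user: cares most about action, somewhat about comedy
toyUser = np.array([0.5, 1.0, 0.1])

# Affinity = dot product of the user vector and each movie vector
scores = {title: float(toyUser @ vec) for title, vec in toyMovies.items()}
best = max(scores, key=scores.get)
print(scores)
print("best match:", best)
```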

[Approaches]

- Matrix Factorization (e.g., SVD)

- Probabilistic Latent Semantic Analysis (PLSA)

- Latent Dirichlet Allocation (LDA)

- Neural Networks

 

Matrix Factorization

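- Represent both users and items as vectors in the same latent space

- Generate recommendations from the similarity between latent vectors

- Derive the latent space from the user-item rating data

- Estimate user u's rating of item i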
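The last point can be written out using the usual latent factor notation. The symbols here are assumed, because the original notes do not define them. Let p_u be user u's k-dimensional latent vector and q_i be item i's latent vector. The estimated rating is then their inner product:

```latex
\hat{r}_{ui} = q_i^{\top} p_u = \sum_{f=1}^{k} q_{if}\, p_{uf}
```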

 

* Recommendation using Matrix Factorization

  - Item-based CF using the latent item vectors

  - User-based CF using the latent user vectors

  - Recommendation directly from the estimated ratings

 

* Matrix Factorization Methods

  - Singular Value Decomposition (SVD)

  - Gradient Descent

  - Alternating Least Squares
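The Gradient Descent method listed above can be sketched as follows. This version uses stochastic gradient descent with L2 regularization, a common variant; the notes do not say which variant the course uses. The toy matrix, the hyperparameters, and the rule that 0 means "not rated" are all assumptions for illustration. Unlike plain SVD, this approach fits only the observed ratings.

```python
import numpy as np

def sgdFactorize(R, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Approximate R ~= P @ Q.T by SGD over the observed (nonzero) entries only."""
    rng = np.random.default_rng(seed)
    nUsers, nItems = R.shape
    P = 0.1 * rng.standard_normal((nUsers, k))   # user factor vectors p_u
    Q = 0.1 * rng.standard_normal((nItems, k))   # item factor vectors q_i
    users, items = np.nonzero(R)                 # assumption: 0 means "not rated"
    for _ in range(epochs):
        for n in rng.permutation(len(users)):    # visit observed entries in random order
            u, i = users[n], items[n]
            err = R[u, i] - P[u] @ Q[i]          # error of the current estimate
            pu = P[u].copy()                     # old p_u, needed for the q_i update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Hypothetical 5 users x 4 items rating matrix (0 = unrated)
toyR = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

P, Q = sgdFactorize(toyR)
pred = P @ Q.T                                   # estimates for every cell, incl. unrated ones
mask = toyR > 0
trainRmse = float(np.sqrt(np.mean((toyR[mask] - pred[mask]) ** 2)))
print(f"training RMSE on observed entries: {trainRmse:.3f}")
print(np.round(pred, 2))
```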

Dimensionality reduction (SVD): keep only the K largest singular values
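The sketch below shows the effect of truncating to K on a random toy matrix (the data is invented). NumPy returns singular values in descending order, so keeping the first K values keeps the largest ones. For a truncated SVD, the Frobenius reconstruction error equals the square root of the sum of the squared singular values that were dropped. The error therefore shrinks as K grows.

```python
import numpy as np

def rankKApprox(U, s, Vt, k):
    """Rank-k approximation: keep the k largest singular values and their vectors."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(42)
toyR = rng.integers(1, 6, size=(6, 5)).astype(float)  # random toy rating matrix

# singular values come back sorted in descending order
toyU, toyS, toyVt = np.linalg.svd(toyR, full_matrices=False)

# Frobenius-norm reconstruction error for each K
errors = [np.linalg.norm(toyR - rankKApprox(toyU, toyS, toyVt, k))
          for k in range(1, len(toyS) + 1)]
for k, err in enumerate(errors, start=1):
    print(f"K={k}: error {err:.4f}")
```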

 


Hands-on Practice
Read Data: movies and ratings
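import numpy as np
import numpy.linalg as LA
import pandas as pd
from scipy.linalg import sqrtm
from scipy.sparse import coo_matrix

movies = pd.read_csv('movielens/movies_w_imgurl.csv')
ratings = pd.read_csv('ratings-9_1.csv')

# each rating row is tagged as 'train' or 'test' in the 'type' column
train = ratings[ratings['type'] == 'train'][['userId', 'movieId', 'rating']]
test = ratings[ratings['type'] == 'test'][['userId', 'movieId', 'rating']]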

 

Convert Ratings to User-Item Sparse Matrix - build the rating matrix

- Create Index to Id Maps

movieIds = train.movieId.unique()

# map each movieId to a column index and back
movieIdToIndex = {movieId: idx for idx, movieId in enumerate(movieIds)}
indexToMovieId = {idx: movieId for idx, movieId in enumerate(movieIds)}
colIdx = len(movieIds)  # number of columns (movies)

userIds = train.userId.unique()

# map each userId to a row index and back
userIdToIndex = {userId: idx for idx, userId in enumerate(userIds)}
indexToUserId = {idx: userId for idx, userId in enumerate(userIds)}
rowIdx = len(userIds)  # number of rows (users)
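An alternative way to build the same mappings is pd.factorize, which assigns codes 0, 1, 2, ... in order of first appearance, the same order that .unique() uses. The sketch uses a small hypothetical DataFrame in place of the real `train` data. The resulting code arrays could serve directly as the `rows` and `cols` lists for coo_matrix.

```python
import pandas as pd

# Hypothetical stand-in for the `train` DataFrame
toyTrain = pd.DataFrame({
    "userId":  [10, 10, 20, 30, 30],
    "movieId": [1, 50, 1, 50, 7],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# codes: row/column index for every rating; uniques: index -> original id
rowCodes, uniqueUsers = pd.factorize(toyTrain["userId"])
colCodes, uniqueMovies = pd.factorize(toyTrain["movieId"])

toyUserIdToIndex = {uid: idx for idx, uid in enumerate(uniqueUsers)}
toyIndexToMovieId = dict(enumerate(uniqueMovies))
print(rowCodes, colCodes)
```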

 

- Create User-Item Sparse Matrix

rows = []
cols = []
vals = []

for row in train.itertuples():
    rows.append(userIdToIndex[row.userId])
    cols.append(movieIdToIndex[row.movieId])
    vals.append(row.rating)

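# rows = users, columns = movies; unrated (user, movie) pairs are implicit zeros
coomat = coo_matrix((vals, (rows, cols)), shape=(rowIdx, colIdx))

matrix = coomat.todense()  # dense np.matrix copy, used here only to inspect the shape
matrix.shape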
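Here is a small-scale version of the same construction on hypothetical (row, col, rating) triples. One point worth noting: in the dense array, every unrated cell becomes 0. Plain SVD on this array then treats "not rated" as an actual rating of 0.

```python
from scipy.sparse import coo_matrix

# Hypothetical (row, col, rating) triples: 3 users x 4 movies
toyRows = [0, 0, 1, 2, 2]
toyCols = [0, 3, 1, 0, 2]
toyVals = [4.0, 2.5, 5.0, 3.0, 1.0]

toyCoo = coo_matrix((toyVals, (toyRows, toyCols)), shape=(3, 4))
toyDense = toyCoo.toarray()   # ndarray; cells without a rating are 0
print(toyDense)
print("stored ratings:", toyCoo.nnz, "of", toyDense.size, "cells")
```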

 

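Singular Value Decomposition
# NumPy returns the right factor already transposed (Vh), so each row of `V` is a right singular vector
U, s, V = LA.svd(coomat.toarray(), full_matrices = False)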

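- Define User and Item Feature Matrices

dim = 100  # number of latent factors K to keep
sqrtS = sqrtm(np.diag(s[0:dim]))  # square root of the top-K singular value matrix

# keep the first `dim` singular vectors: users get U_k S_k^(1/2), items get V_k S_k^(1/2)
userFeatures = np.matmul(U[:, :dim], sqrtS)
itemFeatures = np.matmul(V[:dim, :].T, sqrtS.T)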

itemFeatures.shape
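Splitting the square root of S between the two sides means that userFeatures @ itemFeatures.T equals the rank-K SVD approximation U_k S_k V_k^T. So the dot product of a user's row and an item's row is that approximation's estimate of the rating. The toy check below uses a random matrix. It uses np.diag(np.sqrt(...)) in place of sqrtm, which gives the same result for a diagonal matrix with nonnegative entries.

```python
import numpy as np

rng = np.random.default_rng(0)
toyR = rng.integers(0, 6, size=(8, 6)).astype(float)  # random toy user x item matrix
k = 3

tU, tS, tVt = np.linalg.svd(toyR, full_matrices=False)
sqrtSk = np.diag(np.sqrt(tS[:k]))             # square root of the top-k singular values

toyUserFeatures = tU[:, :k] @ sqrtSk          # users x k
toyItemFeatures = tVt[:k, :].T @ sqrtSk       # items x k

# feature inner products reproduce the rank-k SVD approximation
viaFeatures = toyUserFeatures @ toyItemFeatures.T
viaSvd = tU[:, :k] @ np.diag(tS[:k]) @ tVt[:k, :]
print(np.allclose(viaFeatures, viaSvd))
```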

- Compute Item Similarity Matrix

# cosine similarity: scale each item vector to unit length, then take pairwise dot products
itemNorms = LA.norm(itemFeatures, ord = 2, axis=1)
normalizedItemFeatures = np.divide(itemFeatures.T, itemNorms).T
itemSims = pd.DataFrame(data = np.matmul(normalizedItemFeatures, normalizedItemFeatures.T), index = movieIds, columns=movieIds)
itemSims
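Here is a toy check of the normalize-then-dot-product step, using three invented 2-D item vectors. The result is cosine similarity: the diagonal is 1, the matrix is symmetric, and values range from -1 to 1. Negative values can occur, which matters for the weighted-average prediction later because they can shrink its denominator.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D latent vectors for three movies
vecs = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]])
ids = [101, 202, 303]

unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # scale each row to length 1
toySims = pd.DataFrame(unit @ unit.T, index=ids, columns=ids)
print(toySims.round(2))
```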

- Check Samples

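movieIdx = 6

# top 5 similar movies; position 0 is normally the movie itself (similarity 1.0), so skip it
rels = itemSims.iloc[movieIdx, :].sort_values(ascending=False).iloc[1:6]

# displayMovies is a display helper that is not defined in this post
displayMovies(movies, [indexToMovieId[movieIdx]])
displayMovies(movies, rels.index, rels.values)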
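Skipping position 0 assumes the movie itself ranks first. If another item has an identical vector, a tie could displace it. A slightly more robust variant drops the movie by label instead. The helper name mostSimilar and the 4x4 similarity matrix below are hypothetical.

```python
import pandas as pd

def mostSimilar(itemSims: pd.DataFrame, movieId, n=5):
    """Return the n items most similar to movieId, excluding movieId itself."""
    sims = itemSims.loc[movieId].drop(labels=movieId)  # remove self-similarity by label
    return sims.sort_values(ascending=False).head(n)

# Hypothetical similarity matrix for movies 1-4
ids = [1, 2, 3, 4]
toySims = pd.DataFrame(
    [[1.0, 0.9, 0.1, 0.5],
     [0.9, 1.0, 0.2, 0.4],
     [0.1, 0.2, 1.0, 0.3],
     [0.5, 0.4, 0.3, 1.0]],
    index=ids, columns=ids,
)
top = mostSimilar(toySims, 1, n=2)
print(top)
```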

 

User Rating Prediction
userId = 33
userRatings = train[train['userId'] == userId][['movieId', 'rating']] 
userRatings

- Predict Ratings

# predicted rating of each movie = similarity-weighted average of this user's ratings
recSimSums = itemSims.loc[userRatings['movieId'].values, :].sum().values  # sum of similarities to the rated movies
recWeightedRatingSums = np.matmul(itemSims.loc[userRatings['movieId'].values, :].T.values, userRatings['rating'].values)
recItemRatings = pd.DataFrame(data = np.divide(recWeightedRatingSums, recSimSums), index=itemSims.index)
recItemRatings.columns = ['pred']
recItemRatings
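The prediction above computes, for each movie i, the sum over the user's rated movies j of sim(i, j) * r_j, divided by the sum of sim(i, j). The toy sketch below repeats the same steps on an invented 3x3 similarity matrix and a hypothetical user who rated movies 1 and 3. Even a movie the user already rated gets a blended value rather than its own rating, because the other rated movies also contribute.

```python
import pandas as pd

ids = [1, 2, 3]
toySims = pd.DataFrame(
    [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.4],
     [0.2, 0.4, 1.0]],
    index=ids, columns=ids,
)
# Hypothetical user who rated movies 1 and 3
toyRatings = pd.DataFrame({"movieId": [1, 3], "rating": [5.0, 1.0]})

S = toySims.loc[toyRatings["movieId"].values, :]    # rated movies x all movies
weighted = S.T.values @ toyRatings["rating"].values  # sum_j sim(i, j) * r_j
simSums = S.sum().values                             # sum_j sim(i, j)
toyPred = pd.Series(weighted / simSums, index=toySims.index)
print(toyPred.round(3))
```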

- Compute Errors (MAE, RMSE)

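# evaluate only on test movies that also appear in training (i.e. in recItemRatings); other ids would raise a KeyError
userTestRatings = test[(test['userId'] == userId) & (test['movieId'].isin(recItemRatings.index))]
temp = userTestRatings.join(recItemRatings, on='movieId')
# getMAE / getRMSE are helper functions that are not defined in this post
mae = getMAE(temp['rating'], temp['pred'])
rmse = getRMSE(temp['rating'], temp['pred'])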

print(f"MAE : {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
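The post calls getMAE and getRMSE without showing their definitions. Below is a minimal sketch, assuming they compute the standard mean absolute error and root mean squared error. These are hypothetical stand-ins, not the course's own code. The demo values are invented.

```python
import numpy as np

def getMAE(actual, predicted):
    """Mean absolute error (hypothetical stand-in for the course helper)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

def getRMSE(actual, predicted):
    """Root mean squared error (hypothetical stand-in for the course helper)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

demoMae = getMAE([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
demoRmse = getRMSE([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
print(f"MAE : {demoMae:.4f}")
print(f"RMSE: {demoRmse:.4f}")
```

RMSE squares each error before averaging, so it penalizes a few large misses more heavily than MAE does.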

- Compare Logs and Recommendations

logs = userRatings.sort_values(by='rating', ascending=False).head(20)
recs = recItemRatings.sort_values(by='pred', ascending=False).head(20)

print("logs")
displayMovies(movies, logs['movieId'].values, logs['rating'].values)

print("recs")
displayMovies(movies, recs.index, recs['pred'].values)

Source: The RED: 현실 데이터를 활용한 추천시스템 구현 A to Z ("Implementing Recommender Systems with Real-World Data, A to Z") by 이동주, CTO of 번개장터

Link: https://fastcampus.app/course-detail/205535