04-3. Rating Prediction Based on a Linear Regression Model
눈떠보니 월요일
2022. 6. 22. 10:23
"Concept of CBF (content-based filtering) movie rating prediction: a user's rating of an item is determined by the item's attributes."
Linear Regression Model
Model Parameters : w (weights), b (bias)
Model Training
: Search for the parameters that minimize the prediction error on the training data
Minimize the Loss Function
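The post does not spell out the model or the loss. A common formulation, assumed here, is a linear prediction over the feature vector x_i of item i (for example, genre indicators) with a squared-error loss over the n items the user has rated, where r_i is the observed rating:

```latex
\hat{r}_i = w^\top x_i + b, \qquad
L(w, b) = \sum_{i=1}^{n} \left( r_i - \hat{r}_i \right)^2
```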
Regularization
: Minimizing the error alone can lead to overfitting, so a regularization term is added to the Cost Function to prevent it
-> The Cost Function is set up so that model complexity stays low
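To make the effect of a regularization term concrete, here is a minimal sketch that compares plain least squares with an L2-regularized (ridge) fit in scikit-learn. The toy data and the choice of L2 with alpha=1.0 are assumptions for illustration; the post does not say which regularizer it has in mind.

```python
import numpy as np
from sklearn import linear_model as lm

# Hypothetical toy data: 6 items, 3 binary genre features, one rating each.
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
], dtype=float)
y = np.array([4.5, 4.0, 2.0, 2.5, 3.0, 5.0])

plain = lm.LinearRegression().fit(X, y)
ridge = lm.Ridge(alpha=1.0).fit(X, y)  # alpha sets the regularization strength

# The L2 penalty shrinks the weight vector toward zero.
plain_norm = np.linalg.norm(plain.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(f"||w|| without regularization: {plain_norm:.4f}")
print(f"||w|| with L2 regularization: {ridge_norm:.4f}")
```

A smaller weight norm means a less complex model, which is the sense of "low complexity" used above.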
Objective Function
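One common form of such an objective combines the prediction error with a penalty on the weights. The version below assumes a squared-error loss and an L2 penalty with strength λ; both are assumptions, since the post does not state which ones it uses. Here r_i is the observed rating and x_i the feature vector of item i:

```latex
J(w, b) = \sum_{i=1}^{n} \left( r_i - (w^\top x_i + b) \right)^2 + \lambda \lVert w \rVert_2^2
```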
Optimizer
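The post names no specific optimizer. As one possibility, the sketch below uses batch gradient descent on a mean-squared-error loss with an L2 penalty. The functions fit_gd and objective, the toy data, and the hyperparameters lam, lr, and n_iter are all hypothetical choices for illustration.

```python
import numpy as np

def objective(X, y, w, b, lam=0.1):
    """Mean squared error plus an L2 penalty on the weights."""
    return np.mean((X @ w + b - y) ** 2) + lam * np.sum(w ** 2)

def fit_gd(X, y, lam=0.1, lr=0.05, n_iter=2000):
    """Minimize objective() with batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        err = X @ w + b - y                       # prediction error per sample
        grad_w = 2 * X.T @ err / n + 2 * lam * w  # gradient w.r.t. the weights
        grad_b = 2 * err.mean()                   # gradient w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: 6 items, 3 binary genre features.
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
], dtype=float)
y = np.array([4.5, 4.0, 2.0, 2.5, 3.0, 5.0])

w, b = fit_gd(X, y)
start = objective(X, y, np.zeros(3), 0.0)
end = objective(X, y, w, b)
print(f"objective before: {start:.4f}, after: {end:.4f}")
```

scikit-learn's LinearRegression does not iterate like this; it solves the least-squares problem directly, so this loop only illustrates what an optimizer does.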
Hands-on Practice
Movie Feature Matrix
import pandas as pd

movies = pd.read_csv('movielens/movies_w_imgurl.csv')
# Split the '|'-separated genres into one row per (movie, genre), indexed by movieId
# so that the join on 'movieId' below matches movie IDs rather than row positions.
movieGenres = pd.DataFrame(data=movies.set_index('movieId')['genres'].str.split('|').apply(pd.Series).stack(), columns=['genre'])
movieGenres.index = movieGenres.index.droplevel(1)
genres = movieGenres.groupby('genre').count()
# Build one 0/1 indicator column per genre.
movieWeights = pd.DataFrame(data=movies['movieId'])
for genre in genres.index:
    df = pd.DataFrame(data=movieGenres[movieGenres['genre'] == genre], columns=[genre])
    df[genre] = 1
    movieWeights = movieWeights.join(df, on='movieId')
movieWeights.fillna(0, inplace=True)
movieWeights
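As a cross-check of what the feature matrix should look like, here is a more compact sketch on hypothetical toy data using pandas' Series.str.get_dummies. The names toy_movies, genre_flags, and toy_weights are illustrative and not from the original post; only the movieId and genres column names are taken from it.

```python
import pandas as pd

# Hypothetical toy data mimicking the movieId / genres columns.
toy_movies = pd.DataFrame({
    'movieId': [1, 5, 10],
    'genres': ['Action|Comedy', 'Drama', 'Comedy|Drama'],
})

# One 0/1 column per genre, split on the '|' separator.
genre_flags = toy_movies['genres'].str.get_dummies(sep='|')

# Put movieId first, followed by the genre indicator columns.
toy_weights = pd.concat([toy_movies[['movieId']], genre_flags], axis=1)
print(toy_weights)
```

Each row is a movie and each genre column is 1 if the movie belongs to that genre, 0 otherwise.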
Make Regression Model for Users
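ratings = pd.read_csv('ratings-9_1.csv')
# Split the ratings into train and test sets using the 'type' column.
train = ratings[ratings['type'] == 'train'][['userId', 'movieId', 'rating']]
test = ratings[ratings['type'] == 'test'][['userId', 'movieId', 'rating']]
# Build the training data for a single user.
userId = 33
userRatings = train[train['userId'] == userId][['movieId', 'rating']]
userRatings = userRatings.sort_values(by='movieId')
# Genre features of the movies this user rated, in the same movieId order as userRatings.
userLRTrain = movieWeights[movieWeights['movieId'].isin(userRatings['movieId'].values)].sort_values(by=['movieId'])
X = userLRTrain.iloc[:, 1:].values  # genre indicators (movieId column dropped)
Y = userRatings['rating'].values    # observed ratings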
- Linear Regression
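from sklearn import linear_model as lm
reg = lm.LinearRegression()
reg.fit(X, Y)
print(reg.coef_)       # w: one weight per genre
print(reg.intercept_)  # b: bias
# Sort both the test ratings and the feature rows by movieId so that each prediction lines up with its rating.
userTestRatings = test[test['userId'] == userId].sort_values(by='movieId').copy()
userLRTest = movieWeights[movieWeights['movieId'].isin(userTestRatings['movieId'].values)].sort_values(by='movieId')
pred = reg.predict(userLRTest.iloc[:, 1:].values)
userTestRatings['pred'] = pd.Series(data=pred, index=userTestRatings.index)
# getMAE and getRMSE are helper functions that are not defined in this section.
mae = getMAE(userTestRatings['rating'], userTestRatings['pred'])
rmse = getRMSE(userTestRatings['rating'], userTestRatings['pred'])
print(f"MAE : {mae:.4f}")
print(f"RMSE: {rmse:.4f}")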
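getMAE and getRMSE are called above but are not defined in this section. A minimal sketch of what they might look like, assuming they compute the standard mean absolute error and root mean squared error; the original author's implementations may differ. They would need to be defined before the evaluation lines run.

```python
import numpy as np

def getMAE(actual, pred):
    """Mean absolute error (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.mean(np.abs(actual - pred))

def getRMSE(actual, pred):
    """Root mean squared error (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((actual - pred) ** 2))

# Small check on made-up ratings and predictions.
print(getMAE([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
print(getRMSE([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a few predictions are far off.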