
04-3. Rating Prediction Based on a Linear Regression Model

눈떠보니 월요일 2022. 6. 22. 10:23

"Concept of CBF (content-based filtering) movie rating prediction: a user's rating of an item is determined by the item's attributes."

Linear Regression Model

Model parameters: w (weights), b (bias)

Model Training

: Search for the parameters that minimize the prediction error on the training data

Minimize the loss function
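
In the standard formulation, the model and its loss can be written as below. This is a sketch rather than the author's exact notation: the symbols x_i (attribute vector of item i), y_i (the user's rating of item i), N (number of training ratings), and the choice of mean squared error as the loss are assumptions.

```latex
\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b
\qquad
L(\mathbf{w}, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```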

 

Regularization

: Minimizing the error alone can lead to overfitting, so a regularization term is added to the cost function to prevent it

-> Set up the cost function so that model complexity stays low

 

Objective Function
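
One common way to write such an objective is the prediction error plus a penalty on the size of the weights. The form below is a sketch that assumes a mean-squared-error loss and an L2 (ridge) penalty with strength λ; both are assumptions, and other penalties such as L1 follow the same pattern. Here x_i is the attribute vector of item i, y_i the observed rating, and N the number of training ratings.

```latex
J(\mathbf{w}, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (\mathbf{w}^\top \mathbf{x}_i + b) \right)^2 + \lambda \lVert \mathbf{w} \rVert_2^2
```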

Optimizer
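
An optimizer is the procedure that searches for the w and b minimizing the objective. As a minimal sketch, assuming batch gradient descent on mean squared error with an optional L2 penalty (the function name `fit_gd`, the learning rate, and the toy data are hypothetical and not from the original post):

```python
import numpy as np

def fit_gd(X, y, lam=0.0, lr=0.1, n_iter=2000):
    """Minimize mean squared error + lam * ||w||^2 by batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        err = X @ w + b - y                          # prediction error per sample
        grad_w = 2.0 / n * (X.T @ err) + 2.0 * lam * w
        grad_b = 2.0 / n * err.sum()                 # the bias is not penalized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from known parameters: w = [1.0, -0.5], b = 3.0
X_toy = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
y_toy = X_toy @ np.array([1.0, -0.5]) + 3.0

w_hat, b_hat = fit_gd(X_toy, y_toy)
print(w_hat, b_hat)
```

With `lam=0.0` the toy fit should recover the generating parameters closely; raising `lam` pulls the weights toward zero at the cost of a larger training error.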


Practice
Movie Feature Matrix
import pandas as pd

movies = pd.read_csv('movielens/movies_w_imgurl.csv')

# One-hot encode the '|'-separated genre string: one 0/1 column per genre.
# get_dummies keeps the row order of `movies`, so every row stays aligned with its movieId
# (joining a stacked genre table on its positional row index would not match movieId values).
genreFlags = movies['genres'].str.get_dummies(sep='|')

# Movie feature matrix: movieId followed by the genre indicator columns
movieWeights = pd.concat([movies[['movieId']], genreFlags], axis=1)

movieWeights
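
To see what the feature matrix looks like, here is a minimal sketch on a made-up three-movie table (the IDs and genre strings are hypothetical). It uses pandas' `Series.str.get_dummies` to produce one 0/1 column per genre, keeping each row next to its own `movieId` even when the IDs are not contiguous.

```python
import pandas as pd

# Hypothetical toy catalogue with non-contiguous movie IDs
toy_movies = pd.DataFrame({
    'movieId': [10, 20, 30],
    'genres': ['Action|Comedy', 'Comedy', 'Drama|Action'],
})

# One indicator column per genre, in the same row order as toy_movies
toy_flags = toy_movies['genres'].str.get_dummies(sep='|')
toy_weights = pd.concat([toy_movies[['movieId']], toy_flags], axis=1)

print(toy_weights)
```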

 

Make Regression Model for Users
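ratings = pd.read_csv('ratings-9_1.csv')

# Split into train/test using the 'type' column
train = ratings[ratings['type'] == 'train'][['userId', 'movieId', 'rating']]
test = ratings[ratings['type'] == 'test'][['userId', 'movieId', 'rating']]

userId = 33

# Training ratings of the target user, sorted by movieId
userRatings = train[train['userId'] == userId][['movieId', 'rating']]
userRatings = userRatings.sort_values(by='movieId')

# Genre features of the rated movies, sorted the same way so the rows of X line up with Y
userLRTrain = movieWeights[movieWeights['movieId'].isin(userRatings['movieId'].values)].sort_values(by=['movieId'])

X = userLRTrain.iloc[:, 1:].values  # genre indicators (movieId column dropped)
Y = userRatings['rating'].values    # observed ratings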

 

- Linear Regression

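from sklearn import linear_model as lm

# Ordinary least squares fit of the user's ratings on the genre features
reg = lm.LinearRegression()
reg.fit(X, Y)

print(reg.coef_)       # w: one weight per genre
print(reg.intercept_)  # b: bias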
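
The fit above is ordinary least squares with no penalty term. As a sketch of how the regularization idea could be tried, scikit-learn's `Ridge` adds an L2 penalty whose strength is set by `alpha`. The synthetic data and the value `alpha=10.0` below are assumptions for illustration and are not from the original post.

```python
import numpy as np
from sklearn import linear_model as lm

rng = np.random.default_rng(0)

# Synthetic 0/1 "genre" features and ratings generated from known weights plus noise
X_demo = rng.integers(0, 2, size=(30, 5)).astype(float)
y_demo = X_demo @ np.array([1.0, -0.5, 0.0, 0.8, 0.3]) + 3.0 + rng.normal(0, 0.1, 30)

ols = lm.LinearRegression().fit(X_demo, y_demo)
ridge = lm.Ridge(alpha=10.0).fit(X_demo, y_demo)

# The L2 penalty shrinks the weight vector toward zero
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

A larger `alpha` gives a simpler model (smaller weights); in practice its value would be chosen on held-out data.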

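# Test ratings of the same user, sorted by movieId so they line up with the feature rows below
userTestRatings = test[test['userId'] == userId].sort_values(by='movieId').copy()

# Genre features of the test movies, in the same movieId order
userLRTest = movieWeights[movieWeights['movieId'].isin(userTestRatings['movieId'].values)].sort_values(by='movieId')
pred = reg.predict(userLRTest.iloc[:, 1:].values)

userTestRatings['pred'] = pd.Series(data=pred, index=userTestRatings.index)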

# getMAE / getRMSE are helper functions expected to be defined before this cell
mae = getMAE(userTestRatings['rating'], userTestRatings['pred'])
rmse = getRMSE(userTestRatings['rating'], userTestRatings['pred'])

print(f"MAE : {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
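
`getMAE` and `getRMSE` are called above but their definitions do not appear in this post. A minimal sketch, assuming they compute the standard mean absolute error and root mean squared error over paired actual and predicted ratings (the signatures are inferred from the calls, not confirmed):

```python
import numpy as np

def getMAE(actual, pred):
    """Mean absolute error between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.mean(np.abs(actual - pred))

def getRMSE(actual, pred):
    """Root mean squared error between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((actual - pred) ** 2))

# Small made-up example: errors are 0.5, 0.0 and 1.0
print(getMAE([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
print(getRMSE([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of how uneven the prediction errors are.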