User-User Collaborative Filtering

Collaborative Filtering without Normalization

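Compute the user-user correlation matrix. As a check, the correlation between user 1648 and user 5136 is 0.40298, and the correlation between user 918 and user 2824 is -0.31706. Correlations between users lie between -1 and 1.
Find the five nearest neighbors of user 3867 and of user 89. For example, if the target user is 3712, the neighbors are 2824 (correlation: 0.46291), 3867 (correlation: 0.400275), 5062 (correlation: 0.247693), 442 (correlation: 0.22713) and 3853 (correlation: 0.19366).
Using the ratings of these five neighbors, compute predicted ratings for every movie for user 3867 and user 89.
From the predicted ratings, find the top 3 movies for each user.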

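### User-user matrix

First, we import the pandas library and load the rating matrices. user.csv holds the ratings with users as rows and movies as columns, and movie.csv holds the same ratings with movies as rows and users as columns.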

import pandas as pd
user_movie = pd.read_csv('user.csv', delimiter="\t", index_col=0)
user_movie.head()

movie_user = pd.read_csv('movie.csv', delimiter='\t', index_col=0)
movie_user.head()
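The later code indexes corr with string labels such as '1648' but user_movie with integers (int(current_user)). The small sketch below shows why. It uses hypothetical tab-separated data held in memory with io.StringIO; the file contents and the header names user/movie are made up for illustration. With index_col=0, read_csv parses a numeric first column into an integer index, while labels taken from the header row come back as strings.

```python
import io

import pandas as pd

# Hypothetical miniature versions of user.csv and movie.csv (tab-separated).
# Empty fields become NaN, i.e. "not rated".
user_tsv = "user\t11: Star Wars (1977)\t12: Finding Nemo (2003)\n1648\t\t4.0\n5136\t4.5\t5.0\n"
movie_tsv = "movie\t1648\t5136\n11: Star Wars (1977)\t\t4.5\n12: Finding Nemo (2003)\t4.0\t5.0\n"

user_movie = pd.read_csv(io.StringIO(user_tsv), delimiter="\t", index_col=0)
movie_user = pd.read_csv(io.StringIO(movie_tsv), delimiter="\t", index_col=0)

# Numeric first column -> integer index, so user rows are looked up with ints.
print(user_movie.index.dtype)         # int64
print(user_movie.loc[1648].tolist())  # [nan, 4.0]

# Header labels are strings, so user columns are looked up with '1648'.
print(list(movie_user.columns))       # ['1648', '5136']
print(movie_user['1648'].tolist())    # [nan, 4.0]
```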

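With users as the columns (the movie_user layout), we compute the Pearson correlation matrix between users. The pandas corr() method does this in one call. It correlates every pair of columns and skips missing (NaN) values, so each pair of users is compared on the movies both of them rated.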

corr = movie_user.corr('pearson')
corr
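To see what corr() computes, the sketch below uses a hypothetical 5-movie by 3-user matrix. It checks one entry against a hand-computed Pearson coefficient restricted to co-rated movies, then confirms the matrix is symmetric with values in [-1, 1], up to floating-point rounding.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: rows = movies, columns = users (same layout as movie_user).
movie_user = pd.DataFrame(
    {
        "A": [5.0, 4.0, np.nan, 2.0, 1.0],
        "B": [4.0, np.nan, 3.0, 2.0, 2.0],
        "C": [1.0, 2.0, 4.0, np.nan, 5.0],
    },
    index=["m1", "m2", "m3", "m4", "m5"],
)

corr = movie_user.corr("pearson")

# Manual Pearson for users A and B, using only the movies both of them rated.
both = movie_user[["A", "B"]].dropna()
a = both["A"] - both["A"].mean()
b = both["B"] - both["B"].mean()
manual = (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

print(corr.loc["A", "B"], manual)                 # the two values agree
print(np.allclose(corr.values, corr.values.T))    # symmetric matrix
print(bool((corr.abs() <= 1 + 1e-12).all().all()))  # within [-1, 1]
```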

Checking the correlation values

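Indexing the correlation matrix with the two user pairs from the problem statement returns their correlations. Both match the expected values of 0.40298 and -0.31706.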

corr['1648']['5136'], corr['918']['2824']

(0.4029801884569963, -0.3170632437371138)

Finding the five neighbors

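Since the answer for user 3712 is given, we compute that user first as a check.
For each user, we sort the correlations in descending order and take the top five.
(Position 0 is excluded: it holds the user's correlation with themselves, which is 1.)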

top_neigh = pd.DataFrame({c: corr[c].sort_values(ascending=False).iloc[1:6].index.values for c in corr}, index=[1, 2, 3, 4, 5]).T
top_neigh.loc['3712']

1    2824
2    3867
3    5062
4     442
5    3853
Name: 3712, dtype: object
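The slice that starts at position 1 relies on the user's own correlation of 1 sorting into position 0. A slightly more defensive variant is sketched below as a hypothetical helper, top_k_neighbors, on a made-up 4-user correlation matrix. It drops the user's own label before sorting, so the result does not depend on where the self-correlation lands, for example when another user also has a correlation of exactly 1.

```python
import pandas as pd


def top_k_neighbors(corr: pd.DataFrame, user: str, k: int = 5) -> list:
    """Return the k users most correlated with `user`, excluding `user` itself.

    Hypothetical helper; NaN correlations are placed last by sort_values.
    """
    return (
        corr[user]
        .drop(index=user)              # remove the self-correlation explicitly
        .sort_values(ascending=False)  # highest correlation first
        .head(k)
        .index.tolist()
    )


# Hypothetical symmetric correlation matrix for four users.
labels = ["u1", "u2", "u3", "u4"]
corr = pd.DataFrame(
    [
        [1.0, 0.4, -0.2, 0.9],
        [0.4, 1.0, 0.1, 0.3],
        [-0.2, 0.1, 1.0, 0.5],
        [0.9, 0.3, 0.5, 1.0],
    ],
    index=labels,
    columns=labels,
)

print(top_k_neighbors(corr, "u1", k=2))  # ['u4', 'u2']
```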

Below are the top five neighbors of user 3867 and user 89, taken from the same sorted table.

print(top_neigh.loc['3867'])
print(top_neigh.loc['89'])

1    2492
2    3853
3    2486
4    3712
5    2288
Name: 3867, dtype: object
1    4809
2    5136
3     860
4    5062
5    3525
Name: 89, dtype: object

Computing predictions for each user

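Next, we find the three movies with the highest predicted ratings for each user.
The function print_predictions below first looks up the user's neighbors, their correlations with the user, and their ratings. eliminater is a mask matrix that holds 1 where a neighbor rated the movie and 0 where the rating is NaN. Multiplying the correlations by this mask means only neighbors who actually rated a movie contribute to that movie's denominator. Finally, the function computes the predictions. They implement the formula from the previous posts without normalization, i.e. a correlation-weighted average of the neighbors' ratings.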

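import numpy as np

def print_predictions(current_user, n=3):
    # Five nearest neighbors of current_user; user_movie is indexed by integer user IDs
    neighbors = top_neigh.loc[current_user].astype('int64').values
    # Pearson correlation between current_user and each neighbor (corr uses string labels)
    correlations = np.asarray([corr[current_user][str(neighbor)] for neighbor in neighbors])
    # Neighbors' ratings: rows = neighbors, columns = movies
    ratings = user_movie.loc[neighbors]
    # Mask: 1 where the neighbor rated the movie, 0 where the rating is NaN
    eliminater = ratings.T.notna().astype(int)
    # Correlation-weighted average per movie; sum() skips NaN ratings
    predictions = (ratings.T * correlations).sum(axis=1) / (eliminater * correlations).sum(axis=1)
    print(predictions.sort_values(ascending=False).head(n))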
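Reading the code above, the prediction it computes for user u and movie i can be written as follows. This is our reading of the code rather than a restatement of the previous post's notation. N_i(u) is the set of u's five neighbors who rated i, w_{u,v} is the Pearson correlation between u and v, and r_{v,i} is v's rating of i.

```latex
p_{u,i} = \frac{\sum_{v \in N_i(u)} w_{u,v}\, r_{v,i}}{\sum_{v \in N_i(u)} w_{u,v}}
```

The eliminater mask is what restricts the denominator to N_i(u). If no neighbor rated i, the result is 0/0 = NaN, and the movie sorts to the bottom. Many treatments use |w_{u,v}| in the denominator instead. For the users shown here, all five neighbor correlations are positive, so the two versions coincide.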

Below are the top-3 predictions for user 3867 and user 89, computed with the function above.
We also computed the top-3 predictions for users 3712 and 3525.

print_predictions('3867')
print()
print_predictions('89')

1891: Star Wars: Episode V - The Empire Strikes Back (1980)    4.760291
155: The Dark Knight (2008)                                    4.551454
122: The Lord of the Rings: The Return of the King (2003)      4.507637
dtype: float64

238: The Godfather (1972)               4.894124
278: The Shawshank Redemption (1994)    4.882194
807: Seven (a.k.a. Se7en) (1995)        4.774093
dtype: float64

print_predictions('3712')

641: Requiem for a Dream (2000)    5.000000
603: The Matrix (1999)             4.855924
105: Back to the Future (1985)     4.739173
dtype: float64

print_predictions('3525')

38: Eternal Sunshine of the Spotless Mind (2004)    5.0
238: The Godfather (1972)                           5.0
194: Amelie (2001)                                  5.0
dtype: float64
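The three 5.0 predictions for user 3525 follow directly from the weighted average. When every neighbor who rated a movie gave it 5 stars, the result is exactly 5 whatever the weights. The sketch below reproduces this on hypothetical ratings from three neighbors with made-up correlations, using the same masking idea as eliminater. It also suggests that ties at the top are broken by the sort alone, so the three movies listed are not necessarily the only ones predicted at 5.0.

```python
import numpy as np
import pandas as pd

# Hypothetical neighbor ratings: rows = movies, columns = neighbors (like ratings.T).
ratings_t = pd.DataFrame(
    {
        "n1": [5.0, 4.0, np.nan],
        "n2": [np.nan, 2.0, np.nan],
        "n3": [5.0, 5.0, np.nan],
    },
    index=["movie_a", "movie_b", "movie_c"],
)
correlations = np.array([0.5, 0.375, 0.25])  # made-up neighbor weights

mask = ratings_t.notna().astype(int)                 # 1 = rated, 0 = missing
numerator = (ratings_t * correlations).sum(axis=1)   # NaN ratings are skipped
denominator = (mask * correlations).sum(axis=1)      # weights of raters only
predictions = numerator / denominator

print(predictions["movie_a"])  # 5.0: every rater gave 5, so the weights cancel
print(predictions["movie_c"])  # nan: nobody rated it, 0/0
```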

Collaborative filtering with normalization

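The function below adds normalization to the formula above. Each neighbor's ratings are centered on that neighbor's mean rating, and the target user's mean rating is added back. The normalized formula also follows the previous post.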

def print_normalized(current_user, n=3):
    neighbors = top_neigh.loc[current_user].astype('int64').values
    correlations = np.asarray([corr[current_user][str(neighbor)] for neighbor in neighbors])
    ratings = user_movie.loc[neighbors]
    # Mask: 1 where the neighbor rated the movie, 0 where the rating is NaN
    eliminater = ratings.T.notna().astype(int)
    # Center each neighbor's ratings on that neighbor's own mean rating
    offsets = ratings.T - ratings.T.mean()
    # Correlation-weighted average of the offsets
    norm = (offsets * correlations).sum(axis=1) / (eliminater * correlations).sum(axis=1)
    # Add back the target user's mean rating
    curr_ratings_mean = user_movie.loc[int(current_user)].mean()
    predictions = curr_ratings_mean + norm
    print(predictions.sort_values(ascending=False).head(n))
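As we read the code, the normalized prediction for user u and movie i is shown below. N_i(u) is the set of u's neighbors who rated i, w_{u,v} is their correlation with u, and \bar{r}_v and \bar{r}_u are the mean ratings of v and u over the movies each of them rated.

```latex
p_{u,i} = \bar{r}_u + \frac{\sum_{v \in N_i(u)} w_{u,v}\,\left(r_{v,i} - \bar{r}_v\right)}{\sum_{v \in N_i(u)} w_{u,v}}
```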

Using this function as well, we computed the top-3 predictions for user 3867 and user 89, and also for users 3712 and 3525.

print_normalized('3867')
print()
print_normalized('89')

1891: Star Wars: Episode V - The Empire Strikes Back (1980)    5.245509
155: The Dark Knight (2008)                                    4.856770
77: Memento (2000)                                             4.777803
dtype: float64

238: The Godfather (1972)               5.322015
278: The Shawshank Redemption (1994)    5.261424
275: Fargo (1996)                       5.241111
dtype: float64

print_normalized('3712')

641: Requiem for a Dream (2000)    5.900000
603: The Matrix (1999)             5.545567
105: Back to the Future (1985)     5.500585
dtype: float64

print_normalized('3525')

238: The Godfather (1972)                4.759504
424: Schindler's List (1993)             4.663251
134: O Brother Where Art Thou? (2000)    4.585337
dtype: float64
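Several normalized predictions exceed 5, for example 5.9 for user 3712. Nothing in the normalized formula bounds the result. The target user's mean plus a positive weighted offset can pass the top of the rating scale. The sketch below reproduces this with made-up numbers. It then clips predictions to an assumed 0.5 to 5.0 star range with pandas Series.clip, an optional post-processing step that the code above does not perform.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows = movies, columns = neighbors.
ratings_t = pd.DataFrame(
    {"n1": [5.0, 3.0, 2.0], "n2": [5.0, 2.0, np.nan]},
    index=["movie_a", "movie_b", "movie_c"],
)
correlations = np.array([0.5, 0.25])  # made-up neighbor weights
target_user_mean = 4.5                # made-up mean rating of the target user

mask = ratings_t.notna().astype(int)
offsets = ratings_t - ratings_t.mean()  # subtract each neighbor's own mean
norm = (offsets * correlations).sum(axis=1) / (mask * correlations).sum(axis=1)
predictions = target_user_mean + norm

print(predictions["movie_a"])  # above 5: mean plus a positive offset
# Optional, not in the original code: clip to an assumed 0.5-5.0 star scale.
clipped = predictions.clip(lower=0.5, upper=5.0)
print(clipped["movie_a"])      # 5.0
```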

Output of user_movie.head() (rows are users, columns are movies):
	11: Star Wars: Episode IV - A New Hope (1977)	12: Finding Nemo (2003)	13: Forrest Gump (1994)	14: American Beauty (1999)	22: Pirates of the Caribbean: The Curse of the Black Pearl (2003)	24: Kill Bill: Vol. 1 (2003)	38: Eternal Sunshine of the Spotless Mind (2004)	63: Twelve Monkeys (a.k.a. 12 Monkeys) (1995)	77: Memento (2000)	85: Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)	...	8467: Dumb & Dumber (1994)	8587: The Lion King (1994)	9331: Clear and Present Danger (1994)	9741: Unbreakable (2000)	9802: The Rock (1996)	9806: The Incredibles (2004)	10020: Beauty and the Beast (1991)	36657: X-Men (2000)	36658: X2: X-Men United (2003)	36955: True Lies (1994)
1648	NaN	NaN	NaN	NaN	4.0	3.0	NaN	NaN	NaN	NaN	...	NaN	4.0	NaN	NaN	5.0	3.5	3.0	NaN	3.5	NaN
5136	4.5	5.0	5.0	4.0	5.0	5.0	5.0	3.0	NaN	5.0	...	1.0	5.0	NaN	NaN	NaN	5.0	5.0	4.5	4.0	NaN
918	5.0	5.0	4.5	NaN	3.0	NaN	5.0	NaN	5.0	NaN	...	NaN	5.0	NaN	NaN	NaN	3.5	NaN	NaN	NaN	NaN
2824	4.5	NaN	5.0	NaN	4.5	4.0	NaN	NaN	5.0	NaN	...	NaN	3.5	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3867	4.0	4.0	4.5	NaN	4.0	3.0	NaN	NaN	NaN	4.5	...	1.0	4.0	NaN	NaN	NaN	3.0	4.0	4.0	3.5	3.0

Output of movie_user.head() (rows are movies, columns are users):
	1648	5136	918	2824	3867	860	3712	2968	3525	4323	...	3556	5261	2492	5062	2486	4942	2267	4809	3853	2288
11: Star Wars: Episode IV - A New Hope (1977)	NaN	4.5	5.0	4.5	4.0	4.0	NaN	5.0	4.0	5.0	...	4.0	NaN	4.5	4.0	3.5	NaN	NaN	NaN	NaN	NaN
12: Finding Nemo (2003)	NaN	5.0	5.0	NaN	4.0	4.0	4.5	4.5	4.0	5.0	...	4.0	NaN	3.5	4.0	2.0	3.5	NaN	NaN	NaN	3.5
13: Forrest Gump (1994)	NaN	5.0	4.5	5.0	4.5	4.5	NaN	5.0	4.5	5.0	...	4.0	5.0	3.5	4.5	4.5	4.0	3.5	4.5	3.5	3.5
14: American Beauty (1999)	NaN	4.0	NaN	NaN	NaN	NaN	4.5	2.0	3.5	5.0	...	4.0	NaN	3.5	4.5	3.5	4.0	NaN	3.5	NaN	NaN
22: Pirates of the Caribbean: The Curse of the Black Pearl (2003)	4.0	5.0	3.0	4.5	4.0	2.5	NaN	5.0	3.0	4.0	...	3.0	1.5	4.0	4.0	2.5	3.5	NaN	5.0	NaN	3.5

Output of corr (the user-user Pearson correlation matrix):
	1648	5136	918	2824	3867	860	3712	2968	3525	4323	...	3556	5261	2492	5062	2486	4942	2267	4809	3853	2288
1648	1.000000	0.402980	-0.142206	0.517620	0.300200	0.480537	-0.312412	0.383348	0.092775	0.098191	...	-0.191988	0.493008	0.360644	0.551089	0.002544	0.116653	-0.429183	0.394371	-0.304422	0.245048
5136	0.402980	1.000000	0.118979	0.057916	0.341734	0.241377	0.131398	0.206695	0.360056	0.033642	...	0.488607	0.328120	0.422236	0.226635	0.305803	0.037769	0.240728	0.411676	0.189234	0.390067
918	-0.142206	0.118979	1.000000	-0.317063	0.294558	0.468333	0.092037	-0.045854	0.367568	-0.035394	...	0.373226	0.470972	0.069956	-0.054762	0.133812	0.015169	-0.273096	0.082528	0.667168	0.119162
2824	0.517620	0.057916	-0.317063	1.000000	-0.060913	-0.008066	0.462910	0.214760	0.169907	0.119350	...	-0.201275	0.228341	0.238700	0.259660	0.247097	0.149247	-0.361466	0.474974	-0.262073	0.166999
3867	0.300200	0.341734	0.294558	-0.060913	1.000000	0.282497	0.400275	0.264249	0.125193	-0.333602	...	0.174085	0.297977	0.476683	0.293868	0.438992	-0.162818	-0.295966	0.054518	0.464110	0.379856
860	0.480537	0.241377	0.468333	-0.008066	0.282497	1.000000	0.171151	0.072927	0.387133	0.146158	...	0.347470	0.399436	0.207314	0.311363	0.276306	0.079698	0.212991	0.165608	0.162314	0.279677
3712	-0.312412	0.131398	0.092037	0.462910	0.400275	0.171151	1.000000	0.065015	0.095623	-0.292501	...	0.016406	-0.240764	-0.115254	0.247693	0.166913	0.146011	0.009685	-0.451625	0.193660	0.113266
2968	0.383348	0.206695	-0.045854	0.214760	0.264249	0.072927	0.065015	1.000000	0.028529	-0.073252	...	0.049132	-0.009041	0.203613	0.033301	0.137982	0.070602	0.109452	-0.083562	-0.089317	0.229219
3525	0.092775	0.360056	0.367568	0.169907	0.125193	0.387133	0.095623	0.028529	1.000000	0.210879	...	0.475711	0.306957	0.136343	0.301750	0.143414	0.056100	0.179908	0.284648	0.170757	0.193131
4323	0.098191	0.033642	-0.035394	0.119350	-0.333602	0.146158	-0.292501	-0.073252	0.210879	1.000000	...	-0.040606	0.155045	-0.204164	0.263654	0.167198	-0.084592	0.315712	0.085673	-0.109892	-0.279385
3617	-0.041734	0.138548	0.011316	0.282756	-0.066576	0.219929	-0.038900	0.312573	0.243283	0.022907	...	0.079571	-0.165628	0.053306	0.007810	-0.244637	-0.030709	-0.070660	0.268595	-0.143503	0.013284
4360	0.264425	0.152948	-0.231660	-0.005326	-0.093801	-0.005316	-0.364324	0.053024	-0.086061	0.252529	...	0.072993	0.161882	-0.000311	-0.077598	0.039389	-0.156091	0.408592	0.179652	0.280402	0.040328
2756	0.261268	0.148882	0.148431	-0.087747	0.310104	0.323499	0.126899	0.143347	0.058365	-0.221789	...	0.101784	-0.140953	0.150476	0.024572	-0.031130	-0.133768	0.142067	0.015140	0.181210	-0.005935
89	0.464610	0.562449	0.267029	0.241567	-0.003878	0.539066	-0.051320	-0.118085	0.475495	0.258866	...	0.326774	0.291476	0.372676	0.525990	0.123380	0.178088	0.088600	0.668516	0.179680	0.155869
442	0.022308	0.414438	0.304139	0.116532	0.113581	0.181276	0.227130	0.100841	0.201734	-0.024337	...	0.251660	0.046822	0.218575	0.150431	0.280392	0.038378	0.262520	0.064179	-0.023439	0.257864
3556	-0.191988	0.488607	0.373226	-0.201275	0.174085	0.347470	0.016406	0.049132	0.475711	-0.040606	...	1.000000	0.086665	0.158739	-0.016164	0.256537	-0.055137	0.503247	0.100277	0.423225	0.222458
5261	0.493008	0.328120	0.470972	0.228341	0.297977	0.399436	-0.240764	-0.009041	0.306957	0.155045	...	0.086665	1.000000	0.149165	0.372177	0.198086	0.270928	-0.393376	0.455274	0.039050	0.374264
2492	0.360644	0.422236	0.069956	0.238700	0.476683	0.207314	-0.115254	0.203613	0.136343	-0.204164	...	0.158739	0.149165	1.000000	0.276883	0.158002	0.035825	-0.345495	0.449025	0.289410	0.169239
5062	0.551089	0.226635	-0.054762	0.259660	0.293868	0.311363	0.247693	0.033301	0.301750	0.263654	...	-0.016164	0.372177	0.276883	1.000000	0.403809	0.028521	0.107821	0.428055	0.407044	0.278868
2486	0.002544	0.305803	0.133812	0.247097	0.438992	0.276306	0.166913	0.137982	0.143414	0.167198	...	0.256537	0.198086	0.158002	0.403809	1.000000	-0.068421	0.173797	0.105761	0.472361	0.257462
4942	0.116653	0.037769	0.015169	0.149247	-0.162818	0.079698	0.146011	0.070602	0.056100	-0.084592	...	-0.055137	0.270928	0.035825	0.028521	-0.068421	1.000000	-0.346386	-0.004638	0.143672	0.074476
2267	-0.429183	0.240728	-0.273096	-0.361466	-0.295966	0.212991	0.009685	0.109452	0.179908	0.315712	...	0.503247	-0.393376	-0.345495	0.107821	0.173797	-0.346386	1.000000	-0.339845	0.165960	0.156341
4809	0.394371	0.411676	0.082528	0.474974	0.054518	0.165608	-0.451625	-0.083562	0.284648	0.085673	...	0.100277	0.455274	0.449025	0.428055	0.105761	-0.004638	-0.339845	1.000000	0.542192	0.435520
3853	-0.304422	0.189234	0.667168	-0.262073	0.464110	0.162314	0.193660	-0.089317	0.170757	-0.109892	...	0.423225	0.039050	0.289410	0.407044	0.472361	0.143672	0.165960	0.542192	1.000000	0.080403
2288	0.245048	0.390067	0.119162	0.166999	0.379856	0.279677	0.113266	0.229219	0.193131	-0.279385	...	0.222458	0.374264	0.169239	0.278868	0.257462	0.074476	0.156341	0.435520	0.080403	1.000000


금융덕후

Recommender Systems 16 - User-User Collaborative Filtering Code Example

Recommender Systems
