Building a Book Recommendation System with Mahout
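Preface
This article is another case study in implementing a recommendation system with Mahout, this time for books. The approach is similar to that of the two previous articles; the focus here is on how to make use of the attributes of the books. The data comes from the Amazon website and was collected by a web crawler.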
Contents
- Project background
- Requirements analysis
- Data description
- Algorithm model
- Program development
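1. Project background
Amazon is one of the earliest e-commerce websites. It started out selling books online and eventually grew into a general e-commerce platform covering audio and video, consumer electronics, games, household goods, and more. Amazon's recommendation system was among the earliest product recommenders on the internet, and it brings Amazon a large share of its traffic as well as considerable sales profit.
Today a recommendation system is standard equipment for an e-commerce site; without one, a company can hardly claim to be in e-commerce.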
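2. Requirements analysis
If recommendation systems are this important, how should we understand them?
Open the Amazon page for the book Mahout in Action:
http://www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684/ref=pd_sim_b_1?ie=UTF8&refRID=0H4H2NSSR8F34R76E2TP
Elements on the page:
- Ad slot: where advertisers place their ads. The site can earn money from online advertising, and this is usually the best position on the page.
- Average rating: the average of users' ratings of the book
- Association rules: a recommendation slot driven by association rules
- Collaborative filtering: a recommendation slot driven by an item-based collaborative filtering algorithm
- Book attributes: page count, publisher, ISBN, language, and so on
- About the author: an introduction to the author and the author's other works
- User ratings: users' rating behavior
- User reviews: the content of users' reviews
Further down the page there are other recommendation slots.
Taken together, the two screenshots of this page make the importance of recommendation to Amazon easy to see. Apart from the most prominent ad slot, which goes to advertisers who bring in direct revenue, the page has four recommendation slots, each guessing what the user might like from a different angle and with a different recommendation algorithm.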
3. Data description
Two data files:
- rating.csv: user rating data
- user.csv: user attribute data
1). rating.csv
- 3 columns: user ID, book ID, the user's rating of the book
- Records: 4,000 book ratings
- Users: 200
- Books: 1,000
- Rating scale: 1-10
Sample data
1,565,3
1,807,2
1,201,1
1,557,9
1,987,10
1,59,5
1,305,6
1,153,3
1,139,7
1,875,5
1,722,10
2,977,4
2,806,3
2,654,8
2,21,8
2,662,5
2,437,6
2,576,3
2,141,8
2,311,4
2,101,3
2,540,9
2,87,3
2,65,8
2,501,6
2,710,5
2,331,9
2,542,4
2,757,9
2,590,7
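As a quick illustration of the file layout, here is a minimal plain-Java sketch that parses lines of this form. The `Rating` and `RatingParser` names are made up for this example; they are not part of Mahout or of the project code, which loads the file through `RecommendFactory.buildDataModel`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical value class for one line of rating.csv (not a Mahout type).
class Rating {
    final long userId;
    final long bookId;
    final int score;

    Rating(long userId, long bookId, int score) {
        this.userId = userId;
        this.bookId = bookId;
        this.score = score;
    }
}

class RatingParser {

    // Parses one "userID,bookID,score" line and checks the 1-10 rating scale.
    static Rating parseLine(String line) {
        String[] cols = line.trim().split(",");
        if (cols.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + line);
        }
        long userId = Long.parseLong(cols[0].trim());
        long bookId = Long.parseLong(cols[1].trim());
        int score = Integer.parseInt(cols[2].trim());
        if (score < 1 || score > 10) {
            throw new IllegalArgumentException("Score out of range 1-10: " + line);
        }
        return new Rating(userId, bookId, score);
    }

    // Parses many lines, skipping blank ones.
    static List<Rating> parseAll(List<String> lines) {
        List<Rating> ratings = new ArrayList<Rating>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                ratings.add(parseLine(line));
            }
        }
        return ratings;
    }

    public static void main(String[] args) {
        // The first three sample lines from the data above.
        List<Rating> ratings = parseAll(Arrays.asList("1,565,3", "1,807,2", "1,201,1"));
        for (Rating r : ratings) {
            System.out.println("user " + r.userId + " rated book " + r.bookId + ": " + r.score);
        }
    }
}
```

Rejecting out-of-range scores early is a design choice of this sketch; it simply makes a malformed crawl record fail loudly instead of skewing the similarity calculations later.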
2). user.csv
- 3 columns: user ID, gender, age
- Users: 200
- Gender: M for male, F for female
- Age: between 11 and 80
Sample data
1,M,40
2,M,27
3,M,41
4,F,43
5,F,16
6,M,36
7,F,36
8,F,46
9,M,50
10,M,21
11,F,11
12,M,42
13,F,40
14,F,28
15,M,25
16,M,68
17,M,53
18,F,69
19,F,48
20,F,56
21,F,36
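4. Algorithm model
This article mainly covers Mahout's item-based collaborative filtering model; other algorithm models are not explained here.
I will test seven algorithm combinations on the data above. For a detailed explanation of Mahout's algorithm combinations, see the article "Analyzing the Mahout Recommendation Engine from the Source Code".
The seven algorithm combinations: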
- userCF1: EuclideanDistanceSimilarity + NearestNUserNeighborhood + GenericUserBasedRecommender
- userCF2: LogLikelihoodSimilarity + NearestNUserNeighborhood + GenericUserBasedRecommender
- userCF3: EuclideanDistanceSimilarity + NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
- itemCF1: EuclideanDistanceSimilarity + GenericItemBasedRecommender
- itemCF2: LogLikelihoodSimilarity + GenericItemBasedRecommender
- itemCF3: EuclideanDistanceSimilarity + GenericBooleanPrefItemBasedRecommender
- slopeOne: SlopeOneRecommender
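To make the item-based combinations a little more concrete, here is a rough plain-Java sketch of the general idea: measure how similar two books are from the users who rated both, then predict a user's rating as a similarity-weighted average of that user's other ratings. This is a textbook-style illustration, not Mahout's implementation; the `1 / (1 + distance)` formula, the class name, and the toy data are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class ItemCfSketch {

    // Similarity of two books, each given as a map userId -> score.
    // Uses 1 / (1 + Euclidean distance) over the users who rated both books.
    // This textbook formula is an assumption, not necessarily what Mahout computes.
    static double similarity(Map<Long, Integer> a, Map<Long, Integer> b) {
        double sumSquares = 0.0;
        int common = 0;
        for (Map.Entry<Long, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSquares += diff * diff;
                common++;
            }
        }
        return common == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    // Predicts a user's score for targetBook as the similarity-weighted average
    // of that user's scores for other books. byBook maps bookId -> (userId -> score).
    // Returns NaN when no estimate is possible.
    static double predict(long userId, long targetBook, Map<Long, Map<Long, Integer>> byBook) {
        Map<Long, Integer> target = byBook.get(targetBook);
        if (target == null) {
            return Double.NaN;
        }
        double weighted = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Map<Long, Integer>> e : byBook.entrySet()) {
            Integer own = e.getValue().get(userId);
            if (e.getKey().longValue() == targetBook || own == null) {
                continue; // skip the target itself and books the user has not rated
            }
            double sim = similarity(target, e.getValue());
            weighted += sim * own;
            totalSimilarity += sim;
        }
        return totalSimilarity == 0.0 ? Double.NaN : weighted / totalSimilarity;
    }

    // Builds a userId -> score map from alternating (userId, score) values.
    static Map<Long, Integer> scores(int... pairs) {
        Map<Long, Integer> m = new HashMap<Long, Integer>();
        for (int i = 0; i < pairs.length; i += 2) {
            m.put((long) pairs[i], pairs[i + 1]);
        }
        return m;
    }

    // Tiny made-up data set, used only for this illustration.
    static Map<Long, Map<Long, Integer>> sampleData() {
        Map<Long, Map<Long, Integer>> byBook = new HashMap<Long, Map<Long, Integer>>();
        byBook.put(1L, scores(1, 5, 2, 3));
        byBook.put(2L, scores(1, 5, 2, 3, 3, 8));
        byBook.put(3L, scores(1, 1, 3, 2));
        return byBook;
    }

    public static void main(String[] args) {
        // User 3 has not rated book 1; estimate the score from books 2 and 3.
        System.out.println(predict(3L, 1L, sampleData()));
    }
}
```

In the toy data, book 2 is rated identically to book 1 by their common users, so user 3's high score for book 2 dominates the estimate, which comes out at about 7.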
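The algorithms above are then evaluated. For a detailed explanation of algorithm evaluation, see the article "Mahout Recommendation Algorithm API Explained".
- Precision: the fraction of recommended items that are relevant
- Recall: the fraction of relevant items that are recommended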
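The arithmetic behind the two metrics is simple enough to sketch in plain Java. The class below is only an illustration for a single user with made-up IDs; how Mahout's IR evaluator decides which items count as relevant and how it averages over users is not covered here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IrMetrics {

    // Number of recommended items that are also relevant.
    // Assumes the recommended list contains no duplicates.
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long id : recommended) {
            if (relevant.contains(id)) {
                count++;
            }
        }
        return count;
    }

    // Precision = hits / number of recommended items.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall = hits / number of relevant items.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L);
        Set<Long> relevant = new HashSet<Long>(Arrays.asList(2L, 4L, 9L));
        System.out.println("precision = " + precision(recommended, relevant));
        System.out.println("recall = " + recall(recommended, relevant));
    }
}
```

With four recommendations of which two are relevant, and three relevant items overall, precision is 2/4 and recall is 2/3.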
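5. Program development
System architecture: Mahout's recommendation algorithms come in two forms, single-machine and distributed.
- Single-machine: computed in memory on one machine; supports many recommendation algorithms and is simple to deploy and run, but the amount of data it can handle is limited
- Distributed: runs on a Hadoop cluster; supports only a few recommendation algorithms and is more complex to deploy and run, but handles massive data sets
Development environment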
- Win7 64bit
- Java 1.6.0_45
- Maven3
- Eclipse Juno Service Release 2
- Mahout-0.8
- Hadoop-1.1.2
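The development environment uses Mahout 0.8. For setup, see the article "Building a Mahout Project with Maven".
New Java classes:
- BookEvaluator.java: uses the recommender evaluators to pick out the algorithms with the better scores
- BookResult.java: prints a fixed number of results for manual comparison
- BookFilterGenderResult.java: keeps only books from male users' lists
1). BookEvaluator.java: pick out the better-scoring algorithms with the recommender evaluators
Source code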
package org.conan.mymahout.recommendation.book;

import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Note: RecommendFactory is a helper class that is not listed in this article.
public class BookEvaluator {

    static final int NEIGHBORHOOD_NUM = 2;
    static final int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        userEuclidean(dataModel);
        userLoglikelihood(dataModel);
        userEuclideanNoPref(dataModel);
        itemEuclidean(dataModel);
        itemLoglikelihood(dataModel);
        itemEuclideanNoPref(dataModel);
        slopeOne(dataModel);
    }

    public static RecommenderBuilder userEuclidean(DataModel dataModel) throws TasteException, IOException {
        System.out.println("userEuclidean");
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder userLoglikelihood(DataModel dataModel) throws TasteException, IOException {
        System.out.println("userLoglikelihood");
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder userEuclideanNoPref(DataModel dataModel) throws TasteException, IOException {
        System.out.println("userEuclideanNoPref");
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder itemEuclidean(DataModel dataModel) throws TasteException, IOException {
        System.out.println("itemEuclidean");
        ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, true);
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder itemLoglikelihood(DataModel dataModel) throws TasteException, IOException {
        System.out.println("itemLoglikelihood");
        ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
        RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, true);
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder itemEuclideanNoPref(DataModel dataModel) throws TasteException, IOException {
        System.out.println("itemEuclideanNoPref");
        ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder slopeOne(DataModel dataModel) throws TasteException, IOException {
        System.out.println("slopeOne");
        RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }
}
Console output:
userEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.33333325386047363
Recommender IR Evaluator: [Precision:0.3010752688172043,Recall:0.08542713567839195]
userLoglikelihood
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.5245869159698486
Recommender IR Evaluator: [Precision:0.11764705882352945,Recall:0.017587939698492466]
userEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:4.288461538461536
Recommender IR Evaluator: [Precision:0.09045226130653267,Recall:0.09296482412060306]
itemEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.408880928305655
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]
itemLoglikelihood
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.448554412835434
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]
itemEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.5665197873957957
Recommender IR Evaluator: [Precision:0.6005025125628134,Recall:0.6055276381909548]
slopeOne
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.6893078179405814
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]
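Visualization of the evaluator output:
[Figure: average distance of the recommendation results]
[Figure: recommender scores]
Only the itemEuclideanNoPref algorithm evaluates well, with precision and recall both around 0.60; the results of the other algorithms are all rather poor on those two metrics.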
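For reference when reading the scores above: the AVERAGE_ABSOLUTE_DIFFERENCE value is the mean absolute gap between predicted and actual ratings, so a lower value is better, unlike precision and recall. The sketch below shows only that arithmetic on made-up numbers; how `RecommendFactory.evaluate` splits the data into training and test parts is not shown in this article, so the two arrays here are simply assumed to be held-out predictions and their true ratings.

```java
class AverageAbsoluteDifference {

    // Mean of |predicted - actual| over all held-out ratings.
    static double score(double[] predicted, double[] actual) {
        if (predicted.length == 0 || predicted.length != actual.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    public static void main(String[] args) {
        // Made-up predictions and true ratings on the 1-10 scale.
        double[] predicted = {7.0, 3.0, 9.0};
        double[] actual = {8.0, 3.0, 6.0};
        System.out.println(score(predicted, actual));
    }
}
```

In this toy case the errors are 1, 0 and 3, so the score is 4/3: on average the predictions are a little more than one rating point off.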
2). BookResult.java: print a fixed number of results for manual comparison
To get contrasting results, we take four algorithms (userEuclidean, itemEuclidean, userEuclideanNoPref, itemEuclideanNoPref) and compare their recommendations by hand.
Source code
package org.conan.mymahout.recommendation.book;

import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class BookResult {

    static final int NEIGHBORHOOD_NUM = 2;
    static final int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        RecommenderBuilder rb1 = BookEvaluator.userEuclidean(dataModel);
        RecommenderBuilder rb2 = BookEvaluator.itemEuclidean(dataModel);
        RecommenderBuilder rb3 = BookEvaluator.userEuclideanNoPref(dataModel);
        RecommenderBuilder rb4 = BookEvaluator.itemEuclideanNoPref(dataModel);

        // Print the top recommendations of each algorithm for every user.
        LongPrimitiveIterator iter = dataModel.getUserIDs();
        while (iter.hasNext()) {
            long uid = iter.nextLong();
            System.out.print("userEuclidean =>");
            result(uid, rb1, dataModel);
            System.out.print("itemEuclidean =>");
            result(uid, rb2, dataModel);
            System.out.print("userEuclideanNoPref =>");
            result(uid, rb3, dataModel);
            System.out.print("itemEuclideanNoPref =>");
            result(uid, rb4, dataModel);
        }
    }

    public static void result(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException {
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, false);
    }
}
Console output (only part of the results is shown):
...
userEuclidean =>uid:63,
itemEuclidean =>uid:63,(984,9.000000)(690,9.000000)(943,8.875000)
userEuclideanNoPref =>uid:63,(4,1.000000)(723,1.000000)(300,1.000000)
itemEuclideanNoPref =>uid:63,(867,3.791667)(947,3.083333)(28,2.750000)
userEuclidean =>uid:64,
itemEuclidean =>uid:64,(368,8.615385)(714,8.200000)(290,8.142858)
userEuclideanNoPref =>uid:64,(860,1.000000)(490,1.000000)(64,1.000000)
itemEuclideanNoPref =>uid:64,(409,3.950000)(715,3.830627)(901,3.444048)
userEuclidean =>uid:65,(939,7.000000)
itemEuclidean =>uid:65,(550,9.000000)(334,9.000000)(469,9.000000)
userEuclideanNoPref =>uid:65,(939,2.000000)(185,1.000000)(736,1.000000)
itemEuclideanNoPref =>uid:65,(666,4.166667)(96,3.093931)(345,2.958333)
userEuclidean =>uid:66,
itemEuclidean =>uid:66,(971,9.900000)(656,9.600000)(918,9.577709)
userEuclideanNoPref =>uid:66,(6,1.000000)(492,1.000000)(676,1.000000)
itemEuclideanNoPref =>uid:66,(185,3.650000)(533,3.617307)(172,3.500000)
userEuclidean =>uid:67,
itemEuclidean =>uid:67,(663,9.700000)(987,9.625000)(486,9.600000)
userEuclideanNoPref =>uid:67,(732,1.000000)(828,1.000000)(113,1.000000)
itemEuclideanNoPref =>uid:67,(724,3.000000)(279,2.950000)(890,2.750000)
...
Let's look at the recommendations for the user with uid=65.
First, look the user up in the user.csv data set:
> user[65,]
userid gender age
65 65 M 14
User 65 is male and 14 years old.
Taking the result of the itemEuclideanNoPref algorithm, look at the ratings for the book with bookid=666:
> rating[which(rating$bookid==666),]
userid bookid pref
646 44 666 10
1327 89 666 7
2470 165 666 3
2697 179 666 7
Four users rated book 666. Look at the attributes of these four users:
> user[c(44,89,165,179),]
userid gender age
44 44 F 76
89 89 M 40
165 165 F 59
179 179 F 68
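Of these four users, three are female and one is male.
Suppose that men share similar book interests with other men, and women with other women. Since user 65 is male, we next exclude the female raters and keep only the rating records of male raters.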
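The filtering idea can be sketched without Mahout: collect the books rated by at least one male user, then keep only those candidates. The class below is a simplified illustration with made-up names and toy data, not the project code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GenderFilterSketch {

    // Collects the IDs of all books rated by at least one male ("M") user.
    static Set<Long> booksRatedByMales(Map<Long, String> genderByUser, Map<Long, Set<Long>> booksByUser) {
        Set<Long> books = new HashSet<Long>();
        for (Map.Entry<Long, String> e : genderByUser.entrySet()) {
            Set<Long> rated = booksByUser.get(e.getKey());
            if ("M".equals(e.getValue()) && rated != null) {
                books.addAll(rated);
            }
        }
        return books;
    }

    // Keeps only the candidate books that are in the allowed set, preserving order.
    static List<Long> keepAllowed(List<Long> candidates, Set<Long> allowed) {
        List<Long> kept = new ArrayList<Long>();
        for (Long id : candidates) {
            if (allowed.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy data: users 1 and 3 are male, user 2 is female.
        Map<Long, String> gender = new HashMap<Long, String>();
        gender.put(1L, "M");
        gender.put(2L, "F");
        gender.put(3L, "M");
        Map<Long, Set<Long>> rated = new HashMap<Long, Set<Long>>();
        rated.put(1L, new HashSet<Long>(Arrays.asList(10L, 11L)));
        rated.put(2L, new HashSet<Long>(Arrays.asList(11L, 12L)));
        rated.put(3L, new HashSet<Long>(Arrays.asList(13L)));

        Set<Long> allowed = booksRatedByMales(gender, rated);
        // Books 12 and 14 were not rated by any male user, so they are dropped.
        System.out.println(keepAllowed(Arrays.asList(12L, 11L, 13L, 14L), allowed));
    }
}
```

Note that a book rated by both men and women (book 11 in the toy data) stays in the list; only books with no male rater at all are removed.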
3). BookFilterGenderResult.java: keep only books from male users' lists
Source code
package org.conan.mymahout.recommendation.book;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class BookFilterGenderResult {

    static final int NEIGHBORHOOD_NUM = 2;
    static final int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        RecommenderBuilder rb1 = BookEvaluator.userEuclidean(dataModel);
        RecommenderBuilder rb2 = BookEvaluator.itemEuclidean(dataModel);
        RecommenderBuilder rb3 = BookEvaluator.userEuclideanNoPref(dataModel);
        RecommenderBuilder rb4 = BookEvaluator.itemEuclideanNoPref(dataModel);

        long uid = 65;
        System.out.print("userEuclidean =>");
        filterGender(uid, rb1, dataModel);
        System.out.print("itemEuclidean =>");
        filterGender(uid, rb2, dataModel);
        System.out.print("userEuclideanNoPref =>");
        filterGender(uid, rb3, dataModel);
        System.out.print("itemEuclideanNoPref =>");
        filterGender(uid, rb4, dataModel);
    }

    /**
     * Filter the recommendations by user gender.
     */
    public static void filterGender(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException, IOException {
        Set<Long> userids = getMale("datafile/book/user.csv");

        // Collect the books that male users have rated.
        Set<Long> bookids = new HashSet<Long>();
        for (long maleUid : userids) {
            LongPrimitiveIterator iter = dataModel.getItemIDsFromUser(maleUid).iterator();
            while (iter.hasNext()) {
                long bookid = iter.nextLong();
                bookids.add(bookid);
            }
        }

        IDRescorer rescorer = new FilterRescorer(bookids);
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM, rescorer);
        RecommendFactory.showItems(uid, list, false);
    }

    /**
     * Get the IDs of the male users.
     */
    public static Set<Long> getMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        try {
            String s = null;
            while ((s = br.readLine()) != null) {
                String[] cols = s.split(",");
                if (cols[1].equals("M")) { // male user
                    userids.add(Long.parseLong(cols[0]));
                }
            }
        } finally {
            br.close();
        }
        return userids;
    }
}

/**
 * Rescorer that keeps only the books in the given set.
 */
class FilterRescorer implements IDRescorer {

    private final Set<Long> bookids;

    public FilterRescorer(Set<Long> bookids) {
        this.bookids = bookids;
    }

    @Override
    public double rescore(long id, double originalScore) {
        return isFiltered(id) ? Double.NaN : originalScore;
    }

    @Override
    public boolean isFiltered(long id) {
        // Returning true excludes the item, so drop books that no male user has rated.
        return !bookids.contains(id);
    }
}
Console output:
userEuclidean =>uid:65,
itemEuclidean =>uid:65,(784,8.090909)(276,8.000000)(476,7.666667)
userEuclideanNoPref =>uid:65,
itemEuclideanNoPref =>uid:65,(887,2.250000)(356,2.166667)(430,1.866667)
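We find that, because only male users' ratings are kept, the amount of data becomes much smaller, and the user-based collaborative filtering algorithms no longer produce any results. The result sets of the item-based collaborative filtering algorithms have changed as well.
For the itemEuclideanNoPref algorithm, the top-ranked result is the book with ID 887.
Let's trace this further and find out which users rated book 887.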
> rating[which(rating$bookid==887),]
userid bookid pref
1280 85 887 2
1743 119 887 8
2757 184 887 4
2791 186 887 5
Four users rated book 887. Now look at the attributes of these users:
> user[c(85,119,184,186),]
userid gender age
85 85 F 31
119 119 F 49
184 184 M 27
186 186 M 35
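Two are male and two are female. Since our algorithm has already excluded the female ratings, we can infer that the recommendation of book 887 comes from the two male raters.
Compute the intersection of the books rated by user 65 with those rated by user 184 and by user 186: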
rat65<-rating[which(rating$userid==65),]
rat184<-rating[which(rating$userid==184),]
rat186<-rating[which(rating$userid==186),]
> intersect(rat65$bookid ,rat184$bookid)
integer(0)
> intersect(rat65$bookid ,rat186$bookid)
[1] 65 375
It turns out that user 65 and user 186 both rated book 65 and book 375. Next, print user 186's rating records:
> rat186
userid bookid pref
2790 186 65 7
2791 186 887 5
2792 186 529 3
2793 186 375 6
2794 186 566 7
2795 186 169 4
2796 186 907 1
2797 186 821 2
2798 186 720 5
2799 186 642 5
2800 186 137 3
2801 186 744 1
2802 186 896 2
2803 186 156 6
2804 186 392 3
2805 186 386 3
2806 186 901 7
2807 186 69 6
2808 186 845 6
2809 186 998 3
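User 186 also rated book 887, so recommending book 887 to user 65 is reasonable.
Through a practical book recommendation example, we have gained a better understanding of how to build a recommendation system with Mahout.
Please credit the source when reposting: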
http://blog.fens.me/hadoop-mahout-recommend-book/