Building a Book Recommendation System with Mahout

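Preface

This article is another case study in implementing recommender systems with Mahout: building a book recommendation system. The approach is similar to that of the previous two articles; the emphasis here is on how to make use of book attributes. The data comes from the Amazon website and was collected by a web crawler.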

鐩綍

  1. 椤圭洰鑳屾櫙
  2. 闇€姹傚垎鏋?/li>
  3. 鏁版嵁璇存槑
  4. 绠楁硶妯″瀷
  5. 绋嬪簭寮€鍙?/li>

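1. Project Background

Amazon is one of the earliest e-commerce websites. It started out selling books online and eventually grew into a general e-commerce platform covering audio and video, consumer electronics, games, household goods, and more. Amazon's recommender system was the earliest product recommendation system on the Internet; it brought Amazon at least ?0% of its traffic, along with considerable sales profit.

Today a recommender system is standard equipment for an e-commerce site; a site without one can hardly claim to be in e-commerce.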

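2. Requirements Analysis

If recommender systems are this important, how should we understand them?

Open the Amazon page for the book Mahout in Action:
http://www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684/ref=pd_sim_b_1?ie=UTF8&refRID=0H4H2NSSR8F34R76E2TP

Elements on the page:

  • Ad slot: where advertisers place their ads. The site can earn money from online advertising, so this is usually the best position on the page.
  • Average score: the users' ratings of the book
  • Association rules: a recommendation slot driven by association rules
  • Collaborative filtering: a recommendation slot driven by an item-based collaborative filtering algorithm
  • Book attributes: page count, publisher, ISBN, language, and so on
  • About the author: an introduction to the author and the author's other works
  • User ratings: the users' rating behavior
  • User reviews: the content of the users' reviews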

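[Figure: screenshot of the Amazon book page]

Other recommendation slots on the page:

[Figure: screenshot of the other recommendation slots]

Taken together, the 2 screenshots above make it easy to see how important recommendations are to Amazon. Apart from the most prominent ad slot, which goes to the advertisers who bring in profit directly, the page has 4 recommendation slots. Each works from a different angle and uses a different recommendation algorithm to guess which products the user will like.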

3. Data Description

Two data files:

  • rating.csv: user rating behavior data
  • users.csv: user attribute data

1). book-ratings.csv

  • 3 columns: user ID, book ID, the user's rating of the book
  • Records: 4000 book ratings
  • Users: 200
  • Books: 1000
  • Rating range: 1-10

Sample data


1,565,3
1,807,2
1,201,1
1,557,9
1,987,10
1,59,5
1,305,6
1,153,3
1,139,7
1,875,5
1,722,10
2,977,4
2,806,3
2,654,8
2,21,8
2,662,5
2,437,6
2,576,3
2,141,8
2,311,4
2,101,3
2,540,9
2,87,3
2,65,8
2,501,6
2,710,5
2,331,9
2,542,4
2,757,9
2,590,7

2). users.csv

  • 3 columns: user ID, user gender, user age
  • Users: 200
  • Gender: M for male, F for female
  • Age: between 11 and 80

Sample data


1,M,40
2,M,27
3,M,41
4,F,43
5,F,16
6,M,36
7,F,36
8,F,46
9,M,50
10,M,21
11,F,11
12,M,42
13,F,40
14,F,28
15,M,25
16,M,68
17,M,53
18,F,69
19,F,48
20,F,56
21,F,36

4. Algorithm Model

This article mainly covers Mahout's item-based collaborative filtering model; other algorithm models are not explained here.

For the data above, I will test 7 algorithm combinations. For a detailed explanation of Mahout's algorithm combinations, see the article "从源代码剖析Mahout推荐引擎" (Dissecting the Mahout Recommendation Engine from the Source Code).

The 7 algorithm combinations

  • userCF1: EuclideanSimilarity + NearestNUserNeighborhood + GenericUserBasedRecommender
  • userCF2: LogLikelihoodSimilarity + NearestNUserNeighborhood + GenericUserBasedRecommender
  • userCF3: EuclideanSimilarity + NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
  • itemCF1: EuclideanSimilarity + GenericItemBasedRecommender
  • itemCF2: LogLikelihoodSimilarity + GenericItemBasedRecommender
  • itemCF3: EuclideanSimilarity + GenericBooleanPrefItemBasedRecommender
  • slopeOne: SlopeOneRecommender
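
To make the Euclidean-based combinations a little more concrete, here is a minimal sketch of a Euclidean-distance similarity between two users, computed over the books both have rated. It uses the common textbook form 1 / (1 + d) and is only an illustration: Mahout's own similarity classes may normalize the distance differently. The class name EuclideanSketch is hypothetical, the first user's two ratings are taken from user 186's records in this article, and the second user's ratings are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a textbook Euclidean-distance similarity, not Mahout's implementation. */
final class EuclideanSketch {

    /**
     * Similarity of two users over the books both have rated:
     * 1 / (1 + d), where d is the Euclidean distance between their ratings.
     * Returns 0 when the users share no books.
     */
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumOfSquares = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumOfSquares += diff * diff;
                shared++;
            }
        }
        return shared == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        // bookId -> rating; the second user's ratings are made-up values
        Map<Long, Double> first = new HashMap<>();
        first.put(65L, 7.0);
        first.put(375L, 6.0);

        Map<Long, Double> second = new HashMap<>();
        second.put(65L, 4.0);
        second.put(375L, 2.0);
        second.put(887L, 5.0); // not shared, so it is ignored

        // Distance is sqrt(3^2 + 4^2) = 5, so the similarity is 1/6
        System.out.println(similarity(first, second));
    }
}
```

Two users with identical ratings on their shared books get a similarity of 1.0, and the value falls toward 0 as their ratings diverge.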

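The algorithms above are then evaluated. For a detailed explanation of algorithm evaluation, see the article "Mahout推荐算法API详解" (A Detailed Guide to the Mahout Recommendation Algorithm API).

  • Precision: the fraction of recommended items that are relevant
  • Recall: the fraction of relevant items that are recommended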

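5. Program Development

System architecture: Mahout's recommendation algorithms come in two forms, single-machine and distributed.

  • Single-machine: computed in the memory of one machine; supports many recommendation algorithms; simple to deploy and run; limited in the amount of data it can process
  • Distributed: runs on a Hadoop cluster; supports only a few recommendation algorithms; complex to deploy and run; handles massive amounts of data

[Figure: system architecture diagram]

Development environment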

  • Win7 64bit
  • Java 1.6.0_45
  • Maven3
  • Eclipse Juno Service Release 2
  • Mahout-0.8
  • Hadoop-1.1.2

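The development environment uses Mahout 0.8. For the project setup, see the article "用Maven构建Mahout项目" (Building a Mahout Project with Maven).

Create the following Java classes:

  • BookEvaluator.java: picks out the algorithms that score well under the recommender evaluators
  • BookResult.java: compares a fixed number of results by hand
  • BookFilterGenderResult.java: keeps only the books rated by male users

1). BookEvaluator.java: picks out the algorithms that score well under the recommender evaluators

Source code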


package org.conan.mymahout.recommendation.book;

import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class BookEvaluator {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        userEuclidean(dataModel);
        userLoglikelihood(dataModel);
        userEuclideanNoPref(dataModel);
        itemEuclidean(dataModel);
        itemLoglikelihood(dataModel);
        itemEuclideanNoPref(dataModel);
        slopeOne(dataModel);
    }

    public static RecommenderBuilder userEuclidean(DataModel dataModel) throws TasteException, IOException {
        System.out.println("userEuclidean");
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);

        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }
    
    public static RecommenderBuilder userLoglikelihood(DataModel dataModel) throws TasteException, IOException {
        System.out.println("userLoglikelihood");
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);

        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }
    
    public static RecommenderBuilder userEuclideanNoPref(DataModel dataModel) throws TasteException, IOException {
        System.out.println("userEuclideanNoPref");
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);

        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder itemEuclidean(DataModel dataModel) throws TasteException, IOException {
        System.out.println("itemEuclidean");
        ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, true);

        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder itemLoglikelihood(DataModel dataModel) throws TasteException, IOException {
        System.out.println("itemLoglikelihood");
        ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
        RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, true);

        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }
    
    public static RecommenderBuilder itemEuclideanNoPref(DataModel dataModel) throws TasteException, IOException {
        System.out.println("itemEuclideanNoPref");
        ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);

        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }

    public static RecommenderBuilder slopeOne(DataModel dataModel) throws TasteException, IOException {
        System.out.println("slopeOne");
        RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();

        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
        return recommenderBuilder;
    }
}

Console output:


userEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.33333325386047363
Recommender IR Evaluator: [Precision:0.3010752688172043,Recall:0.08542713567839195]
userLoglikelihood
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.5245869159698486
Recommender IR Evaluator: [Precision:0.11764705882352945,Recall:0.017587939698492466]
userEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:4.288461538461536
Recommender IR Evaluator: [Precision:0.09045226130653267,Recall:0.09296482412060306]
itemEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.408880928305655
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]
itemLoglikelihood
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.448554412835434
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]
itemEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.5665197873957957
Recommender IR Evaluator: [Precision:0.6005025125628134,Recall:0.6055276381909548]
slopeOne
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.6893078179405814
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]

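Visualizing the evaluator output:

Average distance of the recommendation results

[Figure: chart of the average absolute difference per algorithm]

Recommender scores

[Figure: chart of precision and recall per algorithm]

Judged by precision and recall, only itemEuclideanNoPref evaluates very well (about 0.60 on both); the results of the other algorithms are poor.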
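
The AVERAGE_ABSOLUTE_DIFFERENCE score in the console output is, as the name suggests, an average of the absolute differences between predicted and actual ratings, so lower values mean the predictions are closer to the real ratings. The sketch below shows only that arithmetic; the class name and the rating values are made up, and it does not reproduce how the evaluator splits the data into training and test sets.

```java
/** Illustrative only: the arithmetic behind an average-absolute-difference score. */
final class AverageAbsoluteDifferenceSketch {

    /** Mean of |actual - predicted| over paired ratings. */
    static double score(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(actual[i] - predicted[i]);
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        double[] actual = {9.0, 3.0, 7.0};    // made-up held-out ratings
        double[] predicted = {8.0, 5.0, 7.0}; // made-up predictions

        // (1 + 2 + 0) / 3 = 1.0
        System.out.println(score(actual, predicted));
    }
}
```

Because this score and precision/recall measure different things, an algorithm can have a low average difference and still place few relevant books in its top recommendations.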

2). BookResult.java: compares a fixed number of results by hand

To get contrasting results, we take 4 algorithms (userEuclidean, itemEuclidean, userEuclideanNoPref, itemEuclideanNoPref) and compare their recommendations by hand.

Source code


package org.conan.mymahout.recommendation.book;

import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class BookResult {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        RecommenderBuilder rb1 = BookEvaluator.userEuclidean(dataModel);
        RecommenderBuilder rb2 = BookEvaluator.itemEuclidean(dataModel);
        RecommenderBuilder rb3 = BookEvaluator.userEuclideanNoPref(dataModel);
        RecommenderBuilder rb4 = BookEvaluator.itemEuclideanNoPref(dataModel);
        
        LongPrimitiveIterator iter = dataModel.getUserIDs();
        while (iter.hasNext()) {
            long uid = iter.nextLong();
            System.out.print("userEuclidean       =>");
            result(uid, rb1, dataModel);
            System.out.print("itemEuclidean       =>");
            result(uid, rb2, dataModel);
            System.out.print("userEuclideanNoPref =>");
            result(uid, rb3, dataModel);
            System.out.print("itemEuclideanNoPref =>");
            result(uid, rb4, dataModel);
        }
    }

    public static void result(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException {
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, false);
    }
}

Console output (only part of the result is shown):


...
userEuclidean       =>uid:63,
itemEuclidean       =>uid:63,(984,9.000000)(690,9.000000)(943,8.875000)
userEuclideanNoPref =>uid:63,(4,1.000000)(723,1.000000)(300,1.000000)
itemEuclideanNoPref =>uid:63,(867,3.791667)(947,3.083333)(28,2.750000)
userEuclidean       =>uid:64,
itemEuclidean       =>uid:64,(368,8.615385)(714,8.200000)(290,8.142858)
userEuclideanNoPref =>uid:64,(860,1.000000)(490,1.000000)(64,1.000000)
itemEuclideanNoPref =>uid:64,(409,3.950000)(715,3.830627)(901,3.444048)
userEuclidean       =>uid:65,(939,7.000000)
itemEuclidean       =>uid:65,(550,9.000000)(334,9.000000)(469,9.000000)
userEuclideanNoPref =>uid:65,(939,2.000000)(185,1.000000)(736,1.000000)
itemEuclideanNoPref =>uid:65,(666,4.166667)(96,3.093931)(345,2.958333)
userEuclidean       =>uid:66,
itemEuclidean       =>uid:66,(971,9.900000)(656,9.600000)(918,9.577709)
userEuclideanNoPref =>uid:66,(6,1.000000)(492,1.000000)(676,1.000000)
itemEuclideanNoPref =>uid:66,(185,3.650000)(533,3.617307)(172,3.500000)
userEuclidean       =>uid:67,
itemEuclidean       =>uid:67,(663,9.700000)(987,9.625000)(486,9.600000)
userEuclideanNoPref =>uid:67,(732,1.000000)(828,1.000000)(113,1.000000)
itemEuclideanNoPref =>uid:67,(724,3.000000)(279,2.950000)(890,2.750000)
...

Let's look at the recommendations for the user with uid=65:

Look the user up in the user.csv data set


> user[65,]
userid gender age
65     65      M  14

User 65 is male, 14 years old.

Taking the recommendations from the itemEuclideanNoPref algorithm, look at the ratings of the book with bookid=666


> rating[which(rating$bookid==666),]
userid bookid pref
646      44    666   10
1327     89    666    7
2470    165    666    3
2697    179    666    7

There are 4 users who rated book 666. Look at the attribute data of these 4 users


> user[c(44,89,165,179),]
userid gender age
44      44      F  76
89      89      M  40
165    165      F  59
179    179      F  68

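Of these 4 users, 3 are female and 1 is male.

We assume that men share book interests with other men, and women share book preferences with other women. Because user 65 is male, we next exclude the female raters and keep only the rating records of male raters.

3). BookFilterGenderResult.java: keeps only the books rated by male users

Source code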


package org.conan.mymahout.recommendation.book;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class BookFilterGenderResult {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        RecommenderBuilder rb1 = BookEvaluator.userEuclidean(dataModel);
        RecommenderBuilder rb2 = BookEvaluator.itemEuclidean(dataModel);
        RecommenderBuilder rb3 = BookEvaluator.userEuclideanNoPref(dataModel);
        RecommenderBuilder rb4 = BookEvaluator.itemEuclideanNoPref(dataModel);
        
        long uid = 65;
        System.out.print("userEuclidean       =>");
        filterGender(uid, rb1, dataModel);
        System.out.print("itemEuclidean       =>");
        filterGender(uid, rb2, dataModel);
        System.out.print("userEuclideanNoPref =>");
        filterGender(uid, rb3, dataModel);
        System.out.print("itemEuclideanNoPref =>");
        filterGender(uid, rb4, dataModel);
    }

    /**
     * Filters the recommendations by user gender.
     */
    public static void filterGender(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException, IOException {
        Set<Long> userids = getMale("datafile/book/user.csv");

        // Collect the books that male users have rated
        Set<Long> bookids = new HashSet<Long>();
        for (long uids : userids) {
            LongPrimitiveIterator iter = dataModel.getItemIDsFromUser(uids).iterator();
            while (iter.hasNext()) {
                long bookid = iter.next();
                bookids.add(bookid);
            }
        }

        IDRescorer rescorer = new FilterRescorer(bookids);
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM, rescorer);
        RecommendFactory.showItems(uid, list, false);
    }

    /**
     * Returns the IDs of the male users.
     */
    public static Set<Long> getMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        String s = null;
        while ((s = br.readLine()) != null) {
            String[] cols = s.split(",");
            if (cols[1].equals("M")) { // male users only
                userids.add(Long.parseLong(cols[0]));
            }
        }
        br.close();
        return userids;
    }

}

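/**
 * Rescorer that filters recommendations against a set of book IDs.
 */
class FilterRescorer implements IDRescorer {
    // Book IDs to test candidates against (filterGender passes in the books rated by male users)
    final private Set<Long> bookids;

    public FilterRescorer(Set<Long> bookids) {
        this.bookids = bookids;
    }

    @Override
    public double rescore(long id, double originalScore) {
        return isFiltered(id) ? Double.NaN : originalScore;
    }

    @Override
    public boolean isFiltered(long id) {
        // Mahout drops an item when this returns true, so as written the books
        // in the set are excluded; negate the test to keep only those books.
        return bookids.contains(id);
    }
}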

Console output:


userEuclidean       =>uid:65,
itemEuclidean       =>uid:65,(784,8.090909)(276,8.000000)(476,7.666667)
userEuclideanNoPref =>uid:65,
itemEuclideanNoPref =>uid:65,(887,2.250000)(356,2.166667)(430,1.866667)

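We find that, because only the male users' rating records are kept, the amount of data becomes much smaller, and the user-based collaborative filtering algorithms no longer return any results. The result sets of the item-based collaborative filtering algorithms change as well.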
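
The filtering idea itself can be sketched in plain Java, without Mahout: find the male users, collect the books they rated, and keep only those books from a candidate list. This is an assumption-laden illustration, not the code that produced the output above. The class and method names are hypothetical, the rows reuse users and ratings quoted in this article, and the row for book 999 is made up so that there is something to filter out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: keep the candidate books that at least one male user has rated. */
final class GenderFilterSketch {

    /** IDs of users whose gender column is "M"; rows look like "1,M,40". */
    static Set<Long> maleUsers(List<String> userRows) {
        Set<Long> ids = new HashSet<>();
        for (String row : userRows) {
            String[] cols = row.split(",");
            if (cols.length >= 2 && "M".equals(cols[1].trim())) {
                ids.add(Long.parseLong(cols[0].trim()));
            }
        }
        return ids;
    }

    /** IDs of books rated by at least one of the given users; rows look like "1,565,3". */
    static Set<Long> booksRatedBy(Set<Long> users, List<String> ratingRows) {
        Set<Long> books = new HashSet<>();
        for (String row : ratingRows) {
            String[] cols = row.split(",");
            if (cols.length >= 2 && users.contains(Long.parseLong(cols[0].trim()))) {
                books.add(Long.parseLong(cols[1].trim()));
            }
        }
        return books;
    }

    /** Keeps only the candidates contained in allowed, preserving their order. */
    static List<Long> keepOnly(List<Long> candidates, Set<Long> allowed) {
        List<Long> kept = new ArrayList<>();
        for (Long id : candidates) {
            if (allowed.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> userRows = Arrays.asList("44,F,76", "89,M,40", "165,F,59", "184,M,27");
        // The last row (book 999) is made up for this example
        List<String> ratingRows = Arrays.asList("44,666,10", "89,666,7", "184,887,4", "165,999,3");

        Set<Long> allowed = booksRatedBy(maleUsers(userRows), ratingRows);
        System.out.println(keepOnly(Arrays.asList(666L, 999L, 887L), allowed)); // [666, 887]
    }
}
```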

For the itemEuclideanNoPref algorithm, the top-ranked result is the book with ID 887.

Tracing one step further: which users rated book 887?


> rating[which(rating$bookid==887),]
userid bookid pref
1280     85    887    2
1743    119    887    8
2757    184    887    4
2791    186    887    5

There are 4 users who rated book 887. Next, look at the attributes of each of these users


> user[c(85,119,184,186),]
userid gender age
85      85      F  31
119    119      F  49
184    184      M  27
186    186      M  35

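Of these, 2 are male and 2 are female. Because our algorithm has already excluded the female raters, we can infer that the recommendation of book 887 comes from the 2 male raters.

Compute the intersection of the books rated by user 65 with those rated by user 184 and by user 186.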


rat65<-rating[which(rating$userid==65),]
rat184<-rating[which(rating$userid==184),]
rat186<-rating[which(rating$userid==186),]

> intersect(rat65$bookid ,rat184$bookid)
integer(0)
> intersect(rat65$bookid ,rat186$bookid)
[1]  65 375

In the end, we find that user 65 and user 186 have both rated book 65 and book 375. Next we print user 186's rating records.


> rat186
userid bookid pref
2790    186     65    7
2791    186    887    5
2792    186    529    3
2793    186    375    6
2794    186    566    7
2795    186    169    4
2796    186    907    1
2797    186    821    2
2798    186    720    5
2799    186    642    5
2800    186    137    3
2801    186    744    1
2802    186    896    2
2803    186    156    6
2804    186    392    3
2805    186    386    3
2806    186    901    7
2807    186     69    6
2808    186    845    6
2809    186    998    3

User 186 also rated book 887, so recommending book 887 to user 65 is reasonable.
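
The intersect() calls used in the trace above have a direct counterpart in Java. The sketch below is only an illustration with a hypothetical class name: user 186's set holds the first five book IDs from the listing above, and user 65's set holds the two shared books plus one made-up ID (12), since user 65's full rating list is not shown in this article.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: books that two users have both rated. */
final class SharedBooksSketch {

    /** Books present in both sets, in ascending order; the inputs are not modified. */
    static Set<Long> sharedBooks(Set<Long> first, Set<Long> second) {
        Set<Long> shared = new TreeSet<>(first);
        shared.retainAll(second);
        return shared;
    }

    public static void main(String[] args) {
        // 65 and 375 come from the article; 12 is a made-up book ID
        Set<Long> user65 = new TreeSet<>(Arrays.asList(65L, 375L, 12L));
        // First five books from user 186's rating records
        Set<Long> user186 = new TreeSet<>(Arrays.asList(65L, 887L, 529L, 375L, 566L));

        System.out.println(sharedBooks(user65, user186)); // [65, 375]
    }
}
```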

Through a real book recommendation case, we have gained a deeper understanding of how to build a recommender system with Mahout.

When reposting, please credit the source:
http://blog.fens.me/hadoop-mahout-recommend-book/