Setting up Mahout and running recommender job – Association rule

This post assumes that you have configured Hadoop and have Maven and SVN installed. I am using Ubuntu 12.04.

Setting up Mahout

Run the following commands in your Eclipse workspace.

[~/workspace]$ svn co http://svn.apache.org/repos/asf/mahout/trunk
[~/workspace]$ mv trunk/ mahout/
[~/workspace]$ cd mahout/
[~/workspace/mahout]$ mvn install
[~/workspace/mahout]$ cd core/
[~/workspace/mahout/core]$ mvn compile
[~/workspace/mahout/core]$ mvn install
[~/workspace/mahout/core]$ cd ../examples
[~/workspace/mahout/examples]$ mvn compile

If you want to configure Mahout further, please refer to these blogs here and here.

About the data and uploading it into HDFS

The recommender, a part of Mahout that runs on top of Hadoop, takes its input in the form of <key,value> pairs. It needs two files: an input file and a users file. The input file contains the data, already converted into <key,value> pairs. The users file contains the keys of the users that you want recommendations for. You can download the input file from my Google Drive. For users.txt, create a file with a single key on the first line, like this:

[~/input]$ cat users.txt
8

The number 8 above is the key; here we are asking for recommendations for user 8. You can list more keys, but the job will then take much longer.
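
The users file can also be generated and sanity-checked with two small shell functions. This is only a sketch: the names write_users_file and check_users_file are made up for illustration, and it assumes that the job expects one numeric user key per line.

```shell
# Hypothetical helper: write one user key per line to the given file.
write_users_file() {
  local out="$1"
  shift
  printf '%s\n' "$@" > "$out"
}

# Hypothetical helper: succeed only if the file is non-empty and every
# line is a plain non-negative integer (an assumption about the key format).
check_users_file() {
  [ -s "$1" ] && ! grep -qvE '^[0-9]+$' "$1"
}
```

For example, write_users_file users.txt 8 recreates the single-key file shown above, and passing several IDs (the values are up to you) would request recommendations for several users at once.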

Make sure Hadoop is running, then upload your files into HDFS.

[~/input]$ hadoop dfs -mkdir input/
[~/input]$ hadoop dfs -put links-converted.txt input/
[~/input]$ hadoop dfs -put users.txt input/

Run recommender on the local machine


This is the command to run the recommender job. I know it is rather loaded, so I will explain each part below.

[~/workspace/mahout]$ hadoop jar ~/workspace/mahout/core/target/mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/links-converted.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData -s SIMILARITY_LOGLIKELIHOOD

hadoop jar ~/workspace/mahout/core/target/mahout-core-0.7-job.jar tells Hadoop that we are giving it a jar file, and where that jar file is located.

org.apache.mahout.cf.taste.hadoop.item.RecommenderJob is the main class, where execution starts.

-Dmapred.input.dir=input/links-converted.txt -Dmapred.output.dir=output are the HDFS input path and output directory.

--usersFile input/users.txt points to the file with the keys of the users that you want recommendations for.

--booleanData tells the job to treat the input as having no preference values.

-s SIMILARITY_LOGLIKELIHOOD selects the similarity measure, in this case log-likelihood, that the job uses when computing the recommendations for user 8 (as given in the users.txt file). Other measures you could pass instead are SIMILARITY_COOCCURRENCE, SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK and SIMILARITY_EUCLIDEAN_DISTANCE.
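
To try a different similarity measure without retyping the whole command, the pieces explained above can be assembled by a small shell function. This is a sketch, not part of Mahout: the function name recommender_cmd and the MAHOUT_JAR variable are made up for illustration, and the function only prints the command so that you can inspect it before running it.

```shell
# Similarity names as listed in this post.
SIMILARITIES="SIMILARITY_COOCCURRENCE SIMILARITY_LOGLIKELIHOOD SIMILARITY_TANIMOTO_COEFFICIENT SIMILARITY_CITY_BLOCK SIMILARITY_EUCLIDEAN_DISTANCE"

# Jar path from the command above; set MAHOUT_JAR if yours differs.
MAHOUT_JAR="${MAHOUT_JAR:-$HOME/workspace/mahout/core/target/mahout-core-0.7-job.jar}"

# Hypothetical helper: print (not run) the job command for one similarity.
recommender_cmd() {
  local sim="$1" s known=""
  for s in $SIMILARITIES; do
    if [ "$s" = "$sim" ]; then known=1; fi
  done
  if [ -z "$known" ]; then
    echo "unknown similarity: $sim" >&2
    return 1
  fi
  echo "hadoop jar $MAHOUT_JAR" \
    "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob" \
    "-Dmapred.input.dir=input/links-converted.txt" \
    "-Dmapred.output.dir=output" \
    "--usersFile input/users.txt --booleanData -s $sim"
}
```

Calling recommender_cmd SIMILARITY_TANIMOTO_COEFFICIENT prints the same command as above with only the -s value changed. Keep in mind that Hadoop refuses to write into an output directory that already exists, so remove or rename the old output before a second run.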

This is the output I got

 8[3303009:1.0,4393611:1.0,5292042:1.0,2583882:1.0,1850305:1.0,275656:1.0,1254637:1.0,1720928:1.0,5575496:1.0,3956845:1.0]
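
Each output line holds the user key followed by a bracketed list of item:score pairs. If you want one recommendation per row, a rough shell sketch like the following can split it up. The function name parse_recommendations is made up for illustration, and the code assumes the layout is exactly as in the line above.

```shell
# Hypothetical helper: read job output lines of the form
#   user[item:score,item:score,...]
# on stdin and print one "user item score" row per recommendation.
parse_recommendations() {
  local line user items item score
  # Drop spaces and tabs so any separator between user and "[" is ignored.
  tr -d ' \t' | while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then continue; fi
    user="${line%%\[*}"      # text before the first "["
    items="${line#*\[}"      # text after the first "["
    items="${items%\]}"      # drop the closing "]"
    printf '%s\n' "$items" | tr ',' '\n' |
      while IFS=: read -r item score; do
        if [ -n "$item" ]; then
          printf '%s %s %s\n' "$user" "$item" "$score"
        fi
      done
  done
}
```

You would pipe the job output into it, for example from hadoop dfs -cat on the files in the output directory (their exact names depend on your run). The first pair in the line above would come out as the row 8 3303009 1.0.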

Running on EMR with data on S3

Follow the instructions from my previous post on running jobs on EMR, with the following changes.

In the jar location field,

s3n://buckwell/input/mahout-core-0.7-job.jar

Note: the jar file is in the bucket buckwell, in the folder input/. Change the S3 paths to match your own setup.

In the arguments field,

org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=s3n://buckwell/data/links-converted.txt -Dmapred.output.dir=s3n://buckwell/output --usersFile s3n://buckwell/data/users.txt --booleanData -s SIMILARITY_LOGLIKELIHOOD

Note: links-converted.txt and users.txt are in the bucket buckwell, in the folder data. Change the S3 paths to match your own setup.



3 thoughts on “Setting up Mahout and running recommender job – Association rule”

  1. Hey, was the jar file you mentioned created by you? I am wondering what to do if I want to use different data but the same kind of similarity and input files. Do I have to write my own Java classes and package them into a jar? Also, the current version of Mahout on EMR is 0.11.0. Could you please let me know which jar files I have to use to run the jobs on current EMR with the new version of Mahout? Thanks

    • What I tried and succeeded:
      Mahout on AWS EMR Practice – Recommender for User
      JAR location: s3://mahoutwenzhao1st/input/mahout-examples-0.11.1-job.jar Main class: None
      Arguments: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input s3://mahoutwenzhao1st/data/links-converted.txt
      --output s3://mahoutwenzhao1st/output
      --similarityClassname SIMILARITY_LOGLIKELIHOOD
      --usersFile s3://mahoutwenzhao1st/user/users.txt --booleanData

