Vectorized Intersection Over Union (IOU) in NumPy and TensorFlow


It's been a while since I wrote a post. I have recently been working with Convolutional Neural Networks for Object Detection, and one of the important algorithms is Intersection Over Union (IOU), also known as the Jaccard similarity coefficient.

In this post I talk about vectorizing the IOU calculation and benchmarking it in NumPy and TensorFlow.

Intersection Over Union. Source: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/

In papers such as Yolo, YoloV2, and SqueezeDet, duplicate detections are eliminated by applying Non Max Suppression (NMS). IOU is computed between the highest-confidence bounding boxes (bboxes) and all the other bboxes, and any bbox that has a high IOU with a high-confidence bbox is suppressed, hence the name Non Max Suppression (NMS).
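
As a rough illustration of how IOU drives NMS, here is a sketch of the standard greedy procedure (my own example, not code taken from those papers; the 0.5 threshold and the toy boxes are arbitrary):

import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        # IOU of the highest-confidence remaining box with all the others
        xA = np.maximum(boxes[i, 0], rest[:, 0])
        yA = np.maximum(boxes[i, 1], rest[:, 1])
        xB = np.minimum(boxes[i, 2], rest[:, 2])
        yB = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.maximum(xB - xA + 1, 0) * np.maximum(yB - yA + 1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0] + 1) * (boxes[i, 3] - boxes[i, 1] + 1)
        area_rest = (rest[:, 2] - rest[:, 0] + 1) * (rest[:, 3] - rest[:, 1] + 1)
        iou = inter / (area_i + area_rest - inter)
        # suppress everything that overlaps the kept box too much
        order = order[1:][iou < iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float64)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 heavily and is suppressed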

This post from pyimagesearch is a good read on the IOU algorithm. That approach loops over the boxes to compute IOU. I vectorized the IOU algorithm in NumPy to improve speed and measured the wall time using Python's time.time().

import numpy as np
from time import time

def np_vec_no_jit_iou(boxes1, boxes2):
    """Vectorized IOU in NumPy; returns the wall time of one pairwise IOU computation."""
    def run(bboxes1, bboxes2):
        # split the (N, 4) arrays of [x1, y1, x2, y2] into column vectors
        x11, y11, x12, y12 = np.split(bboxes1, 4, axis=1)
        x21, y21, x22, y22 = np.split(bboxes2, 4, axis=1)
        # broadcast to get the pairwise intersection rectangle of every box in bboxes1 with every box in bboxes2
        xA = np.maximum(x11, np.transpose(x21))
        yA = np.maximum(y11, np.transpose(y21))
        xB = np.minimum(x12, np.transpose(x22))
        yB = np.minimum(y12, np.transpose(y22))
        interArea = np.maximum((xB - xA + 1), 0) * np.maximum((yB - yA + 1), 0)
        boxAArea = (x12 - x11 + 1) * (y12 - y11 + 1)
        boxBArea = (x22 - x21 + 1) * (y22 - y21 + 1)
        iou = interArea / (boxAArea + np.transpose(boxBArea) - interArea)
        return iou
    tic = time()
    run(boxes1, boxes2)
    toc = time()
    return toc - tic
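
As a quick sanity check (a toy example of my own, not from the original post), the same formula can be pulled out into a function that returns the IOU matrix itself:

import numpy as np

def iou_matrix(bboxes1, bboxes2):
    # identical to run() above, but returning the IOU matrix instead of the wall time
    x11, y11, x12, y12 = np.split(bboxes1, 4, axis=1)
    x21, y21, x22, y22 = np.split(bboxes2, 4, axis=1)
    xA = np.maximum(x11, np.transpose(x21))
    yA = np.maximum(y11, np.transpose(y21))
    xB = np.minimum(x12, np.transpose(x22))
    yB = np.minimum(y12, np.transpose(y22))
    interArea = np.maximum((xB - xA + 1), 0) * np.maximum((yB - yA + 1), 0)
    boxAArea = (x12 - x11 + 1) * (y12 - y11 + 1)
    boxBArea = (x22 - x21 + 1) * (y22 - y21 + 1)
    return interArea / (boxAArea + np.transpose(boxBArea) - interArea)

boxes1 = np.array([[0, 0, 10, 10]], dtype=np.float64)   # an 11x11 box
boxes2 = np.array([[5, 0, 15, 10]], dtype=np.float64)   # the same box shifted 5 px to the right
print(iou_matrix(boxes1, boxes2))  # [[0.375]] = 66 / (121 + 121 - 66)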

Then I converted the code to TensorFlow by simply replacing "np" with "tf". This test ran on an Nvidia Titan X.

Creating the session and placeholders for the bboxes in TensorFlow:

import tensorflow as tf

conf = tf.ConfigProto()  # session configuration; the original post does not show how conf was built
sess = tf.Session(config=conf)
tf_bboxes1 = tf.placeholder(dtype=tf.float16, shape=[None, 4])
tf_bboxes2 = tf.placeholder(dtype=tf.float16, shape=[None, 4])

TensorFlow implementation of the vectorized IOU:

def tf_iou(boxes1, boxes2):
    def run(tb1, tb2):
        x11, y11, x12, y12 = tf.split(tb1, 4, axis=1)
        x21, y21, x22, y22 = tf.split(tb2, 4, axis=1)

        xA = tf.maximum(x11, tf.transpose(x21))
        yA = tf.maximum(y11, tf.transpose(y21))
        xB = tf.minimum(x12, tf.transpose(x22))
        yB = tf.minimum(y12, tf.transpose(y22))

        interArea = tf.maximum((xB - xA + 1), 0) * tf.maximum((yB - yA + 1), 0)

        boxAArea = (x12 - x11 + 1) * (y12 - y11 + 1)
        boxBArea = (x22 - x21 + 1) * (y22 - y21 + 1)

        iou = interArea / (boxAArea + tf.transpose(boxBArea) - interArea)

        return iou

    op = run(tf_bboxes1, tf_bboxes2)
    # the first run is a warm-up so that graph setup and the initial GPU transfer are not included in the timing
    sess.run(op, feed_dict={tf_bboxes1: boxes1, tf_bboxes2: boxes2})
    tic = time()
    sess.run(op, feed_dict={tf_bboxes1: boxes1, tf_bboxes2: boxes2})
    toc = time()
    return toc - tic

Benchmark the different versions' execution time by simulating bboxes:

def get_2_bboxes(num_boxes_in_1=10001, num_boxes_in_2=10001):
    # generating random co-ordinates of [x1, y1, x2, y2]
    boxes1 = np.reshape(np.random.randint(high=1200, low=0, size=num_boxes_in_1 * 4), newshape=(num_boxes_in_1, 4))
    boxes2 = np.reshape(np.random.randint(high=1200, low=0, size=num_boxes_in_2 * 4), newshape=(num_boxes_in_2, 4))
    return boxes1, boxes2

for num_boxes_in_1, num_boxes_in_2 in zip(range(1, box1_max, box1_step), range(10, box2_max, box2_step)):
    print("num_boxes_in_1: {}, \t num_boxes_in_2: {}".format(num_boxes_in_1, num_boxes_in_2))

    boxes1, boxes2 = get_2_bboxes(num_boxes_in_1, num_boxes_in_2)
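
To connect this loop with the timing functions defined above, here is a minimal sketch of how the results could be collected for the plots below; the sweep limits, column names, and DataFrame layout are my own choices, since the original recording code is not shown in the post:

import pandas as pd

box1_max, box1_step = 5000, 500   # illustrative sweep limits, not the values used in the post
box2_max, box2_step = 5000, 500

records = []
for num_boxes_in_1, num_boxes_in_2 in zip(range(1, box1_max, box1_step),
                                          range(10, box2_max, box2_step)):
    boxes1, boxes2 = get_2_bboxes(num_boxes_in_1, num_boxes_in_2)
    records.append({"num_boxes_in_1": num_boxes_in_1,
                    "num_boxes_in_2": num_boxes_in_2,
                    "numpy_vectorized_s": np_vec_no_jit_iou(boxes1, boxes2),
                    "tensorflow_s": tf_iou(boxes1, boxes2)})

df = pd.DataFrame(records)  # one row per box-count pair, ready for plotting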

Data frame and processing time plots

Read the benchmark results into a pandas dataframe.
When the number of boxes is more than 5, the vectorized version performs better.
The TensorFlow implementation's runtime is pretty much constant throughout and performs better when the number of boxes is high. When the number of boxes is small, the overhead of copying the data to the GPU is greater than the processing time.

Another interesting observation here is that the plain NumPy version of IOU performs better than the vectorized version when the number of boxes is greater than 2000.

Conclusion:

The TensorFlow-based vectorized IOU works better when the number of boxes is more than 1000. So when your number of boxes is low, use NumPy's vectorized code, and when you have a high number of boxes, use TensorFlow.

The full code can be found here

References:

pyimagesearch

Setting up Mahout and running a recommender job – Association rules


For this I assume you have configured Hadoop and have Maven and SVN installed. I am using Ubuntu 12.04.

Setting up Mahout

Execute the following commands in your Eclipse workspace.

[~/workspace]$ svn co http://svn.apache.org/repos/asf/mahout/trunk
[~/workspace]$ mv trunk/ mahout/
[~/workspace]$ cd mahout/
[~/workspace/mahout]$ mvn install
[~/workspace/mahout]$ cd core/
[~/workspace/mahout/core]$ mvn compile
[~/workspace/mahout/core]$ mvn install
[~/workspace/mahout/core]$ cd ../examples
[~/workspace/mahout/examples]$ mvn compile

If you want to further configure Mahout please refer to these blogs here and here.

About the data and uploading it into HDFS

The recommender, the part of Mahout that runs on top of Hadoop, takes its input in the form of <key,value> pairs. It needs two files: an input file and a users file. The input file contains the data, essentially converted into <key,value> pairs. The users file has the keys of the users that you want recommendations for. You can download the input file from my Google Drive. For the users.txt file, create a file with one key value in the first line, which looks like this.

[~/input]$ cat users.txt
8

The number 8 above is the key; say you are trying to get recommendations for user #8. You could have more keys too, but it will take way longer.
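
For reference, Mahout's RecommenderJob expects each line of the input file to be a comma-separated userID,itemID pair, optionally followed by a preference value (with --booleanData the preference is not needed). A toy input file, with made-up IDs, would look like this:

8,1001
8,1002
9,1001
9,1003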

Make sure your hadoop is running and upload your files into HDFS.

[~/input]$ hadoop dfs -mkdir input/
[~/input]$ hadoop dfs -put links-converted.txt input/
[~/input]$ hadoop dfs -put users.txt input/

Run recommender on the local machine


This is the command to run the recommender job. I know it's kind of loaded; I will try and explain it below.

[~/workspace/mahout]$ hadoop jar ~/workspace/mahout/core/target/mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/links-converted.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData -s SIMILARITY_LOGLIKELIHOOD

hadoop jar ~/workspace/mahout/core/target/mahout-core-0.7-job.jar tells Hadoop that we are giving it a JAR file and the location of that JAR file.

org.apache.mahout.cf.taste.hadoop.item.RecommenderJob is the main class where execution starts.

-Dmapred.input.dir=input/links-converted.txt -Dmapred.output.dir=output are the HDFS input and output directories.

--usersFile input/users.txt points to the file with the keys of the users that you want recommendations for.

-s SIMILARITY_LOGLIKELIHOOD gives the probability of similarity between 8 (as given in the users.txt file) and the rest of the keys. You could also use other similarity measures such as SIMILARITY_COOCCURRENCE, SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK, and SIMILARITY_EUCLIDEAN_DISTANCE.

This is the output I got

 8[3303009:1.0,4393611:1.0,5292042:1.0,2583882:1.0,1850305:1.0,275656:1.0,1254637:1.0,1720928:1.0,5575496:1.0,3956845:1.0]

Running on EMR with data on S3

Follow the instructions from my previous post on running jobs on EMR, with the following changes.

In the jar location field,

s3n://buckwell/input/mahout-core-0.7-job.jar

Note: the JAR file is in the bucket buckwell, in the folder input/; change the S3 paths accordingly.

In the arguments field,

org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=s3n://buckwell/data/links-converted.txt -Dmapred.output.dir=s3n://buckwell/output --usersFile s3n://buckwell/data/users.txt --booleanData -s SIMILARITY_LOGLIKELIHOOD

Note: the links-converted.txt and users.txt files are in the bucket buckwell, in the folder data/; change the S3 paths accordingly.


Running jobs on Elastic MapReduce (EMR) with data on S3


This article gives step-by-step instructions on creating a job flow on Amazon EMR with data on Amazon S3. I am using the same wordcount.jar file from the previous post and also the same data file.

Upload data and source to S3

Go to your S3 account here and create a bucket; my bucket name is buckwell. Create a folder for the data and another folder called source for the JAR files. Now upload the data file into the folder s3n://buckwell/data/ as shown in Fig 1 below.

Fig 1: Upload haha.txt to data folder

Also upload the code into the folder s3n://buckwell/source/ as shown in Fig 2.

Fig 2: Upload wordcount.jar into s3n://buckwell/source/

Creating job flow on EMR

Log into your EMR console here. Click on the "Create a new job flow" button, type in a job flow name, select the Hadoop version Hadoop 0.20.205 (MapR M3 Edition v1.2.8), then choose to run your own job flow and in the drop-down select Custom JAR, as shown in Fig 3.

Fig 3: EMR job flow

Click Continue and set the input parameters and arguments as in Fig 4.

Fig 4: Job Flow arguments and input parameters

JAR location:

s3n://buckwell/source/wordcount.jar

JAR arguments:

s3n://buckwell/data/haha.txt s3n://buckwell/output

If you have used different folders, change the input parameters accordingly. Also, you don't need to create the output/ folder in your bucket; it will be created automatically during the job flow, and EMR throws an error if the folder already exists. Click Continue, and in the Advanced Options tab set the Log Path to:

s3n://buckwell/logs/

Click Continue and finally create your job flow and close. You should now be back at the Job Flows window, which looks like Fig 5.

Fig 5: Job flow running

It is going to go through its phases of STARTING, BOOTSTRAPPING, RUNNING, and COMPLETED. It should look somewhat like Fig 6:

Fig 6: Completed screen with controller, stderr, stdout, syslog

Click on controller, stderr, stdout, or syslog to look at the logs and error messages. If the job completed, stderr should be empty; otherwise use it to debug. To look at the output file, go back to your S3 bucket and open/download the file s3n://buckwell/output/part-r-00000 with a text editor.

Wordcount MapReduce from command line


To run MapReduce from the command line we need to upload the data onto HDFS and run the MapReduce job from the command prompt, giving the input as an argument. For this exercise we are going to use the source code from the previous post.

Upload data onto HDFS

Create a directory for input files and copy data into HDFS. I am going to use the same data file from the previous post.

[~/workspace/wordcount]$ hadoop dfs -mkdir /user/venu/input/
[~/workspace/wordcount]$ hadoop dfs -put ./lols.txt /user/venu/input/data.txt
[~/workspace/wordcount]$ hadoop dfs -cat /user/venu/input/data.txt
haha
papa
lala
dada
kaka
papa

Creating a .jar file

To create a JAR file, right click on the project and select Export, then Java/JAR file, and click Next. In the next window select .classpath and .project and click Next, check "Export files with compile warnings", click Next, select "Generate manifest file", and in the Main class field type "WordCountDriver".


Run MapReduce

[~/workspace]$ hadoop jar wordcount.jar /user/venu/input/data.txt
12/11/19 22:47:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/11/19 22:47:13 INFO input.FileInputFormat: Total input paths to proc
...
...
...
12/11/19 22:47:17 INFO mapred.JobClient: Reduce output records=5
12/11/19 22:47:17 INFO mapred.JobClient: Spilled Records=12
12/11/19 22:47:17 INFO mapred.JobClient: Map output bytes=54
12/11/19 22:47:17 INFO mapred.JobClient: Combine input records=0
12/11/19 22:47:17 INFO mapred.JobClient: Map output records=6
12/11/19 22:47:17 INFO mapred.JobClient: Reduce input records=6
true

To look at the output

[~/workspace]$ hadoop dfs -lsr
drwxr-xr-x - venu supergroup 0 2012-11-19 22:47 /user/venu/Out1119224713
-rw-r--r-- 1 venu supergroup 35 2012-11-19 22:47 /user/venu/Out1119224713/part-r-00000
drwxr-xr-x - venu supergroup 0 2012-11-19 21:38 /user/venu/input
-rw-r--r-- 1 venu supergroup 30 2012-11-19 21:38 /user/venu/input/data.txt
[~/workspace]$ hadoop dfs -cat /user/venu/Out1119224713/part-r-00000
dada 1
haha 1
kaka 1
lala 1
papa 2

Wordcount MapReduce on Hadoop 0.20.2 using the Eclipse plugin


For this post I assume that you have set up Hadoop on your local system and that it is running; if not, you can get that info here.

Setting up the MapReduce plugin for Eclipse

To set up the Eclipse plugin, copy HADOOP_HOME/hadoop-0.20.2/contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar to the ECLIPSE_HOME/plugins/ folder. Restart Eclipse and click Window/Open Perspective/Other; you should now see the infamous blue elephant Map/Reduce. Select it and you should see the "Map/Reduce Locations" tab. On that tab click the button to add a new Map/Reduce location and set the values as shown below, assuming you have set the HADOOP_HOME/conf/*-site.xml values as I have in my previous post; if not, change the values here accordingly. You should now see DFS Locations in the Project Explorer of Eclipse. Click on the drop-down and you should be able to see the data and code you uploaded onto HDFS.

Project setup and coding

In Eclipse go to File/New/Project, select Map/Reduce Project and click Next, type the name of the project as "WordCount" and click Finish. We define three classes here: a) Driver, b) Mapper, c) Reducer. The Driver contains the main method and wires up the mapper and reducer classes. The mapper turns the input data into intermediate key/value pairs, and the reducer does the actual processing on those pairs. This could even be a multi-level, tree-like structure, where the output of one MapReduce job is the input of another.

To create a Driver class right click on the project and select New/Other/MapReduce folder/MapReduce Driver and click next, name the class WordCountDriver and make sure your code looks something like this.

// WordCountDriver.java
import java.io.IOException;
import java.util.Date;
import java.util.Formatter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCountDriver {

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        args = parser.getRemainingArgs();

        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCountDriver.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        Formatter formatter = new Formatter();
        String outpath = "Out"
                + formatter.format("%1$tm%1$td%1$tH%1$tM%1$tS", new Date());
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(outpath));

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        System.out.println(job.waitForCompletion(true));
    }
}

Similarly, for the Mapper class right click on the project and select New/Other/MapReduce folder/Mapper and click Next, name the class WordCountMapper, and make sure your code looks something like this.


// WordCountMapper.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private Text word = new Text();
    private final static IntWritable one = new IntWritable(1);

    // emit (word, 1) for every token in the input line
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}

For the reducer right click on the project and select New/Other/MapReduce folder/Reducer and click next, name the class WordCountReducer and make sure your code looks something like this.

// WordCountReducer.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Setting Log4j properties

Copy the following into a log4j.properties file in the project's bin folder, i.e. workspace/WordCount/bin/log4j.properties.

log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

Input and run

Let the input file to the wordcount program be as follows, and place it in the project home folder as workspace/WordCount/input.txt.

haha
papa
lala
dada
kaka
papa

Right click on the project and set the run configuration arguments to

  input.txt 

Run the project and you should see a bunch of output lines on the console; the end of the output should look like this

12/11/19 17:05:11 INFO mapred.JobClient: Reduce output records=5
12/11/19 17:05:11 INFO mapred.JobClient: Spilled Records=12
12/11/19 17:05:11 INFO mapred.JobClient: Map output bytes=54
12/11/19 17:05:11 INFO mapred.JobClient: Combine input records=0
12/11/19 17:05:11 INFO mapred.JobClient: Map output records=6
12/11/19 17:05:11 INFO mapred.JobClient: Reduce input records=6
true

Also, WORKSPACE/WordCount/Out{time}/part-r-00000 should look as follows; note "papa" has been correctly counted twice.

haha   1
lala   1
dada   1
kaka   1
papa   2

That was wordcount with the MapReduce plugin on Eclipse.

Setting up hadoop-0.20.2 single node on Ubuntu


In this article I explain how to set up Hadoop 0.20.2, configure it, and transfer data onto the Hadoop Distributed File System (HDFS). I am using 0.20.2 as it is the most widely used version and it comes with a pre-built Eclipse plugin, which is very convenient for writing MapReduce programs. I will not be talking about running a sample program on Hadoop in this article.

Download and untar

From your HOME folder, run the following to download Hadoop 0.20.2 from the Apache archive.


[~]$ wget -c http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz

Untar the downloaded file.


[~]$ tar -zxvf hadoop-0.20.2.tar.gz

Configure

Assuming you have Java installed and JAVA_HOME and PATH set, add the following to the end of your .bash_profile.


export HADOOP_HOME=$HOME/hadoop-0.20.2
export PATH=$PATH:$HADOOP_HOME/bin

and run


source ~/.bash_profile

vi into the HADOOP_HOME/conf/hadoop-env.sh

 [~/hadoop-0.20.2/conf]$ vi hadoop-env.sh 

and add this

 export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386/ 

Copy the following into HADOOP_HOME/conf/core-site.xml, save, and exit.


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Copy the following into HADOOP_HOME/conf/hdfs-site.xml, save, and exit.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>dfs.replication</name>
 <value>1</value>
 <!-- set to 1 to reduce warnings when running on a single node -->
 </property>
</configuration>

Copy the following into HADOOP_HOME/conf/mapred-site.xml, save, and exit.


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
 <name>mapred.job.tracker</name>
 <value>localhost:9001</value>
 </property>
</configuration>

Running hadoop

If your HADOOP_HOME is properly set


[~]$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
 namenode -format format the DFS filesystem
 secondarynamenode run the DFS secondary namenode

you should see the above output; otherwise go to your HADOOP_HOME directory and run.


[~/hadoop-0.20.2]$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
 namenode -format format the DFS filesystem
 secondarynamenode run the DFS secondary namenode
 namenode run the DFS namenode

Format the namenode, datanode, secondarynamenode, and tasktracker


[~/hadoop-0.20.2]$ bin/hadoop namenode -format

[~/hadoop-0.20.2]$ bin/hadoop secondarynamenode -format

[~/hadoop-0.20.2]$ bin/hadoop datanode -format

[~/hadoop-0.20.2]$ bin/hadoop tasktracker -format

Loading data into HDFS

Say you have a *.txt file that you want to analyze on HDFS.


[~/hadoop-0.20.2]$ bin/hadoop dfs -mkdir /user/venu/input_data/

[~/hadoop-0.20.2]$ bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - venu supergroup 0 2012-11-14 23:06 /user/venu/input_data

[~/hadoop-0.20.2]$ bin/hadoop dfs -put ~/clusteranalyze.txt /user/venu/input_data/

[~/hadoop-0.20.2]$ bin/hadoop dfs -cat /user/venu/input_data/clusteranalyze.txt

[~/hadoop-0.20.2]$ bin/hadoop dfs -rmr /user/venu/input_data/clusteranalyze.txt  ######## to delete data from HDFS
Deleted hdfs://localhost:9000/user/venu/input_data/clusteranalyze.txt

FYI, the /user/venu/ directory in this case is the home directory on HDFS.

That was an intro to Hadoop; let's look at MapReduce with Hadoop after the break.

Setting up Amazon EC2 on Ubuntu and creating an instance


In this post I describe how to set up EC2, manage security credentials, and instantiate an EC2 image from the bash shell. You can also create instances from Amazon's GUI, which I will discuss in a separate post.

Signing up for AWS

You can sign up for Amazon Web Services here. If you already have an Amazon account that you use for buying stuff or on your Kindle, you can use the same credentials; Amazon will charge your usage to the same card as you go.

I have configured my command line interface on an Ubuntu system running on VMware inside Windows 7. Alternatively, you can install it in Cygwin. If you have a Mac you can configure it directly, but you would have to use different package-manager commands.

Download the EC2 tools and security credentials

You can download the EC2 API tools with the following command

[~]$ wget http://s3.amazonaws.com/ec2-downloads/ec2-api-tools-1.3-34128.zip

Unzip the EC2 tools and put them in an amazon folder

[~]$ mkdir ~/amazon/
[~]$ cd ~/amazon/
[~/amazon/]$ cp ec2-api-tools-1.3-34128.zip .
[~/amazon/]$ unzip ec2-api-tools-1.3-34128.zip

Amazon uses three types of access credentials; here I talk about setting them up.

[~/amazon]$ mkdir .ec2/ 

You can set up security credentials here. Under the Access Credentials section you see three tabs: Access Keys (for the REST AWS service API), X.509 Certificates (for SOAP API calls), and CloudFront Key Pairs.

Access keys are mostly used by third-party applications to access AWS, such as the Whirr package, which creates instances for processing on the fly, or the EMR command line interface.

Click on the second tab, X.509 Certificates, and create a new certificate; download the certificate (cert-XXXX.pem) and private key (pk-XXXX.pem) into the ~/amazon/.ec2 folder. You can only download the private key at the time of creation, but the X.509 certificate can be downloaded anytime. You can always create more X.509 certificates.

Now click the third tab, Key Pairs, and download the two CloudFront key pairs (rsa-XXXX.pem, pk-XXXX.pem) into the ~/amazon/.ec2 folder.

Add the following values to the end of your .bash_profile.

# EC2
export EC2_KEYPAIR=key1
export EC2_HOME=$HOME/amazon/ec2-api-tools-1.3-34128
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=$HOME/amazon/.ec2/pk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXX.pem
export EC2_CERT=$HOME/amazon/.ec2/cert-XXXXXXXXXXXXXXXXXXXXXXXXXXXXX.pem
#JAVA PATH
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386/
export PATH=$PATH:/usr/lib/jvm/java-6-openjdk-i386/bin

If you already have your Java path set you can skip the last three lines; if not, make sure you have set PATH to the directory of your Java installation. Mine is /usr/lib/jvm; modify it accordingly.

Then run the following command

[~/amazon]$ source ~/.bash_profile

Create Key Pair

Here we create a key pair that we use to create instances and tunnel into them.

[~/amazon]$ ec2-create-keypair testkey
KEYPAIR testkey XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX
-----BEGIN RSA PRIVATE KEY-----
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-----END RSA PRIVATE KEY-----


It prints out a bunch of lines that look like the output above. vi ~/testkey.pem and copy in everything except the first line of that output; for some reason it does not work with the first line included. Also notice that I put the testkey.pem file in the HOME folder; it did not work when it was in a different folder.

You can also create a new key pair with the web UI here, download it to the HOME folder, and delete the first line.

Now you are finally ready to run your first EC2 command.

Create instance

First we need to look at the images available.

[~/amazon]$ ec2-describe-images -a | grep bitnami | grep ubuntu| grep jasper

The above command lists all the available images, and I am filtering it with grep for bitnami (owner), ubuntu (platform), and jasper (a system with JasperServer configured). It is going to print a bunch of images that match this configuration. Copy the ami-XXXXXXXXX number and give it as an argument as shown below. You could use the same AMI number, but this particular image is in US East, so it would be slow if you are in a different part of the world.

[~/amazon]$ ec2-run-instances -k testkey ami-93de08fa
RESERVATION r-c4c01abd 157157267532 default INSTANCE i-XXXXXXXX ami-93de08fa pending key10 m1.small 2012-XX-XXXXX:35:01+0000 us-east-1a aki-XXXXXX monitoring-disabled instance-store paravirtual xen sg-XXXXXXX default



This creates an instance of the image ami-93de08fa and prints info about the instance. Copy the i-XXXXXXXX (instance number) and wait a while for it to go from the pending state to the running state.

[~/amazon]$ ec2-describe-instances i-d634c4a9
RESERVATION r-c4c01abd 157157267532 default INSTANCE i-d634c4a9 ami-93de08fa ec2-XXX-XX-XX-XXX.compute-1.amazonaws.com ip-XX-XXX-XXX-XXX.ec2.internal running key1 0m1.small 2012-XX-XXT01:35:01+0000 us-east-1a aki-825ea7eb monitoring-disabled 107.20.79.202 10.118.142.145 instance-store paravirtual xen

Now copy the ec2-XXX-XX-XX-XXX.compute-1.amazonaws.com hostname and paste it into a web browser to see the Tomcat home page. Click on the green "access my application" button to get to the JasperSoft login page.

SSH into the EC2 instance

To be able to ssh into the remote machine you first need to make sure you have set your security group permissions to accept SSH connections; you can do that here. In the top tab click on the default security group, then in the Inbound tab create a new rule and add SSH to the existing protocols.

 [~/amazon]$ ssh -i ~/key1.pem bitnami@ec2-XXX-XX-XX-XXX.compute-1.amazonaws.com 

Kill the instance

Make sure you kill the instance at the end of your trial; if you don't, Amazon will charge at least 8 cents an hour, and if you don't notice, you could rack up a huge bill.

ec2-terminate-instances i-XXXXXXX