It's been a while since I wrote a post. I have recently been working with Convolutional Neural Networks for object detection, and one of the important algorithms there is Intersection over Union (IOU), also known as the Jaccard similarity coefficient.
In this post I talk about vectorizing the IOU calculation and benchmarking it in NumPy and TensorFlow.
In the YOLO, YOLOv2 and SqueezeDet papers, duplicate detections are eliminated by applying Non-Max Suppression (NMS). IOU is computed between the bounding boxes (bboxes) with the highest confidence and all the other bboxes. Boxes that have a high IOU with a high-confidence bbox are suppressed, hence the name Non-Max Suppression.
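To make the suppression step concrete, here is a minimal sketch of greedy NMS in plain Python. The function names `iou` and `nms`, the 0.5 threshold and the sample boxes are my own illustrative assumptions, not something taken from the papers above. The `+ 1` follows the inclusive-pixel convention used in the rest of this post.

```python
def iou(a, b):
    """IOU of two boxes given as [x1, y1, x2, y2] with inclusive pixel coordinates."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]) + 1, 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]) + 1, 0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections: the first two overlap heavily, the third is far away
boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box is suppressed by the first
```

Note that this sketch calls `iou` once per pair inside a Python loop, which is exactly the cost the vectorized version in this post tries to avoid.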
This post from pyimagesearch is a good read on the algorithm for IOU. That approach loops over the boxes to compute IOU. I vectorized the IOU algorithm in NumPy to improve speed and measured the wall time using Python's time.time().
    import numpy as np
    from time import time


    def np_vec_no_jit_iou(boxes1, boxes2):
        def run(bboxes1, bboxes2):
            # Split the (N, 4) and (M, 4) arrays into (N, 1) and (M, 1) coordinate columns
            x11, y11, x12, y12 = np.split(bboxes1, 4, axis=1)
            x21, y21, x22, y22 = np.split(bboxes2, 4, axis=1)
            # An (N, 1) column against a (1, M) row broadcasts to an (N, M) grid of box pairs
            xA = np.maximum(x11, np.transpose(x21))
            yA = np.maximum(y11, np.transpose(y21))
            xB = np.minimum(x12, np.transpose(x22))
            yB = np.minimum(y12, np.transpose(y22))
            # The +1 treats the coordinates as inclusive pixel indices
            interArea = np.maximum((xB - xA + 1), 0) * np.maximum((yB - yA + 1), 0)
            boxAArea = (x12 - x11 + 1) * (y12 - y11 + 1)
            boxBArea = (x22 - x21 + 1) * (y22 - y21 + 1)
            iou = interArea / (boxAArea + np.transpose(boxBArea) - interArea)
            return iou

        # Measure the wall time of a single call
        tic = time()
        run(boxes1, boxes2)
        toc = time()
        return toc - tic
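The vectorization relies on NumPy broadcasting: an (N, 1) column combined with a (1, M) row yields an (N, M) result, one entry per pair of boxes. The sketch below is a sanity check under my own assumptions (the names `pairwise_iou` and `loop_iou` and the sample boxes are hypothetical). It compares the broadcast computation against a plain double loop.

```python
import numpy as np


def pairwise_iou(bboxes1, bboxes2):
    """Vectorized IOU: (N, 4) and (M, 4) boxes -> (N, M) matrix."""
    x11, y11, x12, y12 = np.split(bboxes1, 4, axis=1)  # each (N, 1)
    x21, y21, x22, y22 = np.split(bboxes2, 4, axis=1)  # each (M, 1)
    xA = np.maximum(x11, x21.T)  # (N, 1) vs (1, M) -> (N, M)
    yA = np.maximum(y11, y21.T)
    xB = np.minimum(x12, x22.T)
    yB = np.minimum(y12, y22.T)
    inter = np.maximum(xB - xA + 1, 0) * np.maximum(yB - yA + 1, 0)
    area1 = (x12 - x11 + 1) * (y12 - y11 + 1)  # (N, 1)
    area2 = (x22 - x21 + 1) * (y22 - y21 + 1)  # (M, 1)
    return inter / (area1 + area2.T - inter)


def loop_iou(bboxes1, bboxes2):
    """Reference version: one pair of boxes at a time."""
    out = np.zeros((len(bboxes1), len(bboxes2)))
    for i, a in enumerate(bboxes1):
        for j, b in enumerate(bboxes2):
            w = max(min(a[2], b[2]) - max(a[0], b[0]) + 1, 0)
            h = max(min(a[3], b[3]) - max(a[1], b[1]) + 1, 0)
            inter = w * h
            area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
            area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
            out[i, j] = inter / (area_a + area_b - inter)
    return out


# Two boxes against three boxes gives a 2 x 3 IOU matrix
a = np.array([[0, 0, 9, 9], [5, 5, 14, 14]])
b = np.array([[0, 0, 9, 9], [20, 20, 29, 29], [5, 0, 14, 9]])
vec = pairwise_iou(a, b)
ref = loop_iou(a, b)
```

Both versions should agree entry by entry. The difference is only that the broadcast version does all N x M comparisons in a handful of array operations.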
Then I converted the code to TensorFlow by simply replacing “np” with “tf”. This test was run on an Nvidia Titan X.
Creating the session and placeholders for bboxes in TensorFlow:
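    import tensorflow as tf  # TensorFlow 1.x API (Session and placeholder)

    # `conf` is not defined in this snippet; tf.Session expects a tf.ConfigProto here
    sess = tf.Session(config=conf)
    # Note: float16 cannot represent values above 65504, so the areas of large boxes
    # overflow; switch to tf.float32 if the IOU values matter and not only the timing
    tf_bboxes1 = tf.placeholder(dtype=tf.float16, shape=[None, 4])
    tf_bboxes2 = tf.placeholder(dtype=tf.float16, shape=[None, 4])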
TensorFlow implementation of vectorized IOU:
    def tf_iou(boxes1, boxes2):
        def run(tb1, tb2):
            x11, y11, x12, y12 = tf.split(tb1, 4, axis=1)
            x21, y21, x22, y22 = tf.split(tb2, 4, axis=1)
            xA = tf.maximum(x11, tf.transpose(x21))
            yA = tf.maximum(y11, tf.transpose(y21))
            xB = tf.minimum(x12, tf.transpose(x22))
            yB = tf.minimum(y12, tf.transpose(y22))
            interArea = tf.maximum((xB - xA + 1), 0) * tf.maximum((yB - yA + 1), 0)
            boxAArea = (x12 - x11 + 1) * (y12 - y11 + 1)
            boxBArea = (x22 - x21 + 1) * (y22 - y21 + 1)
            iou = interArea / (boxAArea + tf.transpose(boxBArea) - interArea)
            return iou

        op = run(tf_bboxes1, tf_bboxes2)
        # Warm-up run, so one-off setup cost is not included in the measurement
        sess.run(op, feed_dict={tf_bboxes1: boxes1, tf_bboxes2: boxes2})
        # Timed run
        tic = time()
        sess.run(op, feed_dict={tf_bboxes1: boxes1, tf_bboxes2: boxes2})
        toc = time()
        return toc - tic
Benchmark the different versions in terms of execution time by simulating bboxes:
    def get_2_bbxoes(num_boxes_in_1=10001, num_boxes_in_2=10001):
        # Generate random coordinates of [x1, y1, x2, y2].
        # The four values are drawn independently, so x1 > x2 can occur;
        # that is fine for timing, but the IOU values themselves are not meaningful.
        boxes1 = np.reshape(np.random.randint(high=1200, low=0, size=num_boxes_in_1 * 4),
                            newshape=(num_boxes_in_1, 4))
        boxes2 = np.reshape(np.random.randint(high=1200, low=0, size=num_boxes_in_2 * 4),
                            newshape=(num_boxes_in_2, 4))
        return boxes1, boxes2


    # box1_max, box1_step, box2_max and box2_step are not defined in this excerpt
    for num_boxes_in_1, num_boxes_in_2 in zip(range(1, box1_max, box1_step),
                                              range(10, box2_max, box2_step)):
        print("num_boxes_in_1: {}, \t num_boxes_in_2: {}".format(num_boxes_in_1, num_boxes_in_2))
        boxes1, boxes2 = get_2_bbxoes(num_boxes_in_1, num_boxes_in_2)
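A single time.time() measurement can be noisy, and the first call often includes one-off setup cost (which is why the TensorFlow function above runs the op once before timing it). As a possible refinement, here is a minimal sketch of a repeat-and-take-the-best timer using only the standard library. `best_of` and its parameters are hypothetical names and are not part of the benchmark I ran.

```python
from time import perf_counter


def best_of(fn, *args, repeats=5, warmup=1):
    """Return the fastest wall time (in seconds) of `repeats` calls to fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # discard warm-up runs (caches, lazy initialisation)
    timings = []
    for _ in range(repeats):
        tic = perf_counter()
        fn(*args)
        timings.append(perf_counter() - tic)
    return min(timings)


# Toy workload standing in for an IOU call: 1 warm-up call plus 3 timed calls
calls = []
elapsed = best_of(lambda n: calls.append(sum(range(n))), 10_000, repeats=3)
```

Taking the minimum rather than the mean is a design choice: it reports the run least disturbed by other processes on the machine.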
Data frame and processing time plots
Another interesting observation here is that the plain NumPy version of IOU performs better than the vectorized version when the number of boxes is greater than 2000.
Conclusion:
The TensorFlow-based vectorized IOU works better when the number of boxes is more than 1000. So when the number of boxes is low, use NumPy's vectorized code, and when you have a high number of boxes, use TensorFlow.
The full code can be found here