MXNet made simple: Image RecordIO with im2rec and Data Loading

In this post, we will learn how to package any image dataset and how to load it with MXNet while applying data augmentation. Preparing the data for your neural network is often time-consuming and error-prone. This tutorial aims to provide some guidelines for doing it with MXNet.

Oxford-IIIT Dataset

We will use the Oxford-IIIT Dataset to demonstrate how to perform data preparation and data loading.

From the Oxford-IIIT Dataset website:

A 37 category pet dataset with roughly 200 images for each class. The images have a large variation in scale, pose and lighting. Can also be used for localization.

Below are some pet classes from this dataset.

I am not a pet expert and it is always a good idea to look at the dataset to get a feel for the computer vision task ahead. Let’s take a look at some dogs and cats!

Figure: English Cocker Spaniel, Russian Blue, Pug

Downloading the dataset

I wrote a small bash script that fetches the dataset and organizes it into one directory per class, a layout that im2rec can use directly to create the image record files.

$ tree -d -L 1
├── Abyssinian
  ├── Abyssinian_100.jpg
  ├── Abyssinian_101.jpg
  ├── Abyssinian_102.jpg
  ├── Abyssinian_103.jpg
  ├── Abyssinian_104.jpg
├── american_bulldog
├── american_pit_bull_terrier
├── basset_hound
├── beagle
├── Bengal
├── Birman
├── Bombay
├── boxer
├── British_Shorthair
├── chihuahua
├── Egyptian_Mau
├── english_cocker_spaniel
├── english_setter
├── staffordshire_bull_terrier
├── wheaten_terrier
└── yorkshire_terrier

37 directories
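
For readers who want to reproduce this layout, here is a minimal sketch of the organizing step. It assumes the images were extracted into a single flat directory and that each file is named `<class>_<number>.jpg`, as in the listing above. The function name `organize_by_class` and both directory arguments are hypothetical and are not taken from the original script.

```shell
# Sketch: move flat "<class>_<number>.jpg" files into one directory per class.
# organize_by_class, src_dir and dest_dir are illustrative names only.
organize_by_class() {
  local src_dir="$1" dest_dir="$2" img name class
  for img in "$src_dir"/*.jpg; do
    [ -e "$img" ] || continue        # skip the literal pattern when nothing matches
    name="$(basename "$img" .jpg)"   # e.g. american_bulldog_10
    class="${name%_*}"               # strip the trailing _<number>
    mkdir -p "$dest_dir/$class"
    mv "$img" "$dest_dir/$class/"
  done
}
```

A call such as `organize_by_class images "$data_path"` would then produce one folder per breed, which is the structure the `--recursive` option relies on to derive labels.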

# Making .lst and .rec files for MXNet to load
if [ ! -f "$data_path/data_train2.lst" ]; then

  # Cleaning up the images that are failing with OpenCV
  rm -f "$data_path/Abyssinian/Abyssinian_34.jpg"
  rm -f "$data_path/Egyptian_Mau/Egyptian_Mau_139.jpg"
  rm -f "$data_path/Egyptian_Mau/Egyptian_Mau_145.jpg"
  rm -f "$data_path/Egyptian_Mau/Egyptian_Mau_167.jpg"
  rm -f "$data_path/Egyptian_Mau/Egyptian_Mau_177.jpg"
  rm -f "$data_path/Egyptian_Mau/Egyptian_Mau_191.jpg"

  # Pass 1: generate the .lst files with an 80/20 train/validation split
  python $MXNET_HOME/tools/ \
    --list \
    --train-ratio 0.8 \
    --recursive \
    "$data_path/data" "$data_path"

  # Pass 2: pack the listed images into .rec files,
  # with the shorter edge resized to 224 and a square center crop
  python $MXNET_HOME/tools/ \
    --resize 224 \
    --center-crop \
    --num-thread 4 \
    "$data_path/data" "$data_path"

fi
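
The `--list` pass writes plain-text `.lst` files with one image per line and tab-separated fields: an integer index, the label, and the image path relative to the root directory. Before packing the records, it can be worth checking that no class is unexpectedly small. The sketch below counts images per label. The function name `count_per_label` is hypothetical, and the file name in the usage note assumes the `data` prefix passed to the commands above.

```shell
# Sketch: count images per label in an im2rec .lst file.
# Assumes tab-separated columns: index, label, relative path.
count_per_label() {
  awk -F'\t' '{ count[$2]++ } END { for (l in count) print l "\t" count[l] }' "$1" | sort -n
}
```

Running `count_per_label "$data_path/data_train.lst"` prints one line per label followed by its image count, sorted by label.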

(ns mxnet-clj-tutorials.image-record-iter
  "Tutorial for ImageRecordIter API."
  (:require [ :as mx-io]
            [org.apache.clojure-mxnet.ndarray :as ndarray]
            [opencv4.mxnet :as mx-cv]
            [opencv4.core :as cv]
            [opencv4.utils :as cvu]))

;; Parameters
(def batch-size 10)
(def data-shape [3 224 224])
(def train-rec "data/data_train.rec")

(def train-iter
  (mx-io/image-record-iter
    {:path-imgrec train-rec
     :data-name "data"
     :label-name "softmax_label"
     :batch-size batch-size
     :data-shape data-shape

     ;; Data Augmentation (uncomment the options to enable them)
     ; :shuffle true  ;; Whether to shuffle data randomly or not
     ; :max-rotate-angle 50  ;; Rotate by a random degree in [-50 50]
     ; :saturation 0.5
     ; :resize 300  ;; Resize the shorter edge before cropping
     ; :rand-crop true  ;; Randomly crop the image
     ; :rand-mirror true  ;; Randomly mirror the image
     ; :max-shear-ratio 0.5 ;; Randomly shear the image
     }))

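(defn visualize-image-rec-iter!
  "Shows the first `k` images (5 by default) of the current batch,
  then resets the iterator."
  ([image-rec-iter]
   (visualize-image-rec-iter! image-rec-iter 5))
  ([image-rec-iter k]
   (let [nda-data (first (mx-io/iter-data image-rec-iter))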
         mats (map (fn [i]
                     (-> nda-data
                         ;; ith image in batch
                         (ndarray/slice i)
                         (ndarray/reshape data-shape)
                         ;; Swapping axes: [c h w] -> [h w c]
                         (ndarray/swap-axis 0 2)
                         (ndarray/swap-axis 0 1)
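                         ;; Conversion NDArray -> Mat (assumes `ndarray-to-mat`
                         ;; from the required opencv4.mxnet namespace)
                         (mx-cv/ndarray-to-mat)
                         ;; Conversion BGR -> RGB
                         (cv/cvt-color! cv/COLOR_BGR2RGB)))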
                   (range k))]
     (doseq [mat mats]
       (cvu/imshow mat)))
   (mx-io/reset image-rec-iter)))


(comment
  (visualize-image-rec-iter! train-iter 8))
