Combining Neural Network Models for Blood Cell Classification
Abstract
The objective of this study is to evaluate the efficiency of multi-layer neural
network models built by combining a Recurrent Neural Network (RNN) and a
Convolutional Neural Network (CNN) for classifying different kinds of White
Blood Cells (WBCs). This can have applications in the pharmaceutical and
healthcare industries for automating the analysis of blood tests and other
processes that require identifying the type of blood cells in a given image
sample. It can also support the diagnosis of various blood-related diseases in
patients.
Index Terms: CNN, LSTM, RNN, CUDA, WBC Classification
METHODS
The two main
segments of the architecture of our proposed neural network are a Convolutional
Neural Network and a Recurrent Neural Network, both trained using the same
image data.
A. Convolutional Neural Networks
A Convolutional Neural Network (CNN) is a type of neural network containing
cells that extract features from an input by moving a small window, called a
kernel, over it. The kernel slides over the entire input, and the portion of the
input captured in the kernel window is checked for the feature that the cell has
learned to detect. When several convolutional layers are stacked, the early
layers extract low-level features such as edges and the later layers combine
them into high-level features. CNNs are typically applied to images, where a
simple fully connected feedforward network would need unnecessarily large
hidden layers, making it computationally expensive and prone to overfitting.
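To make the sliding-window idea concrete, here is a minimal NumPy sketch of a single kernel passing over a grayscale image with stride 1 and no padding. The function name `conv2d_valid` and the hand-picked edge-detecting kernel are illustrative assumptions; in a real CNN the kernel weights are learned, and many kernels run in parallel over every color channel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2D `image` with stride 1 and no padding.

    Each output value is the sum of the elementwise product between the kernel
    and the image patch under it (cross-correlation, as most CNN libraries do).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(patch * kernel)
    return out

# Image that is dark on the left and bright on the right.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hypothetical vertical-edge kernel: responds where intensity rises left to right.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

response = conv2d_valid(image, kernel)
print(response)  # each row: [0. 3. 3. 0.]
```

The response is large only where the window straddles the dark-to-bright boundary, which is what "checking the window for a learned feature" amounts to numerically.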
B. Recurrent Neural Networks
A Recurrent Neural Network (RNN) is a variation of the standard feedforward
network in which the output of a layer depends not only on the current input
but also on the inputs that came before it, which the network carries forward
in a hidden state. This makes RNNs useful for sequence recognition and
generation, and gives them a significant advantage when earlier inputs help
predict later outputs.
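The dependence on earlier inputs can be illustrated with a minimal sketch of a plain (Elman-style) recurrent layer in NumPy. The weight shapes, the `tanh` activation, and the function name `rnn_forward` are assumptions chosen for illustration; the model described here uses LSTM cells, which add gating on top of this basic recurrence.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple RNN over inputs of shape (T, D); return hidden states (T, H).

    h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b), so each state depends on the
    current input and, through h_{t-1}, on every earlier input.
    """
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
W_x = 0.5 * rng.normal(size=(3, 4))   # input-to-hidden weights (illustrative)
W_h = 0.5 * rng.normal(size=(4, 4))   # hidden-to-hidden weights (illustrative)
b = np.zeros(4)

seq_a = rng.normal(size=(4, 3))
seq_b = seq_a.copy()
seq_b[0] += 1.0                       # change only the FIRST input

states_a = rnn_forward(seq_a, W_x, W_h, b)
states_b = rnn_forward(seq_b, W_x, W_h, b)

# The final states differ although the last inputs are identical.
print(np.abs(states_a[-1] - states_b[-1]).max())
```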
PROCESSES
D. Dataset
The dataset, obtained from Kaggle, contains 12,500 augmented images of blood
cells in JPEG format with accompanying cell type labels. The cell types are
Eosinophil, Lymphocyte, Monocyte, and Neutrophil, with approximately 3,000
images for each of the four types. This dataset is accompanied by an additional
dataset containing the 410 original (pre-augmentation) images, two sub-type
labels, and bounding boxes for each of the cells in these images (JPEG + XML).
It also contains 2,500 augmented images with four additional sub-type labels
(JPEG + CSV). There are approximately 2,000 augmented images for each of the
four classes, compared to 88, 33, 21, and 207 original images. [3]
E. Data Pre-Processing
Since our dataset is small, we augmented the images by rotation, reflection
about the horizontal axis, and shifting both horizontally and vertically. We
also needed to keep the computation time manageable without losing too much
accuracy, so the input images were reduced to 80 × 60 pixels. Scaling
transformations are not applied, because correctly identifying the type of cell
depends on the size of the nuclei.
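The pre-processing steps can be sketched with plain NumPy as below. The rotation angles, shift distances, source image size (240 × 320), and the 60-row × 80-column orientation of the 80 × 60 target are not specified in the text and are assumptions here; rotation is restricted to 180 degrees only because that keeps the image shape unchanged, whereas arbitrary-angle rotation would normally come from an image library. Consistent with the text, no zoom or scaling augmentation is applied; the single resize is applied identically to every image, so relative nucleus sizes are preserved.

```python
import numpy as np

def reflect(img):
    # Mirror about the horizontal axis: the top row becomes the bottom row.
    return img[::-1]

def rotate180(img):
    # Shape-preserving rotation (a simplification of arbitrary-angle rotation).
    return img[::-1, ::-1]

def shift(img, dy, dx):
    # Translate by (dy, dx) pixels, zero-filling the vacated border.
    h, w = img.shape[:2]
    assert abs(dy) < h and abs(dx) < w
    out = np.zeros_like(img)
    dst_r = slice(max(dy, 0), h + min(dy, 0))
    dst_c = slice(max(dx, 0), w + min(dx, 0))
    src_r = slice(max(-dy, 0), h + min(-dy, 0))
    src_c = slice(max(-dx, 0), w + min(-dx, 0))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out

def resize_nearest(img, out_rows, out_cols):
    # Nearest-neighbour downsampling; keeps the sketch dependency-free.
    h, w = img.shape[:2]
    rows = np.arange(out_rows) * h // out_rows
    cols = np.arange(out_cols) * w // out_cols
    return img[rows[:, None], cols[None, :]]

rng = np.random.default_rng(0)
original = rng.random((240, 320, 3))                  # assumed source size
augmented = shift(rotate180(reflect(original)), 5, -8)
small = resize_nearest(augmented, 60, 80)             # 60 rows x 80 columns
print(small.shape)  # (60, 80, 3)
```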
The four cell types are transformed into four-dimensional vectors using one-hot
encoding, i.e., all components are zero except the one corresponding to the
appropriate class. For example, the cell type 'NEUTROPHIL' may be encoded as
[0,0,1,0], and 'MONOCYTE' as [1,0,0,0].
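A minimal sketch of this encoding follows. The class ordering in `CLASSES` is hypothetical, picked only so that the output matches the two examples above; any fixed ordering works, as long as training and inference use the same one.

```python
import numpy as np

# Hypothetical class order; only MONOCYTE (index 0) and NEUTROPHIL (index 2)
# are pinned down by the examples in the text.
CLASSES = ["MONOCYTE", "EOSINOPHIL", "NEUTROPHIL", "LYMPHOCYTE"]

def one_hot(label):
    # All components zero except the one at the label's class index.
    vec = np.zeros(len(CLASSES), dtype=int)
    vec[CLASSES.index(label)] = 1
    return vec

print(one_hot("NEUTROPHIL"))  # [0 0 1 0]
print(one_hot("MONOCYTE"))    # [1 0 0 0]
```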
Model
Our model consists of five parts, detailed below: the Input Layer, the CNN
Layer, the RNN Layer, the Merge Layer, and the Output Layer.
1) Input Layer: This is the simplest layer of the network. It takes the image
data as input and converts it into a tensor of the appropriate size: image rows
× image columns × 3 (one channel for each primary color).
2) CNN Layer: This segment consists of convolutional cells that scan the image
for features. In our model, it contains 4 layers.
The first layer is a convolutional layer with 32 cells, and the second is a
convolutional layer with 64 cells; each uses a 3 × 3 kernel. They are followed
by pooling to reduce the size of the output, a dropout layer to reduce
overfitting, and a layer that flattens the output to 2D (one feature vector per
image).
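The tensor shapes through this branch can be traced with a short calculation. The text does not state the padding, stride, or pooling window, so this sketch assumes "valid" (unpadded) convolutions with stride 1, 2 × 2 max pooling, and a 60 × 80 × 3 input; dropout is omitted because it does not change shapes. The helper names are illustrative.

```python
import numpy as np

def conv_out(size, kernel=3):
    # 'valid' convolution, stride 1 (assumed): each side shrinks by kernel - 1.
    return size - kernel + 1

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 on (rows, cols, channels); odd edges dropped.
    r, c, ch = x.shape
    x = x[: r // 2 * 2, : c // 2 * 2]
    return x.reshape(r // 2, 2, c // 2, 2, ch).max(axis=(1, 3))

rows, cols = 60, 80                           # assumed input orientation
rows, cols = conv_out(rows), conv_out(cols)   # conv, 32 cells -> 58 x 78 x 32
rows, cols = conv_out(rows), conv_out(cols)   # conv, 64 cells -> 56 x 76 x 64
pooled = max_pool_2x2(np.zeros((rows, cols, 64)))
flat = pooled.reshape(1, -1)                  # flatten to (batch, features)
print(pooled.shape, flat.shape)               # (28, 38, 64) (1, 68096)
```

Under these assumptions the flattened vector has 68,096 features per image, far more than the 64 used at the merge point, which indicates how much the later layers must compress the convolutional features.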
3) RNN Layer: This layer consists of LSTM units that learn to detect recurring
patterns in the images. In our model, the image is converted to grayscale before
it is passed to the RNN, to reduce overfitting caused by learning color
patterns. Such overfitting is likely because the dataset is quite small (even
after augmentation) and, because of their similarity to standard feedforward
networks, RNNs cope with color-induced overfitting less well than CNNs do. Our
model has 2 LSTM layers.
Each LSTM layer uses 64 LSTM units and is followed by a dropout layer to reduce
overfitting.
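The text does not say how an image is presented to the LSTMs as a sequence. A common choice, assumed here, is to treat each grayscale image row as one time step whose features are the column intensities. The sketch below also assumes ITU-R BT.601 luma weights for grayscale conversion, a particular gate ordering, small random weights, and that the last hidden state of the second layer forms the 64 values passed to the merge layer; dropout is left out for brevity.

```python
import numpy as np

def to_grayscale(img):
    # (rows, cols, 3) RGB -> (rows, cols) using BT.601 luma weights (assumed).
    return img @ np.array([0.299, 0.587, 0.114])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(seq, W, U, b):
    """Run one LSTM layer over seq (T, D) and return every hidden state (T, H).

    W: (D, 4H) input weights, U: (H, 4H) recurrent weights, b: (4H,) bias,
    with gates stacked in the (assumed) order input, forget, candidate, output.
    """
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in seq:
        z = x_t @ W + h @ U + b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g          # cell state keeps or forgets past information
        h = o * np.tanh(c)         # hidden state passed on to the next step/layer
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
image = rng.random((60, 80, 3))
seq = to_grayscale(image)          # 60 time steps (rows) x 80 features (columns)

units = 64
W1, U1 = rng.normal(0, 0.1, (80, 4 * units)), rng.normal(0, 0.1, (units, 4 * units))
W2, U2 = rng.normal(0, 0.1, (units, 4 * units)), rng.normal(0, 0.1, (units, 4 * units))
b = np.zeros(4 * units)

layer1 = lstm_layer(seq, W1, U1, b)      # (60, 64): full sequence for layer 2
layer2 = lstm_layer(layer1, W2, U2, b)   # (60, 64)
rnn_features = layer2[-1]                # last state: 64 values for merging
print(layer1.shape, rnn_features.shape)  # (60, 64) (64,)
```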
4) Merge Layer: The merge layer takes the outputs of the CNN layer and the RNN
layer and returns their elementwise product. Both branches were constructed to
have 64 neurons at the point of merging, so the product is well defined and
combines the CNN and RNN features into a single 64-dimensional vector. A
128-cell layer with ReLU activation follows for another round of processing,
followed by a final dropout layer that feeds into the output layer.
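A minimal NumPy sketch of the merge step follows. The branch outputs are random stand-ins, the dense weights are hypothetical (bias omitted), and the dropout rate of 0.5 is an assumption, since the text does not give one. The elementwise product only works because both branches emit exactly 64 values; concatenation is the common alternative and would yield 128 values instead.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dropout(x, rate, rng, training=True):
    # Inverted dropout: zero a fraction `rate` of units and rescale the rest
    # during training; identity at inference time.
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
cnn_features = rng.normal(size=(32, 64))     # stand-in CNN branch output (batch 32)
rnn_features = rng.normal(size=(32, 64))     # stand-in RNN branch output (batch 32)

merged = cnn_features * rnn_features         # elementwise product, still (32, 64)
W = rng.normal(0, 0.1, (64, 128))            # hypothetical dense weights
hidden = relu(merged @ W)                    # 128-cell ReLU layer -> (32, 128)
hidden = dropout(hidden, rate=0.5, rng=rng)  # assumed dropout rate
print(merged.shape, hidden.shape)            # (32, 64) (32, 128)
```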
5) Output Layer: The output layer takes the output of the preceding 128-cell
layer, maps it to one cell per class, and applies the softmax function to
evaluate which class, as defined by the input set, the given input image most
likely falls into, according to equation (9). Since each image in the input set
belongs to exactly one class, we use softmax, which converts the outputs into a
probability distribution over mutually exclusive classes.
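For reference, the standard softmax maps scores z to exp(z_k) / sum_j exp(z_j). The sketch below is a numerically stable NumPy version; the example scores are hypothetical, and subtracting the maximum score before exponentiating leaves the result unchanged while avoiding overflow.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0, 0.1]])   # hypothetical scores, four classes
probs = softmax(logits)
predicted_class = probs.argmax(axis=-1)      # index of the most likely class
print(probs.round(3), predicted_class)
```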
Training
For training the network, we used the split predefined in the dataset. Both
branches of the network, the CNN and the RNN, are trained simultaneously. The
loss function is categorical cross-entropy, since this is a four-category
classification problem. The optimizer is Adadelta, which adapts the learning
rate for each parameter based on previous gradients.
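Both training ingredients can be sketched in a few lines of NumPy. The Adadelta update below keeps running averages of squared gradients and squared updates per parameter and scales each step by their ratio, multiplied by a global learning rate (1.0, as in the text). The decay `rho=0.95` and `eps=1e-6` are assumed values, and the toy objective f(x) = x² is only there to show the update moving against the gradient.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    # Batch mean of -sum_k y_k * log(p_k); clipping avoids log(0).
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))

def adadelta_step(x, grad, state, lr=1.0, rho=0.95, eps=1e-6):
    """One Adadelta update. `state` holds running averages of squared
    gradients (Eg2) and squared updates (Edx2), one entry per parameter."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad ** 2
    update = -np.sqrt(state["Edx2"] + eps) / np.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * update ** 2
    return x + lr * update

y_true = np.array([[0, 0, 1, 0]])
uniform = np.array([[0.25, 0.25, 0.25, 0.25]])
print(categorical_cross_entropy(y_true, uniform))  # log(4), about 1.386

# Minimise f(x) = x^2 from x = 5; the gradient is 2x.
x = np.array([5.0])
state = {"Eg2": np.zeros_like(x), "Edx2": np.zeros_like(x)}
for _ in range(3):
    x = adadelta_step(x, 2 * x, state)
print(x)
```

Note that the effective step size starts very small (of order sqrt(eps)) and grows as the running average of past updates accumulates, which is how Adadelta adapts without a hand-tuned schedule.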
For
both the four-way and the two-way classification problems, the model is trained
for 70 epochs, with an initial learning rate of 1.0 and a batch size of 32.
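The relationship between epochs, batch size, and the number of optimizer updates can be sketched as below. The training-set size of 100 is hypothetical, chosen only to keep the arithmetic easy to follow, and reshuffling the sample order every epoch is an assumption about the training loop.

```python
import numpy as np

def minibatches(n_samples, batch_size, rng):
    # Shuffle sample indices once per epoch and yield them in chunks of
    # batch_size; the last batch is smaller if n_samples % batch_size != 0.
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
n_train = 100                 # hypothetical training-set size
epochs, batch_size = 70, 32

updates = 0
for epoch in range(epochs):
    for batch_idx in minibatches(n_train, batch_size, rng):
        updates += 1          # one forward/backward pass and optimizer step
print(updates)                # 70 * ceil(100 / 32) = 70 * 4 = 280
```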
EXPERIMENTS AND RESULTS
For comparison, we also tested a CNN-only solution to the problem under the
same conditions as the hybrid solution.
A table comparing the two is
shown below: