
Combining Neural Network Models for Blood Cell Classification

Abstract

The objective of this study is to evaluate the efficiency of multi-layer neural network models built by combining a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN) for classifying different kinds of White Blood Cells (WBCs). Such models have applications in the pharmaceutical and healthcare industries for automating the analysis of blood tests and other processes that require identifying the type of blood cells in an image sample. They could also assist in diagnosing various blood-related diseases in patients.

Index Terms—CNN, LSTM, RNN, CUDA, WBC Classification

METHODS

The two main segments of the architecture of our proposed neural network are a Convolutional Neural Network and a Recurrent Neural Network, both trained using the same image data.

A. Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a type of neural network whose cells extract features from an input by sliding a small window, called a kernel, over it. As the kernel moves across the entire input, the portion of the image inside the kernel window is checked for the feature that the cell has learned to detect. Stacked in successive layers, convolutional layers can extract both low-level and high-level features. CNNs are typically applied to images, where a plain feedforward network would need unnecessarily large hidden layers, making it computationally expensive and prone to overfitting.
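The sliding-kernel operation can be illustrated with a minimal pure-Python sketch for a single-channel image. The stride of 1 and the absence of padding are assumptions, since the paper does not state either. Like most CNN libraries, the sketch computes cross-correlation, meaning the kernel is not flipped. The edge-detecting kernel is hand-written for illustration; in a trained CNN the kernel weights are learned.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over a single-channel `image` (stride 1, no padding).

    Each output value is the sum of elementwise products between the kernel
    and the image window currently under it (cross-correlation, as in CNNs).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# 3 x 6 image: dark on the left, bright on the right (a vertical edge).
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hand-written vertical-edge detector; a CNN would learn weights like these.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(conv2d_valid(image, kernel))  # [[0.0, 3.0, 3.0, 0.0]]
```

The response is zero over the flat regions and peaks only where the window straddles the edge, which is the sense in which a cell "detects" its feature.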

B. Recurrent Neural Networks

A Recurrent Neural Network (RNN) is a variation of the standard feedforward network in which the output of a layer depends not only on the current input but also on the inputs that came before it. This makes RNNs useful for detecting and generating sequences, and gives them a significant advantage whenever earlier inputs help predict later outputs.
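The paper does not spell out the recurrence itself. In the standard (Elman) formulation, the hidden state $\mathbf{h}_t$ at step $t$ combines the current input $\mathbf{x}_t$ with the previous state $\mathbf{h}_{t-1}$. The weight matrices $W_x$, $W_h$ and the bias $\mathbf{b}$ are notation introduced here:

```latex
\mathbf{h}_t = \tanh\left( W_x \mathbf{x}_t + W_h \mathbf{h}_{t-1} + \mathbf{b} \right),
\qquad \mathbf{h}_0 = \mathbf{0}
```

Unrolling the recurrence shows that $\mathbf{h}_t$ depends on every input $\mathbf{x}_1, \dots, \mathbf{x}_t$. This dependence is what lets earlier inputs inform later outputs.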

PROCESSES

D. Dataset

The dataset, obtained from Kaggle, contains 12,500 augmented images of blood cells in JPEG format with accompanying cell type labels. The cell types are Eosinophil, Lymphocyte, Monocyte, and Neutrophil, with approximately 3,000 images for each of the four types. This dataset is accompanied by an additional dataset containing the 410 original, pre-augmentation images, together with two subtype labels and bounding boxes for each of the cells in these images (JPEG + XML). It also contains 2,500 augmented images with four additional subtype labels (JPEG + CSV). There are approximately 2,000 augmented images for each of the four classes, compared with 88, 33, 21, and 207 original images. [3]

E. Data Pre-Processing

Since our dataset is small, we augmented the images by rotation, reflection about the horizontal axis, and horizontal and vertical shifts. To keep the computation time manageable without losing too much accuracy, we reduced the input image size to 80 × 60. Scaling transformations are not applied, because correctly identifying the cell type depends on the size of the nuclei.
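The three augmentations can be sketched on an image stored as a list of pixel rows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Rotation is restricted to 90° here for simplicity, since the paper does not give its rotation angles. Vacated pixels after a shift are filled with zeros, which is also an assumption. None of these operations rescales the image, consistent with the decision to avoid scaling.

```python
def reflect_horizontal_axis(img):
    """Mirror the image about its horizontal axis (top rows become bottom rows)."""
    return [row[:] for row in reversed(img)]


def shift(img, dy, dx, fill=0):
    """Translate the image by dy rows (down) and dx columns (right).

    Pixels shifted in from outside the image are set to `fill`.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            si, sj = i - dy, j - dx  # source pixel that lands on (i, j)
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = img[si][sj]
    return out


def rotate90(img):
    """Rotate the image 90 degrees clockwise (simplified stand-in for rotation)."""
    return [list(row) for row in zip(*img[::-1])]


img = [[1, 2, 3],
       [4, 5, 6]]
print(reflect_horizontal_axis(img))  # [[4, 5, 6], [1, 2, 3]]
print(shift(img, 0, 1))              # [[0, 1, 2], [0, 4, 5]]
print(rotate90(img))                 # [[4, 1], [5, 2], [6, 3]]
```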

The four cell types are converted into four-dimensional vectors using one-hot encoding, i.e., all components are zero except the one corresponding to the cell's class.

For example, the cell type 'NEUTROPHIL' may be encoded as [0,0,1,0], and 'MONOCYTE' as [1,0,0,0].
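A minimal encoding sketch follows. The class ordering in `CLASSES` is hypothetical. It was chosen only so that the output reproduces the example above, and any fixed ordering works as long as training and evaluation use the same one.

```python
# Hypothetical class ordering, picked to match the example encodings above.
CLASSES = ["MONOCYTE", "LYMPHOCYTE", "NEUTROPHIL", "EOSINOPHIL"]


def one_hot(label, classes=CLASSES):
    """Return a vector of zeros with a single 1 at the index of `label`."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1  # raises ValueError for an unknown label
    return vec


print(one_hot("NEUTROPHIL"))  # [0, 0, 1, 0]
print(one_hot("MONOCYTE"))    # [1, 0, 0, 0]
```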

Model

Our model consists of five parts, detailed below: the Input Layer, the CNN Layer, the RNN Layer, the Merge Layer, and the Output Layer.

1) Input Layer: This is the simplest layer of the network. It takes the image data as input and converts it into a tensor of the appropriate size, rows × columns × 3 (one channel for each RGB color).

2) CNN Layer: This segment consists of convolutional cells that scan the image for features. In our model, it contains 4 layers.

The first layer is a convolutional layer that consists of 32 cells, and the second layer is also a convolutional layer that consists of 64 cells.

Each convolutional layer uses a 3 × 3 kernel. The convolutions are followed by a pooling layer to reduce the size of the output, a dropout layer to reduce overfitting, and a layer that flattens the output to 2D (one feature vector per image in the batch).
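The pooling, dropout, and flatten steps can be sketched on a single feature map. The 2 × 2 pool size, the inverted-dropout formulation, and the helper names are assumptions for illustration, since the paper specifies none of them. Dropout is applied only during training. The seeded random generator just makes runs repeatable.

```python
import random


def max_pool2x2(fmap):
    """2 x 2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def dropout(values, rate, rng):
    """Inverted dropout (training time): zero each value with probability
    `rate` and scale survivors by 1 / (1 - rate) to keep the expected value."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def flatten(fmap):
    """Concatenate the rows of a feature map into a single vector."""
    return [v for row in fmap for v in row]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 1]]
pooled = max_pool2x2(fmap)          # [[4, 5], [3, 7]]
flat = flatten(pooled)              # [4, 5, 3, 7]
dropped = dropout(flat, 0.5, random.Random(0))
```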

3) RNN Layer: This layer consists of LSTMs that learn to detect recurring features in the images. In our model, the image is converted to grayscale before being passed to the RNN, to reduce overfitting caused by learning color patterns. Such overfitting is likely because the dataset is quite small, even after augmentation, and RNNs, being closer to standard feedforward networks, are less robust than CNNs to overfitting on image color. Our model has 2 LSTM layers.

Each LSTM layer has 64 units and is followed by a dropout layer to reduce overfitting.
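The RNN branch can be sketched in pure Python under several assumptions that the paper does not confirm. The grayscale conversion uses ITU-R BT.601 luma weights. Each image row is fed to the LSTM as one timestep, with the row's pixels as features. The LSTM uses the standard input, forget, and output gates. The demo uses 4 units per layer instead of 64 and omits the dropout layers. The branch output is the last hidden state of the second stacked layer.

```python
import math
import random


def to_grayscale(rgb_img):
    """Convert an H x W image of (R, G, B) tuples into H x W luma values.

    Assumption: ITU-R BT.601 weights; the paper does not name its conversion.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_img]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LSTMLayer:
    """One LSTM layer: input (i), forget (f), output (o) gates and candidate (c)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                   for _ in range(n_hidden)]
            for gate in "ifoc"
        }
        self.b = {gate: [0.0] * n_hidden for gate in "ifoc"}

    def run(self, sequence):
        """Feed the sequence one timestep at a time; return every hidden state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        states = []
        for x in sequence:
            z = list(x) + h
            pre = {gate: [s + b for s, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]       # how much new content to write
            f = [sigmoid(v) for v in pre["f"]]       # how much old cell state to keep
            o = [sigmoid(v) for v in pre["o"]]       # how much cell state to expose
            cand = [math.tanh(v) for v in pre["c"]]  # candidate cell content
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            states.append(h)
        return states


# Tiny 3 x 2 RGB image with values in [0, 1]; each grayscale row is one timestep.
rgb = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.5, 0.5), (0.0, 0.0, 0.0)],
]
gray = to_grayscale(rgb)                        # 3 timesteps x 2 features
layer1 = LSTMLayer(n_in=2, n_hidden=4, seed=0)  # the paper uses 64 units
layer2 = LSTMLayer(n_in=4, n_hidden=4, seed=1)  # second stacked LSTM layer
rnn_out = layer2.run(layer1.run(gray))[-1]      # final hidden state of the branch
```

Stacking works because the first layer returns its hidden state at every timestep, so the second layer sees a sequence rather than a single vector.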

4) Merge Layer: The merge layer takes the outputs of the CNN and RNN layers and returns their elementwise product. Both branches were constructed to output 64 neurons at the point of merging, so the product combines the CNN and RNN features. This is followed by a 128-cell layer with ReLU activation for another round of processing, and then by a final dropout layer that feeds into the output layer.
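The merge step can be sketched with toy three-element vectors standing in for the two 64-neuron branch outputs. A ReLU layer with two units replaces the 128-unit layer. The weights are arbitrary illustrative values, and the helper names are hypothetical. The length check reflects why both branches must produce the same number of neurons before an elementwise product.

```python
def merge_product(cnn_out, rnn_out):
    """Elementwise (Hadamard) product of the two branch outputs."""
    if len(cnn_out) != len(rnn_out):
        raise ValueError("both branches must output vectors of the same length")
    return [a * b for a, b in zip(cnn_out, rnn_out)]


def dense_relu(x, weights, bias):
    """Fully connected layer followed by ReLU: max(0, w . x + b) per unit."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


cnn_out = [0.5, -1.0, 2.0]  # stand-in for the 64-dim CNN branch output
rnn_out = [2.0, 3.0, 0.5]   # stand-in for the 64-dim RNN branch output
merged = merge_product(cnn_out, rnn_out)         # [1.0, -3.0, 1.0]
hidden = dense_relu(merged,
                    weights=[[1.0, 0.0, 1.0],
                             [0.0, 1.0, 0.0]],
                    bias=[0.0, 0.5])             # [2.0, 0.0]
```

A feature survives the product only if both branches respond to it. A zero in either branch zeroes the combined value.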

5) Output Layer: The output layer has one cell per class. It receives the 128-dimensional output of the preceding layer and applies the softmax function, according to equation (9), to evaluate which class the input image most likely belongs to. Since each image in the input set belongs to exactly one class, softmax is a natural choice: it turns the raw scores into a probability distribution over mutually exclusive classes.
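The softmax is commonly written $p_k = e^{z_k} / \sum_j e^{z_j}$ for raw scores $z_k$. The sketch below is a minimal version. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result. The four scores are made up for illustration and are not model outputs.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to 1 (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1, -1.0])  # one illustrative score per cell type
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
```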

Training

To train the network, we used the split predefined in the dataset. Both branches of the network, the CNN and the RNN, are trained simultaneously. The loss function is categorical cross-entropy, which suits our four-category classification problem. The optimizer is Adadelta, which adapts each parameter's learning rate based on previous gradients.
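For a one-hot target $y$ and predicted distribution $p$, categorical cross-entropy is $-\sum_k y_k \log p_k$. With a one-hot target, this reduces to the negative log-probability assigned to the true class. The sketch below is minimal, and the clipping constant `eps` is an assumption added to avoid $\log 0$.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """-sum_k y_k * log(p_k) for a one-hot target and a predicted distribution.

    Probabilities are clipped to [eps, 1 - eps] so log never receives 0.
    """
    return -sum(y * math.log(min(max(p, eps), 1.0 - eps))
                for y, p in zip(y_true, y_pred))


target = [0, 0, 1, 0]  # one-hot label
good = categorical_cross_entropy(target, [0.1, 0.1, 0.7, 0.1])  # -log(0.7), about 0.357
bad = categorical_cross_entropy(target, [0.7, 0.1, 0.1, 0.1])   # -log(0.1), about 2.303
```

Confident mistakes are penalised much more heavily than confident correct answers are rewarded. This is what drives the softmax output toward the true class.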

For both the four-way and the two-way classification problems, the model is trained for 70 epochs, with an initial learning rate of 1.0 and a batch size of 32.
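The Adadelta update can be sketched for a list of scalar parameters and applied to the toy objective $f(x) = x^2$. The decay rate `rho = 0.95` and `eps = 1e-6` are assumed values, since the paper reports only the learning rate. With a learning rate of 1.0, the multiplier leaves the basic Adadelta step unchanged. Each parameter's step is the ratio of the RMS of its past updates to the RMS of its past gradients, multiplied by the current gradient, which is what makes the effective learning rate adaptive.

```python
import math


class Adadelta:
    """Per-parameter Adadelta update, scaled by a learning-rate multiplier."""

    def __init__(self, n_params, lr=1.0, rho=0.95, eps=1e-6):
        self.lr, self.rho, self.eps = lr, rho, eps
        self.acc_grad = [0.0] * n_params    # running average of g^2
        self.acc_update = [0.0] * n_params  # running average of (delta x)^2

    def step(self, params, grads):
        new_params = []
        for k, (x, g) in enumerate(zip(params, grads)):
            self.acc_grad[k] = self.rho * self.acc_grad[k] + (1 - self.rho) * g * g
            # Adaptive step: RMS of previous updates over RMS of gradients.
            delta = (-math.sqrt(self.acc_update[k] + self.eps)
                     / math.sqrt(self.acc_grad[k] + self.eps) * g)
            self.acc_update[k] = (self.rho * self.acc_update[k]
                                  + (1 - self.rho) * delta * delta)
            new_params.append(x + self.lr * delta)
        return new_params


opt = Adadelta(n_params=1, lr=1.0)  # learning rate 1.0, as in the paper
x = [3.0]
history = [x[0]]
for _ in range(50):
    x = opt.step(x, [2.0 * x[0]])   # gradient of f(x) = x^2
    history.append(x[0])
```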

EXPERIMENTS AND RESULTS

For comparative purposes, we tested a CNN-only solution to the problem, under the same conditions as the hybrid solution.

A table comparing the two is shown below: