Segmentation of Polyps and Instruments using a UNet-based deep learning model

In this paper, we present a UNet-based architecture used to segment polyps and instruments from the KVASIR Polyp Dataset [1] and KVASIR Instrument Dataset [2] provided at the MedAI Challenge 2021. For the polyp segmentation task, we developed a UNet-based algorithm for segmenting polyps in images taken from endoscopies. The main focus of this task is to achieve high segmentation metrics on the supplied test dataset. Similarly, we developed a UNet-based algorithm for segmenting instruments in the instrument segmentation task.


Introduction
The 2021 MedAI Challenge focuses on three tasks, based on input from experts in the field, that address unique gastrointestinal image segmentation issues. It includes two separate segmentation scenarios, as well as a challenge on transparent machine learning systems, which underlines the importance of explainable and interpretable deep learning methods.
Segmentation, as opposed to classification and object detection, provides a more precise region of interest for a given class.
The task of classifying each pixel of an object of interest in medical images is known as medical image segmentation. Clinicians can use medical image segmentation to focus on a specific area of a condition and extract comprehensive information for a more accurate diagnosis. The main issues associated with medical image segmentation are the lack of a significant number of annotated, high-quality labelled images for training [2], low image quality, the lack of a consistent segmentation technique, and large variances in images among patients. To estimate the performance of downstream applications, segmentation accuracy and uncertainty must be quantified [3]. This indicates the requirement for an automatic, generalizable, and efficient semantic image segmentation approach.
For automated medical image segmentation, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance [3]. The Fully Convolutional Network (FCN) is an early Deep Learning (DL) architecture trained end-to-end for pixel-wise prediction in semantic segmentation tasks. Another prominent image segmentation architecture for pixel-wise prediction is U-Net, which is also trained end-to-end. The U-Net architecture is divided into two sections: analysis and synthesis. Deep features are learned in the analysis path, and segmentation is conducted based on the learned features in the synthesis path.
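The analysis/synthesis structure described above can be sketched as a minimal U-Net-style network. This is a hypothetical toy version with a single contracting step, a bottleneck, and a single expanding step with a skip connection, not the authors' exact 23-layer model; PyTorch is assumed as the framework purely for illustration.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by batch normalization and ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = double_conv(3, 64)            # analysis (contracting) path
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        self.up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec = double_conv(128, 64)          # synthesis (expanding) path
        self.head = nn.Conv2d(64, 1, kernel_size=1)  # 1-channel mask logits

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.pool(e))
        u = self.up(b)
        d = self.dec(torch.cat([u, e], dim=1))   # skip connection from encoder
        return self.head(d)

model = TinyUNet()
out = model(torch.randn(1, 3, 256, 256))
```

The skip connection (the `torch.cat`) is the defining U-Net ingredient: it lets the synthesis path recover spatial detail that pooling discards, so the output mask matches the input resolution.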
For the Instrument Segmentation Development Task, we were given 590 images and their corresponding masks in JPEG and PNG format. For the Polyp Segmentation Development Task, we were given 1000 images and their corresponding masks in JPEG format. These datasets were provided for training purposes.

Materials and methods
In this section, we discuss the approach and data preprocessing steps that shaped our model formulation and produced the reported results.

Data Pre-Processing
As the images in the KVASIR Polyp Dataset [1] and KVASIR Instrument Dataset [2] were of various shapes, we scaled them to 256 × 256 pixels before training. Neural networks perform better on both the training and test sets when trained on a larger dataset; however, we had insufficient data for the instrument task. To address this, we used the albumentations library [4], applying HorizontalFlip, VerticalFlip, ElasticTransform, GridDistortion and OpticalDistortion to enlarge the training set.

Model Architecture
UNet [6] is a state-of-the-art deep learning model that offers the best metric scores for biomedical image segmentation, according to extensive literature surveys [7] [8] [9] in the field of biomedical image analysis. The U-Net architecture is a semantic segmentation network. The primary idea behind a "fully convolutional network" is to supplement a contracting network with additional layers in which pooling operators are replaced by upsampling operators. The network therefore has two paths: one that contracts and one that expands.

Model Tuning
As shown in Figure 3, the parameters to tune the U-Net model for better performance fall into three categories.
1. Model parameters: 23 convolution layers, 64 filters in the first convolution layer, and Batch Normalization.
2. Image parameters: min-max pixel normalization, image cropping to 256 × 256 and, as discussed in the data preprocessing step, 5× augmentation.
3. Model hyperparameters: dice loss as the loss function, the Adam optimizer, and a batch size of 2.
Finally, the model was evaluated on Dice Coefficient, IoU, Accuracy, Precision and Recall.
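The dice loss used for training can be written as one minus the Dice coefficient. Below is a common soft formulation as a sketch, not necessarily the authors' exact implementation; the smoothing term, which avoids division by zero on empty masks, is an assumption.

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """Soft dice loss: pred holds probabilities in [0, 1], target is binary."""
    pred = pred.ravel()
    target = target.ravel()
    intersection = (pred * target).sum()
    dice = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dice

perfect = np.ones((4, 4))
loss = dice_loss(perfect, perfect)  # perfect overlap -> loss 0.0
```

Unlike pixel-wise cross-entropy, dice loss directly optimizes the overlap metric reported in the Results section, which helps when the foreground (polyp or instrument) occupies only a small fraction of the image.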

Results
Figure 3: Parameters

As shown in Figure 4, the Dice coefficient is twice the area of overlap divided by the total number of pixels in both the ground truth and the segmented mask; for the polyp task it was 0.7986 and 0.6298 on the training and internal testing datasets respectively. Similarly, IoU, the area of overlap between the predicted segmentation and the ground truth divided by the area of their union, came out to be 0.8955 and 0.1381 respectively. Here, the training column refers to the training data, and the test column to the internal test data we held out during splitting.
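The two metric definitions above translate directly into code. This is a small worked example on binary masks, not the authors' evaluation script; the 4 × 4 masks are made up to make the arithmetic checkable by hand.

```python
import numpy as np

def dice_coefficient(pred, gt):
    # 2 * |overlap| / (|pred| + |gt|)
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum())

def iou(pred, gt):
    # |overlap| / |union|
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union

gt = np.zeros((4, 4), dtype=bool)
gt[:, :2] = True        # ground truth: 8 pixels
pred = np.zeros((4, 4), dtype=bool)
pred[:, 1:3] = True     # prediction: 8 pixels, 4 of them overlapping

d = dice_coefficient(pred, gt)  # 2*4 / (8+8) = 0.5
j = iou(pred, gt)               # 4 / 12 = 1/3
```

Note that Dice is always at least as large as IoU for the same masks (here 0.5 vs. 1/3), which is worth keeping in mind when comparing the two columns of results.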

Discussion
The UNet design performs admirably in a variety of biomedical segmentation applications. Due to limited computational power, we trained the model on our personal computers for 28 epochs. The reported test accuracy is on the 10% of the dataset we held out for evaluation during splitting. The UNet model showed good accuracy on the polyp segmentation task, but the score was not good on the instrument segmentation task. One of the main reasons is that polyp masks have a continuous, roughly circular shape, whereas the instrument masks are abrupt, sparse and discontinuous. This could be addressed by increasing the complexity of the model, for example by adding neurons or using transfer learning. In the instrument task, our model gave similar results on the official testing dataset as on our internal testing dataset, around 0.3. Results differed in the polyp case, where the test score was low compared to our internal testing dataset. We tracked our model on two main scores, Dice coefficient and IoU, because these give an accurate idea of the quality of the mask predicted by the model. Both metrics range from 0 to 1, with 0 signifying no overlap and 1 signifying perfectly overlapping segmentation.