Healthcare technology makes a tremendous contribution to society: not only can it provide better treatment for those in need, it can also support more accurate diagnoses and help streamline hospital management processes.
With this in mind, we decided to work on a project that could benefit post-op patients directly.
It is well known that post-operative patients suffer a great deal of pain and are usually treated with oral analgesics such as codeine, hydrocodone, and methadone. The problem with these drugs is that they cause harmful side effects and can be misused or even abused.
A good alternative to oral medication is a pain-management catheter. We built a model to identify, in ultrasound images, the location of the Brachial Plexus (BP), a network of nerves that is used as a reference point for catheter placement.
We used a convolutional neural network with the U-net architecture and reached over 80% accuracy on the validation set. Here’s how we developed this project!
For this project, we used a publicly available Kaggle dataset with over 5000 ultrasound images and corresponding training masks. In the example below, the white region of the mask marks the location of the BP in the corresponding image. Some of the images in the dataset did not contain a BP, so their masks were left entirely black.
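As a rough sketch of how the data might be loaded (the `train/` layout and `_mask` naming here are assumptions about how the Kaggle archive unpacks, not our exact code):

```python
import glob

import numpy as np
from PIL import Image

# Hypothetical layout: each training image "X.tif" sits next to a
# corresponding mask "X_mask.tif" in the train/ directory.
image_paths = sorted(p for p in glob.glob("train/*.tif") if "mask" not in p)

images, masks = [], []
for path in image_paths:
    images.append(np.array(Image.open(path), dtype=np.float32) / 255.0)
    mask = Image.open(path.replace(".tif", "_mask.tif"))
    masks.append(np.array(mask, dtype=np.float32) / 255.0)

# Add a channel axis so the arrays have shape (N, height, width, 1).
images = np.array(images)[..., np.newaxis]
masks = np.array(masks)[..., np.newaxis]
```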
Luckily, the dataset was in good shape, so little preprocessing was needed. The images were cropped into squares to satisfy the model’s input constraints, as we’ll see further on.
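A center crop is one simple way to square the images; the sketch below illustrates the idea (the exact crop we used may differ):

```python
import numpy as np

def center_crop_square(img: np.ndarray) -> np.ndarray:
    """Crop the largest centered square out of an (H, W, ...) image."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]

# Crop images and masks identically so they stay aligned.
images = np.array([center_crop_square(img) for img in images])
masks = np.array([center_crop_square(m) for m in masks])
```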
The data was immediately split into two parts: 80% of the images were used for training and 20% were reserved for validation. Holding out a validation set let us “test” the model on images it had never seen and confirm that it was actually learning.
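In code, the split is a one-liner along these lines (the fixed seed is just to keep the split reproducible between runs):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the image/mask pairs for validation.
train_imgs, val_imgs, train_masks, val_masks = train_test_split(
    images, masks, test_size=0.2, random_state=42
)
```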
We decided neural networks would be useful for our purpose because, much like the human brain, they identify patterns within a set of inputs and outputs, allowing them to predict an output for a given input based on past examples. This ability gives neural networks practical use in many different fields, including object detection, translation, and time-series forecasting.
The representative diagram above shows this well: the input flows through the network’s hidden layers, which apply functions whose parameters were learned from the training data.
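For intuition, here is a minimal, generic feed-forward network in Keras (purely illustrative, with a made-up 4-feature input; this is not the model we used for segmentation):

```python
import tensorflow as tf

# A toy network: inputs flow through two hidden layers of learned
# weights and nonlinearities to produce a single prediction.
inputs = tf.keras.Input(shape=(4,))                           # hypothetical input
h = tf.keras.layers.Dense(16, activation="relu")(inputs)     # hidden layer
h = tf.keras.layers.Dense(16, activation="relu")(h)          # hidden layer
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(h)  # output layer
model = tf.keras.Model(inputs, outputs)
```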
For visual tasks, the best-suited type of neural network is the convolutional neural network (CNN). Like the animal visual cortex, CNNs use a large number of neurons, each focused on a different region of the image, to build up a complete view of it. The word convolution refers to a mathematical operation that expresses how the shape of one function is modified by another. A visual example of a convolution is provided below: each green square gathers data from the blue squares within its receptive field. Taken together, that data provides enough context to classify an object in the image.
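A quick numeric sketch of the same idea: a 3x3 kernel slides over a small image, and each output value summarizes the 3x3 receptive field of pixels beneath it.

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(5, 5)           # a tiny 5x5 "image"
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])      # a classic edge-detection kernel

# "valid" mode keeps only positions where the kernel fits entirely
# inside the image, so the 5x5 input shrinks to a 3x3 feature map.
feature_map = convolve2d(image, kernel, mode="valid")
print(feature_map.shape)  # (3, 3)
```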
As mentioned above, CNNs are very useful for identifying objects of interest in an image. Unfortunately, we needed to actually locate the BP in an image, not just detect its presence.
In a standard CNN, a pooling layer is added after each convolutional step. These layers split the feature map into small regions, summarize each region (usually by keeping only the maximum activation within it, known as max pooling), and continue computation on this condensed representation instead of the full-resolution map. This works really well for simplification, speed, and ease of computation, but the tradeoff is that a significant amount of spatial information is lost at each step. Without that spatial information, the model can detect an object of interest but cannot say where it is.
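Two lines of Keras make the tradeoff concrete: the spatial resolution is halved, and the exact position of each activation within its 2x2 region is thrown away.

```python
import tensorflow as tf

x = tf.random.normal((1, 128, 128, 16))  # (batch, height, width, channels)

# Keep only the largest activation in each 2x2 region.
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
print(pooled.shape)  # (1, 64, 64, 16): half the resolution, same channels
```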
Luckily for us, researchers in the biomedical imaging field have already found a solution to this problem: the U-net architecture. It modifies the traditional CNN by adding a symmetric expanding path that effectively undoes the condensing performed by the pooling layers, while skip connections copy the context gathered by the contracting convolutions over to the new, higher-resolution maps of the image. This lets the network locate objects of interest precisely. We decided to try this method.
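To make the architecture concrete, here is a small U-net-style model in Keras. It is a scaled-down sketch (two pooling steps instead of the paper’s four, with filter counts chosen for illustration), not the exact network we trained:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions, as in the original U-net paper.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(128, 128, 1)):
    # Height and width must be divisible by 4 here (two 2x2 poolings),
    # which is why the input images need to be squared/resized first.
    inputs = layers.Input(shape=input_shape)

    # Contracting path: convolutions gather context, pooling shrinks the map.
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)

    b = conv_block(p2, 128)  # bottleneck

    # Expanding path: upsampling "undoes" the pooling, and skip connections
    # copy the high-resolution feature maps back in to restore spatial detail.
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.concatenate([u1, c1]), 32)

    # A 1x1 convolution maps the features to a per-pixel probability mask.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return tf.keras.Model(inputs, outputs)
```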
Training the model didn’t take very long on Google Colaboratory: each of the 100 epochs took about 40 seconds, so the entire process finished in a little over an hour. It was well worth the wait, too: we got over 80% accuracy on the validation set. It seems the U-net CNN was a great choice for the project!
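The training loop itself is short. Here is a sketch assuming the arrays and `build_unet` from the earlier snippets (the optimizer and batch size shown are typical defaults, not necessarily the ones we used):

```python
model = build_unet(input_shape=train_imgs.shape[1:])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# 100 epochs at roughly 40 seconds each is about an hour of training.
history = model.fit(
    train_imgs, train_masks,
    validation_data=(val_imgs, val_masks),
    epochs=100,
    batch_size=32,
)
```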