TL;DR: Over the last few weeks I had the luck of contributing to the open source project Qu. Here I describe what Qu is, what it aims to do, how it is evolving, and my [small] contribution to optimizing the deep learning models embedded in the program.
What is Qu?
Qu is a program developed by Aaron Ponti from the Department of Biosystems Science and Engineering of the ETHZ. It aims to support life science researchers in exploring, annotating and processing images from lab experiments taken with fluorescence and transmitted-light microscopy. Qu extends the functionality of Napari, which is described as follows:
Napari is a fast, interactive, multi-dimensional image viewer for Python. It’s designed for browsing, annotating, and analyzing large multi-dimensional images.
Qu uses the intuitive graphical interface of Napari to let the user visualize and annotate images, and adds deep-learning-based tools for two purposes:
- Cell segmentation: classifying each pixel as cell, contour or background makes it quick to extract simple analytics from experiments and improves the quality of follow-up analysis.
- Image restoration: improving the spatial and radiometric resolution of images makes it easier to compare images taken with different microscopy techniques, and is an alternative to more computationally intensive procedures such as deconvolution.
My journey with Qu
When I offered to volunteer for the project, Aaron (the main developer) had two main tasks in mind:
- Improving the segmentation model by testing different architectures and tuning the hyperparameters of the Convolutional Neural Network, in order to minimize false positives and ensure that the shape of each cell is retained correctly in the predictions.
- Improving the restoration model so that not only the spatial resolution is increased, but also the colour space (that is, the range of possible pixel values in the image).
Segmentation: model optimisation and hyperparameter tuning
If you have ever worked on a machine learning or deep learning project, you have reached the point where the model works, but not as well as you would like. You are then faced with the choice of what to tweak among the thousands of so-called hyperparameters that make your model do what it does. Here is a quick list of possible changes:
- Model architecture: how many neurons per layer? How many layers? How should they be connected to each other?
- Loss function: the way the model calculates how far it is from the correct solution
- Learning rate optimiser: how the model adjusts internally to look for better solutions
…and many more!
If there is one thing I have learnt since I started working with deep learning, it is that blindly trying different solutions (so-called model cycling) is worth little.
In data science everything is from the data and for the data, and whenever you have a problem, you should go back to your data.
Thoroughly understanding your inputs and your outputs is by far the most important step to get closer to a solution. So what did I do?
1. I increased the dataset. Adding more, and more diverse, images forces the model to refine itself and build more complex internal representations of the data, which often leads to improved performance. Since the range of possible uses is very wide, a larger set of images also makes it possible to test performance on different use cases. Fortunately, the data science community is very keen on sharing resources, tools and new solutions in an open source way. Case in point: I was able to access the EVICAN dataset:
EVICAN — Expert visual cell annotation, comprising partially annotated grayscale images of 30 different cell lines from multiple microscopes, contrast mechanisms and magnifications that is readily usable as training data for computer vision applications. With 4600 images and ∼26 000 segmented cells, our collection offers an unparalleled heterogeneous training dataset for cell biology deep learning application development.
A huge change from our previous dataset of ~100 images!
2. I built a simple diagnostic tool that gives a better understanding of segmentation performance by combining several metrics relevant to the problem at hand, computed over the ground truth masks and the predictions. This tool was so useful and cool that I decided to develop it for the Qu user interface.
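To give an idea of what such a diagnostic can look like, here is a minimal sketch in Python with NumPy. It is not the tool that went into Qu: the function name `per_class_metrics`, the choice of metrics (IoU, Dice, precision, recall) and the label encoding (0 = background, 1 = cell, 2 = contour) are all assumptions made for illustration.

```python
import numpy as np


def per_class_metrics(truth, pred, num_classes=3):
    """Compare a ground truth mask with a predicted mask, class by class.

    Assumed (hypothetical) label encoding: 0 = background, 1 = cell, 2 = contour.
    Returns a dict mapping each class label to its IoU, Dice, precision and recall.
    """
    truth = np.asarray(truth)
    pred = np.asarray(pred)
    if truth.shape != pred.shape:
        raise ValueError("truth and pred must have the same shape")

    def ratio(num, den):
        # Undefined ratios (class absent from both masks) are reported as NaN.
        return float(num) / float(den) if den > 0 else float("nan")

    results = {}
    for label in range(num_classes):
        t = truth == label
        p = pred == label
        tp = np.logical_and(t, p).sum()    # pixels correctly given this label
        fp = np.logical_and(~t, p).sum()   # pixels wrongly given this label
        fn = np.logical_and(t, ~p).sum()   # pixels of this label that were missed
        results[label] = {
            "iou": ratio(tp, tp + fp + fn),
            "dice": ratio(2 * tp, 2 * tp + fp + fn),
            "precision": ratio(tp, tp + fp),
            "recall": ratio(tp, tp + fn),
        }
    return results


# Tiny made-up example: a 2x3 mask and a prediction with two wrong pixels.
example_truth = np.array([[0, 0, 1], [1, 2, 2]])
example_pred = np.array([[0, 1, 1], [1, 2, 0]])
for label, scores in per_class_metrics(example_truth, example_pred).items():
    print(label, scores)
```

Looking at precision and recall separately for each class is what makes this kind of report useful for the task at hand: low precision on the cell class points to false positives, while low scores on the contour class suggest the cell shape is not being retained.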
Image restoration and the colour space
The second task I volunteered for was trickier: making sure the restoration model produced output with an increased colour space, going from 8-bit (256 possible pixel values) to 16-bit (65,536 possible pixel values).
This increase is not a simple graphical enhancement. In microscopy experiments, brightness indicates the concentration of a particular reagent or protein, in other words the object of the study. Increasing the colour space means being able to detect smaller differences in the observed processes!
Achieving this increased colour space was mostly a matter of preparing the data correctly. While the input data (8-bit, low resolution) remained the same, we tested several transformation and normalisation equations to translate the target data into a 16-bit colour space at higher resolution. After training with this new set of images, the model was able to produce predictions with a larger colour space. Further testing is needed to understand how close the predicted values are to measured values, but the basis is there!
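As an illustration of the kind of normalisation involved, here is a minimal sketch of one possible way to map a target image onto the full 16-bit range. It is an assumption for illustration only, not one of the equations we actually tested: the function name `to_uint16` and the percentile-based rescaling are hypothetical choices.

```python
import numpy as np


def to_uint16(image, low_pct=0.0, high_pct=100.0):
    """Linearly rescale an image to the full 16-bit range [0, 65535].

    Hypothetical normalisation: values at or below the `low_pct` percentile
    map to 0, values at or above the `high_pct` percentile map to 65535.
    """
    image = np.asarray(image, dtype=np.float64)
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:
        # Constant image: there is no intensity range to stretch.
        return np.zeros(image.shape, dtype=np.uint16)
    scaled = np.clip((image - lo) / (hi - lo), 0.0, 1.0)
    return np.round(scaled * 65535).astype(np.uint16)


# Number of distinct values each integer type can hold.
levels_8bit = int(np.iinfo(np.uint8).max) + 1     # 2**8
levels_16bit = int(np.iinfo(np.uint16).max) + 1   # 2**16

# Made-up target intensities, rescaled to 16 bits.
target_16bit = to_uint16(np.array([0.0, 1.0, 2.0, 4.0]))
print(levels_8bit, levels_16bit, target_16bit.dtype)
```

Using percentiles other than 0 and 100 (for example 1 and 99.9) makes the scaling less sensitive to a few very dark or very bright pixels, at the cost of clipping them. A scaling chosen per image also discards the absolute intensity scale, which matters if brightness is to be compared across images.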