1 Introduction
CNNs are often very large, resulting in high memory requirements and high operation latency, and thus unsuitable for resource-constrained applications (e.g., edge computing). To find a good compromise between network size and performance, a series of time-consuming training/validation experiments is often conducted for a specific imaging application. To address this challenge, we propose a new network compression scheme targeting biomedical image segmentation in resource-constrained application settings (e.g., low-cost and easy-to-carry imaging devices for disaster/emergency response and military rescue).
Since the inception of FCNs [1], various improved segmentation networks [2, 3, 4, 5] have been developed. To compress CNNs, various pre-training [6, 7] and post-training [8, 9] compression schemes have been proposed. In these techniques, compression thresholds often must be set manually across multiple pruning iterations.
In contrast with natural scene images, in biomedical or healthcare application settings, images usually target a specific type of disease/injury and are captured by specific imaging devices; hence, their objects and settings are quite “stable”, making the image characteristics and complexity much more amenable to analysis. In this paper, we leverage this observation to introduce CC-Net.
Based on the image complexity measure, the target CNN, and user constraints (e.g., desired accuracy or available memory), CC-Net determines, for the given dataset, the most suitable multiplicative factor for compressing the original CNN. The resulting compressed network is then trained with much less effort and memory than the original network. Experiments using 5 public and 2 in-house datasets and 3 commonly-used CNN segmentation models as representative networks show that CC-Net is effective for compressing segmentation networks, retaining nearly all of the base networks' segmentation accuracy while utilizing only a small fraction of their trainable parameters in the best case.
2 Methodology
Feature-map (filter output) energy is a good indicator of a filter's feature extraction capability. We conducted a large set of experiments to study the relationship between feature-map energy and training datasets. Fig. 1 depicts 3 example energy distributions for the first convolution layer of U-Net [2]. One can observe that (i) a significant number of filter outputs have very low energy, and (ii) less “complex” (to be defined more precisely later) datasets have more low-energy filter outputs. These observations suggest that U-Net [2] may be unnecessarily large for some biomedical datasets, in which case filters can be pruned without significantly deteriorating accuracy.

Based on the above observations, we developed CC-Net, depicted in Fig. 2. Inputs and internal operations of CC-Net are shown in parallelograms and rectangles, respectively. "Existing architectures" are the 3 CNNs studied and parameterized in our work. Colored boxes highlight the key contributions of this paper. We elaborate on the major components of CC-Net below.
2.1 Image Complexity Computation
We seek an image complexity metric that can (i) indicate the trends of segmentation accuracy and (ii) be computed easily. Our work examined the following candidate metrics: (i) signal energy; (ii) edge information (Sobel and Scharr filters along with an image pyramid); (iii) local keypoint detection using SURF [10]; (iv) visual clutter information [11]; (v) JPEG complexity [12]; and (vi) blob density. To obtain a single complexity value for an entire dataset, we take the average of the complexity values over all the images in the dataset.
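The JPEG complexity measure [12] essentially uses compression difficulty as a proxy for visual complexity, and the dataset score is the per-image average. A minimal sketch of this idea in Python, using zlib as a stand-in codec (not the exact measure of [12]):

```python
import random
import zlib

def compression_complexity(pixels: bytes) -> float:
    """Approximate image complexity as the compression ratio:
    content that resists compression scores closer to (or above) 1.0."""
    return len(zlib.compress(pixels)) / len(pixels)

def dataset_complexity(images) -> float:
    """Average the per-image complexity over the whole dataset."""
    return sum(compression_complexity(img) for img in images) / len(images)

# A flat gray image compresses extremely well (low complexity) ...
flat = bytes([128]) * 4096
# ... while noisy content barely compresses at all (high complexity).
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(4096))

print(dataset_complexity([flat]) < dataset_complexity([noisy]))  # True
```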
Table 1: The 7 datasets and their complexity values (J: JPEG complexity; B: blob density).

Dataset            Size  Type        J       B       Source
Glands (GL)        165   RGB         0.2401  0.5711  [13]
Lymph Nodes (LN)   74    Ultrasound  0.2445  0.0715  in-house
Melanoma (ME)      2750  RGB         0.1505  0.3055  [14]
C2DH-HeLa (CH)     20    Gray        0.1403  0.4607  [15]
Wing Discs (WD)    20    Gray        0.0925  0.1348  in-house
C2DH-U373 (CU)     34    Gray        0.1473  0.0699  [15]
C2DL-PSC (CP)      4     Gray        0.2296  0.3066  [15]
Out of the 7 datasets shown in Table 1, 5 datasets (trainset, top 5 rows) are used to formulate the methodology, while the remaining 2 datasets (testset) are used for blind evaluation. Fig. 4 plots the average complexities (normalized to the range [0, 1]) against the trainset datasets arranged by their F1 and IU score degradation (the two most popular segmentation accuracy metrics). Among these complexity measures, the JPEG complexity (J) best follows the trend of F1 score degradation (i.e., higher complexity leads to lower F1). Since IU is related to both feature variety and quantity, to represent it, we linearly combine the JPEG complexity and blob density (B, see Table 1) as JB = αJ + (1 − α)B, where α is a value in [0, 1]. The value of α is determined by inspecting the optimal regression fit on the training datasets in our experiments. We consider J and JB for multiplier determination, explained as follows.
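With J and B taken from Table 1, the combined measure is a convex combination; a small sketch (α = 0.5 is purely illustrative, since in our method α is fitted by regression on the trainset):

```python
def jb_complexity(j: float, b: float, alpha: float) -> float:
    """Convex combination JB = alpha*J + (1 - alpha)*B, with alpha in [0, 1]."""
    assert 0.0 <= alpha <= 1.0
    return alpha * j + (1.0 - alpha) * b

# J and B of the Glands (GL) dataset from Table 1.
print(round(jb_complexity(0.2401, 0.5711, 0.5), 4))  # 0.4056
```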
2.2 Multiplier Determination and Network Compression
Keeping all other variables unchanged, we can express the relationship between the segmentation accuracy (A) and data complexity (C) as A = f(C, P), where P is the number of trainable parameters in a CNN. For general networks, the function f can be rather complicated. But in general, segmentation accuracy is monotonically non-decreasing with respect to P and non-increasing with respect to C, i.e., ∂A/∂P ≥ 0 and ∂A/∂C ≤ 0.
For the CNNs we study (see Fig. 3), we observe (as discussed in Section 3) that A can be approximated by a linear function of log(P). That is, ΔA ≈ d · Δlog(P) for a constant d that reflects the degree of degradation. Given this linear dependency, if d and one of the two changes are known, then it is straightforward to compute the change in accuracy or in the number of parameters when the other is provided. The value of d is network-dependent, and can be obtained by performing a systematic analysis of network compression and tracking the change in accuracy.
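The slope d can be recovered by ordinary least squares on (log10(#parameters), accuracy) pairs collected from compression runs; a sketch with hypothetical data points:

```python
import math

def fit_degradation_slope(params, accuracies):
    """Least-squares slope d of accuracy vs. log10(#parameters):
    A ~ d * log10(P) + c, where d reflects the degree of degradation."""
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(accuracies) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, accuracies))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical (P, F1) pairs from compressing one network on one dataset.
params = [10**7.5, 10**6.5, 10**5.5]
f1 = [0.90, 0.88, 0.86]
d = fit_degradation_slope(params, f1)
print(round(d, 3))  # 0.02
```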
A simple way of compression is to uniformly scale down the number of feature maps in every convolution layer using a single multiplier φ. Existing work has shown that this performs very well [7, 16]. The number of trainable parameters in a layer after scaling becomes (φn)(φm)k1k2 = φ²·nmk1k2, where n and m are the numbers of input and output feature maps, and k1 and k2 are the filter dimensions; hence the total parameter count scales as φ²P. However, finding a good φ is challenging. We employ the complexity measures to determine φ.
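A sketch of the scaled parameter count, with illustrative layer shapes (for simplicity every layer's input channels are scaled too, although in practice the first layer's input is fixed by the image):

```python
import math

def conv_params(n_in, n_out, k1, k2):
    """Weights of one convolution layer (bias terms omitted)."""
    return n_in * n_out * k1 * k2

def scaled_params(layers, phi):
    """Total trainable parameters after scaling every layer's feature
    maps by phi (rounded up to keep integer filter counts)."""
    total = 0
    for n_in, n_out, k1, k2 in layers:
        total += conv_params(math.ceil(phi * n_in), math.ceil(phi * n_out), k1, k2)
    return total

# Two illustrative 3x3 conv layers; with phi = 1/2 the count drops ~4x,
# matching the phi^2 scaling.
layers = [(64, 128, 3, 3), (128, 256, 3, 3)]
full = scaled_params(layers, 1.0)
half = scaled_params(layers, 0.5)
print(full // half)  # 4
```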
When producing compressed networks, we consider two practical scenarios: (1) memory-constrained best possible accuracy, and (2) accuracy-guided least memory usage. For (1), two sub-cases are: (1.a) a disk space budget and (1.b) a main memory budget. For case (1.a), given a disk space budget in MB, we first determine the parameter budget P_b based on the number of bits used per parameter; since the parameter count scales as φ²P, φ can then be computed as φ = sqrt(P_b / P). For case (1.b), the sizes of the feature maps are considered along with the number of bits per value, and the value of φ is determined analogously. For (2), given the lowest acceptable accuracy A_min and the original base network accuracy A_base, the linear model gives the allowed drop in log(P) as (A_base − A_min)/d, from which the target parameter count and thus φ can be readily computed. Using φ, a compressed network is produced, which is then trained.
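Both scenarios then reduce to solving for φ; a sketch assuming the φ²·P parameter scaling and the linear accuracy model of Section 2.2 (all numbers hypothetical):

```python
import math

def phi_from_param_budget(p_budget, p_full):
    """Scenario (1): since P(phi) = phi^2 * P, a parameter budget yields phi."""
    return math.sqrt(p_budget / p_full)

def phi_from_accuracy(a_base, a_min, d, p_full):
    """Scenario (2): with the linear model A ~ d*log10(P) + c, the allowed
    accuracy drop bounds the drop in log10(P), which in turn bounds phi."""
    delta_logp = (a_base - a_min) / d          # allowed reduction in log10(P)
    p_target = 10 ** (math.log10(p_full) - delta_logp)
    return math.sqrt(p_target / p_full)

# Hypothetical numbers: a 10x parameter budget cut allows phi ~ 0.316.
print(round(phi_from_param_budget(3_000_000, 30_000_000), 3))  # 0.316
```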
3 Experimental Evaluation
The 5 trainset datasets (Glands, Lymph Nodes, Melanoma, C2DH-HeLa, Wing Discs) are used to determine d for the 3 CNN models (Fig. 3), which is then mapped to J & JB to determine φ. For simple calculations maintaining integer filter counts, several values of φ are considered (Fig. 6 & Fig. 7 (a), (c) X-axis). The 2 testset datasets (C2DH-U373, C2DL-PSC) are used to validate our method. We use standard backpropagation with the Adam optimizer (learning rate = 0.00005) and cross entropy as the loss function, along with data augmentation. Experiments are performed on NVIDIA TITAN and Tesla P100 GPUs, using the Torch framework.
Fig. 5 shows some segmentation outputs. Figs. 6 and 7 show the calculated degree of degradation (d) for the FCN [1], U-Net [2], and CUMedVision [3] networks. In these figures, (a) and (c) give the degradation in relative F1 and IU accuracy with respect to changes in the number of parameters expressed in logarithmic values. The slopes of the regression lines for each dataset in (a) and (c) are plotted against the respective complexities in (b) and (d).
Test case 1 (accuracy-guided least memory usage). We consider an example accuracy constraint relative to the base network. The multiplier φ is estimated using d and the complexity values (Table 1). Using the ceiling values of φ, compressed networks are trained and analyzed. As shown in Table 2, significant compression is achieved (best 113x for C2DH-U373 on U-Net and least 3.5x for C2DL-PSC on CUMedVision) with much better accuracy compared to compression achieved using only [6] or [9]. To validate the effectiveness in estimating φ, we introduce a small reduction in the φ value (the smallest possible keeping integer filter counts); the accuracy then degrades below the constraint (Table 2, row CC-Net-case1 with reduced φ). CC-Net compression does not show much improvement when pruned further, indicating that few ineffective filters remain.

Test case 2 (memory-constrained best possible accuracy). We consider a disk space budget of 1 MB. Using the ceiling of φ, compressed networks are produced as shown in Table 2, and their accuracy satisfies the accuracy prediction made by our method (Fig. 8).
Table 2: Segmentation accuracy (F1, IU) and size (log(#P)) of the base and compressed networks on the testset datasets.

                                         |     U-Net [2]      |  CUMedVision [3]   |      FCN [1]
Method                      Dataset      | F1    IU    log(#P)| F1    IU    log(#P)| F1    IU    log(#P)
Base Network                C2DH-U373    | 0.896 0.900 7.492  | 0.891 0.895 6.887  | 0.891 0.894 7.552
                            C2DL-PSC     | 0.801 0.820        | 0.793 0.814        | 0.755 0.788
Compressed Networks:
Base Network + Squeeze [6]  C2DH-U373    | 0.819 0.854 7.049  | 0.832 0.863 6.669  | 0.844 0.875 7.369
                            C2DL-PSC     | 0.752 0.781        | 0.751 0.781        | 0.697 0.753
Base Network + Prune [9]    C2DH-U373    | 0.858 0.867 7.491  | 0.848 0.861 6.886  | 0.809 0.837 7.551
                            C2DL-PSC     | 0.749 0.785 7.491  | 0.744 0.768 6.886  | 0.691 0.738 7.552
CC-Net-case1                C2DH-U373    | 0.863 0.890 5.436  | 0.868 0.866 5.378  | 0.880 0.885 5.939
                            C2DL-PSC     | 0.775 0.818 6.640  | 0.763 0.794 6.341  | 0.720 0.766 6.949
CC-Net-case1 + Squeeze      C2DH-U373    | 0.806 0.840 5.243  | 0.820 0.853 5.245  | 0.824 0.860 5.915
                            C2DL-PSC     | 0.681 0.735 6.197  | 0.629 0.705 6.176  | 0.663 0.728 6.786
CC-Net-case1 + Prune        C2DH-U373    | 0.834 0.847 5.435  | 0.834 0.847 5.377  | 0.830 0.843 5.938
                            C2DL-PSC     | 0.772 0.800 6.639  | 0.750 0.786 6.341  | 0.678 0.730 6.949
CC-Net-case1 (reduced φ)    C2DH-U373    | 0.841 0.872 5.277  | 0.816 0.849 5.297  | 0.817 0.844 5.847
                            C2DL-PSC     | 0.751 0.781 6.603  | 0.759 0.785 6.315  | 0.713 0.742 6.922
CC-Net-case2                C2DH-U373    | 0.832 0.863 5.097  | 0.807 0.837 5.097  | 0.803 0.834 5.097
                            C2DL-PSC     | 0.698 0.745 5.097  | 0.711 0.743 5.097  | 0.644 0.719 5.097
The overall reduction R = (base − compressed)/base in trainable parameters (PR) and evaluation latency (LR) for all 7 datasets (for test case 1) is plotted in Fig. 9. Larger complexity results in less compression, indicating a higher requirement of trainable parameters for extracting features. CC-Net achieves substantial parameter and latency reductions across the different datasets.
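As a small helper, the relative reduction used for PR and LR can be computed as follows (the parameter counts below are hypothetical):

```python
def reduction(original: float, compressed: float) -> float:
    """Relative reduction R = (original - compressed) / original."""
    return (original - compressed) / original

# E.g., compressing a hypothetical 31M-parameter network down to 0.27M
# parameters is a ~99.1% reduction.
print(round(reduction(31_000_000, 270_000), 3))  # 0.991
```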
Table 3: Training time comparison on U-Net (test case 1).

Approach         Dataset     Pre-training  Training   Post-training (epochs)
U-Net + [9]      C2DH-U373   --            10781 ms   160
                 C2DL-PSC    --            2348 ms    30
Ours (new)       C2DH-U373   O             4786 ms    --
                 C2DL-PSC    O             1282 ms    --
Ours (existing)  C2DH-U373   Negligible    4786 ms    --
                 C2DL-PSC    Negligible    1282 ms    --
Table 3 shows the training time for [9] and CC-Net on U-Net for test case 1 (on a P100 GPU). The per-epoch training time (in ms) is provided along with the number of pruning epochs (column Post-training). We used fewer fine-tuning iterations per pruning epoch; even so, pruning is expensive and can exceed the original network training time by a factor of 3 [8, 9]. The one-time determination of d (‘O’ in Table 3) for any CNN is a bottleneck for CC-Net. Yet, after this process, a significant reduction in training time can be achieved for any dataset trained on the same network. We consider that ‘O’ can be computed under 2x the training time of the base architecture, with a sufficient degree of accuracy, using 2 datasets with two φ points each.

4 Conclusions
In this paper, we presented a new image complexity-guided network compression scheme, CC-Net, for biomedical image segmentation. Instead of compressing CNNs after training, we focus on pre-training network size reduction, exploiting the image complexity of the training data. Our method is effective in quickly generating compressed networks with target accuracy, outperforming state-of-the-art network compression methods. Our scheme accommodates practical design constraints for compressing CNNs for biomedical image segmentation.
5 Acknowledgement
This work was supported in part by the National Science Foundation under Grants CNS-1629914, CCF-1640081, and CCF-1617735, and by the Nanoelectronics Research Corporation, a wholly-owned subsidiary of the Semiconductor Research Corporation, through Extremely Energy Efficient Collective Electronics, an SRC-NRI Nanoelectronics Research Initiative under Research Task ID 2698.004 and 2698.005.
References
 [1] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1411.4038, 2014.
 [2] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” ArXiv e-prints, May 2015.
 [3] Hao Chen, Xiaojuan Qi, Jie-Zhi Cheng, and Pheng-Ann Heng, “Deep contextual networks for neuronal structure segmentation,” in AAAI, 2016, pp. 1167–1173.
 [4] L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestive annotation: A deep active learning framework for biomedical image segmentation,” in 20th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2017, vol. III, pp. 399–407.
 [5] L. Wu, Y. Xin, S. Li, T. Wang, P. A. Heng, and D. Ni, “Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation,” in 14th IEEE International Symposium on Biomedical Imaging (ISBI), April 2017, pp. 663–666.
 [6] F. N. Iandola, S. Han, et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” ArXiv e-prints, Feb. 2016.
 [7] A. G. Howard, M. Zhu, et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” ArXiv e-prints, Apr. 2017.
 [8] S. Han, H. Mao, et al., “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” ArXiv e-prints, Oct. 2015.
 [9] P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, “Pruning convolutional neural networks for resource efficient inference,” in ICLR, 2017.
 [10] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, “SURF: Speeded up robust features,” in ECCV, 2006, pp. 404–417.
 [11] Ruth Rosenholtz, Yuanzhen Li, and Lisa Nakano, “Measuring visual clutter,” Journal of Vision, vol. 7, no. 2, pp. 17, 2007.
 [12] Honghai Yu and Stefan Winkler, “Image complexity and spatial information,” in 5th International Workshop on Quality of Multimedia Experience (QoMEX), 2013, pp. 12–17.
 [13] K. Sirinukunwattana, J. P. W. Pluim, et al., “Gland segmentation in colon histology images: The GlaS challenge contest,” ArXiv e-prints, Mar. 2016.
 [14] N. C. F. Codella, D. Gutman, et al., “Skin lesion analysis toward melanoma detection: ISBI 2017,” ArXiv e-prints, Oct. 2017.
 [15] V. Ulman, M. Maška, et al., “An objective comparison of cell-tracking algorithms,” Nature Methods, 2017.
 [16] Ariel Gordon, Elad Eban, et al., “MorphNet: Fast and simple resource-constrained structure learning of deep networks,” arXiv, 2017.