Comparison of convolutional neural network architectures for robustness against common artefacts in dermatoscopic images

Florian Katsch; Christoph Rinner; Philipp Tschandl

doi:10.5826/dpc.1203a126

Comparison of convolutional neural network architectures for robustness against common artefacts in dermatoscopic images

Authors

Florian Katsch Center for Medical Statistics, Informatics and Intelligent Systems (CeMSIIS); Medical University of Vienna
Christoph Rinner Center for Medical Statistics, Informatics and Intelligent Systems (CeMSIIS); Medical University of Vienna
Philipp Tschandl Department of Dermatology, Medical University of Vienna, Vienna, Austria

Keywords:

image classification, object detection, artefacts, instance segmentation, dermatoscopy

Abstract

Introduction: Automated classification of dermatoscopic images via neural networks shows comparable performance to clinicians in experimental conditions, but can be affected by artefacts like skin markings or rulers. It is unknown whether specialized neural networks are equally affected, or more robust to artefacts.

Objectives: Analyse robustness of three neural network architectures, namely ResNet34, Faster R-CNN and Mask R-CNN. Methods: We identified common artefacts in the public HAM10000, PH2 and the 7-point criteria evaluation datasets, and established a template-based method to superimpose artefacts on dermatoscopic images. The HAM10000-dataset with and without superimposed artefacts was used to train the networks, followed by analysing their robustness against artefacts in test images.

Results: ResNet-34 and Faster R-CNN models trained on regular images perform worse than the Mask R-CNN models when tested on images with superimposed artefacts. Artefacts in all tested images led to a decrease in area under the precision-recall curve values of 0.030 for ResNet-34 and 0.045 for Faster R-CNN in comparison to only 0.011 for Mask R-CNN. However, changes in model’s performance only became significant with 40% or more of the images having superimposed artefacts. We could also show that loss in performance occurs when the training was biased by selectively superimposing artefacts on images belonging to a certain class.

Conclusions: Instance segmentation architectures may be helpful to counter the effects of artefacts, and further research on related architectures of this family should be promoted. Our introduced template-based artefact insertion mechanism could be useful for future research.

References

1. Apalla Z, Lallas A, Sotiriou E, Lazaridou E, Ioannides D. Epidemiological trends in skin cancer. Dermatol Pract Concept. 2017;7(2):1-6.
2. Balch CM, Soong S-J, Atkins MB, et al. An evidence-based staging system for cutaneous melanoma. CA Cancer J Clin. 2004;54(3):131-149; quiz 182-184.
3. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 2002;3(3):159-165.
4. Korotkov K, Garcia R. Computerized analysis of pigmented skin lesions: a review. Artif Intell Med. 2012;56(2):69-90.
5. Rubegni P, Burroni M, Cevenini G, et al. Digital dermoscopy analysis and artificial neural network for the differentiation of clinically atypical pigmented skin lesions: a retrospective study. J Invest Dermatol. 2002;119(2):471-474.
6. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118.
7. Muñoz-López C, Ramírez-Cornejo C, Marchetti MA, et al. Performance of a deep neural network in teledermatology: a single-centre prospective diagnostic study. J Eur Acad Dermatol Venereol. 2021;35(2):546-553.
8. Tschandl P, Rinner C, Apalla Z, et al. Human–computer collaboration for skin cancer recognition. Nat Med. 2020;26(8):1229-1234.
9. Fink C, Blum A, Buhl T, et al. Diagnostic performance of a deep learning convolutional neural network in the differentiation of combined naevi and melanomas. J Eur Acad Dermatol Venereol. 2020;34(6):1355-1361.
10. Okuboyejo DA, Olugbara OO. A review of prevalent methods for automatic skin lesion diagnosis. Open Dermatol J. 2018;12(1):14-53.
11. Winkler JK, Fink C, Toberer F, et al. Association Between Surgical Skin Markings in Dermoscopic Images and Diagnostic Performance of a Deep Learning Convolutional Neural Network for Melanoma Recognition. JAMA Dermatol. 2019;155(10):1135-1141.
12. Navarrete-Dechent C, Dusza SW, Liopyris K, Marghoob AA, Halpern AC, Marchetti MA. Automated Dermatological Diagnosis: Hype or Reality? J Invest Dermatol. 2018;138(10):2277-2279.
13. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017;39(6):1137-1149. doi:10.1109/tpami.2016.2577031
14. He K, Gkioxari G, Dollar P, Girshick R. Mask R-CNN. 2017 IEEE International Conference on Computer Vision (ICCV). Published online 2017. doi:10.1109/iccv.2017.322
15. Kaur R, LeAnder R, Mishra NK, et al. Thresholding methods for lesion segmentation of basal cell carcinoma in dermoscopy images. Skin Res Technol. 2017;23(3):416-428.
16. Mishra NK, Kaur R, Kasmi R, et al. Automatic lesion border selection in dermoscopy images using morphology and color features. Skin Res Technol. 2019;25(4):544-552.
17. Han SS, Moon IJ, Lim W, et al. Keratinocytic Skin Cancer Detection on the Face Using Region-Based Convolutional Neural Network. JAMA Dermatol. 2020;156(1):29-37.
18. Lee T, Ng V, Gallagher R, Coldman A, McLean D. Dullrazor®: A software approach to hair removal from images. Comput Biol Med. 1997;27(6):533-543.
19. Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data. 2018;5:180161.
20. Mendonça T, Ferreira PM, Marques JS, Marcal ARS, Rozeira J. PH2 - A dermoscopic image database for research and benchmarking. In: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). ; 2013:5437-5440.
21. Kawahara J, Daneshvar S, Argenziano G, Hamarneh G. Seven-Point Checklist and Skin Lesion Classification Using Multitask Multimodal Neural Nets. IEEE Journal of Biomedical and Health Informatics. 2019;23(2):538-546.
22. Paszke A, Gross S, Massa F, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Wallach H, Larochelle H, Beygelzimer A, d\textquotesingle Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems 32. Curran Associates, Inc.; 2019:8026-8037.
23. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. the Journal of machine Learning research. 2011;12:2825-2830.
24. Seabold S, Perktold J. Statsmodels: Econometric and statistical modeling with python. In: Proceedings of the 9th Python in Science Conference. Vol 57. Austin, TX; 2010:61.
25. Maron RC, Haggenmüller S, von Kalle C, et al. Robustness of convolutional neural networks in recognition of pigmented skin lesions. Eur J Cancer. 2021;145:81-91.
26. Aggarwal SLP. Data augmentation in dermatology image recognition using machine learning. Skin Res Technol. 2019;25(6):815-820.
27. Winkler JK, Sies K, Fink C, et al. Association between different scale bars in dermoscopic images and diagnostic performance of a market-approved deep learning convolutional neural network for melanoma recognition. Eur J Cancer. 2021;145:146-154.
28. Michaelis C, Mitzkus B, Geirhos R, et al. Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming. arXiv [csCV]. Published online July 17, 2019. http://arxiv.org/abs/1907.07484