Author Interviews, Cancer Research, Dermatology, Lancet, Melanoma, Technology / 11.11.2021
Dermatology: Datasets Used for AI Lack Diversity and Completeness
MedicalResearch.com Interview with:
Dr David Wen BM BCh
NIHR Academic Clinical Fellow in Dermatology
University of Oxford
MedicalResearch.com: What is the background for this study?
Response: Publicly available skin image datasets are commonly used to develop machine learning (ML) algorithms for skin cancer diagnosis. These datasets are often utilised because they circumvent many of the barriers associated with large-scale skin lesion image acquisition. Furthermore, publicly available datasets can serve as a benchmark for direct comparison of algorithm performance.
Dataset and image metadata provide information about the disease and population on which the algorithm was trained or validated. This is important to know because machine learning algorithms depend heavily on the data used to train them: algorithms used for skin lesion classification frequently underperform when tested on independent datasets that differ from those on which they were trained. Detailing dataset composition is therefore essential for judging whether assumptions about the generalisability of algorithm performance can be extended to other populations.
At the time this review was conducted, the total number of publicly available datasets globally, and their respective content, had not been characterised. We therefore aimed to identify publicly available skin image datasets used to develop ML algorithms for skin cancer diagnosis, to categorise their data access requirements, and to systematically evaluate their characteristics, including associated metadata.