Articles | Open Access
Vol. 5 No. 12 (2025): Volume 05 Issue 12
DOI: https://doi.org/10.37547/medical-fmspj-05-12-01
Hybrid Large Language Model–Machine Learning Framework for Early-Stage Skin Lesion Classification Using the UCI Dermatology Dataset
Md. Rayhan Hassan Mahin, Department of Computer Science, Monroe University, New Rochelle, USA
Aleya Akhter, Master of Public Health, Northern University Bangladesh, Dhaka, Bangladesh
Hosne Ara Malek, MBBS (USTC), DMU (DU), CCD (BIRDEM), University of Greifswald, Germany
Kamrun Naher, MBBS (USTC), DMU, RDMS, USA
Md Mahabubur Rahman Bhuiyan, Washington DC, Department of Healthcare Informatics, University of Potomac, USA
Abstract
In this study, we investigated a hybrid framework that integrates large language models (LLMs) with conventional machine learning for early-stage skin lesion assessment, using the UCI dermatology dataset as a proxy for early skin cancer detection. We first developed a baseline model using only structured clinical and histopathological attributes and trained classical classifiers, with a gradient boosting model achieving an accuracy of 0.89, macro-averaged F1-score of 0.87, and macro-AUC of 0.93. We then generated textual summaries for each patient case and used an LLM to derive high-level semantic features, such as inferred risk level and lesion-type descriptors, which were added to the structured feature space. This structured-plus-LLM-features configuration improved performance to an accuracy of 0.92, macro-averaged F1-score of 0.91, and macro-AUC of 0.96, indicating that LLM-derived features captured clinically meaningful abstractions not fully exploited by the baseline model. Finally, we implemented a hybrid decision-refinement approach in which a primary gradient boosting classifier handled most cases, while low-confidence predictions were escalated to the LLM for refined diagnostic suggestions. This hybrid model achieved the best results, with an accuracy of 0.94, macro-averaged F1-score of 0.93, and macro-AUC of 0.97, and produced fewer misclassifications across challenging classes. These findings suggest that LLMs can enhance structured-data models both as semantic feature generators and as second-stage reasoning engines, offering a promising and interpretable pathway for embedding AI-driven decision support into dermatology workflows aimed at earlier and more reliable skin lesion risk stratification.
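The decision-refinement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `route_predictions` and `second_best`, the 0.7 confidence threshold, and the toy probability rows are all assumptions made for illustration, and the second-stage callback is only a stand-in for an LLM call.

```python
from typing import Callable, List, Sequence, Tuple


def route_predictions(
    probabilities: Sequence[Sequence[float]],
    refine: Callable[[int, Sequence[float]], int],
    threshold: float = 0.7,
) -> Tuple[List[int], List[int]]:
    """Keep confident primary predictions and escalate the rest.

    probabilities: one row of class probabilities per case, as produced by
        any primary classifier (for example, gradient boosting).
    refine: hypothetical second-stage callback (stand-in for the LLM) that
        receives the case index and its probability row and returns a class.
    threshold: assumed confidence cut-off; the source does not state one.
    """
    labels: List[int] = []
    escalated: List[int] = []
    for index, row in enumerate(probabilities):
        # Class with the highest predicted probability for this case.
        top_class = max(range(len(row)), key=lambda k: row[k])
        if row[top_class] >= threshold:
            labels.append(top_class)
        else:
            # Low confidence: hand the case to the second stage.
            escalated.append(index)
            labels.append(refine(index, row))
    return labels, escalated


def second_best(index: int, row: Sequence[float]) -> int:
    """Toy stand-in for the LLM stage: return the runner-up class."""
    ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
    return ranked[1]


# Illustrative probability rows, not data from the study.
probas = [
    [0.90, 0.05, 0.05],  # confident: kept as class 0
    [0.40, 0.35, 0.25],  # low confidence: escalated
    [0.10, 0.80, 0.10],  # confident: kept as class 1
]
labels, escalated = route_predictions(probas, second_best, threshold=0.7)
print(labels, escalated)
```

In a real pipeline the probability rows would come from the trained primary classifier, and the callback would wrap whatever LLM prompt and parsing logic is in use. The threshold then controls how many cases are escalated.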
Keywords
early-stage skin cancer detection, skin lesion classification, large language models, machine learning, UCI dermatology dataset, clinical decision support, hybrid AI model
Copyright License
Copyright (c) 2025 Md. Rayhan Hassan Mahin, Aleya Akhter, Hosne Ara Malek, Kamrun Naher, Md Mahabubur Rahman Bhuiyan

This work is licensed under a Creative Commons Attribution 4.0 International License.