Articles | Open Access | Vol. 5 No. 12 (2025): Volume 05 Issue 12 | DOI: https://doi.org/10.37547/medical-fmspj-05-12-01

Hybrid Large Language Model–Machine Learning Framework for Early-Stage Skin Lesion Classification Using the UCI Dermatology Dataset

Md. Rayhan Hassan Mahin, Department of Computer Science, Monroe University, New Rochelle, USA
Aleya Akhter, Master of Public Health, Northern University Bangladesh, Dhaka, Bangladesh
Hosne Ara Malek, MBBS (USTC), DMU (DU), CCD (BIRDEM), University of Greifswald, Germany
Kamrun Naher, MBBS (USTC), DMU, RDMS, USA
Md Mahabubur Rahman Bhuiyan, Department of Healthcare Informatics, University of Potomac, Washington, DC, USA

Abstract

In this study, we investigated a hybrid framework that integrates large language models (LLMs) with conventional machine learning for early-stage skin lesion assessment, using the UCI dermatology dataset as a proxy for early skin cancer detection. We first developed a baseline that used only structured clinical and histopathological attributes and trained classical classifiers on it, with a gradient boosting model achieving an accuracy of 0.89, a macro-averaged F1-score of 0.87, and a macro-AUC of 0.93. We then generated a textual summary for each patient case and used an LLM to derive high-level semantic features, such as inferred risk level and lesion-type descriptors, which were added to the structured feature space. This structured-plus-LLM-features configuration improved performance to an accuracy of 0.92, a macro-averaged F1-score of 0.91, and a macro-AUC of 0.96, indicating that the LLM-derived features captured clinically meaningful abstractions that the baseline model did not fully exploit. Finally, we implemented a hybrid decision-refinement approach in which a primary gradient boosting classifier handled most cases, while low-confidence predictions were escalated to the LLM for refined diagnostic suggestions. This hybrid model achieved the best results, with an accuracy of 0.94, a macro-averaged F1-score of 0.93, and a macro-AUC of 0.97, and produced fewer misclassifications across challenging classes. These findings suggest that LLMs can enhance structured-data models both as semantic feature generators and as second-stage reasoning engines, offering a promising and interpretable pathway for embedding AI-driven decision support into dermatology workflows aimed at earlier and more reliable skin lesion risk stratification.
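
The decision-refinement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how confidence was measured or where the cut-off was set, so the maximum class probability, the 0.7 threshold, and the names `route_prediction` and `stub_second_stage` are all assumptions made for illustration. The second stage is a stub standing in for an LLM call.

```python
from typing import Callable, Sequence, Tuple


def route_prediction(
    probs: Sequence[float],
    case_summary: str,
    second_stage: Callable[[str], int],
    threshold: float = 0.7,  # assumed cut-off; not given in the abstract
) -> Tuple[int, str]:
    """Return (predicted_class, source) for a single case.

    probs holds the primary classifier's class probabilities.
    If the top probability reaches the threshold, the primary
    prediction is kept; otherwise the case is escalated.
    """
    top_class = max(range(len(probs)), key=lambda i: probs[i])
    if probs[top_class] >= threshold:
        return top_class, "primary"
    return second_stage(case_summary), "escalated"


def stub_second_stage(summary: str) -> int:
    # Hypothetical placeholder for an LLM call that reads the textual
    # case summary and returns a class index. It always answers 2 here.
    return 2


# One confident case and one ambiguous case.
confident = route_prediction([0.05, 0.90, 0.05], "case A", stub_second_stage)
uncertain = route_prediction([0.40, 0.35, 0.25], "case B", stub_second_stage)
print(confident)  # (1, 'primary')
print(uncertain)  # (2, 'escalated')
```

In a setup like this, the threshold controls how many cases reach the second stage, so it trades the cost of the LLM calls against the share of ambiguous predictions that get a second look.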

Keywords

early-stage skin cancer detection, skin lesion classification, large language models, machine learning, UCI dermatology dataset, clinical decision support, hybrid AI model

How to Cite

Md. Rayhan Hassan Mahin, Aleya Akhter, Hosne Ara Malek, Kamrun Naher, & Md Mahabubur Rahman Bhuiyan. (2025). Hybrid Large Language Model–Machine Learning Framework for Early-Stage Skin Lesion Classification Using the UCI Dermatology Dataset. Frontline Medical Sciences and Pharmaceutical Journal, 5(12), 01–10. https://doi.org/10.37547/medical-fmspj-05-12-01