[2505.00025] A Method for the Architecture of a Medical Vertical Large Language Model Based on Deepseek R1


Abstract: Despite significant advances in foundation models such as DeepSeek-R1 and ChatGPT, their deployment in medical settings faces critical challenges, including computational requirements and professional knowledge barriers. This paper presents an efficient lightweight medical large language model architecture that systematically addresses these challenges through three-dimensional optimization: knowledge acquisition, model compression, and computational enhancement. We design a knowledge transfer pipeline from DeepSeek-R1-Distill-70B to DeepSeek-R1-Distill-7B using Low-Rank Adaptation (LoRA) for precise medical knowledge retention. Through 4-bit quantization and mixed-precision strategies, we achieve substantial model compression while preserving medical reasoning capabilities. The inference framework incorporates Flash Attention acceleration and continuous batching, complemented by specialized prompt templates for diverse medical queries. Experimental evaluation on medical benchmarks demonstrates that our approach maintains 92.1% accuracy on USMLE examinations while reducing memory consumption by 64.7% and inference latency by 12.4% compared to baseline models. This work provides a practical solution for deploying advanced language models in resource-constrained medical environments, enabling broader accessibility of AI-assisted healthcare.
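The abstract's deployment recipe (4-bit quantization with mixed-precision compute, a LoRA adapter carrying the distilled medical knowledge, and Flash Attention at inference time) can be approximated with standard open-source tooling. The sketch below is an illustration of that setup, not the authors' released code: the adapter path "medical-lora-adapter" and the example prompt are placeholders, and the 7B base model identifier is assumed to refer to the public DeepSeek-R1 distilled checkpoint.

```python
# Minimal sketch of the described deployment stack using transformers + peft + bitsandbytes.
# Assumptions: "medical-lora-adapter" is a hypothetical adapter path; the prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # 7B student model

# 4-bit quantization with mixed-precision compute, as described in the abstract.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention acceleration
    device_map="auto",
)

# Attach a LoRA adapter holding the transferred medical knowledge
# (placeholder path, not a published checkpoint).
model = PeftModel.from_pretrained(model, "medical-lora-adapter")

# Specialized prompt template for a medical query (illustrative only).
prompt = (
    "You are a careful clinical assistant.\n"
    "Question: A 45-year-old patient presents with chest pain...\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For throughput under many concurrent medical queries, the continuous-batching behavior mentioned in the abstract is typically obtained from a serving engine (e.g., vLLM) rather than from plain `generate` calls; the paper's specific inference framework is not detailed here.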

Submission history

From: Mingda Zhang
[v1]
Fri, 25 Apr 2025 14:28:29 UTC (256 KB)
[v2]
Tue, 22 Jul 2025 14:26:53 UTC (256 KB)
