2024.

Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning

Large language models (LLMs) have demonstrated powerful capabilities in natural language processing, yet their vast number of parameters poses challenges for deployment and inference efficiency. Structured model pruning emerges as a viable approach to reduce model size and accelerate inference without requiring specialized operators and libraries for deployment. However, structured pruning often severely weakens the model's capability. Although repeated fine-tuning can restore the capability to a certain extent, it impairs LLMs' utility as versatile problem solvers. To address this issue, we propose a novel structured pruning algorithm tailored for LLMs. It derives the importance of different components, namely rows and columns in parameter matrices, from intermediate data dependencies. It then removes coupled components across different layers simultaneously and preserves the dependency relationships within the remaining parameters, avoiding significant performance degradation. The pruned model requires only a few epochs of fine-tuning to restore its performance, preserving the model's ability to generalize. Empirical evaluations on LLaMA, Vicuna, and ChatGLM3 demonstrate our algorithm's efficacy, yielding a 20% parameter reduction while retaining at least 94.4% of the original performance.
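The core idea in the abstract, that components coupled through a shared intermediate dimension must be eliminated together, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under our own assumptions, not the authors' implementation: it prunes a single MLP block and scores channels by a simple activation-magnitude proxy, whereas the paper derives importance from intermediate data dependencies across layers. The function name `prune_mlp_coupled` and the `keep_ratio` parameter are hypothetical.

```python
# Minimal sketch of coupled structured pruning on one transformer MLP block.
# Rows of the up-projection and the matching columns of the down-projection
# share the same intermediate channels, so they must be removed together.
import torch
import torch.nn as nn

def prune_mlp_coupled(up: nn.Linear, down: nn.Linear,
                      calib_inputs: torch.Tensor, keep_ratio: float = 0.8):
    """Prune the intermediate dimension shared by `up` (d -> h) and `down` (h -> d)."""
    with torch.no_grad():
        # Intermediate activations on calibration data: shape (n, h).
        hidden = torch.relu(up(calib_inputs))
        # Per-channel importance: mean absolute activation (illustrative proxy,
        # standing in for the paper's dependency-based importance scores).
        importance = hidden.abs().mean(dim=0)
        k = max(1, int(keep_ratio * importance.numel()))
        keep = torch.topk(importance, k).indices.sort().values

        # Coupled elimination: drop the same channels from both matrices,
        # so the dependency between them is preserved in the pruned model.
        new_up = nn.Linear(up.in_features, k, bias=up.bias is not None)
        new_up.weight.copy_(up.weight[keep])          # rows of W_up
        if up.bias is not None:
            new_up.bias.copy_(up.bias[keep])
        new_down = nn.Linear(k, down.out_features, bias=down.bias is not None)
        new_down.weight.copy_(down.weight[:, keep])   # columns of W_down
        if down.bias is not None:
            new_down.bias.copy_(down.bias)
    return new_up, new_down
```

Because the same channel indices are removed from both weight matrices, the pruned block computes the same function restricted to the kept channels; this is what lets the model recover with only a few epochs of fine-tuning rather than extensive retraining.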

