Efficient CV Parsing: Lightweight Models and Heuristic Methods for Diverse and Non-Standard Formats
This paper addresses the challenge of parsing CVs into structured formats suitable for further processing and analysis. The proposed solution processes CVs provided as images or PDFs and handles diverse inputs, ranging from free-form, multilingual, and non-standardized layouts to highly structured documents. Various heuristic approaches are employed for layout analysis, complemented by lightweight language models for information extraction. While multimodal models demonstrate strong performance, their cost and deployment complexity remain significant barriers. This study explores alternative methods optimized for computational efficiency, processing accuracy, and ease of deployment. A comparative analysis of the approaches is conducted on a standard dataset containing CVs from diverse clients and job roles, ranging from entry-level to specialized positions in various domains. The findings highlight the potential of these tailored, efficient solutions for scalable and secure CV parsing.