Automated Bolus Detection in Videofluoroscopic Images of Swallowing Using Mask-RCNN
Tracking a liquid or food bolus in videofluoroscopic images during X-ray-based diagnostic swallowing examinations is a dominant clinical approach for assessing human swallowing function during the oral, pharyngeal, and esophageal stages of swallowing. Because swallowing is a rapid action, this tracking is a highly challenging task for clinicians. We therefore developed a computer-aided method that automates bolus detection and tracking to alleviate issues associated with human factors. Specifically, we applied a state-of-the-art deep learning model, Mask-RCNN, to detect and segment the bolus in videofluoroscopic image sequences. We trained the model on 450 swallow videos and evaluated it on an independent dataset of 50 videos. The model detected and segmented the bolus with a mean average precision of 0.49 and an intersection over union of 0.71. These results indicate robust detection that can help improve the speed and accuracy of clinical decision-making.
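Intersection over union (IoU), one of the two reported metrics, compares a predicted bolus mask with a reference mask. It is the number of pixels covered by both masks divided by the number of pixels covered by either mask. The following minimal sketch assumes binary masks stored as nested lists of 0/1 values. The function name `mask_iou` and the rule that two empty masks score 1.0 are illustrative assumptions. The authors' exact evaluation protocol, such as how scores are averaged across frames or videos, is not specified here.

```python
from typing import Sequence


def mask_iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two equally shaped binary masks (hypothetical helper)."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as bolus in both masks
    union = 0         # pixels marked as bolus in at least one mask
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += int(p_on and t_on)
            union += int(p_on or t_on)

    # Assumed convention: if neither mask contains any bolus pixels,
    # treat the prediction as a perfect match rather than dividing by zero.
    return 1.0 if union == 0 else intersection / union


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(mask_iou(predicted, reference))  # 2 shared pixels / 4 covered pixels
```

In this example the masks share 2 pixels and together cover 4, so the script prints `0.5`. Scoring one frame pair at a time keeps the metric simple. A dataset-level figure such as the reported 0.71 would then come from some aggregation over frames, whose exact form is not given in the abstract.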