Challenges in Mitigating Soft Errors in Safety-critical Systems with COTS Microprocessors
Abstract: The number of Commercial-Off-The-Shelf (COTS) microprocessors and microcontrollers used in safety applications has increased significantly over the last decade. In contrast to safety-certified microcontrollers, these devices are produced without integrated protection against memory soft errors and are limited in available memory and computation power. At the same time, the continual optimization of physical memory size and voltage margins raises the probability that external factors, such as magnetic fields or cosmic rays, temporarily alter a memory state and thus cause a soft error. Addressing such errors is crucial, especially within safety-critical automation systems, and a wide range of error mitigation strategies has been proposed. In established brownfield automation systems, however, redesigning and deploying new hardware is usually not feasible. Software-based strategies are therefore required that can be deployed on existing fail-safe architectures to further improve their performance without rework or conceptual changes. This article identifies challenges associated with software-based soft error detection and correction strategies. Along with these challenges, it gives a short overview of currently applicable software-based mitigation strategies and evaluates them.