Comparison of the Non-Blocked and Blocked Floyd-Warshall Algorithm with Regard to Speedup and Energy Saving on an Embedded GPU
In this paper, three variants of the Floyd-Warshall (FW) All Pairs Shortest Path (APSP) algorithm are presented and compared - the sequential implementation, the parallel implementation using the Nvidia CUDA API, and the blocked parallel version of the FW algorithm. A performance analysis between these three approaches, as well as between the individual phases of the parallel algorithm is provided. The performance of these algorithms has been measured on regular as well as on embedded GPU hardware, and a significant speedup has been achieved. Additionally, this paper shows that a blocked data access results in significant energy savings of up to 72% on embedded hardware.