Memory bandwidth has always been a critical factor for the performance of many data intensive applications. The increasing processor performance, and the advert of single chip multiprocessors have increased the memory bandwidth demands beyond what a single commodity memory device can provide. The immediate solution is to use more than one memory device, and interleave data across them so they can be used in parallel as if they were a single device of higher bandwidth.
In this paper we showed that fine-grain memory interleaving on the evaluated many-core architectures with many DRAM channels was critical to achieve high memory bandwidth efficiency. Our results showed that performance can degrade up to 50\% due to achievable bandwidths being far from the maximum installed.