Most video applications rely on specific application programming interfaces (APIs) to achieve the desired functionality. Implementing the interface backends in hardware is often too expensive for low-end mobile devices, so most such devices rely on highly optimized software implementations that employ special instruction sets. The most common approach is to use the SIMD processing units of mobile application processors, such as ARM NEON or Intel WMMX. Fully exploiting such instruction sets usually requires tedious assembly coding, even though vectorizing compilers have improved lately. In addition, low-level APIs such as OpenMAX DL have been made available to offer a standardized interface for accelerated codec functionality. In this paper we present optimization methods and results from using the NEON instruction set and the OpenMAX DL API for MPEG-4 and H.264 video encoding and decoding. Although these technologies provide significant speed-ups and reduce the burden on application designers, the bottleneck of serial bit stream processing remains to be solved.