This paper describes a new architecture and the corresponding implementation of a stereo vision system that covers the entire stereo vision process including noise reduction, rectification, disparity estimation, and visualization. Dense disparity estimation is performed using the non-parametric rank transform and semi-global matching (SGM), which is among the top performing stereo matching methods and outperforms locally-based methods in terms of quality of disparity maps and robustness under difficult imaging conditions. According to the non-scan-aligned, complex data dependencies of the SGM, a scalable, systolic-array-based architecture is proposed. This architecture fulfills the demands of real-world applications regarding frame rate, depth resolution and low resource usage. The architecture is based on a novel two-dimensional parallelization concept for the SGM. An FPGA implementation on a Xilinx Virtex-5 generates disparity maps of VGA images (640x480 pixel) with a 128 pixel disparity range under real-time conditions (30 fps) at a clock frequency as low as 39 MHz.