In this paper, we discuss a new architecture for multi-standard packet inspection and basic network processing tasks in a high-performance network coprocessor. We present the concepts, architecture, compiler tool-chain and VLSI area estimation for this programmable finite state machine (FSM)-based Processing Engine (FPE). The micro-architecture comprises an FSM-controlled instruction sequencing mechanism, a novel register scheme and a short pipeline instead of a typical multi-stage processor pipeline. This design enables efficient handling of conditional branches and small look-ups, which can be exploited in packet classification applications. The FPE data path performance is compared to an ARM9-type processor on two exemplary header parsing kernels from the "CommBench" benchmark suite. According to the results, the presented engine achieves a speed-up of 4 or more over the ARM9 in terms of required computation cycles. In a 65 nm VLSI technology, the FPE design is expected to run at clock frequencies of up to 2 GHz and to occupy about 1.8 mm² of chip area. Owing to the specific organization of the transition rule memory, an essential element of the programmable FSM, a memory utilization of around 95% can be achieved. However, the FPE micro-architecture requires a customized code translation chain to transform high-level program code into an FSM representation. This is achieved in three steps: (1) generation of sequential, assembly-like macro-instructions, (2) scheduling and generation of FSM-based horizontal (parallel) micro-code, and (3) organization of the resulting FSM rules in the "instruction" memory. Our studies confirm the advantages of the FPE as a fully programmable high-performance header parsing engine.
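To illustrate the general idea of header parsing driven by FSM transition rules rather than by branch instructions, the following Python sketch models a tiny classifier as lookups in a rule table, with a per-state default rule acting as the fall-through. It is a software model only: the rule format, the state names (`ETH`, `IPV4`, `DONE`), the output labels and the `classify` function are hypothetical and do not reflect the FPE's actual rule encoding, micro-code or memory organization. The field offsets assume an untagged Ethernet II frame carrying an IPv4 header.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule format: (state, key) -> (next_state, action).
# A key of None is the default rule for that state. Illustrative only,
# not the FPE's transition-rule memory encoding.
RULES: Dict[Tuple[str, Optional[int]], Tuple[str, Optional[str]]] = {
    ("ETH", 0x0800): ("IPV4", None),      # EtherType 0x0800 -> parse IPv4
    ("ETH", None): ("DONE", "non-ip"),    # any other EtherType
    ("IPV4", 6): ("DONE", "tcp"),         # IPv4 protocol 6 -> TCP
    ("IPV4", 17): ("DONE", "udp"),        # IPv4 protocol 17 -> UDP
    ("IPV4", None): ("DONE", "ip-other"),
}

# Header field inspected in each state: (byte offset, width in bytes).
# Assumes an untagged Ethernet II frame followed by an IPv4 header.
FIELD: Dict[str, Tuple[int, int]] = {
    "ETH": (12, 2),   # EtherType
    "IPV4": (23, 1),  # protocol field: 14-byte Ethernet header + offset 9
}


def classify(frame: bytes) -> str:
    """Walk the rule table from state ETH until DONE; return the final label."""
    state, label = "ETH", "truncated"
    while state != "DONE":
        offset, width = FIELD[state]
        if len(frame) < offset + width:
            return "truncated"  # field lies beyond the end of the frame
        key = int.from_bytes(frame[offset:offset + width], "big")
        # Exact-match rule first, otherwise the state's default rule.
        state, action = RULES.get((state, key), RULES[(state, None)])
        if action is not None:
            label = action
    return label


# Example: Ethernet II frame with EtherType IPv4 and protocol 6 (TCP).
tcp_frame = bytes(12) + b"\x08\x00" + bytes(9) + b"\x06" + bytes(20)
print(classify(tcp_frame))
```

Each table entry plays the role of one transition rule: the current state selects which field is compared, and the match result alone determines the next state. Adding a protocol therefore means adding rules rather than rewriting control flow. This loosely mirrors why an FSM-based engine can handle conditional branches and small look-ups cheaply.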