This paper describes an energy-aware methodology that identifies custom instructions for critical code segments under a constraint on the available data bandwidth between the custom logic and the base processor. Our approach lets designers optionally constrain the number of input and output operands of custom instructions in order to reach acceptable performance while accounting for the energy dissipation of the register file. We describe a design flow that identifies promising area, performance, and power tradeoffs. We study the effect of custom-instruction I/O constraints and register file I/O ports on overall performance and on the energy usage of the register file. Our experiments show that, in most cases, the solutions with the highest performance are not those identified with relaxed I/O constraints. Results for packet-processing benchmarks covering cryptography and lookup applications show speed-ups between 25% and 40% and energy reductions between 20% and 30%.