The 12 cycles in a research paper is likely an example number: a reasonable guess, but not necessarily the exact number that a particular hardware implementation would require.
If you want flexible allocation of many small SRAM and TCAM blocks that can be glued together via configuration registers in the hardware to make deeper or wider tables, then once the total ASIC area for those blocks is large enough and the clock speed is high enough, it can take a couple of clock cycles just to reach the farthest block, plus a couple of clock cycles for the result to come back. Most chips using SRAM do parity or ECC checking on everything they read, and verify those values before using them, which can add a clock cycle, or maybe 2 or 3, of latency before you know the results are good. Don't believe 12 is some immutable constant of the universe. It depends on the chip's area, clock rate, and memory design. Making it only 1 clock cycle is impractical, unless you reduce the clock rate so far that such a chip is no longer commercially interesting to build.
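To make the arithmetic concrete, here is a rough sketch of how such a latency budget adds up. The component names and the cycle counts are illustrative assumptions only ("a couple" taken as 2, the memory access itself assumed to be 1 cycle, which I did not state above), not figures from any real chip, and a real pipeline will have other contributors that this leaves out.

```python
from dataclasses import dataclass


@dataclass
class LookupLatency:
    """Hypothetical per-lookup latency budget, in clock cycles."""
    wire_out: int    # request travels to the farthest SRAM/TCAM block
    mem_access: int  # the memory read itself (assumed, not from the text)
    wire_back: int   # result travels back from that block
    ecc_check: int   # parity/ECC verification before the result is used

    def total(self) -> int:
        # The stages happen one after another, so their latencies add.
        return self.wire_out + self.mem_access + self.wire_back + self.ecc_check


# Illustrative low and high estimates; only ecc_check differs (1 vs. 3 cycles).
low = LookupLatency(wire_out=2, mem_access=1, wire_back=2, ecc_check=1)
high = LookupLatency(wire_out=2, mem_access=1, wire_back=2, ecc_check=3)

print(f"estimated lookup latency: {low.total()} to {high.total()} cycles")
```

Even with these modest guesses the total is well above 1 cycle, and a larger die or a faster clock pushes the wire terms up further, which is why a figure like 12 is plausible but not fixed.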
Andy