A 64-bit data path needs twice as many registers for pipelining and data outputs as a 32-bit data path, which would explain the increased resource usage. However, according to the Xilinx table in the above link, the difference in resources is not large.
With the 64-bit interface, the user clock runs at half the frequency of the 32-bit version, since each cycle moves twice as much data.
I feel that is the big advantage: the core runs at a lower clock frequency than the 32-bit version.
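
To make the clock relationship concrete, here is a rough sketch of the arithmetic in Python. It assumes the interface transfers one full word per clock cycle, and the 250 MB/s throughput is only an illustrative figure I picked (it is not taken from the Xilinx table); the function name is made up for this example.

```python
def required_clock_mhz(throughput_mbytes_per_s: float, width_bits: int) -> float:
    """Clock in MHz needed to sustain a throughput, assuming one word per cycle."""
    bytes_per_cycle = width_bits / 8
    return throughput_mbytes_per_s / bytes_per_cycle


# Illustrative assumption only: the data path must sustain 250 MB/s.
LINK_MBYTES_PER_S = 250.0

clk32 = required_clock_mhz(LINK_MBYTES_PER_S, 32)
clk64 = required_clock_mhz(LINK_MBYTES_PER_S, 64)

print(f"32-bit data path: {clk32} MHz")
print(f"64-bit data path: {clk64} MHz")
```

For the same throughput, doubling the width halves the required user clock, which is the trade-off described above: more registers in exchange for a slower clock.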