Compile-tested only.
I am not sure whether input can be 64-bit-unaligned. If it indeed can be, replace:
	((u64*)(input))[I] -> get_unaligned(((u64*)(input)) + I)
--vda
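
For illustration only, here is a userspace sketch of what the suggested replacement amounts to, assuming input is a plain byte buffer. The helper names load_u64_unaligned() and input_word() are hypothetical and are not part of this patch or of the kernel; in-kernel code would use get_unaligned() itself. The sketch copies the eight bytes with memcpy(), which makes no alignment assumption, instead of dereferencing a cast u64 pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for get_unaligned() on a 64-bit value.
 * memcpy() reads byte-wise as far as the language is concerned, so src
 * may have any alignment. The result is in native byte order.
 */
static uint64_t load_u64_unaligned(const void *src)
{
	uint64_t v;

	memcpy(&v, src, sizeof(v));
	return v;
}

/*
 * Equivalent of ((u64*)(input))[i] that stays valid when input is not
 * 8-byte aligned. The name input_word is an assumption for this sketch.
 */
static uint64_t input_word(const unsigned char *input, size_t i)
{
	return load_u64_unaligned(input + i * sizeof(uint64_t));
}
```

On architectures that handle unaligned loads in hardware, compilers typically reduce this to a single load, so the cost of being safe should be small there.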