C++ code:
http://pastebin.com/AFezksCY
Python code:
http://pastebin.com/HkGFynSW
Both programs take the same input and produce the same output, but the speed difference is drastic: the whole Python script runs in about 10 s in total, while my C++ code takes 65 s on net.forward() alone.
How does this happen? It's an x64 Release build with all optimizations turned on (CPU_ONLY mode in both cases), so I would expect the two to have roughly the same runtime, yet my C++ version is a lot slower.
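For reference, here is a minimal sketch of one way to time a single call in isolation, so the forward pass can be measured separately from model loading and preprocessing. It is not necessarily how the numbers above were taken. `average_seconds` and `placeholder_workload` are hypothetical names made up for this illustration and are not part of Caffe; the placeholder only stands in for the real forward pass.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of Caffe): calls `fn` `runs` times and
// returns the average wall-clock time per call, in seconds.
template <typename Fn>
double average_seconds(Fn&& fn, std::size_t runs = 1) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (std::size_t i = 0; i < runs; ++i) {
        fn();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(runs);
}

// Placeholder workload (an assumption for this sketch): stands in for the
// forward pass so the example compiles without any Caffe headers.
inline double placeholder_workload() {
    volatile double acc = 0.0;  // volatile keeps the loop from being optimized away
    for (int i = 0; i < 1000000; ++i) {
        acc = acc + i * 0.5;
    }
    return acc;
}
```

In the real program the lambda passed to `average_seconds` would wrap the forward call instead, for example `average_seconds([&] { /* forward pass here */ }, 5);`. Averaging over a few runs helps separate one-time setup cost from the steady-state time per call.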