--
You received this message because you are subscribed to a topic in the Google Groups "last-align" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/last-align/yzqdAcch-h8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to last-align+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/last-align/dbf6759a-ccb1-5e6a-e326-0a847fcab996%40gringene.org.
Thanks David for helping me understand what "platform" means here.
The latest version (1045) might work on the other platforms. The compiler might complain about an unknown "msse4" option, in which case that option can be omitted, e.g. by:
make CXXFLAGS='-O3 -std=c++11 -pthread -DHAS_CXX_THREADS'
But don't do that on the amd64 platform, because we want to keep the SIMD speedup!!!
It should be quite easy to add ARM SIMD, if that would be useful.
Have a nice day,
Martin
On Sat, Dec 21, 2019 at 12:36 PM David Eccles (gringer) <bioinfo...@gringene.org> wrote:
On 21.12.19 15:37, mcf...@edu.k.u-tokyo.ac.jp wrote:
> What other platforms do you have? How important is it to support them?
By my count, 9 official ports, 22 other ports:
https://www.debian.org/ports/
- David
Dear Michael,
this is very interesting and impressive: many thanks for this suggestion!!!
I don't fully understand it, but I have some doubts.
As far as I can understand, if we use AVX2, then the code with your patch compiles down to the exact same thing as it did before your patch. Fine.
But if we use SSE4, the code with your patch compiles down to something a bit different than it did before, and I think less efficient (though the difference might be minor).
As for ARM NEON: I think your patch replaces each AVX/SSE instruction with a NEON instruction. But that's not great for horizontal max/min, which use multiple AVX/SSE instructions, but can be implemented with a single NEON instruction (I think).
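To make the horizontal-max point concrete, here is a minimal sketch of the maximum of 16 unsigned bytes on each platform. It is not taken from LAST or from the patch under discussion: the function name horizontalMax16 is hypothetical, and the SSE2 path is only one common way to do the reduction. Note that the single-instruction NEON form (vmaxvq_u8) exists only on AArch64, so 32-bit ARM would still need several steps.

```cpp
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
// AArch64 NEON: a single across-vector maximum (UMAXV).
static uint8_t horizontalMax16(const uint8_t *p) {
  return vmaxvq_u8(vld1q_u8(p));
}
#elif defined(__SSE2__)
#include <emmintrin.h>
// SSE2: there is no single instruction for this, so fold the vector
// onto itself four times (by 8, 4, 2, 1 bytes) with shift + max.
static uint8_t horizontalMax16(const uint8_t *p) {
  __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(p));
  v = _mm_max_epu8(v, _mm_srli_si128(v, 8));
  v = _mm_max_epu8(v, _mm_srli_si128(v, 4));
  v = _mm_max_epu8(v, _mm_srli_si128(v, 2));
  v = _mm_max_epu8(v, _mm_srli_si128(v, 1));
  // The overall maximum is now in the lowest byte.
  return static_cast<uint8_t>(_mm_cvtsi128_si32(v) & 0xFF);
}
#else
// Scalar fallback for platforms with neither instruction set.
static uint8_t horizontalMax16(const uint8_t *p) {
  uint8_t best = p[0];
  for (int i = 1; i < 16; ++i) {
    if (p[i] > best) best = p[i];
  }
  return best;
}
#endif
```

A one-to-one translation of the SSE2 path would emit four shift/max pairs on NEON as well, which is the inefficiency described above.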
I don't fully understand your makefile changes: I guess you have some extra script that runs make with different "SFX"s and makes the wrapper script? And you somehow override the -msse4 option?
I sometimes run LAST on Mac computers, which don't seem to have /proc/cpuinfo.
In my tests (before your patch), compiling with AVX2 does not make it much faster, versus compiling with SSE4. Maybe 5% faster at best. Surprising, I don't understand. This reduces my interest in automatically using the "best" available out of AVX, SSE, etc. But my interest would be resurrected if this would make it much faster, which it seems like it should...
Hope I don't seem too negative, but I'll try to learn from these ideas and bear them in mind.
Have a nice day,
Martin
P.S. I can't find much info about that BioHackathon, but you're very welcome to visit us in Tokyo!
Thank you both for the explanations!
> The most likely explanation is that you have a bottleneck somewhere so you can't run the 256-bit vectors at full speed. Based on that 5% figure there is a good chance that your 128-bit version isn't really running full speed, either. Have you tried looking for stalls?
This seems the most important thing for me. I don't really know what a "stall" is, or how to look for one. I'd be grateful for any advice.
> SIMDe will often be slower than a fully ported implementation targeting NEON
I wonder if it's generally better to start with a NEON implementation, then use SIMDe to make it work with AVX/SSE...
> > I don't fully understand your makefile changes
It seems you already sent me the links answering this, the first time. Don't know how I missed them, sorry!
I just noticed this: https://github.com/google/highway
Perhaps my main concern about SIMDe is that I'd prefer "Width-agnostic" SIMD. (And runtime dispatch.)
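As an illustration of what runtime dispatch could look like without a wrapper script or /proc/cpuinfo, here is a rough sketch. It assumes GCC or Clang on x86-64, where __builtin_cpu_supports and the target attribute are available; all function names (maxByteScalar, maxByteAvx2, chooseMaxByte) are hypothetical and the kernel is a toy, not anything from LAST, SIMDe, or Highway. It shows only the dispatch part, not width-agnostic SIMD.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_X86_DISPATCH 1  // hypothetical macro, for this sketch only
#endif

// Portable baseline: largest byte in buf[0..n), or 0 if n == 0.
static uint8_t maxByteScalar(const uint8_t *buf, size_t n) {
  uint8_t best = 0;
  for (size_t i = 0; i < n; ++i) {
    if (buf[i] > best) best = buf[i];
  }
  return best;
}

#ifdef SKETCH_X86_DISPATCH
// AVX2 kernel. The target attribute lets this one function use AVX2
// even though the rest of the file is compiled without -mavx2.
static __attribute__((target("avx2")))
uint8_t maxByteAvx2(const uint8_t *buf, size_t n) {
  __m256i best = _mm256_setzero_si256();
  size_t i = 0;
  for (; i + 32 <= n; i += 32) {
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(buf + i));
    best = _mm256_max_epu8(best, v);
  }
  // Reduce the 32 lanes, then handle the leftover tail, in scalar code.
  alignas(32) uint8_t lanes[32];
  _mm256_store_si256(reinterpret_cast<__m256i *>(lanes), best);
  uint8_t m = maxByteScalar(lanes, 32);
  uint8_t tail = maxByteScalar(buf + i, n - i);
  return m > tail ? m : tail;
}
#endif

typedef uint8_t (*MaxByteFn)(const uint8_t *, size_t);

// Ask the CPU itself (via CPUID) what it supports, once, and return
// the matching kernel. Works the same on Linux and macOS.
static MaxByteFn chooseMaxByte() {
#ifdef SKETCH_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return maxByteAvx2;
#endif
  return maxByteScalar;
}
```

The caller would store the result of chooseMaxByte() once at startup and call through the pointer afterwards, so a single binary runs on CPUs with and without AVX2.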