Out of curiosity, are you struggling with speed due to a large reference tree/alignment?
100s-1000s of leaves in the reference tree?
Hi Benjamin,
the problem was not with your compiler, but with the cmake version requirement which I had set incorrectly.
The latest commit on the master branch should circumvent this; I tested it with a very similar cmake version and it should work now.
Please do a `git pull`, `make clean` and `make` and let me know if that fixes the problem!
Regards,
Pierre
Hi Benjamin,
also note that you don't need icc to compile EPA-ng. Any of gcc, clang or icc should work - having one of those is enough ;-)
According to your cmake output, you are using gcc, so that's fine.
Lucas
--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com.
Dear Benjamin,
yes, I believe the discrepancy is due to there being so many redundant sequences. I haven't explicitly mentioned it in the documentation yet, but EPA-ng does not (currently) dereplicate the input queries, as I see that as a pre-processing step.
That being said, I realize that the input fasta format for the queries has no clean way of recording how many copies a dereplicated sequence stands for, so keeping track of that is not straightforward. I want to find a clean, workable solution for this in the future.
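One way such a pre-processing step could look, as a minimal sketch and not something EPA-ng provides: collapse identical query sequences and carry the copy count in the header. The function names are hypothetical, and the `;size=N` suffix is an assumption borrowed from the abundance annotation used by tools such as VSEARCH; EPA-ng does not interpret it.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def dereplicate(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse identical sequences, keeping the first header and a copy count."""
    seen = {}  # sequence -> [first_header, count]; dicts keep insertion order
    for header, seq in records:
        if seq in seen:
            seen[seq][1] += 1
        else:
            seen[seq] = [header, 1]
    # Hypothetical convention: append ";size=N" so the multiplicity survives.
    return [(f"{header};size={count}", seq) for seq, (header, count) in seen.items()]
```

Note that this compares aligned sequences character by character (gaps included) and holds every unique sequence in memory, so it suits a one-off pre-processing pass, not a streaming pipeline.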
The data I usually test with, and which I used to arrive at the
`6 times faster` figure, was a chunk of data from a recent
popular study using placement, where the fraction of redundant
sequences was much lower (around 10%). It is entirely possible,
however, that they performed some dereplication.
Just for some insight: the idea during development was to stream
through the query sequences to allow for massive files (billions
of queries) while keeping the memory footprint fixed. It also
improves runtime, as reading and processing can be overlapped.
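The streaming idea can be illustrated with a small generator that hands out queries in fixed-size chunks, so memory scales with the chunk size and not with the file size. This is a sketch of the general pattern, not EPA-ng's actual implementation; `read_chunks` and its `chunk_size` parameter are hypothetical, and the sketch does not overlap reading with processing.

```python
from typing import Iterable, Iterator, List, Tuple


def read_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of at most chunk_size (header, sequence) records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunk: List[Tuple[str, str]] = []
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                # The previous record is complete once the next header appears.
                chunk.append((header, "".join(parts)))
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        chunk.append((header, "".join(parts)))
    if chunk:
        yield chunk
```

Passing an open file handle as `lines` means only one chunk of records is held at a time, however large the query file is.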
--
MSc Pierre Barbera
Phone: +49 6221 533 258
Fax: +49 6221 533 298
E-Mail: pierre....@h-its.org
HITS gGmbH
Schloss-Wolfsbrunnenweg 35
D-69118 Heidelberg
Amtsgericht Mannheim / HRB 337446
Managing Director: Dr. Gesa Schönberger
Scientific Director: Prof. Dr. Michael Strube
At least for EPA-ng I know that in some cases, the time spent per sequence may be significantly higher for small inputs than for large inputs, due to static overheads and optimizations that trade off a small amount of time at the beginning to greatly accelerate the overall runtime.
linard@flip:~/epa$ make
Running cmake
-- The CXX compiler identification is GNU 4.9.3
-- The C compiler identification is GNU 5.4.0
-- Check for working CXX compiler: /usr/bin/g++-4.9
-- Check for working CXX compiler: /usr/bin/g++-4.9 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- EPA-ng version: 0.1.0
-- Building RELEASE
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Enabling Prefetching
-- Checking for OpenMP
-- Checking for OpenMP -- found
-- Performing Test HAS_AVX
-- Performing Test HAS_AVX - Success
-- Performing Test HAS_SSE3
-- Performing Test HAS_SSE3 - Success
-- CMake version 3.5.1
-- Genesis version: v0.16.0
-- Scope: library
-- Build type: RELEASE
-- Unity build: FULL
-- C++ compiler: GNU 4.9.3 at /usr/bin/g++-4.9
-- C compiler : GNU 5.4.0 at /usr/bin/cc
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- Looking for Threads
-- Found Threads: -pthread
-- Using Threads
-- Building static lib
-- Using flags: -std=c++14 -Wall -Wextra -D__PREFETCH -fopenmp -D__OMP -mavx -D__AVX
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY)
-- GTest not found
CMake Warning at test/src/CMakeLists.txt:5 (message):
  Skipping building tests.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/linard/tests_speed/epa/build
Running make
make -C build
make[1]: Entering directory '/home/linard/tests_speed/epa/build'
make[2]: Entering directory '/home/linard/tests_speed/epa/build'
make[3]: Entering directory '/home/linard/tests_speed/epa/build'
Scanning dependencies of target genesis_lib_static
make[3]: Leaving directory '/home/linard/tests_speed/epa/build'
make[3]: Entering directory '/home/linard/tests_speed/epa/build'
[ 4%] Building CXX object libs/genesis/lib/genesis/CMakeFiles/genesis_lib_static.dir/__/__/__/__/genesis_unity_sources/lib/all.cpp.o
[ 9%] Linking CXX static library ../../../../../libs/genesis/bin/libgenesis.a
make[3]: Leaving directory '/home/linard/tests_speed/epa/build'
[ 9%] Built target genesis_lib_static
make[3]: Entering directory '/home/linard/tests_speed/epa/build'
Scanning dependencies of target epa_module
make[3]: Leaving directory '/home/linard/tests_speed/epa/build'
make[3]: Entering directory '/home/linard/tests_speed/epa/build'
[ 14%] Building CXX object src/CMakeFiles/epa_module.dir/sample/Placement.cpp.o
[ 19%] Building CXX object src/CMakeFiles/epa_module.dir/tree/Tiny_Tree.cpp.o
[ 23%] Building CXX object src/CMakeFiles/epa_module.dir/tree/Tree.cpp.o
[ 28%] Building CXX object src/CMakeFiles/epa_module.dir/tree/tiny_util.cpp.o
[ 33%] Building CXX object src/CMakeFiles/epa_module.dir/core/pll/epa_pll_util.cpp.o
[ 38%] Building CXX object src/CMakeFiles/epa_module.dir/core/pll/optimize.cpp.o
[ 42%] Building CXX object src/CMakeFiles/epa_module.dir/core/pll/pll_util.cpp.o
[ 47%] Building CXX object src/CMakeFiles/epa_module.dir/core/raxml/Model.cpp.o
[ 52%] Building CXX object src/CMakeFiles/epa_module.dir/core/place.cpp.o
[ 57%] Building CXX object src/CMakeFiles/epa_module.dir/pipeline/schedule.cpp.o
[ 61%] Building CXX object src/CMakeFiles/epa_module.dir/seq/MSA.cpp.o
[ 66%] Building CXX object src/CMakeFiles/epa_module.dir/seq/MSA_Stream.cpp.o
[ 71%] Building CXX object src/CMakeFiles/epa_module.dir/io/jplace_util.cpp.o
[ 76%] Building CXX object src/CMakeFiles/epa_module.dir/io/file_io.cpp.o
[ 80%] Building CXX object src/CMakeFiles/epa_module.dir/io/Binary.cpp.o
[ 85%] Building CXX object src/CMakeFiles/epa_module.dir/util/stringify.cpp.o
[ 90%] Building CXX object src/CMakeFiles/epa_module.dir/set_manipulators.cpp.o
[ 95%] Building CXX object src/CMakeFiles/epa_module.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/epa-ng
make[3]: Leaving directory '/home/linard/tests_speed/epa/build'
[100%] Built target epa_module
make[2]: Leaving directory '/home/linard/tests_speed/epa/build'
make[1]: Leaving directory '/home/linard/tests_speed/epa/build'
Hi Benjamin,
is it possible that you copied the epa directory from your desktop machine to the cluster, and then ran `make pll` and `make` directly? In that case, the configuration is still made for your desktop, that is, it uses all those special instructions during compilation. Try to run `make clean` first, and then `make pll` and `make` again.
If this still does not work, we will have to dig deeper.
Best
Lucas
Hello Benjamin,
this is strange indeed. EPA-ng doesn't have a minimum requirement on vector instruction sets.
I can see that cmake successfully detects AVX, even though, as you write, the target processor does not support it.
Did you run the executable on the login node of your cluster? Or did you submit a job script? Perhaps your login node's compilers are configured to compile for the actual compute nodes, while the login node has a different architecture.
Barring that, it might be an issue with cmake, and/or how I use it to detect the intrinsics, but it hasn't failed me yet. The only other odd thing I can see is that your C compiler is set differently than your CXX compiler:
-- The CXX compiler identification is GNU 4.9.3
-- The C compiler identification is GNU 5.4.0
You can try to resolve that and see whether it makes a difference.
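To check whether a given node really advertises AVX before running a binary built with `-mavx`, one can inspect the CPU flags on that node. The sketch below parses text in the Linux `/proc/cpuinfo` format; the helper names are hypothetical, and note that SSE3 is listed there under the flag name `pni`.

```python
from typing import Dict, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All cores normally report the same flags, so the first entry is enough.
            return set(value.split())
    return set()


def simd_support(cpuinfo_text: str) -> Dict[str, bool]:
    """Report which of the instruction sets discussed here the CPU advertises."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "sse3": "pni" in flags,  # Linux reports SSE3 as "pni"
        "avx": "avx" in flags,
        "avx2": "avx2" in flags,
    }
```

Running `simd_support(open("/proc/cpuinfo").read())` once on the login node and once inside a job script would show whether the two architectures differ.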
Regards,
Pierre
I'd strongly recommend *against* running epa-ng on any CPU without SSE3, as the performance will be *extremely* poor. (And if you have a choice, please use an AVX/AVX2-enabled CPU.)
Hello Benjamin,
first of all, thank you for the in-depth investigation! And apologies for the late reply.
I think you are correct, the problem is with CMake detecting AVX
compatibility where none exists. I just replicated the problem
locally.
Hello Benjamin,
if you pull the latest commit on the master branch, your issue should (hopefully) be resolved! Please let me know about any further issues; I really appreciate it!
Pierre
INFO Selected: verbose (debug) output
INFO Selected: Output dir: ./
INFO Selected: Query file: ../EMP_92_studies_10000000.fas_pplacer16S.aln.fas
INFO This appears to be a non-binary fasta file. Converting!
INFO Updated Query file: ./EMP_92_studies_10000000.fas_pplacer16S.aln.fas.bin
INFO Selected: Tree file: ../RAxML_result.bv_refs_aln
INFO Selected: Reference MSA: ../bv_refs_aln_stripped_99.5.fasta
INFO Selected: Using threads: 1
INFO ______ ____ ___ _ __ ______
/ ____// __ \ / | / | / // ____/
/ __/ / /_/ // /| | ______ / |/ // / __
/ /___ / ____// ___ |/_____// /| // /_/ /
/_____//_/ /_/ |_| /_/ |_/ \____/
DBG Rate heterogeneity: GAMMA (4 cats, mean), alpha: 1 (ML), weights&rates: (0.25,0.136954) (0.25,0.476752) (0.25,1) (0.25,2.38629)
Base frequencies (ML): 0.25 0.25 0.25 0.25
Substitution rates (ML): 0.5 0.5 0.5 0.5 0.5 1
DBG Tree length: 56.5177
DBG Post-optimization reference tree log-likelihood: -92080.956637
INFO Number of ranks: 1
INFO Number of sequences per rank: 10000652
DBG num_sequences: 5000
DBG Preplacement.
DBG Using threads: 1
DBG Max threads: 1
Segmentation fault (core dumped)
Hi Benjamin,
would it be possible for you to send me your input files in a private email?
I've tested with 100M input sequences before and never had issues at that stage of the algorithm, so I would be very interested to reproduce this.
Alternatively you can try to recompile with:
make clean && make EPA_DEBUG=1
and then re-run the program. If that yields no concrete output, you can run the program with gdb by prepending this to your command:
gdb --args
then "r" from within the gdb shell to run the actual program. Once it's in a failure state, you can use the command "bt" to produce a backtrace.
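The same steps can also be run non-interactively, which may be handy inside a cluster job script. The sketch below builds a gdb command line using gdb's `-batch` and `-ex` options; `gdb_backtrace_cmd` and `run_with_backtrace` are hypothetical helpers, and the program arguments are whatever your own invocation uses.

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(program_argv: List[str]) -> List[str]:
    """Build a gdb invocation that runs the program, then prints a backtrace."""
    if not program_argv:
        raise ValueError("program_argv must contain at least the executable")
    # -batch makes gdb exit after the commands finish; each -ex is one gdb command.
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", *program_argv]


def run_with_backtrace(program_argv: List[str]) -> str:
    """Run the program under gdb and return the combined output (needs gdb installed)."""
    result = subprocess.run(
        gdb_backtrace_cmd(program_argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

If the program exits normally there is no stack left to inspect, so the backtrace is only informative when the run ends in a crash such as the segmentation fault above.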
Thanks,
Pierre