Pgfi initiation

choish...@gmail.com

Feb 10, 2022, 10:08:23 PM
to plink2-dev
Hi Chris, sorry for bothering you again. I am trying to use pgenlib_read to read genotype data. I noticed that we need a PgenFileInfo object, and it seems that the procedure is roughly:
```
plink2::PgenFileInfo pgen_file_info;

plink2::PgenReader pgen_reader;

plink2::PreinitPgfi(&pgen_file_info);

plink2::PgenHeaderCtrl header_control;

uintptr_t cur_alloc_cacheline_ct;

uint32_t use_mmap = 0;
char error_log[2 * 131072]; // some kind of error string

auto reterr = plink2::PgfiInitPhase1( file_name, num_variants, num_sample, use_mmap, &header_control, &pgen_file_info, &cur_alloc_cacheline_ct, error_log);

if (unlikely(reterr)) { throw std::runtime_error(error_log); }

auto alloc_size = plink2::calculate_alloc_size(cur_alloc_cacheline_ct * plink2::kCacheline);
std::vector<unsigned char> pgfi_alloc(alloc_size);

uint32_t max_vrec_width = 0;
uintptr_t pgr_alloc_cacheline_ct;
auto phase2_error = plink2::PgfiInitPhase2( header_control, 0, 0, 1, 0, num_variants, &max_vrec_width, &pgen_file_info, pgfi_alloc.data(), &pgr_alloc_cacheline_ct, error_log);

if (unlikely(phase2_error)) { throw std::runtime_error(error_log); }

```

This seems to work well without multi-threading. However, when I run this code in a multi-threaded environment, I always get a bus error (BUS ERROR 10) on the line before the plink2::PgfiInitPhase2 call, and the error is never caught. I have checked that pgfi_alloc is correctly sized. The exact same code works outside a std::thread but fails inside one (see the pseudocode example in the attachment).

Are there any limitations on using PgenFileInfo and PgenReader in a multi-threaded environment? Are they thread-safe?

Thank you
test.cpp

Christopher Chang

Feb 11, 2022, 12:40:55 PM
to plink2-dev
I reproduced this crash on my Mac, and it looks like the issue is too low of a default stack size (you can see the value with "ulimit -s"; I got "8192").  Your error_log array requires 256 KiB of stack space (it is safe to shrink this to plink2::kPglErrstrBufBlen, which is 4.25 KiB), and PgfiInitPhase2() requires another 256 KiB.

Other issues I saw in this code:
- You must pass use_blockload=0 if you are using a PgenReader instead of the PgfiMultiread function.
- You should guarantee that pgfi_alloc is cacheline-aligned.
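
The cacheline-alignment point can be illustrated with a minimal sketch. It assumes a POSIX system (posix_memalign) and a 64-byte cacheline; `kAssumedCacheline`, `AlignedBuf`, and `AllocCachelines` are hypothetical names for this example and are not part of plink2. In real code the constant would be plink2::kCacheline.

```cpp
#include <stdlib.h>  // posix_memalign, free (POSIX)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Assumption for this sketch: 64-byte cachelines.
constexpr std::size_t kAssumedCacheline = 64;

// Deleter so that std::unique_ptr releases the buffer with free().
struct FreeDeleter {
  void operator()(unsigned char* p) const { free(p); }
};
using AlignedBuf = std::unique_ptr<unsigned char, FreeDeleter>;

// Hypothetical helper: allocates cacheline_ct cachelines starting on a
// cacheline boundary. Returns an empty pointer if the allocation fails.
inline AlignedBuf AllocCachelines(std::size_t cacheline_ct) {
  void* raw = nullptr;
  if (posix_memalign(&raw, kAssumedCacheline,
                     cacheline_ct * kAssumedCacheline) != 0) {
    return AlignedBuf();
  }
  return AlignedBuf(static_cast<unsigned char*>(raw));
}
```

A std::vector<unsigned char> gives no such alignment guarantee, which is why a dedicated aligned allocation is used here; `buf.get()` would then be passed where the unaligned `pgfi_alloc.data()` was.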

Christopher Chang

Feb 11, 2022, 12:45:17 PM
to plink2-dev
Correction, "ulimit -s" should return a value in KiB units, so 8192 should be enough.  Investigating further.

Christopher Chang

Feb 11, 2022, 12:50:59 PM
to plink2-dev
Ok, the stack size limit for new std::threads on macOS is 512 KiB and the standard library doesn't let you change it.

~256 KiB is the largest amount of stack space that any plink2 function requires, so shrinking the error_log array to the suggested value should solve your problem.
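
If a thread ever does need more stack than the default, one option outside the standard library is to create it with pthreads and request the stack size explicitly. The following is only a sketch under that assumption: `Worker` and `RunWithStackSize` are hypothetical names, and the 256 KiB local array simply mimics the original error_log buffer.

```cpp
#include <pthread.h>

#include <cstddef>
#include <cstring>

// Hypothetical worker: touches a 256 KiB stack buffer, similar in size to
// the original error_log array, then reports success through arg.
static void* Worker(void* arg) {
  char big_buf[2 * 131072];
  std::memset(big_buf, 0, sizeof(big_buf));
  *static_cast<int*>(arg) = (big_buf[sizeof(big_buf) - 1] == 0) ? 1 : 0;
  return nullptr;
}

// Runs Worker on a new thread whose stack size is stack_bytes.
// Returns true only if the thread was created, joined, and reported success.
inline bool RunWithStackSize(std::size_t stack_bytes) {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) {
    return false;
  }
  bool ok = false;
  int done = 0;
  pthread_t tid;
  if (pthread_attr_setstacksize(&attr, stack_bytes) == 0 &&
      pthread_create(&tid, &attr, Worker, &done) == 0) {
    ok = (pthread_join(tid, nullptr) == 0) && (done == 1);
  }
  pthread_attr_destroy(&attr);
  return ok;
}
```

For example, `RunWithStackSize(4 * 1024 * 1024)` requests a 4 MiB stack. Shrinking the buffer, as suggested above, remains the simpler fix when std::thread is preferred.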

Christopher Chang

Feb 11, 2022, 1:00:35 PM
to plink2-dev
Also, "if you are using a PgenReader instead of the PgfiMultiread function" should read "if you are not using the PgfiMultiread function", since the PgfiMultiread-based workflow still involves PgenReaders.

choish...@gmail.com

Feb 11, 2022, 1:12:27 PM
to plink2-dev
Thank you Chris, that is very helpful. I will try to adjust my code accordingly!

Thanks

choish...@gmail.com

Feb 11, 2022, 2:25:24 PM
to plink2-dev

Follow-up question regarding thread safety of PgenReaders: is it thread-safe to use one PgenFileInfo for multiple PgenReaders? I see that in plink_data, when doing multi-threaded genotype filtering, PgrCopyBaseAndOffset needs to be called, which seems to assign the block_base from the pgfi to the PgenReader.

Is there a simple example of how best to use PgenReaders and PgenFileInfo in a multi-threaded setting? I guess I can always initialize one PgenReader and one PgenFileInfo per thread; I am just wondering whether there is a better way.

Sorry for the constant questions, and thank you for the amazing software. I have learned a lot about programming from reading your code.

Christopher Chang

Feb 11, 2022, 2:32:51 PM
to plink2-dev
Yes, the PgenFileInfo struct is designed to be usable with multiple PgenReaders.  See https://groups.google.com/g/plink2-dev/c/aKxf8d5Zwjw for some previous discussion, and let me know if some questions remain after reading that.

choish...@gmail.com

Feb 11, 2022, 3:27:33 PM
to plink2-dev
Yes, I think I understand now. Basically, it is fine to have one PgenFileInfo and pgfi_alloc shared across multiple threads, but each thread needs its own PgenReader and PgenVariant, with the appropriate vectors allocated. Is that correct?

Christopher Chang

Feb 11, 2022, 4:42:39 PM
to plink2-dev
Yes, that should work.

choish...@gmail.com

Feb 21, 2022, 11:01:29 AM
to plink2-dev
Thanks Chris, I got most things working. However, I noticed that PgrInit makes the PgenReader take shared_ff away from the PgenFileInfo. If multiple threads share the same PgenFileInfo and each initializes its own PgenReader, would that cause a race condition? As I am not certain, I currently hold a mutex lock around the PgrInit call just in case. Again, thank you so much for the API; it makes it much easier to write code that processes plink data.

Side note: would it make sense to make PgenVariantStruct and PgenReader movable? E.g.

  PgenVariantStruct(PgenVariantStruct&&) = default;
  PgenVariantStruct& operator = (PgenVariantStruct&&) = default;

This is purely to allow us to put these structures in a vector, though that might not be a common use case in the main plink code.

Christopher Chang

Feb 21, 2022, 12:28:24 PM
to plink2-dev
1. Your suspicion is correct, it is not safe to initialize multiple PgenReaders from the same PgenFileInfo in parallel, so it is reasonable to have a mutex guard the PgenFileInfo here.
2. I'll look into making PgenVariant and PgenReader movable-but-noncopyable later today.
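
The pattern from point 1 (one shared file-info object, one reader per thread, initialization serialized by a mutex) might look roughly like the sketch below. `SharedFileInfo`, `Reader`, `InitReader`, and `InitReadersInParallel` are hypothetical stand-ins and do not reproduce the plink2 types or the behavior of PgrInit; only the locking structure is the point.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared file-info object (NOT plink2's struct).
struct SharedFileInfo {
  int file_handle = 42;   // pretend resource consulted during reader init
  std::mutex init_mutex;  // guards the fields that reader init touches
  uint32_t init_ct = 0;   // number of readers initialized so far
};

// Hypothetical stand-in for a per-thread reader.
struct Reader {
  int file_handle = -1;
};

// Stand-in for an init step that mutates shared state, so it runs under
// the lock.
inline void InitReader(SharedFileInfo* info, Reader* reader) {
  std::lock_guard<std::mutex> lock(info->init_mutex);
  reader->file_handle = info->file_handle;
  ++info->init_ct;
}

// Each thread initializes its own reader under the lock; work done after
// initialization uses only that thread's reader.
inline uint32_t InitReadersInParallel(uint32_t thread_ct) {
  SharedFileInfo info;
  std::vector<Reader> readers(thread_ct);
  std::vector<std::thread> threads;
  threads.reserve(thread_ct);
  for (uint32_t tidx = 0; tidx != thread_ct; ++tidx) {
    threads.emplace_back([&info, &readers, tidx] {
      InitReader(&info, &readers[tidx]);
      // ... per-thread reads via readers[tidx] would go here ...
    });
  }
  for (std::thread& t : threads) {
    t.join();
  }
  return info.init_ct;
}
```

Without the lock_guard, the increment of `init_ct` would be a data race; with it, every thread's initialization is counted exactly once.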

Christopher Chang

Feb 21, 2022, 1:18:51 PM
to plink2-dev
Movability update has been pushed to GitHub, let me know if you run into any issues.
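
As a generic illustration of what movable-but-noncopyable buys here, the sketch below stores such a type in a std::vector. `VariantBuf` and `MakePerThreadBufs` are hypothetical names for this example; this is not the plink2 struct, only the same copy/move pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical buffer-owning type: copying is disabled, moving is allowed.
struct VariantBuf {
  explicit VariantBuf(std::size_t n)
      : words(new unsigned long[n]()), word_ct(n) {}

  VariantBuf(const VariantBuf&) = delete;
  VariantBuf& operator=(const VariantBuf&) = delete;
  VariantBuf(VariantBuf&&) = default;
  VariantBuf& operator=(VariantBuf&&) = default;

  std::unique_ptr<unsigned long[]> words;  // zero-initialized storage
  std::size_t word_ct;
};

// Builds one buffer per thread. emplace_back constructs each element in
// place, and any vector reallocation moves elements instead of copying.
inline std::vector<VariantBuf> MakePerThreadBufs(std::size_t thread_ct,
                                                 std::size_t word_ct) {
  std::vector<VariantBuf> bufs;
  bufs.reserve(thread_ct);
  for (std::size_t i = 0; i != thread_ct; ++i) {
    bufs.emplace_back(word_ct);
  }
  return bufs;
}
```

Deleting the copy operations prevents two objects from accidentally owning the same buffers, while the defaulted move operations are enough for std::vector to hold and relocate the elements.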

choish...@gmail.com

Feb 21, 2022, 8:54:30 PM
to plink2-dev
Thank you so much! That should solve all my problems for now.

Sam