Easily extracting/identifying the expanded particles to their respective parents


digvij...@gmail.com

Jul 12, 2021, 3:14:17 PM
to EMAN2
Hello

[1] I am trying to find easy ways to relate the expanded particles to their respective parents. E.g., I want to separate/sort out expanded particles that came from a single parent. 

[2] Currently, when you do symmetry expansion, the particles3d and particles are extracted, and then you make a .lst of them for further refinements.

[3] The listing of expanded particles in the .lst is not sequential, i.e. it is not all expanded particles of one parent, followed by all expanded particles of another parent, and so on. Because of this, the data_n associated with the expanded particles can't easily be used to do [1].

[4] I semi-separate the expanded particles by their data_source (i.e. the name of their tomogram) and then use a crude method to separate them further (explained below).

[5] A routine for this separation is shown below. I collect expanded particles by their data_source and then, using a distance restraint, try to collect the expanded particles that came from a single parent.

[6] Is there a better, easier, and more systematic way of doing this? The best would be generating subsets of the .lst, each containing expanded particles from a single parent only.

Thanks and cheers,
Digvijay

# Group expanded particles that share a parent, using the tomogram name
# (data_source) plus a distance threshold.
import json
import numpy as np
from EMAN2 import *

threshold = 10.0  # max distance between subunits of the same parent (adjust)

js = js_open_dict("./spt_12/particle_parms_04.json")
srcname_list = []
ptcl_source_coord_list = []
data_n_list = []
for k in js.keys():
    src, ii = eval(k)
    e = EMData(src, ii, True)  # header-only read
    srcname_list.append(base_name(e["file_twod"]))
    ptcl_source_coord_list.append(e["ptcl_source_coord"])
    data_n_list.append(e["data_n"])

length = len(srcname_list)
a = {s: [] for s in srcname_list}
for i in range(length):
    for j in range(length):
        p1 = np.array(ptcl_source_coord_list[i])
        p2 = np.array(ptcl_source_coord_list[j])
        dist = np.sqrt(np.sum((p1 - p2) ** 2))
        # if expanded particles have the same data_source (srcname) and are
        # within a threshold distance, consider them to be from the same parent
        if srcname_list[i] == srcname_list[j] and dist < threshold:
            a[srcname_list[i]].append(ptcl_source_coord_list[j])

with open("sample.json", "w") as outfile:
    json.dump(a, outfile)


MuyuanChen

Jul 12, 2021, 3:25:40 PM
to em...@googlegroups.com
I thought after a certain version, the original particle information is saved in the header of the new 3d particles (orig_ptcl, orig_idx, orig_xf). Do you see them in your 3d particles? Admittedly I forgot to document this in the wiki or the program…
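If those header keys are present, grouping expanded particles by parent reduces to a one-pass dictionary build keyed on (orig_ptcl, orig_idx). A minimal sketch, using plain Python dicts to stand in for the particle headers (in EMAN2 itself these values would come from header-only reads with EMData(fname, i, True); the example values are hypothetical):

```python
from collections import defaultdict

# Stand-ins for 3d particle headers; in practice each dict would be an
# EMData header read from the expanded-particle stack (hypothetical values).
headers = [
    {"orig_ptcl": "tomo_01__ptcls.hdf", "orig_idx": 0},
    {"orig_ptcl": "tomo_01__ptcls.hdf", "orig_idx": 0},
    {"orig_ptcl": "tomo_01__ptcls.hdf", "orig_idx": 1},
    {"orig_ptcl": "tomo_02__ptcls.hdf", "orig_idx": 0},
]

# Map each parent (source stack, source index) to its expanded-particle indices.
by_parent = defaultdict(list)
for n, h in enumerate(headers):
    by_parent[(h["orig_ptcl"], h["orig_idx"])].append(n)

for parent, members in by_parent.items():
    print(parent, members)
```

The resulting per-parent index lists could then be written out as per-parent .lst subsets, with no distance threshold needed.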



digvij...@gmail.com

Jul 12, 2021, 6:01:31 PM
to EMAN2

The newer version indeed stores the orig_ptcl info etc. I read somewhere that you are moving away from .json files to .lst files only. In light of this plus more, I have the following questions:

{1} When running the new refinement, does the ptcl_source_coord in the header of the particle automatically get updated with the new coordinates obtained after refinement, so that the ptcl_source_coord in the header is the coordinate from the latest refinement of that particle? I can double-check this at my end as well, but I wanted to confirm.

{2} When doing symmetry expansion, do the stacks of particles3d and particles get built sequentially? E.g., if extracting 4 asymmetric subunits from each of 3 particles (ids 1, 2 and 3) in a tomogram, does the particles3d stack first fill with the 4 asymmetric subunits of parent particle 1, then with the 4 asymmetric subunits of parent particle 2, and so on? I am thinking the best approach for me would be to simply unstack the existing stack into stacks containing expanded particles from only one parent each. E.g., a stack of 40 particles containing 4×10 asymmetric subunits could be unstacked into 10 separate stacks, each containing the 4 asymmetric subunits of one of the 10 parent particles.

{3} Can you extract only specific info from the header using e2iminfo.py, e.g. only mode_id or only ptcl_source_coord? Parsing specific info out of the whole header is trivial, but I was asking whether e2iminfo.py has this functionality of outputting only specific fields from the header.

Cheers

Muyuan Chen

Jul 12, 2021, 6:47:09 PM
to em...@googlegroups.com
The particle header should be unchanged for lst or json in general. Now both systems should work for re-extraction. 

I think the coordinates get updated every time.

The sequence is random unless you use only one thread during extraction. The main thread just appends whichever particle got reconstructed by a worker thread.

I guess you can do e2iminfo ... |grep xxx? 
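A shell form of this would be something like e2iminfo.py stack.hdf --header | grep ptcl_source_coord (the --header flag is an assumption; check e2iminfo.py --help). The filtering step itself is trivial; a stand-in in Python, on hypothetical header-dump text:

```python
# Filter specific keys out of a header dump. The dump text below is a
# hypothetical stand-in for e2iminfo.py header output.
dump = """\
nx: 64
ny: 64
ptcl_source_coord: [512, 488, 130]
data_n: 42
ptcl_source_coord: [250, 301, 97]
"""

wanted = "ptcl_source_coord"
hits = [line for line in dump.splitlines() if line.startswith(wanted)]
for h in hits:
    print(h)
```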

digvij...@gmail.com

Jul 12, 2021, 9:00:07 PM
to EMAN2
Cool. 

Here's how I plan to generate stacks of expanded asymmetric subunits, with each stack containing the m asymmetric subunits of a single parent particle.

[1] Re-extract the asymmetric subunits using only 1 thread, so that the subunits get sequentially processed and appended onto the stack, and then write a routine to automate the following:
[2] For each stack, find the total number of images in it (n) and run a loop n/m times to unstack the given stack via e2proc3d.py stack.hdf stack_i__xyz.hdf, where i is an integer (and xyz will be useful as a label).
[3] Create a set with the label xyz, so that each stack in the set contains the m asymmetric subunits from a single parent particle.
[4] Run refinements etc. with the set created at [3].

Please let me know if this approach is flawed or wrong. 
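Step [2] above is just index arithmetic: in a sequentially built stack, parent i owns images i*m through i*m+m-1. A sketch that generates one e2proc3d.py command per parent (the --first/--last flags and the output naming are assumptions to be checked against e2proc3d.py --help; "xyz" is a hypothetical label):

```python
def unstack_commands(stack, n, m, label="xyz"):
    """Build one e2proc3d.py command per parent, assuming the stack is
    ordered parent-by-parent with m consecutive subunits per parent."""
    assert n % m == 0, "stack size must be a multiple of subunits per parent"
    cmds = []
    for i in range(n // m):
        out = f"{stack[:-4]}_{i}__{label}.hdf"
        # --first/--last select an inclusive image range (assumed flags)
        cmds.append(f"e2proc3d.py {stack} {out} --first {i*m} --last {i*m + m - 1}")
    return cmds

# e.g. 40 expanded particles, 4 asymmetric subunits per parent -> 10 stacks
for c in unstack_commands("stack.hdf", n=40, m=4):
    print(c)
```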

Cheers,

Muyuan Chen

Jul 12, 2021, 9:32:49 PM
to em...@googlegroups.com
I don't think it is flawed, but it is probably not the most efficient way of doing this either... Unless you need to change the binning of the particles, ideally you can just have multiple virtual copies of each particle, each posed at the orientation of a different asymmetrical unit, so you do not need to duplicate the actual particle volumes and waste computation and storage. That said, this is easier to implement with the new lst format than with json (since json does not allow multiple entries of the same key), which is one reason I don't have a working program for this. Basically:
1. Find the transform from the center of the structure to each of the N asymmetrical units (including rotation and translation).
2. For json alignment parameters, you need to make a new lst of particles that refers to each actual hdf particle N times. For the lst alignment format, this can be skipped.
3. Make a new json alignment file for the new lst; for each copy of the same particle, multiply the previously determined orientation by the transform to one asymmetrical unit.
4. Run e2spt_average on the new json. This should give you the structure centered at an asymmetrical unit if the previous steps work properly.
5. Run focused refinement from the alignment file and the averaged structure. You will probably need a customized mask to focus on one unit of the protein.
The actual process will probably be more complicated, and some math needs to be done to get the transforms right. I may write a program for the new lst-based pipeline at some point, since bookkeeping will be much easier as the lst format allows duplicate entries by itself. Maybe Steve has a working program for symmetry expansion?
Again, your approach of symmetry expansion by duplicating actual particles will probably work too, so feel free to proceed...
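The transform bookkeeping in steps 1-3 is matrix composition: each expanded orientation is a symmetry rotation composed with the particle's refined alignment. A minimal numpy sketch for a C4 particle (in EMAN2 itself this would be done with the Transform class and its symmetry operators rather than raw matrices; the identity orientation is a placeholder):

```python
import numpy as np

def cn_rotations(n):
    """Rotation matrices for the n symmetry-related views of a Cn particle
    (rotations about z by 2*pi*i/n)."""
    mats = []
    for i in range(n):
        a = 2 * np.pi * i / n
        mats.append(np.array([[np.cos(a), -np.sin(a), 0],
                              [np.sin(a),  np.cos(a), 0],
                              [0,          0,         1]]))
    return mats

# Refined orientation of one particle (identity here as a placeholder).
orient = np.eye(3)

# One expanded orientation per asymmetric unit: symmetry rotation composed
# with the particle's alignment. These would populate the new alignment file.
expanded = [r @ orient for r in cn_rotations(4)]
print(len(expanded))
```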
Muyuan

digvij...@gmail.com

Jul 14, 2021, 12:55:53 AM
to EMAN2
Thanks, Muyuan