Assembly function


Arpit Gaur

Jan 9, 2026, 11:07:07 AM
to methylkit_discussion

Hi,

This is my first time working with methylKit, and I am using the IWGSC RefSeq v2.1 wheat reference genome. I had a few questions regarding the assembly argument in methRead() and related functions:

  1. How does the assembly argument work internally in methylKit?
    Is it used for any validation or processing steps, or is it stored purely as metadata?

  2. Is it mandatory to specify assembly?
    Are there any specific analyses or downstream steps (e.g. DMR calling, annotation, visualization) where providing assembly becomes necessary?

  3. What format is expected for assembly?
    Is it simply a character string (e.g. "IWGSC_RefSeq_v2.1"), or does it require a path to the reference genome FASTA or index files?

Apologies for asking multiple questions in one post. I appreciate your time and help in improving my understanding of methylKit.


Regards

Arpit

alex....@gmail.com

Jan 9, 2026, 3:39:32 PM
to methylkit_discussion
Hi Arpit,

Great that you are giving methylKit a shot. 

1) The assembly string is currently stored purely as metadata. To quote the description from the processBismarkAln help page:
"assembly string that determines the genome assembly. Ex: mm9,hg18 etc. This is just a string for book keeping. It can be any string. Although, when using multiple files from the same assembly, this string should be consistent in each object."

2) Treat the assembly as mandatory. Although this is not strictly enforced, issues have been reported when it was not provided. All downstream analyses, however, are completely independent of the assembly.

3) The variable must be a character string, but there is no specific format enforced. I would recommend using the official genome assembly name, as you suggested. 
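
As a minimal sketch of the three points above: the snippet below assumes the example CpG file that ships with methylKit (it is human data, used here only to show that the string is a free-form label and needs no FASTA or index path), and the sample ID and treatment value are arbitrary placeholders.

```r
library(methylKit)

# Example CpG file bundled with methylKit (human data; illustration only).
cpg_file <- system.file("extdata", "test1.myCpG.txt", package = "methylKit")

# 'assembly' is passed as a plain character string, not a path to a genome.
obj <- methRead(cpg_file,
                sample.id = "test1",              # arbitrary placeholder
                assembly  = "IWGSC_RefSeq_v2.1",  # free-form bookkeeping label
                treatment = 1,                    # arbitrary placeholder
                context   = "CpG")

# Read the stored label back from the object.
getAssembly(obj)
```

When reading several samples, pass the same string for all of them so that the objects stay consistent.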

Best,
Alex