Hi Nanshan,
Most WGS CNV callers provide just DEL or DUP designations because it is difficult to distinguish between the heterozygous state (cn=1 or cn=3) and the homozygous state (cn=0 or cn=4).
I would suggest simply assuming that heterozygous CNV calls make up the majority of calls and encoding the CNState values as DEL=>state2,cn=1 and DUP=>state5,cn=3.
If you want to try a more complex approach, you could plot a histogram of the RD values you have from CNVkit and look for inflection points that suggest appropriate cut-offs between the homozygous and heterozygous states. However, in my experience the homozygous calls are so few, and the differential signal so weak, that the two states are hard to separate.
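If you do want to look at the RD distribution, a rough text histogram can be made with awk before reaching for a plotting tool. This is only a sketch under assumptions: the file name rd_values.txt, the one-value-per-line layout, the bin width of 0.25, and the numbers themselves are all made up for illustration, so you would first need to extract the RD column from your own CNVkit output.

```shell
# Hypothetical RD values, one per line (invented for illustration only).
printf '%s\n' -1.05 -0.95 -1.10 -0.90 -3.20 0.55 0.60 0.62 0.95 > rd_values.txt

# Bin the values with width w and print: bin lower edge, count, bar of '#'.
awk -v w=0.25 '
# awk int() truncates toward zero, so negative values need an explicit floor
function floor(x) { return (x == int(x)) ? x : (x < 0 ? int(x) - 1 : int(x)) }
{ count[floor($1 / w)]++ }
END {
    for (b in count) {
        bar = ""
        for (i = 0; i < count[b]; i++) bar = bar "#"
        printf "%.2f\t%d\t%s\n", b * w, count[b], bar
    }
}' rd_values.txt | sort -n > rd_hist.txt

cat rd_hist.txt
```

Empty stretches between clusters of bins are the candidate cut-offs. With real data, expect the homozygous clusters to be small or absent, as noted above.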
Here is an awk command that will convert CNVkit output into standard PennCNV format:
awk '{ORS="";print $1":"$2"-"$3" numsnp="$7" length="$4" ";if($6=="DEL"){print"state2,cn=1"}else{print"state5,cn=3"};print" "$5" startsnp="$1"_"$2" endsnp="$1"_"$3"\n"}' CNVkit.txt > CNVkit.rawcnv
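To check the conversion on your side, here is the same awk command run on two made-up input lines. The column order (chr, start, end, length, sample file, DEL/DUP, number of markers) is inferred from the fields the command reads, and the file names example_cnvkit.txt and example.rawcnv and all values are hypothetical, so please confirm the order against your own CNVkit.txt first.

```shell
# Hypothetical input: chr  start  end  length  sample  type  numsnp
printf 'chr1\t1000\t5000\t4001\tsample1.txt\tDEL\t12\n'  > example_cnvkit.txt
printf 'chr2\t20000\t29999\t10000\tsample1.txt\tDUP\t30\n' >> example_cnvkit.txt

# Same command as above, only the file names differ.
awk '{ORS="";print $1":"$2"-"$3" numsnp="$7" length="$4" ";if($6=="DEL"){print"state2,cn=1"}else{print"state5,cn=3"};print" "$5" startsnp="$1"_"$2" endsnp="$1"_"$3"\n"}' example_cnvkit.txt > example.rawcnv

cat example.rawcnv
# chr1:1000-5000 numsnp=12 length=4001 state2,cn=1 sample1.txt startsnp=chr1_1000 endsnp=chr1_5000
# chr2:20000-29999 numsnp=30 length=10000 state5,cn=3 sample1.txt startsnp=chr2_20000 endsnp=chr2_29999
```

Note that any value in column 6 other than DEL is written as state5,cn=3, so a header line or an unexpected label would be encoded as a duplication.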
Regards,
Joseph Glessner