I was pondering the same thing this evening. Since there seems to be precious little information out there, allow me to revive this 3-month-old thread with a few of my findings.
I too got a crash when I tried extracting the fixed-length-dawgs, and
dawg2wordlist doesn't seem to offer any special flags for handling this composite dawg. wordlist2dawg, on the other hand, advertises the usage
wordlist2dawg -l <short> <long> WORDLIST DAWG lang.unicharset
and says about the option:
-l <short> <long> Produce a file with several dawgs in it, one each for words of length <short>, <short+1>,… <long>
While one could surely just look at the source to figure out the details, I figured the "dawgs" file format is simply a bunch of "dawg"s cat'ed together.
To verify this theory I compared a regular dawg and the fixed-length-dawgs in a hex editor.
The regular dawg appears to start with the magic bytes 2A 00 1D 0E, which suspiciously occur several times in the fixed-length-dawgs file.
An educated guess tells me the dawgs format is simply:
[4 bytes : number of dawgs] + ([4 bytes : length of words in dawg] + [DAWG ...])*
This makes it very easy to manually extract the individual dawgs, and one could even naively split the file on the magic bytes (this needs gawk, since POSIX awk only supports a single-character RS):
awk 'BEGIN {RS="\x2A\x00\x1D\x0E"; FILENUM=-1} {FILENUM++; if (FILENUM == 0)
{next}; OUT=".fixed-length-dawg-"FILENUM; printf "%s", RS $0 > OUT;}' .fixed-length-dawgs
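For a more structured extraction, here is a short Python sketch of the layout guessed above. Keep in mind this is reverse-engineered, not documented: the 4-byte little-endian integer fields, the magic bytes, and the function name are my assumptions.

```python
import struct

MAGIC = b"\x2a\x00\x1d\x0e"  # magic bytes observed at the start of each dawg

def split_fixed_length_dawgs(path):
    """Split a .fixed-length-dawgs file into individual dawgs, assuming
    the guessed layout: [4 bytes: dawg count] ([4 bytes: word length] [DAWG])*"""
    with open(path, "rb") as f:
        data = f.read()
    count = struct.unpack("<i", data[:4])[0]
    # Locate every occurrence of the magic; each one starts a dawg.
    offsets = []
    pos = data.find(MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(MAGIC, pos + 1)
    dawgs = []
    for i, start in enumerate(offsets):
        # The 4 bytes just before each magic hold that dawg's word length.
        word_len = struct.unpack("<i", data[start - 4:start])[0]
        # A dawg runs up to the next dawg's word-length field, or to EOF.
        end = offsets[i + 1] - 4 if i + 1 < len(offsets) else len(data)
        dawgs.append((word_len, data[start:end]))
    return count, dawgs
```

Each extracted blob can then be written to its own file and handed to the standard dawg2wordlist (which takes the unicharset, the dawg, and the output wordlist as arguments).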
Using the above snippet I successfully managed to "extract" 6 dawgs of various lengths from the pre-built jpn.traineddata. You can then run the standard dawg2wordlist on each piece to recover its wordlist.
On a separate note, it is still not clear to me what the exact purpose of these sub-dawgs is.
The jpn.traineddata appears to contain a .freq-dawg and the .fixed-length-dawgs but no .word-dawg.
Why it is helpful to split the dictionary into many smaller dictionaries based on word length, I cannot guess.
I hope this will be helpful to someone out there.