FST mmapping


Maël Primet

Apr 6, 2017, 10:43:59 AM
to kaldi-help
Is there a way to use memory mapping (mmap) when opening FST files? It seems that Kaldi uses streams to open the file, and I am not sure whether it is possible to bypass that and use memory mapping instead.

Daniel Povey

Apr 6, 2017, 1:21:38 PM
to kaldi-help
Unfortunately, the in-memory formats that OpenFst uses are not compatible with memory mapping: there are too many pointers and STL types. The code (or the necessary parts of it) would have to be rewritten from scratch with memory mapping in mind to get that to work.

Dan
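
To illustrate the point about pointers: a structure made only of plain values and offsets can be mapped from a file and used in place, whereas one holding `std::vector` members or raw pointers cannot, because those addresses are only meaningful in the process that created them. The following is a minimal POSIX sketch under that assumption. `ToyArc`, `WriteArcs`, and `MappedArcs` are hypothetical names, and the file layout is invented for illustration; it is not OpenFst's on-disk format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-size arc record: no pointers, only plain values,
// so its bytes mean the same thing in any process that maps the file.
struct ToyArc {
  int32_t ilabel;
  int32_t olabel;
  float weight;
  int32_t nextstate;
};

// Writes the arcs to `path` as one contiguous block of bytes.
bool WriteArcs(const std::string &path, const std::vector<ToyArc> &arcs) {
  std::FILE *f = std::fopen(path.c_str(), "wb");
  if (f == nullptr) return false;
  size_t n = std::fwrite(arcs.data(), sizeof(ToyArc), arcs.size(), f);
  return std::fclose(f) == 0 && n == arcs.size();
}

// Read-only view over a memory-mapped arc file. Nothing is copied:
// operator[] reads directly from the mapped pages.
class MappedArcs {
 public:
  explicit MappedArcs(const std::string &path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
      size_t bytes = static_cast<size_t>(st.st_size);
      void *p = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = p;
        bytes_ = bytes;
      }
    }
    close(fd);  // The mapping stays valid after the descriptor is closed.
  }
  ~MappedArcs() {
    if (data_ != nullptr) munmap(data_, bytes_);
  }
  MappedArcs(const MappedArcs &) = delete;
  MappedArcs &operator=(const MappedArcs &) = delete;

  // Number of whole arc records in the mapping (0 if mapping failed).
  size_t size() const { return bytes_ / sizeof(ToyArc); }

  const ToyArc &operator[](size_t i) const {
    assert(i < size());
    return static_cast<const ToyArc *>(data_)[i];
  }

 private:
  void *data_ = nullptr;
  size_t bytes_ = 0;
};
```

A caller would write the arcs once with `WriteArcs` and then construct a `MappedArcs` on the same path; the operating system pages the data in on demand instead of the program reading it through a stream.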



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Maël Primet

Apr 10, 2017, 4:32:43 AM
to kaldi-help, dpo...@gmail.com
Thanks Dan,

From what I see in the OpenFst documentation, we can't use memory mapping because we use vector FSTs rather than const FSTs, which provide an mmap implementation. Is that true? And do we need vector FSTs because we store two weights (acoustic and language) for the FST, or is there another reason?

Could you tell me whether you think rewriting the FST code for decoding would be a very large task, or something I could perhaps look at? If so, do you have suggestions on where I should look to get started? I have not yet taken a deep look at the Kaldi internals for FST decoding, so I welcome any ideas you have.

Armando

Apr 10, 2017, 1:14:56 PM
to kaldi-help, dpo...@gmail.com
OpenFst has the program fstconvert to convert from one FST type to another.

Maël Primet

Apr 10, 2017, 1:16:45 PM
to kaldi...@googlegroups.com, dpo...@gmail.com

Indeed, but is it possible to use Kaldi with a const FST, or do we need special arc types with pairs of weights? When I load an FST that I converted to a const FST, I get an error in Kaldi, because it can't load it as a VectorFst.


Daniel Povey

Apr 10, 2017, 1:31:49 PM
to Maël Primet, kaldi-help
Oh, I didn't realize that const-FST has the option for memory mapping.
It is possible to use const-FST in the decoder, as the interfaces only require Fst<StdArc>, which is the base class.
In fact, const-FST is also more memory efficient so I had wanted to investigate the use of that in the decoders (see this Issue: https://github.com/kaldi-asr/kaldi/issues/1486).  It would be nice to see what impact there is on memory usage and speed, even without memory mapping.

Dan
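
One plausible reason an immutable FST can be more compact than a mutable one is layout: a mutable graph typically gives each state its own growable arc container, while an immutable graph can keep all arcs in a single array plus per-state offsets. The sketch below illustrates that idea only; `ToyArc`, `NestedGraph`, `FlatGraph`, and `Flatten` are hypothetical names, and the layouts are assumptions for illustration rather than OpenFst's actual data structures.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical arc record used by both layouts.
struct ToyArc {
  int32_t ilabel;
  int32_t olabel;
  float weight;
  int32_t nextstate;
};

// Mutable-style layout: every state owns a separate growable vector,
// which costs one heap allocation and one vector header per state.
typedef std::vector<std::vector<ToyArc> > NestedGraph;

// Immutable-style layout: all arcs in one contiguous array.
// offsets[s] is the index of the first arc of state s, and
// offsets has num_states + 1 entries so that offsets[s + 1] marks the end.
struct FlatGraph {
  std::vector<uint32_t> offsets;
  std::vector<ToyArc> arcs;

  size_t NumStates() const { return offsets.size() - 1; }
  size_t NumArcs(size_t s) const { return offsets[s + 1] - offsets[s]; }
  const ToyArc *Arcs(size_t s) const { return arcs.data() + offsets[s]; }
};

// Copies a nested graph into the flat layout, preserving arc order.
FlatGraph Flatten(const NestedGraph &g) {
  FlatGraph flat;
  flat.offsets.reserve(g.size() + 1);
  flat.offsets.push_back(0);
  for (size_t s = 0; s < g.size(); ++s) {
    flat.arcs.insert(flat.arcs.end(), g[s].begin(), g[s].end());
    flat.offsets.push_back(static_cast<uint32_t>(flat.arcs.size()));
  }
  return flat;
}
```

Iterating over the arcs of a state in the flat layout is a walk over adjacent memory, which may also help explain faster traversal, though that would need to be measured on a real decoding graph.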


Armando

Apr 10, 2017, 3:03:02 PM
to kaldi-help, dpo...@gmail.com
Well, you should change the type of the FST accordingly in the main function of the decoder program.

mike.cl...@gmail.com

Apr 10, 2017, 6:25:55 PM
to kaldi-help
We have been using const FSTs with Kaldi successfully for over a year. We prepare the const FSTs using something like:
    fstconvert --fst_type=const HCLGvector.fst HCLGconst.fst
Our program is happy to read either const or vector, and even auto-detects the correct format at runtime (see the code snippet at the end of this email). We find that const FSTs load from storage significantly faster than vector FSTs. They also take up less space in memory. Const FSTs also speed up decoding by a measurable amount, as they seem to traverse more quickly. Curiously, we see that const FSTs take up approximately 10% more space on disk than vector FSTs. We have not experimented with mmapping, though we have seen mention of that capability in the OpenFst source code.

______
 
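    // Auto-detect whether the decoding graph is a vector or a const FST.
    fst::Fst<fst::StdArc> *decode_fst = NULL;
    Input ki_fst(fst_in_filename);  // Kaldi I/O wrapper around the input stream.
    if (!ki_fst.Stream().good()) {
      KALDI_ERR << "Could not open decoding FST: " << fst_in_filename;
      return 1;
    }
    // Read only the header so we can inspect the arc type and FST type.
    fst::FstHeader hdr;
    if (!hdr.Read(ki_fst.Stream(), "<unknown>")) {
      KALDI_ERR << "Reading FST: error reading FST header.";
      return 1;
    }
    if (hdr.ArcType() != fst::StdArc::Type()) {
      KALDI_ERR << "FST with arc type " << hdr.ArcType() << " not supported.";
      return 1;
    }
    // Pass the already-consumed header on, so Read() does not expect it again.
    fst::FstReadOptions ropts("<unspecified>", &hdr);
    if (hdr.FstType() == "vector") {
      // This reader takes the filename and reads the file from the start.
      decode_fst = kaldi::ReadDecodeGraph(fst_in_filename);
    } else if (hdr.FstType() == "const") {
      decode_fst = fst::ConstFst<fst::StdArc>::Read(ki_fst.Stream(), ropts);
    } else {
      // Without this branch, decode_fst would silently stay NULL.
      KALDI_ERR << "FST with type " << hdr.FstType() << " not supported.";
      return 1;
    }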
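
The dispatch pattern in the snippet above (read the header once, validate it, then choose the reader that matches the declared type) can be shown in isolation. The sketch below uses a made-up header of two length-prefixed strings rather than OpenFst's real header; `ToyHeader`, `ReadToyHeader`, `DetectFstKind`, and `MakeToyHeader` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical header: FST type and arc type, each stored as a
// 32-bit length followed by that many characters.
struct ToyHeader {
  std::string fst_type;
  std::string arc_type;
};

enum class FstKind { kVector, kConst, kUnsupported };

// Reads one length-prefixed string; returns false on a short or bad read.
bool ReadString(std::istream &is, std::string *out) {
  assert(out != nullptr);
  int32_t len = 0;
  is.read(reinterpret_cast<char *>(&len), sizeof(len));
  if (!is || len < 0 || len > 256) return false;  // 256 is an arbitrary cap.
  out->resize(static_cast<size_t>(len));
  if (len > 0) is.read(&(*out)[0], len);
  return static_cast<bool>(is);
}

bool ReadToyHeader(std::istream &is, ToyHeader *hdr) {
  return ReadString(is, &hdr->fst_type) && ReadString(is, &hdr->arc_type);
}

// Consumes the header and reports which reader should handle the rest
// of the stream. Anything unexpected maps to kUnsupported.
FstKind DetectFstKind(std::istream &is, const std::string &expected_arc) {
  ToyHeader hdr;
  if (!ReadToyHeader(is, &hdr)) return FstKind::kUnsupported;
  if (hdr.arc_type != expected_arc) return FstKind::kUnsupported;
  if (hdr.fst_type == "vector") return FstKind::kVector;
  if (hdr.fst_type == "const") return FstKind::kConst;
  return FstKind::kUnsupported;
}

// Helpers that serialize a header in the same toy format.
void WriteString(std::ostream &os, const std::string &s) {
  int32_t len = static_cast<int32_t>(s.size());
  os.write(reinterpret_cast<const char *>(&len), sizeof(len));
  os.write(s.data(), len);
}

std::string MakeToyHeader(const std::string &fst_type,
                          const std::string &arc_type) {
  std::ostringstream os;
  WriteString(os, fst_type);
  WriteString(os, arc_type);
  return os.str();
}
```

Returning an explicit `kUnsupported` value makes the "neither vector nor const" case visible to the caller instead of leaving a null pointer behind.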

Daniel Povey

Apr 10, 2017, 6:44:46 PM
to kaldi-help

How much memory does it save?
Is there any chance you could help with a pull request?  If not it's OK, I'll find someone else to do it, but let me know either way.

One possibility is to copy and modify the following function in kaldi-fst-io.h:

// Read a binary FST using Kaldi I/O mechanisms (pipes, etc.) 
// On error, throws using KALDI_ERR.  Note: this  
// doesn't support the text-mode option that we generally like to support.  
VectorFst<StdArc> *ReadFstKaldi(std::string rxfilename);

into the following function:

// Read a binary FST using Kaldi I/O mechanisms (pipes, etc.) 
// If it can't read the FST, if throw_on_err == true it throws using KALDI_ERR;
// otherwise it prints a warning and returns.  Note: this  
// doesn't support the text-mode option that we generally like to support.  
// This version currently supports ConstFst<StdArc> or VectorFst<StdArc>
// (const-fst can give better performance for decoding).
Fst<StdArc> *ReadFstKaldiGeneric(std::string rxfilename,
                                 bool throw_on_err = true);

The implementation of this function would be based on your code.

The decoding programs would have to be modified to call this instead (and would have to make sure to delete the object when done, if a variable had to be changed to a pointer).  After that, the graph-creation script (mkgraph.sh) could be converted to write a const-fst.

We haven't been using std::unique_ptr/std::shared_ptr and the like so far but it's a possibility now that Kaldi is C++11-only, so perhaps we should have a discussion about that too.

Dan
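
As a sketch of what the smart-pointer alternative mentioned above might look like: a reader that returns `std::unique_ptr` to a base class lets the caller hold either concrete type without a manual `delete`. The classes below are toy stand-ins, not OpenFst or Kaldi types, and `MakeToyGraph` and `DescribeGraph` are hypothetical names. The code sticks to C++11, so it uses `new` inside the `unique_ptr` constructor rather than `std::make_unique`.

```cpp
#include <memory>
#include <string>

// Toy stand-in for a polymorphic FST base class. The virtual destructor
// is what makes deletion through a base-class pointer well defined.
class ToyGraph {
 public:
  virtual ~ToyGraph() = default;
  virtual std::string Type() const = 0;
};

class ToyVectorGraph : public ToyGraph {
 public:
  std::string Type() const override { return "vector"; }
};

class ToyConstGraph : public ToyGraph {
 public:
  std::string Type() const override { return "const"; }
};

// Returns an owning pointer to the requested concrete type,
// or a null unique_ptr if the type string is not recognized.
std::unique_ptr<ToyGraph> MakeToyGraph(const std::string &fst_type) {
  if (fst_type == "vector") {
    return std::unique_ptr<ToyGraph>(new ToyVectorGraph());
  }
  if (fst_type == "const") {
    return std::unique_ptr<ToyGraph>(new ToyConstGraph());
  }
  return nullptr;
}

// A decoder-like consumer only needs the base interface, so it works
// with whichever concrete graph the factory produced.
std::string DescribeGraph(const ToyGraph &graph) {
  return "decoding with a " + graph.Type() + " graph";
}
```

With this shape, the graph is released automatically when the `unique_ptr` goes out of scope, including on early returns and exceptions, which is the cleanup the decoding programs would otherwise have to do by hand.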


