Fixed length fields


The Beez

Feb 11, 2017, 7:02:05 AM
to 4tH-compiler
Hi 4tH-ers!

It didn't take long before a message plunged into my mailbox after I'd claimed that reading fixed length fields doesn't take much more than reading delimited fields. True, it takes a heck of a lot of code to figure out how a fixed length field file is laid out, because you don't have much to work with. Still, it's no rocket science. It just takes some careful planning.

Most of you know that I tend to define FIELD> (get next field), FIELD>> (skip a field) and FIELDS>> (skip a number of fields) when I work with files, because it allows me to abstract a lot of things. E.g. FIELD> may look like:

  : field> delimiter parse-csv csv> -trailing -leading ;

And yes, fixed length files can be read like that too. So, how do we do it? First we define a zero-terminated array of field lengths:

  create states-fields 20 , 2 , 14 , 8 , 7 , 2 , 0 ,

Then we define a 2 element array:

  /layout array states-layout            \ this is the layout variable

In this array we store the position in the buffer (where we're reading from) and the position in the field lengths array. We're almost there. Note we have to reset this thing each time we read a record, so let's bind it to the buffer read:

  : !refill states-fields tib states-layout fields! refill ;

We're using REFILL here, but it would make no difference if we used e.g. MyBuffer 64 ACCEPT for that matter. Now we have to wrap it all up in one elegant FIELD> definition:

  : field> states-layout next-field -trailing ;

NEXT-FIELD simply returns the current position in the buffer (what >IN usually keeps track of) and the current entry in the States-Fields array, i.e. the address and length of the field. Sure, the field may be padded with blanks; that's why we added -TRAILING. Done!
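
FIELDS! and /LAYOUT are used above without being shown. Judging purely from how NEXT-FIELD (further down) reads the layout, with the buffer position in the first cell and the current entry of the lengths array in the second, a minimal sketch could look like this. It is an assumption, not necessarily what ends up in the library:

  2 constant /layout                     \ assumed: two cells per layout

  : fields!                              ( fields buffer layout --)
    dup >r !                             \ first cell: start of the buffer
    r> cell+ !                           \ second cell: first field length
  ;

With these, !REFILL points the layout back at the start of TIB and at the first length in States-Fields before each record is read.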

Yes, if you have several tables, you might need several FIELD> definitions, but that is the same when you use files with different delimiters - which you probably factored away anyway, so where's the difference?

The magic is all in this tiny definition:

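  : next-field                           ( x -- a n)
    dup >r @ r@ cell+ @ @c dup           \ get position and field length
    if                                   \ any fields left?
      2dup chars + r@ !                  \ advance the buffer position
      1 cells r> cell+ +!                \ advance to the next length
    else r> drop then                    \ terminator: leave the layout alone
  ;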


It simply gets the current position in the buffer and the field length, and advances both when the length is not zero. It doesn't get any easier than that. Code shortly in CSV.
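
To make that concrete, this is how successive calls walk through one record with the States-Fields lengths defined earlier, assuming the buffer starts at offset 0 (offsets in characters):

  call   offset   length   next offset
  1      0        20       20
  2      20       2        22
  3      22       14       36
  4      36       8        44
  5      44       7        51
  6      51       2        53
  7      53       0        53            \ terminator, nothing is advanced

So a complete record in this layout is 53 characters wide, and any call past the sixth field just returns a zero length string.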

Hans Bezemer