
Simple file processing...


Niklas Backlund

Jan 24, 2001, 4:43:05 PM
Hi everyone,

I have a problem with a supposedly elementary thing.

In trying to be Pragmatic(tm), I want my program to read
all its configuration parameters from a text file. Though
I've been trying to make the text file's format as simple
as possible, I couldn't (?) avoid ending up in the
following situation:

The "main loop" in the method that reads the file reads
a parameter on each line (for simplicity) according to a
keyword-value scheme, and constructs the appropriate
objects. However, some of the data I want to read are
strings containing line breaks. I solved it by doing
something along the lines of here documents, i.e.
something like

parameter value
parameter value
long-parameter WHATEVER
data
data
WHATEVER
parameter value
...

So I'd like the main loop to read normal parameters, and
when it encounters a long parameter-type, let another
method copy a few lines "verbatim", and when it's done,
assign the data to the appropriate instance variable and
let the main loop pick up after the end marker as if
nothing happened...

Is there a neat, "Ruby-way" of pulling this off? Or do I
just have to reconsider my choice of file format?

/Niklas


Niklas Backlund

Jan 24, 2001, 4:46:32 PM

Kevin Smith

Jan 25, 2001, 1:54:31 AM
Niklas Backlund wrote:
>The "main loop" in the method that reads the file, reads
>a parameter on each line (for simplicity) according to a
>keyword-value scheme, and constructs the appropriate
>objects. However, some of the data I want to read are
>strings containing line breaks. I solved it by doing
>something along the lines of here documents, i e
>something like
>
>parameter value
>parameter value
>long-parameter WHATEVER
> data
> data
>WHATEVER
>parameter value
>...
>Is there a neat, "Ruby-way" of pulling this off? Or do I
>just have to reconsider my choice of file format?

Hmmm....my first cut (in pseudocode) would be:
for each line
  if it's a short parameter
    process that parameter
  else
    memorize the terminator
    read lines until you hit the terminator
    process that parameter
  end
end

You could extract the entire if/else into a
method to keep the main loop simpler. Aside from
Ruby's nice line-oriented input methods, I don't
see any obvious opportunities to come up with a
really cool Ruby-specific solution.
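Kevin's pseudocode translates fairly directly into Ruby. A minimal sketch, assuming the reader is told which keywords are long parameters (the `LONG_KEYS` list and the `read_params`/`read_until` names are hypothetical, not from the thread). The key point is that both loops share one Enumerator over the lines, so the helper can consume the "verbatim" lines and the main loop resumes after the end marker as if nothing happened:

```ruby
# Hypothetical: the reader is told which keywords take a multi-line value;
# the word after such a keyword is used as the terminator.
LONG_KEYS = ["long-parameter"].freeze

def read_params(input, long_keys = LONG_KEYS)
  params = {}
  lines = input.each_line              # one Enumerator shared with read_until
  loop do                              # loop ends quietly on StopIteration
    line = lines.next.chomp
    next if line.strip.empty?          # skip blank lines
    key, value = line.split(/\s+/, 2)
    if long_keys.include?(key)
      params[key] = read_until(lines, value.to_s.strip)
    else
      params[key] = value.to_s
    end
  end
  params
end

# Copies lines verbatim up to (not including) the terminator line. Because
# it advances the shared Enumerator, read_params resumes after the marker.
def read_until(lines, terminator)
  body = []
  loop do
    line = lines.next.chomp
    return body.join("\n") if line == terminator
    body << line
  end
  # only reached if the input ran out before the terminator appeared
  raise ArgumentError, "missing terminator #{terminator.inspect}"
end
```

Since `each_line` exists on both String and IO, the same method can be called as `read_params(File.open("config"))` or on a string in a test.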

To get any more detailed, I think I'd need to
know what kinds of objects you're creating based
on what kinds of inputs.

Kevin

Niklas Frykholm

Jan 25, 2001, 5:44:38 AM
In article <J%Hb6.6735$wz.2...@nntp1.chello.se>, Niklas Backlund wrote:
>The "main loop" in the method that reads the file, reads
>a parameter on each line (for simplicity) according to a
>keyword-value scheme, and constructs the appropriate
>objects. However, some of the data I want to read are
>strings containing line breaks. I solved it by doing
>something along the lines of here documents, i e
>something like
>
>parameter value
>parameter value
>long-parameter WHATEVER
> data
> data
>WHATEVER
>parameter value
>...
>
>So I'd like the main loop to read normal parameters, and
>when it encounters a long parameter-type, let another
>method copy a few lines "verbatim", and when it's done,
>assign the data to the appropriate instance variable and
>let the main loop pick up after the end marker as if
>nothing happened...

Do you mean that the main loop would have to know which parameters are
long and which are normal? It seems better to encode this in the file.

In my opinion, rolling your own parser for a config file is seldom a good
solution. It is better to use an existing parser that provides a familiar
syntax, supports comments, can report errors, etc.

In Ruby, we can use the Ruby parser itself. One way is to write your config
file as follows:

param1 = "value1"
param2 = "value2"
param3 = <<PARAM3
A
long
value
PARAM3

And use it as:

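def read_config(file)
  b = binding                # create the binding first and eval into it, so the
  eval(File.read(file), b)   # config's locals persist in b (needed on Ruby 1.9+)
  b
end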

params = read_config("config")
p eval "param1", params
p eval "param3", params

Note that you should only use this method if the person who writes the
config file is trusted, since that person can specify arbitrary code for
the program to execute.
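If you would rather get all parameters at once than call `eval` once per name, the binding can be turned into a Hash. A sketch under the same trust caveat (the name `read_config_hash` is hypothetical; it assumes Ruby 2.6+ for `Array#to_h` with a block, while `Binding#local_variables` and `Binding#local_variable_get` are older):

```ruby
# Evaluate a *trusted* Ruby config and return the variables it assigns.
def read_config_hash(source, filename = "(config)")
  scope = binding                      # one binding, reused for the eval
  before = scope.local_variables       # this method's own locals
  eval(source, scope, filename)        # config assignments land in scope
  (scope.local_variables - before).to_h do |name|
    [name.to_s, scope.local_variable_get(name)]
  end
end
```

Usage would be something like `params = read_config_hash(File.read("config"), "config")`; passing the filename makes syntax errors in the config report the right file.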

// Niklas

a...@crimson.propagation.net

Jan 25, 2001, 5:23:35 AM
On Thu, 25 Jan 2001, Kevin Smith wrote:

> Niklas Backlund wrote:
> >parameter value
> >parameter value
> >long-parameter WHATEVER
> > data
> > data
> >WHATEVER
> >parameter value
> >...
> >Is there a neat, "Ruby-way" of pulling this off? Or do I
> >just have to reconsider my choice of file format?
>
> Hmmm....my first cut (in pseudocode) would be:
> for each line
>   if it's a short parameter
>     process that parameter
>   else
>     memorize the terminator
>     read lines until you hit the terminator
>     process that parameter
>   end
> end

Kevin, one comment on your style. I tend to outline my routines in the
same way, and I guess it's a good thing to do. But if you pay
attention to your outlining style, you already have half-baked Ruby
code.

1) drop "each", since "for line in stream" already works
2) use '_' instead of spaces, and you get method names on the fly
3) plain English doesn't always hint at what the variables could be,
but Ruby-like pseudocode talks to me

I guess I'm approaching the point when it's as fast to write real (but
rough) ruby code instead of using English.

Assuming these tests describe the behaviour you want:

###
require "runit/testcase"

class TestParamReader < RUNIT::TestCase
  def test_read2
    pr = ParamReader.new
    pr.read("foo bar")
    assert_equals({"foo" => "bar"}, pr.params)

    pr = ParamReader.new
    pr.read("foo bar\nzak zok")
    assert_equals({"foo" => "bar", "zak" => "zok"}, pr.params)

    pr = ParamReader.new
    pr.read("foo bar\nzak zok\n\n")
    assert_equals({"foo" => "bar", "zak" => "zok"}, pr.params)

    pr = ParamReader.new
    pr.read("foo bar\n bar2\nzak zok\n zok2\n")
    assert_equals({"foo" => "bar\nbar2", "zak" => "zok\nzok2"},
                  pr.params)

    # only one leading space is stripped, so "  zok3" keeps one space
    pr = ParamReader.new
    pr.read("foo bar\nzak zok \n zok2 \n\n  zok3\nfoo bar2\n")
    assert_equals({"foo" => "bar\nbar2",
                   "zak" => "zok \nzok2 \n zok3"},
                  pr.params)
  end

end
###

The following snippet passes these tests:

###
class ParamReader
  attr_reader :params
  def initialize
    @params = {}
  end

  def read(file)
    new_param_re = /^[^\s]/
    # each_line accepts a String or an IO; `for` (unlike a block) keeps
    # `name` alive from one line to the next
    for line in file.each_line
      next if line.strip == "" # skip empty rows

      value = line

      name, value = line.split(/\s/, 2) if line =~ new_param_re

      if @params.has_key? name
        value = "\n" + value # if appending, add new lines too
      else
        @params[name] = "" # if not, initialize hash
      end

      # remove leading space (only one) and trailing new line if any
      @params[name] += value.chomp.sub(/^ /, "")
    end
  end

  # If you're willing to live with some constraints (not tested):
  # - requires newer ruby: @params = Hash.new { "" }
  # - doesn't handle empty rows gracefully
  # - doesn't handle the formatting of the beginning of the
  #   long parameters continuation lines (strip leading space away?)
  def read2(file)
    @params = Hash.new { "" }
    new_param_re = /^[^\s]/ # a new parameter starts in column one
    for line in file.each_line
      value = line
      name, value = line.split(/\s/, 2) if line =~ new_param_re
      @params[name] += value
    end
  end

end
###

Kevin notes:


> You could extract the entire if/else into a
> method to keep the main loop simpler. Aside from
> Ruby's nice line-oriented input methods, I don't
> see any obvious opportunities to come up with a
> really cool Ruby-specific solution.

Agreed. I kept everything in one loop and one routine, but it messes
up the code. Also I couldn't see any really nice way of doing this.


- Aleksi

Kevin Smith

Jan 25, 2001, 12:09:53 PM
a...@crimson.propagation.net wrote:
>On Thu, 25 Jan 2001, Kevin Smith wrote:
>> Hmmm....my first cut (in pseudocode) would be:
>> for each line
>>   if it's a short parameter
>>     process that parameter
>>   else
>>     memorize the terminator
>>     read lines until you hit the terminator
>>     process that parameter
>>   end
>> end
>
>Kevin, one comment on your style. I tend to outline my routines in a
>same way, and I guess it's a good thing to do. But if you pay
>attention to outlining style, then you have half-baked Ruby code
>already.

In real life, I very rarely write pseudocode like
this. I would start writing tests, and then fill
in the code. In this case, rather than take the
time to write real code, or to post untested
Ruby(ish) code, I figured pseudocode was the way
to go.

>   def read(file)
>     new_param_re = /^[^\s]/
>     for line in file
>       next if line.strip == "" # skip empty rows
>
>       value = line
>
>       name, value = line.split(/\s/, 2) if line =~ new_param_re

Argh. This formatting drives me insane. I want it
to be really obvious that name and value may or
may not be assigned. I would much prefer:

if line =~ new_param_re
  name, value = line.split(/\s/, 2)
else
  value = line
end

>
>       if @params.has_key? name
>         value = "\n" + value # if appending, add new lines too
>       else
>         @params[name] = "" # if not, initialize hash
>       end
>
>       # remove leading space (only one) and trailing new line if any
>       @params[name] += value.chomp.sub(/^ /, "")
>     end
>   end

Otherwise, that approach seems reasonable. The
function is short enough not to raise my
eyebrows. I think my pseudocode might lend itself
to breaking out a function to do the stuff in the
middle, rather than relying on 'name' persisting
from one line to the next. But either way works.
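One way to break out such a function for the indentation-based format, sketched under assumptions (the class and method names are hypothetical): an Enumerator's `peek` gives one line of lookahead, so a helper can collect the continuation lines and stop in front of the next parameter without consuming it.

```ruby
# Parameter lines start in column one; continuation lines start with a
# space or tab. `name` never has to survive between loop iterations.
class IndentedParamReader
  attr_reader :params

  def initialize
    @params = {}
  end

  def read(text)
    lines = text.each_line               # Enumerator shared with the helper
    loop do                              # loop ends on StopIteration
      line = lines.next
      next if line.strip.empty?          # skip empty rows
      name, first = line.chomp.split(/\s/, 2)
      value = ([first.to_s] + continuation_lines(lines)).join("\n")
      # as in Aleksi's version, a repeated key appends on a new line
      @params[name] = @params.key?(name) ? "#{@params[name]}\n#{value}" : value
    end
    @params
  end

  private

  # Collects the indented lines that follow (minus one leading space),
  # stopping without consuming anything at the next parameter line.
  def continuation_lines(lines)
    found = []
    loop do
      upcoming = lines.peek              # raises StopIteration at end of input
      if upcoming.strip.empty?
        lines.next                       # blank line inside a value: skip it
      elsif upcoming.start_with?(" ", "\t")
        found << lines.next.chomp.sub(/\A /, "")
      else
        break                            # next parameter: leave it for read
      end
    end
    found
  end
end
```

The trade-off is one more method, in exchange for the main loop handling exactly one parameter per iteration.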

Kevin

Kevin Smith

Jan 25, 2001, 12:53:48 PM
r2...@mao.acc.umu.se wrote:
>In article <J%Hb6.6735$wz.2...@nntp1.chello.se>, Niklas Backlund wrote:
>In ruby, we can use the ruby parser itself. One way is to write your config
>file as.
(snip)

>Note that you should only use this method if the person who writes the
>config file is trusted, since that person can specify arbitrary code for
>the program to execute.

And for that reason, I would personally avoid
this approach. I don't trust anyone that much.

Kevin

Niklas Frykholm

Jan 26, 2001, 4:28:40 AM
>>In ruby, we can use the ruby parser itself.
>(snip)
>>Note that you should only use this method if the person who writes the
>>config file is trusted, since that person can specify arbitrary code for
>>the program to execute.
>
>And for that reason, I would personally avoid
>this approach. I don't trust anyone that much.

Well, you probably trust the person who wrote the original program that much
(since you are running her code). And you probably trust yourself. And matz.

Also note that many other programs, firewalls for example, have security
that depends on trusted config files. If someone can edit the config file
of your firewall, you are in trouble.

Powerful config-files are only a problem when the access rights to the
config-files are more permissive than the access rights to the executable.
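That condition can at least be checked before the file is evaluated. A minimal sketch, assuming that "more permissive" is approximated by the group/world write bits only (ownership, parent directories and ACLs are not checked; `UntrustedConfigError` and the method names are hypothetical):

```ruby
class UntrustedConfigError < StandardError; end

# True if the group-write or world-write permission bit is set.
def writable_by_others?(path)
  (File.stat(path).mode & 0o022) != 0
end

# Returns the file's text only if it passed the permission check;
# the caller decides whether to eval it.
def read_trusted_config(path)
  if writable_by_others?(path)
    raise UntrustedConfigError, "#{path} is writable by group or others"
  end
  File.read(path)
end
```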

But if you are less lazy than I am, then I certainly agree that it is better,
security-wise, to use less powerful config files. The risk that you will
mess up is smaller.

Wouldn't it be nice if evals could be sandboxed?

// Niklas

ts

Jan 26, 2001, 5:13:00 AM
>>>>> "N" == Niklas Frykholm <r2...@mao.acc.umu.se> writes:

N> Wouldn't it be nice if eval's could be sandboxed?

Run it with $SAFE >= 4


Guy Decoux

Michael Neumann

Jan 28, 2001, 7:33:27 AM


You may also have a look at EDF (Easy Data Format), available at RAA.
A file is divided into sections, which can have any name; each section contains
any number of parameters. You can also have a file without sections, in which case
the parameters belong to the default section, named "":

require "edf"

str = <<EOF
param: Value 1
 this line belongs to param because the first character is a whitespace
param2: Value2
EOF

default = EDF::parse(str)[0]
p default["param"]
p default["param2"]


------
Multiple sections are written like this:

:section1
param: test
param2: test2

:section2
para6: why
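To make the format concrete, here is a hypothetical `parse_sections` sketch of it. It is NOT EDF's code or API (the thread shows only `EDF::parse`), and it assumes that continuation lines are joined with "\n" after their leading whitespace is stripped and that blank lines are ignored:

```ruby
# Returns { section_name => { param => value } }; "" is the default section.
def parse_sections(text)
  sections = { "" => {} }              # parameters before any :section
  current = sections[""]
  last_key = nil
  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?
    if line.start_with?(":")           # ":name" opens a new section
      current = sections[line[1..].strip] ||= {}
      last_key = nil
    elsif line =~ /\A\s/ && last_key   # continuation of the previous param
      current[last_key] += "\n" + line.lstrip
    elsif (m = line.match(/\A([^:\s]+):\s*(.*)\z/))
      last_key = m[1]
      current[last_key] = m[2]
    else
      raise ArgumentError, "cannot parse line: #{line.inspect}"
    end
  end
  sections
end
```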

--
Michael Neumann
