[fityk-devel] Winspec SPE format support has been added

张鹏

unread,

Aug 2, 2007, 6:18:45 PM8/2/07

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

> Hi, I'm going to write an announcement about your project to two mailing lists (one is about powder diffraction, the > other about computational chemistry, toghether a few thousends subscribers)
Fine, please do it.

The new code with Winspec SPE format support has been committed. See Changelog:

* Added new format support of Princeton Instruments WinSpec SPE format
* Extended some functions in util.cpp,
* Fixed a bug in function le_to_host()

--
Regards,

Peng ZHANG (张鹏)

Marcin Wojdyr

unread,

Aug 2, 2007, 10:54:51 PM8/2/07

to 张鹏, fityk-dev ML

On 8/2/07, 张鹏 <zhangp...@gmail.com> wrote:
> The new code with Winspec SPE format support has been
> committed. See Changelog:

ds_winspec_spe.cpp:
See line 127 (bug)

and this looks like a bug:

if (1 == ydim) {
dim = xdim;
calib = &x_calib;
} else if (1 == ydim) {
dim = xdim;
calib = &y_calib;
}

Do not call seekg() when not needed. Please remove offset argument
from all read_xxx_le() functions and try to avoid seekg. In many
places the pointer is already at the right position, in other cases
it's enough to change order in which numbers are read from file. Avoid
jumping with reading pointer around a file. I think you should never
need to move it backward, or only once, to set it at the beginning of
file. You can also specify relative offset to seek functions.

Instead of:
string my_flt_to_string(float num, float undef)
{ return ((undef == num) ? "undefined" : S(num));
}
p_rg->add_meta("THETA_START",
my_flt_to_string(read_flt_le(f, cur_range_offset + 28), -1e6));
should be:
float t = read_flt_le(f, cur_range_offset + 28); // but without offset
if (t != -1e6)
p_rg->add_meta("THETA_START", S(t));

At utils.cpp:

this looks like a bug:
unsigned read_int16_le(ifstream &f, unsigned offset)

I think we can give up support for PDP-endian computer (because it's
not complete, we can't test it, and it's not supported even by glib).

Split function le_to_host(void *p, unsigned len) into three functions
for 2, 4 and 8 bytes. It will be definitely simpler.

What this comment means?
p_ifs = NULL; // move here to avoid gcc complaining

Could you change XXXDataSet constructors to take istream& instead of filename?
The file can be opened in getNewDataSet(), later it can be changed. Use cases:
- I want to read a file as both a text and as points using xylib
(preview in fityk GUI). If I can pass istream, I avoid opening the
file twice.
- I want to read data from stream, that is not a file on a disk (like
istringstream).

Implementation of udf format is really bad. See how parsing simple
lines with numbers is complicated.
6234, 6185, 5969, 6129, 6199, 5988, 6046, 5922
6017, 5966, 5806, 5918, 5843, 5938, 5899, 5851
...
1234/
You read a line, call a function that check the "type" of the line,
move the file pointer back, read the same line again, call the
function that checks the type of the line again ....
And it took me too much time to track how it works. Please don't use
UxdLikeDataSet for this format.

The strict reading of this format could look like this:

for (;;) {
int d;
f >> d;
if (!f)
throw format error;
p_rg->add_y(d);
int c = f.get(); // c should be blank or ',' or '/'
if (c == '/') // end of data
return;
// optionally check if isspace(c) || c == ','
}
// at the end, check if the number of points is what it should be
// (use DataAngleRange and ScanStepSize)

If you want a more flexible version ("1234 , 1238"), you can
replace get() with:
char c;
f >> c;
if (c == '/') return;
if (c != ',') putback(c);

I didn't tested it, so it may not work, but you have an idea.
I guess it's much faster, but what's more important it's easier to understand.

Marcin

--
Marcin Wojdyr | http://www.unipress.waw.pl/~wojdyr/

张鹏

unread,

Aug 5, 2007, 9:42:54 AM8/5/07

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

On 8/3/07, Marcin Wojdyr <woj...@gmail.com > wrote:
> ds_winspec_spe.cpp:
> See line 127 (bug)
>
> and this looks like a bug:
>
> if (1 == ydim) {

> ...
>

yes, this is a bug. Code has changed to:

if (1 == ydim) {
dim = xdim;
calib = &x_calib;

} else if (1 == xdim) {
dim = ydim;
calib = &y_calib;
} else {
throw XY_Error("xylib does not support 2-D images");
}

> Do not call seekg() when not needed. ...
Changed to call ignore() at the caller position and try to avoid
useing seekg() .

> Instead of:
> string my_flt_to_string(float num, float undef)
> { return ((undef == num) ? "undefined" : S(num));
> }

> ...
changed.

> At utils.cpp:
>
> this looks like a bug:
> unsigned read_int16_le(ifstream &f, unsigned offset)

Yes, changed the return value type to "int"

> I think we can give up support for PDP-endian computer (because it's
> not complete, we can't test it, and it's not supported even by glib).

ok, I will throw an exception in PDP case.

> Split function le_to_host(void *p, unsigned len) into three functions
> for 2, 4 and 8 bytes. It will be definitely simpler.

Split it to 3 functions.

> What this comment means?
> p_ifs = NULL; // move here to avoid gcc complaining

removed this comment.

> Could you change XXXDataSet constructors to take istream& instead of filename?

Now, I changed the constructors to support both filename and istream&.

> Implementation of udf format is really bad.

Rewrote the implementation of udf format.

The ChangeLog is:
///////////////////////////////////////////////
21:34 2007-8-5 GMT+8 version 328
* Some bugs fixed
* Changed to avoid use seekg() in the file reading stream
* Gave up PDP-endian support
* Reimplement the philipse-udf format
* XXXDataSet's ctor can also accept istream& argument.
* Some other changes following Marcin's advice.
///////////////////////////////////////////////

PS: since you are going to use xylib in fityk, should I integrate the
already-supported format in fityk (.cpi .rit) into xylib?
If you have some sample files of these formats?

Looking forward to your kindly reply.

Marcin Wojdyr

unread,

Aug 5, 2007, 3:50:39 PM8/5/07

to 张鹏, fityk-dev ML

On 8/5/07, 张鹏 <zhangp...@gmail.com> wrote:

> throw XY_Error("xylib does not support 2-D images");

If only a subset of a format is supported, add a short note (2 words)
about it in README, where the list of supported formats is.

> > Could you change XXXDataSet constructors to take istream& instead of filename?
>
> Now, I changed the constructors to support both filename and istream&.
>

so now there are two ctors:

TextDataSet(const std::string &filename)
TextDataSet(std::istream &is, const std::string &filename)

In the second ctor you assign:
p_is = &is;
and then in load_data() you call init(), where you have:
p_is = new ifstream(filename.c_str(), ios::in | ios::binary);

It's in your interest to keep the code as simple as possible, to avoid
bugs like this.

Please leave only one ctor:

DataSet(std::istream &is_, xy_ftype ftype_) : is(is_), ftype(ftype_) {}

get rid of init() and is_filetype(). You can move checks from
is_filetype() to either load_data() or to static check().

> PS: since you are going to use xylib in fityk, should I integrate the
> already-supported format in fityk (.cpi .rit) into xylib?
> If you have some sample files of these formats?

As I wrote, I can do it. I know these formats and it will take me 1-2
hours. You can add other XRD formats.

This:
calib.calib_valid = (read_string(f, 1).c_str())[0]
can be rewritten as
f.read(calib.calib_valid) // reads one char

Two issues I already wrote about:

are elements in g_fi ordered? (if yes, add a comment about it in code)

Reading functions should not assume that the format is correct. E.g.
reading Bruker raw v2 file as v1 should not crash, but rather throw an
exception.

张鹏

unread,

Aug 6, 2007, 8:50:02 PM8/6/07

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

On 8/6/07, Marcin Wojdyr <woj...@gmail.com> wrote:
> If only a subset of a format is supported, add a short note (2 words)
> about it in README, where the list of supported formats is.

Added, and together with the limitation of VAMAS.

> so now there are two ctors:

Changed to 1 Ctors. But in its arg-list, I reserved the file_name arg,
because I think it should be the DataSet's responsibility to mantain
the relationship between data files and the istream. Of course, if you
did not get the stream from a file, you can simpliy omit the file_name
arg, and it will default to "".

>
> get rid of init() and is_filetype(). You can move checks from
> is_filetype() to either load_data() or to static check().

OK, done.

> As I wrote, I can do it. I know these formats and it will take me 1-2
> hours. You can add other XRD formats.

OK

>
>
> This:
> calib.calib_valid = (read_string(f, 1).c_str())[0]
> can be rewritten as
> f.read(calib.calib_valid) // reads one char

Changed.

>
> Two issues I already wrote about:
>
> are elements in g_fi ordered? (if yes, add a comment about it in code)

Added.

>
> Reading functions should not assume that the format is correct. E.g.
> reading Bruker raw v2 file as v1 should not crash, but rather throw an
> exception.

How tolarent should I do when I parse a file? If you assume that
format may be not as expected in every step, it will be very difficult
and boring to write the program.
In my opinion, we only need to make the format check in the static
"check()" memeber function, (of course, if you think it have not done
enough checking in the existing code, we can do even more); once a
file passes the check() function, we can assume that it is in the
correct format.

I have changed the check() functions to make sure that raw_v23 and
raw_v1 will not be mistaken (i.e. a v1 file will not pass the check()
of v2/3, and vice versa)

The ChangeLog:
//////////////////////////////////////
* Changed "ChangLog" to the right order
* Updated README to explicitly give some restrictions of supported
formats (VAMAS and WinSpec)
* DataSet and all derived classes have only one Ctor, accepting
"istream&" argument only
* Removed XXXDataSet::init() & is_filetype. Type checking is moved to
the static check() member function
* Other slight changes, such as updating comments, moving "include
"util.h" from xxxDataSet.h to .cpp files, changing to use ifstream&
instead of ifstream* inside DataSet.
* Added some handling to avoid raw_v2v3 be mistaken as raw_v1,
rigaku_dat as text.

//////////////////////////////////////

Marcin Wojdyr

unread,

Aug 6, 2007, 11:36:40 PM8/6/07

to 张鹏, fityk-dev ML

On 8/6/07, 张鹏 <zhangp...@gmail.com> wrote:
> > so now there are two ctors:
> Changed to 1 Ctors. But in its arg-list, I reserved the file_name arg,
> because I think it should be the DataSet's responsibility to mantain
> the relationship between data files and the istream.

I don't think so. You can always add it to API, but you can't remove
it without breaking code that uses the library.

You made other changes to the very simple one-line constructor I wrote

DataSet(std::istream &is_, xy_ftype ftype_) : is(is_), ftype(ftype_) {}

- added default parameter ftype_=FT_UNKNOWN. Why?
- changed ftype(ftype_) to ftype=ftype_. Why?

I don't want you to waste your time trying to complicate simple
things, so please if you don't like my suggestions, first write about
it, and then change your code.

Designing API and writing DataSet class should take only a few percent
of all time you have for GSOC project. Your main job is learning about
file formats and implementing them.
Now you have 8 formats. That's not much. Text and udf are trivial, uxd
and rigaku dat are below acceptable quality.

> > Reading functions should not assume that the format is correct. E.g.
> > reading Bruker raw v2 file as v1 should not crash, but rather throw an
> > exception.
> How tolarent should I do when I parse a file? If you assume that
> format may be not as expected in every step, it will be very difficult
> and boring to write the program.

What means tolerant? You don't have to read a file if the format is
different then expected, but just crashing a program is not a good
solution.
I don't know if it is difficult or boring, but it's a normal way of
writing programs.

> In my opinion, we only need to make the format check in the static
> "check()" memeber function, (of course, if you think it have not done

You are very wrong. The good programming practice is that you do not
assume that the input data is correct. If program crashes, it is a
bug, whatever the input is.

> enough checking in the existing code, we can do even more); once a
> file passes the check() function, we can assume that it is in the
> correct format.

No, that's bad.

> I have changed the check() functions to make sure that raw_v23 and
> raw_v1 will not be mistaken (i.e. a v1 file will not pass the check()
> of v2/3, and vice versa)

And what about the bug that crashed the program? It should be fixed, not hidden.
There may be plenty of other bugs like this, so please review the code
and see if there are any assumptions about input file that may cause
crash. Files can be corrupted, it happens.

Reply all

Reply to author

Forward