[fityk-devel] Philips RD raw scan format is supported

张鹏

Aug 19, 2007, 11:32:59 AM

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

The following are the changes in this version:
//////////////////////////////////////////////////////////////////////
20:13 2007-8-19 GMT+8 version 337
* Removed all non-NULL assertions after new()
* Rewrote the implementation of UxdDataSet and removed the UxdLikeDataSet class
* Found some new sample files (raw_v2, udf, uxd, pdCIF) to test the
existing code, and fixed bugs in processing pdCIF files.
* Added support for the Philips Raw Scan format (.RD).
* Some other minor changes, such as using fmt_info.ftype
in the ctor
* Added the LGPL 2.1 text file as COPYING
//////////////////////////////////////////////////////////////////////

There is still a problem in this version that needs to be fixed.
See ds_pdcif.cpp:
//////////////////////////////////////////////////////////////////////
* Known issues:
There are still some problems when handling pdCIF files with multiple
data blocks in a range.
For example, in NISI.cif there are 2 data blocks in 1 range, and the numbers
of data points in them are different. This causes the program to crash and
needs to be fixed.
//////////////////////////////////////////////////////////////////////

Here is a more detailed explanation:
loop_
_pd_meas_time_of_flight
_pd_meas_intensity_total
_pd_meas_point_id
1000.0 1818(34) 626
1001.6 1810(34) 627
1003.2 1808(34) 628
1004.8 1838(34) 629
.....
There are 4495 groups of (_pd_meas_time_of_flight,
_pd_meas_intensity_total, _pd_meas_point_id) values in this data
BLOCK. (Note that "block" and "range" mean different things here.)

Then, inside the same RANGE (a range starts with "data_xxx"), there
is another data BLOCK (starting at line 5459):
loop_
_pd_proc_d_spacing
_pd_proc_intensity_total
_pd_proc_ls_weight
_pd_proc_intensity_bkg_calc
_pd_calc_intensity_total
_pd_proc_point_id
0.50035 0.424(7) 19401. 0.3726 0.4155 1
0.50090 0.413(7) 19904. 0.3726 0.4059 2
....
There are 1648 groups of (_pd_proc_d_spacing,
_pd_proc_intensity_total, _pd_proc_ls_weight,
_pd_proc_intensity_bkg_calc, _pd_calc_intensity_total,
_pd_proc_point_id) values in this data BLOCK.
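
A minimal C++ sketch of why each loop_ forms its own block: the tag names after loop_ define the columns, and the values that follow are dealt out to them row by row, so two loops in one data_ section can yield columns of different lengths (4495 vs 1648 rows here). LoopBlock, parse_loop and split_esd are hypothetical helpers, not xylib's pdCIF code; the sketch assumes whitespace-separated values and ignores quoted strings, comments and exponent notation. split_esd also strips the CIF standard uncertainty, e.g. 1818(34) is 1818 with an uncertainty of 34 in the last digit.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// One pdCIF loop_ block: its tag names and one column of values per tag.
struct LoopBlock {
    std::vector<std::string> tags;
    std::vector<std::vector<double> > cols;
};

// Split a CIF number with a standard uncertainty, e.g. "0.424(7)":
// returns 0.424 and stores 0.007 in *esd (the digits in parentheses
// apply to the last decimal place). Exponent notation is not handled.
double split_esd(const std::string& tok, double* esd)
{
    std::string::size_type paren = tok.find('(');
    std::string num = tok.substr(0, paren);  // whole token if there is no '('
    if (esd) {
        *esd = 0.0;
        if (paren != std::string::npos) {
            std::string::size_type dot = num.find('.');
            int decimals = (dot == std::string::npos)
                               ? 0 : static_cast<int>(num.size() - dot - 1);
            *esd = std::atof(tok.c_str() + paren + 1) * std::pow(10.0, -decimals);
        }
    }
    return std::atof(num.c_str());
}

// Parse the text that follows a "loop_" keyword: leading tokens starting
// with '_' are tag names, and the values after them are dealt out to the
// tags row by row. Quoted strings and '#' comments are not handled.
LoopBlock parse_loop(const std::string& text)
{
    LoopBlock blk;
    std::istringstream in(text);
    std::string tok;
    std::size_t n = 0;  // number of values read so far
    while (in >> tok) {
        if (tok == "loop_" || tok.compare(0, 5, "data_") == 0)
            break;                      // next loop or next data section
        if (tok[0] == '_') {
            if (n > 0)
                break;                  // a new data item ends the loop
            blk.tags.push_back(tok);
            continue;
        }
        if (blk.tags.empty())
            break;                      // values without tags: not a loop
        if (n == 0)
            blk.cols.resize(blk.tags.size());
        blk.cols[n % blk.tags.size()].push_back(split_esd(tok, 0));
        ++n;
    }
    return blk;
}
```

Running parse_loop separately on the text after each loop_ keyword gives two LoopBlocks whose columns have 4495 and 1648 entries respectively.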

So when xyconv handles this file, it crashes, because it assumes
that all columns in one range have the same size.
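
A minimal sketch of the failing assumption and a guard against it. Col is a hypothetical stand-in (a plain std::vector<double>), not xylib's Column class, and pair_points is likewise illustrative: pairing x[i] with y[i] is only safe when both columns come from the same block, so checking the sizes first turns a crash into a reportable error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column of 1-D data (not xylib's Column class).
typedef std::vector<double> Col;

// Pair x[i] with y[i]. Plain indexing silently assumes both columns have
// the same length; for NISI.cif (4495 vs 1648 points) reading y past its
// end is undefined behaviour, and y.at(i) would throw std::out_of_range.
// Checking the sizes up front turns the crash into a clear error.
std::vector<std::pair<double, double> > pair_points(const Col& x, const Col& y)
{
    if (x.size() != y.size())
        throw std::runtime_error("x and y columns have different sizes "
                                 "(they probably come from different blocks)");
    std::vector<std::pair<double, double> > pts;
    pts.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        pts.push_back(std::make_pair(x[i], y[i]));
    return pts;
}
```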

What do you think is the best way to fix this issue?

Another problem: I have read the Philips raw file format
specification, but I can only find some ".RD" raw files (version 3 of the
Philips raw scan format). The specification of version 5 of the Philips
raw scan format is also included in that PDF file,
but I cannot find any ".SD" raw files (version 5).
I have asked Martijn Fransen <martijn...@panalytical.com> for
some sample files to experiment with, but he has not replied yet. Do you
have any samples of this format?

About the testing of the library:
I have tried my best to find more sample files, and I have committed
all additional sample files to SVN. Still, sample files for some of
the formats cannot be found. Currently, all of the sample files,
except for the one pdCIF file mentioned above, can be processed properly by
xyconv.

Next, I will add some text-based formats to xylib, and write some
documentation explaining how to use and extend the code.

About the deadline of the GSoC project: yes, there is almost no time
left before the deadline. However, our goal is to let more people around
the world benefit from our work, and I was so excited to hear that
my work will be distributed with fityk in popular Linux
distributions.
After the GSoC deadline, I will continue working with you and the whole
fityk community to improve the xylib project. Although I may have less
time to work on it because the Fall semester will begin soon, I
will try my best.

Finally, I want to say thank you, Marcin, for helping me so
much! I had a very good time in this year's GSoC project and
learned a lot. I am very happy to keep working with you to make fityk
better after the GSoC deadline.

--
Regards,

Peng ZHANG (张鹏)

张鹏

Aug 19, 2007, 12:47:45 PM

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

I have thought about the multi-block-range problem, and there may be 3
solutions to it:

1. Only pick the point data from one block, and ignore all of the
other blocks.
This is the simplest way, but some of the XY data in the pdCIF file
would not be used.

2. Still store the different-sized "column groups" together in one Range,
as the code does now, but before adding the range pointer p_rg to
ranges, call a set_x_y() method to set x_column and y_column, making
sure that x and y are both in the same block. If this is done, the
"vector-out-of-range" error in NISI.cif will never happen.
Additionally, we can use another vector<int> group_flg, where
group_flg[i] indicates which column group the corresponding column
(col[i]) belongs to.

3. Add a member to Range that can represent Block(s), either as a
C-style linked list or as a C++ vector.
But this seems to make things too complex: a DataSet contains
Range(s), a Range contains Block(s), and a Block contains columns of 1-D
data that can be used as X/Y coordinates.
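
Solution 2 could look roughly like the sketch below. RangeSketch, add_column() and the exception types are hypothetical; only set_x_y(), the column_x/column_y members and group_flg come from the proposal and the existing Range code. The point is that set_x_y() refuses an x/y pair whose columns belong to different groups, so a mismatched pair is rejected when it is chosen instead of crashing later.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of solution 2: all columns stay in one Range, and
// group_flg[i] records which column group (loop_ block) col[i] came from.
struct RangeSketch {
    std::vector<std::vector<double> > cols;
    std::vector<int> group_flg;  // one entry per column
    int column_x;
    int column_y;

    RangeSketch() : column_x(0), column_y(1) {}

    void add_column(const std::vector<double>& c, int group)
    {
        cols.push_back(c);
        group_flg.push_back(group);
    }

    // Select the x and y columns, refusing a pair that spans two groups,
    // so x and y always have the same number of points.
    void set_x_y(int x, int y)
    {
        if (x < 0 || y < 0 || static_cast<std::size_t>(x) >= cols.size()
                || static_cast<std::size_t>(y) >= cols.size())
            throw std::out_of_range("no such column");
        if (group_flg[x] != group_flg[y])
            throw std::invalid_argument("x and y columns are in different blocks");
        column_x = x;
        column_y = y;
    }
};
```

The cost of this design is that every consumer of a Range has to respect group_flg, which is part of why keeping one block per Range may be simpler.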

I don't know which solution I should choose. I'd like to ask for your
suggestions on this issue. (These are just my ideas, though; you may
come up with a 4th solution.)

Thanks.

张鹏

Aug 19, 2007, 9:27:07 PM

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

On 8/20/07, Marcin Wojdyr <woj...@gmail.com> wrote:

> IMO each BLOCK should be a separate Range (class) in your library.
> BTW, I think the name Block would be better than Range, although I'm
> not completely sure about it.
> Such Blocks could have names.

Have you read my later email about how to handle this problem?
And what does your suggestion mean? Adding a new Block level (class)
between the *current* Range and Column, as in the 3rd solution mentioned in
the later mail, or treating each block as a *current* Range (class)? If
each block is treated as a *current* Range (class), how should the
meta-info of these 2 ranges be handled? In fact, they are in one range (as
defined by the pdCIF specification), and the meta-info should be the same,
including the range name (data_XXX).

> SD format:
> is to skip (810-250) bytes if it's V5.
Yes, I just want to get some sample files to test with.

>
> [sample files].
OK, I will remove all sample files for which I have not obtained
redistribution permission. But what about sample files that were
not downloaded from a URL (e.g. those in a compressed archive)?

Marcin Wojdyr

Aug 19, 2007, 10:11:23 PM

to 张鹏, fityk-dev ML

On 8/19/07, 张鹏 <zhangp...@gmail.com> wrote:
> Hi, Marcin:
>
> On 8/20/07, Marcin Wojdyr <woj...@gmail.com> wrote:
>
> > IMO each BLOCK should be a separate Range (class) in your library.
> > BTW, I think the name Block would be better than Range, although I'm
> > not completely sure about it.
> > Such Blocks could have names.
>
> Have you read my later email about how to handle this problem?

yes

> And what does your suggestion mean? Adding a new Block level (class)
> between the *current* Range and Column, as in the 3rd solution mentioned in
> the later mail, or treating each block as a *current* Range (class)? If

The latter: one pdCIF BLOCK should be one xylib::Range.

> each block is treated as a *current* Range (class), how should the
> meta-info of these 2 ranges be handled?

Duplicate it (or copy it as many times as needed).

> In fact, they are in one range (as
> defined by the pdCIF specification), and the meta-info should be the same,
> including the range name (data_XXX).

The meta-info can be the same, but the names can differ: one block is
measured data, the other is calculated data.
_pd_meas_number_of_points refers to one block, and
_pd_proc_number_of_points to the other, so you can add "meas" / "proc"
to the name.
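
A rough sketch of this suggestion, with hypothetical ParsedBlock and SimpleRange types standing in for the parser output and xylib::Range (neither is the library's real API): each loop_ block becomes its own Range, the section's meta-info map is copied into each one, and the name gets a "meas" or "proc" suffix derived from the block's first tag.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> MetaMap;

// A loop_ block as it comes out of a (hypothetical) pdCIF parser.
struct ParsedBlock {
    std::string first_tag;                   // e.g. "_pd_meas_time_of_flight"
    std::vector<std::vector<double> > cols;  // all columns have one length
};

// Simplified, hypothetical Range: just enough to show the idea.
struct SimpleRange {
    std::string name;
    MetaMap meta;                            // each Range owns a copy
    std::vector<std::vector<double> > cols;
};

// "meas" for measured data, "proc" for processed/calculated data.
std::string block_kind(const std::string& first_tag)
{
    if (first_tag.compare(0, 8, "_pd_meas") == 0) return "meas";
    if (first_tag.compare(0, 8, "_pd_proc") == 0) return "proc";
    return "block";
}

// One Range per loop_ block of a data_XXX section. The section's meta-info
// is simply copied into every Range, so there is no sharing and no
// reference counting; the names differ only by the meas/proc suffix.
std::vector<SimpleRange> split_section(const std::string& section_name,
                                       const MetaMap& section_meta,
                                       const std::vector<ParsedBlock>& blocks)
{
    std::vector<SimpleRange> ranges;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        SimpleRange r;
        r.name = section_name + "_" + block_kind(blocks[i].first_tag);
        r.meta = section_meta;
        r.cols = blocks[i].cols;
        ranges.push_back(r);
    }
    return ranges;
}
```

For a section named data_XXX with the two blocks described earlier, this yields ranges named data_XXX_meas and data_XXX_proc, each holding an identical copy of the meta map.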

> OK, I will remove all sample files for which I have not obtained
> redistribution permission. But what about sample files that were
> not downloaded from a URL (e.g. those in a compressed archive)?

I don't know.

Marcin

--
Marcin Wojdyr | http://www.unipress.waw.pl/~wojdyr/

张鹏

Aug 20, 2007, 1:26:04 AM

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

How about handling things like this?
In the following code skeleton:
* I separate the meta-info into a MetaInfo class, then add a pointer to
MetaInfo in both the Range class and the DataSet class (the code in the
DataSet class is not listed, but it's similar to Range).
* MetaInfo can be shared between the Ranges of one pdCIF data section, so
their meta-info can never become inconsistent.
* Added a name attribute to Range to identify it.
* Changed the ctor and dtor of Range to handle the MetaInfo class
(using reference counting).

#include <cstddef>
#include <map>
#include <string>
#include <vector>

class MetaInfo
{
public:
    MetaInfo() : refcnt(0) {}

    bool has_meta_key(const std::string &key) const;
    bool has_meta() const { return !meta_map.empty(); }
    std::vector<std::string> get_all_meta_keys() const;
    const std::string& get_meta(const std::string &key) const;

    /////////////////////////////////
    // called internally

    bool add_meta(const std::string &key, const std::string &val);

    // reference counting: each Range sharing this object holds one count
    int get_refcnt() const { return refcnt; }
    int inc_refcnt() { return ++refcnt; }
    int dec_refcnt() { return --refcnt; }

protected:
    std::map<std::string, std::string> meta_map;
    int refcnt;
};

//////////////////////////////////////////////////////////////////////////
// The class for holding a range/block of x-y data
class Range
{
public:
    MetaInfo *p_meta;  // put it directly in public

    Range(MetaInfo *p_meta_ = NULL) : p_meta(p_meta_), column_x(0),
        column_y(1), column_stddev(-1)
    {
        if (!p_meta) {
            p_meta = new MetaInfo;
        }
    }

    virtual ~Range();

    void set_name(const std::string &name_) { name = name_; }
    std::string get_name() const { return name; }

    ... // other member functions, got rid of metainfo-related ones

protected:
    std::string name;
    ...
};

Range::~Range()
{
    std::vector<Column*>::iterator it;
    for (it = cols.begin(); it != cols.end(); ++it) {
        delete *it;
    }

    // release the shared MetaInfo; the last Range using it deletes it
    if (p_meta->dec_refcnt() == 0) {
        delete p_meta;
    }
}

//////////////////////////////////////////////////////////////////
// in the caller code

Range *p_rg = new Range;
p_rg->p_meta->add_meta("key", "val");
...
// after reading the data of one "block in range" into p_rg,
// we meet another block
MetaInfo *p_mi = p_rg->p_meta;

ranges.push_back(p_rg);
...

Range *p_rg2 = new Range(p_mi);  // shares the same MetaInfo
...
ranges.push_back(p_rg2);

What do you think about this proposal? Looking forward to your reply.

About the SD format: I will skip bytes 250 to 810 if the file is in SD
format, and change the class name to PhilipsRawScanDataSet.

张鹏

Aug 20, 2007, 2:07:47 AM

to Marcin Wojdyr, fityk-dev ML

Hi, Marcin:

    Range(MetaInfo *p_meta_ = NULL) : p_meta(p_meta_), column_x(0),
        column_y(1), column_stddev(-1)
    {
        if (!p_meta) {
            p_meta = new MetaInfo;
        }

        p_meta->inc_refcnt(); // THIS LINE WAS MISSING
    }

Marcin Wojdyr

Aug 20, 2007, 2:18:05 AM

to 张鹏, fityk-dev ML

IMHO it's not a good idea.
It complicates everything and makes using the library more difficult:
..->get_range(n)->p_meta->get_meta(...)
Copying the map<string,string> is much simpler.

Marcin

张鹏

Aug 20, 2007, 2:25:10 AM

to Marcin Wojdyr, fityk-dev ML

ok
