> Support to pdCIF format has been added, and the code has
> been committed.
You forgot to commit it (ds_pdcif.*).
> we can see that there are many possible X-Y combinations.
> To fully utilize all of the data in pdCIF, I think the current data
> structure to hold the x-y points need to be modified.
> We can read all of the data in, along with its labels, and then let the
> user to choose which one as x and which one
> as y.
Yes, this can be also useful for reading CSV files with more than two columns.
I'd suggest class MultiColumnRange with:
int get_column_count();
string get_column_name(int n); // returns "" if column is unnamed
void set_column_x(int n); // by default = 0
void set_column_y(int n); // by default = 1
void set_column_y_stddev(int n); // -1 = no stddev column
int get_column_x(), etc.
vector<string> column_names;
vector<double> all_data;
double get_x(int n) { return all_data[n * get_column_count() +
get_column_x()]; }
other thoughts:
- we can assume that y_has_stddev is a property of "range" rather than of point.
- that's bad that FixedStepRange inherits add_pt() and other methods
and members that make no sense for this class.
- if "calibration" in WinSpec format is linear, has_fixed_step()
should return true
- why add_pt() is virtual?
- IMHO almost all writing methods should be protected
- istream argument can be passed to load_data() instead of ctor.
- i'm not sure if check() should throw exceptions
> PS: I am now going to implement the PANalytical's data format.
I'd suggest other things to do first:
- as you wrote in one of the previous emails:
> I am now going to implement the two new formats and then re-write uxd
> and rigaku format without the UxdLikeDataSet
- check if you have rights to distribute all sample files that are in
samples/ directory
- find other data files (if it's possible) and test your library.
Having several sample files that are from the same source and are
almost identical is not very useful for testing.
Marcin
--
Marcin Wojdyr | http://www.unipress.waw.pl/~wojdyr/
> we can see that there are many possible X-Y combinations.
> To fully utilize all of the data in pdCIF, I think the current data
> structure to hold the x-y points need to be modified.
> We can read all of the data in, along with its labels, and then let the
> user to choose which one as x and which one
> as y.
Yes, this can be also useful for reading CSV files with more than two columns.
I'd suggest class MultiColumnRange with: ...
other thoughts:
- we can assume that y_has_stddev is a property of "range" rather than of point.
- that's bad that FixedStepRange inherits add_pt() and other methods
and members that make no sense for this class.
- if "calibration" in WinSpec format is linear, has_fixed_step()
should return true
- why add_pt() is virtual?
- IMHO almost all writing methods should be protected
- istream argument can be passed to load_data() instead of ctor.
- i'm not sure if check() should throw exceptions
> PS: I am now going to implement the PANalytical's data format.
I'd suggest other things to do first:
- as you wrote in one of the previous emails:
> I am now going to implement the two new formats and then re-write uxd
> and rigaku format without the UxdLikeDataSet
- check if you have rights to distribute all sample files that are in
samples/ directory
- find other data files (if it's possible) and test your library.
Having several sample files that are from the same source and are
almost identical is not very useful for testing.
OK, it looks good.
> bool has_stddev() { return (stddev != -1); }
> double get_stddev() { return stddev; }
> void set_stddev(double stddev_) { stddev = stddev_; }
[...]
> > - we can assume that y_has_stddev is a property of "range" rather than of
> point.
> ok. in my proposal, I have changed the stddev as a scalar member of Column.
I meant that y has std. dev. either for all points or for none. But it
is of course different for each point.
BTW, does notation like 0.778(9) in pdCIF mean 0.778 +- 0.009 ?
> > - IMHO almost all writing methods should be protected
>
> which methods do you mean ? is there this problem in my new proposal?
no, I was wrong
> > - istream argument can be passed to load_data() instead of ctor.
>
> yes, and there should also be some changes:
> * use istream *p_is instead of ref member f to store istream object.
Why do you want to store it?
> * need a member function clear() as following:
> And in load_data, clear() should be called first.
ok
BTW, does notation like 0.778(9) in pdCIF mean 0.778 +- 0.009 ?
> > - istream argument can be passed to load_data() instead of ctor.
>
> yes, and there should also be some changes:
> * use istream *p_is instead of ref member f to store istream object.
Why do you want to store it?
Thanks. How do you use it?
> > > > - istream argument can be passed to load_data() instead of ctor.
> > >
> > > yes, and there should also be some changes:
> > > * use istream *p_is instead of ref member f to store istream object.
> >
> > Why do you want to store it?
>
> 1. it will be convinient for the other member function to use it. Although,
> it can also passed as a function arg.
yes, and in some cases you don't have to pass it to other functions
> 2. it can used as a flag to indicate that an sream has been read in. And the
> methods like DataSet::get_range() can check it to make sure
> some data has been read.
I don't think it's really helpful. It's enough to check the number or
ranges (and even this is not necessary). You don't have to check if
the library is used in a correct way, although it is good if there are
assertions here and there. But you can assume that programmer who is
using it have read docs. (you can't assume that the input data is
correct, of course.)
> I don't think it's really helpful. It's enough to check the number or
> ranges (and even this is not necessary).
ok, I will remove it from the DataSet class in next version of code.
> You don't have to check if
> the library is used in a correct way, although it is good if there are
> assertions here and there. But you can assume that programmer who is
> using it have read docs. (you can't assume that the input data is
> correct, of course.)
mainly there are three kind of errors:
1. format error or file corruption: this is the most important one to
be prevented
2. system fault: like lack of memory new() return NULL, this should
also to be prevented.
3. wrong use of the lib by the client programmer who use xylib: this
can be taken less consideration of.