> * changed the existing code following Marcin's advice (fp, left-hand &
> right-hand, put xylib::util into separated files, etc..)
Let me focus on util.*:
string add_space(const string &str, int max_len)
{ string ret(str);
int spaces = max_len - str.size();
for (; spaces > 0; --spaces) {
ret += " ";
}
return ret;
}
You can replace:
int spaces = max_len - str.size();
for (; spaces > 0; --spaces) {
with:
for (int spaces = max_len - str.size(); spaces > 0; --spaces) {
or even better with:
for (int i = 0; i < max_len - str.size(); ++i) {
You can append spaces to string, to get needed length simply with:
ret.resize(max_len, ' ');
or
ret += string(' ', max_len - str.size());
But IHMO this function is not needed at all, you can use iostream
manipulators instead.
endiannes:
just a note: usually swapping bytes is done with bitwise operators.
But it's ok how it's done now.
// convert a string to double
double string_to_double(const string &str)
{
char **endptr = NULL;
return strtod(str.c_str(), endptr);
}
Instead of
double foo = string_to_double(bar);
you can just write
double foo = strtod(bar, NULL);
the `endptr' is used when you want to check, if the string was
converted properly.
We are touching one of the most difficult problems: what should we do
when the format is not exactly what it should be, in other words, how
strictly we should verify the file format and what variations from the
`standard' should be allowed. It's even worse when we do not have any
specification of the format, only examples.
in xylib.h
DataSet(const std::string &filename_, xy_ftype ftype_ = FT_UNKNOWN)
i think it never can be unknown.
I also think the way type information is handled (g_ftype, g_desc,
etc.) will need to be changed at some point, but let's leave it how it
is for now.
> * added meta-info exporting in export_to_xy() with an optional param to
> determine whether export meta-info together with the xy data
good
> * new format support: Philips_udf format
good
> * Used a class "UxdLikeDataSet" to present all UXD-like formats (so far,
> they are UXD, Rigaku_dat and Philips_udf format).
I'm not sure if it's a good idea, but there is no need to change it now.
You have in your code:
char ch = ll[0];
if (string::npos != string("0123456789+-").find(ch)) ...
Names that consists of only l's and O's are not recommended, because
of similarity to 1 and 0.
But what's worst in the snippet above is that I have to stop and think
for a while what it's supposed to do. Avoid all smart tricks, make is
as simple to read as possible.
I think what this line does is simply:
if (isdigit(ch) || ch == '+' || ch == '-') ...
Rethink using static members in UxdLikeDataSet, it won't work.
Could you write a tentative ordered list of the formats you are going
to implement?
Cheers,
Marcin
--
Marcin Wojdyr | http://www.unipress.waw.pl/~wojdyr/
> yes, this is a problem indeed. But I think now we can just assume that they
> are all as what they should be; otherwise, it will be very complex to be
> tolerant of the format errors.
Yes. Checking for all possible errors will make reading 2-3 times more
complicated. That's always painful.
You don't have to tolerate errors, you have a choice: either to try to
read the file anyway or to throw exception.
Next few issues:
expression "return ( condition ? true : false)" can (IMO should) be
replaced with just "return condition"
It's better to read the file only once, when it's possible - in
UdfDataSet::load_data (and other places) almost each line is read
twice (you read it first time, then you move stream pointer back, and
read it again). It's not optimal and it's easy to avoid reading it
twice.
> > Could you write a tentative ordered list of the formats you are going
> > to implement?
>
> Next, I will finish the testmain.cpp first, then try the following formats:
>
> JCAMP
> Powder Diffraction File (PDF) format
> Galactic (GRAMS/32) SPC files
> NetCDF
Could you write, what you know about these formats?
- ascii / binary
- how data there is structured logically: one or multiple ranges
(blocks), is it x-y data?
- what your implementation will be based on (format specification,
other open-source implementations, sample data, etc.)
You have a number of sample files for each formats. Please add
somewhere info where these files come from.
Regards,
Next few issues:
expression "return ( condition ? true : false)" can (IMO should) be
replaced with just "return condition"
It's better to read the file only once, when it's possible - in
UdfDataSet::load_data (and other places) almost each line is read
twice (you read it first time, then you move stream pointer back, and
read it again). It's not optimal and it's easy to avoid reading it
twice.
Could you write, what you know about these formats?
- ascii / binary
- how data there is structured logically: one or multiple ranges
(blocks), is it x-y data?
- what your implementation will be based on (format specification,
other open-source implementations, sample data, etc.)
You have a number of sample files for each formats. Please add
somewhere info where these files come from.
I meant every case where there is such an expression.
> > It's better to read the file only once, when it's possible - in
> > UdfDataSet::load_data (and other places) almost each line is read
> > twice (you read it first time, then you move stream pointer back, and
> > read it again). It's not optimal and it's easy to avoid reading it
> > twice.
>
>
> Is it suitable to just call seekg() to set the
> reading pointer back in the ifstream?
I'm not sure if I understand your question. If you move the pointer
back, you are probably going to read a part of the file again.
>
> Till now, I only have a basic knowledge of JCAMP-DX format. I think it will
> be better to implement these formats one by one, so I have not explore the
> other formats.
> JCAMP-DX info:
> ascii,
> multiple ranges. Maybe some of them have x-y data. The data are organized in
> a strange format, and now I am working hard to understand the organizaton of
> the data.
> My implementation of JCAMP-DX will based on both format
> specification and 2 open-source projects on SF.NET
> (http://sourceforge.net/search/?type_of_search=soft&words=JCAMP-DX).
I had a look at JCAMP-DX code, there is 10 times more lines of code
there than in your library now. How are you going to implement this
format? Is there anything I fail to see?
> Both of their license is LGPL, is there any problem to borrow some code from
> them?
> BTW, which license should I use in xylib: GPL or LGPL?
I'd recommend LGPL - more project will be able to use it.
> > You have a number of sample files for each formats. Please add
> > somewhere info where these files come from.
>
> OK, the source info of sample files has been written in a README file.
Great, I'm looking forward to see it.
I meant every case where there is such an expression.
> I had a look at JCAMP-DX code, there is 10 times more lines of code
> there than in your library now. How are you going to implement this
> format? Is there anything I fail to see?
>
> Yes, it is very complex.
> How much do you know about JCAMP-DX format? You can find some ref info at
> http://en.wikipedia.org/wiki/JCAMP
> Can you help me to get some papers (actually, they are format
> specifications) listed at that page? especially the first one:
>
> R. S. Mc Donald and Paul A. Wilks, JCAMP-DX: A Standard Form for Exchange of
> Infrared Spectra in Computer Readable Form, Appl. Spec., 1988, 2, 151-162.
>
> I am also trying to get these papers.
>
> And, I am wondering: Is there a way to re-use the JCAMP-DX
> project on SF?
It's hard to estimate how long it will take you to add this format, so
please leave it. There is not much time till the end of GSOC.
Could you rather pick one or two more formats from:
http://www.ccp14.ac.uk/solution/powderdataconv/index.html ?
Do you have a converting program (that's a few lines of code)?
Please submit the README file, where you have info about sample files.
For each file format, give some basic information about it (either in
code or in docs) and write what your implementation was be based on
(format specification, other open-source implementations, sample data,
etc.)
Marcin
It's hard to estimate how long it will take you to add this format, so
please leave it. There is not much time till the end of GSOC.
Could you rather pick one or two more formats from:
http://www.ccp14.ac.uk/solution/powderdataconv/index.html ?
Do you have a converting program (that's a few lines of code)?
Please submit the README file, where you have info about sample files.
> > R. S. Mc Donald and Paul A. Wilks, JCAMP-DX: A Standard Form for Exchange
> > of Infrared Spectra in Computer Readable Form, Appl. Spec., 1988, 2,
> > 151-162.
It should be:
McDonald, Robert S.; Wilks, Paul A., JCAMP-DX: A Standard Form for Exchange of
Infrared Spectra in Computer Readable Form, Applied Spectroscopy, Volume 42,
Number 1, January 1988 , pp. 151-162.
You can download a copy from www.ingentaconnect.com/content/sas/sas (DOI:
10.1366/0003702884428734).
Cheers,
Michael
I'll try to find documentation (by asking people) for more binary
files. There is a lot of binary formats produced by various
spectrometers, diffractometers, multichannel analyzers, etc., and
these instruments usually come (I think so, I'm not an
experimentalist) with paper manuals, where file formats are
documented, but it's hard to find it on the Internet.
But before I start asking people, it would be good to have a list of
already implemented formats, so when you have time, please update the
list in docs/formats.txt. You can also add there information, which
format is "multi-range".
Are you OK with LGPL licence?
> > Do you have a converting program (that's a few lines of code)?
>
> yes, I have finihed the code of the converting program.
> testmain can accept all arguments as the usage()
> says. And it can also automatically convert all of the
> sample files.
There are a few minor things I don't like about this program:
- inconsistent indentation
- #include "common.h". Please move what you need from this file to
util.h and remove the file.
- most of the code is "auto test", and IMO it was not a good idea to
put there auto-test.
- exceptions are not caught without -a option
- usage
- testmain.h file, that doesn't contain anything useful
I added simpler xyconv tool, that just convert files.
I also added 5-line shell script samples.sh, that does the same
"testmain -a" is doing, but
IMHO usually it's more convenient to test converting from command
line, without this script.
I also removed empty out/* directory from SVN.
BTW, what are empty strings in g_ftype and g_desc for?
> > Please submit the README file, where you have info about sample files.
>
> Added.
Thanks!
You can download a copy from www.ingentaconnect.com/content/sas/sas (DOI:
10.1366/0003702884428734).
I'll try to find documentation (by asking people) for more binary
files. ...
But before I start asking people, it would be good to have a list of
already implemented formats, so when you have time, please update the
list in docs/formats.txt. You can also add there information, which
format is "multi-range".
Are you OK with LGPL licence?
> > Do you have a converting program (that's a few lines of code)?
>
> yes, I have finihed the code of the converting program.
> testmain can accept all arguments as the usage()
> says. And it can also automatically convert all of the
> sample files.
There are a few minor things I don't like about this program:
- inconsistent indentation
...
I added simpler xyconv tool, that just convert files.
I also added 5-line shell script samples.sh, that does the same
"testmain -a" is doing, but
IMHO usually it's more convenient to test converting from command
line, without this script.
I also removed empty out/* directory from SVN.
BTW, what are empty strings in g_ftype and g_desc for?
> > list in docs/formats.txt. You can also add there information, which
> > format is "multi-range".
> ok. In fact, the most precise description is the format specification: like
> the scanned pages you sent me and the VAMAS paper. However, directly putting
> these files to the public domain may cause some legal and copyright issues.
> So, what info should I add into docs/formats? It cannot be as precise and
> detailed as the official format specification.
You don't need to have full specification there. It's up to you what
you put there.
The documentation can be minimal, but should be easy to find and up to date.
I'd be good to have a list of supported formats. You have two such
lists, docs/formats and README, both are incomplete. I understand that
you are still developing the code, but you must remember, that I
frequently review your code and docs, so please keep it clean.
I would recommend to:
update README file
move docs about file formats from "IMPLEMENTATION REF" in
test/*/README and from docs/formats to ds_*.cpp files.
svn rm docs/
Your description of Rigaku .dat file is very clear.
The documentation can be terse.
Example:
You have in README:
test/
Directory containing the original sample files. They are categorized
into different sub-directories according to their types. For more
detailed info (like where they are from), see the README file in each
sub-directory.
It's good, but I'd be also happy with shorter version:
test/ - sample files
or even better, rename test/ to samples/, and it will be self-explaining.
> > BTW, what are empty strings in g_ftype and g_desc for?
>
> those lines are for FT_UNKNOWN. I will add some text there.
>
aha, ok
BTW, IMO it's better to keep all format informations in ds_* files, like this:
struct FormatInfo
{
string name;
string desc; // short, can be used in filter of open-file dialog
vector<string> extensions;
bool binary;
bool multi_range;
TypeInfo(...);
bool has_extension(); // case insensitive
};
in each XXXDataSet:
static const FormatInfo;
in xylib.cpp:
FormatInfo const* formats[] = { ..., &RigakuDataSet::FormatInfo, ...} ;
> BTW, how much do you know about the JCAMP
> format? and if you have that document, please send me a copy.
Not much, unfortunately :) Sadly, we don't have access to the journal either.
However, I'll have a hunt around and see if we can get a hard copy of it
through an alternate library.
Best wishes,
Michael.
Thanks Micheal,
I have it, I'll sent it to Peng.
Are you using files in this format?
Marcin
> Are you using files in this format?
No. (These days my main tool for spectroscopy is XPS, thus plain TSV files and
occasionally those in Vamas format.)
Cheers,
Michael.