stride = 1
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
In the actual application the stride is 1000 and the mfscanf
statement is inside a loop. As the actual data is >300k lines, I'm
looking for improvements in speed. [16M lines is not unreasonable.
If I get that far I would hope to convince the data source to
improve the format ;] My original code had
tmpw = mfscanf(stride, fd, '%s %s %s %s %s ' )
Parsing the result seemed to be my bottleneck, so I went looking
for alternatives. The rightmost column of data may have 1 or 2
digits to the left of the decimal point.
Comments or suggestions?
TIA
My test data file contents are
Sep 03 13:06:00 freq: 10.87
Sep 04 13:06:02 freq: 10.95
Sep 09 13:06:12 freq: 10.88
Sep 10 13:06:14 freq: 10.92
Sep 11 13:07:13 freq: 10.86
Sep 12 13:07:15 freq: 11.02
Sep 13 13:07:17 freq: 11.05
Sep 14 13:07:19 freq: 11.43
Sep 15 13:07:21 freq: 10.94
Sep 16 13:07:23 freq: 11.17
Sep 27 13:05:51 freq: 10.92
Jan 18 13:05:53 freq: 10.90
[above is cut and paste from my editor]
When I paste my test code fragment to the Scilab window I get
Startup execution:
loading initial environment
-->datafile = xgetfile('*.log',"G:/2008 downloads/",title =
'Chose an input file');
-->fd = mopen(datafile,'r'); // open the file for reading
-->
-->stride = 1
stride =
1.
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
Sep 3 13 6 0 10.87
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
Sep 4 13 6 2 10.95
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
Sep 0 9 3 6 req:
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
10.88
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
Sep 10 13 6 14 10.92
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
Sep 11 13 7 13 10.86
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
Sep 12 13 7 15 11.02
-->tmpw = mfscanf(stride, fd, '%s %i %i %*c %i %*c %i %*5c %s ' )
tmpw =
Sep 13 13 7 17 11.05
The first -1 means read up to the end, and I use %d instead of %i
because %i produces strange behavior.
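
The strange behavior is consistent with the C scanf rule that %i treats a
leading 0 as an octal prefix. On "Sep 09" it reads the 0, stops at the 9
(not an octal digit), and every later field shifts by one, which matches the
"Sep 0 9 3 6 req:" line in the session above. "03" and "04" happen to be
valid octal, so those lines parse correctly. Below is a minimal sketch of a
whole-file read with %d, keeping the original format otherwise. The file name
'test.log' and the output variable names are placeholders chosen for
illustration, not taken from the original posts.

```scilab
// Sketch only: 'test.log' is a hypothetical file name.
fd = mopen('test.log', 'r');   // open the log for reading

// First argument -1: apply the format repeatedly until end of file.
// %d always reads decimal, so "08" and "09" parse as 8 and 9.
// %*c skips one character (the ':' separators); %*5c skips "freq:".
// Output names (mon, day, hh, mm, ss, freq) are made up for this sketch.
[n, mon, day, hh, mm, ss, freq] = ..
    mfscanf(-1, fd, '%s %d %d %*c %d %*c %d %*5c %s ');

mclose(fd);                    // release the file handle
```

Here the last column is still read as a string, as in the original format.
Replacing the final %s with %f would presumably return the frequency as a
number directly and avoid the separate parsing step, but that variant is
untested here.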
Serge Steer
Richard Owlett wrote: