Re: case sensitive file test


Kevin Wells

May 26, 2020, 11:25:05 AM
In message <5876b6...@sick-of-spam.invalid>
Bob Latham <b...@sick-of-spam.invalid> wrote:

>In article <5876ae...@sick-of-spam.invalid>,
> Bob Latham <b...@sick-of-spam.invalid> wrote:
>> Can someone tell me what is the best (speed wise) method of testing
>> for a specific file but importantly the name in lower case.
>
>> I have a recursive program running which scans my music library. I
>> want it to specifically test each album for the existence of a file
>> 'folder/jpg' but to fail anything with a different case like
>> 'Folder/jpg'.
>
>> OS_File 17 does not appear to be case sensitive.
>
>> The only way I can see is to read the contents of the directory
>> using OS_GBPB 9 and wildcards and then test the characters for
>> lower case.
>
>> I'm thinking that may be a little slow when doing thousands and I'm
>> also struggling to make it work anyway. On a short test run it fails
>> 7 out of 10 albums and all albums had folder.jpg in them.
>
>[Snip]
>
>Okay, found the problem (eventually) with OS_GBPB 9 buffer size!
>
>But if anyone has a good way to test for a lowercase file name I'd
>love to hear it.

If it is just the first letter that has to be lower case, why not test
just the first letter by its character code? For example, lower case f
is CHR$(102) while upper case F is CHR$(70).
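
A minimal sketch of that first-letter check, assuming name$ already holds a leafname read from the directory (name$ is an illustrative variable, not something from the original program):

```bbcbasic
REM ASC returns the code of the first character of a string
REM name$ is assumed to hold a leafname already read from the directory
IF ASC(name$) = 102 THEN PRINT name$;" starts with a lower case f"
```

This only looks at the first character, so it would not catch a name such as 'fOLDER/jpg'.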
>
>
>
> Thanks
>
> Bob.
>


--
Kev Wells
http://kevsoft.co.uk/ https://ko-fi.com/kevsoft
carpe cervisium
Idiot in search of a village.

Steve Fryatt

May 26, 2020, 12:15:04 PM
On 26 May, Kevin Wells wrote in message
<c775bd7658.Kevin@Kevsoft>:
More generally, and allowing for a full set of possible characters:

DEF FNis_lower(string$)
LOCAL loop%, char%, byte%, bit%, alpha_table%, lower_table%, alpha%, lower%

REM Property 2 is "lower case", 3 is "alphabetic"; -1 is the current territory
SYS "Territory_CharacterPropertyTable", -1, 2 TO lower_table%
SYS "Territory_CharacterPropertyTable", -1, 3 TO alpha_table%

FOR loop% = 1 TO LEN(string$)
  char% = ASC(MID$(string$, loop%, 1))

  REM Each table holds one bit per character: 32 bytes of 8 bits
  byte% = char% DIV 8
  bit% = char% MOD 8

  alpha% = ((alpha_table%?byte%) AND (1 << bit%)) <> 0
  lower% = ((lower_table%?byte%) AND (1 << bit%)) <> 0

  REM Fail on the first alphabetic character that is not lower case
  IF alpha% AND (NOT lower%) THEN =FALSE
NEXT loop%

=TRUE

--
Steve Fryatt - Leeds, England

http://www.stevefryatt.org.uk/

Steve Fryatt

May 26, 2020, 12:55:03 PM
On 26 May, Bob Latham wrote in message
<5876c7...@sick-of-spam.invalid>:

> Thanks for that and to be honest that would probably do. I just thought it
> was odd that there doesn't appear to be a way of being case sensitive
> without doing the testing yourself. I might have expected a flag on the
> entry to OS_File 17 to say fixed case but it appears not.

RISC OS filing systems are not case sensitive, full stop. Create yourself
two files in a HostFS folder with RPCEmu from the host system, using names
like "Text" and "text", and see things go wrong when accessing them in RISC
OS.

John Williams (News)

May 26, 2020, 1:04:47 PM
In article <5876c7...@sick-of-spam.invalid>,
Bob Latham <b...@sick-of-spam.invalid> wrote:

> I might have expected a flag on the entry to OS_File 17 to say fixed case
> but it appears not.

Is the filer not, and has it not always been, famously case agnostic?

And as a consequence, isn't your expectation above a bit unreasonable?

John

John Williams (News)

May 26, 2020, 1:07:01 PM

Is there nothing in the file content you could work with rather than this
name-case business?

John

David Higton

May 26, 2020, 1:08:34 PM
In message <5876b6...@sick-of-spam.invalid>
Bob Latham <b...@sick-of-spam.invalid> wrote:

> But if anyone has a good way to test for a lowercase file name I'd
> love to hear it.

RISC OS filing systems are case insensitive. The only way you can do
what you want is to iterate through the filenames, and do whatever
test you want on each filename returned.

David

Steve Fryatt

May 26, 2020, 1:35:04 PM
On 26 May, Bob Latham wrote in message
<5876c9...@sick-of-spam.invalid>:

> I presume this is a means to test a directory listing to make sure an
> entry is lower case?

No, it's just a generic "is this string lower case" test. The two SWIs
return pointers to tables of bit flags (so 32 bytes of 8 bits each, for all
256 characters in a RISC OS character set). In alpha_table%, a bit is set if
the character is alphabetic; in lower_table%, it's set if the character is
considered lower case.

You still need OS_GBPB to find the names to test.
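
For the directory-reading step, one way OS_GBPB 9 might be combined with an exact-case comparison is sketched here. It is only a sketch: FNexact_leaf, FNzstring, gbpb_buf% and buf_size% are made-up names, and the buffer size and the batch of 64 names per call are arbitrary choices. It relies on BBC BASIC's string comparison being case sensitive, and on OS_GBPB returning names as zero-terminated strings.

```bbcbasic
REM Illustrative names throughout; not taken from the original program
buf_size% = 1024
DIM gbpb_buf% buf_size% - 1

REM Usage (dir$ is assumed to hold the album directory's full path):
REM IF FNexact_leaf(dir$, "folder/jpg") THEN PRINT "exact match"

DEF FNexact_leaf(dir$, leaf$)
LOCAL next%, read%, ptr%, name$, found%
next% = 0 : found% = FALSE
REPEAT
  REM R3 = names wanted, R4 = where to continue from, R6 = name to match
  REM On exit R3 = names read and R4 = next offset (-1 when finished)
  SYS "OS_GBPB", 9, dir$, gbpb_buf%, 64, next%, buf_size%, leaf$ TO ,,,read%,next%
  ptr% = gbpb_buf%
  WHILE read% > 0
    name$ = FNzstring(ptr%)
    REM The filing system matched regardless of case; compare exactly here
    IF name$ = leaf$ THEN found% = TRUE
    ptr% += LEN(name$) + 1
    read% -= 1
  ENDWHILE
UNTIL found% OR next% = -1
= found%

REM Read a zero-terminated string from memory
DEF FNzstring(p%)
LOCAL s$
WHILE ?p% <> 0
  s$ += CHR$(?p%)
  p% += 1
ENDWHILE
= s$
```

Passing the leafname as the match string keeps the amount of returned data small; a call can legitimately return no names without being finished, which is why the loop runs until R4 comes back as -1.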

Steve Drain

May 27, 2020, 8:15:58 AM
On 26/05/2020 17:12, Steve Fryatt wrote:
>
> DEF FNis_lower(string$)
> LOCAL loop%, char%, byte%, bit%, alpha_table%, case_table%, alpha%, lower%
>
> SYS "Territory_CharacterPropertyTable", -1, 2 TO lower_table%
> SYS "Territory_CharacterPropertyTable", -1, 3 TO alpha_table%
>
> FOR loop% = 1 TO LEN(string$)
> char% = ASC(MID$(string$, loop%, 1))
>
> byte% = char% DIV 8
> bit% = char% MOD 8
>
> alpha% = ((alpha_table%?byte%) AND (1 << bit%)) <> 0
> lower% = ((lower_table%?byte%) AND (1 << bit%)) <> 0
>
> IF alpha% AND (NOT lower%) THEN =FALSE
> NEXT loop%
>
> =TRUE
Perhaps:

DEF FNis_lower(string$)
LOCAL buff%,upper%,char%
buff%=&8200:REM use input buffer or other block
$buff%=string$
SYS "Territory_UpperCaseTable",-1 TO upper%
FOR char%=buff% TO buff%+LENstring$-1
IF ?char%=upper%??char% THEN =FALSE:REM note ??
NEXT char%
=TRUE

Or, if you want to disentangle it, try:

DEF FNis_lower(string$)
LOCAL upper%,char%
SYS "Territory_UpperCaseTable",-1 TO upper%
FOR char%=&8100 TO &8100+LENstring$-1
IF ?char%=upper%??char% THEN =FALSE:REM note ??
NEXT char%
=TRUE

;-)

j...@mdfs.net

May 27, 2020, 7:25:41 PM
> Or, if you want to disentangle it, try:

DEF FNis_lower($&8100)
LOCAL upper%,char%
SYS "Territory_UpperCaseTable",-1 TO upper%
char%=&8100-1
REPEAT
char%=char%+1
UNTIL ?char%=upper%??char% OR ?char%=13
=?char%=13

Steve Drain

May 28, 2020, 9:17:53 AM
There are many ways to skin this cat and speed is hardly important these
days, but I think an early exit from the loop on first failure is
worthwhile. It certainly would be with a long string.

BTW, my trick of using the string accumulator (&8100) works because
evaluating LENstring$ puts the string in there. It is only safe until the
next string keyword, and I would never actually use it.

Erik G

May 31, 2020, 9:19:57 PM
A general afterthought about the efficiency (speed wise) of searching
a directory tree.

On 26/05/2020 13:46, Bob Latham wrote:
> Can someone tell me what is the best (speed wise) method of testing
> for a specific file but importantly the name in lower case.
>
> I have a recursive program running which scans my music library. I
> want it to specifically test each album for the existence of a file
> 'folder/jpg' but to fail anything with a different case like
> 'Folder/jpg'.
>
> OS_File 17 does not appear to be case sensitive.
>
> The only way I can see is to read the contents of the directory using
> OS_GBPB 9 and wildcards and then test the characters for lower case.
>
> I'm thinking that may be a little slow when doing thousands and I'm
> also struggling to make it work anyway. on a short test run it fails
> 7 out of 10 albums and all albums had folder.jpg in them.

(NOTE: it has been a long time since I studied the internals of
ADFS. Specific efficiency details of SWI calls such as OS_FILE and
OS_GBPB will have significant effect on the real runtime of any
such program. Read documentation and experiment to find the best
solution)

== In short, the thing I want to impress on all programmers is this:

To make any algorithm involving disk I/O fast, the focus needs to be
on:
- Making as few reads as possible
- Reading as much data in one operation as possible

Also:
- Don't spend much effort optimising the processing of the data by
the CPU, as the disk I/O will dominate the time the algorithm takes
to complete.

This example case of searching through a directory tree involves
reading several (or a lot of) directories and processing the
information with a program.
By far the most time-consuming part of this is the physical reading
of the information from a disk.
Reading one block of data requires:
1) moving the disk head to the correct track
2) waiting for the disk to rotate to the sector that contains the block
3) reading the magnetic information from the disk and transferring it
to memory.

Of these, steps 1 and 2 take up the most time, in the order of
milliseconds.

By comparison, you can do tons of CPU processing in a few milliseconds.

Note that reading several blocks in a row on the same track
returns more data, but only requires one head move (step 1) and one
wait (step 2).
Also note that continuing to read from the next track only needs
a very short (and thus quick) head move, while the wait time can be
practically eliminated by organising the disk in such a way that the
next block to read on this next track shows up just as the head has
settled in its new position.

So in the case of traversing a directory structure, it would be much
more efficient to read an entire directory in one go and then
process the data in memory (e.g. searching for a file that matches
a certain name or pattern), than it would be to ask for the first
directory entry, process it, then ask for the second entry, process
it, etcetera.

My advice for this particular program is to find the best combination
of SWI calls to get a good I/O performance.

In a more general sense it is a lot more efficient to read one big file
with all the data in it rather than have that data spread over lots
of small files. (For example: the game Kerbal Space Program used to
have every detail of the game in a separate file, taking up tens of
thousands of files. It took several minutes to load. In recent versions
many of those files have been combined into a smaller number of bigger
files, and now the program loads in under a minute.)

And finally: developers of filing systems have worked for decades to
optimise the finding, reading, writing, extending and deletion of files,
using every trick in the book and inventing new ones, because disk I/O
is one of the major bottlenecks in the speed at which programs run.

--
Erik G.
From address is fake
See http://erikgrnh.home.xs4all.nl/

druck

Jun 1, 2020, 3:01:58 PM
On 28/05/2020 14:16, Steve Drain wrote:
> There are many ways to skin this cat and speed is hardly important these
> days,

It can be if you hit a directory on a file server with many thousands of
entries - it certainly lets you know who calls OS_GBPB one entry at a
time, and who uses a decent sized buffer!

---druck

druck

Jun 1, 2020, 3:57:48 PM
On 01/06/2020 02:19, Erik G wrote:
> And finally: developers of filing systems have worked for decades to
> optimise the finding, reading, writing, extending and deletion of files,
> using every trick in the book and inventing new ones, because disk I/O
> is one of the major bottlenecks in the speed at which programs run.

Unfortunately, except on RISC OS, where no use is made of free memory to
cache filing system operations, as just about every other common OS does.

The closest RISC OS comes is some fixed-size buffering in ADFS, which
often resulted in the Risc PC's slow motherboard IDE interface
outperforming much better 3rd party IDE hardware using IDEFS variants
with no caching.

---druck

j...@mdfs.net

Jun 4, 2020, 12:23:41 PM
Similarly, if there's some I/O information that won't change over the
run of a program, read it once into a variable, then access the variable.
For example:
size%=EXT#inputfile then use size% instead of EXT#
If your program is never going to change screen mode:
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

etc.

Martin

Jun 4, 2020, 12:57:03 PM
On 04 Jun in article
<a248d019-7c38-439a...@googlegroups.com>,
<j...@mdfs.net> wrote:
> Similarly, if there's some I/O information that won't change over
> the run of a program, read it once into a variable, then access the
> variable.

> For example:
> size%=EXT#inputfile then use size% instead of EXT#

Excellent advice, in general ... but this example ...

> If your program is never going to change screen mode:
> SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

is a bad one, because if it is a Wimp program the mode is usually
changed outside your program, so ModeChange messages have to be
watched for and the relevant variables read again.

--
Martin Avison
Note that unfortunately this email address will become invalid
without notice if (when) any spam is received.

druck

Jun 4, 2020, 3:49:57 PM
On 04/06/2020 17:23, j...@mdfs.net wrote:
> Similarly, if there's some I/O information that won't change over the
> run of a program, read it once into a variable, then access the variable.
> For example:
> size%=EXT#inputfile then use size% instead of EXT#

Sorry, that's bad advice, a program should always assume filing system
data may be altered by other processes.

1) Obviously, if it's a Wimp application, other tasks are running
2) If the single-tasking program can be run in a taskwindow or graphic
taskwindow, other tasks are running
3) If the file is on a remote filing system, other machines may alter it
4) If the file is on a local filing system which is shared, other
machines may alter it.

So only if you are outside the desktop, and storage is on a local
non-shared disc, can you be sure it won't be altered by anything else.

> If your program is never going to change screen mode:
> SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

Only if it's running outside the desktop. Inside the desktop the mode can
change, so you need to ensure you handle the mode change message and
re-read any mode related parameters you are using.
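
A rough sketch of what that handling might look like in a BASIC Wimp program. PROCread_mode, reason% and block% are illustrative names; block% is assumed to be the block passed to Wimp_Poll and reason% the reason code it returned. xsz% and ysz% are the variables from the example being discussed, here taken to be the screen size in OS units.

```bbcbasic
REM Sketch only: PROCread_mode, reason% and block% are illustrative names
REM Inside the Wimp_Poll loop: reason codes 17 and 18 are user messages
IF reason% = 17 OR reason% = 18 THEN
  REM The message action code is at offset 16; &400C1 is Message_ModeChange
  IF block%!16 = &400C1 THEN PROCread_mode
ENDIF

REM Re-read the cached screen size for the current mode (-1)
DEF PROCread_mode
LOCAL xeig%, yeig%, xwind%, ywind%
SYS "OS_ReadModeVariable", -1, 4 TO ,,xeig%   : REM XEigFactor
SYS "OS_ReadModeVariable", -1, 5 TO ,,yeig%   : REM YEigFactor
SYS "OS_ReadModeVariable", -1, 11 TO ,,xwind% : REM XWindLimit
SYS "OS_ReadModeVariable", -1, 12 TO ,,ywind% : REM YWindLimit
xsz% = (xwind% + 1) << xeig%
ysz% = (ywind% + 1) << yeig%
ENDPROC
```

The same procedure would also be called once at start-up, so the cached values are valid before the first mode change arrives.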

---druck

j...@mdfs.net

Jun 4, 2020, 7:18:26 PM
On Thursday, 4 June 2020 20:49:57 UTC+1, druck wrote:
> > For example: size%=EXT#inputfile then use size% instead of EXT#
>
> Sorry, that's bad advice, a program should always assume filing system
> data may be altered by other processes.

If it's open for input, other processes *can't* alter it.
Read By Many, Write By One.

> > If your program is never going to change screen mode:
> > SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%
>
> Only if it's running outside the desktop. Inside the desktop the mode can
> change, so you need to ensure you handle the mode change message and
> re-read any mode related parameters you are using.

Which is why I wrote 'your program is never going to change screen
mode'. Maybe it should have been 'where the screen mode is never
going to be changed during the execution of the program'. Such as
a command line tool or a single-tasking application.

druck

Jun 5, 2020, 6:27:36 AM
On 05/06/2020 00:18, j...@mdfs.net wrote:
> On Thursday, 4 June 2020 20:49:57 UTC+1, druck wrote:
>>> For example: size%=EXT#inputfile then use size% instead of EXT#
>>
>> Sorry, that's bad advice, a program should always assume filing system
>> data may be altered by other processes.
>
> If it's open for input, other processes *can't* alter it.
> Read By Many, Write By One.

It's down to the implementation of the filing system whether that is
true. Local filing systems will tend to lock on write, remote ones will
tend not to. It's a bit of a minefield!

---druck