
Finally found it! A seekable file stream


JJ

Mar 9, 2022, 1:04:20 AM
Accessing a large file was really slow when the needed data is at or near
the end of the file, because the data has to be read from the start up to
the needed file offset.

But I've finally found it: a seekable file stream ActiveX object built into
Windows itself. No third-party software required. The object was found in an
unexpected place/classification: the Speech API. The automation object is
named `SAPI.SpFileStream`.

https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms722561(v=vs.85)

The object is meant for handling WAV audio files, but it also supports raw
(formatless) data.

With it, we can do faster processing of large files, e.g. in-place patching
such as fixing the frame rate or audio sampling rate of video files, or
extracting part of a file from the middle or the end.
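
To make the access pattern concrete, here is a minimal sketch in Python rather than VBScript. It does not use SpFileStream at all; it only illustrates the general idea of seeking straight to an offset, patching a few bytes in place, and reading the tail of a file without reading everything before it. The helper names, the temporary file, and the offsets are all made up for the illustration.

```python
import os
import tempfile


def patch_in_place(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at `offset` without rewriting the file."""
    with open(path, "r+b") as f:      # read/write, no truncation
        f.seek(offset)                # jump straight to the target offset
        f.write(new_bytes)


def read_tail(path, count):
    """Return the last `count` bytes by seeking relative to the end of the file.

    Assumes count is not larger than the file size.
    """
    with open(path, "rb") as f:
        f.seek(-count, os.SEEK_END)
        return f.read(count)


def demo():
    """Build a small stand-in file, patch two bytes, and read its tail."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x00" * 1000 + b"TAIL")
        patch_in_place(path, 500, b"\x19\x00")
        size = os.path.getsize(path)  # patching must not change the size
        with open(path, "rb") as f:
            f.seek(500)
            patched = f.read(2)
        return size, patched, read_tail(path, 4)
    finally:
        os.remove(path)
```

Neither helper touches any bytes outside the region it needs, which is the whole point of having a seekable stream for large files.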

A huge bonus is that it supports files larger than 4GB as long as the file
system supports them (tested with a 6GB file; read and write). Its Read()
method gives Byte(), i.e. a literal array of bytes, not a variant array. It's
like ADODB.Stream's Read().

And best of all, the object has existed since at least Windows XP.

But it's not perfect. It has several limitations:

- There's no way to truncate a file, i.e. delete the data from the current
file offset onward and shrink the file size. That is the equivalent of the
SetEndOfFile() Windows API. This functionality is still absent from
VBScript without third-party software.

- Seek() cannot be used to increase the file size. Data must be written to
increase the file size.

- There's no way to create or overwrite an existing file as raw format. While
there's the SSFMCreateForWrite (3) file mode to create or overwrite an
existing file, it forces the format to WAV type 22, where the WAV header is
automatically written. The only exception is Windows XP, where the
SSFMCreateForWrite file mode still only supports raw format.

And one caveat:

- Passing a string to Write() will write the given string only in UTF-16
encoding, _plus_ one Null character.
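
As a rough way to predict how many bytes such a call adds to the file, here is a small Python sketch of the payload described in the caveat above. It assumes UTF-16 little-endian without a byte order mark and a single 16-bit null terminator; the post does not state the byte order or whether a BOM is written, so treat both as assumptions. The function name is made up for the illustration.

```python
def expected_string_payload(text: str) -> bytes:
    """Bytes a string write would produce under the stated assumptions.

    Assumption: UTF-16 little-endian, no BOM, followed by one 16-bit NUL.
    """
    return text.encode("utf-16-le") + b"\x00\x00"


payload = expected_string_payload("AB")
print(len(payload), payload)
```

Under these assumptions a string of n characters in the Basic Multilingual Plane takes 2n + 2 bytes, so strings are a poor fit for writing exact binary data.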

Additional information which is not found in the documentation:

- On object creation, the default format type is SAFTNoAssignedFormat (0),
instead of SAFTDefault (-1).

- Format GUIDs:
  - SAFTNoAssignedFormat (0): {00000000-0000-0000-0000-000000000000}
  - SAFT22kHz16BitMono (22): {C31ADBAE-527F-4FF5-A230-F62BB61FF70C}

Mayayana

Mar 9, 2022, 9:02:22 AM
"JJ" <jj4p...@gmail.com> wrote

|
| https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms722561(v=vs.85)
|
| The object is meant for handling WAV audio files, but it also support raw
| format or formatless.
|

I'm curious what it is you're trying to accomplish with
byte data that needs to be done in VBS. You can pretty
much do what you like with FSO, as long as you don't have
a DBCS codepage on the local system. (Chinese, Korean,
or Japanese) You just need to work with the file as ANSI,
which is default. I've done things like brightening a bitmap
using nothing more than Textstream. Of course it's slow,
since it's dealing with variants, but VBS is not the tool for
speed, anyway.

The WSH designers thought that WSH would be used by
admins who only need file ops to do things like read or
write logs, so they made Textstream a simplified, non-binary
access. But it still works. All files are binary. There are just
special considerations to deal with a null. For example, you
can read a file in using Read(filelen) but if you use ReadAll
it will be snipped at the first null. The WSH designers were
surprisingly sloppy. I suppose that at the time it was just a
seat-of-the-pants GUI update to DOS that they figured would
only be needed by a few people.


R.Wieser

Mar 9, 2022, 10:59:20 AM
Mayayana,

> For example, ... if you use ReadAll it will be snipped at
> the first null.

Not quite. Have you ever looked at the last series of bytes in both the
actual file and the resulting "readall" buffer ? They match.

Also, up to a certain number of bytes, embedded NUL chars *and their
following data* will be kept as you have provided them.

The problem is that "readall" reads the file in chunks. When the current
buffer overflows, a new buffer is allocated with the size of the old one +
the size of a chunk, and the old buffer is copied into the new buffer. And
that copy method is where it goes wrong - it does a zero-terminated string
copy*, instead of (a much simpler) block copy.

But it still stores the new data at the correct point in that buffer -
into the last block of it. Hence the start *and* the end of the file
matching the "readall" result.

* That copy method actually has got two paths, one for zero-terminated
strings, and another for binary. My guess is therefore that someone made a
fat-finger error - providing a True where a False should have been (or
vice versa).
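
The behaviour described above can be mimicked with a toy model in Python. This is only a sketch of the description in this post, not the actual WSH code: the chunk size of 8, the function name, and the zero-filled fresh buffer are assumptions (a real buffer would more likely hold leftover garbage in the uncopied region).

```python
def buggy_read_all(data: bytes, chunk_size: int = 8) -> bytes:
    """Toy model: grow a buffer chunk by chunk, but copy the old buffer
    with a zero-terminated string copy instead of a block copy."""
    buffer = b""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        if not buffer:
            # The first chunk fits the initial buffer, so nothing is copied.
            buffer = chunk
            continue
        # New buffer: old size + chunk size (zero-filled in this model).
        grown = bytearray(len(buffer) + len(chunk))
        # The faulty step: the copy stops at the first NUL in the old buffer.
        nul = buffer.find(b"\x00")
        keep = len(buffer) if nul < 0 else nul
        grown[:keep] = buffer[:keep]
        # The new chunk is still stored at the correct offset.
        grown[len(buffer):] = chunk
        buffer = bytes(grown)
    return buffer


small = b"AB\x00CD"                              # fits in one chunk
large = b"AB\x00CDEFG" + b"HIJKLMNO" + b"PQ\x00RS"  # needs three chunks
print(buggy_read_all(small) == small)
print(buggy_read_all(large))
```

In this model a file that fits in the first chunk comes back intact, NULs included, while a larger file keeps its correct length, its start up to the first NUL, and its last chunk, with everything in between lost.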

> The WSH designers were surprisingly sloppy.

Tell me about it. Like the dictionary object *which adds a key* when you
ask it to return the data of a key and the key doesn't exist yet. It
makes a bit of a mockery of the object's "exists" method.

The most surprising to me is that neither has been fixed ...

> I suppose that at the time it was just a seat-of-the-pants GUI
> update to DOS that they figured would only be needed by a few
> people.

It was also used as the MS script language for webpages (as a "better"
alternative to JS). I remember it as part of IE5 (W98).

Regards,
Rudy Wieser


Mayayana

Mar 9, 2022, 12:08:22 PM
"R.Wieser" <add...@not.available> wrote

| > I suppose that at the time it was just a seat-of-the-pants GUI
| > update to DOS that they figured would only be needed by a few
| > people.
|
| It was also used as the MS script language for webpages (as a "better"
| alternative to JS). I remember it as part of IE5 (W98).
|

Yes. I used to have two VBS books from '95 or '96.
They pre-dated CreateObject and only addressed the
browser DOM. One of them explained that VBS would
be supported in all browsers very soon. :)

I'm not sure when the COM tie-in happened -- whether
WSH came first or ActiveX controls came first. Probably
the latter. And the rest is bad browser security history.


JJ

Mar 10, 2022, 12:33:41 AM
On Wed, 9 Mar 2022 09:01:56 -0500, Mayayana wrote:
>
> I'm curious what it is you're trying to accomplish with
> byte data that needs to be done in VBS. You can pretty
> much do what you like with FSO, as long as you don't have
> a DBCS codepage on the local system. (Chinese, Korean,
> or Japanese) You just need to work with the file as ANSI,
> which is default. I've done things like brightening a bitmap
> using nothing more than Textstream. Of course it's slow,
> since it's dealing with variants, but VBS is not the tool for
> speed, anyway.
>
> The WSH designers thought that WSH would be used by
> admins who only need file ops to do things like read or
> write logs, so they made Textstream a simplifed, non-binary
> access. But it still works. All files are binary. There are just
> special considerations to deal with a null. For example, you
> can read a file in using Read(filelen) but if you use ReadAll
> it will be snipped at the first null. The WSH designers were
> surprisingly sloppy. I suppose that at the time it was just a
> seat-of-the-pants GUI update to DOS that they figured would
> only be needed by a few people.

I've already provided a few examples. And I did mention that it's for working
with large files. Image files aren't usually that large. I'm talking about
100MB+ files.

I'm just maximizing the use of the software, using only what's already
available in the system, without the need for additional software.

It's easy to use additional software to solve things. But what can one do
without it? There's no guarantee in life that everything will be available
at any time.

Of course, there's a hard limit to what a piece of software can do. When
that is reached, there's no other way than to use an alternative or
additional software.

WSH is like a survival scripting tool. It's not meant for performance and
efficiency in the first place, and it can't do everything. It's not a
perfect tool. But it may do a job with some workarounds.