Hex File Snooper

Thomas Passin

Jun 10, 2024, 7:53:51 PM
to leo-editor
Sometimes it is useful to view the first N bytes of a file without reading the whole file.  The file might be much too large to load into a text editor or a Leo node, for example. The Linux command head will show you just the start of a file, but Windows doesn't have that command out of the box.  Often it's better to look at the hex bytes rather than the text, especially if the file is binary.  Say, for example, that you want to see if a file is a zip file or not, or if the EXIF data is embedded in a .jpg picture file.
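
As a rough illustration of the signature check mentioned above, here is a minimal Python sketch that reads only the first few bytes of a file and compares them with two well-known "magic numbers". The function name sniff_file_type and the small SIGNATURES table are assumptions made for this example; they are not part of Leo, and the table is deliberately incomplete.

```python
# Leading signatures ("magic numbers") for two common formats.
# Illustrative only: real-world detection needs a much longer table.
SIGNATURES = {
    b'PK\x03\x04': 'zip',    # Local file header of a non-empty zip archive
    b'\xff\xd8\xff': 'jpeg',  # JPEG start-of-image marker plus next marker byte
}

def sniff_file_type(path, n=8):
    """Guess a file type from its first n bytes; return None if unknown."""
    with open(path, 'rb') as f:
        head = f.read(n)  # Reads at most n bytes, never the whole file
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None
```

Calling sniff_file_type on a typical .jpg file should return 'jpeg', since such files begin with the bytes ff d8 ff.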

Here is a Leo command that loads a file using the file dialog.  It creates a new node at the end of your outline and fills it with a classic hex view of the first 4096 bytes.

I have this script as an @command node in my myLeoSettings.leo file, and I also have a custom menu item for it.  I don't use it all the time but it's invaluable when it is needed.

Here is the script:

"""Display as hex the first 4096 bytes of a file in body of a new node."""

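data = b''

def format_bytes(byte_data):
    """Format byte data into a classic hex bytes display."""
    text = ''
    asc = ' ' * 3  # For translation of bytes to printable characters
    n = 0
    for i, b in enumerate(byte_data):
        text += f'{b:02x}'
        # Show printable Latin-1 characters; use '.' for control codes
        # (0x00-0x1f and 0x7f-0x9f).
        asc += chr(b) if 0x20 <= b < 0x7F or b >= 0xA0 else '.'
        if i % 10 == 9:
            # Complete the current line
            text += asc
            text += '\n'
            asc = ' ' * 3
            n += 1
            if n % 8 == 0:
                # Insert a blank line for readability
                text += '\n'
        else:
            text += ' '
    remainder = len(byte_data) % 10
    if remainder:
        # Complete a final partial line, padding so the characters stay aligned
        text += ' ' * (3 * (10 - remainder) - 1) + asc + '\n'
    return text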

filetypes = [('Any', '*.*'),]
path = g.app.gui.runOpenFileDialog(c, 'Choose File To Sample', filetypes=filetypes)
g.es(path)

if path:
    try:
        with open(path, 'rb') as f:
            data = f.read(4096)
    except IOError as e:
        g.es(e)

if data:
    # Create and select target node
    p_last = c.lastVisible()
    target = p_last.insertAfter()
    target.h = f'first bytes of {path}'
    target.b = format_bytes(data)
    target.setDirty()
    c.selectPosition(target)
    while c.canMoveOutlineLeft():
        c.moveOutlineLeft()
    c.redraw()

Here is just the first part of the bytes of an image file of mine.  We can see the signature of a .jpg file, and that it uses the sRGB color profile.

ff d8 ff e0 00 10 4a 46 49 46   ÿØÿà..JFIF
00 01 01 00 00 01 00 01 00 00   ..........
ff e2 0c 58 49 43 43 5f 50 52   ÿâ.XICC_PR
4f 46 49 4c 45 00 01 01 00 00   OFILE.....
0c 48 6c 63 6d 73 02 10 00 00   .Hlcms....
6d 6e 74 72 52 47 42 20 58 59   mntrRGB XY
5a 20 07 ce 00 02 00 09 00 06   Z .Î......
00 31 00 00 61 63 73 70 4d 53   .1..acspMS

46 54 00 00 00 00 49 45 43 20   FT....IEC
73 52 47 42 00 00 00 00 00 00   sRGB......
00 00 00 00 00 00 00 00 f6 d6   ........öÖ
00 01 00 00 00 00 d3 2d 6c 63   ......Ó-lc
6d 73 00 00 00 00 00 00 00 00   ms........
00 00 00 00 00 00 00 00 00 00   ..........
00 00 00 00 00 00 00 00 00 00   ..........
00 00 00 00 00 00 00 00 00 00   ..........


jkn

Jun 11, 2024, 3:38:50 AM
to leo-editor
This reminds me of a general 'it would be nice if...' thought: it would be nice if there were a way of defining more featureful dialogs, so that in this case, for instance, you could browse for a file (or put a pathname in), and also add a file offset, or something like that.

I am aware that such a thing would tend to suffer from featuritis, however.
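
Leaving the dialog question aside, the file-offset part of this idea can be sketched in a few lines of Python. The helper name read_slice and its parameters are hypothetical and are not part of the script above or of Leo; the sketch only shows that seeking to an offset before reading still avoids loading the whole file.

```python
def read_slice(path, offset=0, count=4096):
    """Return up to count bytes of the file, starting at byte offset."""
    with open(path, 'rb') as f:
        f.seek(offset)  # Skip ahead without reading the skipped bytes
        return f.read(count)
```

The result could be passed to a formatter such as format_bytes in place of the first 4096 bytes. Seeking past the end of the file is not an error; the read simply returns an empty bytes object.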

    J^n

Edward K. Ream

Jun 11, 2024, 5:08:08 AM
to leo-e...@googlegroups.com
On Mon, Jun 10, 2024 at 6:53 PM Thomas Passin <tbp1...@gmail.com> wrote:
Sometimes it is useful to view the first N bytes of a file without reading the whole file. 

This tool reminds me of tools I used ca 1980 at the dawn of the personal computer age.

Edward

Thomas Passin

Jun 11, 2024, 6:01:07 AM
to leo-editor
Yes, that kind of display goes *way* back.  And back then you had so little memory that you only wanted to load small slices of a file. I don't know what others do on Windows, but I have a small head.cmd script that uses awk to do the job (for text files, not binaries).

Jacob Peck

Jun 11, 2024, 12:35:26 PM
to leo-e...@googlegroups.com
On Windows, I use my Linux tools :)

WSL is fantastic for this — ‘xxd’ works just as well in a WSL shell as it does on my Arch laptop.

This script is real nice though! I’ve often wished Leo had a ‘hex mode’, for whatever that would mean.  It would be a heavy lift though, to get proper hex editing going. Not to mention figuring out what the semantics of that even mean in the context of an outliner…

Jake

--
You received this message because you are subscribed to the Google Groups "leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email to leo-editor+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/leo-editor/23afab4b-f3df-4417-8c71-a105fffc07c3n%40googlegroups.com.

Thomas Passin

Jun 11, 2024, 1:58:56 PM
to leo-editor
On Tuesday, June 11, 2024 at 12:35:26 PM UTC-4 gates...@gmail.com wrote:
On Windows, I use my Linux tools :)

WSL is fantastic for this — ‘xxd’ works just as well in a WSL shell as it does on my Arch laptop.

I haven't tried WSL yet.  Does it take up a lot of memory or disk space? I'm not a Linux whiz, and I usually spin up a Linux VM when I need to try something.

Jacob Peck

Jun 11, 2024, 3:25:02 PM
to leo-e...@googlegroups.com
As with all things, YMMV.  But I run a Windows 10 installation, with ArchWSL as my WSL distro.  It's on WSL2 (as opposed to WSL1, which did things differently).  WSL2 effectively runs the Linux kernel as a program, which other Linux programs can use to get their system calls answered, so it's fairly lightweight in that aspect.

On my system, the memory impact is minimal (maybe 10-100MB when idle). Disk space is going to be entirely up to what you install inside your distro -- on my box it's hovering at around 50GB, but that's after using this same installation for many years, installing packages, etc.  The default install depending on distro could be 10GB or so.

Once in a WSL shell, your Windows drives and files are available in /mnt/<driveletter>.  For example, /mnt/c is my C: drive.  It's really slick how well they pulled it off.
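
To make the mapping concrete, here is a small Python sketch of the default /mnt/<driveletter> convention described above. The function name windows_to_wsl_path is made up for this example, and the sketch assumes the default mount point; inside WSL itself, the wslpath utility performs this conversion and is the more reliable choice.

```python
from pathlib import PureWindowsPath

def windows_to_wsl_path(win_path):
    """Map an absolute drive-letter Windows path to its /mnt/<driveletter> form."""
    p = PureWindowsPath(win_path)
    drive = p.drive  # For example 'C:'
    if len(drive) != 2 or drive[1] != ':' or not p.root:
        raise ValueError(f'not an absolute drive-letter path: {win_path!r}')
    # p.parts[0] is the anchor (for example 'C:\\'); the rest are path components
    return '/'.join(['/mnt/' + drive[0].lower(), *p.parts[1:]])
```

For example, windows_to_wsl_path(r'C:\Users\me\photo.jpg') gives '/mnt/c/Users/me/photo.jpg'. UNC paths and drive-relative paths raise ValueError in this sketch.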

Jake

jkn

Jun 13, 2024, 3:04:06 AM
to leo-editor
I hadn't heard of ArchWSL before, thanks for that. I run Kubuntu rather than Arch Linux, but I like the latter's documentation a lot.

    J^n

David Szent-Györgyi

Jun 13, 2024, 9:57:45 AM
to leo-editor
The TECO text editor was designed to edit files that were too large to fit in memory. I used TECO on a PDP-8/e minicomputer; the 8/e used 12-bit words; memory was addressed in 128-word pages and 4096-word fields.

The original implementation of Emacs was a set of "editor macros" written in TECO. Maybe this means that the old joke that every program accrues features to become more like Emacs has it wrong; every program grows to become more like TECO. *grin*

Thomas Passin

Jun 13, 2024, 11:11:29 AM
to leo-editor
I was using a PDP-8/i back around 1973.  I don't remember TECO.  Maybe it came in with the /e?  Or maybe it's just the passage of time ...

David Szent-Györgyi

Jun 13, 2024, 3:17:49 PM
to leo-editor
TECO for the PDP-8 was available under OS/8. The Wikipedia article on TECO is a nice summary. There you can find links to Web pages on TECO, including one by the originator of TECO and the GitHub repository for TECOC, a reimplementation in C for Windows, macOS, and Linux.

Thomas Passin

Jun 13, 2024, 3:45:01 PM
to leo-editor
After reading the link, I see I never got near anything like TECO.

jkn

Jun 13, 2024, 5:08:29 PM
to leo-editor
I never used TECO on a PDP (PDP-11 in my case), but I did use PMATE ('Michael Aaronson's Text Editor', IIRC) on an early S-100 Z80 computer, and that was heavily 'inspired' by TECO, I believe. Working with the 'command syntax' was great fun, as was customising the editor to work with your graphics card driver.

Thomas Passin

Jun 13, 2024, 6:19:31 PM
to leo-editor
It was remarkable how much faster a 2 MHz Z80 was than an early PC with the 8088 at nearly 5 MHz.  I was able to make a direct comparison because I had a FORTH program I used all the time on a 64k Z80 machine.  When I got a PC I was able to find an 8088 FORTH system.  FORTH was written with a small core of assembler and then all the other FORTH instructions were written using that fast core.  So FORTH for the two machines was as nearly comparable as you were going to get.

The PC version running our FORTH code on an 8088 machine was wretchedly slow compared to the Z80 version.  The PC version only caught up when the AT came out with an 8 MHz processor.

David Szent-Györgyi

Jun 16, 2024, 10:02:49 AM
to leo-editor
The original IBM PC was a quick-and-dirty design; performance was not a goal. 

FORTH is an opposite: Simplicity and flexibility are at the heart of its design, with carefully considered access to assembly language where required for performance or access to bare metal. The design requires trading conventional convenience for that. More conventional schemes for access to bare metal involve complex, fragile, non-portable and expensive platform development tools. Given engineering talent able to use FORTH, FORTH makes sense. The wide-open architecture of FORTH requires discipline, documentation, and careful management to write code that not only runs but can be read and maintained. 

I haven't made a career of embedded systems work, but I have done a few small projects of that sort, and I think that FORTH holds its own in the right hands. 

David Szent-Györgyi

Jun 16, 2024, 10:37:34 AM
to leo-editor
From Documentation for Atari CoinOp Forth and Swarthmore Extensions

HOW TO WRITE GOOD FORTH 

FORTH suffers from a combination of bad press and bad programmers. It is actually easier to write good FORTH than good anything else, since it is flexible, imposes few restrictions, and extracts no penalty for short programs. (Short programs, incidentally, are where it's at - breaking things up into small, sensible, lucid pieces is a key to good programming.)

Why write good FORTH?

Good code is faster to write, if you consider total time, since it comes from clear thinking and good organization, and it is much easier to debug. You can understand and modify good code in amazingly shorter time.

Finally, you're much more employable if you write good code. Should a prospective employee tell me that he or she prides him or herself in writing good, clear, maintainable, well-documented FORTH I would, after picking myself up off the floor, endeavor to hire the person on the spot.

How to write good FORTH:

You need to (1) know what good FORTH is, (2) discipline yourself to write it. Item (1) will be addresed[sic] next. For (2), I've observed that a few thoughtful rewritings of a reasonably sized piece of code will not only help a decent programmer understand what good code is, but will also establish the necessary habits so that writing passable code becomes almost automatic, and good code a distinct possibility. 

What is good FORTH?

Good FORTH consists of short, well thought out pieces. A word should rarely take up more than half a screen. If you use more than that, see whether you can break up the word into smaller, more coherent pieces. A good FORTH word is one which is easy to understand, and which you might be tempted to use again elsewhere.

Good FORTH is vertical, not horizontal. That is, there are few words per line and the lines are left-indented in an intelligent fashion. DOs and LOOPs, BEGINs and UNTILs, IFs, THENs, and ELSEs, etc. are all left justified, with inner groupings to the right of outer ones. Take a look at the Swarthmore source code to see what this means.

Each line should contain no more than a single idea.

A comment accompanying each word should show its effect on the stack, preferably with helpful mnemonic symbols for the stack elements.

Words should be liberally sprinkled with comments. More lengthy comments can be put after the "-->" or ";S". The best code is almost self-documented.

The first line of a screen should consist of a comment which describes the contents of the screen. You should rarely start with more than half a screen full of code. 

Will you go broke buying the number of disks necessary to write good FORTH? Hardly. Even in this expansive form, FORTH gets you a lot of mileage. Moreover, few items connected with computers are as cheap as disk space. Finally, truth to tell, revisions and reworking usually result in screens filling up more than the minimal recommended (starting minimally gives you room for such expansion).

FORTH is supposedly a "stack oriented" language, but the programmer should not be stack oriented. Juggling several elements on the stack can cause a surprising number of errors. In almost all situations, it's much clearer and leads to many fewer errors if you introduce sensibly named variables. It is hard to find a situation when such a practice will measurably slow down a program. It is possible to juggle stack elements by sending them to and from the return stack (it's possible to do lots of wild things in FORTH), but it's usually just not worth it.

If you simply must play with several items on the stack, you're probably best off making up special stack manipulation words. If a word deals with several items on a stack, I've also found it very helpful to put in full comment lines within the word which show the stack changes in clear mnemonics[sic].

Well-thought-out names can contribute immensely to good code. It's worth spending time making up useful names (self-documenting code, again). One helpful approach is to try to name what the word does, not how it does it. Don't use an unidentified number in a word. Either identify it in a comment or define a constant (with a good name) and use it in the code.

Avoid abbreviations when you're writing code. The extra typing is trivial and the clarity introduced is considerable. It's fair and reasonable to introduce abbreviations in a testing/debugging session, however.

Is our code good?

It varies. It's getting better. At first we committed the negation of the above ideas. Look over our source code and see what you like and what is clear. There are probably 8 different programmers represented on the Swarthmore disks, so there are lots of different styles, ideas, and approaches.

Further thoughts on writing good FORTH are to be found in Leo Brodie's Starting FORTH.

Thomas Passin

Jun 16, 2024, 11:50:30 AM
to leo-editor
However, the simple FORTH kernel (I think it was fig-FORTH but my memory is hazy) didn't use DOS calls at all, and only screen write and direct disk access BIOS calls (that was INT21, wasn't it?). I don't see how the PC or DOS design could have slowed it down by much compared with the CP/M Z-80 version.