This could be an interesting topic, and also has high flame-war potential.
First, you can probably count on one hand the people anywhere in the world who wrote -1 assembler back in the day and are also doing it now, so the pool of deep knowledge is limited.
Second, there will be 2 groups: those that will insist on 'purity' and doing things exactly like they were done 60 yrs ago, and those that try new things.
"Stack? That's totally against the spirit of -1 programming!". Actually not, it seems there was a stack implementation at MIT back in the day.
Having primed the flame pump, here is what I'm doing, and it's definitely a learning process. I haven't written assembler in decades:
I have been using jda almost exclusively because it saves the AC. It does take twice as long as jsp, though: 10 usec. Also, if you write a macro that is a 'subroutine', how do you know without looking at its code whether you do a jda or a jsp? Easier for me to standardize on one way unless it's hidden by a macro.
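For anyone who hasn't looked at the two call instructions side by side, here is a rough Python model of what I understand each one to do. It is only a sketch: the function names and the dict-as-memory are made up for illustration, it ignores the overflow and extend bits that also end up in the AC, and the timings are just the 5 and 10 usec figures.

```python
# Toy model of the two PDP-1 subroutine-call instructions (illustrative only).
# 'mem' is a dict of address -> word; 'pc' is the address of the call itself.
# All names here are hypothetical, not part of any real emulator.

def jsp(pc, ac, mem, y):
    """jsp y: return address goes into AC (old AC is lost), jump to y."""
    return_addr = pc + 1
    return y, return_addr, 5          # (new pc, new ac, usec)

def jda(pc, ac, mem, y):
    """jda y: old AC is stored at y, return address goes into AC, jump to y+1."""
    mem[y] = ac                       # this is the 'saves the AC' part
    return_addr = pc + 1
    return y + 1, return_addr, 10     # (new pc, new ac, usec)

mem = {}
jda_result = jda(pc=0o100, ac=0o42, mem=mem, y=0o500)
jsp_result = jsp(pc=0o100, ac=0o42, mem={}, y=0o500)
```

The point is that the two entry conventions differ (code starts at y for jsp, at y+1 for jda), so caller and callee have to agree, which is why standardizing on one is easier.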
I have avoided xct. While it was used often, self-modifying code quickly became recognized as a Bad Idea once separate I&D space machines and real operating systems became more common, not to mention memory management. The -1 certainly encourages it, with dap and dip instructions. I don't use the 'dap x... jmp x .... x, jmp .' pattern.
I have started accumulating macros for dealing with bits like extended memory consistently, farjda, farjmp, fardac, farlac, etc. but note that macro and macro1 don't have any direct support for extended memory, including producing loadable tapes that load into extended memory. Unless, of course, there is some hidden magic somewhere. Haven't found any.
Other than that, I'm still learning. One instruction that I'm trying to use more is law, load AC immediate with a 12-bit value (or is it 13 bits or 14 bits? Still haven't tracked that down). Saves a memory location and a cycle.
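If the handbook's instruction layout is what I think it is (18-bit word split 5/1/12 into opcode, indirect bit and address), then law would simply copy the 12-bit address field into the AC, and the indirect bit would give you the one's complement instead. A small Python sketch under those assumptions; the opcode value 70 octal and the helper names are mine to illustrate, so check them before relying on this.

```python
WORD_MASK = 0o777777   # 18-bit word
ADDR_MASK = 0o7777     # 12-bit address field (assumed layout)
LAW       = 0o700000   # assumed opcode for law
INDIRECT  = 0o010000   # assumed position of the indirect bit

def encode_law(n, negate=False):
    """Build a law instruction word; n must fit in the 12-bit field."""
    if not 0 <= n <= ADDR_MASK:
        raise ValueError("law operand must fit in 12 bits")
    return LAW | (INDIRECT if negate else 0) | n

def execute_law(word):
    """Return what the AC would hold after executing this law word."""
    value = word & ADDR_MASK
    if word & INDIRECT:
        value = ~value & WORD_MASK   # one's complement within 18 bits
    return value

ac_plain = execute_law(encode_law(0o7777))
ac_negated = execute_law(encode_law(5, negate=True))
```

Under this model anything bigger than 7777 octal still needs a constant in memory and a lac.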
Now for the controversial bit. Regardless of any "it was revolutionary for the time" discussion, macro is really poor and error prone. 3-character symbols, symbol redefinition just by declaring a symbol more than once, and the horrible 'space means add', which leads to tortuous and incomprehensible code like 'spa+xyz-qrs...' to get the summed keywords to add up to the actual instruction you want. Not to mention very poor error checking. Try "flexo AbC" and see what you actually get. Macro1 improved things slightly: 6 chars for symbols and some more operators. There was also an assembler used at MIT and authentically -1, 'certainty', which improved things even more, but it seems to have vanished.
I've written a couple of thousand lines of assembler so far, and my goal isn't to see how closely I can do (or suffer thru doing) things 'the original way'. I want to be productive. Hence my new assembler, am1. That's what I'm using exclusively now, and the rim tapes it produces load and run just like 'real' ones. One driver was support for extended memory both in the code and in the loader. Long symbol names. No magic symbol redefinition. Local and global symbols. Error checking (gasp). Pseudo-linkable separate programs. Yes, pure evil.
Back to the topic, writing assembler for the -1 is certainly great fun and also a challenge. I've had to learn mostly the hard way how extended memory works, how the sbs system works, how that interacts with IOTs, etc. Some of it required digging into the emulator implementation, and I usually cross-check the pidp-1 impl with the simh one. For example, did you know that you can combine multiple operate instructions? There is a defined execution order. Where is that documented? Nowhere I could find without looking at the emulators.
BTW, that's where all the tortuous macro coding mentioned above comes in. I really don't understand why anyone thought 'cli cla clf' should mean 'cli+cla+clf'. The -1 has an or instruction, so it would have been a trivial change in the assembler. am1 assumes that means 'cli | cla | clf', which makes far more sense to me. "But that will break existing code!". Mostly not, expressions with explicit adds and subtracts work fine. The original macro way guarantees you're going to write some code that isn't going to do anything like you thought it would and you'll pull your hair out trying to figure out why. I learned this from experience.
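To make the add-versus-or difference concrete, here is a small Python sketch. The octal opcode values are as I read them from the handbook, so treat them as assumptions to verify, and the addition is plain integer addition truncated to 18 bits, which glosses over whatever arithmetic macro really uses internally.

```python
# Operate-group opcodes, octal (assumed values, please verify).
OPR = 0o760000   # base operate instruction
CLA = 0o760200   # clear AC
CLI = 0o764000   # clear IO
CLF = 0o760000   # clear program flag; the flag number goes in the low bits

WORD_MASK = 0o777777   # 18-bit word

def combine_add(*words):
    """'Space means add': sum the words, truncated to 18 bits (simplified)."""
    return sum(words) & WORD_MASK

def combine_or(*words):
    """What 'cli | cla | clf' would mean: bitwise OR of the words."""
    result = 0
    for w in words:
        result |= w
    return result

added = combine_add(CLI, CLA, CLF)                  # naive 'cli cla clf'
ored  = combine_or(CLI, CLA, CLF)                   # the OR reading
fixed = (CLI + CLA + CLF - OPR - OPR) & WORD_MASK   # the '-opr-opr' workaround

print(oct(added), oct(ored), oct(fixed))
```

With these values the OR form and the '-opr-opr' form agree, while the naive sum lands on a different opcode altogether because the shared opr bits get added up more than once.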
I suppose we should encourage anyone writing assembler for the -1 to start accumulating a list of tricks, knowledge, etc. to make life easier for everyone. Everyone writing assembler that is, which will still be a small pool. I'll try to remember to do so.
One final comment: it's great that we have a display implementation, tape reader and punch, soroban typewriter. But how do you really communicate with the outside world? How do you store data? The real -1s had drums and mag tapes. There was DCS, which supported multiple serial devices (apparently mostly teletypes). That's why I implemented the Type 23 drum and a modernized (but still compatible) DCS. Using those, it's certainly feasible to have a web server, written in -1 assembler, running. Shades of modernity! Haven't tackled mag tapes or dectapes. Someone else, step up. Oh, and that's the reason I implemented dynamically-loaded IOTs: I didn't want to have to hack the emulator every time I added something.
Flame on,
Bill