First, I cannot really decipher what you actually want to do and
where your problems are. The usual procedure is to post sample data:
input data and the corresponding output data at least (not shell
code that creates the input data). Anyway, below you will find some
hints and suggestions...
On 12.03.2023 01:06, Bryan wrote:
> I'm using gawk 5.1.0, bash 5.1.16, Ubuntu 22.04.2. I will write and
> provide a lot of material in case it is useful or there is conflict
> in the script, but I am trying not to ramble.
>
> I prepared a test script below - which should be easy to copy/paste
> into a shell, e.g. bash. I am focused on the gsub regexps, which are
> obviously contrived to replace all these different strings which - as
> they vary from output from another program - take the general form
> (attempting a "plain English" version):
>
> [open apostrophe][the word "path"][maybe an underscore][various
> digits][end apostrophe]
>
> I want to take all of that ^^^ and delete it - or equivalently
> replace it with nothing (ideally), to prepare input to gnuplot as
> "x,y" or "x y" data - two columns.
>
> I tried using this type of command :
>
> gsub("^[a-z]{4}$","TEST") ;
This is fine for replacing lines that consist _only_ of a sequence of
four lower-case letters with "TEST". gsub() _without_ the ^ and $
anchors will substitute every occurrence of that pattern on a line.
You can provide a third argument to gsub() to operate on a variable
or a specific field; in that case the anchors ^ and $ refer to the
beginning and end of that variable or field, respectively.
It is also advantageous to use the /.../ syntax for constant patterns
instead of the string form "...".
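A minimal sketch of the difference, not taken from your script: the
function name demo and the two data lines are made up for
illustration, and [a-z]+ stands in for [a-z]{4} so that the sketch
does not depend on interval-expression support in whatever awk is
installed.

```shell
# demo is a hypothetical name; the data lines are made up.
demo() {
    awk '
    {
        whole = $0; any = $0
        gsub(/^[a-z]+$/, "TEST", whole)   # anchored copy of the line: all or nothing
        gsub(/[a-z]+/, "TEST", any)       # unanchored copy: every run of letters
        if (NF >= 2)
            gsub(/^[a-z]+$/, "TEST", $2)  # anchors now refer to field 2 only
        print whole " | " any " | " $0
    }' <<'EOT'
path
path abcd 12
EOT
}
demo
```

The three columns are the anchored result, the unanchored result, and
the line after the substitution on field 2. The second data line is
left alone by the anchored gsub(), while the field variant changes
only "abcd".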
>
> ... and more, e.g. trying sub and gensub - but did not get far - I am
> aware of a curly brace escape that is important or not depending on
> the awk version, so I also tried with \{ and \}.
There's no need to escape these braces.
Instead of passing echo arguments with quotes and newline escapes,
I suggest using shell here-documents with this syntax:

awk '
# ... your awk program ...
...
' <<EOT
your data line 1
your data line 2
...
EOT

With the more contemporary $(...) command substitution, a data line
might be

{"path_1234567":[$(seq -s',' -f '%f' 1 20)], ...

but I wouldn't call seq many times; I would call it only once, assign
the result to a variable, and use that repeatedly:

s=$(seq -s',' -f '%f' 1 20)
awk '
...
' <<EOT
{"path_1234567":[${s}], ...
...
EOT

If you pipe in or redirect other input, just omit the code from <<EOT
onward:

data_from_some_process | awk '...'
awk '...' < data_from_some_file

(But for testing, here-documents have advantages.)
>
> ... the last printf thing is perhaps for another post, but (IIUC)
> matches every 2nd comma and replaces it with a newline.
printf doesn't replace anything. Every other time, it prints a newline
instead of a comma.
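Since the printf in question is not reproduced here, the following is
only an assumed reconstruction of the idea, not your code: the
function name pairs, the variable sep, and the sample numbers are
hypothetical. It splits on commas and chooses the separator printed
after each field.

```shell
# pairs is a hypothetical helper: it reads comma-separated values on
# stdin and prints them two per line as "x,y".
pairs() {
    awk -F, '
    {
        for (i = 1; i <= NF; i++) {
            # after an odd-numbered field print a comma, otherwise a
            # newline; the last field always gets a newline
            sep = (i % 2 == 1 && i < NF) ? "," : "\n"
            printf "%s%s", $i, sep
        }
    }'
}
pairs <<'EOT'
1,10,2,20,3,30
EOT
```

For the sample line this prints 1,10 then 2,20 then 3,30, each on its
own line. Nothing in the input is replaced; the separator is decided
at output time.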
> So that's the
> "x,y" data idea. I hope that is clear - I imagine the regexps in the
> [a-z][0-9] parts ought to be able to go all into one gsub if I knew
> the syntax or what to read about.
To match more than one regexp with the _same_ replacement you can
combine them with the | (or) operator. Taking an example from your
code, gsub(/{|}|]/, "") removes those three braces/brackets in one
expression.
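When every alternative is a single character, a bracket expression is
an equivalent spelling of the same idea. A small sketch, assuming a
made-up data line modelled on your path_1234567 sample; the function
name strip_brackets is hypothetical.

```shell
# strip_brackets is a hypothetical helper name.
strip_brackets() {
    # []{}] is a bracket expression; a ] placed first is taken
    # literally, so it matches any one of ] { }
    awk '{ gsub(/[]{}]/, ""); print }'
}
strip_brackets <<'EOT'
{"path_1234567":[1.000000,2.000000]}
EOT
```

This prints "path_1234567":[1.000000,2.000000 for the sample line.
The opening [ is kept because it is not among the three characters
being removed.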
But with your samples above you can also use other regexp syntax,
like ? (for optional parts), and use grouping with parentheses (...)
for longer subexpressions, e.g.

[a-z]{4}_?[0-9]{4}([0-9]{2})?

for an optional underscore and two optional digits.
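A hedged sketch of ? and grouping applied to the key that is to be
deleted. It is a variant, not the expression above: it uses + for
"various digits" instead of fixed counts, it makes the whole digit
suffix an optional group, and it assumes the key is in double quotes
and followed by a colon. The function name strip_key and the data
lines are made up.

```shell
# strip_key is a hypothetical helper name.
strip_key() {
    # "path", then an optional group (optional underscore plus one or
    # more digits), the closing quote and the colon
    awk '{ gsub(/"path(_?[0-9]+)?":/, ""); print }'
}
strip_key <<'EOT'
"path_1234567":[1.000000,2.000000]
"path1234":[3.000000,4.000000]
"path":[5.000000,6.000000]
EOT
```

All three keys are removed, so each output line starts at the opening
[ of the number list.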
Janis