Re: serial numbers as RS

Janis Papanagnou

Jan 18, 2023, 12:56:35 AM
The contents of your post are inconsistent...

On 18.01.2023 04:30, raj wrote:
> Hi
> I have file with 7 fields.

No. The number of fields varies. A typical value is 8.

> The first field is serial number

No. There are gaps, or rather, subsequent lines that were joined.

> In some records 5th field is missing.

Other fields are missing as well, in the joined lines.

> Few records got truncated with the next record. In the sample file
> I have shown only two records truncation but in some cases even three to four records got truncated.
> sample file:
>
> 1 651 643786485 107249 5190 M SMITH 1284
> 2 963 212018826 103480 M746 R WADHWA 156
> 3 232 215036022 105012 M743 SAMBA 337
> 4 232 215036023 105012 M743 SAMBA 443
> 5 054 215036704 103325 KIYA K 351 ====> 5th field is missing
> 6 205 308363068 103402 5537 Mc DON 943
> 7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324 =====> in both the records 5th field is missing
> 9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332
> 11 216 308659868 625402 9693 FERNAND 365
>
> The required output:
>
> 1 651 643786485 107249 5190 M SMITH 1284
> 2 963 212018826 103480 M746 R WADHWA 156
> 3 232 215036022 105012 M743 SAMBA 337
> 4 232 215036023 105012 M743 SAMBA 443
> 5 054 215036704 103325 4897 KIYA K 351

And where should that "4897" come from?

> 6 205 308363068 103402 5537 Mc DON 943
> 7 231 343328800 105880 MANO M 6403
> 8 231 343329128 105880 MANO M 8324

You want records with 7 and 8 fields mixed?

> 9 309 361257222 103595 M564 C R SAM 102
> 10 309 361297561 103595 M564 C R SAM 332
>
> I have tried by considering the serial number as RS but did not get the desired result
>
> awk 'BEGIN{RS="[0-9]+"}{
> print $0 RT
> }' file
>
> Actually I need first four fields(including serial number) and the last field.

This does not match the "required output" above.

> If the "," delimiter is given in the output that would be more helpful.
>
> Thank you
>

...so fix your data sample and requirements first.

And have a closer look at how to define lines whose number of fields
may be 14, 15, or 16, and at how to distinguish the records in that data.
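One quick way to see how much the lines differ is to print each line's field count (NF) in front of it. This is only a minimal sketch, not something posted in the thread: `sample.txt` and `count_fields` are hypothetical names, and the four lines are copied from the sample quoted above.

```shell
# sample.txt is a hypothetical file holding four lines from the quoted sample
cat > sample.txt <<'EOF'
3 232 215036022 105012 M743 SAMBA 337
5 054 215036704 103325 KIYA K 351
7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324
9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332
EOF

# count_fields (hypothetical helper): prefix every line with its field count
count_fields() {
  awk '{ print NF ": " $0 }' "$@"
}

count_fields sample.txt
# The counts printed for these four lines are 7, 7, 14 and 18.
```

In this small sample the joined lines stand out by their high counts, but a count alone does not tell a complete record from one with a missing field: both the first and the second line have 7 fields.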

And speak with whoever created that trash data so that they fix their process.

Janis

Kees Nuyt

Jan 18, 2023, 9:45:40 AM
On Tue, 17 Jan 2023 19:30:39 -0800 (PST), raj
<visi...@gmail.com> wrote:


> Actually I need first four fields(including serial number) and the last field.

The "last field" can always be addressed with $NF.

> If the "," delimiter is given in the output that would be more helpful.

Have a look at OFS or printf. Your choice.
--
Kees Nuyt
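The two hints above ($NF for the last field, OFS for the output delimiter) can be combined roughly as follows. This is a sketch under the assumption that every input line holds exactly one record; `first_four_and_last` is a hypothetical helper name, and the two input lines are taken from the sample in the thread.

```shell
# first_four_and_last (hypothetical helper): print fields 1-4 and the last
# field of every line, separated by commas.
first_four_and_last() {
  awk 'BEGIN { OFS = "," } { print $1, $2, $3, $4, $NF }' "$@"
}

printf '%s\n' \
  '1 651 643786485 107249 5190 M SMITH 1284' \
  '5 054 215036704 103325 KIYA K 351' |
first_four_and_last
# 1,651,643786485,107249,1284
# 5,054,215036704,103325,351
```

The same output can be produced with printf, for example `printf "%s,%s,%s,%s,%s\n", $1, $2, $3, $4, $NF`. Because only the first four fields and the last one are used, a missing 5th field does not matter here, but lines in which several records were joined still need to be split first.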

raj

Jan 18, 2023, 9:57:34 AM
The data was copied from a PDF file and pasted into a text editor.
The user does not have any tool or access to convert the PDF to doc or Excel.

The problem arises when the data is copied directly from the PDF file.
That is the reason for the inconsistency.

awk 'BEGIN{RS="[0-9]+"}{
print $0 RT
}' file
The above breaks each field into a separate record.
....
.....
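With RS="[0-9]+" every run of digits acts as a record separator (in gawk, which also provides RT), so each numeric field ends a record instead of only the serial numbers doing so. One possible alternative, sketched here and not taken from the thread, is to keep the default RS and start a new record whenever a field equals the previous serial number plus one. The helper name `split_joined` is hypothetical, and the sketch assumes that every line starts with a serial number, that serial numbers are consecutive, and that every record has at least five fields. A data field that happens to equal the next serial number can still cause a wrong split.

```shell
# split_joined (hypothetical helper): split lines in which several records
# were joined, using the expected next serial number as the split point.
split_joined() {
  awk '
  {
    out = $1            # assumption: every input line starts with a serial number
    serial = $1 + 0     # serial number of the record being collected
    n = 1               # fields collected for that record so far
    for (i = 2; i <= NF; i++) {
      # Start a new record when the field equals the next serial number,
      # the current record already has 5 fields, and at least 4 fields follow.
      if (n >= 5 && NF - i >= 4 && $i == serial + 1) {
        print out
        out = $i
        serial++
        n = 1
      } else {
        out = out " " $i
        n++
      }
    }
    print out
  }' "$@"
}

printf '%s\n' \
  '7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324' \
  '9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332' |
split_joined
# 7 231 343328800 105880 MANO M 6403
# 8 231 343329128 105880 MANO M 8324
# 9 309 361257222 103595 M564 C R SAM 102
# 10 309 361297561 103595 M564 C R SAM 332
```

The two position guards (n >= 5 and NF - i >= 4) are there to reduce false splits, for example when the second field of a record or the last field of a line happens to match the next serial number.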



Janis Papanagnou

Jan 18, 2023, 10:26:57 AM
On 18.01.2023 15:57, raj wrote:
>> [...]
>
> The data was copy and pasted in a text editor from a pdf file.

If all you have is a PDF, I suggest using a more sophisticated
PDF tool to extract the text in a more accurate plain-text form,
or otherwise fixing the worst formatting issues by hand before posting.

> The user is not having any tool/access to convert the pdf to doc or excel.
>
> The problem is arising when it is directly copied from the pdf file.
> That is the reason for inconsistency.

And don't forget to answer or clarify the other issues that have been
pointed out to you.

Janis

>
> [snip]
