\d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*
will do the trick. For filtering you don't need the capture group around the 1, but it is useful in Pattern Playground for verifying that you're matching the right field position and field contents.
For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1" request, using Text -> Sort Lines... with a grep pattern of:
\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
with "Specific sub-patterns" selected and \8\1\2\3\4\5\6\7 entered in its field will sort your example text using your desired field ordering.
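As a rough illustration of what sorting on specific sub-patterns does, here is a minimal Python sketch. It assumes a shortened, hypothetical six-field layout (ID;COUNTRY;REGION;POST_CODE;CITY;MSGNO) instead of the full pattern above, and it compares the captured groups as a tuple of strings. The names LINE_RE, KEY_ORDER and sort_rows are made up for this example, and BBEdit's own comparison may differ in detail.

```python
import re

# Hypothetical, shortened layout: ID;COUNTRY;REGION;POST_CODE;CITY;MSGNO
# Groups: 1=COUNTRY, 2=REGION, 3=POST_CODE, 4=CITY, 5=MSGNO
LINE_RE = re.compile(r"\d{3};(\w{2});(\d{2});(\d{5});([^;]*);(\d+)")

# Same idea as entering \5\1\2\3\4 as the sub-pattern order:
# MSGNO first, then the address fields.
KEY_ORDER = (5, 1, 2, 3, 4)


def sort_key(line):
    """Build the sort key from the captured groups, in KEY_ORDER."""
    match = LINE_RE.fullmatch(line)
    if match is None:
        # Without a match there are no groups to sort on.
        raise ValueError(f"line does not match the pattern: {line!r}")
    return tuple(match.group(i) for i in KEY_ORDER)


def sort_rows(rows):
    """Return the data rows (no header line) sorted by the captured groups."""
    return sorted(rows, key=sort_key)
```

Without capture groups in the pattern there is nothing for the key to be built from, which is why the groups matter for sorting but not for plain filtering.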
--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com.
On 26.03.2025 at 09:29, Vlad Ghitulescu <Vl...@Ghitulescu.de> wrote:
Rearranging the columns worked only on the first 25,816 lines
<CleanShot 2025-03-26 at 09.21.19.png>
The rest of the file was more or less mixed up (see above). I’ll talk to support about it.
On 26.03.2025 at 09:08, Vlad Ghitulescu <Vl...@Ghitulescu.de> wrote:
Hello again,
I changed all the semicolons to ",", then used grep to replace ^ and $ with ", and voilà! The columns magically appeared:
<CleanShot 2025-03-26 at 08.57.09.png>
After this I could successfully rearrange the columns
<CleanShot 2025-03-26 at 09.01.19.png>
I then sorted all but the first line… and this totally destroyed the file
<CleanShot 2025-03-26 at 09.03.44.png>
(The copy of the file, that is - naturally! 😉)
I’ll start analyzing the new problem…
Regards,
Vlad
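For comparison, the semicolon-to-quoted-comma conversion described in this message could also be done with a CSV parser instead of search and replace. The following is only a sketch using Python's standard csv module; the function name semicolons_to_quoted_commas is made up for this example, and it assumes the input contains no quoted fields of its own.

```python
import csv
import io


def semicolons_to_quoted_commas(text):
    """Re-write semicolon-delimited text as comma-delimited, quoting every field."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        delimiter=",",
        quoting=csv.QUOTE_ALL,  # wrap every field in double quotes
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()
```

Because every field is quoted on output, a comma inside a field (for example in a street name) stays part of that field and does not create an extra column.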
<CleanShot 2025-03-26 at 08.11.14.png>
but unfortunately that didn’t unlock the possibility to rearrange the columns
<CleanShot 2025-03-26 at 08.11.50.png>
Any idea where I went wrong?
Thanks again!
Regards,
Vlad
You’re probably right re BBEdit vs other special tools - even Rich suggested this in the beginning of our conversation.
The question is: Which other tool exactly?
Excel would not import 3 million+ records 😔 and yes, I encountered the „helpful“ changes and some more too 😶
The Perl-CSV family of modules sounds like heaven 😉 - unfortunately I didn’t learn Perl yet 😶
xsv fixlengths --help

Transforms CSV data so that all records have the same length. The length is
the length of the longest record in the data (not counting trailing empty fields,
but at least 1). Records with smaller lengths are padded with empty fields.

This requires two complete scans of the CSV data: one for determining the
record size and one for the actual transform. Because of this, the input
given must be a file and not stdin.

Alternatively, if --length is set, then all records are forced to that length.
This requires a single pass and can be done with stdin.

Usage:
    xsv fixlengths [options] [<input>]

fixlengths options:
    -l, --length <arg>     Forcefully set the length of each record. If a
                           record is not the size given, then it is truncated
                           or expanded as appropriate.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
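To make the behaviour described in that help text more concrete, here is a small Python sketch of the same idea. It is not xsv's implementation, only an approximation written from the description above; the function names read_rows, effective_length and fix_lengths are made up for this example.

```python
import csv
import io


def read_rows(text, delimiter=";"):
    """Parse delimited text into a list of records (lists of fields)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def effective_length(row):
    """Length of a record, not counting trailing empty fields."""
    n = len(row)
    while n > 0 and row[n - 1] == "":
        n -= 1
    return n


def fix_lengths(rows, length=None):
    """Pad (or, with an explicit length, truncate) all records to one length."""
    if length is None:
        # First pass: the longest record decides, but never less than 1 field.
        length = max([1] + [effective_length(row) for row in rows])
    # Second pass: pad with empty fields, then cut to the target length.
    return [(list(row) + [""] * length)[:length] for row in rows]
```

A call such as fix_lengths(read_rows(text)) mirrors the two-scan mode, while passing length explicitly corresponds to the single-pass --length mode.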
On 28.03.2025 at 19:16, GP <gp-bbed...@hotmail.com> wrote:
Your Pattern Playground results are perplexing. Using your first post's example CSV data, the grep:
\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
matches every line except the first one, which holds the column labels.
To figure out what the problem might be on your system with your local language configuration, use either BBEdit's Pattern Playground or regex101 and rebuild the grep pattern from scratch, left to right, one semicolon-delimited field pattern at a time. E.g., start with \d{3}; which should find/highlight 7 matches in each line of the example CSV data. Then add \w{3}; for a total grep of \d{3};\w{3}; which should result in the leading 200;BAG; being highlighted on each line in the example. Continue like that until the field pattern you added last no longer produces a match for the left-hand part of each line in the example data. The cause will be something in that field/column of the failing line or lines that doesn't meet the matching criteria of the pattern part you just added.
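The field-by-field approach described above can also be scripted. The following Python sketch is only an illustration: the function name first_failing_field is made up, the pattern list in the usage note is a shortened hypothetical one, and unlike an unanchored search in Pattern Playground it anchors each attempt at the start of the line.

```python
import re


def first_failing_field(field_patterns, line):
    """Grow the pattern one field at a time, anchored at the start of the line.

    Returns the 0-based index of the first field pattern that makes the
    match fail, or None if the whole sequence matches.
    """
    prefix = ""
    for index, part in enumerate(field_patterns):
        prefix += part
        if re.match(prefix, line) is None:
            return index
    return None
```

For example, with the parts \d{3}; then \w{3}; then [^;]*; then \d{10}; a line whose fourth field holds only five digits is reported at index 3, which points directly at the column to inspect.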
In addition to sorting, a working grep pattern can also be used with BBEdit's Text -> Process Lines Containing... to find all lines that do NOT contain that grep pattern, which will help in finding malformed CSV data in the large CSV data files you're working with.
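Outside BBEdit, the same check for lines that do not contain the pattern can be sketched in a few lines of Python. The function name non_matching_lines is made up for this example, and Python's re module is not guaranteed to behave identically to BBEdit's grep engine for every pattern.

```python
import re


def non_matching_lines(pattern, lines):
    """Return (line_number, line) pairs for lines that do NOT contain the pattern."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if regex.search(line) is None
    ]
```

On well-formed data only the column-label line should be reported; anything else in the result is a candidate for malformed CSV.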
On Friday, March 28, 2025 at 7:12:03 AM UTC-7 Vlad Ghitulescu wrote:
Hey GP,
I corrected the error re „Specific sub-patterns:“ but this didn’t seem to change anything: ADRC_POST_CODE1 is still not sorted
<CleanShot 2025-03-28 at 10.02.07.png>
The command also gave no recognizable sign that it had finished, so I’m not sure whether it also had problems with line 25816, where the CRLF follows a house number (see previous emails).
BBEdit’s Pattern Playground, however, shows no result when searching with the regex
<CleanShot 2025-03-28 at 10.02.07.png><CleanShot 2025-03-28 at 10.09.51.png>
Hey GP,
yes, this is strange.
I'll ask BBEdit-support about it, perhaps they could hint to some differences that I don't get.
I'll come back to this.
Thanks again!
Regards,
Vlad
and made only a minor change to the „*Replace pattern*" in order to still
have the semicolons (see above).
The grep selects every single line of the sample data with the exception
of the first - hurray!
That means that sorting the changed file will sort the lines as I wanted.
After this it would only be necessary to put the columns in the initial
order.
Now that I know for sure 😉 that the grep works, I wanted to get „*Sort
lines…*“ working as well, so I put your grep into „*Sort lines…*“
again
[image: CleanShot 2025-04-07 at 06.17.42.png]
and checked also „*Sorted lines to new document*“.
Hey all,
GP helped me with this "mystery": Unfortunately I did NOT include any regular expression capture groups in my first attempts to use the regex with Sort Lines….
As soon as I did, it worked as expected 😆
So now I have learned to solve my problem with BBEdit and xsv… and also learned some regex along the way.
Thank you all for this!
Regards,
Vlad
On 9 Apr 2025, at 13:02, Vlad Ghitulescu wrote:
Hey GP,
yes, this is strange.
I'll ask BBEdit-support about it, perhaps they could hint to some differences that I don't get.
I'll come back to this.
Thanks again!
Regards,
Vlad