
Parsing MT940 SWIFT Message using Java REGEX


Arun

Dec 27, 2008, 2:13:26 PM
Hi Folks,

I have two SWIFT messages in a file, and I have read the entire file into
a StringBuffer. Using java.util.regex, I am able to retrieve SWIFT
message blocks 1 (starts with {1: and ends with }) and 2 (starts with
{2: and ends with }).

However, I am unable to grab block 4 (starts with {4: and ends with
the first occurrence of -}). My regex pattern is \\{4:.*-\\} .
However, this matches everything up to the last occurrence of -}. I
am not sure how to make the regex stop at the first
occurrence of -} . Can you assist please?

Thank you,
Arun


{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000894
:25:GSAKW827958933CAD
:28C:255/1
:60F:C011223CAD32,55
:62F:C011223CAD32,55
-}{5:
{CHK:794BB7656E00}}
{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000890
:25:SAKG800030155USD
:28C:255/1
:60F:C011223USD175768,92
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84
-}{5:
{CHK:0F4E5614DD28}}

John B. Matthews

Dec 27, 2008, 2:22:09 PM
In article
<417b4c4a-6b86-42aa...@o40g2000yqb.googlegroups.com>,
Arun <set...@gmail.com> wrote:

> I have two SWIFT messages in a file. I have read the entire file into
> a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
> message blocks 1 ( start with {1: , end with } ) and 2 ( start with
> {2: and end with } )
>
> However, I am unable to grab the block 4 ( start with {4: and end with
> the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
> However, this picks up the message until the last occurence of -}. I
> am not sure how to restrict the regex to stop looking beyond the first
> occurence of -} . Can you assist please?

You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
Does a {4: block include line terminators?

<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>

> {1:F01AAAABB99BSMK3513951576}
> {2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
> :20:0112230000000894
> :25:GSAKW827958933CAD
> :28C:255/1
> :60F:C011223CAD32,55
> :62F:C011223CAD32,55
> -}{5:
> {CHK:794BB7656E00}}
> {1:F01AAAABB99BSMK3513951576}
> {2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
> :20:0112230000000890
> :25:SAKG800030155USD
> :28C:255/1
> :60F:C011223USD175768,92
> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> /GB/2542049/SHS/312,
> :62F:C011021USD175879,84
> -}{5:
> {CHK:0F4E5614DD28}}

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

Lew

Dec 27, 2008, 2:24:23 PM
Arun wrote:
> Hi Folks,
>
> I have two SWIFT messages in a file. I have read the entire file into
> a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
> message blocks 1 ( start with {1: , end with } ) and 2 ( start with
> {2: and end with } )
>
> However, I am unable to grab the block 4 ( start with {4: and end with
> the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
> However, this picks up the message until the last occurence of -}. I
> am not sure how to restrict the regex to stop looking beyond the first
> occurence of -} . Can you assist please?

Looks like a case for the reluctant quantifier
<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>
<http://java.sun.com/docs/books/tutorial/essential/regex/quant.html>

\\{4:.*?-\\}

if I read the docs correctly.
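
A minimal sketch of the reluctant-quantifier suggestion, for illustration only (it was not posted in the thread). The class name Block4Extractor and the method extract are hypothetical. It assumes Pattern.DOTALL is wanted, since by default '.' does not match line terminators and block 4 spans several lines, and it assumes "-}" never appears inside the block text itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Block4Extractor {
    // ".*?" is reluctant: it expands one character at a time and stops at
    // the first "-}". DOTALL lets '.' match line terminators as well.
    private static final Pattern BLOCK4 =
            Pattern.compile("\\{4:.*?-\\}", Pattern.DOTALL);

    /** Returns every block 4 found in the input, in order of appearance. */
    static List<String> extract(CharSequence input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK4.matcher(input);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Shortened version of the two messages from the original post.
        String twoMessages = ""
                + "{1:F01AAAABB99BSMK3513951576}\n"
                + "{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:\n"
                + ":20:0112230000000894\n"
                + ":62F:C011223CAD32,55\n"
                + "-}{5:\n"
                + "{CHK:794BB7656E00}}\n"
                + "{1:F01AAAABB99BSMK3513951576}\n"
                + "{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:\n"
                + ":20:0112230000000890\n"
                + ":62F:C011021USD175879,84\n"
                + "-}{5:\n"
                + "{CHK:0F4E5614DD28}}\n";
        for (String block : extract(twoMessages)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

With the greedy form \\{4:.*-\\} and the same flag, the single match would run from the first {4: to the last -}, which is the behaviour described in the question.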

--
Lew

Arun

Dec 27, 2008, 2:42:45 PM
On Dec 28, 12:22 am, "John B. Matthews" <nos...@nospam.com> wrote:
> In article
> <417b4c4a-6b86-42aa-a99f-e1ce887b8...@o40g2000yqb.googlegroups.com>,

John,
Thank you. It worked. I am reading about the reluctant quantifiers
now. And yes, a {4: block contains line terminators.

With your assistance I was able to grab each of the messages
separately.

In the above example, I have a multi-line field: :61: followed by
text, then a CRLF/line terminator, then a second line of text,
followed by :62F:.

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

Here the line following the :61: line is optional, as in
:61:0112201223CD110,92NDIVNONREF//08 IL053309
:62F:C011021USD175879,84

or the third line could be another one starting with :61:, as in

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,

I wrote something like
((:61:)(\\d{6})([\\d]{4})([CD]?[A-Z]?)(\\d*[,]?\\d*)([\\w\\S]{4})(.*&&
[^:]))

It did not work. :(

Where could I be wrong?

Thank you very much.
Arun

John B. Matthews

Dec 27, 2008, 4:46:54 PM
In article
<eb350e39-6068-4034...@k36g2000pri.googlegroups.com>,
Arun <set...@gmail.com> wrote:

> On Dec 28, 12:22 am, "John B. Matthews" <nos...@nospam.com> wrote:
> > In article
> > <417b4c4a-6b86-42aa-a99f-e1ce887b8...@o40g2000yqb.googlegroups.com>,
> >
> >  Arun <set...@gmail.com> wrote:
> > > I have two SWIFT messages in a file. I have read the entire file into
> > > a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
> > > message blocks 1 ( start with {1: , end with } ) and 2 ( start with
> > > {2: and end with } )
> >
> > > However, I am unable to grab the block 4 ( start with {4: and end with
> > > the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
> > > However, this picks up the message until the last occurence of -}. I
> > > am not sure how to restrict the regex to stop looking beyond the first
> > > occurence of -} . Can you assist please?
> >
> > You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
> > Does a {4: block include line terminators?
> >
> > <http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>

[...]


> Thank you. It worked. I am reading through rethe reluctant quantifiers
> now. And yes {4: has a line terminator
>

> With your assistance I was able to grab each of the messages


> separately.
>
> In the above example, I have a multiline message (:61: followed by
> text, followed by a crlf/line terminator and a next line of text
> followed by :62F:.
>

[...]


> I wrote something like
> ((:61:)(\\d{6})([\\d]{4})([CD]?[A-Z]?)(\\d*[,]?\\d*)([\\w\\S]{4})(.*&&
> [^:]))
>
> It did not work. :(
>
> Where could I be wrong?

Sorry, I don't understand SWIFT message syntax well enough to comment.
IIUC, a pre-XML SWIFT parser is non-trivial. You might Google for an
existing solution.

Arun

Dec 27, 2008, 11:11:11 PM
On Dec 28, 2:46 am, "John B. Matthews" <nos...@nospam.com> wrote:
> In article
> <eb350e39-6068-4034-9cab-cc750d43c...@k36g2000pri.googlegroups.com>,

John,

Simply put, in the lines below,

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

I need to grab lines 1 and 2 in a buffer separately. The rule is: start
from :61: and read until i encounter the next ':'.

Thank you very much
Arun

Lew

Dec 27, 2008, 11:51:32 PM
Arun wrote:
> Simply put, in the lines below,
>
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84
>
> I need to grab line 1&2 in a buffer separately. The rule is start
> from :61: and read until i [sic] encounter the next :.

What about line 3?

I think I understand what you were saying, but Usenet wraps lines, so it's
tricky to refer to line numbers that might not match what people are reading.

--
Lew

Arun

Dec 28, 2008, 12:00:29 AM

Hi Lew,

In my example

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

Here LINE 2 can be any text, basically a (.*) .

LINE 3 could be another line starting with a :

My requirement is: if the line starts with :61: , match all characters
until you see the next ":" (and not specifically :62F: as in the above
example), because LINE 1 can repeat and LINE 2 may or may not occur
after LINE 1.

Did I understand your question correctly? And did I give a correct
response? Please let me know.

Thank you
Arun


Arun

Dec 28, 2008, 12:02:28 AM
On Dec 28, 9:51 am, Lew <no...@lewscanon.com> wrote:

Lew,

I am enclosing each line in parentheses ().

(:61:0112201223CD110,92NDIVNONREF//08 IL053309 )
(/GB/2542049/SHS/312,)
(:62F:C011021USD175879,84)

Thank you
Arun

John B. Matthews

Dec 28, 2008, 9:40:05 AM
In article
<1bc8919e-3639-4ad7...@b38g2000prf.googlegroups.com>,
Arun <set...@gmail.com> wrote:

[...]


> In my example
>
> LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> LINE 2 -> /GB/2542049/SHS/312,
> LINE 3 -> :62F:C011021USD175879,84
>
> Here LINE 2 can be any text , basically a (.*) .
>
> LINE 3 could be another line starting with a :
>
> My requirement is if the line starts with :61: , match all characters
> until you see a next ":" ( and not :62F: as in above example because
> LINE 1 is repetitive, LINE 2 may or may not occur after LINE 2.

[...]

Do you mean like this:

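<sscce>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Splitting {
    public static void main(String[] args) {
        String s = ""
            + ":60F:C011223USD175768,92\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":62F:C011021USD175879,84\n";
        // Matches ':', the tag, ':', one character, then everything up
        // to (not including) the next ':'. [^:] already matches line
        // terminators; DOTALL makes '.' match them too.
        Pattern p = Pattern.compile(
            "(:.*?:.[^:]+)", Pattern.DOTALL);
        Matcher m = p.matcher(s);
        int i = 1;
        while (m.find()) {
            // Each match keeps its trailing newline, hence the blank
            // lines in the console output.
            System.out.println("(" + i++ + ") " + m.group());
        }
    }
}
</sscce>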

<console>
(1) :60F:C011223USD175768,92

(2) :61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,

(3) :62F:C011021USD175879,84

</console>

See also:

<http://java.sun.com/docs/books/tutorial/essential/regex/>

Arun

Dec 28, 2008, 10:39:06 AM
On Dec 28, 7:40 pm, "John B. Matthews" <nos...@nospam.com> wrote:
> In article
> <1bc8919e-3639-4ad7-aafc-96c7f791c...@b38g2000prf.googlegroups.com>,

John,

Yes. It worked. Thank you so much. Your regex is generic and it worked
for all tags.

Thanks much. I appreciate that.

Arun

Arun

Dec 29, 2008, 8:57:23 AM
On Dec 28, 7:40 pm, "John B. Matthews" <nos...@nospam.com> wrote:
> In article
> <1bc8919e-3639-4ad7-aafc-96c7f791c...@b38g2000prf.googlegroups.com>,

John,

In the example below

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

I tried to split lines 1 and 2 into logical groups (for clarity,
I have enclosed each token in parentheses):


:61:(011220)(1223)(CD)(110,92)(NDIV)(NONREF//08 IL053309)
(/GB/2542049/SHS/312,)

using the following regex pattern
:61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?
[^:]+)

However, I am not able to grab the second line using matcher.group(i),
where i is the group number.

What is wrong with the final group, (.*?[^:]+) ?

Thank you
Arun


John B. Matthews

Dec 29, 2008, 10:29:34 AM
In article
<ba2345eb-e7ab-471f...@x8g2000yqk.googlegroups.com>,
Arun <set...@gmail.com> wrote:

[...]


> In the below example
>
> LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> LINE 2 -> /GB/2542049/SHS/312,
> LINE 3 -> :62F:C011021USD175879,84
>
> I tried to split line 1 and 2 into logical groups ( for clarity
> purpose I had separated each token with braces )
>
> :61:(011220)(1223)(CD)(110,92)(NDIV)(NONREF//08 IL053309)
> (/GB/2542049/SHS/312,)
>
> using the following regex pattern
> :61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?
> [^:]+)
>
> however, I am not able to grab the second line using matcher.group(i)
> where i is the group number.
>
> What is wrong in )(.*?[^:]+) ?

>[...]


> :61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?
> [^:]+)

I don't understand. Perhaps you could modify the <http://sscce.org/> I
provided above to clarify the problem. The following tutorial shows how
to catch syntax errors using the methods of PatternSyntaxException:

<http://java.sun.com/docs/books/tutorial/essential/regex/>

Arun

Dec 29, 2008, 11:26:20 AM
On Dec 29, 8:29 pm, "John B. Matthews" <nos...@nospam.com> wrote:
> In article
> <ba2345eb-e7ab-471f-98fa-7da6f8b8e...@x8g2000yqk.googlegroups.com>,

I think I did not explain my requirement.


I have 3 lines

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84


I grab lines 1 and 2 using the pattern "(:61:.*?.[^:]+)" and copy them
to a StringBuffer.

Now, with the matcher.group(int) method, I need to group the
sequence so that I can get the 2nd line.

matcher1.group(1) should return
:61:0112201223CD110,92NDIVNONREF//08 IL053309
(along with the \n) and matcher1.group(2) should return
/GB/2542049/SHS/312,

This regex is harassing me!!!

Thank you
Arun

John B. Matthews

Dec 29, 2008, 12:38:30 PM
In article
<09420b4c-2d19-4f07...@i20g2000prf.googlegroups.com>,
Arun <set...@gmail.com> wrote:

> On Dec 29, 8:29 pm, "John B. Matthews" <nos...@nospam.com> wrote:

[...]
> > <http://java.sun.com/docs/books/tutorial/essential/regex/>

What syntax errors did this approach discover?

[Please trim sigs.]

> I think I did not explain my requirement.
> I have 3 lines
>
> LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> LINE 2 -> /GB/2542049/SHS/312,
> LINE 3 -> :62F:C011021USD175879,84
>
> And I grab line 1 & 2 using pattern "(:61:.*?.[^:]+)" and copy it to

> a StringBuffer. Now, with matcher.group(int arg) function, i need to

> group the sequence so that i can get the 2nd line.
>
> matcher1.group(1) should return :61:0112201223CD110,92NDIVNONREF//08
> IL053309 ( along with the \n ) and matcher1.group(2) should return /GB/
> 2542049/SHS/312,

[...]

You could try matching the \n:

Pattern p = Pattern.compile("(^.*\n)(.*\n)", Pattern.DOTALL);
Matcher m = p.matcher(s);
if (m.matches()) ...

Again, an <http://sscce.org/> would make discussion easier.
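
As one possible complete example of capturing the two lines as separate groups, here is a minimal sketch. It is an illustration under assumptions, not the approach confirmed in the thread: the class StatementLineSplitter and the method firstField61 are hypothetical names, a continuation line is assumed to be any line that does not start with ':', and group 1 is returned without its trailing line terminator.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementLineSplitter {
    // Group 1: the ":61:" line itself, without its line terminator.
    // Group 2: the following line, captured only when it does not start
    //          with ':' (that is, when it is not the next tag).
    // MULTILINE makes '^' match at the start of every line.
    private static final Pattern FIELD_61 = Pattern.compile(
            "^(:61:[^\\r\\n]*)(?:\\r?\\n(?!:)([^\\r\\n]+))?",
            Pattern.MULTILINE);

    /**
     * Returns {line1, line2} for the first :61: field in the text.
     * line2 is null when there is no continuation line; the result is
     * null when no :61: field is found at all.
     */
    static String[] firstField61(CharSequence block) {
        Matcher m = FIELD_61.matcher(block);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String s = ""
                + ":60F:C011223USD175768,92\n"
                + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
                + "/GB/2542049/SHS/312,\n"
                + ":62F:C011021USD175879,84\n";
        System.out.println(Arrays.toString(firstField61(s)));
    }
}
```

Because the second group sits inside an optional non-capturing group, Matcher.group(2) returns null rather than failing the whole match when :61: is followed directly by another tag.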

[Please trim sigs.]

Roedy Green

Dec 30, 2008, 4:23:13 PM
On Sat, 27 Dec 2008 11:13:26 -0800 (PST), Arun <set...@gmail.com>
wrote, quoted or indirectly quoted someone who said :

>However, I am unable to grab the block 4 ( start with {4: and end with
>the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
>However, this picks up the message until the last occurence of -}. I
>am not sure how to restrict the regex to stop looking beyond the first
>occurence of -} . Can you assist please?

Just a general comment. Regex does not handle delimiter nesting of
variable depth. I did not follow the details of your message, but got
the general impression that might be the problem.

If you have such nesting you need a parser, for example one you roll
yourself as a finite state automaton, using an enum to track the various
states and a State next( char ) method to figure out which state to go
to next depending on the next char.

http://mindprod.com/jgloss/finitestate.html
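
A rough sketch of that hand-rolled idea, for illustration only. The names BlockScanner and topLevelBlocks are hypothetical, and the code is a two-state machine plus a depth counter rather than a pure finite state automaton, because the counter is what copes with the nested {CHK:...} inside block 5. It assumes braces never occur inside the message text itself.

```java
import java.util.ArrayList;
import java.util.List;

class BlockScanner {
    // The two states of the scanner.
    enum State { OUTSIDE, IN_BLOCK }

    /** Splits the input into top-level {...} blocks, keeping nested braces intact. */
    static List<String> topLevelBlocks(CharSequence input) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.OUTSIDE;
        int depth = 0; // current brace nesting depth inside a block

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case OUTSIDE:
                    // Ignore everything until a block opens.
                    if (c == '{') {
                        state = State.IN_BLOCK;
                        depth = 1;
                        current.append(c);
                    }
                    break;
                case IN_BLOCK:
                    current.append(c);
                    if (c == '{') {
                        depth++;
                    } else if (c == '}' && --depth == 0) {
                        // The outermost brace just closed: emit the block.
                        blocks.add(current.toString());
                        current.setLength(0);
                        state = State.OUTSIDE;
                    }
                    break;
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String message = "{1:F01AAAABB99BSMK3513951576}\n"
                + "{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:\n"
                + ":20:0112230000000894\n"
                + "-}{5:\n"
                + "{CHK:794BB7656E00}}\n";
        for (String block : topLevelBlocks(message)) {
            System.out.println("[" + block + "]");
        }
    }
}
```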

For tougher parsing you need a parser generator. See
http://mindprod.com/jgloss/parser.html


--
Roedy Green Canadian Mind Products
http://mindprod.com
PM Steven Harper is fixated on the costs of implementing Kyoto, estimated as high as 1% of GDP.
However, he refuses to consider the costs of not implementing Kyoto which the
famous economist Nicholas Stern estimated at 5 to 20% of GDP

smurug...@gmail.com

Oct 18, 2016, 8:41:52 AM
Hi John, I came across one of your posts today about parsing an MT940 file. I am working on a requirement where the file looks like this:
:61:161107D6243,23NXPC2000136822
:86:XPC?00ISSUANCE?20INV:5111107901 DTE:20161107 AMT:742.00?21INV
:5111107903 DTE:20161107 AMT:994.74?22INV:5111107869 DTE:201611
07 AMT:479.00?23INV:5111107872 DTE:20161107 AMT:850.00?24INV:511
1107873 DTE:20161107 AMT:500.44?25INV:5111107875 DTE:20161107 AMT
:634.30?26INV:5111107897 DTE:20161107 AMT:405.10?27INV:51111079
00 DTE:20161107 AMT:1020.25?27INV:5111107867 DTE:20161107 AMT:61
7.40?30CITISUPLFIN?31109087?32LOOS AND CO INC?3324356
Only lines that begin with a key tag should start with a colon; the other lines should not. The expected output has each field on a single line instead of multiple lines. Please help me.

- Muru
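
One way the unfolding asked about above might be approached, as a hedged sketch only. Mt940Unfolder and unfold are hypothetical names. The code assumes a key tag always has the form of a colon, two digits, an optional uppercase letter and a colon (for example :61: or :62F:), and that wrapped lines should be joined with nothing in between, since the sample wraps in the middle of values such as "DTE:201611" / "07".

```java
import java.util.regex.Pattern;

class Mt940Unfolder {
    // A line break that is NOT followed by a field tag such as ":61:" or
    // ":62F:", by the "-}" block terminator, or by the end of the input.
    // Breaks like that are treated as wraps inside a field and removed.
    private static final Pattern CONTINUATION_BREAK =
            Pattern.compile("\\r?\\n(?!:\\d{2}[A-Z]?:|-\\}|\\z)");

    /** Joins continuation lines onto the tagged line they belong to. */
    static String unfold(String text) {
        return CONTINUATION_BREAK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String folded = ""
                + ":61:161107D6243,23NXPC2000136822\n"
                + ":86:XPC?00ISSUANCE?20INV:5111107901 DTE:20161107 AMT:742.00?21INV\n"
                + ":5111107903 DTE:20161107 AMT:994.74?22INV:5111107869 DTE:201611\n"
                + "07 AMT:479.00?23INV:5111107872 DTE:20161107 AMT:850.00\n";
        System.out.print(unfold(folded));
    }
}
```

A continuation line such as ":5111107903 DTE..." starts with a colon but does not match the tag shape, so it is still joined. A continuation line that happened to look exactly like a tag would be misread, so the pattern may need tightening for real data.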