Parsing MT940 SWIFT Message using Java REGEX

Arun

Dec 27, 2008, 2:13:26 PM

Hi Folks,

I have two SWIFT messages in a file. I have read the entire file into
a StringBuffer. Using java.util.regex, I am able to retrieve SWIFT
message blocks 1 (starts with {1: and ends with }) and 2 (starts with
{2: and ends with }).

However, I am unable to grab block 4 (starts with {4: and ends with
the first occurrence of -}). My regex pattern is \\{4:.*-\\} , but it
picks up the message up to the last occurrence of -}. I am not sure
how to stop the regex at the first occurrence of -}. Can you assist,
please?

Thank you,
Arun

{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000894
:25:GSAKW827958933CAD
:28C:255/1
:60F:C011223CAD32,55
:62F:C011223CAD32,55
-}{5:
{CHK:794BB7656E00}}
{1:F01AAAABB99BSMK3513951576}
{2:O9400934081223BBBBAA33XXXX03592332770812230834N}{4:
:20:0112230000000890
:25:SAKG800030155USD
:28C:255/1
:60F:C011223USD175768,92
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84
-}{5:
{CHK:0F4E5614DD28}}
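A minimal sketch that reproduces the behavior described above, assuming the file has been read into a String and that the pattern was compiled with Pattern.DOTALL. The post does not say so, but without that flag "." does not match line terminators, and since {4: is followed by a newline the pattern would find nothing at all. The greedy .* first runs to the end of the input and then backs off only as far as the last -}, so a single match spans both messages. The class name and the abbreviated sample are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyBlock4 {
    // Two abbreviated messages (block 2 and most fields omitted for brevity).
    static final String INPUT =
          "{1:F01AAAABB99BSMK3513951576}{4:\n:20:0112230000000894\n-}{5:\n{CHK:794BB7656E00}}\n"
        + "{1:F01AAAABB99BSMK3513951576}{4:\n:20:0112230000000890\n-}{5:\n{CHK:0F4E5614DD28}}\n";

    /** Returns the first match of the greedy pattern from the question, or null. */
    static String firstGreedyMatch(String input) {
        // DOTALL is an assumption: without it '.' stops at the newline right after "{4:".
        Pattern p = Pattern.compile("\\{4:.*-\\}", Pattern.DOTALL);
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Runs from the first "{4:" to the LAST "-}", so it covers both messages.
        System.out.println(firstGreedyMatch(INPUT));
    }
}
```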

John B. Matthews

Dec 27, 2008, 2:22:09 PM

In article
<417b4c4a-6b86-42aa...@o40g2000yqb.googlegroups.com>,
Arun <set...@gmail.com> wrote:

> I have two SWIFT messages in a file. I have read the entire file into
> a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
> message blocks 1 ( start with {1: , end with } ) and 2 ( start with
> {2: and end with } )
>
> However, I am unable to grab the block 4 ( start with {4: and end with
> the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
> However, this picks up the message until the last occurence of -}. I
> am not sure how to restrict the regex to stop looking beyond the first
> occurence of -} . Can you assist please?

You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
Does a {4: block include line terminators?

<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>
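A sketch of the reluctant version, under the same assumptions (whole file in a String, DOTALL because block 4 spans several lines): .*? grows one character at a time, so each match stops at the first -} after its {4:, and a find() loop yields one block per message. The capturing group around .*? is an addition for illustration; it holds only the block body. Names such as block4Bodies are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantBlock4 {
    // Reluctant ".*?" stops at the first "-}"; DOTALL lets it cross the block's line breaks.
    private static final Pattern BLOCK4 = Pattern.compile("\\{4:(.*?)-\\}", Pattern.DOTALL);

    /** Returns the body (text between "{4:" and "-}") of every block 4 in the input. */
    static List<String> block4Bodies(String input) {
        List<String> bodies = new ArrayList<>();
        Matcher m = BLOCK4.matcher(input);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String input =
              "{1:F01AAAABB99BSMK3513951576}{4:\n:20:0112230000000894\n-}{5:{CHK:794BB7656E00}}\n"
            + "{1:F01AAAABB99BSMK3513951576}{4:\n:20:0112230000000890\n-}{5:{CHK:0F4E5614DD28}}\n";
        for (String body : block4Bodies(input)) {
            System.out.println("[" + body + "]");
        }
    }
}
```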

[...]

--
John B. Matthews
trashgod at gmail dot com
http://home.roadrunner.com/~jbmatthews/

Lew

Dec 27, 2008, 2:24:23 PM

Arun wrote:
> Hi Folks,
>
> I have two SWIFT messages in a file. I have read the entire file into
> a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
> message blocks 1 ( start with {1: , end with } ) and 2 ( start with
> {2: and end with } )
>
> However, I am unable to grab the block 4 ( start with {4: and end with
> the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
> However, this picks up the message until the last occurence of -}. I
> am not sure how to restrict the regex to stop looking beyond the first
> occurence of -} . Can you assist please?

Looks like a case for the reluctant quantifier
<http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>
<http://java.sun.com/docs/books/tutorial/essential/regex/quant.html>

\\{4:.*?-\\}

if I read the docs correctly.

--
Lew

Arun

Dec 27, 2008, 2:42:45 PM

On Dec 28, 12:22 am, "John B. Matthews" <nos...@nospam.com> wrote:

John,
Thank you. It worked. I am reading up on reluctant quantifiers
now. And yes, block 4 contains line terminators.

With your assistance I was able to grab each of the messages
separately.

In the above example, I have a multiline field: :61: followed by
text, then a CRLF line terminator and another line of text,
then :62F:.

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

Here the line following the line containing :61: is optional, as in
:61:0112201223CD110,92NDIVNONREF//08 IL053309
:62F:C011021USD175879,84

or the third line could be another line starting with :61:, as in

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,

I wrote something like
((:61:)(\\d{6})([\\d]{4})([CD]?[A-Z]?)(\\d*[,]?\\d*)([\\w\\S]{4})(.*&&[^:]))

It did not work. :(

Where could I be wrong?

Thank you very much.
Arun
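One likely culprit in the attempt above, offered as a reading of the Pattern documentation rather than a confirmed diagnosis: && is an intersection operator only inside a character class, as in [a-z&&[^aeiou]]. Outside brackets, .*&&[^:] means "anything, then two literal ampersands, then one non-colon character", and the sample contains no ampersand. The sketch below assumes that a field's continuation lines are exactly the following lines that do not begin with a colon; under that assumption it matches a :61: line plus its optional second line, and a find() loop picks up repeated :61: fields. Field61 and fields61 are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Field61 {
    // '&&' intersects classes only inside [...]: here, a-z minus the vowels.
    static boolean isConsonantWord(String s) {
        return s.matches("[a-z&&[^aeiou]]+");
    }

    // A ":61:" field at the start of a line, plus any following lines that do not
    // start with ':' (assumed to be continuation lines of the same field).
    private static final Pattern FIELD_61 =
        Pattern.compile("(?m)^:61:[^\\r\\n]*(?:\\r?\\n[^:\\r\\n][^\\r\\n]*)*");

    static List<String> fields61(String block4) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD_61.matcher(block4);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isConsonantWord("rhythm"));
        String block4 = ":60F:C011223USD175768,92\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + ":62F:C011021USD175879,84\n";
        for (String f : fields61(block4)) {
            System.out.println("[" + f + "]");
        }
    }
}
```

The first :61: match includes the /GB/... line; the second stops at its own line because the next line starts with :62F:.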

John B. Matthews

Dec 27, 2008, 4:46:54 PM

In article
<eb350e39-6068-4034...@k36g2000pri.googlegroups.com>,
Arun <set...@gmail.com> wrote:

> On Dec 28, 12:22 am, "John B. Matthews" <nos...@nospam.com> wrote:
> > In article
> > <417b4c4a-6b86-42aa-a99f-e1ce887b8...@o40g2000yqb.googlegroups.com>,
> >
> > Arun <set...@gmail.com> wrote:
> > > I have two SWIFT messages in a file. I have read the entire file into
> > > a StringBuffer. Now using java.util.regex, I am able to retrieve SWIFT
> > > message blocks 1 ( start with {1: , end with } ) and 2 ( start with
> > > {2: and end with } )
> >
> > > However, I am unable to grab the block 4 ( start with {4: and end with
> > > the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
> > > However, this picks up the message until the last occurence of -}. I
> > > am not sure how to restrict the regex to stop looking beyond the first
> > > occurence of -} . Can you assist please?
> >
> > You might try a reluctant quantifier: \\{4:.*?-\\} (untested).
> > Does a {4: block include line terminators?
> >
> > <http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>

[...]

> Thank you. It worked. I am reading through rethe reluctant quantifiers
> now. And yes {4: has a line terminator
>
> With your assistance I was able to grab each of the messages
> separately.
>
> In the above example, I have a multiline message (:61: followed by
> text, followed by a crlf/line terminator and a next line of text
> followed by :62F:.
>

[...]

> I wrote something like
> ((:61:)(\\d{6})([\\d]{4})([CD]?[A-Z]?)(\\d*[,]?\\d*)([\\w\\S]{4})(.*&&[^:]))
>
> It did not work. :(
>
> Where could I be wrong?

Sorry, I don't understand SWIFT message syntax well enough to comment.
IIUC, a pre-XML SWIFT parser is non-trivial. You might Google for an
existing solution.

Arun

Dec 27, 2008, 11:11:11 PM

On Dec 28, 2:46 am, "John B. Matthews" <nos...@nospam.com> wrote:

John,

Simply put, in the lines below,

:61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,
:62F:C011021USD175879,84

I need to grab line 1&2 in a buffer separately. The rule is start
from :61: and read until i encounter the next :.

Thank you very much
Arun
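Taken literally, that rule maps onto a negated character class. A minimal sketch, assuming the field value itself never contains a colon (true of this sample, but SWIFT values can contain colons, so treat it as a sketch rather than a general parser): [^:]* needs no DOTALL flag because a negated class matches line terminators anyway, which is why the optional second line is captured. UntilNextColon is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UntilNextColon {
    // ":61:" then every character up to (not including) the next ':'.
    // [^:] also matches '\n', so no DOTALL flag is needed to reach the second line.
    private static final Pattern FROM_61 = Pattern.compile(":61:[^:]*");

    static List<String> fields61(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = FROM_61.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":62F:C011021USD175879,84\n";
        System.out.println(fields61(text));
    }
}
```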

Lew

Dec 27, 2008, 11:51:32 PM

Arun wrote:
> Simply put, in the lines below,
>
> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> /GB/2542049/SHS/312,
> :62F:C011021USD175879,84
>
> I need to grab line 1&2 in a buffer separately. The rule is start
> from :61: and read until i [sic] encounter the next :.

What about line 3?

I think I understand what you were saying, but Usenet wraps lines, so it's
tricky to refer to line numbers that might not match what people are reading.

--
Lew

Arun

Dec 28, 2008, 12:00:29 AM

Hi Lew,

In my example

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

Here LINE 2 can be any text, basically a (.*).

LINE 3 could be another line starting with a :.

My requirement: if the line starts with :61:, match all characters
until you see the next ":" (not just :62F: as in the above example),
because LINE 1 can repeat, and LINE 2 may or may not occur after LINE 1.

Did I understand your question correctly? And did I give a correct
response? Please let me know.

Thank you
Arun

Arun

Dec 28, 2008, 12:02:28 AM

On Dec 28, 9:51 am, Lew <no...@lewscanon.com> wrote:

Lew,

I am enclosing each line in parentheses ().

(:61:0112201223CD110,92NDIVNONREF//08 IL053309 )
(/GB/2542049/SHS/312,)
(:62F:C011021USD175879,84)

Thank you
Arun

John B. Matthews

Dec 28, 2008, 9:40:05 AM

In article
<1bc8919e-3639-4ad7...@b38g2000prf.googlegroups.com>,
Arun <set...@gmail.com> wrote:

[...]

> In my example
>
> LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> LINE 2 -> /GB/2542049/SHS/312,
> LINE 3 -> :62F:C011021USD175879,84
>
> Here LINE 2 can be any text , basically a (.*) .
>
> LINE 3 could be another line starting with a :
>
> My requirement is if the line starts with :61: , match all characters
> until you see a next ":" ( and not :62F: as in above example because
> LINE 1 is repetitive, LINE 2 may or may not occur after LINE 2.

[...]

Do you mean like this:

<sscce>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Splitting {
    public static void main(String[] args) {
        String s = ""
            + ":60F:C011223USD175768,92\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":62F:C011021USD175879,84\n";
        // ':' + shortest tag + ':' + one character of any kind + one or more
        // non-colon characters, so each match ends just before the next ':'.
        // DOTALL lets '.' match '\n'; [^:] matches '\n' in any case.
        Pattern p = Pattern.compile(
            "(:.*?:.[^:]+)", Pattern.DOTALL);
        Matcher m = p.matcher(s);
        int i = 1;
        while (m.find()) {
            System.out.println("(" + i++ + ") " + m.group());
        }
    }
}
</sscce>

<console>
(1) :60F:C011223USD175768,92

(2) :61:0112201223CD110,92NDIVNONREF//08 IL053309
/GB/2542049/SHS/312,

(3) :62F:C011021USD175879,84

</console>
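A variation on the SSCCE above, under the same assumption that a colon always marks the start of the next tag: two capturing groups hand back the tag and its value separately. The pairs go into a List rather than a Map because a tag such as :61: can repeat within one block. TagValue and tagValues are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagValue {
    // Group 1: the tag between the colons (word characters); group 2: text up to the next ':'.
    private static final Pattern TAG_VALUE = Pattern.compile(":(\\w+):([^:]*)");

    /** Returns {tag, value} pairs; values are trimmed but keep internal line breaks. */
    static List<String[]> tagValues(String s) {
        List<String[]> out = new ArrayList<>();
        Matcher m = TAG_VALUE.matcher(s);
        while (m.find()) {
            out.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return out;
    }

    public static void main(String[] args) {
        String s = ":60F:C011223USD175768,92\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":62F:C011021USD175879,84\n";
        for (String[] tv : tagValues(s)) {
            System.out.println(tv[0] + " -> " + tv[1]);
        }
    }
}
```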

Arun

Dec 28, 2008, 10:39:06 AM

On Dec 28, 7:40 pm, "John B. Matthews" <nos...@nospam.com> wrote:

John,

Yes. It worked. Thank you so much. Your regex is generic and it worked
for all tags.

Thanks much. I appreciate that.

Arun

Dec 29, 2008, 8:57:23 AM

On Dec 28, 7:40 pm, "John B. Matthews" <nos...@nospam.com> wrote:

John,

In the example below

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

I tried to split lines 1 and 2 into logical groups (for clarity,
I have enclosed each token in parentheses):

:61:(011220)(1223)(CD)(110,92)(NDIV)(NONREF//08 IL053309)
(/GB/2542049/SHS/312,)

using the following regex pattern:
:61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?[^:]+)

However, I am not able to grab the second line using matcher.group(i),
where i is the group number.

What is wrong with (.*?[^:]+)?

Thank you
Arun
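A hedged guess at one thing that can go wrong here; the thread never confirms the cause. In java.util.regex, "." does not match \r, so if the file has CRLF line endings (a CRLF terminator is mentioned earlier in the thread), (.*?\\n) cannot reach the \n and the whole match fails. The sketch below reworks the pattern under that assumption: \\r?\\n accepts either line ending, the supplementary line is an optional group that must not start with a colon, and group(7) then holds the second line, or null when it is absent. The field layout (six digits, four digits, and so on) is copied from the pattern above, not checked against the MT940 specification; Field61Groups is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Field61Groups {
    // Groups 1-6 follow the layout of the pattern in the post; group 7 is the
    // optional second line, which must not begin with ':'.
    private static final Pattern FIELD_61 = Pattern.compile(
        ":61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*,?\\d*)(\\w{4})([^\\r\\n]*)"
        + "(?:\\r?\\n([^:\\r\\n][^\\r\\n]*))?");

    /** Returns groups 1..7 of the first :61: field (index 0 = group 1), or null if none. */
    static String[] groups(String text) {
        Matcher m = FIELD_61.matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] g = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            g[i - 1] = m.group(i);   // null for an optional group that did not take part
        }
        return g;
    }

    public static void main(String[] args) {
        String crlf = ":61:0112201223CD110,92NDIVNONREF//08 IL053309\r\n"
            + "/GB/2542049/SHS/312,\r\n"
            + ":62F:C011021USD175879,84\r\n";
        String[] g = groups(crlf);
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + (i + 1) + ") = " + g[i]);
        }
    }
}
```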

John B. Matthews

Dec 29, 2008, 10:29:34 AM

In article
<ba2345eb-e7ab-471f...@x8g2000yqk.googlegroups.com>,
Arun <set...@gmail.com> wrote:

[...]

> In the below example
>
> LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> LINE 2 -> /GB/2542049/SHS/312,
> LINE 3 -> :62F:C011021USD175879,84
>
> I tried to split line 1 and 2 into logical groups ( for clarity
> purpose I had separated each token with braces )
>
> :61:(011220)(1223)(CD)(110,92)(NDIV)(NONREF//08 IL053309)
> (/GB/2542049/SHS/312,)
>
> using the following regex pattern
> :61:(\\d{6})(\\d{4})([CD]?[A-Z]?)(\\d*[\\,]?\\d*)(\\w{4})(.*?\\n)(.*?[^:]+)
>
> however, I am not able to grab the second line using matcher.group(i)
> where i is the group number.
>
> What is wrong in )(.*?[^:]+) ?

[...]

I don't understand. Perhaps you could modify the <http://sscce.org/> I
provided above to clarify the problem. The following tutorial shows how
to catch syntax errors using the methods of PatternSyntaxException:

<http://java.sun.com/docs/books/tutorial/essential/regex/>
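A minimal sketch of that technique: compile the pattern inside try/catch and report what PatternSyntaxException says through getDescription(), getIndex() and getPattern(). The broken pattern here, with an unclosed group like the one left dangling when a long pattern is wrapped across lines, is a hypothetical example; CheckPattern is an illustrative name.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CheckPattern {
    /** Returns null if the pattern compiles, otherwise a short error report. */
    static String check(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return "Description: " + e.getDescription()
                + "\nIndex: " + e.getIndex()
                + "\nPattern: " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(check(":61:(\\d{6})(.*?\\n)(.*?"));  // unclosed group: error report
        System.out.println(check(":61:(\\d{6})"));              // compiles: prints null
    }
}
```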

Arun

Dec 29, 2008, 11:26:20 AM

On Dec 29, 8:29 pm, "John B. Matthews" <nos...@nospam.com> wrote:

I don't think I explained my requirement clearly.

I have 3 lines:

LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
LINE 2 -> /GB/2542049/SHS/312,
LINE 3 -> :62F:C011021USD175879,84

I grab lines 1 and 2 using the pattern "(:61:.*?.[^:]+)" and copy them to a
StringBuffer.

Now, with the matcher.group(int) method, I need to group the
sequence so that I can get the second line.

matcher1.group(1) should return :61:0112201223CD110,92NDIVNONREF//08
IL053309 (along with the \n) and matcher1.group(2) should return
/GB/2542049/SHS/312,

This regex is harassing me!!!

Thank you
Arun

John B. Matthews

Dec 29, 2008, 12:38:30 PM

In article
<09420b4c-2d19-4f07...@i20g2000prf.googlegroups.com>,
Arun <set...@gmail.com> wrote:

> On Dec 29, 8:29 pm, "John B. Matthews" <nos...@nospam.com> wrote:

[...]
> > <http://java.sun.com/docs/books/tutorial/essential/regex/>

What syntax errors did this approach discover?

[Please trim sigs.]

> I think I did not explain my requirement.
> I have 3 lines
>
> LINE 1 -> :61:0112201223CD110,92NDIVNONREF//08 IL053309
> LINE 2 -> /GB/2542049/SHS/312,
> LINE 3 -> :62F:C011021USD175879,84
>
> And I grab line 1 & 2 using pattern "(:61:.*?.[^:]+)" and copy it to
> a StringBuffer. Now, with matcher.group(int arg) function, i need to
> group the sequence so that i can get the 2nd line.
>
> matcher1.group(1) should return :61:0112201223CD110,92NDIVNONREF//08
> IL053309 ( along with the \n ) and matcher1.group(2) should return
> /GB/2542049/SHS/312,

[...]

You could try matching the \n:

Pattern p = Pattern.compile("(^.*\n)(.*\n)", Pattern.DOTALL);
Matcher m = p.matcher(s);
if (m.matches()) ...

Again, an <http://sscce.org/> would make discussion easier.

Roedy Green

Dec 30, 2008, 4:23:13 PM

On Sat, 27 Dec 2008 11:13:26 -0800 (PST), Arun <set...@gmail.com>
wrote, quoted or indirectly quoted someone who said :

>However, I am unable to grab the block 4 ( start with {4: and end with
>the first occurence of -} ). My regex pattern is \\{4:.*-\\} .
>However, this picks up the message until the last occurence of -}. I
>am not sure how to restrict the regex to stop looking beyond the first
>occurence of -} . Can you assist please?

Just a general comment. Regex does not handle delimiter nesting of
variable depth. I did not follow the details of your message, but got
the general impression that might be the problem.

If you have such nesting, you need a parser. You can roll one yourself
as a finite state automaton, using an enum to track the various
states and a State next( char ) method to figure out which state to go
to next depending on the next char.

http://mindprod.com/jgloss/finitestate.html
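A minimal sketch of that enum-based automaton applied to this thread's block 4 bodies. The three states, the class name and the "tag=value" output are illustrative choices, and the key assumption is that only a tag begins a line with a colon, so colons inside a value are ordinary text. Each enum constant implements State next(char), and the driver decides what to record from the pair (old state, new state); continuation lines are joined to their field with a single space.

```java
import java.util.ArrayList;
import java.util.List;

public class TagStateMachine {
    // Hypothetical states for reading a block 4 body character by character.
    enum State {
        LINE_START {   // at the beginning of a line
            @Override State next(char c) {
                if (c == ':') return TAG;   // assumed: only tags start a line with ':'
                return c == '\n' ? LINE_START : VALUE;
            }
        },
        TAG {          // between the two colons of a tag such as :61:
            @Override State next(char c) { return c == ':' ? VALUE : TAG; }
        },
        VALUE {        // inside a field value; colons here are ordinary text
            @Override State next(char c) { return c == '\n' ? LINE_START : VALUE; }
        };

        abstract State next(char c);
    }

    /** Splits a block 4 body into "tag=value" strings, joining continuation lines with a space. */
    static List<String> fields(String body) {
        List<String> out = new ArrayList<>();
        StringBuilder tag = new StringBuilder();
        StringBuilder value = new StringBuilder();
        State state = State.LINE_START;
        for (char c : body.toCharArray()) {
            if (c == '\r') continue;                        // treat CRLF like LF
            State next = state.next(c);
            if (state == State.LINE_START && next == State.TAG) {
                flush(out, tag, value);                     // a new field begins
            } else if (state == State.LINE_START && next == State.VALUE) {
                if (value.length() > 0) value.append(' ');  // continuation line
                value.append(c);
            } else if (state == State.TAG && next == State.TAG) {
                tag.append(c);
            } else if (state == State.VALUE && next == State.VALUE) {
                value.append(c);
            }
            // TAG -> VALUE (closing ':') and VALUE -> LINE_START ('\n') record nothing.
            state = next;
        }
        flush(out, tag, value);
        return out;
    }

    private static void flush(List<String> out, StringBuilder tag, StringBuilder value) {
        if (tag.length() > 0) out.add(tag + "=" + value);
        tag.setLength(0);
        value.setLength(0);
    }

    public static void main(String[] args) {
        String body = "\n:20:0112230000000890\n"
            + ":61:0112201223CD110,92NDIVNONREF//08 IL053309\n"
            + "/GB/2542049/SHS/312,\n"
            + ":62F:C011021USD175879,84\n";
        fields(body).forEach(System.out::println);
    }
}
```

Because a colon only changes state at the start of a line, a value such as A:B stays in one field, which a plain [^:]+ pattern cannot do.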

For tougher parsing you need a parser generator. See
http://mindprod.com/jgloss/parser.html

--
Roedy Green Canadian Mind Products
http://mindprod.com

smurug...@gmail.com

Oct 18, 2016, 8:41:52 AM

Hi John, I came across one of your posts today about parsing MT940 files. I am working on a requirement where the file looks like this:
:61:161107D6243,23NXPC2000136822
:86:XPC?00ISSUANCE?20INV:5111107901 DTE:20161107 AMT:742.00?21INV
:5111107903 DTE:20161107 AMT:994.74?22INV:5111107869 DTE:201611
07 AMT:479.00?23INV:5111107872 DTE:20161107 AMT:850.00?24INV:511
1107873 DTE:20161107 AMT:500.44?25INV:5111107875 DTE:20161107 AMT
:634.30?26INV:5111107897 DTE:20161107 AMT:405.10?27INV:51111079
00 DTE:20161107 AMT:1020.25?27INV:5111107867 DTE:20161107 AMT:61
7.40?30CITISUPLFIN?31109087?32LOOS AND CO INC?3324356
I want only the key tags to start with a colon, not the other lines. The expected output is each field on a single line instead of spread over multiple lines. Please help me.

- Muru
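A sketch for that request, with its assumptions stated up front. First, a field starts only where a line begins with a colon, two digits, an optional upper-case letter and another colon (the shape of tags such as :20:, :61:, :86: and :62F: in this thread), so wrapped lines such as ":5111107903 DTE:..." that happen to start with a colon are treated as continuations. Second, continuation lines are appended with no separator, because the wraps in the sample fall mid-token (DTE:201611 / 07). UnwrapFields and unwrap are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UnwrapFields {
    // Assumed tag syntax: ':' + two digits + optional upper-case letter + ':' at line start.
    private static final Pattern TAG_START = Pattern.compile(":\\d{2}[A-Z]?:.*");

    /** Returns one single-line string per tagged field, with wrapped lines re-joined. */
    static List<String> unwrap(String text) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = null;
        for (String line : text.split("\\r?\\n")) {
            if (TAG_START.matcher(line).matches()) {
                if (current != null) fields.add(current.toString());
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append(line);   // continuation line: join without a separator
            }
        }
        if (current != null) fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String text = ":61:161107D6243,23NXPC2000136822\n"
            + ":86:XPC?00ISSUANCE?20INV:5111107901 DTE:20161107 AMT:742.00?21INV\n"
            + ":5111107903 DTE:20161107 AMT:994.74?22INV:5111107869 DTE:201611\n"
            + "07 AMT:479.00?23INV:5111107872 DTE:20161107 AMT:850.00\n";
        for (String field : unwrap(text)) {
            System.out.println(field);
        }
    }
}
```

If the real files wrap between words rather than inside them, appending a space instead of nothing would be the safer choice.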