Matching strings to a particular form


deostroll

Apr 16, 2012, 2:35:55 AM
to happy-pr...@googlegroups.com
Hi,
 
I don't know how to approach the problem.
 
I need to write a program that takes a string and outputs the actual representation of that string. For example, I have a list of strings like:
 
X. Y. Z. Corp.
X. Y. Z Corporation
ABC
AB Corporation
etc
 
Suppose I feed X. Y. Z. Corp into the program; I want the result to be XYZ Corporation. That is what "we", in our slang, call the normalized form.

Seabook

Apr 16, 2012, 2:52:10 AM
to happy-pr...@googlegroups.com
I don't quite understand your question, but it doesn't sound very difficult. Could you provide more details, such as:

DataSet:
XXX

Input:
XXX

Output:
XXX

Thanks,
Seabook

deostroll

Apr 16, 2012, 3:50:52 AM
to happy-pr...@googlegroups.com
In essence, we should look at partial variations of a string and map them to its normalized form (or original form).
 
E.g.
 
XYZ Corp -> XYZ Corporation
x.y.z Corp -> XYZ Corporation
IBM -> International Business Machines
I.B.M -> International Business Machines
 
To simulate a real-world scenario: the input may come from a large list of invoices. Each invoice may contain a supplier name, and people may enter that name in different ways, but there is usually a pattern to it. After we process an invoice, we need the exact supplier name.
 
I hope this explains the problem.

Seabook

Apr 16, 2012, 7:15:00 AM
to happy-pr...@googlegroups.com
Usually there is a reference data system that provides all of this information. Such a system is very hard to maintain, so an independent team should be responsible for it.
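
As a rough illustration of the reference-data idea, here is a minimal Python sketch. It assumes a small hand-maintained alias table; the table contents and the names `REFERENCE_DATA`, `alias_key` and `lookup_supplier` are hypothetical, not part of any real system.

```python
import re

# Hypothetical reference data: canonical supplier name keyed by a
# simplified alias (lowercase letters and digits only).
REFERENCE_DATA = {
    "ibm": "International Business Machines",
    "xyzcorp": "XYZ Corporation",
    "xyzcorporation": "XYZ Corporation",
}


def alias_key(raw):
    """Reduce a raw name to a lookup key by dropping dots, spaces and case."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())


def lookup_supplier(raw):
    """Return the canonical name, or None when the alias is not in the table."""
    return REFERENCE_DATA.get(alias_key(raw))


if __name__ == "__main__":
    for name in ["I.B.M", "x.y.z Corp", "X. Y. Z. Corp.", "Unknown Ltd"]:
        print(name, "->", lookup_supplier(name))
```

Unknown names come back as `None`, so they can be queued for whoever maintains the table instead of being guessed.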

Otherwise, you could use a service such as Google Translation or Wiki.
If the pattern is simple, you can just manipulate the string directly:

if Corp -> Corporation
if a.b.c -> abc

This pattern is quite simple.
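
The two rules above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the `SUFFIXES` table and the `normalize` function are hypothetical names, single letters are treated as initials, and the joined initials are upper-cased to match the XYZ examples earlier in the thread.

```python
import re

# Hypothetical suffix expansions; extend this table as new variants show up.
SUFFIXES = {"corp": "Corporation", "corporation": "Corporation"}


def normalize(raw):
    """Join dotted or spaced initials and expand known suffixes."""
    # Split on dots and whitespace: "x.y.z Corp" -> ["x", "y", "z", "Corp"]
    tokens = [t for t in re.split(r"[.\s]+", raw) if t]
    out = []
    initials = []
    for tok in tokens:
        if len(tok) == 1 and tok.isalpha():
            # A single letter is assumed to be an initial; collect it.
            initials.append(tok.upper())
            continue
        if initials:
            # A longer word ends the current run of initials.
            out.append("".join(initials))
            initials = []
        out.append(SUFFIXES.get(tok.lower(), tok))
    if initials:
        out.append("".join(initials))
    return " ".join(out)


if __name__ == "__main__":
    for name in ["X. Y. Z. Corp.", "x.y.z Corp", "I.B.M", "AB Corporation"]:
        print(name, "->", normalize(name))
```

Rules like these only clean up the spelling. Mapping IBM to International Business Machines still needs a lookup table, because nothing in the string itself carries the full name.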

Thanks,
Seabook