I have come across situations like this many times during my career, tackling the same problem of extracting relational data from a plain text file.
The best solution I have found is to convert such files to XML and then apply XML queries to get the relational data back.
You can treat these as the best rules:
1. Apply the regular expression in multiple steps.
2. The more steps you use, the simpler each regular expression can be.
Here, I will use only two steps (that suits my own comfort level).
(You can do it in a single step too, but I really think making the expression that complex is always a bad idea.)
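The multi-step idea boils down to feeding the output of one substitution into the next. A minimal Python sketch of that pipeline, assuming each step is a plain (pattern, replacement) pair; `apply_steps` and the two toy patterns are hypothetical illustrations, not the patterns used in this post.

```python
import re
from functools import reduce


def apply_steps(text, steps):
    # Each step is a (pattern, replacement) pair; the output of one re.sub
    # becomes the input of the next, so every individual pattern stays simple.
    return reduce(lambda acc, step: re.sub(step[0], step[1], acc), steps, text)


# Two deliberately simple, made-up steps instead of one complex pattern.
STEPS = [
    (r"owner:(?P<owner>.+)", r"<Owner>\g<owner></Owner>"),            # tag one field
    (r"(?s)\A(?P<body>.*)\Z", r"<FileInfo>\n\g<body>\n</FileInfo>"),  # wrap everything
]

print(apply_steps("owner: john", STEPS))
```

Adding a step is just appending another pair to the list, which is the point of rule 2.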
Step one
Find string:
filename:(?<filename>.+)\s*owner:(?<owner>.+)\s*datecreated:(?<datecreated>.+)\s*info:\s+(?<info>.+)\s*Hash:\s+(?<Hash>.+)\s*Salt:\s+(?<Salt>.+)\s*(?<data>(?:\s{4}.*)+)
Replace string:
<FileHash>
<Hash>${Hash}</Hash>
<FileName>${filename}</FileName>
<Owner>${owner}</Owner>
<DateCreated>${datecreated}</DateCreated>
<Info>${info}</Info>
<Salt>${Salt}</Salt>
${data}
</FileHash>
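A rough Python equivalent of step one, assuming the top-level fields start at the beginning of their lines and the child records are indented by four spaces (that is what the `\s{4}` in the data group expects; the indentation does not survive in the result shown here). Python's `re` spells named groups `(?P<name>...)` instead of .NET's `(?<name>...)`, and group references in the replacement are `\g<name>` instead of `${name}`. The sample input is a hypothetical reconstruction for illustration, not the original file.

```python
import re

# Step one: wrap each top-level record in <FileHash>. The indented child lines
# are captured as "data" and copied through unchanged.
STEP_ONE_FIND = re.compile(
    r"filename:(?P<filename>.+)\s*owner:(?P<owner>.+)\s*"
    r"datecreated:(?P<datecreated>.+)\s*info:\s+(?P<info>.+)\s*"
    r"Hash:\s+(?P<Hash>.+)\s*Salt:\s+(?P<Salt>.+)\s*"
    r"(?P<data>(?:\s{4}.*)+)"
)

# Same template as the .NET replace string, with \g<name> instead of ${name}.
STEP_ONE_REPLACE = r"""<FileHash>
<Hash>\g<Hash></Hash>
<FileName>\g<filename></FileName>
<Owner>\g<owner></Owner>
<DateCreated>\g<datecreated></DateCreated>
<Info>\g<info></Info>
<Salt>\g<Salt></Salt>
\g<data>
</FileHash>"""


def step_one(text):
    return STEP_ONE_FIND.sub(STEP_ONE_REPLACE, text)


# Hypothetical input layout: child records indented by four spaces.
SAMPLE = """\
-------------------------------------------------------
filename: pictures_1.zip
owner: john
datecreated: Mon 2014-02-24 15:16:34 +0200
info: Files added to AB-12345
Hash: AB-12345
Salt: sugar
    -------------------------------------------------------
    filename: HPM_4217.jpg
    owner: john
    datecreated: Mon 2014-02-24 14:23:49 +0200
    info:
    List of things
-------------------------------------------------------
"""

print(step_one(SAMPLE))
```

In this sketch the values come out on one line, e.g. `<Hash>AB-12345</Hash>`, and the child record stays as plain text inside `<FileHash>` for step two to handle.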
Your result after step one:
-------------------------------------------------------
<FileHash>
<Hash>AB-12345
</Hash>
<FileName> pictures_1.zip
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 15:16:34 +0200
</DateCreated>
<Info>Files added to AB-12345
</Info>
<Salt>sugar
</Salt>
-------------------------------------------------------
filename: HPM_4217.jpg
owner: john
datecreated: Mon 2014-02-24 14:23:49 +0200
info:
List of things
-------------------------------------------------------
filename: UIH_9754.jpg
owner: john
datecreated: Mon 2014-02-24 12:33:15 +0200
info:
Coffee-break
</FileHash>
-------------------------------------------------------
<FileHash>
<Hash>CD-78954
</Hash>
<FileName> pictures_2.zip
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 15:16:34 +0200
</DateCreated>
<Info>Files added to CD-78954
</Info>
<Salt>skyfall
</Salt>
-------------------------------------------------------
filename: PIC_789.jpg
owner: john
datecreated: Mon 2014-02-24 14:23:49file +0200
info:
Transformer
-------------------------------------------------------
filename: PIC_789.jpg
owner: john
datecreated: Mon 2014-02-24 12:33:15 +0200
info:
Fiji Island
</FileHash>
-------------------------------------------------------
<FileHash>
<Hash>EF-45645
</Hash>
<FileName> pictures_3.zip
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 15:16:34 +0200
</DateCreated>
<Info>Files added to EF-45654
</Info>
<Salt>jigsaw
</Salt>
-------------------------------------------------------
filename: IMG_704.jpg
owner: john
datecreated: Mon 2014-02-24 14:23:49 +0200
info:
Vermount Mountains
-------------------------------------------------------
filename: IMG_9741.jpg
owner: john
datecreated: Mon 2014-02-24 12:33:15 +0200
info:
New York
</FileHash>
-------------------------------------------------------
Step two
Find string:
filename:(?<filename>.+)\s*owner:(?<owner>.+)\s*datecreated:(?<datecreated>.+)\s*info:\s+(?<info>.+)
Replace string:
<FileInfo>
<FileName>${filename}</FileName>
<Owner>${owner}</Owner>
<DateCreated>${datecreated}</DateCreated>
<Info>${info}</Info>
</FileInfo>
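Step two in the same Python style; again only a sketch, with `(?P<name>...)` and `\g<name>` standing in for the .NET syntax. The input string is a trimmed excerpt of the step-one result above (only one child record kept), and `step_two` is a hypothetical helper name.

```python
import re

# Step two: wrap each remaining plain-text child record in <FileInfo>.
# "filename:" (lowercase, with a colon) does not occur inside the <FileName>
# tags produced by step one, so only the child records match.
STEP_TWO_FIND = re.compile(
    r"filename:(?P<filename>.+)\s*owner:(?P<owner>.+)\s*"
    r"datecreated:(?P<datecreated>.+)\s*info:\s+(?P<info>.+)"
)

STEP_TWO_REPLACE = r"""<FileInfo>
<FileName>\g<filename></FileName>
<Owner>\g<owner></Owner>
<DateCreated>\g<datecreated></DateCreated>
<Info>\g<info></Info>
</FileInfo>"""


def step_two(text):
    return STEP_TWO_FIND.sub(STEP_TWO_REPLACE, text)


# Trimmed excerpt of a step-one result.
AFTER_STEP_ONE = """\
<FileHash>
<Hash>AB-12345</Hash>
<FileName> pictures_1.zip</FileName>
<Salt>sugar</Salt>
-------------------------------------------------------
filename: HPM_4217.jpg
owner: john
datecreated: Mon 2014-02-24 14:23:49 +0200
info:
List of things
</FileHash>
"""

print(step_two(AFTER_STEP_ONE))
```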
Result after step two:
-------------------------------------------------------
<FileHash>
<Hash>AB-12345
</Hash>
<FileName> pictures_1.zip
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 15:16:34 +0200
</DateCreated>
<Info>Files added to AB-12345
</Info>
<Salt>sugar
</Salt>
-------------------------------------------------------
<FileInfo>
<FileName> HPM_4217.jpg
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 14:23:49 +0200
</DateCreated>
<Info>List of things
</Info>
</FileInfo>
-------------------------------------------------------
<FileInfo>
<FileName> UIH_9754.jpg
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 12:33:15 +0200
</DateCreated>
<Info>Coffee-break
</Info>
</FileInfo>
</FileHash>
-------------------------------------------------------
<FileHash>
<Hash>CD-78954
</Hash>
<FileName> pictures_2.zip
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 15:16:34 +0200
</DateCreated>
<Info>Files added to CD-78954
</Info>
<Salt>skyfall
</Salt>
-------------------------------------------------------
<FileInfo>
<FileName> PIC_789.jpg
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 14:23:49file +0200
</DateCreated>
<Info>Transformer
</Info>
</FileInfo>
-------------------------------------------------------
<FileInfo>
<FileName> PIC_789.jpg
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 12:33:15 +0200
</DateCreated>
<Info>Fiji Island
</Info>
</FileInfo>
</FileHash>
-------------------------------------------------------
<FileHash>
<Hash>EF-45645
</Hash>
<FileName> pictures_3.zip
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 15:16:34 +0200
</DateCreated>
<Info>Files added to EF-45654
</Info>
<Salt>jigsaw
</Salt>
-------------------------------------------------------
<FileInfo>
<FileName> IMG_704.jpg
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 14:23:49 +0200
</DateCreated>
<Info>Vermount Mountains
</Info>
</FileInfo>
-------------------------------------------------------
<FileInfo>
<FileName> IMG_9741.jpg
</FileName>
<Owner> john
</Owner>
<DateCreated> Mon 2014-02-24 12:33:15 +0200
</DateCreated>
<Info>New York
</Info>
</FileInfo>
</FileHash>
-------------------------------------------------------
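With the text in this shape, the relational data can be pulled back out with an XML query. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the values contain no bare `&` or `<` (those would need escaping first). Because the output has several top-level `<FileHash>` elements plus separator lines, it is wrapped in a made-up `<Root>` element before parsing; `file_rows` is a hypothetical helper name and the input is a trimmed excerpt of the result above.

```python
import xml.etree.ElementTree as ET


def file_rows(result_text):
    # Wrap in a single (hypothetical) root element so the fragment parses;
    # the dashed separator lines simply become ignorable text.
    root = ET.fromstring("<Root>" + result_text + "</Root>")
    rows = []
    for file_hash in root.iter("FileHash"):
        # .strip() removes the spaces and line breaks around each value.
        hash_value = file_hash.findtext("Hash", default="").strip()
        for info in file_hash.findall("FileInfo"):
            rows.append((
                hash_value,
                info.findtext("FileName", default="").strip(),
                info.findtext("Info", default="").strip(),
            ))
    return rows


# Trimmed excerpt of the step-two result.
RESULT = """\
-------------------------------------------------------
<FileHash>
<Hash>AB-12345
</Hash>
<Salt>sugar
</Salt>
-------------------------------------------------------
<FileInfo>
<FileName> HPM_4217.jpg
</FileName>
<Info>List of things
</Info>
</FileInfo>
-------------------------------------------------------
<FileInfo>
<FileName> UIH_9754.jpg
</FileName>
<Info>Coffee-break
</Info>
</FileInfo>
</FileHash>
-------------------------------------------------------
"""

for row in file_rows(RESULT):
    print(row)
```

Each child file becomes one (hash, file name, info) row, joined to its parent archive by the hash.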
prashant
Super Simple Software
Software Development, Internet Marketing, SEO and Academic Projects
--
You received this message because you are subscribed to the Google Groups "Regex" group.
Everyone is new at some point, so there is nothing to worry about. :-)
I work entirely in .NET; I don't even know the 'p' of Python.
I think this page will help you with Python regex usage and syntax: http://docs.python.org/2/howto/regex.html
Look for the 'Search and Replace' section.
What you are looking for, getting a list, is quite easy in .NET: it simply needs the regex library's Matches() function, which returns a collection of all matches found in the given string. You need to find its equivalent in Python.
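In Python, the closest counterparts to .NET's `Regex.Matches()` are `re.finditer()`, which yields one match object per non-overlapping match, and `re.findall()`, which returns a list of strings (or tuples, when the pattern has several groups). A minimal sketch that collects every file record into a list of dicts, using the step-two find string rewritten with Python's `(?P<name>...)` groups; `all_records` is a hypothetical name and the input is an excerpt of the sample data.

```python
import re

RECORD = re.compile(
    r"filename:(?P<filename>.+)\s*owner:(?P<owner>.+)\s*"
    r"datecreated:(?P<datecreated>.+)\s*info:\s+(?P<info>.+)"
)


def all_records(text):
    # finditer yields a Match per record; groupdict() maps group names to the
    # captured text, which is stripped of the leading space after each colon.
    return [
        {name: value.strip() for name, value in m.groupdict().items()}
        for m in RECORD.finditer(text)
    ]


TEXT = """\
filename: HPM_4217.jpg
owner: john
datecreated: Mon 2014-02-24 14:23:49 +0200
info:
List of things
-------------------------------------------------------
filename: UIH_9754.jpg
owner: john
datecreated: Mon 2014-02-24 12:33:15 +0200
info:
Coffee-break
"""

for record in all_records(TEXT):
    print(record)
```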
Sorry, I was on a weekend vacation, so I didn't reply earlier.
Supersimplesoft.com
On Mar 30, 2014 6:40 PM, "Paolo Aciano" <paolo....@gmail.com> wrote:
Also, I was wondering whether it would be possible to just split the file on the ^-+ at the beginning of a new line and put the pieces into a list instead. I know it is not a good approach for large files, but for this small example it could be just fine. How would you split it like that with a regex and put it into a list?
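Splitting on the dashed lines can be done with `re.split()` and the `re.MULTILINE` flag, which makes `^` and `$` match at every line boundary. A sketch, assuming a separator is a line made only of dashes (optionally followed by spaces or tabs); `split_records` is a hypothetical name. If the child records are indented, their indented separator lines would not match `^-+` and would stay inside the parent block.

```python
import re

SEPARATOR = re.compile(r"^-+[ \t]*$", re.MULTILINE)


def split_records(text):
    # Split on whole lines of dashes, then drop the empty pieces that appear
    # before the first separator and after the last one.
    pieces = SEPARATOR.split(text)
    return [piece.strip() for piece in pieces if piece.strip()]


TEXT = """\
-------------------------------------------------------
filename: HPM_4217.jpg
owner: john
-------------------------------------------------------
filename: UIH_9754.jpg
owner: john
"""

print(split_records(TEXT))
```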
On 30. 3. 2014 14:41 "Paolo Aciano" <paolo....@gmail.com> wrote:
Hi Prashant,
could you also show how to do the find and replace in Python? At first I thought I could do it from the description you gave me, but I am struggling with it. As you already guessed, I am new to Python. I have been searching on Google, but I must be doing something wrong. I have already been playing with the re module, re.compile and re.sub, but maybe I am looking in the wrong place?
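The find/replace strings in the post use .NET syntax, so they will not work in Python's `re` as-is: named groups have to be written `(?P<name>...)` and replacement references `\g<name>`. A small sketch that converts those two forms and then runs the step-two find/replace with `re.compile` and `sub`; both converter functions are hypothetical helpers and only handle these two constructs (not, for example, .NET's `$1` or `$$` in replacements).

```python
import re


def dotnet_pattern_to_python(pattern):
    # (?<name>...) -> (?P<name>...). The lookahead requires a name character
    # after "<", so lookbehinds such as (?<=...) and (?<!...) are left alone.
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pattern)


def dotnet_replacement_to_python(replacement):
    # ${name} -> \g<name>. A function replacement is inserted literally,
    # so the backslash is not reinterpreted by re.sub.
    return re.sub(r"\$\{(\w+)\}", lambda m: "\\g<" + m.group(1) + ">", replacement)


# Step-two find and replace strings, copied from the post.
FIND = (r"filename:(?<filename>.+)\s*owner:(?<owner>.+)\s*"
        r"datecreated:(?<datecreated>.+)\s*info:\s+(?<info>.+)")
REPLACE = """<FileInfo>
<FileName>${filename}</FileName>
<Owner>${owner}</Owner>
<DateCreated>${datecreated}</DateCreated>
<Info>${info}</Info>
</FileInfo>"""

pattern = re.compile(dotnet_pattern_to_python(FIND))
template = dotnet_replacement_to_python(REPLACE)

text = ("filename: HPM_4217.jpg\nowner: john\n"
        "datecreated: Mon 2014-02-24 14:23:49 +0200\ninfo:\nList of things\n")
print(pattern.sub(template, text))
```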
Paolo
On 27. 3. 2014 19:28 "Prashant Patole" <prashan...@gmail.com> wrote:
--
Prashant. 9423968815. SSS.