Importing Log files into MongoDB


Gopinath Rajee

Sep 22, 2013, 5:36:36 PM
to mongod...@googlegroups.com
All,
 
I have a set of files, each containing a number of records made up of Key=Value pairs. The key-value pairs are not consistent from record to record, since they differ depending on the log_type, such as FTP, Oracle, DB2, Firewall, or Exchange.
 
I'm trying to learn the various features of MongoDB, and I was wondering if these types of records can be imported into it. I know Mongo has an import utility, but I was wondering if someone could give me a kickstart.
 
grajee
 
date=2013-09-04 time=06:55:51 timezone="PST" device_name="Ciscoia"  src_ip=120.0.8.129 src_country_code= dst_ip=198.124.267.231 dst_country_code=USA protocol="TCP" log_type="Oracle"
date=2013-09-04 time=06:55:51 timezone="PST" device_name="Ciscoia"  src_ip=120.0.8.127 src_country_code= dst_ip=198.113.216.122 dst_country_code=USA protocol="UDP" log_type="FTP"

Charlie Page

Sep 23, 2013, 4:18:05 PM
to mongod...@googlegroups.com

Hi grajee!

 

mongoimport requires a file in CSV, TSV, or JSON format, so using a script to preprocess what you have above into one of those is a good approach.  A script can generically split each token into a key-value pair on either side of the equals sign (assuming the rest of the data is also in that form).  Storing the pairs in an array of key-value subdocuments should work, though with a massive number of pairs you could run into performance issues and/or the max document size (16 MB in 2.4).  A document in the database with this solution would look like this:

    { src_ip: "120.0.8.127", src_country_code: "USA", ...<common fields>, log_type: "FTP", kv: [{key: "ExchangeSpecific", value: "Over"}, ...<specific fields>] }
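
As a rough illustration, a Python sketch of that split might look like the following (the COMMON set of field names is an assumption about your logs, and shlex takes care of the quoted values):

    import shlex

    # Fields assumed (for illustration) to appear in every record.
    COMMON = {"date", "time", "timezone", "device_name", "src_ip",
              "src_country_code", "dst_ip", "dst_country_code",
              "protocol", "log_type"}

    def parse_line(line):
        # Common fields go to the top level; everything else into the kv array.
        doc = {"kv": []}
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            if key in COMMON:
                doc[key] = value
            else:
                doc["kv"].append({"key": key, "value": value})
        return doc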

 

However, a different data model might work even better.  One of Mongo's strengths is that different records can have different fields.  If the conversion script produced top-level document fields instead of a key-value-pair array, you would have a much easier document to work with.

With this approach you simply insert the document as it naturally appears (it may take some mental adjustment to accept how simple this is after years of RDBMS complications):

    { src_ip: "120.0.8.127", src_country_code: "USA", ...<common fields>, log_type: "FTP", ExchangeSpecific: "Over", ...<specific fields> }

Separating common and specific fields above is overkill and just for illustration, since with this model all fields sit at the document level anyway.
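
In case a concrete starting point helps, here is a rough Python sketch of that conversion, writing one JSON document per line (the file names here are placeholders):

    import json
    import shlex

    def line_to_document(line):
        # Split on whitespace (shlex honours the quoted values), then on the
        # first '='; every field lands at the top level of the document.
        doc = {}
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            doc[key] = value
        return doc

    with open("firewall.log") as src, open("logs.json", "w") as dst:
        for line in src:
            line = line.strip()
            if line:
                dst.write(json.dumps(line_to_document(line)) + "\n")

The resulting logs.json has one document per line, which is the shape mongoimport expects, so something like mongoimport --db logdb --collection logs --file logs.json (database and collection names assumed) would load it.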

 

If you have any questions please don’t hesitate to ask.

 

Best,

Charlie

Gopinath Rajee

Sep 23, 2013, 8:01:13 PM
to mongod...@googlegroups.com

On Sunday, September 22, 2013 5:36:36 PM UTC-4, Gopinath Rajee wrote:

Charlie Page

Sep 24, 2013, 2:37:23 PM
to mongod...@googlegroups.com
Hi grajee!

I would recommend preprocessing with a script, and Perl would work fine.  You could use JavaScript if you wanted, but I'm not sure it's well suited to that.  If I were doing it I'd use bash/Python to preprocess (merely because I have greater familiarity with those two).  Using a script to produce JSON-formatted documents and then inserting those is probably the easiest way to go.  If Perl is your preferred language, there is helpful Mongo Perl info here: http://docs.mongodb.org/ecosystem/drivers/perl/.
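
If you do end up in Python, the insert step itself is tiny.  A minimal pymongo sketch along those lines (connection details, file name, and the flat line_to_document() split are all placeholders for illustration):

    import shlex
    from pymongo import MongoClient

    def line_to_document(line):
        # One top-level field per key=value token, as discussed above.
        doc = {}
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            doc[key] = value
        return doc

    client = MongoClient("localhost", 27017)
    logs = client["logdb"]["logs"]

    with open("firewall.log") as src:
        for line in src:
            if line.strip():
                logs.insert(line_to_document(line.strip()))  # insert() is the 2.4-era API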

Using document fields instead of key/value pairs in arrays is generally the right way to go.  For more on data modeling I suggest checking out http://docs.mongodb.org/manual/core/data-modeling/.  It's hard to unlearn all that SQL normalization stuff, but this should help.

Best,
Charlie



Hi Charlie,
 
Thanks for the info. I'm a SQL Server DBA trying to learn MongoDB, hence the question while playing around with these log files.
 
In SQL Server, I loaded these records into a single column and wrote a SQL script to break them into the respective columns. The good thing is that each of these files is only 2 MB.
 
If I have to preprocess them, would I have to write a scripting routine (probably in Perl) and then load the preprocessed data into Mongo? Can't I use JavaScript to preprocess them? Does it have the features to process them?
 
I would preprocess them into document fields as opposed to key=value fields.
 
I'm not sure what you mean here. Can you please elaborate or point me in the right direction?
<<However, a different data model might work even better.  One of Mongo's strengths is that different records can have different fields.  If the conversion script produced top-level document fields instead of a key-value-pair array, you would have a much easier document to work with.>>

 

Thanks,

grajee
