Insert CLOB data into MongoDB using Python


raki42

Dec 19, 2014, 1:02:20 PM
to mongod...@googlegroups.com
Hi

I am migrating data from Oracle to MongoDB using Python. While migrating, I can read the CLOB object using clob.read(), but inserting the result into MongoDB throws the following exception:
<class 'bson.errors.InvalidStringData'>
Traceback (most recent call last):
  File "test.py", line 39, in <module>
    db.test234.insert(i)
  File "C:\Python27\lib\site-packages\pymongo\collection.py", line 409, in insert
    gen(), check_keys, self.uuid_subtype, client)
InvalidStringData: strings in documents must be valid UTF-8: 'Malicious Attack Driver\r\n                            -----------------------\r\n\r\nThis is an effort to (Malicious attack
 driver) comprising of wrapper routines to provide test script infrastructure to  run different attack tools,vulnerability scanners,hacker tools such as . The objective
is to provide common APIs across all the protocols which can run the attack/test from a remote.

The Oracle column type is:
('REVIEW_DESCRIPTION', <type 'cx_Oracle.CLOB'>, -1, 4000, 0, 0, 0)

The code snippet is as follows:

import sys
import traceback

import cx_Oracle
from pymongo import MongoClient

mongoclient = MongoClient('localhost', 27017)
db = mongoclient['XYZ']

oracleConnection = cx_Oracle.connect('xyz/xyz1@dtabase')
oracleCursor = oracleConnection.cursor()
oracleCursor.execute("select review_description from table where id = 49390")

def getRows():
    """ returns cx_Oracle rows as dicts """
    # The column name is the first field of each cursor.description entry.
    colnames = []
    for i in oracleCursor.description:
        print i
        colnames.append(i[0])
        print colnames

    for row in oracleCursor:
        rows = []
        for i in row:
            try:
                # LOB values expose read(); plain values do not.
                rows.append(i.read())
            except AttributeError:
                rows.append(i)
        yield dict(zip(colnames, rows))

data = getRows()
for i in data:
    try:
        db.test234.insert(i)
    except Exception, err:
        print sys.exc_info()[0]
        traceback.print_exc()
        quit()

I checked many forums but could not find an exact solution for this issue. I tried options such as encoding the CLOB data, which still threw the same exception.

Can anyone help me resolve this issue?

Bernie Hackett

Dec 19, 2014, 3:23:42 PM
to mongod...@googlegroups.com
Since this is CLOB data, you can work around it by using bson.binary.Binary.
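
A minimal sketch of that workaround, assuming the CLOB arrives as a raw byte string (as clob.read() returns under Python 2.7 in this thread). The helper name row_to_document is hypothetical and not part of PyMongo or cx_Oracle; the fallback to plain bytes is there only so the sketch runs where PyMongo is not installed.

```python
try:
    from bson.binary import Binary  # ships with PyMongo
except ImportError:
    # Stand-in so the sketch still runs without PyMongo installed.
    Binary = bytes


def row_to_document(colnames, row):
    """Build a dict for insert(), wrapping raw byte values in Binary.

    Hypothetical helper: not part of PyMongo or cx_Oracle.
    """
    doc = {}
    for name, value in zip(colnames, row):
        if hasattr(value, "read"):      # LOB-like object, e.g. cx_Oracle.CLOB
            value = value.read()
        if isinstance(value, bytes):    # stored as BSON binary, so no UTF-8 check
            value = Binary(value)
        doc[name] = value
    return doc
```

In the posted script this would replace the dict(zip(colnames, rows)) step. The trade-off is that the field is stored as binary data rather than as a string, so it cannot be matched with string queries.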


I'm not sure why this particular string is causing that error. It may be something about the utf-8 checker in the C extensions. You could try without the C extensions and see if that makes a difference.

Bernie Hackett

Dec 19, 2014, 3:40:26 PM
to mongod...@googlegroups.com
For what it's worth, I just tried to insert that string using PyMongo, both with and without the extensions and both as a str and a unicode, and no error occurred. There must be something about the data we're not seeing from just the repr in the exception.
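
One way to see what the repr hides is to locate the first byte sequence that fails UTF-8 decoding. The following is a rough sketch, assuming the value read from the CLOB is a byte string; find_invalid_utf8 is a hypothetical helper name, and the sample bytes in the usage note are invented for illustration.

```python
def find_invalid_utf8(data, context=20):
    """Return (offset, bad_bytes, surrounding_bytes) for the first
    invalid UTF-8 sequence in a byte string, or None if it decodes cleanly.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the offending bytes.
        lo = max(exc.start - context, 0)
        return exc.start, data[exc.start:exc.end], data[lo:exc.end + context]
    return None
```

For example, find_invalid_utf8(b"defaults to \xfdroot\xfd") reports offset 12 and the byte b"\xfd", while a pure ASCII string returns None.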

raki42

Dec 20, 2014, 12:52:49 PM
to mongod...@googlegroups.com
Hi Bernie Hackett

Thank you very much for the reply. Here is the CLOB data that I tried to upload to the Mongo database. It is just a string, which shows all the escape characters when displayed on the console.

"Malicious Attack Driver
                            -----------------------

This is an effort to develop a Tcl package (Malicious attack driver) comprising of wrapper routines to provide test script infrastructure to  run different attack tools,vulnerability scanners,hacker tools such as Codenomicon,Nessus etc.. on a      . The objective is to provide common APIs across all the protocols which can run the attack/test from a remote machine connected to the       device , and check the health of the    device after the attack by verifying console responsivesness, multiple ping sessions in a loop and comparing the process CPU/Interrupt CPU and memory utilization before and after the test. package for Codenomicon, a robustness testing tool.  In the future, support will added to cover other vulnerability testing tools like Nessus. Initially Codenomicon attack pack will be targeted and down the line other tools will be added.



Overview of  Codenomicon
-------------------------

Codenomicon is a tool that can be used to test security flaws in the protocols
Codenomicon provides automated tools with a systematic approach to test the      . The java based tool can simulate numerous protocol messages containing exceptional elements simulating malicious attacks with various protocols such as  such as TCP, BGP, TLS, Radius, Http, Ipv4, Ipv6, UDP, NTP, SSH,GRE,SIP,TACACS etc. It has both a GUI and a command line interface. 


Requirements
------------

Attack machine ----------|----------   
     |
     |
     
     |
     |
  ATS machine

The package routines need to connect to the attack machine and launch the attacks to the       device. This involves control library routines. The attack machine requires java and does not work with all jdk versions.




                                APIs in the package
                                -------------------

1. mad::init_params  API
========================
This initializing API should be called for to setup test environment such as attack type, attack machine, login/password and other optional parameters for the whole bunch of tests following it.
 

Mandatory Parameters:
--------------------
-attack_type   : Type of attack (codenomicon)
-attack machine name     : Name of the machine where codenomicon exists
-passwd : Password to reach attack machine
-javapath                : Path of the java executable.


Optional parameters:
--------------------
-user : User name, defaults to ýrootý
prompt                  : Shell prompt of the attack machine

Returns:
-------
1= success
0= failure 


2. mad::run_test  API
======================
This API is called to run tests from the protocol suite. This API opens Ssh connection to the attack machine and runs a single test or a range of tests specified. Since jar file options varies widely, "-params" is introduced which allows users to specify the parameters. This API will determine the initial memory and CPU utilization before starting the tests.


Mandatory Parameters:
---------------------
-dutname                : Name of the DUT
-params : parameters supplied to jar file containing the tests

Returns:
--------
0= failure (incorrect parameters in ýparams, unable to connect to attack machine)
1= success

The output of the test run can be accessed by mad::testrun_buffer


3. mad::check_health  API
=========================
This API checks the state of the    device after bombarding it with attacks. The following is the flow:

1. Check the responsiveness of  device by running ýshow versioný
2. Pings the device from the attack machine ýpingsý times
3. Checks CPU utilization and memory leaks by running show process cpu |  include CPU util and show mem | include Processor  and compare with that run before the attack

Mandatory Parameters:
---------------------
-dutname : Name of the    DUT
-target_ip : IP address of the interface on DUT
-mem_threshold : % Max allowed increase in memory
-cpu_threshold : % Max allowed increase in CPU utilization
-ping_params            : arguments passed to ping 

Optional Parameters:
--------------------
-pings : Max number of times the DUT is pinged, defaulted to 3

Returns:
--------
0=failure  (Either one of the steps failed)
1=success


4. mad::get_tests  API
======================
This API will return the total number of tests that exist in a protocol suite for a given attack 

Mandatory Parameters:
---------------------
-jarfile : The name of the jar file which contain the tests

Returns: 
--------
Total no of tests if success, otherwise 0"

Regards
Rakesh
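
The pasted text shows ý where quotation marks would be expected (for example around root and show version), which hints that the CLOB holds bytes in a single-byte encoding rather than UTF-8. If the field should stay a searchable string instead of Binary, one option is to decode the bytes to unicode before inserting. A minimal sketch, assuming Windows-1252 as the fallback encoding (a guess, not confirmed anywhere in this thread); clob_to_unicode is a hypothetical helper name.

```python
def clob_to_unicode(raw, fallback="cp1252"):
    """Decode CLOB bytes to a unicode string that BSON will accept.

    Tries UTF-8 first, then the fallback encoding. The cp1252 default
    is an assumption; use the encoding the Oracle data was written in.
    """
    if not isinstance(raw, bytes):
        return raw  # already unicode, or not a string at all
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes undefined in the fallback encoding become U+FFFD.
        return raw.decode(fallback, "replace")
```

Note that under Python 2, calling .encode('utf-8') on a byte string first decodes it as ASCII implicitly, so encoding alone does not repair such data; the bytes have to be decoded with the right source encoding.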