Insert Clob data to MongoDB using Python

raki42

Dec 19, 2014, 1:02:20 PM

to mongod...@googlegroups.com

Hi

I am migrating data from Oracle to MongoDB using Python. During the migration I can read the CLOB object using clob.read(), but inserting it into MongoDB throws the following exception:

Traceback (most recent call last):
  File "test.py", line 39, in <module>
    db.test234.insert(i)
  File "C:\Python27\lib\site-packages\pymongo\collection.py", line 409, in insert
    gen(), check_keys, self.uuid_subtype, client)
InvalidStringData: strings in documents must be valid UTF-8: 'Malicious Attack Driver\r\n -----------------------\r\n\r\nThis is an effort to (Malicious attack driver) comprising of wrapper routines to provide test script infrastructure to run different attack tools,vulnerability scanners,hacker tools such as . The objective is to provide common APIs across all the protocols which can run the attack/test from a remote.

The Oracle column type is:

('REVIEW_DESCRIPTION', <type 'cx_Oracle.CLOB'>, -1, 4000, 0, 0, 0)

The code snippet is as follows:

import sys
import traceback

import cx_Oracle
from pymongo import MongoClient

mongoclient = MongoClient('localhost', 27017)
db = mongoclient['XYZ']

oracleConnection = cx_Oracle.connect('xyz/xyz1@dtabase')
oracleCursor = oracleConnection.cursor()
oracleCursor.execute("select review_description from table where id = 49390")

def getRows():
    """ returns cx_Oracle rows as dicts """
    colnames = []
    for i in oracleCursor.description:
        print i
        colnames.append(i[0])
    print colnames
    for row in oracleCursor:
        rows = []
        for i in row:
            try:
                # LOB values expose read(); plain values do not
                rows.append(i.read())
            except AttributeError:
                rows.append(i)
        yield dict(zip(colnames, rows))

data = getRows()
for i in data:
    try:
        db.test234.insert(i)
    except Exception:
        # print the exception type and full traceback, then stop
        print sys.exc_info()[0]
        traceback.print_exc()
        quit()

I have checked many forums but could not find an exact solution for this issue. I also tried options such as encoding the CLOB data, but the same exception was still thrown.

Can anyone help me resolve this issue?

Bernie Hackett

Dec 19, 2014, 3:23:42 PM

to mongod...@googlegroups.com

Since this is clob data, you can work around it by using bson.binary.Binary:

http://api.mongodb.org/python/current/api/bson/binary.html#bson.binary.Binary

I'm not sure why this particular string is causing that error. It may be something about the utf-8 checker in the C extensions. You could try without the C extensions and see if that makes a difference.
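
A minimal sketch of the Binary workaround described above. The helper name utf8_or_binary is hypothetical, not something from the thread or from PyMongo: it keeps a value as text when its bytes decode as UTF-8 and otherwise wraps the raw bytes in bson.binary.Binary. The bytes stand-in used when PyMongo is missing is only an assumption to let the sketch run on its own.

```python
try:
    from bson.binary import Binary  # ships with the PyMongo distribution
except ImportError:
    # Assumption for illustration only: fall back to plain bytes so the
    # sketch still runs where PyMongo is not installed.
    Binary = bytes


def utf8_or_binary(raw):
    """Return text for valid UTF-8 bytes, otherwise a Binary wrapper.

    Hypothetical helper; values that are not byte strings pass through.
    """
    if not isinstance(raw, bytes):
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: store the bytes untouched as BSON binary data.
        return Binary(raw)
```

In the posted script this would be applied to the result of i.read() before the value is appended to the row. A field stored as binary keeps every byte, but it can no longer be searched as text in MongoDB.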

Bernie Hackett

Dec 19, 2014, 3:40:26 PM

to mongod...@googlegroups.com

For what it's worth, I just tried to insert that string using PyMongo, both with and without the extensions and both as a str and a unicode, and no error occurred. There must be something about the data we're not seeing from just the repr in the exception.
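
One way to see what the repr is hiding is to scan the raw bytes for the positions where UTF-8 decoding fails. The sketch below uses only the standard library; the function name find_invalid_utf8 is hypothetical, and it assumes the value returned by clob.read() is a byte string.

```python
def find_invalid_utf8(data):
    """Return a list of (offset, byte) pairs where UTF-8 decoding fails.

    Hypothetical diagnostic helper; `data` must be a byte string.
    """
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice, so shift it by pos.
            start = pos + exc.start
            bad.append((start, data[start:start + 1]))
            pos = start + 1  # resume just past the offending byte
    return bad
```

Calling it on the value that fails to insert lists the byte offsets to inspect, for example with repr(data[offset - 20:offset + 20]).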

raki42

Dec 20, 2014, 12:52:49 PM

to mongod...@googlegroups.com

Hi Bernie Hackett

Thank you very much for the reply. Here is the CLOB data that I tried to upload to the Mongo database. It is just a string; the escape characters appear when it is displayed on the console.

"Malicious Attack Driver

-----------------------

This is an effort to develop a Tcl package (Malicious attack driver) comprising of wrapper routines to provide test script infrastructure to run different attack tools,vulnerability scanners,hacker tools such as Codenomicon,Nessus etc.. on a . The objective is to provide common APIs across all the protocols which can run the attack/test from a remote machine connected to the device , and check the health of the device after the attack by verifying console responsivesness, multiple ping sessions in a loop and comparing the process CPU/Interrupt CPU and memory utilization before and after the test. package for Codenomicon, a robustness testing tool. In the future, support will added to cover other vulnerability testing tools like Nessus. Initially Codenomicon attack pack will be targeted and down the line other tools will be added.

Overview of Codenomicon

-------------------------

Codenomicon is a tool that can be used to test security flaws in the protocols

Codenomicon provides automated tools with a systematic approach to test the . The java based tool can simulate numerous protocol messages containing exceptional elements simulating malicious attacks with various protocols such as such as TCP, BGP, TLS, Radius, Http, Ipv4, Ipv6, UDP, NTP, SSH,GRE,SIP,TACACS etc. It has both a GUI and a command line interface.

Requirements

------------

Attack machine ----------|----------

|

ATS machine

The package routines need to connect to the attack machine and launch the attacks to the device. This involves control library routines. The attack machine requires java and does not work with all jdk versions.

APIs in the package

-------------------

1. mad::init_params API

========================

This initializing API should be called for to setup test environment such as attack type, attack machine, login/password and other optional parameters for the whole bunch of tests following it.

Mandatory Parameters:

--------------------

-attack_type : Type of attack (codenomicon)

-attack machine name : Name of the machine where codenomicon exists

-passwd : Password to reach attack machine

-javapath : Path of the java executable.

Optional parameters:

--------------------

-user : User name, defaults to ýrootý

prompt : Shell prompt of the attack machine

Returns:

-------

1= success

0= failure

2. mad::run_test API

======================

This API is called to run tests from the protocol suite. This API opens Ssh connection to the attack machine and runs a single test or a range of tests specified. Since jar file options varies widely, "-params" is introduced which allows users to specify the parameters. This API will determine the initial memory and CPU utilization before starting the tests.

Mandatory Parameters:

---------------------

-dutname : Name of the DUT

-params : parameters supplied to jar file containing the tests

Returns:

--------

0= failure (incorrect parameters in ýparams, unable to connect to attack machine)

1= success

The output of the test run can be accessed by mad::testrun_buffer

3. mad::check_health API

=========================

This API checks the state of the device after bombarding it with attacks. The following is the flow:

1. Check the responsiveness of device by running ýshow versioný

2. Pings the device from the attack machine ýpingsý times

3. Checks CPU utilization and memory leaks by running show process cpu | include CPU util and show mem | include Processor and compare with that run before the attack

Mandatory Parameters:

---------------------

-dutname : Name of the DUT

-target_ip : IP address of the interface on DUT

-mem_threshold : % Max allowed increase in memory

-cpu_threshold : % Max allowed increase in CPU utilization

-ping_params : arguments passed to ping

Optional Parameters:

--------------------

-pings : Max number of times the DUT is pinged, defaulted to 3

Returns:

--------

0=failure (Either one of the steps failed)

1=success

4. mad::get_tests API

======================

This API will return the total number of tests that exist in a protocol suite for a given attack

Mandatory Parameters:

---------------------

-jarfile : The name of the jar file which contain the tests

Returns:

--------

Total no of tests if success, otherwise 0"

Regards

Rakesh
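
The text as pasted shows "ý" where quotation marks would be expected (for example around root and show version), which hints that the CLOB may hold bytes from a single-byte encoding rather than UTF-8. That is a guess, not something confirmed in this thread. Under that assumption, a sketch of decoding the value before the insert could look like this; the helper name to_unicode and the cp1252 fallback are both assumptions.

```python
def to_unicode(raw, fallback="cp1252"):
    """Decode bytes as UTF-8, falling back to a single-byte encoding.

    Hypothetical helper; the cp1252 default is an assumption about the
    source data, not a fact established in the thread.
    """
    if not isinstance(raw, bytes):
        return raw  # already text, or not a string at all
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # "replace" substitutes U+FFFD for bytes the fallback cannot map.
        return raw.decode(fallback, "replace")
```

If the fallback encoding is wrong the insert will still succeed, but some characters will come out as the wrong symbols, so it is worth checking a few decoded rows by eye.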
