Re: [mongodb-user] How to store HTML data in MongoDB?

Scott Hernandez

Aug 27, 2012, 10:39:21 AM
to mongod...@googlegroups.com
That should work just fine. Can you check the values you are using and
make sure they are what you expect? Or if you can provide a sample
showing the problem that would help.
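
One way to check the value right before it goes to the driver is to log its size. This is only a rough sketch: `ContentCheck` and `describe` are made-up names, not part of the Java driver, and the comparison against the 16 MB BSON document limit looks at the string alone and ignores the other fields in the document.

```java
import java.nio.charset.StandardCharsets;

final class ContentCheck {
    // MongoDB's BSON document size limit (16 MB).
    static final int MAX_BSON_BYTES = 16 * 1024 * 1024;

    // Hypothetical helper: summarizes the string that is about to be stored.
    static String describe(String content) {
        if (content == null) {
            return "content is null";
        }
        int chars = content.length();
        int bytes = content.getBytes(StandardCharsets.UTF_8).length;
        // Rough check only: counts the string's bytes, not the whole document.
        boolean fits = bytes < MAX_BSON_BYTES;
        return "chars=" + chars + " utf8Bytes=" + bytes + " fitsInDocument=" + fits;
    }

    public static void main(String[] args) {
        // Non-ASCII characters take more than one byte in UTF-8.
        System.out.println(describe("<html><body>H\u00fcrriyet</body></html>"));
    }
}
```

Calling `describe(content)` just before `collection.update(...)` and comparing it with the length of "cnt" read back from the collection should show whether the value is already empty before it reaches the driver.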

On Mon, Aug 27, 2012 at 9:41 AM, Sezgin Riggs <sez...@gmail.com> wrote:
> I'm trying to crawl the web and store HTML data in MongoDB using Java.
> Unfortunately, while storing the data, the MongoDB driver nulls it and
> stores an empty field for the HTML data.
>
> When I get the first 500 (or 1000) chars of HTML data, I can store/
> upsert it without a problem so I think something in HTML (or
> Javascript in it) corrupts the command sent to MongoDB and MongoDB
> stores empty data instead of HTML. Should I use something else
> (escaping string? I saw there was a class for that in former driver
> versions) for storing HTML/JavaScript data?
>
> Here is my code snippet
>
> BasicDBObject savedDoc = new BasicDBObject();
> savedDoc.put("url_ID", objURL.get("_id"));
> savedDoc.put("cnt", content); //Content field
> savedDoc.put("st", 0);
> collection.update(new BasicDBObject().append("url_ID",
> objURL.get("_id")), savedDoc, true, false);
>
>
> BTW: I'm using 2.8.0 drivers for Java...
>
>
> Best Regards,
> Sezgin

Scott Hernandez

Aug 27, 2012, 12:28:59 PM
to mongod...@googlegroups.com
I just tested this in the shell:

> var s = ""
> for(x=0; x< 50000; x++) {s = s + x}
> s.length
238890
> db.longstrings.insert({_id:1, cnt:s})
> var doc = db.longstrings.findOne()
> doc.cnt.length
238890
> db.longstrings.update({_id:1}, {cnt:s+"       "}, true)
> doc = db.longstrings.findOne()
> doc.cnt.length
238897

Seems to update just fine. Is this basically what your java app does?
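
If it helps to run the same test from Java, the string from the shell loop can be built as below. This is a sketch: `LongStringRepro` and `buildNumbers` are illustrative names, and the database calls are left out so the snippet runs without a server.

```java
final class LongStringRepro {
    // Concatenates the decimal integers 0..count-1, like the shell loop above.
    static String buildNumbers(int count) {
        StringBuilder sb = new StringBuilder();
        for (int x = 0; x < count; x++) {
            sb.append(x);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = buildNumbers(50000);
        // Prints 238890, the same length the shell reported for s.
        System.out.println(s.length());
        // Next step (not shown): store s as the "cnt" value using the same
        // update call as in the original snippet, read the document back,
        // and compare the stored length with s.length().
    }
}
```

Because this string is plain ASCII digits, a failure with it would point at length, while success would point at something in the HTML content itself.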


On Mon, Aug 27, 2012 at 11:21 AM, Sezgin Riggs <sez...@gmail.com> wrote:
> Hello Again,
>
> I've tried with contents 40,000 and 50,000 characters long. 40,000
> characters work fine, but 50,000 characters save an empty field for
> the document.
>
>
> Best regards,
> Sezgin
>
>
> On Aug 27, 6:02 pm, Sezgin Riggs <sezg...@gmail.com> wrote:
>> Hello Scott,
>>
>> I've checked the data in MongoDB and also the data passed to
>> collection.update(). All of it is correct except "cnt", the content
>> field that stores the raw HTML. When I truncate the content to its
>> first 2000 chars, the "cnt" field is updated with the correct data.
>>
>> As sample HTML data I'm using several news sites from Turkey:
>> www.hurriyet.com.tr and www.haberturk.com.
>>
>> Could it be a configuration mistake on the MongoDB server?
>>
>> Best Regards,
>> Sezgin