How to avoid duplicate insertion in MongoDB when multiple instances of a Tornado app are inserting items into the database?


avi

Jul 8, 2014, 10:20:48 AM
to python-...@googlegroups.com
This is more of a programming question than a problem with Tornado itself. By the way, am I allowed to post such questions in this group?

I have built a simple wish list app in Tornado. The user adds the URL of a product and the app keeps tracking its price. The flow is simple: the user logs in, pastes or types the URL into a form, and clicks submit. This makes a POST request to my server, and the URL is added to the database.

I am using AJAX for the submit, so it sends the request and refreshes the wishlist table on the page. However, this takes time, because my server verifies the URL and fetches its price, image, and other details. Users think the app is not working and tend to press 'submit' multiple times.

In the initial stages, I was talking directly to the Tornado server. When I pressed the submit button multiple times, Tornado would add the URL only once. I don't know how it managed to ignore the repeated requests, or how it figured out that the same request was already being processed. Since the problem never occurred, I never thought about it.

Now four instances of Tornado are running behind an Nginx server. So I guess Nginx passes the requests to different Tornado instances when submit is pressed multiple times.

So how do I avoid this?

  • I could create local storage in the browser for every session and maintain a list of URLs. When submit is pressed and the URL is already in the list, don't send the request. I would destroy this storage whenever the tab is closed.

  • Disable the submit button once the URL is submitted, until it receives a reply from the server.

  • Show a Facebook-style notification when submit is pressed and hope the user does not press submit again.

  • Configure the Nginx load balancer to work in ip-hash mode, so that the user is always served by the same instance of Tornado.

  • Maybe configure Nginx somehow so that it ignores identical POST requests, as the single instance of Tornado was doing earlier?

Here's the code in question (not sure if it actually matters): http://pastebin.com/u3ZsdmbZ

A. Jesse Jiryu Davis

Jul 8, 2014, 10:38:32 AM
to python-...@googlegroups.com
I see you're using PyMongo with Tornado. I can't recommend this approach since a long-running MongoDB operation will block an entire Tornado process from all work. (I maintain Motor, a non-blocking MongoDB driver for Tornado.) But PyMongo does have the advantage of avoiding race conditions.

In your code you query for the product, and if it's absent you insert it. This worked well with a single Tornado process because it handled one request at a time, and it couldn't be interrupted between querying and inserting. If it finds no document, there's still no document a moment later when it attempts to do the insert.

With multiple Tornado processes, however, a user can press the button twice quickly, and something like this happens:
  1. Process A queries, finds no product.
  2. Process B queries, finds no product.
  3. Process A inserts a document.
  4. Process B inserts a duplicate document.
I would:
  • Disable the "submit" button before beginning the AJAX POST. Make sure you show a spinner, and re-enable the button after a timeout or HTTP 500 or other error.
  • Create a unique compound index on the fields in the products collection that ought to be unique.
  • Decide how your python code is going to handle pymongo.errors.DuplicateKeyError should one occur. (Raising an informative error to the user might be fine.)
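
To make the race concrete, here is a minimal sketch that replays steps 1-4 against an in-memory stand-in for the collection. FakeCollection, the local DuplicateKeyError class, and interleaved_double_submit are hypothetical names made up for this illustration, not PyMongo APIs; a real deployment would rely on an actual unique index and on pymongo.errors.DuplicateKeyError.

```python
class DuplicateKeyError(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError (assumption for this sketch)."""


class FakeCollection:
    """In-memory stand-in for a collection, optionally unique on one field."""

    def __init__(self, unique_field=None):
        self.docs = []
        self.unique_field = unique_field

    def find_one(self, query):
        # Return the first document matching every key in the query, else None.
        for doc in self.docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None

    def insert(self, doc):
        # Mimic a unique index: reject a second document with the same value.
        field = self.unique_field
        if field is not None and self.find_one({field: doc[field]}) is not None:
            raise DuplicateKeyError("duplicate value for %r" % field)
        self.docs.append(doc)


def interleaved_double_submit(collection, url):
    """Replay steps 1-4: both processes query before either one inserts."""
    seen_by_a = collection.find_one({"url": url})  # 1. A queries, finds nothing
    seen_by_b = collection.find_one({"url": url})  # 2. B queries, finds nothing
    rejected = 0
    for seen in (seen_by_a, seen_by_b):            # 3. A inserts, 4. B inserts
        if seen is None:
            try:
                collection.insert({"url": url})
            except DuplicateKeyError:
                rejected += 1
    return len(collection.docs), rejected


url = "http://example.com/item"
without_index = interleaved_double_submit(FakeCollection(), url)
with_index = interleaved_double_submit(FakeCollection(unique_field="url"), url)
print(without_index)  # (documents stored, inserts rejected) with no index
print(with_index)     # same interleaving, but with the unique constraint
```

Without the unique constraint both inserts succeed and the duplicate gets through; with it, the second insert is rejected and the handler decides what to do next.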


--
You received this message because you are subscribed to the Google Groups "Tornado Web Server" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-tornad...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

avi

Jul 9, 2014, 3:56:05 AM
to python-...@googlegroups.com
Thank you very very much! 

I followed @A. Jesse Jiryu Davis's advice and here are the changes I made.

First I created a unique single-field index (I don't need a compound one, as _id is already indexed and I only wanted the url field in the product document to be unique):

product_db.create_index('url', unique=True, dropDups=True)

Do note that the call above will remove all duplicate documents that have the same url. If you don't want that, then do the following:

product_db.create_index('url', unique=True)

If any duplicate keys already exist, PyMongo will raise a DuplicateKeyError exception.

I changed my JavaScript code; now it disables the submit button when the AJAX request starts and re-enables it afterwards:

<script type="text/javascript">
    $(document).ready(function()
    {
        $("#product-add-form").on('submit', function()
        {
            var product_url = $("#product-url").val();
            console.log(product_url);
            $.ajax({
              type: "POST",
              url: "products",
              // Pass an object so jQuery URL-encodes the value
              data: {'product-url': product_url},
              success: function() {
                $('#product-table-div').load("/products #product-table-div");
              }
            });
            return false; // stop the normal form submission
        });
    })
    .ajaxStart(function(){
        // Disable the button while any AJAX request is in flight
        $("#prodadd-btn").attr("disabled", "disabled");
        NProgress.start();
    })
    .ajaxStop(function(){
        $("#prodadd-btn").removeAttr("disabled");
        NProgress.done();
    });
</script>

I also used NProgress to show a cool progress bar.

These changes should be sufficient to prevent a user from entering the same URL multiple times. However, there is a possibility that two users could enter the same URL at the same time. Now that I have created a unique single-field index, the second insertion will raise DuplicateKeyError. In that case I will just find that URL in the db and add it to the user. Here is the modified code:

import pymongo.errors
from bson.objectid import ObjectId

def post(self):
    user_email = self.get_secure_cookie('trakr')
    user_db = self.application.db.users
    product_db = self.application.db.products
    tracker_db = self.application.db.trackers # this has product_id to users_id
    product_url = self.get_argument('product-url', None)

    # ...

    product_doc = product_db.find_one({'url': url})
    if not product_doc:
        # ...
        # ...
        product_url, product_name, product_img_url, product_price = vendor_func_map[vendor_id](url)

        try:
            product_id = str(product_db.insert({
                # ...
                'url': url,
                'current_price': product_price,
            }))
        except pymongo.errors.DuplicateKeyError:
            # Another request inserted the same url first; reuse that document
            product_doc = product_db.find_one({'url': url})
            product_id = product_doc['_id']
    else:
        product_id = product_doc['_id']

    # $addToSet keeps tracked_products free of duplicates for this user
    user_db.update({'email_id': user_email}, {'$addToSet': {'tracked_products': ObjectId(product_id)}})
    # ...
    self.redirect('/products')

A. Jesse Jiryu Davis

Jul 9, 2014, 9:44:07 AM
to python-...@googlegroups.com
Cool. In this case there's no need for the first find_one; just assume the document is not in the collection, try to insert it, and catch the DuplicateKeyError. After all, if the first find_one returns None you haven't learned anything useful: the document wasn't in the collection a moment ago, but you still don't know for certain whether it's in the collection now.



avinash sajjanshetty

Jul 9, 2014, 11:03:45 AM
to python-...@googlegroups.com
No, actually if the URL is not present in the DB then I assume I am adding it for the first time. So I have to fetch its price, image, name, etc., and I cannot save it directly without making this HTTP request.

That's why I first check whether it exists in the DB or not.

Can you help me with another query related to MongoDB? I am working on this project, which is more suitable for an RDBMS. However, I have tried many times to learn RDBMS but was never able to understand it, so I gave up. With MongoDB it goes smoothly. Here is my code: http://codereview.stackexchange.com/questions/55674/e-commerce-product-price-tracker and I need some assistance optimizing the MongoDB-related code where I query the first collection, get some field from a document, and query the second collection with that field.

Basically, I would like to know whether it is possible to imitate a join type of operation.

A. Jesse Jiryu Davis

Jul 9, 2014, 7:38:10 PM
to python-...@googlegroups.com
Sorry, I'm not going to have time in the next few days to look at this in detail. Looks like the folks on Stack Exchange are helping, though. The short version is: to join collections in MongoDB, first query all the documents in one collection. Make a set of the _ids you need for the second collection, then use an $in query to find the documents in the second collection. Do the final join in Python using a dict.
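
A rough sketch of that join, using plain lists of dicts in place of the two collections so it runs without a database. The sample documents, the field names, and the helpers find_in and join_trackers_to_products are all made up for illustration; with PyMongo the find_in step would presumably be a query of the form products.find({'_id': {'$in': list(wanted_ids)}}).

```python
# Hypothetical sample data standing in for two collections.
trackers = [
    {"user": "a@example.com", "product_id": 1},
    {"user": "b@example.com", "product_id": 2},
    {"user": "c@example.com", "product_id": 1},
]
products = [
    {"_id": 1, "name": "Kettle"},
    {"_id": 2, "name": "Lamp"},
    {"_id": 3, "name": "Desk"},
]


def find_in(collection, field, values):
    """Stand-in for an $in query: keep documents whose field is in values."""
    return [doc for doc in collection if doc[field] in values]


def join_trackers_to_products(trackers, products):
    # 1. Collect the set of _ids needed from the second collection.
    wanted_ids = {tracker["product_id"] for tracker in trackers}
    # 2. Fetch all matching products in a single query instead of one per tracker.
    matched = find_in(products, "_id", wanted_ids)
    # 3. Index the products by _id, then do the final join with dict lookups.
    by_id = {product["_id"]: product for product in matched}
    return [
        {"user": tracker["user"], "product": by_id[tracker["product_id"]]["name"]}
        for tracker in trackers
        if tracker["product_id"] in by_id
    ]


joined = join_trackers_to_products(trackers, products)
for row in joined:
    print(row["user"], row["product"])
```

The point of the dict is that the second collection is queried once, and each tracker is then matched by a lookup rather than by a separate query.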



avinash sajjanshetty

Jul 9, 2014, 10:41:48 PM
to python-...@googlegroups.com
^thank you! I will try that 

:)