Working with a form that is not a form


Stephen M. Olds

Mar 11, 2012, 1:57:44 AM
to beauti...@googlegroups.com
I have the following HTML on a logon page:

<div id="ctl00_ContentPlaceHolder1_pnlLogin">
    <table width="100%" cellpadding="5">
        <tr>
            <td width="20%"  align="right">Username or Email:</td>
            <td>
                <input name="ctl00$ContentPlaceHolder1$txtUserName" type="text" id="ctl00_ContentPlaceHolder1_txtUserName" />
            </td>
            <td rowspan="4"></td>
        </tr>
        <tr>
            <td align="right">Password:</td>
            <td>
                <input name="ctl00$ContentPlaceHolder1$txtPassword" type="password" id="ctl00_ContentPlaceHolder1_txtPassword" />
            </td>
        </tr>
        <tr>
            <td>&nbsp;</td>
            <td><input id="ctl00_ContentPlaceHolder1_chkRemember" type="checkbox" name="ctl00$ContentPlaceHolder1$chkRemember" /> Remember Username<br />
            <input id="ctl00_ContentPlaceHolder1_chkKeepLoggedIn" type="checkbox" name="ctl00$ContentPlaceHolder1$chkKeepLoggedIn" /> Keep me logged in. (If you have billing payment info stored in your account, you cannot use this preference)</td>
        </tr>
        <tr>
            <td>&nbsp;</td>
            <td>
                <input type="submit" name="ctl00$ContentPlaceHolder1$btnLogin" value="Sign In" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$btnLogin&quot;, &quot;&quot;, true, &quot;Login&quot;, &quot;&quot;, false, false))" id="ctl00_ContentPlaceHolder1_btnLogin" class="button" />
            </td>
        </tr>
    </table>
</div>


Using the following code I can isolate the username, password, and submit fields:

#!/usr/bin/env python

from bs4 import BeautifulSoup
from mechanize import Browser

# Fetch the sign-in page with mechanize, ignoring robots.txt.
br = Browser()
br.set_handle_robots(False)

response = br.open('https://www.contentparadise.com/signin.aspx')
page = response.read()

# Parse the page; naming the parser explicitly keeps results consistent
# across machines that have different parsers installed.
soup = BeautifulSoup(page, 'html.parser')

# The login controls are grouped inside this <div>.
logon_form = soup.find('div', {'id' : 'ctl00_ContentPlaceHolder1_pnlLogin'})

userID = logon_form.find('input', {'id' : 'ctl00_ContentPlaceHolder1_txtUserName'})
userPW = logon_form.find('input', {'id' : 'ctl00_ContentPlaceHolder1_txtPassword'})
submitBtn = logon_form.find('input', {'id' : 'ctl00_ContentPlaceHolder1_btnLogin'})

# Dump the three tags to a file for inspection; the file is closed
# automatically when the block exits.
with open('pf.txt', 'w') as pf:
    pf.write(str(userID) + '\n\n')
    pf.write(str(userPW) + '\n\n')
    pf.write(str(submitBtn) + '\n\n')


pf.txt shows me:

<input id="ctl00_ContentPlaceHolder1_txtUserName" name="ctl00$ContentPlaceHolder1$txtUserName" type="text"/>

<input id="ctl00_ContentPlaceHolder1_txtPassword" name="ctl00$ContentPlaceHolder1$txtPassword" type="password"/>

<input class="button" id="ctl00_ContentPlaceHolder1_btnLogin" name="ctl00$ContentPlaceHolder1$btnLogin" onclick='javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$ContentPlaceHolder1$btnLogin", "", true, "Login", "", false, false))' type="submit" value="Sign In"/>


How do I set the values of the ID & PW and then click the submit button to log on?


Leonard Richardson

Mar 11, 2012, 10:44:34 AM
to beauti...@googlegroups.com
Stephen,

Someone on this list might be able to help you, but this is a question
about Mechanize, so the Beautiful Soup discussion group is not the
most effective place to get help.

There's a section in the Mechanize FAQ that might help you
("JavaScript is messing up my web-scraping. What do I do?"):

http://wwwsearch.sourceforge.net/mechanize/faq.html#general

The Q&A site Stack Overflow is probably the best venue for this kind
of question.

http://stackoverflow.com/

I searched for "mechanize javascript" on Stack Overflow, and found
many related discussions:

http://stackoverflow.com/questions/5793414/mechanize-and-javascript
http://stackoverflow.com/questions/2558856/mechanize-javascript
http://stackoverflow.com/questions/4928181/python-mechanize-submit-custom-form
http://stackoverflow.com/questions/802225/how-do-i-use-mechanize-to-process-javascript
http://stackoverflow.com/questions/6417801/how-to-properly-use-mechanize-to-scrape-ajax-sites
http://stackoverflow.com/questions/4114219/scraping-site-that-requires-javascript-enabled-with-mechanize-beautifulsoup-p

Leonard
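
For readers landing on this thread: Beautiful Soup only parses the page, so it cannot fill in or submit anything by itself. Submitting a login normally means sending the inputs' name/value pairs back to the server as a POST body. The sketch below (Python 3, standard library only) illustrates that idea and is not a confirmed working login for this site. It assumes that the page wraps everything in one outer form, as ASP.NET WebForms pages usually do, and that hidden fields such as __VIEWSTATE must be echoed back. SAMPLE_PAGE, its dummy __VIEWSTATE value, InputCollector, and build_login_post are hypothetical names made up for this illustration; only the three ctl00$... field names come from the HTML in the question.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode

# Field names copied from the HTML in the question.
USER_FIELD = "ctl00$ContentPlaceHolder1$txtUserName"
PASS_FIELD = "ctl00$ContentPlaceHolder1$txtPassword"
LOGIN_BUTTON = "ctl00$ContentPlaceHolder1$btnLogin"

# Hypothetical, trimmed stand-in for the real page. The hidden __VIEWSTATE
# input and its value are assumptions; they do not appear in the question.
SAMPLE_PAGE = """
<form method="post" action="signin.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dummy-viewstate" />
  <input name="ctl00$ContentPlaceHolder1$txtUserName" type="text" />
  <input name="ctl00$ContentPlaceHolder1$txtPassword" type="password" />
  <input type="checkbox" name="ctl00$ContentPlaceHolder1$chkRemember" />
  <input type="submit" name="ctl00$ContentPlaceHolder1$btnLogin" value="Sign In" />
</form>
"""


class InputCollector(HTMLParser):
    """Collect <input> name/value pairs roughly the way a browser submits them."""

    def __init__(self):
        super().__init__()
        self.fields = {}          # text, password, hidden, checked boxes
        self.submit_buttons = {}  # submit buttons, sent only when "clicked"

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if not name:
            return  # inputs without a name are never submitted
        kind = (attributes.get("type") or "text").lower()
        value = attributes.get("value") or ""
        if kind == "submit":
            self.submit_buttons[name] = value
        elif kind in ("checkbox", "radio"):
            # Unchecked boxes are left out of the submission entirely.
            if "checked" in attributes:
                self.fields[name] = value or "on"
        else:
            self.fields[name] = value


def build_login_post(html, username, password):
    """Return a URL-encoded POST body with the credentials filled in."""
    collector = InputCollector()
    collector.feed(html)
    data = dict(collector.fields)  # keeps hidden fields such as __VIEWSTATE
    data[USER_FIELD] = username
    data[PASS_FIELD] = password
    # Including the button's name/value is what "clicking" it amounts to.
    data[LOGIN_BUTTON] = collector.submit_buttons.get(LOGIN_BUTTON, "Sign In")
    return urlencode(data).encode("ascii")


if __name__ == "__main__":
    body = build_login_post(SAMPLE_PAGE, "alice", "s3cret")
    for key, values in parse_qs(body.decode("ascii")).items():
        print(key, "=", values[0])
```

The returned bytes would still have to be POSTed to the form's action URL with a client that keeps cookies. Since the script in the question already uses mechanize, its own form handling (br.select_form followed by br.submit) is probably the shorter route, provided the server accepts a submission made without running the button's JavaScript.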
