A small doubt


mskd96

Jan 8, 2015, 12:33:22 PM
to wncc...@googlegroups.com
Can we write a program that can log in to some website and get details that one can only see after logging in?

I know about BeautifulSoup in Python, which can parse web pages to extract meaningful information, but can we really "log in" to a website using such software?

Regards,
Krishna Deepak

Gireesh Nayak

Jan 8, 2015, 1:24:59 PM
to wncc...@googlegroups.com
I had a similar need. Is it possible to crawl through websites, e.g. Flickr, Vimeo, etc., and get the contact details of authors?




--
Thanks & Regards
Gireesh K Nayak, 
+919742070092 (Bangalore)
+918454893412 (Mumbai)

Siddharth Bulia

Jan 8, 2015, 1:30:14 PM
to wncc...@googlegroups.com
You can use lynx to access the web from the terminal. Maybe with its help, you can write some code to log in and get what you want.

Regards
Siddharth Bulia


Saket Choudhary

Jan 8, 2015, 1:34:46 PM
to wncc...@googlegroups.com
On 8 January 2015 at 09:33, mskd96 <deepakr...@gmail.com> wrote:
> Can we write a program which can log in into some website, and can get
> details which one can only see after logging in to the website?
>

This is pretty much a solved problem.

> I do know about BeautifulSoup in python which can parse web pages to
> generate meaningful information but can we really "login" to some website
> using any such softwares??
>

Yes. See, for example: https://github.com/coursera-dl/coursera

Krishna Deepak

Jan 8, 2015, 1:57:51 PM
to wncc...@googlegroups.com
More interestingly, if there is a way to log in to ASC similarly, we can do stuff like finding which course has the best grading statistics and which prof gives good grades, so that it could help with registration :P

Saket Choudhary

Jan 8, 2015, 2:40:44 PM
to wncc...@googlegroups.com
On 8 January 2015 at 10:57, Krishna Deepak <deepakr...@gmail.com> wrote:
> More interestingly, if there is a way to login to asc similarly, we can do
> stuff like telling which course has best grading statistics, which prof
> gives good grades so that it could help in registration :P
>
>

Sure, a related implementation is here:
https://github.com/saketkc/iitb-library-sms-interface/blob/master/grades.py

Unfortunately, I have no way to test it now.

We in fact made a GChat bot (aaa...@appspot.com) that would give you grading statistics and course info. It was hosted on Google App Engine. It used an internal proxy server and hence, again, no longer works.





Pritam Baral

Jan 8, 2015, 4:20:33 PM
to wncc...@googlegroups.com
There are several ways to do it.

BeautifulSoup + urllib* (or requests): BeautifulSoup is an HTML parser, not a browser, so everything else a browser does has to be handled by something else: the network part, for example, or cookies, or worse, JavaScript and the DOM! If you can extract the login URL and parameters (by reading the HTML and JS, or by using a real browser), you can simulate the network and cookie-management parts with other libraries.
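Something along these lines might work as a starting point (a rough sketch only; the URLs, form field names, and the tag picked at the end are placeholders you would replace after inspecting the actual login form):

```python
# Rough sketch: log in with requests (which keeps cookies via a Session),
# then parse a protected page with BeautifulSoup.
# LOGIN_URL, PROFILE_URL and the form field names are hypothetical.
import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://example.com/login"      # placeholder
PROFILE_URL = "https://example.com/profile"  # placeholder: only visible when logged in

session = requests.Session()                 # keeps cookies across requests

# POST the credentials; the field names must match the site's real login form
session.post(LOGIN_URL, data={"username": "me", "password": "secret"})

# Requests made through the same session carry the login cookies
page = session.get(PROFILE_URL)
soup = BeautifulSoup(page.text, "html.parser")
print(soup.find("h1").get_text())            # pull something only a logged-in user sees
```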

Selenium/PhantomJS: Both of these automate full browsers. Selenium was built to test websites in the context of real browsers, PhantomJS was built to automate actions with a browser with having to run a full, graphical browser. But both of these provide full browsers, and can be scripted.
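For instance, a minimal Selenium sketch (the URL and element IDs are placeholders; PhantomJS can be swapped in for Firefox if you want it headless):

```python
# Rough sketch: drive a real browser with Selenium, fill in the login form,
# then read a page that is only visible after logging in.
# The URL and element IDs are hypothetical.
from selenium import webdriver

driver = webdriver.Firefox()                 # or webdriver.PhantomJS() for a headless run
driver.get("https://example.com/login")      # placeholder login page

driver.find_element_by_id("username").send_keys("me")      # placeholder field IDs
driver.find_element_by_id("password").send_keys("secret")
driver.find_element_by_id("login-button").click()

# The browser keeps the session, so protected pages are now reachable
driver.get("https://example.com/profile")
print(driver.find_element_by_tag_name("h1").text)
driver.quit()
```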

 

Regards,
Chhatoi Pritam Baral

psharma1707

Jan 8, 2015, 7:52:36 PM
to wncc...@googlegroups.com
You could also try libraries like mechanize and Selenium in Python.

Kumar Ayush

Jan 8, 2015, 11:56:53 PM
to wncc...@googlegroups.com

Google "mechanize python"

I have in the past used mechanize for similar purposes.
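Roughly like this (a sketch only; it assumes the login form is the first form on the page and that the field names are "username"/"password", which you would check against the real site):

```python
# Rough mechanize sketch: open the login page, fill the form, submit, and
# then fetch a protected page. URLs and field names are hypothetical.
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)            # skip robots.txt handling for this illustration
br.open("https://example.com/login")   # placeholder

br.select_form(nr=0)                   # assume the first form is the login form
br["username"] = "me"                  # names must match the actual form controls
br["password"] = "secret"
br.submit()                            # mechanize keeps the session cookies

response = br.open("https://example.com/profile")   # placeholder protected page
print(response.read())
```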

jayanth jaiswal

Jan 9, 2015, 12:57:06 PM
to wncc...@googlegroups.com
Use urllib2 for sending requests.
Use lxml to parse HTML and identify elements via CSS selectors using .cssselect, via XPath, or via its DOM functions (a small sketch of this follows below).
Use Selenium to automate browser actions. PhantomJS is just a headless browser.
Use Scrapy to scrape particular DOM elements using lxml. You can also connect it to a database such as MongoDB easily to store your scraped content.

Or you can directly use the REST API, if the site provides one.
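As a small sketch of the urllib2 + lxml combination (Python 2; the URL and selectors are placeholders, and the cssselect package has to be installed for .cssselect() to work):

```python
# Rough sketch: fetch a page with urllib2 and pick elements out with lxml,
# either via CSS selectors (.cssselect) or XPath. URL and selectors are hypothetical.
import urllib2
import lxml.html

html = urllib2.urlopen("https://example.com/courses").read()   # placeholder page
doc = lxml.html.fromstring(html)

# CSS selectors via .cssselect(); the XPath equivalent would go through doc.xpath(...)
for row in doc.cssselect("table.results tr"):                  # placeholder selector
    cells = [td.text_content().strip() for td in row.cssselect("td")]
    print(cells)
```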

@Pritam: PhantomJS was built to automate actions with a browser WITHOUT having to run a full, graphical browser.







--
Regards,

Jayanth
B.Tech 
Computer Science Department
I.I.T. Bombay


Pritam Baral

Jan 9, 2015, 1:04:32 PM
to wncc...@googlegroups.com

@Jayanth: thanks for correcting my typo. :)
