login to a website with go

brunetto

Dec 26, 2014, 10:46:08 AM

to golan...@googlegroups.com

Hi!

I'm trying to log in to a site (for example my WordPress blog) using a Go program.

After a lot of searches I found this:

https://gist.github.com/varver/f327ef9087ebf76aa4c4

that's exactly what I want.

I changed the PostForm url.Values to what I found in the page source (in the WordPress case "user_login", "user_pass"), but I'm not able to log in.

The same user and password that let me log in by hand redirect my Go program to the login page again (maybe incorrect login?).

I'm not used to this kind of problem, I'm trying to learn and experiment, so I'm a bit confused.

What should I try/read/understand/do?

Thanks a lot

brunetto

Sri G

Dec 26, 2014, 12:48:43 PM

to golan...@googlegroups.com

Hi Brunetto,

Using the developer tools in your browser, take a look at the request being POSTed and the response you get when you log in successfully, and at what happens after the redirect.

Then see if your application does the same thing.

HTH,

Sri

brunetto

Dec 27, 2014, 6:44:50 AM

to golan...@googlegroups.com

Thanks,

it works, but not always (not all the sites).

Why are the post form names different from those in the page source?

Is there a comprehensive reading material?

A standard approach to login from a (go) code into a site?

thanks

brunetto

Dec 27, 2014, 6:53:22 AM

to golan...@googlegroups.com

Ok, I found that probably I should extract an authenticity token from the login page and paste it into the postform...

is there a standard method to do this or should I parse the login page source?

I'm sorry if they are stupid questions but it is the first time for me and I can't find docs for this.

thanks
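For the token extraction asked about here, parsing the login page is the usual route, since the token normally sits in a hidden input of the form. Below is a minimal sketch using only the standard library. It assumes double-quoted attributes, and the field name "authenticity_token" in the sample page is just an example; hiddenFields is a hypothetical helper. A real HTML parser such as golang.org/x/net/html is more robust than regular expressions for pages you do not control.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// Matches a whole <input ...> tag.
	inputTag = regexp.MustCompile(`(?is)<input\b[^>]*>`)
	// Matches name="value" pairs inside a tag (double-quoted values only).
	attrPair = regexp.MustCompile(`(?is)([a-z_:][-a-z0-9_:.]*)\s*=\s*"([^"]*)"`)
)

// hiddenFields returns name -> value for every <input type="hidden"> in page.
func hiddenFields(page string) map[string]string {
	fields := make(map[string]string)
	for _, tag := range inputTag.FindAllString(page, -1) {
		attrs := make(map[string]string)
		for _, m := range attrPair.FindAllStringSubmatch(tag, -1) {
			attrs[strings.ToLower(m[1])] = m[2]
		}
		if strings.EqualFold(attrs["type"], "hidden") && attrs["name"] != "" {
			// Attribute values are HTML-escaped in the page source.
			fields[attrs["name"]] = html.UnescapeString(attrs["value"])
		}
	}
	return fields
}

func main() {
	// Sample login page; a real one would come from an HTTP GET.
	page := `<form method="post" action="/login">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="user">
  <input type="password" name="pass">
</form>`
	fmt.Println(hiddenFields(page)["authenticity_token"]) // prints "abc123"
}
```

The returned map can be copied into the url.Values of the login POST together with the user name and password.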

Konstantin Khomoutov

Dec 27, 2014, 10:30:54 AM

to brunetto, golan...@googlegroups.com

On Sat, 27 Dec 2014 03:44:50 -0800 (PST)
brunetto <brunett...@gmail.com> wrote:

> it works, but not always (not all the sites).
>
> Why are the post form names different from those in the page source?
> Is there a comprehensive reading material?
> A standard approach to login from a (go) code into a site?

[...]

There's no standard approach to "login ... into a site" because this
concept does not really exist. I mean, there's no standardized login
protocol for web applications (something like, say, SASL), which could
be implemented as library and used in each and every case. Instead,
each web application implements something on its own, and in most cases
catered to interactive usage. Think of it: when the user's browser
renders a login page, the user is not at all concerned with how the
variables on this form are named, and what happens when they click/tap
the "login" button on that page. Moreover, certain kinds of sites
-- with public image hosting solutions being a good example -- try to
actively combat automated logins to prevent creation of automated
utilities -- image uploaders in our example -- which would allow the
user to use the web application without ever seeing its web UI in their
browser. The reason is simple: these sites cash in on showing
advertisements to their users, so if the web UI is never downloaded,
no banners are shown etc. Such sites will intentionally generate a
login form containing randomly named variables, and supply you a cookie
which would allow the site to decipher/remap these names back into
sensible values when the browser sends the filled form back.

These days, those web applications which want to be good to automated
solutions usually provide some explicit means to do that. Typically,
these are REST APIs.

So what can you do for this?
Unfortunately, in complicated cases you have to play web browser and do
whatever it would do if it rendered the login form, let the user fill
it and send it back. This requires paying attention to cookies and the
Referer HTTP header. Unfortunately, if the web site uses JavaScript to
operate the form (quite a de-facto standard these days), you're out of
luck. If the web site mangles the form values, you might try to guess
their meaning from their relative disposition and types.

As to comprehensive reading material...
I do not know any book which covered everything needed, but originally
all this stuff (web forms) appeared in the form of a server-side
protocol called CGI, so you might start with googling for this word.
Any book on server-side web programming (no matter which language it
targets) should cover the most basic topics. Books on PHP are in
abundance, for example.

brunetto

Dec 27, 2014, 10:39:35 AM

to golan...@googlegroups.com, brunett...@gmail.com

Thanks a lot!!

I thought that my problem was the csrf token, but it seems to be ok.

Now I found in the page header:

Cache-Control:[max-age=0, private, must-revalidate]

that I think is the problem.

Thanks again.

brunetto

Micky

Dec 28, 2014, 3:34:19 PM

to brunetto, golang-nuts

If there are no APIs available to talk with, and the website you are
dealing with relies on caching, cookies and secure sessions, then
you are probably looking for something called a "headless browser"
client. In JS, that's PhantomJS. In the Go world, there are the webloop
and go-webkit packages.

Otherwise, you will end up writing a lot of code that someone has already written!

brunetto

Dec 28, 2014, 3:54:21 PM

to golan...@googlegroups.com, brunett...@gmail.com

Thanks!!

I also tried surf (https://github.com/headzoo/surf).

I'll have a look at webloop.

But it's not clear to me how a Go program differs from my browser in this case (I don't see JavaScript tricks, but I'm not an expert).

Shawn Milochik

Dec 28, 2014, 4:57:09 PM

to golan...@googlegroups.com

On Dec 28, 2014 3:54 PM, "brunetto" <brunett...@gmail.com> wrote:
>
> Thanks!!
> I also tried surf (https://github.com/headzoo/surf).
> I'll have a look at webloop.
> But it's not clear to me in what a go program differs from my browser in this case
> (I don't see javascript tricks, but I'm not an expert)
>
>

Because JavaScript makes changes to the page dynamically. Code that pulls the HTML and parses it doesn't "see" everything a browser would see. That's why Selenium exists. Because unless you're executing the JavaScript in the page, it's probably not going to work.

If you've ever used jQuery, for example, you'd see why that is pretty quickly.

shubh...@gmail.com

Feb 27, 2016, 9:45:37 AM

to golang-nuts

Hi,

I have been having the same issue, but under different circumstances. When I try to log in to a website, there is a captcha image that I have to download. I have written a captcha parser successfully, but I am not able to download the exact captcha that belongs to my session.

What should I do?

Thank you

Shubhodeep Mukherjee

hoppe....@gmail.com

Feb 28, 2016, 5:14:39 AM

to golang-nuts

Hi all,

In my opinion, this is the simplest way, especially for beginners:

- use the Chrome developer tools

- go through the login process manually

- right-click each login request (there may be more than one step)

- choose "Copy as cURL" for each request

- use the tool curl-to-go (https://mholt.github.io/curl-to-go/) to convert each curl command to Go

That's step one. Sure, sometimes one has to capture the HTML answer and extract tokens, session id etc. and change the header/post parameters. But that shouldn't be a problem when you