
Max Gibiansky

Jun 21, 2013, 4:21:15 PM
to council...@googlegroups.com

OK, so here's the status. I've implemented Goko parsing and propagated the changes throughout Council Room, in my fork at https://github.com/ftlftw/dominionstats/ . I think I'm basically done. How should I go about integrating this into CR? Who has access to the CR server to put the changes in and make sure they go through? I've put in a pull request. Hopefully it just plugs in and runs fine, but you never know; these changes were pretty big and pretty fundamental...

Card winningness graphs and card groupings still don't work, but that's not a Goko thing; they don't work on CR right now either.

I've run the database with a limited number of days, but trying to load a few months' worth of data overwhelms my hardware (a shitty, overheating laptop with the database on an external USB hard drive...), so I have not been able to run a large-scale test.


Still plenty left to do - search functionality upgrades, different stats for pro/casual/unrated/adventure games, possibly handling Goko's bugs better - but as far as I know the basics are working, and I'd like to get this integrated before going off to work on improvements.

Michael McCallister

Jun 22, 2013, 12:13:13 AM
to Max Gibiansky, council...@googlegroups.com

I can help, but we've had bad storms here in Minnesota, so we have only had intermittent power over the last 24 hours, and no certain time when it will be restored. Please remind me if you haven't heard from me by Monday evening.

Mike

--
You received this message because you are subscribed to the Google Groups "Councilroom.com development" group.
Max Gibiansky

Jun 22, 2013, 6:41:34 PM
to Michael McCallister, council...@googlegroups.com
Sounds good, let me know how it goes! When you have the time and the power, that is.

Max Gibiansky

Jun 27, 2013, 11:57:50 PM
to Michael McCallister, council...@googlegroups.com
Just checking in again: are you able to help out with this? Is your power back?


Michael McCallister

Jun 28, 2013, 12:39:18 AM
to Max Gibiansky, council...@googlegroups.com
Hey Max,

Thanks for your patience, and thank you for this HUGE contribution of work. I've skimmed through the changesets and it looks good so far. I'm pretty close to just merging it in and turning it on, but here are some questions I'm wondering about before doing so:
  1. It looks like the update.py script can handle logs from either source (Goko and Iso), and it will just start working backwards from the current date until no new logs are loaded. For the period of time when both Goko and Iso were running in parallel, do you know if update.py will do the right thing, or will it need to be helped through the transition?
  2. Is anything done to handle player name differences between Iso and Goko? Are they merged or kept separate?
  3. Once the games are parsed, is there a way to distinguish which site the parsed game originally came from?
  4. In your testing, did the jobs that come after scraping run successfully on games originating from Goko?
  5. How many days will need to be loaded? How far back are Goko logs available?
  6. While I expect the differences between your system and the AWS server the site is running on might be significant, how quickly does the code process an average day's worth of Goko logs?

Mike

Max Gibiansky

Jun 28, 2013, 4:10:29 AM
to Michael McCallister, council...@googlegroups.com
On Thu, Jun 27, 2013 at 9:39 PM, Michael McCallister <mi...@mccllstr.com> wrote:
Hey Max,

Thanks for your patience, and thank you for this HUGE contribution of work. I've skimmed through the changesets and it looks good so far. I'm pretty close to just merging it in and turning it on, but here are some questions I'm wondering about before doing so:
  1. It looks like the update.py script can handle logs from either source (Goko and Iso), and it will just start working backwards from the current date until no new logs are loaded. For the period of time when both Goko and Iso were running in parallel, do you know if update.py will do the right thing, or will it need to be helped through the transition?

No, for the period when Goko and iso were running in parallel it won't do anything as it is; it'll see that logs are already loaded for those days and stop. I suppose I can write a script to handle that period, but I haven't done that yet.
 
  2. Is anything done to handle player name differences between Iso and Goko? Are they merged or kept separate?

If someone kept the same name on both iso and Goko, it should recognize all their games, I think, though if people used the switch as a chance to change spelling/spacing/punctuation/capitalization(?) then it won't merge them; I just used Goko names as-is. Because Goko separates the names very clearly from the rest of the text, the parser should be far more permissive with names on Goko than it was on iso; it shouldn't care about special characters or keywords in names or anything.
 
  3. Once the games are parsed, is there a way to distinguish which site the parsed game originally came from?

Yes, a few ways, I think. First, you can just look at the game id: Goko game ids have the form "YYYYMMDD/log.[goko stuff].txt", as opposed to isotropic game ids, which were 'game-YYYYMMDD-[iso stuff].html'. For Goko it should also store the rating system used for the game: 'unknown' (for games prior to when Goko started reporting it in logs), 'pro', 'casual', or 'adventure'. That field will be undefined for old iso games, or will be 'isotropic' if they're re-parsed now for whatever reason.
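To make the id-based distinction concrete, here is a small sketch of classifying a game's origin from its id, using the two formats described above. The middle parts of real ids are site-specific tokens; `.+` stands in for them here, and the function name is my own, not code from the repo:

```python
import re

# Goko ids look like "YYYYMMDD/log.<goko stuff>.txt"; isotropic ids look
# like "game-YYYYMMDD-<iso stuff>.html". '.+' approximates the elided parts.
GOKO_ID_RE = re.compile(r'^\d{8}/log\..+\.txt$')
ISO_ID_RE = re.compile(r'^game-\d{8}-.+\.html$')

def game_source(game_id):
    """Return 'goko', 'isotropic', or 'unknown' for a game id string."""
    if GOKO_ID_RE.match(game_id):
        return 'goko'
    if ISO_ID_RE.match(game_id):
        return 'isotropic'
    return 'unknown'
```

Anything that matches neither pattern falls through to 'unknown' rather than raising, which seems safer for a scraper fed unpredictable filenames.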
 
  4. In your testing, did the jobs that come after scraping run successfully on games originating from Goko?

Yes, they worked.
 
  5. How many days will need to be loaded? How far back are Goko logs available?

As update.py stands right now, it'll load Goko logs back to the day isotropic shut down, March 20th, and won't go look for older ones, so there wouldn't be any mixed Goko/iso days stored... I think Goko logs go back to last August, though, so that's something I can work on. There were a lot fewer games on Goko before iso shut down, and they were buggy, and I wasn't sure how to test mixed Goko/iso days since I don't have any iso logs in my system to work with, so I haven't done that yet.
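The backwards scan described above can be sketched roughly like this. `fetch_day` is a hypothetical callable that loads one day's Goko logs and returns how many were loaded; the real update.py logic differs in detail:

```python
from datetime import date, timedelta

ISO_SHUTDOWN = date(2013, 3, 20)  # oldest day the current code will visit

def load_goko_logs(fetch_day, start):
    """Walk backwards one day at a time from `start`, stopping at the
    isotropic shutdown date; returns the total number of logs loaded."""
    total = 0
    day = start
    while day >= ISO_SHUTDOWN:
        total += fetch_day(day)
        day -= timedelta(days=1)
    return total
```

Extending this to pre-shutdown Goko logs would mean moving the cutoff back past March 20th and deciding how mixed Goko/iso days should be stored.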
  6. While I expect the differences between your system and the AWS server the site is running on might be significant, how quickly does the code process an average day's worth of Goko logs?


The limiting factor for me was the speed of analyze2 and, I think, optimal_card_ratios, which both ran at 30-50 games per second on my setup. So it was super slow; that part could only process a few days of logs per day of wall-clock time. The other steps ran about an order of magnitude faster; when I skipped the slow ones I could get through a few weeks of Goko logs in a day of processing. Of course, this was on a laptop that I had to underclock and throttle with cpulimit to keep it from overheating (which it would whenever it hit 100% CPU), with the database on a USB external hard drive (USB 2.0, I think), so yeah... I would sincerely hope the real setup gives speeds at least an order of magnitude faster. I didn't change anything in analyze2 or optimal_card_ratios, and I don't really know why they would be any slower for Goko than for iso (or even whether they are slower; maybe they were always like that).

I was never actually able to test the full few months of Goko data; my database would slow to a crawl once it got to a few weeks' worth. I *think* that's due to the aforementioned terrible setup, since I didn't touch any of the database or analysis code and the parsed output didn't look any different from processed iso games, but I can't really be sure.

Actually, one thing I *am* worried about, now that I think about it some more, is Lord Bottington. Is the database going to be okay with the huge number of games an AI player has? If someone looks up Lord Bottington's record, will that break things because of his huge game count? I didn't know how to test that, since my setup seemed to overload if it had to handle a stiff breeze worth of data...

Michael McCallister

Jun 30, 2013, 1:28:06 AM
to Max Gibiansky, council...@googlegroups.com
Responses inline below:


On Fri, Jun 28, 2013 at 3:10 AM, Max Gibiansky <maxsi...@gmail.com> wrote:
No, for the period when Goko and iso were running in parallel it won't do anything as it is; it'll see that logs are already loaded for those days and stop. I suppose I can write a script to handle that period, but I haven't done that yet.

I will see what I can do to work around it. Should be a minor change.
 
Yes, a few ways, I think. First, you can just look at the game id: Goko game ids have the form "YYYYMMDD/log.[goko stuff].txt", as opposed to isotropic game ids, which were 'game-YYYYMMDD-[iso stuff].html'. For Goko it should also store the rating system used for the game: 'unknown' (for games prior to when Goko started reporting it in logs), 'pro', 'casual', or 'adventure'. That field will be undefined for old iso games, or will be 'isotropic' if they're re-parsed now for whatever reason.

I think I'll put a more concrete "src" value in the Goko games so that it will be easier to query on in the future, if that proves necessary.
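A minimal sketch of what that explicit "src" field could look like when tagging parsed game documents; the field name comes from the proposal above, but the helper, the `_id` key, and the document shape are my assumptions, not code from the repo:

```python
# Tag each parsed game dict once with a concrete 'src' value, so later
# queries can filter on 'src' directly instead of pattern-matching ids.
# The id formats follow the ones described earlier in the thread.

def tag_source(game_doc):
    game_id = game_doc.get('_id', '')
    if game_id.startswith('game-'):
        game_doc['src'] = 'isotropic'
    elif '/log.' in game_id:
        game_doc['src'] = 'goko'
    else:
        game_doc['src'] = 'unknown'
    return game_doc
```

In Mongo terms this would amount to a one-time `$set` on existing documents plus setting the field at parse time going forward.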

As update.py stands right now, it'll load Goko logs back to the day isotropic shut down, March 20th, and won't go look for older ones, so there wouldn't be any mixed Goko/iso days stored... I think Goko logs go back to last August, though, so that's something I can work on. There were a lot fewer games on Goko before iso shut down, and they were buggy, and I wasn't sure how to test mixed Goko/iso days since I don't have any iso logs in my system to work with, so I haven't done that yet.

I'll see if it is easy to load them from the outset. Many of the trackers look only at a "high water mark", so it's a little harder to get them to go back in time to fill in holes. I'll probably try to get as much of the historic Goko data loaded as possible before proceeding into the analyze steps.
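The "high water mark" pattern mentioned above can be sketched as follows: the tracker remembers only the newest day already processed and fetches forward from there, which is exactly why filling in *older* holes needs special handling. All names here are illustrative, not from the actual trackers:

```python
from datetime import date, timedelta

class HighWaterTracker:
    """Remembers the newest day already processed; normal runs only
    fetch forward from that mark, never backwards into older gaps."""

    def __init__(self, mark):
        self.mark = mark  # newest date already processed

    def days_to_fetch(self, today):
        """Yield only the days newer than the mark, then advance it."""
        day = self.mark + timedelta(days=1)
        while day <= today:
            yield day
            day += timedelta(days=1)
        if today > self.mark:
            self.mark = today
```

Backfilling historic Goko data would mean either resetting the mark or running a separate pass that ignores it.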
 
The limiting factor for me was the speed of analyze2 and, I think, optimal_card_ratios, which both ran at 30-50 games per second on my setup. So it was super slow; that part could only process a few days of logs per day of wall-clock time. The other steps ran about an order of magnitude faster; when I skipped the slow ones I could get through a few weeks of Goko logs in a day of processing. Of course, this was on a laptop that I had to underclock and throttle with cpulimit to keep it from overheating (which it would whenever it hit 100% CPU), with the database on a USB external hard drive (USB 2.0, I think), so yeah... I would sincerely hope the real setup gives speeds at least an order of magnitude faster. I didn't change anything in analyze2 or optimal_card_ratios, and I don't really know why they would be any slower for Goko than for iso (or even whether they are slower; maybe they were always like that).

Cool. No way to be sure until we find out by running it. :) I plan to pull in all your code on Sunday and start slurping down all the Goko logs to date. Some of the steps can be run in parallel with multiple Amazon instances.
 
Actually, one thing I *am* worried about, now that I think about it some more, is Lord Bottington. Is the database going to be okay with the huge number of games an AI player has? If someone looks up Lord Bottington's record, will that break things because of his huge game count? I didn't know how to test that, since my setup seemed to overload if it had to handle a stiff breeze worth of data...

Shouldn't be any more of a problem than some of the high-game-count real players already are... The query will time out after a bit, so no harm done. The real way to fix that is to precalculate some of the answers, either through more work in Mongo or by moving some of the data into a relational DB with proper indexes. That will have to be down the road.
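The precalculation idea above could look roughly like this: fold each game into a per-player summary as it arrives, so looking up even a huge-game-count player like Lord Bottington is a dict (or indexed-table) read rather than a scan over raw games. The document shape (`players`, `winners`) and all names are assumptions for the sketch:

```python
from collections import defaultdict

# Incrementally maintained per-player summary; updated once per game,
# read in O(1) at query time regardless of a player's game count.
summaries = defaultdict(lambda: {'games': 0, 'wins': 0})

def record_game(game):
    """Fold one parsed game into each participant's running summary."""
    for name in game['players']:
        s = summaries[name]
        s['games'] += 1
        if name in game['winners']:
            s['wins'] += 1

def player_record(name):
    # defaultdict.get does not insert an entry for unknown players
    return summaries.get(name, {'games': 0, 'wins': 0})
```

The same shape maps onto a Mongo pre-aggregated collection or a relational table keyed and indexed on the player name.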


Mike
