OK, so here's the status. I've implemented goko parsing and propagated the changes throughout council room, in my fork at https://github.com/ftlftw/dominionstats/ . I think I'm basically done. How should I go about integrating this into cr? Who has access to the cr server to put the changes in, make sure they go through, and so on? I put in a pull request. Hopefully it just plugs in and runs fine, but you never know, because these changes were pretty big and pretty fundamental...
Card winningness graphs and card groupings still don't work, but that's not a goko thing, they don't work on cr right now.
I've run the database with a limited number of days, but trying to load a few months' worth of data overwhelms my hardware (shitty overheating laptop, database on an external USB hard drive...), so I have not been able to run a large-scale test.
Still plenty left to do - search functionality upgrades, different stats for pro/casual/unrated/adventure games, possibly handling goko's bugs better - but the basics are, as far as I know, working, and I'd like to get that integrated before going off to work on improvements.
I can help, but we've had bad storms here in Minnesota, so we have only had intermittent power over the last 24 hours, and no certain time when it will be restored. Please remind me if you haven't heard from me by Monday evening.
Mike
Hey Max,

Thanks for your patience, and thank you for this HUGE contribution of work. I've skimmed through the changesets and it looks good so far. I'm pretty close to just merging it in and turning it on, but here are some questions I'm wondering about before doing so:
- It looks like the update.py script can handle logs from either source (Goko and Iso), and it will just start working backwards from the current date until no new logs are loaded. For the period of time when both Goko and Iso were running in parallel, do you know if update.py will do the right thing, or will it need to be helped through the transition?
- Is anything done to handle player name differences between Iso and Goko? Are they merged or kept separate?
- Once the games are parsed, is there a way to distinguish which site the parsed game originally came from?
- In your testing, did the jobs that come after scraping run successfully on games originating from Goko?
- How many days will need to be loaded? How far back are Goko logs available?
- While I expect the differences in your system and the AWS server the site is running on might be significant, how quickly does the code process an average day's worth of Goko logs?
No, for the period where Goko and iso were running in parallel, it won't do anything as it is; it'll see that logs are already loaded for those days. I can write a script to handle that period, I suppose, but I haven't done that yet.
Yes, a few ways I think. First you can just look at the game id; goko game ids will have the form "YYYYMMDD/log.[goko stuff].txt", as opposed to isotropic game ids which were 'game-YYYYMMDD-[iso stuff].html'. On goko it should also store the rating system used for the game, 'unknown' (for those prior to when they started reporting it in logs), 'pro', 'casual', or 'adventure'. That will be undefined for old iso games, or will be 'isotropic' if they're re-parsed now for whatever reason.
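For illustration, a check based on the id formats above might look like the sketch below. The function name and exact regex patterns are my own guesses from the formats described, not actual dominionstats code, and the example ids are made up:

```python
import re

def game_source(game_id):
    """Guess which site a parsed game came from, based on its id format.

    Goko ids look like 'YYYYMMDD/log.[goko stuff].txt';
    isotropic ids look like 'game-YYYYMMDD-[iso stuff].html'.
    """
    if re.match(r'^\d{8}/log\..+\.txt$', game_id):
        return 'goko'
    if re.match(r'^game-\d{8}-.+\.html$', game_id):
        return 'isotropic'
    return 'unknown'

print(game_source('20130501/log.51a1b2c3d4e5f607.txt'))  # goko
print(game_source('game-20120815-abcdef01.html'))        # isotropic
```

The rating-system field ('pro', 'casual', 'adventure', 'unknown', or 'isotropic') would be a second, redundant signal if the id ever isn't enough.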
As update.py is right now, it'll load goko logs back to the day isotropic shut down, March 20th, and won't go back and look for older ones, so there wouldn't be any mixed goko/iso days stored. I think goko logs go back to last august, though, so that's something I can work on. There were a lot fewer games on goko before iso shut down, and they were buggy, and I wasn't sure how I would test mixed goko/iso days since I don't have any iso logs in the system to work with, so I haven't done that yet.
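A minimal sketch of the stopping behavior described above, walking backwards from the current date to the isotropic shutdown. The function and constant names are illustrative, not the real update.py code, and I'm assuming the shutdown year was 2013:

```python
from datetime import date, timedelta

# Assumed cutoff: isotropic shut down March 20th (year assumed to be 2013).
ISO_SHUTDOWN = date(2013, 3, 20)

def goko_dates_to_load(today):
    """Yield dates to scrape Goko logs for, newest first, stopping at
    the isotropic shutdown so no mixed goko/iso day gets stored."""
    d = today
    while d >= ISO_SHUTDOWN:
        yield d
        d -= timedelta(days=1)

dates = list(goko_dates_to_load(date(2013, 3, 25)))
print(len(dates))  # 6: Mar 25 back through Mar 20
```

Extending the backfill past the shutdown date would mean relaxing that lower bound and deciding, per day, whether iso logs are already loaded.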
The limiting factor for me was the speed of analyze2 and, I think, optimal_card_ratios, which both ran at 30-50 games per second on my setup. So it was super-slow; that part would only process a few days' worth of logs per day of runtime. The other steps ran about an order of magnitude faster; when I skipped the slow ones I could get through a few weeks of goko logs in a day of processing. Of course, this was on a laptop that I had to underclock and throttle with cpulimit to keep it from overheating (which it would if it ever hit 100% CPU), with the database on a USB external hard drive connected via (I think) USB 2.0, so yeah... I would sincerely hope that the real setup gives speeds at least an order of magnitude faster. I didn't change anything in analyze2 or optimal_card_ratios, and I don't really know why they would be any slower for goko than for iso (or even whether they were slower; maybe they were always like that).
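For rough capacity planning on the real server, the throughput figures above turn into a simple back-of-the-envelope calculation. The games-per-day number below is a made-up example input, not a real Goko figure:

```python
def hours_to_process(games_per_day, games_per_second):
    """Wall-clock hours to run a fixed-throughput job over one day's games."""
    return games_per_day / games_per_second / 3600.0

# At the observed 30-50 games/s, a hypothetical 500,000-game day:
print(round(hours_to_process(500_000, 30), 1))  # 4.6 hours
print(round(hours_to_process(500_000, 50), 1))  # 2.8 hours
```

So anything that keeps per-game cost near the bottom of that range, or parallelizes the slow jobs, should let the server comfortably keep up with one day's logs per day.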
Actually, one thing I *am* worried about now that I think about it some more is Lord Bottington. Is the database going to be okay with the huge number of games an AI player has? If someone tries to look up Lord Bottington's record, is that going to break things because he'll have such huge numbers of games? I didn't know how I could test that, since my setup seemed to overload if it had to load a stiff breeze's worth of data...