Linking to MySQL


MaryJo Webster

Apr 9, 2015, 11:35:39 AM
to panda-pro...@googlegroups.com
I'm wondering if there's a way -- or if this could be something set up in the future -- to store data in MySQL (or a similar database) and link PANDA to either a single table or a saved query, instead of having to import each file into PANDA?

I'm looking to update an old internal data warehouse at my new job at the Minneapolis Star Tribune. They've been using Uniquery as a front end with the data stored in SQL Server, giving reporters and editors the ability to search datasets like voter registration, births, deaths, jail bookings, prison records, etc.  Uniquery gets the data from a saved "view" in SQL Server that flattens the relational tables and limits the fields to only those needed for the front-end search. 

It's running on SQL Server 2000, so obviously needs to go. However, one of the IT guys is in the midst of revising Uniquery so that it could move to a more modern version of SQL Server. 

Even with this upgrade, I think Uniquery still has some shortcomings -- such as giving all users the ability to upload and export chunks of data -- and the search capabilities are not quite as flexible as PANDA. 

But my biggest beef with PANDA (having set it up at the Pioneer Press last year) is that, at least in some cases, you have to keep two sets of data -- one in MySQL or SQL Server for doing big analysis efforts, and the other in PANDA (especially any relational datasets that can't go into PANDA as is).

I'd appreciate any insights any of you have on this. 
thanks,
MaryJo Webster
Computer-assisted reporting editor
Minneapolis Star Tribune
@MaryJoWebster


Brian Boyer

Apr 19, 2015, 3:24:28 PM
to panda-pro...@googlegroups.com
Hey, MaryJo! That's an interesting idea! The data would need to be duplicated -- PANDA is speedy because it's basically a search engine, not a relational database. You could write a view in SQL and then import the data via the PANDA API. If the data changes regularly, then you could set up a cron job (or something like that) to periodically re-import the view.
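A rough sketch of that view-then-import idea, in Python. Everything specific here is an assumption for illustration: the sample tables and the `bookings_flat` view are made up, an in-memory SQLite database stands in for MySQL or SQL Server, and the endpoint path, the `email`/`api_key` query parameters, and the `objects`/`data`/`external_id` payload shape should all be checked against the PANDA API documentation for your install before relying on them.

```python
import json
import sqlite3
import urllib.parse
import urllib.request


def demo_connection():
    """Build a tiny in-memory database with a flattening view.

    SQLite is only a stand-in here; in practice this would be a
    connection to MySQL or SQL Server. Tables and rows are made up.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inmates (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE bookings (inmate_id INTEGER, booked_on TEXT);
        INSERT INTO inmates VALUES (1, 'Jane Doe'), (2, 'John Roe');
        INSERT INTO bookings VALUES (1, '2015-04-01'), (2, '2015-04-03');
        CREATE VIEW bookings_flat AS
            SELECT i.id AS id, i.name AS name, b.booked_on AS booked_on
            FROM inmates i JOIN bookings b ON b.inmate_id = i.id;
        """
    )
    return conn


def fetch_view_rows(conn, view_name):
    """Return (column_names, rows) for every row in the view."""
    cur = conn.execute(f'SELECT * FROM "{view_name}"')
    columns = [desc[0] for desc in cur.description]
    return columns, cur.fetchall()


def build_payload(columns, rows, id_column):
    """Shape rows into the JSON body assumed for a PANDA data import."""
    id_index = columns.index(id_column)
    return {
        "objects": [
            {
                # Every value is sent as a string; NULL becomes "".
                "data": ["" if value is None else str(value) for value in row],
                # A stable id lets a re-import update rows instead of duplicating them.
                "external_id": str(row[id_index]),
            }
            for row in rows
        ]
    }


def push_to_panda(base_url, dataset_slug, email, api_key, payload):
    """Send the payload with an HTTP PUT (URL and auth scheme are assumptions)."""
    query = urllib.parse.urlencode({"email": email, "api_key": api_key})
    url = f"{base_url}/api/1.0/dataset/{dataset_slug}/data/?{query}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    connection = demo_connection()
    cols, view_rows = fetch_view_rows(connection, "bookings_flat")
    # Print the payload instead of sending it, so the sketch runs offline.
    print(json.dumps(build_payload(cols, view_rows, "id"), indent=2))
```

To refresh on a schedule, a script like this could be run from a crontab entry (for example, nightly); the dataset would need to exist in PANDA already, with columns in the same order as the view.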


MaryJo Webster

May 26, 2015, 3:52:20 PM
to panda-pro...@googlegroups.com
On the PANDA FAQ page, it says to look for the "m1.small" instance. I'm seeing on Amazon's pricing page that M1 is "previous generation" and, at first glance at least, is no longer an option. 

Can anyone tell me what I should be looking for, as best option, on the Amazon pricing page? http://aws.amazon.com/ec2/pricing/

And what do I need to take into consideration in deciding how big I need to go? Size of the datasets I'm going to put on PANDA? Number of users? 

At this point, I need to get a price estimate to the powers-that-be in my newsroom to greenlight this thing.

thanks,
MaryJo

Serdar Tumgoren

May 26, 2015, 4:27:14 PM
to panda-pro...@googlegroups.com
Looks like t2.small is the current generation, which will give you a single virtual CPU and 2 GB of RAM in AWS world. If you have the budget, there's nothing stopping you from spending more on an SSD and more RAM and CPU (more RAM in particular might help with the search backend).

You could also save money by getting a reserved instance, which requires you to pay up front for longer periods of time. That seems like a good idea for a PANDA instance, unless you plan to start/stop the machine to save on costs.

Finally, keep in mind that you'll pay for the bandwidth used to transfer data in and out of the EC2 world. So monthly costs could start rising if a lot of folks are uploading lots of large files or downloading data from PANDA. In practice I doubt it'll be a ton of money, but it is a variable, ongoing cost to be aware of. You can try to price that out using the AWS calculator, but it'd probably be a guessing game unless you have a clear sense of historical and expected usage patterns. Probably best to just build in some cushion for bandwidth, or at least give a heads-up to the money folks.

Brian Boyer

May 26, 2015, 9:08:09 PM
to panda-pro...@googlegroups.com
I was just looking into this question last week. Here's what I know:

Our original estimate (http://pandaproject.net/costs/) was $540/yr for a Small instance, reduced to about $300/yr if you pay up front.

Since then, Amazon has changed the machine types a bit, so the numbers change. A year on a Small, paid up-front, is now down to $151/year. But the new machines are sooooo much more machine for your money -- they'll make PANDA blazing fast. So these days, we'd recommend an m3.medium, priced at $372/yr, prepaid.
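For a budget memo, the yearly figures quoted in this thread can be turned into monthly equivalents with a cushion for bandwidth. This is only a back-of-the-envelope sketch: the prices are the ones mentioned above, while the 20% cushion is a hypothetical placeholder, not a measured or recommended number.

```python
# Yearly prices in USD as quoted in this thread (2015 figures).
ANNUAL_PRICES = {
    "Small, original estimate": 540,
    "Small, original estimate, prepaid": 300,
    "Small, current, prepaid": 151,
    "m3.medium, prepaid": 372,
}


def monthly_cost(annual_usd):
    """Spread a yearly price evenly over 12 months."""
    return annual_usd / 12


def with_cushion(annual_usd, cushion_fraction):
    """Add a fractional cushion (e.g. 0.20 for 20%) on top of a yearly price."""
    return annual_usd * (1 + cushion_fraction)


if __name__ == "__main__":
    cushion = 0.20  # hypothetical allowance for bandwidth; adjust to taste
    for label, annual in ANNUAL_PRICES.items():
        padded = with_cushion(annual, cushion)
        print(
            f"{label}: ${annual}/yr, "
            f"${monthly_cost(annual):.2f}/mo, "
            f"${padded:.2f}/yr with cushion"
        )
```

Real bandwidth charges depend on usage, so the cushion is better replaced with an estimate from the AWS calculator once usage patterns are known.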

MaryJo Webster

May 27, 2015, 6:29:27 PM
to panda-pro...@googlegroups.com
Thank you!
