Good practices for regularly updated data

Tyler Dukes

Sep 12, 2013, 12:39:59 PM

to panda-pro...@googlegroups.com

Hi everyone,

I've got a regularly updated salary database that I already store in PANDA. What's the best practice for updating this dataset in PANDA to avoid duplicate data?

So far, I've been deleting the old dataset and uploading the new one, but I wasn't sure if that's the best way to go.

Thanks in advance for the help.

Nolan Hicks

Oct 15, 2013, 2:11:17 PM

to panda-pro...@googlegroups.com

Hey Tyler,

That's pretty much what we've been doing here. If anyone has better (i.e. less time-consuming) suggestions, I'm all ears.

Joe Germuska

Oct 15, 2013, 6:31:24 PM

to panda-pro...@googlegroups.com

So, PANDA doesn't have any direct support for something like that.

Besides lines that are exact duplicates, are there lines where some cells are identifiers (a person's name, or a unique ID) and other cells are updated values (like if the person got a raise)?

If the only issue is pure duplicate lines, I'd probably write a simple script that reads the old file and the new file and spits out only unique lines. Not the most user-friendly approach, of course, but…
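
A minimal sketch of such a script, under one reading of "only unique lines": keep the rows of the new file that are not exact duplicates of a row in the old file, so that only those rows would need to be uploaded. It assumes both files are CSV with the same header row; the function name `rows_not_in_old` and the sample data are hypothetical, for illustration only.

```python
import csv
import io


def rows_not_in_old(old_file, new_file):
    """Return (header, rows) for rows in new_file that are absent from old_file.

    Rows are compared as whole lines (every cell must match), and a row that
    repeats inside new_file is only returned once.
    """
    old_reader = csv.reader(old_file)
    new_reader = csv.reader(new_file)

    # Assumption: both files start with the same header row.
    old_header = next(old_reader, None)
    new_header = next(new_reader, None)
    if old_header != new_header:
        raise ValueError("header rows differ; the files are not comparable")

    seen = {tuple(row) for row in old_reader}
    fresh = []
    for row in new_reader:
        key = tuple(row)
        if key not in seen:
            seen.add(key)  # also drops repeats within the new file
            fresh.append(row)
    return new_header, fresh


# Hypothetical sample data standing in for the old and new exports.
old_csv = io.StringIO("name,salary\nAnn,50000\nBob,60000\n")
new_csv = io.StringIO("name,salary\nAnn,50000\nBob,65000\nCy,40000\n")

header, fresh_rows = rows_not_in_old(old_csv, new_csv)
writer = csv.writer(io.StringIO())  # swap in a real output file as needed
writer.writerow(header)
writer.writerows(fresh_rows)
```

Note that a changed row (Bob's new salary above) counts as a new line here, so the old Bob row would still be in the dataset. This only helps when exact duplicates are the whole problem.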

If you want to update some rows as well as insert others, then there's probably nothing easier than deleting the dataset and uploading the new one.

I could see PANDA having a "replace data" feature alongside "upload more data"; we'd want to think it through a little, but it seems logically straightforward. It would save you from re-entering the metadata, and it could also keep a reference to earlier files, which could be downloaded but wouldn't appear in search results.

Technically, we could probably do something like "merge data" to save you from writing the script mentioned above, but I feel like that starts to run the risk of some weird edge cases.

Joe

wm higgins

Oct 19, 2013, 11:47:16 PM

to panda-pro...@googlegroups.com

This probably won't help in your case, but I'm stealing data from a Django app for one of our PANDA datasets. Since the Django updates are made by users, I run a cron job that captures a change log of user deletions, additions and updates and then pushes it to PANDA via the API.
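
For a dataset that arrives as periodic file exports rather than from a live app, a similar change log could be derived by comparing two exports on an identifier column. The sketch below is not the cron job described above; it is a swapped-in approach that assumes each row has a unique ID, and the names `diff_by_key` and `employee_id` are hypothetical. Pushing the result to PANDA is left out because the thread gives no API details.

```python
import csv
import io


def diff_by_key(old_file, new_file, key_field):
    """Split the changes between two CSV exports into three lists.

    Returns (additions, deletions, updates), each a list of row dicts.
    Assumes key_field is present in both files and unique within each.
    """
    old = {row[key_field]: row for row in csv.DictReader(old_file)}
    new = {row[key_field]: row for row in csv.DictReader(new_file)}

    additions = [new[k] for k in new if k not in old]
    deletions = [old[k] for k in old if k not in new]
    # An update is a key present in both files whose other cells changed.
    updates = [new[k] for k in new if k in old and new[k] != old[k]]
    return additions, deletions, updates


# Hypothetical sample exports.
old_csv = io.StringIO(
    "employee_id,name,salary\n1,Ann,50000\n2,Bob,60000\n3,Cy,55000\n"
)
new_csv = io.StringIO(
    "employee_id,name,salary\n1,Ann,52000\n2,Bob,60000\n4,Dee,48000\n"
)

additions, deletions, updates = diff_by_key(old_csv, new_csv, "employee_id")
print(len(additions), "added,", len(deletions), "deleted,", len(updates), "updated")
```

Keying on an ID rather than on whole lines is what lets a raise show up as an update instead of as a duplicate person.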
