Count, filter out duplicates in keys while doing the same for values


likage

Feb 14, 2017, 3:11:37 AM
to Python Programming for Autodesk Maya
I have this dictionary.
gen_dict = {
 "item_C_v001" : "jack",
 "item_C_v002" : "kris",
 "item_A_v003" : "john",
 "item_B_v006" : "peter",
 "item_A_v005" : "john",
 "item_A_v004" : "dave"
}
I am able to strip the version markings and group the users by item, like this:

Item Name     | No. of Vers.      | User
item_A        | 3                 | dave, john
item_B        | 1                 | peter
item_C        | 2                 | jack, kris

Currently I am trying to also show how many versions each user owns, so that the output looks something like this:
Item Name     | No. of Vers.      | User
item_A        | 3                 | dave(1), john(2)
item_B        | 1                 | peter(1)
item_C        | 2                 | jack(1), kris(1)

but I am having problems with it.
This is the code I have used for the filtering:
from collections import defaultdict

gen_dict = {
 "item_C_v001" : "jack",
 "item_C_v002" : "kris",
 "item_A_v003" : "john",
 "item_B_v006" : "peter",
 "item_A_v005" : "john",
 "item_A_v004" : "dave"
}

user_dict = defaultdict(set)
count_dict = {}

for item_name, user in gen_dict.items():
    # Strip the 5-character "_vNNN" suffix; [:-3] would leave "item_C_v"
    base_name = item_name[:-5]
    user_dict[base_name].add(user)
    count_dict[base_name] = count_dict.get(base_name, 0) + 1

for name, num in sorted(count_dict.items()):
    # Sort the users so the output order is stable
    print("Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(
        name, num, ', '.join(sorted(user_dict[name]))))

What can I do to add that, and are there any other ways to refine my code?

Justin Israel

Feb 14, 2017, 5:19:43 AM
to Python Programming for Autodesk Maya
Here is a version that does what you want, while changing a minimal amount of what you had:

I added another mapping to track the number of items seen per user. That way when you are iterating your counts you can look up the user and item name and get the number of items they own. 

I am sure there are ways to completely restructure the entire thing, but given the limited context, this is just one way to change what you already have.
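The approach described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not necessarily the exact code from this reply: the mapping name `user_ver_count` is borrowed from later in the thread, the slice `[:-5]` assumes every key ends in a `_vNNN` suffix, and the `rows` list is only there to make the result easy to inspect.

```python
from collections import defaultdict

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}

# item name -> set of users, and item name -> number of versions
user_dict = defaultdict(set)
count_dict = defaultdict(int)
# Extra mapping: user -> item name -> number of versions that user owns
user_ver_count = defaultdict(lambda: defaultdict(int))

for version_name, user in gen_dict.items():
    item_name = version_name[:-5]  # assumes a 5-character "_vNNN" suffix
    user_dict[item_name].add(user)
    count_dict[item_name] += 1
    user_ver_count[user][item_name] += 1

rows = []
for item_name, num in sorted(count_dict.items()):
    # Look up each user's count for this particular item
    users = ", ".join(
        "{0}({1})".format(user, user_ver_count[user][item_name])
        for user in sorted(user_dict[item_name])
    )
    rows.append("{0} | {1} | {2}".format(item_name, num, users))

print("\n".join(rows))
```

Keying the extra mapping by both user and item name is what keeps one user's counts for different items from being mixed together.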

Justin


--
You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/401522d4-a96a-4063-9755-863a93de490c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

likage

Feb 14, 2017, 8:06:38 PM
to Python Programming for Autodesk Maya
Thank you Justin for your help. 
My bad, it seems I left out where gen_dict comes from; in any case, it comes from an iterator. Since I am only getting two parameters, namely the object name and the user name associated with it, I made it into a dictionary, as this was the best solution I could come up with.

Even so, suppose this time I wanted to add an additional parameter, say the size of each item, so that the data looks like this (this is not a dict yet):
(1 MiB) "item_C_v001" : "jack"
(5 MiB) "item_C_v002" : "kris"
(1 MiB) "item_A_v003" : "john"
(1 MiB) "item_B_v006" : "peter"
(2 MiB) "item_A_v005" : "john"
(1 MiB) "item_A_v004" : "dave"

I am able to make my output as follows:
Item Name     | No. of Vers.      | User
item_A        | 3                 | dave(1, 1MiB), john(2, 3MiB)
item_B        | 1                 | peter(1, 1MiB)
item_C        | 2                 | jack(1, 1MiB), kris(1, 5MiB)

My code is as follows: http://pastebin.com/G8WVzE3e

I am wondering if you could take a look and see whether my method is a good way of doing this.
I am having some difficulty seeing how I can get the dictionaries I have created to 'link' to / make use of each other. For example, if I try to sort by data size in descending order, it does not work, even if I try to sort by the values.
Pardon me if my code is not fancy.


Justin Israel

Feb 16, 2017, 5:19:07 PM
to python_in...@googlegroups.com
If you want to sort your user list by data size, then you should first create a list of tuples [(size, user), ...] instead of formatting it directly into a joined string. You can sort the list and then join it into a string. It's not really a matter of "linking" dictionaries, since you are currently getting the data you want, just not in the sort order you want.
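As a small sketch of that idea: the per-user totals below are taken from the item_A row earlier in the thread, while the `size_by_user` name and the output format are only assumptions for illustration.

```python
# Per-user total sizes in MiB for a single item (item_A in the thread's example)
size_by_user = {"dave": 1, "john": 3}

# Build (size, user) tuples first, so the list can be sorted by size
size_user_pairs = [(size, user) for user, size in size_by_user.items()]
size_user_pairs.sort(reverse=True)  # largest size first

# Only format into a string once the order is settled
users = ", ".join(
    "{0}({1}MiB)".format(user, size) for size, user in size_user_pairs
)
print(users)  # john(3MiB), dave(1MiB)
```

Putting the size first in each tuple means a plain `sort()` orders by size without needing a `key` function; ties fall back to comparing the user names.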

Eventually, if you find yourself creating tons of dictionaries and trying to correlate them into data sets, it sounds like you need a database, be it in-memory or file-based. Something like in-memory SQLite or the pandas data framework (which I haven't used but recall being suited for this).
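For a rough idea of what the in-memory SQLite route could look like, here is a sketch using Python's standard `sqlite3` module. The table name `versions`, its column names, and the way the item and version are split into separate columns are all assumptions made for this example; the data is the sized list from earlier in the thread.

```python
import sqlite3

# (item, version, user, size in MiB), from the example data in the thread
rows = [
    ("item_C", "v001", "jack", 1),
    ("item_C", "v002", "kris", 5),
    ("item_A", "v003", "john", 1),
    ("item_B", "v006", "peter", 1),
    ("item_A", "v005", "john", 2),
    ("item_A", "v004", "dave", 1),
]

conn = sqlite3.connect(":memory:")  # database lives only in memory
conn.execute(
    "CREATE TABLE versions (item TEXT, version TEXT, user TEXT, size_mib INTEGER)"
)
conn.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", rows)

# Count versions and sum sizes per (item, user), largest total first per item
query = """
    SELECT item, user, COUNT(*) AS num_versions, SUM(size_mib) AS total_mib
    FROM versions
    GROUP BY item, user
    ORDER BY item ASC, total_mib DESC, user ASC
"""
summary = conn.execute(query).fetchall()
conn.close()

for item, user, num_versions, total_mib in summary:
    print("{0}: {1}({2}, {3}MiB)".format(item, user, num_versions, total_mib))
```

With this layout the grouping and the sort order are both expressed in the query, so changing the sort (for example by version count instead of size) only means editing the `ORDER BY` clause.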

Justin



likage

Feb 16, 2017, 7:50:57 PM
to Python Programming for Autodesk Maya
This time round, I tried using tuples like you mentioned. It seems to be working well, but I have an issue: if a user already exists, the overall data size is collated and summed up under that one user...

In the pastebin, I am expecting item_C to be `kilo [3 (1]`, but it seems to add up the values derived from item_B in kilo's case.

Justin Israel

Feb 16, 2017, 11:04:50 PM
to Python Programming for Autodesk Maya
Untested, but that looks to be almost exactly the same logic as what I gave you for tracking the counts for each user/product. But instead of tracking a counter, you would just accumulate the size:

user_ver_count = defaultdict(lambda: defaultdict(int))
user_ver_size = defaultdict(lambda: defaultdict(int))

...
for vers_name, artist_alias in gen_dict.iteritems():
    ...
    user_ver_count[artist_alias[0]][strip_version_name] += 1
    user_ver_size[artist_alias[0]][strip_version_name] += artist_alias[1]
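Filled out into something runnable, the snippet above might look like the sketch below. It assumes each value in `gen_dict` is a `(user, size_in_MiB)` tuple and that stripping the last five characters yields the item name; both are guesses about the pastebin code, which is not shown in this thread.

```python
from collections import defaultdict

# Assumed shape: version name -> (user, size in MiB)
gen_dict = {
    "item_C_v001": ("jack", 1),
    "item_C_v002": ("kris", 5),
    "item_A_v003": ("john", 1),
    "item_B_v006": ("peter", 1),
    "item_A_v005": ("john", 2),
    "item_A_v004": ("dave", 1),
}

# user -> item name -> number of versions / accumulated size
user_ver_count = defaultdict(lambda: defaultdict(int))
user_ver_size = defaultdict(lambda: defaultdict(int))

for vers_name, artist_alias in gen_dict.items():
    strip_version_name = vers_name[:-5]  # assumes a "_vNNN" suffix
    user, size = artist_alias            # name the tuple members
    user_ver_count[user][strip_version_name] += 1
    user_ver_size[user][strip_version_name] += size

print(user_ver_count["john"]["item_A"], user_ver_size["john"]["item_A"])
```

Because the size is accumulated per user and per item, a user who appears under several items gets a separate total for each one rather than a single combined sum.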

Also, this line is impossible to read:
users = ', '.join('{0} [{1} ({2}]'.format(alias, convert_size_query(sum(int(i) for i in asset_size_dict.get(alias))), user_ver_count[alias][version_name]) for alias in asset_user_dict[version_name])
It doesn't hurt to break up your logic into individual lines, to help others who can and will end up reading your code. And also to assign obscure members of a list to a nicely named local variable  :-)
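One way the formatting could be split up is sketched here. `format_size` is a hypothetical stand-in for `convert_size_query` (whose implementation is not shown in the thread), `format_users` is an invented helper name, and the sample mappings just hard-code the item_A numbers from the earlier example.

```python
def format_size(size_mib):
    # Hypothetical stand-in for convert_size_query from the pastebin
    return "{0}MiB".format(size_mib)


def format_users(version_name, aliases, user_ver_count, user_ver_size):
    """Build the 'user(count, size)' column for one item, one step per line."""
    entries = []
    for alias in sorted(aliases):
        version_count = user_ver_count[alias][version_name]
        total_size = format_size(user_ver_size[alias][version_name])
        entries.append("{0}({1}, {2})".format(alias, version_count, total_size))
    return ", ".join(entries)


# Sample data for item_A: user -> item name -> count / total size in MiB
user_ver_count = {"dave": {"item_A": 1}, "john": {"item_A": 2}}
user_ver_size = {"dave": {"item_A": 1}, "john": {"item_A": 3}}

users = format_users("item_A", ["john", "dave"], user_ver_count, user_ver_size)
print(users)  # dave(1, 1MiB), john(2, 3MiB)
```

Each intermediate value gets its own name, so a wrong lookup (such as summing a user's sizes across every item) is much easier to spot than inside a single nested expression.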
Justin


likage

Feb 23, 2017, 3:36:17 PM
to Python Programming for Autodesk Maya
Hi Justin,

Sorry for the late reply.

It works like a charm, thanks!



Justin Israel

Feb 23, 2017, 3:39:06 PM
to python_in...@googlegroups.com
Sweet! You are welcome.
 

