Questions about Rhythm Badge Dataset


Haizhu Yang

Oct 21, 2019, 10:22:36 PM
to Rhythm Badges
Dear Dr. Lederman,

We are a group of senior Data Science majors at the University of Rochester working on a project to help URMC analyze the Networking Rhythm Badge Dataset. The dataset was collected during a specially formatted meeting held by URMC, and it includes proximity data on attendees collected by the badge system. We hope to gain an accurate understanding of the badge system and the meaning behind the data so that we can get a better start on cleaning and analyzing it. Our project sponsor, URMC, told us you would be a great resource for learning about the data. Could you help us clarify some of our questions? Here are our main questions:

1. What does “count: x” mean in this dataset?
2. Why does one person have multiple records of data in one period of time?
3. What is the unit of the timestamp in the log? Some of the time spans within one segment of data are 0.003. Is it possible that this time span is interrupted by signals sent out by other devices?
4. How should we intuitively understand the “rssi” variable in a real-life setting? How is it calculated? For example, does -93 correspond to about 2 feet in real life?
5. What does “proximity received” mean for the data? Is there a standard for categorizing whether something is received or not?

Thank you for your time; we look forward to hearing back from you.

Haizhu, Yumeng, Ziyu, and Tianyou

Oren Lederman

Oct 24, 2019, 11:52:59 AM
to Rhythm Badges
Hi Haizhu,

Based on your questions, I'm assuming that you are looking at the raw JSON files. So I'll start by answering your questions, and then direct you to some examples:
1. count: x - the number of times badge A saw badge B during the scanning period (which should be a 15-second period). The more times A saw B, the better the result. We don't save the raw readings; instead, you'll get a reading that is the max RSSI (signal strength) during that period. Note that the time is in UTC (not your local time). You'll need to convert it (see my code examples).

2+3. Each scan (and each record) represents a 15-second scan. During this time, badge A might see badge B multiple times. The timestamp represents the beginning of that time window (in epoch seconds).

4. RSSI is signal strength, and it is... well... tricky. It's not an exact measure, and if possible, I advise you to try several different cutoffs to check the power of your results. I tend to use -60 or -62 as the cutoff for people being several feet apart. The exact number depends on the interaction you expect to see and on your research question. In a conference (for example) where people stand very close to each other, I would use a lower number, because an interaction is more likely to be a real interaction if they are close by. If you want to measure how much time people spend in close proximity (but not necessarily speaking to each other), you can use -60 or -62.

5. "proximity received" is just a type of record (badly named). The other type is audio. Just ignore this field :)
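To make answers 1 and 2+3 concrete, here is a minimal sketch of how a proximity record might be handled once flattened out of the raw JSON. The `ProximityRecord` layout, its field names, and the helper functions are hypothetical (they are not the field names of the raw files or the API of the linked repositories). It only encodes what the reply states: the timestamp is epoch seconds in UTC marking the start of a 15-second scan, `rssi` is the max RSSI in that scan, and one badge pair can have several records in a period because each record is a separate window. The example offset of UTC-4 assumes the event was in Rochester, NY during daylight saving time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo

SCAN_WINDOW_S = 15  # each record covers one 15-second scan, per the reply


@dataclass(frozen=True)
class ProximityRecord:
    """Hypothetical flattened record: badge `observer` saw badge `observed`."""
    observer: str
    observed: str
    timestamp: float  # epoch seconds (UTC) at the START of the scan window
    rssi: int         # max RSSI seen during the window
    count: int        # times observer saw observed during the window

    @property
    def window_end(self) -> float:
        return self.timestamp + SCAN_WINDOW_S


def epoch_to_local(ts: float, tz: tzinfo) -> datetime:
    """Convert epoch seconds (UTC) into an aware datetime in the given timezone."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).astimezone(tz)


def records_in_period(records, start: float, end: float):
    """Records whose 15-second window overlaps [start, end), ordered by time.

    Several records for the same badge pair in one period are expected:
    there is one record per scan window.
    """
    hits = [r for r in records if r.timestamp < end and r.window_end > start]
    return sorted(hits, key=lambda r: r.timestamp)


if __name__ == "__main__":
    # Assumption: Rochester, NY was on EDT (UTC-4) in October 2019. With tzdata
    # installed, zoneinfo.ZoneInfo("America/New_York") handles DST for you.
    edt = timezone(timedelta(hours=-4))
    print(epoch_to_local(1571702400, edt))  # 2019-10-21 20:00:00-04:00
```

A fixed offset is used here only to keep the example independent of the system timezone database; for data spanning a DST change, a named zone is the safer choice.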
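The advice in answer 4 to try several RSSI cutoffs can be sketched as a simple sensitivity check. This is an illustration under assumptions, not the analysis from the linked repositories: records are assumed to be (observer, observed, rssi) tuples, one per 15-second scan, and the time a pair spends "in proximity" is approximated as 15 seconds per scan whose max RSSI clears the cutoff. RSSI values are negative, and values closer to zero mean a stronger signal, so "close enough" means `rssi >= cutoff`. The cutoffs -60 and -62 come from the reply; -55 and -65 are arbitrary extra points.

```python
from collections import defaultdict

SCAN_WINDOW_S = 15  # seconds covered by one proximity record


def proximity_seconds(records, cutoff):
    """Approximate seconds each (observer, observed) pair spent at rssi >= cutoff.

    records: iterable of (observer, observed, rssi) tuples, one per scan,
    where rssi is the max RSSI of that scan. Pairs are kept as given (directed).
    """
    totals = defaultdict(int)
    for observer, observed, rssi in records:
        if rssi >= cutoff:
            totals[(observer, observed)] += SCAN_WINDOW_S
    return dict(totals)


def cutoff_sensitivity(records, cutoffs=(-55, -60, -62, -65)):
    """Recompute the proximity totals for several cutoffs to see how much they move."""
    records = list(records)  # materialize so generators can be reused per cutoff
    return {c: proximity_seconds(records, c) for c in cutoffs}


if __name__ == "__main__":
    scans = [("A", "B", -58), ("A", "B", -61), ("A", "C", -70)]
    for cutoff, totals in cutoff_sensitivity(scans).items():
        print(cutoff, totals)
    # -55 {}
    # -60 {('A', 'B'): 15}
    # -62 {('A', 'B'): 30}
    # -65 {('A', 'B'): 30}
```

If a conclusion only holds at one cutoff and flips at a neighboring one, that is a sign the result depends more on the threshold than on the interactions themselves.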

Now, to make your life much, much easier, I strongly recommend you check out the example repositories I've made for processing the data. A very simplistic one can be found here - https://github.com/HumanDynamics/openbadge-analysis-examples, and in particular, you should check out this example on how to convert the raw data into basic building blocks - https://github.com/HumanDynamics/openbadge-analysis-examples/blob/master/notebooks/hub_proximity_example.ipynb

A much more comprehensive example, taken from the analysis I used for my PhD dissertation can be found here - https://groups.google.com/forum/#!topic/rhythm-badges/gSrMX_-KgnE . In this post, I explain some background info and point to the repository with the code.

One important thing you need to keep in mind is the data cleaning you'll need to do. Assuming you guys followed my instructions on how to deploy the badges, you should have some beacons ("location" badges) that will tell you when the participant badges were at the reception, and when they were picked up by participants. This is important since when the badges are at the reception table, they'll see each other and will appear to be VERY close to each other. My (second) code example shows how to use the list of beacons to do that. The beacons marked as "board" were the reception badges in my case.  Make sure to check the src/data/config.py file for timezone configuration and other useful settings.
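The reception cleanup described above is implemented in the linked code; the sketch below is only a simplified stand-in to show the idea, not that implementation. It assumes beacon scans are available as (badge, beacon, timestamp, rssi) tuples, that the reception beacon IDs (the ones marked "board", in the reply's case) are known, and that a badge can be treated as picked up once the last strong reception-beacon reading before a caller-chosen time `until` has passed. The function names, the tuple layouts, and the -60 threshold for "at the reception table" are all hypothetical.

```python
SCAN_WINDOW_S = 15  # seconds covered by one scan


def pickup_times(beacon_scans, reception_beacons, until, cutoff=-60):
    """Hypothetical heuristic: estimate when each badge left the reception table.

    beacon_scans: iterable of (badge, beacon, timestamp, rssi) tuples.
    Returns {badge: start time of its last strong (rssi >= cutoff) scan from a
    reception beacon before `until`}. Capping the search at `until` (e.g. a bit
    after the event starts) keeps badges returned at the end of the day from
    looking as if they were never picked up.
    """
    last_seen = {}
    for badge, beacon, ts, rssi in beacon_scans:
        if beacon in reception_beacons and rssi >= cutoff and ts < until:
            last_seen[badge] = max(ts, last_seen.get(badge, ts))
    return last_seen


def drop_reception_period(records, pickups):
    """Keep (observer, observed, timestamp, rssi) records that start after the
    last reception scan of BOTH badges has ended. Badges never seen at the
    reception table are not filtered."""
    never = float("-inf")
    kept = []
    for observer, observed, ts, rssi in records:
        both_picked_up = max(pickups.get(observer, never), pickups.get(observed, never))
        if ts >= both_picked_up + SCAN_WINDOW_S:
            kept.append((observer, observed, ts, rssi))
    return kept


if __name__ == "__main__":
    beacon_scans = [
        ("A", "board1", 100, -50), ("A", "board1", 130, -55),
        ("B", "board1", 100, -52), ("B", "board1", 160, -50),
        ("A", "board1", 5000, -50),  # badge returned at the end of the day
    ]
    pickups = pickup_times(beacon_scans, {"board1"}, until=1000)
    records = [("A", "B", 120, -45), ("A", "B", 150, -48), ("A", "B", 200, -65)]
    print(drop_reception_period(records, pickups))  # [('A', 'B', 200, -65)]
```

The two records dropped in the example are exactly the "VERY close" readings the reply warns about: both badges were still sitting at the reception table.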

If this explanation is a bit confusing, I suggest you start with a simple graph - show the average RSSI between all badges over time. It should show you that at first, the average RSSI was high (-50 or higher), and then, as badges started to leave the reception, the average RSSI dropped.
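The "average RSSI over time" check can be computed before plotting with a few lines of standard-library Python. This is a minimal sketch assuming records are (observer, observed, timestamp, rssi) tuples with epoch-second timestamps; the 5-minute bin width is an arbitrary choice. If the reception effect described above is present, the first bins should have noticeably higher (closer to zero) averages than later ones.

```python
from collections import defaultdict
from statistics import mean


def average_rssi_by_bin(records, bin_s=300):
    """Mean RSSI over all badge pairs in consecutive time bins.

    records: iterable of (observer, observed, timestamp, rssi) tuples, with
    timestamp in epoch seconds. Returns {bin_start: mean_rssi}, sorted by time.
    """
    bins = defaultdict(list)
    for _observer, _observed, ts, rssi in records:
        bin_start = int(ts // bin_s) * bin_s  # floor the timestamp to its bin
        bins[bin_start].append(rssi)
    return {start: mean(values) for start, values in sorted(bins.items())}


if __name__ == "__main__":
    scans = [
        ("A", "B", 0, -50), ("A", "C", 10, -52),     # badges still at reception
        ("A", "B", 310, -70), ("B", "C", 320, -72),  # badges picked up
    ]
    for start, avg in average_rssi_by_bin(scans).items():
        print(start, avg)
```

The resulting dictionary can be passed straight to a plotting library (for example, its keys as the x axis and values as the y axis in matplotlib) after converting the bin starts to local time.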

Oh, and another important thing to know (because people don't notice it) - when you work with the member-to-member (m2m) tables, remember that they represent an undirected graph. Therefore, each edge will appear only once in the original tables: you'll see A->B, but not B->A. In my more complete analysis (deltav dataset), you might notice that at some point I do create a table that has both A->B and B->A, for convenience (it makes joins easier). I mark these tables as _dbl (double sided).
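The _dbl idea can be illustrated with a small standard-library sketch. The tuple layout (member1, member2, *attributes) is an assumption standing in for the actual m2m table columns. The point it shows: in a single-sided table, filtering on the first member column misses edges where the member appears second, while the double-sided table finds all of them with one filter. In pandas, the same idea is usually a `pd.concat` of the table with a copy whose two member columns are swapped.

```python
def to_double_sided(edges):
    """Return a '_dbl'-style edge list containing both A->B and B->A.

    edges: iterable of (member1, member2, *attributes) tuples in which each
    undirected edge appears only once, as in the original m2m tables.
    """
    doubled = []
    for m1, m2, *attrs in edges:
        doubled.append((m1, m2, *attrs))
        doubled.append((m2, m1, *attrs))  # same attributes, reversed direction
    return doubled


def edges_of(edge_list, member):
    """All edges whose FIRST column is `member` (the lookup a join would do)."""
    return [e for e in edge_list if e[0] == member]


if __name__ == "__main__":
    m2m = [("A", "B", 30), ("A", "C", 45)]  # hypothetical: minutes together
    dbl = to_double_sided(m2m)
    print(edges_of(m2m, "B"))  # [] - the single-sided table hides B's edge
    print(edges_of(dbl, "B"))  # [('B', 'A', 30)]
```

Doubling the table also doubles any sums over it, so totals such as overall interaction time should be computed on the original single-sided table.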