Recommendations for systems for analyzing and visualizing the data


Oren Lederman

Jan 4, 2019, 1:15:14 PM
to Rhythm Badges
From Kevin Fiscella:

Oren,

I am a member of the University of Rochester CTSI team. Do you have recommendations for systems for analyzing and visualizing the data, particularly in the context of conferences where there are multiple brief one-on-one interactions as well as longer group interactions, e.g., meetings? We are interested in individual participant data, e.g., number of unique interactions, cross-disciplinary interactions (based on IDs), and average turns per interaction, in addition to network analysis. Also, do you recommend using stationary badges to identify the locations of interactions, or do you recommend other approaches to identifying location?

Kevin Fiscella, MD, MPH
University of Rochester Medical Center  

Oren Lederman

Jan 4, 2019, 1:41:11 PM
to rhythm...@googlegroups.com
Hi Kevin, 

Lots of questions :) I'll try to answer all of them:

Analysis and visualization - we created a basic Python analysis package (https://github.com/HumanDynamics/openbadge-analysis) and some examples of how to use it (https://github.com/HumanDynamics/openbadge-analysis-examples). In particular, you'll want to look at this example - https://github.com/HumanDynamics/openbadge-analysis-examples/blob/master/notebooks/hub_proximity_example.ipynb . It uses proximity data from a somewhat similar deployment (a 3-day workshop). I have a more sophisticated pipeline that I'm using myself, which I'll be able to share later on.

As for visualization - I haven't really visualized the locations of badges, so I don't have any recommendations there. Some colleagues used plain Python with heatmaps and an image of the venue as a background. I have used simple graphs to show the times and locations (using counts) of when and where people interact. One particularly useful visualization is a heatmap that shows when badges are active and/or their location over time (using different colors). There are some examples in the notebook I mentioned above, and more in a repository that I haven't made public yet. Interactions can be visualized using networkx, or with heatmaps that show interaction between people (and if you sort people by group, it's easier to see inter-group interaction).
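For illustration, here's a minimal sketch of that kind of activity heatmap - assuming a pandas DataFrame of proximity pings with columns 'datetime', 'member' and 'beacon' (illustrative names, not the package's actual schema):

import pandas as pd
import matplotlib.pyplot as plt

def activity_heatmap(df, freq='5min'):
    # Count pings per member per time bin; a zero count means the
    # badge was off or out of range during that bin.
    counts = (df.groupby(['member', pd.Grouper(key='datetime', freq=freq)])
                .size()
                .unstack(fill_value=0))
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.imshow(counts.values, aspect='auto', cmap='viridis')
    ax.set_yticks(range(len(counts.index)))
    ax.set_yticklabels(counts.index)
    ax.set_xlabel('time bin')
    ax.set_ylabel('member')
    return fig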

Turns calculation - first, a disclaimer - while the badges can do this, this feature works in relatively quiet environments (meeting rooms). Don't expect it to work well in a noisy environment (open spaces, large gatherings). It might work with a high enough threshold and some data loss, but I haven't tried that yet. Also, the code supporting this feature is one of the oldest parts of our analysis code and is a bit messy (in particular the hard-coded timezone in the function that reads the raw data). You can find examples of how to handle audio data here - https://github.com/HumanDynamics/openbadge-analysis-examples/blob/master/notebooks/meeting_simple_plots.ipynb (focus on the sample2data and make_df_stitched functions). There is also a new voice activity detection (VAD) function created by my colleague that is much better than the one I use in my examples, but we haven't fully integrated it into the pipeline. It also supports overlaps and interruptions. There is a detailed explanation (with code examples) of how to use it - https://github.com/HumanDynamics/openbadge-analysis-examples/blob/master/notebooks/multi-channel_VAD_illustration.ipynb
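To give a sense of the basic idea, here's a minimal, naive sketch of threshold-based turn counting - this is not the package code or the new VAD, and the DataFrame layout and threshold are illustrative (one volume column per member, indexed by timestamp):

import pandas as pd

def count_turns(df_vol, threshold=60):
    # A sample counts as speech if its volume exceeds the threshold;
    # a turn starts wherever speech flips from off to on.
    turns = {}
    for member in df_vol.columns:
        speaking = df_vol[member] > threshold
        starts = speaking & ~speaking.shift(fill_value=False)
        turns[member] = int(starts.sum())
    return pd.Series(turns, name='turns')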

Stationary badges - we call these location beacons, and there is built-in support for them in the system. When you set up a project in the server, you'll see that you can add members (participants) and location beacons. Beacons get a different range of IDs, and are given priority in the scans that the badges do, to ensure we get location data. I use a very simple location detection approach - I determine the location using the closest beacon (after some smoothing).
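As an illustration, the closest-beacon approach boils down to something like this (assuming a DataFrame with one RSSI column per beacon, indexed by timestamp; the column layout and smoothing window are illustrative):

import pandas as pd

def detect_location(df_rssi, window='1min'):
    # Smooth each beacon's signal with a rolling median to tame the
    # RSSI noise, then pick the strongest beacon at each timestamp.
    smoothed = df_rssi.rolling(window).median()
    return smoothed.idxmax(axis=1).rename('location')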

A very important kind of location beacon is what I call a "board beacon" - these are beacons I place next to where badges are stored, handed out, and returned. They help keep track of when badges are actually being used. Note that this is not an integrated part of the system - it's something I implement in my own analysis (which I can share). I recommend having multiple board beacons, to ensure they show up in scans.
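Building on the sketch above, badge usage can then be derived by treating any period where the closest beacon is not the board as "in use" (again illustrative - 'BOARD' is a made-up beacon name):

def in_use_periods(locations, board_id='BOARD'):
    # locations is the Series returned by detect_location() above.
    in_use = locations != board_id
    # Label contiguous runs so each pickup/return cycle gets its own id.
    run_id = (in_use != in_use.shift()).cumsum()
    return (locations[in_use]
            .groupby(run_id)
            .apply(lambda s: (s.index.min(), s.index.max())))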

Hope this helps. Let me know if you have any more questions.

Oren

Oren Lederman

Jan 4, 2019, 2:06:05 PM
to Rhythm Badges
Also, a few more notes on the analysis of proximity data -
  • There is no hidden magic in analyzing the data. Proximity data based on Bluetooth (or RF in general) is very noisy. Our tests (and others' research papers) show that even stationary badges report varying signal strength over time due to reflections, interference from WiFi, and even changes in temperature.
  • The two approaches we have used for determining interactions are (see the sketch after this list):
    • Using a threshold. The correct threshold depends on the layout of the venue and the "density" of people. For example, in my recent analysis of an accelerator program, people spent most of their time in two large open spaces, sitting very close to each other. This makes it difficult to tell what is an interaction and what is not. I had to use a relatively low threshold (allowing for false positives) and measure times when people "hang out" together rather than interact with each other.
    • Closest person. In a networking event, my colleagues assumed you interact with a single person at a time and used the data to determine the person closest to you.
  • There are other ways to handle and model the data, but I haven't tried them yet. In particular, you could use 5-minute time bins and a high threshold in order to identify temporal groupings of people.
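Here is the sketch mentioned above - minimal versions of both approaches, assuming a DataFrame of badge-to-badge observations with columns 'datetime', 'member', 'other' and 'rssi' (illustrative names, not the package's actual schema; RSSI is in dBm, so values closer to 0 are stronger):

import pandas as pd

def interactions_by_threshold(df_prox, threshold=-65):
    # Approach 1: any observation stronger than the threshold counts
    # as an interaction (or "hanging out", if the threshold is loose).
    return df_prox[df_prox['rssi'] > threshold]

def closest_person(df_prox, freq='1min'):
    # Approach 2: within each time bin, keep only the single
    # strongest neighbor for each member.
    grouped = df_prox.set_index('datetime').groupby(
        [pd.Grouper(freq=freq), 'member'])
    return grouped.apply(lambda g: g.nlargest(1, 'rssi')['other'].iloc[0])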



Bo Doan

May 16, 2019, 2:19:09 PM
to Rhythm Badges
Hi Oren,

I've been trying to visualize audio data collected in development mode. My question is: how could I visualize speaking contribution as a bar chart instead of a stacked area chart (contribution.py)? I am under a time constraint and can't really sit down and play with the code - it would be nice if the analysis package provided something I can use. A side note: I have been running the example code (in the example analysis repository) on my data, and everything looks neat.

(By the way, I did try changing the server configuration as you suggested in a thread I started in the past. The server looks like it is up and running, but I haven't had time to actually record data in server mode yet. My apologies for not keeping you updated.)

Thank you.
Bo

Oren Lederman

May 16, 2019, 6:32:49 PM
to Rhythm Badges

Something like this? If so, I just pushed an updated example to the examples repo (https://github.com/HumanDynamics/openbadge-analysis-examples/blob/master/notebooks/meeting_simple_plots.ipynb).
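The idea in that example is roughly this (a minimal sketch, assuming a DataFrame with one boolean "is speaking" column per member, sampled every 50ms - names are illustrative):

import matplotlib.pyplot as plt

def contribution_bars(df_speak):
    # Total speaking time per member: each True sample is one 50ms
    # window of speech.
    seconds = df_speak.sum() * 0.05
    fig, ax = plt.subplots()
    ax.bar(seconds.index, seconds.values)
    ax.set_ylabel('speaking time (s)')
    return fig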

If that's not what you are looking for, please attach a sketch showing what you need and I'll try to help.

participation_bars.png

Bo Doan

May 20, 2019, 1:43:40 AM
to Rhythm Badges

Thank you so much for such a quick reply. Adding to the bar plot, how can I get a separate graph for each time interval (sketch below; the y-axis can be time or percentage of contribution)? Our experiment has 4 sub-discussions with breaks in between for questionnaires, so it would be nice if we could somehow see the change in speaking contribution over the course of time. We are interested, at least for the data we have right now from our pilot, in how storytelling can "bring people closer to each other".

examples_openbadge.png

Also, I tried to apply the turn-taking example code (finding genuine, real speakers) to our data, but I ended up with the errors below. Could you take a look and guide me through what I should do? I think my data file, which can be found in the attachment, differs slightly from the example data; I don't know whether that should be a problem though.

'''

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-6d76904d4345> in <module>()
----> 1 df_spk_mean, df_spk_std = get_spk_genuine(df_flt, thre)

c:\users\dell\appdata\local\programs\python\python27\lib\site-packages\openbadge_analysis-0.4.4-py2.7.egg\openbadge_analysis\preprocessing\audio.pyc in get_spk_genuine(df_meet, thre)
   156     get genuine spk
   157     """
--> 158     df_meet_sec = get_meet_sec(df_meet)
   159     df_cor = df_meet_sec.groupby(df_meet_sec.index).corr().dropna()
   160     df_cor = pd.DataFrame((df_cor >= thre).T.all())

c:\users\dell\appdata\local\programs\python\python27\lib\site-packages\openbadge_analysis-0.4.4-py2.7.egg\openbadge_analysis\preprocessing\audio.pyc in get_meet_sec(df_meet)
    21     """
    22     df_meet_sec = df_meet.copy()
---> 23     df_meet_sec.index = df_meet_sec.index.map(lambda x: x.replace(microsecond=0))
    24     return df_meet_sec
    25 

c:\users\dell\appdata\local\programs\python\python27\lib\site-packages\pandas\core\indexes\base.pyc in map(self, mapper)
  2774         """
  2775         from .multi import MultiIndex
-> 2776         mapped_values = self._arrmap(self.values, mapper)
  2777         attributes = self._get_attributes_dict()
  2778         if mapped_values.size and isinstance(mapped_values[0], tuple):

pandas\_libs\algos_common_helper.pxi in pandas._libs.algos.arrmap_object (pandas\_libs\algos.c:31954)()

c:\users\dell\appdata\local\programs\python\python27\lib\site-packages\openbadge_analysis-0.4.4-py2.7.egg\openbadge_analysis\preprocessing\audio.pyc in <lambda>(x)
    21     """
    22     df_meet_sec = df_meet.copy()
---> 23     df_meet_sec.index = df_meet_sec.index.map(lambda x: x.replace(microsecond=0))
    24     return df_meet_sec
    25 

AttributeError: 'tuple' object has no attribute 'replace'

'''

Best regards.
audio_data.txt

Oren Lederman

May 20, 2019, 3:39:55 PM
to Rhythm Badges
Hi Bo,

I can't tell what the problem is without the rest of your code, but it looks like a coding error rather than a data problem. I don't have time to debug this or make the figures for you either... that's something you'll need to figure out.
I can say that if you want to calculate the actual time people talked, each time slot represents a 50ms window, so if you multiply the counts by 50ms you'll get the total time each person spoke.
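For illustration, that calculation looks something like this (assuming a per-member boolean speaking DataFrame sampled at 50ms, as in the earlier sketch, plus a list of (start, end) tuples for your sub-discussions - all names are illustrative):

import pandas as pd

def speaking_time_per_interval(df_speak, intervals):
    rows = {}
    for i, (start, end) in enumerate(intervals, start=1):
        # Each True sample is one 50ms window of speech.
        rows['discussion %d' % i] = df_speak.loc[start:end].sum() * 0.05
    # Rows = sub-discussions, columns = members, values = seconds.
    return pd.DataFrame(rows).T

Calling .plot.bar() on the result gives one group of bars per sub-discussion.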

Oren