How to create a histogram of non-numeric data used for a scale/legend?


David

Jun 20, 2017, 1:21:14 PM
to vega-js
I'm using Vega 3 and have data in the following format:
[ { "val": [ "X1", "X2" ] },
  { "val": [ "X2" ] },
  { "val": [ "X1", "X2", "X3" ] } ]

I'd like to compute a histogram on val, keep only the top entries after sorting by count, and use the result for a legend and/or a scale range used in an axis.
In the above example, with a cutoff of the top 2, the result would be X2: 3 and X1: 2.

How can this be done (without actually writing a new transform function)?

Thanks!




Roy I

Jun 22, 2017, 11:35:32 AM
to vega-js
The data for Vega v3 must be in a format that Vega v3 supports (see: https://vega.github.io/vega/docs/data/), e.g.
[ {"val": "X1"},
{"val": "X2"},
{"val": "X2"},
{"val": "X1"},
{"val": "X2"},
{"val": "X3"}
]
or:
[ "X1", "X2", "X2", "X1", "X2", "X3" ]


Here is a working Vega v3 spec that plots a ranked frequency bar chart with a cutoff, using your example data (it works in the new Vega v3 online editor: https://vega.github.io/new-editor/?mode=vega)


Vega spec (v3.0.0-beta.38)
---------------------------------------
{
  "width": 500,
  "height": 300,
  "padding": 5,

  "data": [
    {
      "name": "data_input",
      "values": [
        {"val": "X1"},
        {"val": "X2"},
        {"val": "X2"},
        {"val": "X1"},
        {"val": "X2"},
        {"val": "X3"}
      ]
    },
    {
      "name": "data_freq",
      "source": "data_input",
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["val"],
          "fields": ["val"],
          "ops": ["count"],
          "as": ["frequency"]
        }
      ]
    },
    {
      "name": "data_freq_rank_filtered",
      "source": "data_freq",
      "transform": [
        {
          "type": "collect",
          "sort": {
            "field": "frequency",
            "op": "max",
            "order": "descending"
          }
        },
        {
          "type": "rank",
          "field": "val"
        },
        {
          "type": "filter",
          "expr": "datum.rank < 2"
        }
      ]
    }
  ],

  "scales": [
    {
      "name": "scale_x",
      "type": "band",
      "range": "width",
      "domain": {"data": "data_freq_rank_filtered", "field": "val"},
      "padding": 0.5
    },
    {
      "name": "scale_y",
      "type": "linear",
      "range": "height",
      "domain": {"data": "data_freq_rank_filtered", "field": "frequency"},
      "zero": true,
      "nice": true
    }
  ],

  "axes": [
    {"orient": "bottom", "scale": "scale_x"},
    {"orient": "left", "scale": "scale_y", "tickCount": 3}
  ],

  "marks": [
    {
      "type": "rect",
      "from": {"data": "data_freq_rank_filtered"},
      "encode": {
        "update": {
          "fill": {"value": "steelblue"},
          "x": {"scale": "scale_x", "field": "val"},
          "width": {"scale": "scale_x", "band": 1},
          "y": {"scale": "scale_y", "field": "frequency"},
          "y2": {"scale": "scale_y", "value": 0}
        }
      }
    }
  ]
}

Roy I

Jun 22, 2017, 5:04:32 PM
to vega-js
By the way...

"Histograms are sometimes confused with bar charts. A histogram is used for continuous data, where the bins represent ranges of data, while a bar chart is a plot of categorical variables."







David

Jun 23, 2017, 3:39:47 AM
to vega-js
Thanks, Roy.
The problem is that my data arrives in the format I specified, and there doesn't seem to be a standard transform that flattens it. Otherwise, yes, using aggregate would work.
I might have no option other than writing my own transform that changes the pulse.

Roy I

Jun 23, 2017, 9:54:47 AM
to vega-js
Here is an example JavaScript function that converts your data for Vega:

----------------------------
// Flatten [{val: [a, b]}, ...] into [{val: a}, {val: b}, ...]
function convertData(input) {
  var result = [];
  var arr;
  for (var i = 0; i < input.length; i++) {
    arr = input[i]["val"];
    // Emit one object per element of the nested array
    for (var j = 0; j < arr.length; j++) {
      result.push({"val": arr[j]});
    }
  }
  return result;
}

// test
var rawData = [{"val": ["X1","X2"]}, {"val":["X2"]}, {"val": ["X1","X2","X3"]}];

var myData = convertData(rawData);

console.log(JSON.stringify(myData));
// [{"val":"X1"},{"val":"X2"},{"val":"X2"},{"val":"X1"},{"val":"X2"},{"val":"X3"}]
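
If the counting itself can also happen outside the spec, the same kind of loop extends to a frequency table with a cutoff. This is only a minimal sketch, assuming plain JavaScript that runs before the data reaches Vega; topFrequencies is a hypothetical helper name, not a Vega API.

```javascript
// Hypothetical helper (not part of Vega): count every value found in the
// nested "val" arrays and keep the `limit` most frequent ones.
function topFrequencies(rows, limit) {
  const counts = new Map();
  for (const row of rows) {
    for (const value of row.val) {
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  // Turn the map into {val, frequency} objects, sort by descending count,
  // then cut the list off after `limit` entries.
  return Array.from(counts, ([val, frequency]) => ({ val, frequency }))
    .sort((a, b) => b.frequency - a.frequency)
    .slice(0, limit);
}

const sample = [{ val: ["X1", "X2"] }, { val: ["X2"] }, { val: ["X1", "X2", "X3"] }];
console.log(JSON.stringify(topFrequencies(sample, 2)));
// [{"val":"X2","frequency":3},{"val":"X1","frequency":2}]
```

The output objects use the same "val" and "frequency" field names as the spec above, so they could be supplied directly as inline "values".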






David

Jun 23, 2017, 5:47:15 PM
to vega-js
Thanks again :) The data for the histogram reacts to events that filter the upstream data, so I can't precalculate the histogram or do the data conversion ahead of time.
For example, clicking a legend entry filters out some data, and the histogram must be computed on what is left.
I might be able to precalculate the conversion into another data collection and then do lookups inside the filter, though this may greatly impact performance.
I'm still trying to write a custom transform, though this is turning out to be a great challenge in and of itself :)
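
For the reactive case, one route that might stay inside the spec is a flatten step ahead of the aggregate. Below is a rough sketch of just the data section, written as a JavaScript object literal so it can be serialized with JSON.stringify. It assumes a Vega release that ships the flatten and window transforms, which is not confirmed for the v3.0.0-beta.38 build mentioned earlier in the thread; the dataset names data_input and data_top are illustrative.

```javascript
// Sketch of a Vega "data" section only. Assumes a Vega version that
// provides the "flatten" and "window" transforms; names are illustrative.
const dataSection = [
  {
    name: "data_input",
    values: [{ val: ["X1", "X2"] }, { val: ["X2"] }, { val: ["X1", "X2", "X3"] }]
  },
  {
    name: "data_top",
    source: "data_input",
    transform: [
      // One output row per array element, so "val" becomes a plain string.
      { type: "flatten", fields: ["val"] },
      // Count rows per distinct value (same aggregate as in the spec above).
      { type: "aggregate", groupby: ["val"], fields: ["val"], ops: ["count"], as: ["frequency"] },
      // Number the rows in descending frequency order, starting at 1.
      {
        type: "window",
        sort: { field: "frequency", order: "descending" },
        ops: ["row_number"],
        as: ["rank"]
      },
      // Keep the two most frequent values.
      { type: "filter", expr: "datum.rank <= 2" }
    ]
  }
];

console.log(JSON.stringify(dataSection, null, 2));
```

Under the same assumption, a filter transform placed before the flatten step should cause the counts to be recomputed on whatever rows remain, which is the behavior described above for the legend click.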