block by Kcnarf 096cb061195863d2021e291c56daf16e

weighted KDE & Voronoï maps

This block experiments with two things:

  1. Computing a weighted KDE, an extension of standard KDE. In standard KDE, each datum counts for the same amount (i.e. 1), whereas in weighted KDE, each datum counts for a specific amount (i.e. its weight). For example, one can make a standard KDE of the number of sales per day, or a weighted KDE of the total sales profit per day.
  2. Filling the weighted KDE curve/area with cells that encode the data’s weights (light weight -> small cell, heavy weight -> large cell). The objective is to give a sense of the underlying distribution that produces the weighted KDE. I use the d3-voronoi-map plugin to do so, but:
    1. tweak it so that each site’s x-coord remains unchanged during the Voronoï map computation (see file d3-voronoi-map-fixed-x.js)
    2. define a specific initial positioning function (it encodes the x-coord exactly and computes a random y-coord)
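
The weighted KDE and the initial positioning described above can be sketched as follows. This is a minimal, dependency-free illustration, not the block’s actual code: the function names (`epanechnikov`, `weightedKde`, `initialPosition`), the `x`/`weight` field names, the choice of an Epanechnikov kernel, the normalisation by total weight, and the `xScale`/`height` parameters are all assumptions made for the example.

```javascript
// Epanechnikov kernel scaled by a bandwidth (assumed kernel; any kernel would do).
function epanechnikov(bandwidth) {
  return (u) => {
    const v = u / bandwidth;
    return Math.abs(v) <= 1 ? (0.75 * (1 - v * v)) / bandwidth : 0;
  };
}

// Weighted KDE: each datum contributes its weight instead of 1.
// The sum is divided by the total weight (an assumption), so equal
// weights give the same curve as a standard KDE.
function weightedKde(kernel, xs, data, x = (d) => d.x, w = (d) => d.weight) {
  let totalWeight = 0;
  for (const d of data) totalWeight += w(d);
  return xs.map((x0) => {
    let sum = 0;
    for (const d of data) sum += w(d) * kernel(x0 - x(d));
    return [x0, sum / totalWeight];
  });
}

// Hypothetical initial positioning: the x-coord is encoded exactly,
// the y-coord is drawn at random in [0, height).
function initialPosition(d, xScale, height, random = Math.random) {
  return [xScale(d.x), random() * height];
}

// Made-up data: x could be a day, weight the profit of one sale.
const exampleSales = [
  { x: 0, weight: 1 },
  { x: 2, weight: 3 },
];
const exampleCurve = weightedKde(epanechnikov(1), [0, 1, 2], exampleSales);
console.log(exampleCurve); // the heavier datum at x = 2 yields the higher density
```

With all weights equal to 1, `weightedKde` reduces to a standard KDE. Skipping the division by the total weight would instead give a curve whose area equals the total weight (e.g. the total profit).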

Usage: use the controller to hide/show objects, and hover over a cell or a bin for details.

Admittedly, the underlying dataset does not suit this experiment well. I have to find another one.

==original README==

Kernel density estimation is a method of estimating the probability distribution of a random variable based on a random sample. In contrast to a histogram, kernel density estimation produces a smooth estimate. The smoothness can be tuned via the kernel’s bandwidth parameter. With the correct choice of bandwidth, important features of the distribution can be seen, while an incorrect choice results in undersmoothing or oversmoothing and obscured features.
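
The effect of the bandwidth can be illustrated with a small sketch. It is not taken from the example’s code: the Gaussian kernel, the helper names and the synthetic bimodal sample (which is not the faithful dataset) are assumptions chosen for illustration. Counting the local maxima of the estimate shows how a small bandwidth produces spurious bumps while a large one merges the two modes.

```javascript
// Gaussian kernel scaled by a bandwidth (assumed kernel for this sketch).
function gaussianKernel(bandwidth) {
  const norm = bandwidth * Math.sqrt(2 * Math.PI);
  return (u) => Math.exp(-0.5 * (u / bandwidth) ** 2) / norm;
}

// Standard KDE: the mean of the kernel values centred on each sample value.
function kde(kernel, xs, sample) {
  return xs.map(
    (x0) => sample.reduce((sum, v) => sum + kernel(x0 - v), 0) / sample.length
  );
}

// Number of strict local maxima in a sequence of density values.
function countModes(ys) {
  let modes = 0;
  for (let i = 1; i < ys.length - 1; i++) {
    if (ys[i] > ys[i - 1] && ys[i] > ys[i + 1]) modes++;
  }
  return modes;
}

// Synthetic bimodal sample (made up, two clusters around 55 and 80).
const sample = [51, 53, 54, 55, 56, 58, 77, 79, 80, 81, 82, 84];

// Evaluation grid from 40 to 95 in steps of 0.5.
const grid = [];
for (let x = 40; x <= 95; x += 0.5) grid.push(x);

for (const bandwidth of [0.25, 4, 30]) {
  const density = kde(gaussianKernel(bandwidth), grid, sample);
  console.log(`bandwidth ${bandwidth}: ${countModes(density)} mode(s)`);
}
```

On this sample, the smallest bandwidth gives one bump per data point (undersmoothing), the middle one recovers the two clusters, and the largest one blends them into a single mode (oversmoothing).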

This example shows a histogram and a kernel density estimation for times between eruptions of Old Faithful Geyser in Yellowstone National Park, taken from R’s faithful dataset. The data follow a bimodal distribution; short eruptions are followed by a wait time averaging about 55 minutes, and long eruptions by a wait time averaging about 80 minutes. In recent years, wait times have been increasing, possibly due to the effects of earthquakes on the geyser’s geohydrology.

This example is based on a Protovis version by John Firebaugh. See also a two-dimensional density estimation of this dataset using d3-contour.