D3JS Clustered Force

Mike Bostock is one prolific mofo.  I don’t know why it took me so long to focus on this visualization (probably because there are so many other amazing ones out there), but Clustered Force is a very nice one.  I didn’t appreciate just how nice until I started playing with it and extending it.

In this blog article we will explore what Dex can do with this visualization.  I will try to make amends with Clustered Force for having ignored it for so long.

We will first focus on some artificial mathematical data in order to illustrate some basic capabilities.  From there, we will branch out and explore a few datasets ranging from nutritional value of foods to the scores of football games.

CF_FX

A Mathematical Example

First we create a simple dataset with the following code:

CF_DataGenerator

This script creates 100 rows of three columns of data.  The first column is called X and ranges from 1 to 100.  The next column is “X*X”, which is X multiplied by itself.  The third column is “X^3”, or X to the third power.
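For readers who want to reproduce the dataset outside of Dex, here is a minimal Python sketch of the same idea.  It is not the script shown in the screenshot above; the function name `generate_rows` and the CSV output are my own assumptions, chosen only to match the three columns described.

```python
import csv
import io


def generate_rows(n=100):
    """Return a header plus n rows of X, X*X, and X^3 (hypothetical helper)."""
    header = ["X", "X*X", "X^3"]
    rows = [[x, x * x, x ** 3] for x in range(1, n + 1)]
    return header, rows


header, rows = generate_rows()

# Serialize to CSV text so it can be saved or pasted into another tool.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The resulting `csv_text` holds one header line followed by 100 data lines.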

The beginning of the data looks like this:

CF_Data

No surprises: X increments by 1, X*X grows quadratically, and the third column, “X^3”, grows cubically.  When we feed this data into the Clustered Force visualization we see:

CF_FX1

The visualization extracts the numeric columns and sizes each circle relative to the minimum and maximum values taken across all of the columns.  The following GUI is available to modify this behavior:

CF_GUI
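To make the global scaling concrete, here is a rough Python sketch of sizing circles against a single minimum and maximum shared by every column.  The radius range (1 to 20) and the function name `global_linear_scale` are assumptions for illustration; I am not claiming this is how Dex or the underlying D3 code computes sizes.

```python
def global_linear_scale(columns, min_radius=1.0, max_radius=20.0):
    """Map every value to a radius using one min/max shared by all columns.

    `columns` is a dict of column name -> list of numbers.
    The radius bounds are illustrative assumptions.
    """
    values = [v for col in columns.values() for v in col]
    lo, hi = min(values), max(values)
    span = hi - lo

    def scale(v):
        if span == 0:
            # All values identical: fall back to the smallest radius.
            return min_radius
        return min_radius + (v - lo) / span * (max_radius - min_radius)

    return {name: [scale(v) for v in col] for name, col in columns.items()}


radii = global_linear_scale({"X": [1, 100], "X^3": [1, 1000000]})
```

With a shared range, the largest X value (100) gets almost the same radius as the smallest, because X^3 stretches the range out to 1,000,000.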

Scaling the dataset logarithmically, setting the background to black, and setting the color scale to black, we get something that looks totally different:

CF_FX2

Changing the color palette once more and scaling by columns sizes each column’s circles relative to that column’s own minimum and maximum.  This keeps X^3 and X*X from dwarfing X and obscuring its relative ratios.

CF_FX3
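Here is a small Python sketch of per-column scaling, with an optional logarithmic mode.  Again, this is my own illustration under stated assumptions (a 1 to 20 radius range, strictly positive values when `log=True`, and the hypothetical names `scale_values` and `scale_by_column`), not the actual implementation.

```python
import math


def scale_values(values, lo, hi, min_radius=1.0, max_radius=20.0, log=False):
    """Linearly (or logarithmically) map values in [lo, hi] to radii."""
    if log:
        # Log scaling assumes every value is strictly positive.
        values = [math.log10(v) for v in values]
        lo, hi = math.log10(lo), math.log10(hi)
    span = hi - lo
    if span == 0:
        return [min_radius] * len(values)
    return [min_radius + (v - lo) / span * (max_radius - min_radius)
            for v in values]


def scale_by_column(columns, log=False):
    """Scale each column against its own min and max."""
    return {name: scale_values(col, min(col), max(col), log=log)
            for name, col in columns.items()}


cols = {"X": [1, 50, 100], "X^3": [1, 125000, 1000000]}
per_column = scale_by_column(cols)
per_column_log = scale_by_column(cols, log=True)
```

Scaled this way, the largest value in every column receives the largest radius, so X gets the full range of circle sizes regardless of how big X^3 grows.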

Sales Data

Next, we change the flow to read in some sales data.  Feel free to explore the dataset.

CF_Sales

Nutritional Information

Scalability becomes an issue with a large amount of data (more than about 1,000 data points).  The threshold varies by browser.  Chrome runs much faster than the rest.

With this in mind, I have limited the nutritional information to 3 columns: caloric content, fat, and protein.  As with most datasets, the data was sparsely populated, so a bit of ETL with Dex was also required to do things such as replacing empty values with zeros, filtering out unwanted columns, and so forth.
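The cleanup itself was done in Dex, but the two steps mentioned (zero-filling empty values and dropping unwanted columns) are easy to sketch in Python.  The column names and sample rows below are made up for illustration and are not taken from the real nutrition dataset; `clean_rows` is a hypothetical helper.

```python
import csv
import io


def clean_rows(csv_text, keep_columns):
    """Keep only `keep_columns` and replace empty cells with 0.0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        record = {}
        for col in keep_columns:
            raw = (row.get(col) or "").strip()
            record[col] = float(raw) if raw else 0.0
        cleaned.append(record)
    return cleaned


# Made-up sample rows, not the real dataset.
sample = (
    "Food,Calories,Fat,Protein,Sodium\n"
    "Apple,52,,0.3,1\n"
    "Butter,717,81,,11\n"
)
nutrition = clean_rows(sample, ["Calories", "Fat", "Protein"])
```

Each cleaned record contains only the three kept columns, with blanks turned into zeros so every row can be drawn.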

This visualization is composed of about 4,500 pieces of data, 3,300 of which become circles moving across the screen.  The slowdown is noticeable, but it’s still usable on a high-end PC.

CF_Nutrition_FatProteinCalories

Garbage Collection Data

This dataset contains my typical jstat-generated data.  It’s fun to explore.

CF_GC

Defense Budgets

This visualization shows defense budgets across the world.

CF_DefenseBudgets

Income Across the World

This visualization shows the per-capita earnings per household, normalized to USD in the Year 2000.  Years determine color categories.  The wide disparity across the world is evident.

CF_WorldIncome

American Football

Last, and least, here are some football scores from 2012.

CF_FootballScores

Until later…

About patmartin

I am a coder and Data Visualization/Machine Learning enthusiast.