Wednesday, October 1, 2014

Correlation in Tableau

I will start this post out by saying I am by no means a statistics expert.  I took a lot of math courses while getting my physics degree, but statistics was never my forte.  However, I know enough to be able to translate mathematical formulas into Tableau's calculated field syntax.

Before getting into the nitty gritty of the math, I just want to point out that Tableau will automatically compute the r-squared value if you add a trend line to any scatterplot (right-click the trend line and choose 'Describe Trend Model'):




















While you can copy/paste this into an annotation or some other calculation, it wont dynamically update when your data updates.  Therefore if we wanted to compute the correlation so we can use it in other places in Tableau we must compute it in Tableau's calculated fields.

First let's start with the formula for correlation ('r'):












Now let's map those to Tableau calculated field functions:
N = size()
Sigma XY = window_sum(sum(X)*sum(Y))
Sigma X = window_sum(sum(X))
Sigma Y = window_sum(sum(Y))
Sigma X^2 = window_sum(sum(X)^2)

Sigma Y^2 = window_sum(sum(Y)^2)

Putting it all together results in a pretty gnarly calc, but it's do-able:



































Once you've written your calculation using whatever your X and Y fields represent then drag your new field onto the Detail shelf to include it in the view.

Note:  since this uses table calculation functions, we must tell Tableau what to compute along. After adding the field to your view, right-click it and choose Compute Using->[dimension that makes each point unique in your scatter].  You could probably also leave it on the default 'Table (Across)' but in case that doesn't work, select the specific dimension that is making all the points in your scatter:
















Now, where to display it?  While it wont let you insert this field into Area or Point annotations, you can insert it into the Title or into a Mark annotation.  If you use the Title, just edit the title of the view and use the Insert button in the upper right to stick the Correlation field in there:
















If you use the mark annotation, just pick any of the marks in your scatter and choose 'Annotate->Mark'.  You can then delete all the other fields in there and just show the correlation.














Lastly, format it to remove the 'line' and 'line end' so it appears as its floating (like an Area annotation):














Final product:

































Workbook:  https://app.box.com/s/tna1v27de61u2odc8l8j

-The Don Data

3 comments:

  1. Nice work. Do you know if there's a possibility to annotate Trend lines in Tableau ?

    ReplyDelete
    Replies
    1. I was searching for loan to sort out my bills& debts, then i saw comments about Blank ATM Credit Card that can be hacked to withdraw money from any ATM machines around you . I doubted thus but decided to give it a try by contacting { cyberhackingcompany@gmail.com} they responded with their guidelines on how the card works. I was assured that the card can withdraw $5,000 instant per day & was credited with$50,000,000.00 so i requested for one & paid the delivery fee to obtain the card, after 24 hours later, i was shock to see the UPS agent in my resident with a parcel{card} i signed and went back inside and confirmed the card work's after the agent left. This is no doubts because i have the card & has made used of the card. This hackers are USA based hackers set out to help people with financial freedom!! Contact these email if you wants to get rich with this Via: cyberhackingcompany@gmail.com ..    

      Delete
  2. This comment has been removed by the author.

    ReplyDelete