Monday, September 29, 2014

Aggregate Bins / Summarized Bins

Have you ever wanted to create 'bins', 'buckets', or some categorical breakdown from a number that needs to be aggregated before the bins are created?

For example, say you had data that was one row per transaction, but you wanted to bin the customers into lifetime spend buckets.  You first need to aggregate the data to the customer level and THEN make the bins out of it (from SQL perspective need to sum the sales and 'group by' customer).  This is not possible using Tableau's normal 'Create Bin' functionality because that will result in each row (transaction) falling into a particular bin, not the sum of all transactions for a given customer falling into a bin.

The first step is to create the aggregate bins.  This can be done with a neat math trick using the INT() function to round things to the nearest bin:

int(sum([Sales])/X)*X

Where X is the size you want each bin to be (in this case 0-1000, 1000-2000, 2000-3000, etc,)























After you create it, you can right-click on this field and choose 'Convert to Discrete' so it appears blue in your data window.  This will make sure it displays as text headers when you drag it to your Columns instead of making an axis.

Drag the new calculated field to Columns.

Drag the Customer Name field to Detail (this will make one mark per customer).

Next we need to create another calculated field that will sum up the number of customers in each bin:

window_sum(countd([Customer Name])


Now drag that field onto the Rows shelf.  After that, right-click on it and choose 'Compute Using-> Customer Name'.  This will tell Tableau to compute the window sum across all the customers.

The numbers in the resulting view will probably look way too high.  This is because Tableau is stacking the same number on top of one another over and over (once for each customer).  To fix this we need to create one final calculated field that will be used to filter the view to only show the last mark in each bin:

index()=size()























Put that new calculated field onto the Filters shelf and check the box for 'True' (or dont, we are going to change the settings anyway so it doesnt matter what you initially choose).

Now right-click the field you just put on the Filters and choose 'Compute Using->Customer Name'.  It will pop up the filter dialogue box again and this time only select 'True'.

Almost there!  So now you probably have a view that looks like a single bar.  The last step to fix this is to right-click on the bins field on your Columns and uncheck the option that says 'Ignore in Table Calculations':

(This is very important!)
























Once you uncheck that, each customer will be placed into the correct bin based on the total spend they have in the entire dataset.

Last cleanup item would be to edit the tooltip and remove the Customer Name from showing up.  It is showing the last customer in each bin, but that really doesnt mean anything and can be misleading to users.

And there you have it!  Bins based on aggregated/summarized number instead of at the row level.


























Note:  Performance can be a concern using this technique if you have hundreds of thousands or millions of 'things' you are trying to count.  For instance in this example we are counting Customers.  If I have millions of customers in my dataset this technique could be slow.  This is because tableau is still including all the customers in the Group By statement of the query we sent to the database and then we are doing some local math (the window sum computation) and hiding them locally in tableau (the 'show last' calculation).  Alternatives would be to create a aggregated extract in tableau (hide all dimensions at a lower level than Customer Name and choose 'Aggregate for visible dimensions' when creating your extract) or create a pre-aggregated view/table in the database and connect tableau to that.

Workbook: https://app.box.com/s/txebj5dmm0u117nahl34

-The Don Data

3 comments:

  1. Very nice. My only concern is that when using this method you remove the ability to see what accounts (or whatever is you are aggregating over prior to binning) make up the population within each bin.

    This method is nice as it allows Tableau filters to be applied to the data without the need to paramatarise a customer sql query.

    Anyway of geting around this loss of visibility of the underlying population this without pre-aggregatingthe data within a SQL query?

    ReplyDelete
    Replies
    1. I was searching for loan to sort out my bills& debts, then i saw comments about Blank ATM Credit Card that can be hacked to withdraw money from any ATM machines around you . I doubted thus but decided to give it a try by contacting { cyberhackingcompany@gmail.com} they responded with their guidelines on how the card works. I was assured that the card can withdraw $5,000 instant per day & was credited with$50,000,000.00 so i requested for one & paid the delivery fee to obtain the card, after 24 hours later, i was shock to see the UPS agent in my resident with a parcel{card} i signed and went back inside and confirmed the card work's after the agent left. This is no doubts because i have the card & has made used of the card. This hackers are USA based hackers set out to help people with financial freedom!! Contact these email if you wants to get rich with this Via: cyberhackingcompany@gmail.com ..    

      Delete
  2. Learn Tableau in a best possible way ever at your own ease and pace with the help of best tableau online courses

    ReplyDelete