In this lesson we’re going to look at how to ask our tabular model what’s not there. This is a little bit reversed. Normally if we have data in a database we’re asking what is there. We might ask what are the sales by customer for some subcategory that we’ve chosen. But we may be interested in comparing our sales in one category versus another category.
We might want to know that if a customer has purchased something in one category, what is their behavior like in another subcategory? This isn’t that hard to accomplish, we can just list sales in both subcategories and then compare the two. But what if we want something very specific? We want to know: for customers who have purchased something in one subcategory, which of them have purchased nothing in a second subcategory?
Of course through brute force we can get to this as well. We can just list out all the sales for all the customers in every category and start comparing. But what if we want a more elegant solution that creates a list of the exact answer to question 3? We’re going to answer that question in this lesson.
The data model we’ll look at is AdventureWorks, which is downloadable from CodePlex, so you can do this yourself. There are a lot of tables in this model, but we’re just going to look at these. The Internet Sales is the fact table, that’s where we have the sales line items. The Product and Customer Tables are just what they sound like. And the products have subcategories.
For that first question of listing out what are the product sales in a given subcategory, the answer would look like this–very straightforward, very easy. You can accomplish this in many ways with many different database tools.
But what if we want to put next to the total internet sales another column that shows what the sales are in a second subcategory? This is typically not possible to do in this pivot table because of the slicer. The slicer is limiting the entire pivot table to just Mountain Bikes, so we can’t place a measure with the second subcategory in the next column.
To do that, we need to add in another subcategory table, which is really a duplicate of the subcategory table. Then we can do something like this–having two subcategory slicers, and two columns–one for mountain bikes and another for helmets. But we need to do some tricks in our tabular model with DAX to make this work, and we’re going to look at that.
Let’s take a look at the DAX for this measure. CALCULATE essential says “calculate something, and I’m going to modify the scope”. What we’re going to do is to calculate the sum of sales amount from the Internet sales table that part is really straightforward.
Let’s look at the last part which is USERELATIONSHIP. What we’re doing here is turning on an inactive relationship between the ProductSubcategoryKey between the Product and Subcategory2 table. We’re turning that on to make the Excel pivot table slice the data by the Subcategory2 dimension.
The ALL part of this causes the query to ignore any selection made on the Subcategory table–just for the scope of this measure. That’s how we can clear the first filter and set the second filter within the measure.
When we do that, we get this other subcategory sales column, but it’s ignoring this selection of mountain bikes, we we get exactly what we want. We’ll look at that in a minute.
That third question is: “Give me the customers who have sales in mountain bikes, but don’t have sales in helmets.” This bottom part is what we just went through. It’s calculating the sales for the second subcategory (subcategory2).
The top part will calculate the sum of sales amount sliced by the first subcategory.
The double ampersand means “this” and “that”, so if you take this together it means “if the sales in subcategory1 are > 0 AND the sales in subcategory2 are zero, then return a 1 or zero”. This is a flag we’ll use for filtering later.
Now let’s look at what this looks like to setup. Here is our tabular model design. In this design we have the internet sales table and the customer table, which we’ve talked about. And the two subcategory tables. We’ll choose mountain bikes from the first one, and from subcategory2 we’ll choose helmets.
If we look at the relationships, there’s an active relationship to Subcategory. That’s the relationship we use most of the time. We’ve also set an inactive relationship to Subcategory2. It’s definition is the same, but the active checkbox is cleared. That’s the relationship that we’re turning on when we are calculating the sum for subcategory2.
Here’s our other subcategory sales, that’s the same thing we had on the slide. Then the “no purchase” measure is defined here (the same as on the slide).
When we’re in Excel it looks like this. Here’s our standard query, listing out the customers and their sales. But if I want to see what the sales are in helmets for these customers, I could flip back and forth, but it’s too difficult because the customer list is changing.
So what I’ll do instead is choose my “other subcategory sales”. Initially it’ll list out all sales, so the total is bigger than the Internet sales total. Let’s add the slicer on Subcategory2, and format it.
Let’s look at the “helmet example: how many customers have we sold mountain bikes to that haven’t purchased helmets? This one hasn’t, and we want to identify this customer. This customer has not purchased a bike but has purchased a helmet. Some of these have purchased neither product.
That’s a lot of information to go through, so let’s use that final calculation that we designed, and that was the “No Purchase” flag. If we turn that flag on, we’ll have a calculation that will show “1″ for the customers we’re looking for and “0″ for the customers we’re not. We can set a filter so that we list only the “NoPurchase=1″ customers. That will give us just a list of customers who purchased mountain bikes from us, but have not purchased helmets.
We can take this list of customer IDs and send a mailer out to them because we think, “Well, they’ve purchased a mountain bike, and they should get a helmet from us.”