Sunday, August 10, 2014

Calculating Median and Percentile

MEDIAN or 50th Percentile

According to Wikipedia the median or 50th percentile is the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half. Some examples:

Set Median / 50th percentile
1,2,3,4,5 3
1,2,3,3,4,5 3
1,2,3,4,4,5 3,5
1,2,3,3,4,4,5 3
1,2,3,3,4,4,5,5 3,5

In this example you see that if the set contains an even number of members the median is the average between the numbers left and right from the centre.

In MDX:

WITH
MEMBER [Measures].[Median Sales Per Week]
as
MEDIAN(
    ([Date].[Calendar Weeks].[Calendar Week],[Measures].[Sales Amount])
    ,[Measures].[Sales Amount])


SELECT
{ [Measures].[Sales Amount]
, [Measures].[Median Sales Per Week]}
ON COLUMNS,
[Date].[Calendar].[Calendar Year].Members on rows
FROM [Adventure Works]

Will get you:

image

Other percentile values

How about 10th, 25th, 75th and 90th percentile?

There are several solutions:

You could use the ASSP stored procedures found here, this is a very nifty library but sadly it isn’t maintained any more.

Better might be to it all in MDX.

25th and 75th percentile

If you take a small step back you see that the 25th percentile is the median between the 1th and 50th percentile, the 75th percentile is the median between the 50th and the 100th percentile.

So if know the number of elements in our set, divide them by 2 then we know the size of our set between the 1th and 50th percentile / 50th and 100th percentile.

If we use BOTTOMCOUNT or TOPCOUNT to isolate that set we only have to take the median from that set.

In MDX:

WITH
MEMBER Measures.[25th Percentile] AS MEDIAN(
    BOTTOMCOUNT(
        (
            NONEMPTY(
                ([Date].[Calendar Weeks].[Calendar Week], [Measures].[Sales Amount])
                )
        )
        ,
            (
                COUNT(
                    NONEMPTY(
                        ([Date].[Calendar Weeks].[Calendar Week], [Measures].[Sales Amount])
                        )
                )
            / 2  )
        ,[Measures].[Sales Amount]
    )
    ,[Measures].[Sales Amount]
)

MEMBER Measures.[75th Percentile] AS MEDIAN(
    TOPCOUNT(
        (
            NONEMPTY(
                ([Date].[Calendar Weeks].[Calendar Week].members, [Measures].[Sales Amount])
                )
        )
        ,
            (
                COUNT(
                    NONEMPTY(
                        ([Date].[Calendar Weeks].[Calendar Week].members, [Measures].[Sales Amount])
                        )
                )
            / 2  )
        ,[Measures].[Sales Amount]
    )
    ,[Measures].[Sales Amount]
)

 

select
{  
  measures.[25th Percentile]
, measures.[75th Percentile]
} on COLUMNS,
{
    ([Date].[Calendar Year].Children,
    [Product].[Product Categories].members)
} on ROWS
from [Adventure Works]

Will get us:

image

10th and 90th percentile

We can use the same trick here if we keep in mind that the 10th percentile is the median of the 1th and 20th and the 90th percentile is the median of the 80th and the 100th percentile. So if we divide our set by 5 you get the bottom and top 20 per cent of our set.

in MDX:

WITH
MEMBER Measures.[10th Percentile] AS MEDIAN(
   BOTTOMCOUNT(
        (
            NONEMPTY(
                ([Date].[Calendar Weeks].[Calendar Week], [Measures].[Sales Amount])
                )
        )
        ,
            (
                COUNT(
                    NONEMPTY(
                        ([Date].[Calendar Weeks].[Calendar Week], [Measures].[Sales Amount])
                        )
                )
            / 5  )
        ,[Measures].[Sales Amount]
    )
    ,[Measures].[Sales Amount]
)

MEMBER Measures.[90th Percentile] AS MEDIAN(
    TOPCOUNT(
        (
            NONEMPTY(
                ([Date].[Calendar Weeks].[Calendar Week].members, [Measures].[Sales Amount])
                )
        )
        ,
            (
                COUNT(
                    NONEMPTY(
                        ([Date].[Calendar Weeks].[Calendar Week].members, [Measures].[Sales Amount])
                        )
                )
            / 5  )
        ,[Measures].[Sales Amount]
    )
    ,[Measures].[Sales Amount]
)

 

select
{  
  measures.[10th Percentile]
, measures.[90th Percentile]
} on COLUMNS,
{
    ([Date].[Calendar Year].Children,
    [Product].[Product Categories].members)
} on ROWS
from [Adventure Works]

will get us:

image

Have Fun And Till Next Time

No comments:

Post a Comment