Using Over(Partition By) When Calculating Median Moving Average Unit Cost
Solution 1:
This one caught my attention, so I'm posting two options:
The first is a straight cte approach, and the second uses temp tables. The cte approach is fine for smaller data sets, but performance suffers as the series expands.
Both options will calculate the RUNNING Min, Max, Mean, Median, and Mode for a data series
Just a couple of items before we get into it. The normalized structure is ID and Measure.
- The ID could be a date or identity.
- The Measure is any numeric value
- Median is the mid-value of the sorted series. If an even number of observations we return the average of the two middle records
- Mode is represented as ModeR1 and ModeR2. If no repeated values, we show the min/max range
OK, let's take a look at the cte Approach
Declare @Table table (ID Int,Measure decimal(9,2))
Insert into @Table (ID,Measure) values
(1,25),
(2,75),
(3,50),
(4,25),
(5,12),
(6,66),
(7,45)
;with cteBase as (Select *,RowNr = Row_Number() over (Order By ID) From @Table),
cteExpd as (Select A.*,Measure2 = B.Measure,ExtRowNr = Row_Number() over (Partition By A.ID Order By B.Measure) From cteBase A Join cteBase B on (B.RowNr<=A.RowNr)),
cteMean as (Select ID,Mean=Avg(Measure2),Rows=Count(*) From cteExpd Group By ID),
cteMedn as (Select ID,MedRow1=ceiling(Rows/2.0),MedRow2=ceiling((Rows+1)/2.0) From cteMean),
cteMode as (Select ID,Mode=Measure2,ModeHits=count(*),ModeRowNr=Row_Number() over (Partition By ID Order By Count(*) Desc) From cteExpd Group By ID,Measure2)
Select A.ID
,A.Measure
,MinVal = min(Measure2)
,MaxVal = max(Measure2)
,Mean = max(B.Mean)
,Median = isnull(Avg(IIF(ExtRowNr between MedRow1 and MedRow2,Measure2,null)),A.Measure)
,ModeR1 = isnull(max(IIf(ModeHits>1,D.Mode,null)),min(Measure2))
,ModeR2 = isnull(max(IIf(ModeHits>1,D.Mode,null)),max(Measure2))
From cteExpd A
Join cteMean B on (A.ID=B.ID)
Join cteMedn C on (A.ID=C.ID)
Join cteMode D on (A.ID=D.ID and ModeRowNr=1)
Group By A.ID
,A.Measure
Order By A.ID
Returns
ID Measure MinVal MaxVal Mean Median ModeR1 ModeR2
1 25.00 25.00 25.00 25.000000 25.000000 25.00 25.00
2 75.00 25.00 75.00 50.000000 50.000000 25.00 75.00
3 50.00 25.00 75.00 50.000000 50.000000 25.00 75.00
4 25.00 25.00 75.00 43.750000 37.500000 25.00 25.00
5 12.00 12.00 75.00 37.400000 25.000000 25.00 25.00
6 66.00 12.00 75.00 42.166666 37.500000 25.00 25.00
7 45.00 12.00 75.00 42.571428 45.000000 25.00 25.00
This cte approach is very light and fast for smaller data series
Now the Temp Table Approach
-- Generate Base Data -- Key ID and Key Measure
Select ID =TR_Date
,Measure=TR_Y10,RowNr = Row_Number() over (Order By TR_Date)
Into #Base
From [Chinrus-Series].[dbo].[DS_Treasury_Rates]
Where Year(TR_Date)>=2013
-- Extend Base Data one-to-many
Select A.*,Measure2 = B.Measure,ExtRowNr = Row_Number() over (Partition By A.ID Order By B.Measure) into #Expd From #Base A Join #Base B on (B.RowNr<=A.RowNr)
Create Index idx on #Expd (ID)
-- Generate Mean for Series
Select ID,Mean=Avg(Measure2),Rows=Count(*) into #Mean From #Expd Group By ID
Create Index idx on #Mean (ID)
-- Calculate Median Row Number(s) -- If even(avg of middle two rows)
Select ID,MednRow1=ceiling(Rows/2.0),MednRow2=ceiling((Rows+1)/2.0) into #Medn From #Mean
Create Index idx on #Medn (ID)
-- Calculate Mode
Select * into #Mode from (Select ID,Mode=Measure2,ModeHits=count(*),ModeRowNr=Row_Number() over (Partition By ID Order By Count(*) Desc,Measure2 Desc) From #Expd Group By ID,Measure2) A where ModeRowNr=1
Create Index idx on #Mode (ID)
-- Generate Final Results
Select A.ID
,A.Measure
,MinVal = min(Measure2)
,MaxVal = max(Measure2)
,Mean = max(B.Mean)
,Median = isnull(Avg(IIF(ExtRowNr between MednRow1 and MednRow2,Measure2,null)),A.Measure)
,ModeR1 = isnull(max(IIf(ModeHits>1,D.Mode,null)),min(Measure2))
,ModeR2 = isnull(max(IIf(ModeHits>1,D.Mode,null)),max(Measure2))
From #Expd A
Join #Mean B on (A.ID=B.ID)
Join #Medn C on (A.ID=C.ID)
Join #Mode D on (A.ID=D.ID and ModeRowNr=1)
Group By A.ID
,A.Measure
Order By A.ID
Returns
ID Measure MinVal MaxVal Mean Median ModeR1 ModeR2
2013-01-02 1.86 1.86 1.86 1.86 1.86 1.86 1.86
2013-01-03 1.92 1.86 1.92 1.89 1.89 1.86 1.92
2013-01-04 1.93 1.86 1.93 1.9033 1.92 1.86 1.93
2013-01-07 1.92 1.86 1.93 1.9075 1.92 1.92 1.92
2013-01-08 1.89 1.86 1.93 1.904 1.92 1.92 1.92
...
2016-07-20 1.59 1.37 3.04 2.2578 2.24 2.20 2.20
2016-07-21 1.57 1.37 3.04 2.257 2.235 2.61 2.61
2016-07-22 1.57 1.37 3.04 2.2562 2.23 2.20 2.20
Both approaches where validated in Excel
I should add that in the final query, you could certainly add/remove items like STD, Total

Post a Comment for "Using Over(Partition By) When Calculating Median Moving Average Unit Cost"