Complicated Sql Query For A Running Total Column

October 27, 2023 Post a Comment

I'm trying to work out a pretty complex query in SQL Server 2008. I'd like some input from SQL experts here. Imagine I had a Payments table with these fields: PaymentID int, Cust

Solution 1:

Your question seems to be this:

SELECT CustomerID, SUM(Ammount) FROMtableWHERE Amount >0GROUPBY CustomerID
SELECT CustomerID, SUM(Ammount) FROMtableGROUPBY CustomerID

However, I think you mean that you want a table that appears like this

Customer  Payment  HighPoint  RunningTotal
123       5        5          5
123       5        10         10
123       -3       10         7

In which case I would create a view with the two selects above so that the view is something like.

SELECT CusotmerID, 
  PaymentDate, 
  Ammount, 
  (SELECT SUM(Ammount) 
    FROM table asALIASWHEREALIAS.Amount > 0ANDALIAS.PaymentDate <= PaymentDate 
      ANDALIAS.CustomerID = CustomerID), 
  (SELECT SUM(Ammount) 
    FROM table asALIASWHEREALIAS.CustomerID = CustomerID 
    ANDALIAS.PaymentDate <= PaymentDate)
FROM table

Also, you may consider a non-unique index on the Amount column of the table to speed up the view.

Solution 2:

The operation is linear in the number of payments for each customer. So, you are going to have to go over each payment, keeping a running total and a high water mark and at the end of all the payments, you will have your answer. Whether you do that in a CLR stored procedure (immediately jumped to mind for me) or use a cursor or temp table or whatever, it's probably not going to be fast.

If you have to run this report over and over again, you should seriously consider keeping a high water mark field and update it (or not) whenever a payment comes in. That way, your report will be trivial -- but this is what data marts are for.

Solution 3:

As an alternative to subqueries, you can use a running total query. Here's how I set one up for this case. First create some test data:

createtable #payments (
    paymentid intidentity,
    customerid int,
    paymentdate datetime,
    amount decimal
)

insertinto #payments (customerid,paymentdate,amount) values (1,'2009-01-01',1.00)
insertinto #payments (customerid,paymentdate,amount) values (1,'2009-01-02',2.00)
insertinto #payments (customerid,paymentdate,amount) values (1,'2009-01-03',-1.00)
insertinto #payments (customerid,paymentdate,amount) values (1,'2009-01-04',2.00)
insertinto #payments (customerid,paymentdate,amount) values (1,'2009-01-05',-3.00)
insertinto #payments (customerid,paymentdate,amount) values (2,'2009-01-01',10.00)
insertinto #payments (customerid,paymentdate,amount) values (2,'2009-01-02',-5.00)
insertinto #payments (customerid,paymentdate,amount) values (2,'2009-01-03',7.00)

Now you can execute the running total query, which calculates the balance for each customer after each payment:

select cur.customerid, cur.paymentdate, sum(prev.amount)
from#payments cur
inner join#payments prevon cur.customerid = prev.customerid
    and cur.paymentdate >= prev.paymentdate
groupby cur.customerid, cur.paymentdate

This generates data:

CustomerPaymentdateBalanceafterpayment12009.01.01112009.01.02312009.01.03212009.01.04412009.01.05122009.01.011022009.01.02522009.01.0312

To look at the maximum, you can do a group by on the running total query:

select customerid, max(balance)
from (
    select cur.customerid, cur.paymentdate, balance = sum(prev.amount)
    from#payments cur
    inner join#payments prevon cur.customerid = prev.customerid
        and cur.paymentdate >= prev.paymentdate
    groupby cur.customerid, cur.paymentdate
) runningtotal
groupby customerid

Which gives:

Customer   Max balance
1          4
2          12

Hope this is useful.

Solution 4:

list = list of amounts ordered by dateforeachin list as amount
  running += amount
  ifrunning >= high
    high = running

To keep it fast, you will require a running total incremented with amount on a trigger, and a high value for each customer (can also be updated by a trigger to make the re-query even simpler).

I don't think you can do this type of thing without code (stored procedures are code)

Solution 5:

like Andomar's answer. You can do the running total for each payment. Then find the max peak payment...

with
rt as (
  select
    Payments.*,
    isnull(sum(p.Amount), 0) + Payments.Amount asrunningfrom
    Payments
    leftouterjoin Payments p on Payments.CustomerID = p.CustomerID
      and p.PaymentDate <= Payments.PaymentDate
      and p.PaymentID < Payments.PaymentID
),
highest as
(
  select
    CustomerID, PaymentID, runningas peak_paid
  from
    rt
  where
    PaymentID = (select top 1 rt2.PaymentID 
        from rt rt2 
        where rt2.CustomerID = rt.CustomerID
        orderby rt2.running desc, rt2.PaymentDate, rt2.PaymentID)
)

select*,
  (selectsum(amount) from Payments where Payments.CustomerID = highest.CustomerID) as total_paid  
from
  highest;

however, since you have around 1 million payments, this could be quite slow. Like others are saying, you would want to store the CustomerID, PaymentID and peak_paid in a separate table. This table could be updated on each Payment insert or as a sqljob.

Updated query to use join instead of subqueries. Since the PaymentDate does not have a time, I filter out multiple payments on the same day by the PaymentId.

Spark Sql Blog