Doing Sum Of Columns Based On Some Complex Logic In Pyspark
Here is the question from the attached image:

Table:

Row | Col1 | Col2 | Col3 | Result
----|------|------|------|-------
1   | 10   | 20   | 100  | 30
2   | 20   | 40   | 200  | 60
3   | 30   | 60   | 0    | 240
Solution 1:
This answers (correctly, I might add) the original version of the question.
In SQL, you can express this using window functions: one cumulative sum to define the groups, and an additional cumulative sum within each group:
select t.*,
       (case when col3 <> 0 then col1 + col2
             else sum(col2 + case when col3 = 0 then col1 else 0 end) over (partition by grp order by row desc)
        end) as result
from (select t.*,
             sum(case when col3 <> 0 then 1 else 0 end) over (order by row desc) as grp
      from t
     ) t;
Here is a db<>fiddle (which uses Postgres).
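To make the grouping concrete, the query's logic can be sketched in plain Python. The `compute_results` helper below is hypothetical (not part of the answer), and it assumes the default SQL window frame: the running sum covers rows from the start of the partition through the current row, in descending `row` order.

```python
# Sketch of the query's logic in plain Python (hypothetical helper).
def compute_results(rows):
    # Mirror ORDER BY row DESC by walking rows in descending `row` order.
    ordered = sorted(rows, key=lambda r: r["row"], reverse=True)
    grp = 0          # cumulative count of rows with col3 <> 0 (the group id)
    running = {}     # grp -> running sum of col2 + (col1 where col3 = 0)
    results = {}
    for r in ordered:
        if r["col3"] != 0:
            grp += 1                     # a non-zero col3 starts a new group
        running[grp] = (running.get(grp, 0)
                        + r["col2"]
                        + (r["col1"] if r["col3"] == 0 else 0))
        # case when col3 <> 0 then col1 + col2 else <windowed sum> end
        results[r["row"]] = (r["col1"] + r["col2"] if r["col3"] != 0
                             else running[grp])
    return [results[r["row"]] for r in rows]

sample = [
    {"row": 1, "col1": 10, "col2": 20, "col3": 100},
    {"row": 2, "col1": 20, "col2": 40, "col3": 200},
    {"row": 3, "col1": 30, "col2": 60, "col3": 0},
]
print(compute_results(sample))  # [30, 60, 90]
```

On the sample table this yields 30 and 60 for the first two rows, and 90 (not 240) for the third, which is consistent with the note that this solution answers the original version of the question rather than the edited one.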
Note:
Your description says that the else logic should be:
else sum(col2) over (partition by grp order by row desc)
Your example says:
else sum(col2 + col3) over (partition by grp order by row desc)
And in my opinion, this seems most logical:
else sum(col1 + col2) over (partition by grp order by row desc)