How Do I Grab The “next” Event When The Offset Is Variable For Items That Can Be Repeatedly Processed?
Solution 1:
This is a gaps-and-islands problem, but the islands being defined by a REQ transaction make it a bit more complicated than some.
You could use nested lead and lag functions and some manipulation to get what you need:
select distinct item,
       coalesce(start_tran,
                lag(start_tran) over (partition by item order by timestamp)) as start_tran,
       coalesce(end_tran,
                lead(end_tran) over (partition by item order by timestamp)) as end_tran,
       coalesce(end_time,
                lead(end_time) over (partition by item order by timestamp))
         - coalesce(start_time,
                    lag(start_time) over (partition by item order by timestamp)) as time
from (
  select item, timestamp, start_tran, start_time, end_tran, end_time
  from (
    select item,
           timestamp,
           case when lag_tran is null or transaction like 'REQ%'
                then transaction end as start_tran,
           case when lag_tran is null or transaction like 'REQ%'
                then timestamp end as start_time,
           case when lead_tran is null or lead_tran like 'REQ%'
                then transaction end as end_tran,
           case when lead_tran is null or lead_tran like 'REQ%'
                then timestamp end as end_time
    from (
      select item, transaction, timestamp,
             lag(transaction)
               over (partition by item order by timestamp) as lag_tran,
             lead(transaction)
               over (partition by item order by timestamp) as lead_tran
      from transactions
    )
  )
  where start_tran is not null or end_tran is not null
)
order by item, start_tran;
With additional records for a second cycle for items 1 and 2 that could give:
      ITEM START_TRAN END_TRAN   TIME
---------- ---------- ---------- -----------
         1 REQ-A      PICKUP     01:53:30.0
         1 REQ-E      PICKUP     01:23:30.0
         2 REQ-B      MAIL       00:24:13.0
         2 REQ-F      REQ-F      00:0:0.0
         3 REQ-C      PICKUP     01:46:30.0
         4 REQ-D      PULL       00:23:59.0
         5 REQ-A      PICKUP     01:43:59.0

SQL Fiddle showing all the intermediate steps.
It's not quite as scary as it might look at first glance. The innermost query takes the raw data and adds an extra column for the lead and lag transactions. Taking just the first set of item-1 records that would be:
ITEM TRANSACTION TIMESTAMP            LAG_TRAN LEAD_TRAN
---- ----------- -------------------- -------- ---------
   1 REQ-A       2014-07-31T09:51:32Z          PULL
   1 PULL        2014-07-31T10:22:21Z REQ-A    TRANSFER
   1 TRANSFER    2014-07-31T10:22:23Z PULL     ARRIVE
   1 ARRIVE      2014-07-31T11:45:01Z TRANSFER PICKUP
   1 PICKUP      2014-07-31T11:45:02Z ARRIVE   REQ-E

Notice REQ-E popping up as the last lead_tran? That's the first transaction for the second cycle of records for this item, and it is going to be useful later. The next level of query uses those lead and lag values, treating REQ values as start and end markers, to null out everything except the first and last record for each cycle.
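If it helps to see what the innermost query is doing outside SQL, here is a small Python sketch (my own illustration, not the Oracle query) of attaching each row's previous (lag) and next (lead) transaction within an item, ordered by timestamp. The REQ-E timestamp is a hypothetical placeholder for the second cycle; the original data only shows REQ-E as the final lead_tran.

```python
# Item 1's transactions in timestamp order; the REQ-E timestamp is a
# made-up placeholder for the start of the second cycle.
rows = [
    ("REQ-A",    "2014-07-31T09:51:32Z"),
    ("PULL",     "2014-07-31T10:22:21Z"),
    ("TRANSFER", "2014-07-31T10:22:23Z"),
    ("ARRIVE",   "2014-07-31T11:45:01Z"),
    ("PICKUP",   "2014-07-31T11:45:02Z"),
    ("REQ-E",    "2014-07-31T12:30:00Z"),  # hypothetical timestamp
]

def with_lead_lag(rows):
    """Return (transaction, timestamp, lag_tran, lead_tran) per row,
    mimicking lag()/lead() over one item's partition."""
    out = []
    for i, (tran, ts) in enumerate(rows):
        lag_tran = rows[i - 1][0] if i > 0 else None
        lead_tran = rows[i + 1][0] if i + 1 < len(rows) else None
        out.append((tran, ts, lag_tran, lead_tran))
    return out
```

The first row has a null lag_tran (no earlier row in the partition) and PICKUP's lead_tran is REQ-E, exactly as in the table above.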
ITEM TIMESTAMP            START_TRAN START_TIME           END_TRAN END_TIME
---- -------------------- ---------- -------------------- -------- --------------------
   1 2014-07-31T09:51:32Z REQ-A      2014-07-31T09:51:32Z
   1 2014-07-31T10:22:21Z
   1 2014-07-31T10:22:23Z
   1 2014-07-31T11:45:01Z
   1 2014-07-31T11:45:02Z                                 PICKUP   2014-07-31T11:45:02Z

The next level of query removes any rows which do not represent the start or end (or both - see REQ-F in the Fiddle), as we aren't interested in them:
ITEM TIMESTAMP            START_TRAN START_TIME           END_TRAN END_TIME
---- -------------------- ---------- -------------------- -------- --------------------
   1 2014-07-31T09:51:32Z REQ-A      2014-07-31T09:51:32Z
   1 2014-07-31T11:45:02Z                                 PICKUP   2014-07-31T11:45:02Z

We now have pairs of rows for each cycle (or a single row for REQ-F). The final level uses lead and lag again to fill in the blanks: if start_tran is null then this is an end row and we should use the previous row's start data; if end_tran is null then this is a start row and we should use the next row's end data.
ITEM START_TRAN START_TIME           END_TRAN END_TIME             TIME
---- ---------- -------------------- -------- -------------------- ----------
   1 REQ-A      2014-07-31T09:51:32Z PICKUP   2014-07-31T11:45:02Z 01:53:30.0
   1 REQ-A      2014-07-31T09:51:32Z PICKUP   2014-07-31T11:45:02Z 01:53:30.0

That makes both rows the same, so the distinct removes the duplicates.
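For reference, the overall logic of the query can be sketched in a few lines of Python (my own illustration, not part of the answer): walk one item's transactions in timestamp order, open a cycle at the first row or at any REQ row, and close it when the following row is a REQ or there is no following row.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

def cycles(rows):
    """rows: (transaction, iso_timestamp) pairs for one item, in order.
    Returns one (start_tran, end_tran, elapsed) tuple per REQ-delimited
    cycle, matching the final output of the SQL above."""
    out, start = [], None
    for i, (tran, ts) in enumerate(rows):
        if start is None or tran.startswith("REQ"):
            start = (tran, ts)  # first row, or a new REQ, opens a cycle
        nxt = rows[i + 1][0] if i + 1 < len(rows) else None
        if nxt is None or nxt.startswith("REQ"):  # this row closes the cycle
            elapsed = (datetime.strptime(ts, FMT)
                       - datetime.strptime(start[1], FMT))
            out.append((start[0], tran, elapsed))
            start = None
    return out
```

A lone REQ row (like REQ-F above) opens and closes its own cycle with zero elapsed time, just as the SQL produces.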
Solution 2:
This should give you the same result. I am reproducing Gordon's answer, which still holds good:
select item,
       min(transaction) keep (dense_rank first order by timestamp) as StartTx,
       min(transaction) keep (dense_rank last order by timestamp) as EndTx,
       max(timestamp) - min(timestamp)
from transactions t
group by item;
Even if you have duplicate transactions, they are handled by the KEEP (DENSE_RANK ...) clause. See the documentation for DENSE_RANK and the keyword KEEP.
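As a rough Python sketch of what KEEP (DENSE_RANK FIRST/LAST) computes per group (my own illustration, assuming no timestamp ties; with ties, the real clause applies MIN over the tied rows):

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

def summarize(transactions):
    """transactions: (item, transaction, iso_timestamp) triples.
    For each item, return (StartTx, EndTx, elapsed): the transaction at
    the earliest timestamp, the one at the latest, and the time span."""
    by_item = defaultdict(list)
    for item, tran, ts in transactions:
        by_item[item].append((datetime.strptime(ts, FMT), tran))
    result = {}
    for item, rows in by_item.items():
        rows.sort()  # order by timestamp within the group
        result[item] = (rows[0][1], rows[-1][1], rows[-1][0] - rows[0][0])
    return result
```

Note that, like the GROUP BY query, this collapses all of an item's rows into a single summary, so repeated cycles for one item merge into one overall span.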