How To Write A Query Using Self Join On A Table With Huge Data?

June 30, 2023 Post a Comment

With the following table structure: Table 1: Customers CustID(primary key) | CustName(indexed) ---------------------------------- C1 Cust1 C2

Solution 1:

Have you tried analyzing the query with Database Engine Tuning Advisor? No guarantees, but it might provide a useful suggestion or two.

enter image description here

Solution 2:

You can use a single join to get the customfields, and use aggregate functions to transpose them. That way, the join with table CustomFields is only done once.

Like this.

SELECT
  Customers.CustID,
  MAX(CASEWHEN CF.FieldID =1THEN CF.FieldValue ELSENULLEND) AS Field1,
  MAX(CASEWHEN CF.FieldID =2THEN CF.FieldValue ELSENULLEND) AS Field2,
  MAX(CASEWHEN CF.FieldID =3THEN CF.FieldValue ELSENULLEND) AS Field3,
  MAX(CASEWHEN CF.FieldID =4THEN CF.FieldValue ELSENULLEND) AS Field4
  -- Add more...FROM Customers 

  LEFTOUTERJOIN CustomFields CF
  ON CF.ID = Customers.CustID

  WHERE Customers.CustName like'C%'GROUPBY Customers.CustID

Solution 3:

You query is totally reasonable. However, the database structure is off. You should have a separate column for the customer id and for the field id. The way the query is written, you will get as many rows for the CustId as you have fields in the other table.

They shouldn't be combined into a single column. I would expect the join to look like:

leftouterjoin
CustomerFields cfn
on cfn.CustId = Customers.CustId and
   cfn.FieldNum =<n>

Another possibility is that the performance hit is the first time you run the query. If you have enough memory, then it might run much faster the second time -- because the fields table is in memory.

For the record, I have a historical bias against "like", believe that it causes performance problems. This may not help, but you can try:

whereleft(CustName, 1) = 'C'

Solution 4:

SELECT  Pvt.ID, 
            Customers.CustName, 
            Pvt.[1] AS Field1, 
            Pvt.[2] AS Field2, 
            Pvt.[3] AS Field3, 
            Pvt.[4] AS Field4, 
            Pvt.[5] AS Field5,
            ...
    FROM (
            SELECT ID, FieldID, FieldValue
            FROM CustomFields) AS p
            PIVOT (
                MIN (FieldValue)
                FOR FieldID IN ([1], [2], [3], [4], [5], ... )
        ) AS pvt
    inner join Customers ON Customers.CustID = pvt.ID

Spark Sql Blog

How To Write A Query Using Self Join On A Table With Huge Data?

Solution 1:

Solution 2:

Solution 3:

Solution 4:

Post a Comment for "How To Write A Query Using Self Join On A Table With Huge Data?"