Monday, September 7, 2009

Turning Rows into Columns: Pivot Queries

Moving normalized, row based information into usable reports often require the use of what is commonly referred to as a "pivot query" (referring to the "pivot" motion required when changing the data's orientation). For example, Banner stores student test information in a series of rows for each student:

PIDMTESTSCORE
1S1560
1S2650
1S1600

Unfortunately, this is typically not the way one would prefer to report this information. More often, we'd like to see each person's test scores in a single line with multiple columns for each test:

PIDMS1S2
1600650

The second example is much more accessible, allowing the removal of duplicated students, and provides the reader with a more meaningful way to view institutional data (in this case, test scores).

So, How Can this Be Done?!

Well, SQL provides all the tools necessary to make this happen. The two most important tools from our SQL tool belt that will be used for this particular issue will be the DECODE (or CASE) and MAX functions. Let us first take a look at the raw data:

Query 1 - Basic Test Data

Results 1 - Basic Test Data

As you can see, test information is stored in a manner that is difficult to report, but it won't be for long! The next step is to switch the view of the information from row to columner form. To do this, we will check the test code to create each column:

Query 2 - Perform the Pivot
Results 2 - Performing the Pivot

The DECODE function, whose syntax is DECODE(column_name,'X','A','Y','B','Z','C','D'), operates as follows: it checks the column_name value to see if it is equal to 'X'. If it is equal to 'X' then it replaces itself with the value 'A'. If it is not equal to 'X' then it checks to see if it is equal to 'Y'. If it is equal to 'Y' then it replaces itself with the value of 'B' and so on until we get to 'D'. As you probably noticed, 'D' is the odd man out; it has no corresponding return value. This is referred to as the "Default". If column_name doesn't equal 'X','Y', OR 'Z', then the decode statement will return 'D' by "Default". As you can see in the above example, this only got us half way there; we still have more than one row per individual. Also, with test codes, it's possible to have more than one score with the same code. This problem is easily solved with the MAX() aggregate function. If we surround each column (other than the PIDM) with the MAX function, it will only return the highest test score for each test, leaving one row per individual. This is exactly what we want! See the following example:

Query 3 - Finalize the Pivot
Output 3 - Final Pivot
As we expected, this final output is much easier to read than the first (row style) output. However, this technique does have it's draw backs and limitations:

  1. You have to know in advance what you're going to key off to perform the pivot - in this case, we used the test codes.
  2. You have to make a column for every row you want to view as a column. In this case, we only chose 7 tests, but what if you wanted to see every test a student has taken?
Thank you for checking out my blog. Be sure to check back next week for "Race and Ethnicity"! Happy hunting!

No comments:

Post a Comment