Friday, April 20, 2012

BI Rollout: Quick Wins


I was visiting a client recently to assess the status of their business intelligence rollout and plan some next steps. This was my third visit over a two and a half year span, so one could expect that some progress had been made. However, they had not made as much progress as they should have, given the time they had been working with the Argos product and the energy they had put into rolling out the tool. In their defense, they (the IT folks mainly) were having trouble figuring out the right way to approach the user community to gain buy-in and support. They had been hit with a large number of requests for support from all of the administrative offices, which produced a few mediocre reports and DataBlocks spread across a large number of areas. As a result, there was a sense of frustration among the user community (and in the IT area as well), and the tool was not being used as extensively as it could be. As part of my recommendations, I told them to focus on getting some quick wins to bolster the initiative as a whole and begin rebuilding momentum for the project; as we all know, adoption = success!
This led to additional discussions around how one can go about getting quick wins.  In my experience, the steps below (while not comprehensive) are a good place to start when thinking about who to approach and how to approach them, with the goal of knocking out a few solutions to gain some project momentum (quick wins).
1. Focus, Focus, Focus:  The shotgun approach serves only to spread scarce resources even thinner (resources that, in most cases, are already spread too thin).  The result is a burned-out support staff and frustrated users.  So, instead, focus.  First, establish the departmental segments that will be part of the rollout project.  Then, from that list, try to determine the departments that are excited about the project (or at least not hostile) and that do not have extremely high expectations.  That combination will give your support staff the biggest “bang” for their time spent in support.  Once the “pilot” group is chosen, analyze their needs from the perspective of their business problems.  Another issue at my recent client visit was that the support team’s answer to request X was to provide A-Z.  While A-Z certainly included X, it took a long time to deliver because of the complexity involved in providing A-Z.  In the short term, following our focus mantra, it is important to deliver specific solutions to specific business problems.  Remember, our goal here is quick wins and momentum!
2. Talk about the Problem, not the Solution:  Another difficulty my client was dealing with was that the support team was being told what the solution ought to be, rather than being allowed to do their job and work out the solutions themselves.  This was partially due to the demands of their user groups and partially due to the fact that the support team didn’t know how to approach their users in a way that kept them focused on their problem.  When discussing the needs of your user groups, keep them focused on their problems.  To do this, I often find myself asking questions like “what are you hoping to accomplish?”, “where is your longest process delay?”, and “what single part of this process, if automated, would save you the most time?”.  These questions move the focus of the conversation from “I want every possible data attribute related to a student” to “I need to automatically generate a list of students for the current term who are on academic suspension, then email that list to their advisors”.  The latter is a specific, clearly defined problem.  That information can then be consumed and addressed by the support team in a timely manner.
3. Develop a Communication/Rollout Plan:  I assure you that momentum will begin to build as the pilot department begins to spread the word of their success.  In order to prevent further frustration from those folks who want support but are not yet part of the rollout project, develop a plan to communicate a logical process and timeline in which they will get the support they need.  While this may not make them happy (if you have to tell them they have to wait for several months), most people understand that resources are limited and will acquiesce.  However, they will only do so if they are told when and how they will eventually get what they need.  The difficult part for the team planning the rollout project is determining the proper order of development, after the pilot group.  Now that momentum is in place, it is my opinion that tackling the most data-driven part of the organization is probably the best plan.  Part of the communication plan should be a clear feedback mechanism for the administrative offices to provide change requests and requests for additional support after their “go live”.  They also need to be made aware of the support changes that may occur as the support team moves to another department after their go live.  In the best cases, a new staff member could be hired inside the department to handle support and development moving forward.

These same steps apply when going through any project implementation:  Focus, Talk Problems, and Communicate!  Remember that the measure of your success is not how awesome your BI solutions are, but how often/thoroughly they are used across campus.  Good luck, and happy hunting!

Thursday, January 13, 2011

Oracle's Data Dictionary: The Holy Grail

Well, I just read through my blog, and it appears that I have gone woefully astray.  I promised (in September of 2009...ouch!) to provide a monthly series called Banner tips of the month.  Alas, my goal has been completely forgotten, never fulfilled.  So, here I am again to see what I can do to remedy the situation!  Starting off, January's "tip of the month" is a post on using Oracle's Data Dictionary.

Terms and Definitions

It surprises me when I visit a client and their analysts haven't heard of, or rarely use, Oracle's Data Dictionary (ODD).  Wikipedia defines a data dictionary as a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format." (http://en.wikipedia.org/wiki/Data_dictionary)  Another important term commonly used in conjunction with data dictionary is "metadata".  Metadata is simply data or information about data.  One of the most commonly used forms of metadata in the world today is a library's card catalog, where information about the entire library system's inventory can be found, categorized, and searched.  Those three words (especially the last) are critical to a good data dictionary.  In any case, back to the main point: using the ODD, we can quickly discover many interesting things about our Oracle database.  But more important than learning about the database itself is the fact that we can glean valuable information about our Banner system from the ODD!

Here is a short, real-world example.  I received an email today (the inspiration for this post, in fact) from someone requesting any information I might have regarding a function that could help him use his institution's CAPP setup more effectively.  After thinking about it for a few minutes, I started typing an email indicating that I really didn't have much aid to give regarding his specific question.  However, after thinking about it a second more, I thought I'd do a quick search for some possible solutions using.....you guessed it, the ODD on my Banner 8 installation!  After 5 minutes spent querying the dictionary, I found several possible leads to the answer he was searching for!  So, you're probably wondering, "what was it he was looking for!?".  Alas, I'm not going to spoil the surprise.  I will, however, give you several useful queries and tidbits about the ODD that may come in handy in your own searches along the way.

First, more specifics on the ODD.  The ODD is a series of Oracle views that contain data about the Oracle system.  This includes data on all of the views, functions, indexes, packages, tables, procedures, users, and any other database-related objects you may care to know about in your system.  So, if we adapt this concept to Banner: when we install Banner on Oracle, the ODD collects information about all of Banner's tables, functions, views, and so on.  The next logical step, then, is to query those ODD views to find the information we need about those stored objects.  So, here we go.

Querying the ODD

The first thing we need to know is the name(s) of the ODD view(s) we are going to query.  For a complete list, you can visit http://www.oracle.com/pls/tahiti/tahiti.catalog_views.  The ones I use most often include the following:
  • all_tables
  • all_views
  • all_objects
  • all_source
  • all_tab_columns
  • all_col_comments
  • all_tab_comments
Interesting that they all start with "all". There is, in fact, a reason for that, but it's not too important for the purpose of this discussion. If you're interested, you can read more about the ODD at the site I linked at the start of this paragraph. In any case, let's look at a reason you may want to query some of these views.

Let's consider a situation in which you don't know where a piece of data resides within the database (or perhaps IF it even exists at all). The first logical place to look would be all_tables or all_tab_comments. The reason you might pick all_tab_comments is that you may not know the name of the column or table, so you can't search all_tables by name. So, let's say we're looking for a column that contains data about transcripts. Here is a simple query that would return some interesting facts about tables that pertain to transcripts:

select owner
,table_name
,comments
from all_tab_comments
where lower(comments) like '%transcript%';


Now, I'm not kidding when I say that query (on my system) returned no fewer than 37 tables that had comments related to transcripts. However, notice that we also have an owner column as well as table_name and comments, so we could refine the query a little more if we knew or could guess something about what we're looking for. For example, with transcripts, there's a good bet that we only want STUDENT tables, so we could filter on the owner (SATURN) or on the table name (like 'S%'), as follows:

select owner
,table_name
,comments
from all_tab_comments
where lower(comments) like '%transcript%'
and table_name like 'S%'
and owner = 'SATURN';


Now, this was a little redundant, I know, but hopefully it got the point across!
So, as you can see, using the ODD to find data locations is very handy! Another technique that is VERY useful and VERY RARELY used is querying the data dictionary to find functions and procedures to aid in your reporting and application development. SunGard has spent a great deal of money (well, SCT did really) developing the Banner product and its myriad PL/SQL APIs; we may as well use them! Here is a query that could help you find a function to calculate age:

select owner, object_name, object_type
from all_objects
where object_type = 'FUNCTION'
and lower(object_name) like '%age%';


On my system, this query returned 8 different functions with the letters 'age' somewhere in their names. However, only one of them was named F_CALCULATE_AGE. Now, that only tells us the name of the function. If we wanted to see its guts to determine if it's something we can use, we could write the following query:

select line, text
from all_source
where name = 'F_CALCULATE_AGE'
and type = 'FUNCTION'
order by line;


This query returns two columns: one contains the line number for every line in the function's code, and the other contains the actual text of the function itself. This feature is invaluable when trying to determine if the function will be useful for whatever problem you're trying to solve.

So, the last thing I want to leave you with is a question.  What if we were to build a DataBlock or two in Argos that "parameterized" a few of these queries and made them available to some of our power users?
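As a hedged sketch, the query behind such a DataBlock might look like the following, where :search_term is a hypothetical user-supplied parameter (for example, 'transcript'):

--a "parameterized" data dictionary search for a DataBlock
select owner,
       table_name,
       comments
  from all_tab_comments
 where lower(comments) like '%' || lower(:search_term) || '%'
 order by owner, table_name;

Happy Hunting!!!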

Tuesday, June 1, 2010

Report Conversion: X to Argos

Well, it's only been a month since my last posting.  That's not great (I'm hoping to get it down to a couple weeks or so), but it's better than it was before.  And honestly, I have an excuse; I got married!  Crazy, I know, but I can't help it.  My wife (weird) is the coolest person in the world.  I'm a lucky guy.  Anyway, onto business.

Presumably, you've purchased Argos and are preparing to migrate reports from your legacy system for use in your new reporting tool.  This is excellent news!  The first opportunity presented to us by this process is a chance to take an inventory of your legacy reports and to DEPRECATE (yes, I said it) reports that are no longer used or will become obsolete with your new system.  As you can imagine, if not handled well, this can be a very difficult process for some people.  So, before beating down the doors of your office mates and telling them their reports are going away in 3 months, take a deep breath and simply make a list of everything that exists in your legacy system, when it was last run, how often it's run, and whether it needs current data or can be run against a data warehouse (if available).  Organize the reports by area of ownership/responsibility (such as Records, Admissions, Financial Aid, Finance, etc), then organize a meeting with the decision makers in each of those areas to prioritize their list.  That meeting will be your first opportunity to broach the subject of deprecation.  By the way, I highly recommend NOT using the word deprecate (or any form of it).  Gently encourage each of your clients to recognize that they have not executed or requested report X in 6 months, even though they say they're supposed to be running it once a week.  Ask them if they still need it, and if the answer is "Yes", ask them why (again, gently...try not to sound like you're accusing them of anything).  Finally, make sure they understand that "that's the way we've always done it" is not an acceptable answer.

After you've eliminated as many unnecessary reports as possible, it's time to prioritize.  Luckily, this doesn't really fall on you (generally speaking).  Work with your clients again to determine which reports are most important and, where possible, set reasonable deadlines.  Now comes the fun part, and the second most important aspect of your conversion: consolidation.  One of Argos' key strengths is its ability to combine similar requests into a single object, called a DataBlock.  Imagine a situation in which your first four reports relate to student information.  If you were to encapsulate the data required for those reports into a single DataBlock, then set your report writers loose on that DataBlock, you've just reduced your overall workload by 75%!  You're probably grumbling to yourself, "well...how often does that actually happen?"

I was recently working with a client whose anticipated conversion load was around 450 legacy reports.  After elimination, prioritization, and consolidation, I reduced the expected number of DataBlocks to 78.  And this is generally how things go; seeing a reduction in expected output of 50% or more is pretty typical!  So, you might be asking yourself "how" one can consolidate to such a degree.  Firstly, it has to do with technology.  Generally speaking, older reporting tools were designed to facilitate one report per "designable object" (what we refer to as a DataBlock).  As such, they required expansive menus and large amounts of development resources.  BI tools continue to evolve and are now smarter than ever.  That said, when I work to consolidate like reports, I stick with a concept commonly referred to as "level of granularity".  By this, I simply mean the level at which you do not want duplicate records to appear.  Some common examples include: student, course, financial transaction, financial account, financial aid award, and prospect.  Each of these items has certain attributes that can be reported, often in situations in which you only want one record per item in your report.  That is your "level of granularity".  Here's an example.  Let's say one of your reports has to do with students majoring in Psychology.  The report needs the student's name, ID, and email address.  Obviously, you don't want more than one row per student, so this is a case where the "student" is your level of granularity.  Now, all we have to do is figure out how to achieve that level and then "bolt on" the appropriate attributes (in this case, ID, name, and email).  It's that simple!
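To make that concrete, here is a rough sketch of a student-grain query for the Psychology example, assuming a typical baseline Banner install (SPRIDEN for ID/name, SGBSTDN for major, GOREMAL for email); the 'PSYC' major code and the "one active email per student" shortcut are purely illustrative:

select spriden_id                                       student_id,
       spriden_last_name || ', ' || spriden_first_name  student_name,
       (select max(goremal_email_address)
          from goremal
         where goremal_pidm = spriden_pidm
           and goremal_status_ind = 'A')                 email_address
  from spriden, sgbstdn s
 where spriden_change_ind is null          --current ID/name record only
   and s.sgbstdn_pidm = spriden_pidm
   and s.sgbstdn_majr_code_1 = 'PSYC'      --hypothetical major code
   and s.sgbstdn_term_code_eff =           --most recent general student record
       (select max(s2.sgbstdn_term_code_eff)
          from sgbstdn s2
         where s2.sgbstdn_pidm = s.sgbstdn_pidm);

Notice that every "bolted on" attribute (name, email) is written so it can never multiply the rows; that is what keeps the result at the student level of granularity.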

Alright, I think that's enough for now.  This post should have given you some ideas on what I consider the most important aspects of report conversion.  This, in conjunction with my post about development best practices, should put you well on your way to solving all the world's problems.  Happy Hunting!

Wednesday, April 28, 2010

Create a Function Library - Decrease Intellectual Angst and Increase Efficiency

Just like your typical code monkey, I'm constantly looking for ways to reduce my required intellectual output while increasing my overall efficiency. For example, right now, I'm walking on a treadmill, listening to my wedding rehearsal dinner playlist to make sure it's all the same feel/volume (aka sans cuss words:)), and updating a blog that I haven't written on in 4 months (I'm very sorry).

As far as coding goes, I have been doing a lot of work with schools that are moving with increasing speed and fervor toward data-based decision making. This requires a much more analytical approach to reporting than your typical "students by major" list, which, in turn, requires more database resources and involves a great deal more "logic" with regard to your institution's business rules (often the hardest part). To aid me in this transition, I have started to compile a library of functions. For the most part, the functions are relatively simple; I just add one whenever I need to answer a new question. For example, I was recently asked by a client what percentage of their new freshmen this fall enrolled full time. My old "hack" self would have sat down at a computer for a couple of hours and written the necessary query to gather that information. BUT WAIT! This time, I wrote a database function that took term, student type, and time status as parameters. Now, to get my percentage, this is the query:

select f_enrl_count('201010','N','FT') FT_NewFreshmen,
          f_enrl_count('201010','N','%') Total_NewFreshmen,
          f_enrl_count('201010','N','FT')/
          f_enrl_count('201010','N','%')*100 Pct_New_FT
  from dual;

It's now that simple!  Not only is the query simplified, the overall results come back quicker because I am using pre-compiled code and I can use it from any query tool that connects to this database!  Functions can be combined too.  For example, if you're asked the percentage question a lot and you're tired of writing the query above, you could encapsulate that code in a function called f_enrl_pct that would look like this:

create or replace function
f_enrl_pct(in_term varchar2, in_styp varchar2, in_time_status varchar2)
return number as

 v_output number;

begin

select round(f_enrl_count(in_term,in_styp,in_time_status)/
                    f_enrl_count(in_term,in_styp,'%')*100,2) into v_output
  from dual;

return v_output;
end;

Now when you want that percentage, all you have to do is write the following:

select f_enrl_pct('201010','N','FT') from dual

and voilà, you're done!  How sweet is that?  Now, you're probably chomping at the bit for the code behind the enrollment function.  Unfortunately, because of the variety of architectures (Baseline, ODS, EDW, home grown data marts), I would rather not post the code here.  However, if you're really interested, post a comment and I'll think about sharing:).  Also, if you have developed some neat functions and would like to share, I welcome your comments there as well!  Anyway, the moral of today's story is twofold:

1. Writing a blog entry while on the treadmill takes way longer than it's worth
2. Creating a function library may be the smartest thing you ever do as a SQL programmer

As a side note, functions can be used for non-analytical reports as well! More on that later...:)

Happy Hunting!!!

Tuesday, December 22, 2009

Argos - Where to Code your Business Logic

If you're anything like me, your first reaction was "wait, isn't 'business logic' an oxymoron!?". I suppose sometimes it is, but here we're dealing with a bit of a different issue. In this case, business logic refers to the ever-present technical representation of business process within the tools we use for reporting. For example, we might want to be able to report students who have taken a remedial course. An indicator must be created or found in Banner that will identify those students. The location of the logic responsible for identifying those students is the topic of this posting. Because I have the most experience with the Argos reporting tool, I will be using it as an example, but I am positive the same concepts apply to other tools. In general, the goals guiding this discussion are as follows:

- Increase Efficiency (both technically and functionally)
- Increase Accuracy
- Maintain Flexibility

Each of the areas that can "house" business logic will be evaluated using these three criteria. Now that that's out of the way, let's talk terminology. In Argos, there are three main arenas that can be used to "contain" business logic: Reports, DataBlocks, and the Database (in the form of stored functions and procedures). Generally speaking, as your organization matures, you'll want to move further away from storing your business logic at the report level.

The three goals stated above are distinctly affected the further from the database you are. For example:

Let's say we're trying to report current, clerical employees. Well, it wouldn't be difficult to write a query that retrieved all employees:

select pebempl_pidm
  from pebempl;

Not bad! From a DataBlock design standpoint, this is really easy! However, that puts the responsibility for finding current clerical employees on the report writer, forcing them to add whatever conditions identify such employees at the report level. For the sake of argument, let's assume that an employee class equal to 'CL' and a status of 'A' indicate a current, clerical employee. This means the report writer would have to hard code those conditions into the report object to get the appropriate information.

Now, imagine there are 50 different employee-related reports. Each one of those reports must now be maintained and filtered according to whatever subgroup of employees the report writer is looking for. Each of those reports is now serving a single purpose. Additionally, the logic loses a performance edge because it has to be run each time an individual report is invoked, which doesn't allow the code to take advantage of some of Oracle's built-in performance features (the buffer cache, statement caching, etc).

So, let's imagine removing that logic from the report level and giving the user a toggle in the datablock that indicates if they want current clerical employees or not. This is good because we can then use a modified query:

select pebempl_pidm
from pebempl
where pebempl_ecls_code = :ecls
and pebempl_status = :status;

This now gives the report writer further flexibility, allowing them to choose different options with which to run a single report; a single report "template" can be used for any combination of status and employee class! This is quite a bit more flexible (as you can see); however, you're still lacking some of the performance benefits of server-side code, so the final step would be to move the logic to the database in the form of a stored function! In that case, the query in the DataBlock might look like this:

select pebempl_pidm
from pebempl
where f_include_employee(pebempl_pidm,'flag1','flag2') = 'Y';

In this case, the function f_include_employee() does all the work and simply returns a Y or N to let the system know whether to include the employee or not. This is extensible, reusable code that only has to be written once. Additionally, it is pre-compiled and stored in Oracle, so it fully utilizes Oracle's performance-enhancing features! The code inside the function is completely up to the institution and can contain whatever logic you wish! If you'd like an example of some functions that may be useful, let me know!
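As a hedged sketch (not baseline Banner code), such a function might simply encode the 'CL'/'A' rules from this example; the column names follow the queries above and should be adjusted to your own install and business rules:

create or replace function f_include_employee
  (in_pidm   number,
   in_ecls   varchar2,
   in_status varchar2)
return varchar2 as

 v_count number;

begin

  --count matching employee records; 'like' lets the caller pass '%' as a wildcard
  select count(*)
    into v_count
    from pebempl
   where pebempl_pidm = in_pidm
     and pebempl_ecls_code like in_ecls
     and pebempl_status like in_status;

  if v_count > 0 then
    return 'Y';
  else
    return 'N';
  end if;

end;

Happy Hunting!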

Wednesday, November 4, 2009

Elusive Data: Addresses and Phone Numbers

First of all, I must apologize again as I have not updated this BLOG in quite a while. As it turns out, I'm not very good at regular writing....good to know!:) Anyway, on to today's topic.

Much of Banner's data is stored in a manner that can make it difficult to find the most up-to-date information. Good examples of this situation are addresses and phone numbers. It can be very difficult to find an individual's most current (let alone correct) address and phone number, especially when poor data standards are used for entering that information. So, first, one must put the proper procedures in place to ensure accurate entry of contact information. To aid in this effort, we can provide several "audit queries" that will help identify common business problems in our address and phone number records.

Audit Your Data!


In my experience, address and phone errors fall into several different areas:
  1. Addresses that have a "To" date in the past, but are still marked as active.
  2. More than one active address of the same address type.
  3. The "To" date is NULL, but the address is inactive.
  4. The "From" date is NULL.
  5. More than one active telephone number for the same telephone type.
  6. Having an inactive phone number marked as primary.
  7. More than one primary telephone number per individual.

I'm sure many institutions will have other common "opportunities" when it comes to their contact information, but in any case, after those issues are identified, it simply becomes a matter of querying the database on a regular basis (be it weekly, monthly, etc) to locate those records that are "out of compliance". The following queries will aid in finding the issues mentioned above:


--past to dates for active addresses
select spraddr_pidm,
       count(*) address_count
  from spraddr
 where (spraddr_status_ind is null or spraddr_status_ind = 'A')
   and spraddr_to_date < sysdate
 group by spraddr_pidm;


--more than one active address
select spraddr_pidm,
       spraddr_atyp_code,
       count(*) address_count
  from spraddr
 where spraddr_status_ind is null or spraddr_status_ind = 'A'
 group by spraddr_pidm, spraddr_atyp_code
 having count(*) > 1;


--inactive addresses with a null to date
select spraddr_pidm,
       count(*) address_count
  from spraddr
 where spraddr_to_date is null
   and spraddr_status_ind <> 'A'
   and spraddr_status_ind is not null
 group by spraddr_pidm;


--null from date
select spraddr_pidm,
       count(*) address_count
  from spraddr
 where spraddr_from_date is null
 group by spraddr_pidm;


--multiple active phones per phone type
select sprtele_pidm,
       sprtele_tele_code,
       count(*) tele_count
  from sprtele
 where sprtele_status_ind is null
    or sprtele_status_ind = 'A'
 group by sprtele_pidm, sprtele_tele_code
 having count(*) > 1;


--individuals with inactive primary tele numbers
select sprtele_pidm,
       count(*) tele_count
  from sprtele
 where sprtele_status_ind is not null
   and sprtele_status_ind <> 'A'
   and sprtele_primary_ind = 'Y'
 group by sprtele_pidm;


--more than one primary telephone number per individual
select sprtele_pidm,
       count(*) tele_count
  from sprtele
 where sprtele_primary_ind = 'Y'
 group by sprtele_pidm
 having count(*) > 1;


As you can see, the queries are all very simple, but they provide powerful results. With the list of pidms generated by each query, one could build an automated audit process for addresses and phone numbers. This would obviously need to be modified to include the business rules at your institution, but the concept is the same. Additionally, there is an excellent product out there called Clean Address (by Runner Technologies) that is specifically designed to comb through your addresses and verify their accuracy. It's moderately priced, but well worth it when you consider how much money is wasted on postage paid for correspondence that never reaches the intended recipient.

In any case, now that we've made a concerted effort to clean our data and put business processes in place to ensure input that is as consistent and accurate as possible, it's time to figure out how to query the contact information for an individual and get their most up-to-date information.

The info is IN, now how do we get it OUT?!


This is a difficult topic to cover in a broad sense because the approach you take depends completely on your goals. For example, if you're simply looking for an individual's most current mailing address (typically atyp code MA), it's a relatively simple process (assuming you've cleaned things up as described above). At that point, the query simply looks for the active address for each entity with the atyp code equal to MA (see below)!


select spraddr_pidm,
       spraddr_street_line1,
       spraddr_street_line2,
       spraddr_city,
       spraddr_stat_code,
       spraddr_zip
  from spraddr
 where spraddr_atyp_code = 'MA'
   and (spraddr_status_ind is null
     or spraddr_status_ind = 'A');


Assuming you've checked to make sure each individual only has ONE active address per address type, this query will work like a charm! You only have to get creative (using from/to dates and sequence numbers) when this isn't the case! NOTE: the same idea holds for telephone numbers.

Unfortunately, we typically want to bring back an address based on a hierarchy. This can be a bit tricky to make happen, but here are a few ideas for a solution! First, Banner provides a form called GTVSDAX in which you can set up institutional hierarchies for things like addresses and telephone numbers. Once that's done, querying the database using a hierarchy becomes trivial; check it out!


select spraddr_pidm,
       spraddr_street_line1,
       spraddr_street_line2,
       spraddr_city,
       spraddr_stat_code,
       spraddr_zip
  from spraddr
 where (spraddr_status_ind is null
     or spraddr_status_ind = 'A')
   and spraddr.rowid = f_get_address_rowid(spraddr_pidm,'ADMDDR','A',SYSDATE,1,'S','');


As you can see, it's simply a matter of using the predefined Banner function and hierarchy to find the correct address for each individual. Alas, some schools don't use or know about the GTVSDAX form and as such require more "technical" answers to this dilemma. I happened to work for one such school myself! While there, I developed an Oracle function that would allow me to get the correct address type for each person based on a hierarchy. Bear in mind that I wrote this a long while ago, so it's not my best work, but it should point you in the right direction!


create or replace FUNCTION F_GET_ADDR_INFO (V_PIDM NUMBER, V_ATYP1 VARCHAR2,
V_ATYP2 VARCHAR2, V_ATYP3 VARCHAR2, V_ATYP4 VARCHAR2, WHICH VARCHAR2) RETURN VARCHAR2 AS
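  -- WHICH selects what to return: 'ADDRESS', 'CSZ' (city/state/zip), or 'ATYP'.
  -- V_ATYP1 through V_ATYP4 are the acceptable address types; note that with
  -- ROWNUM = 1 and no ORDER BY, ties between types are broken arbitrarily
  -- rather than by a true hierarchy.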

 V_OUTPUT VARCHAR2(500);

BEGIN
  SELECT DECODE(WHICH,'ADDRESS',SPRADDR_STREET_LINE1 || ' ' || SPRADDR_STREET_LINE2 || ' ' || SPRADDR_STREET_LINE3,
                      'CSZ', SPRADDR_CITY || ' ' || SPRADDR_STAT_CODE || ', ' || SPRADDR_ZIP,
                      'ATYP',SPRADDR_ATYP_CODE) INTO V_OUTPUT
    FROM SPRADDR
   WHERE SPRADDR_PIDM = V_PIDM
     AND (SPRADDR_ATYP_CODE = V_ATYP1 OR
          SPRADDR_ATYP_CODE = V_ATYP2 OR
          SPRADDR_ATYP_CODE = V_ATYP4 OR
          SPRADDR_ATYP_CODE = V_ATYP3)
     AND ROWNUM = 1;
  RETURN V_OUTPUT;
END F_GET_ADDR_INFO;


For telephone numbers, we're typically looking for a specific telephone type rather than one type in a hierarchy. That being the case, you can use a "pivot" query to produce the desired results (see my article regarding pivot queries from last month)!
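For example, assuming illustrative telephone type codes of 'MA', 'CP', and 'PR' (substitute your own STVTELE codes), a quick sketch of that pivot might look like this:

--one row per person, one column per (active) phone type
select sprtele_pidm,
       max(decode(sprtele_tele_code, 'MA',
           sprtele_phone_area || sprtele_phone_number)) mailing_phone,
       max(decode(sprtele_tele_code, 'CP',
           sprtele_phone_area || sprtele_phone_number)) cell_phone,
       max(decode(sprtele_tele_code, 'PR',
           sprtele_phone_area || sprtele_phone_number)) permanent_phone
  from sprtele
 where (sprtele_status_ind is null
     or sprtele_status_ind = 'A')
 group by sprtele_pidm;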

Well, that's all for now. I appreciate your stopping by! Remember, as the old adage goes, "garbage in, garbage out", so always start your data search with an evaluation of your current data and business processes (when possible). This may be painful in the short term, but before long you'll be reaping vast rewards! Happy hunting!!!

Monday, September 7, 2009

Turning Rows into Columns: Pivot Queries

Moving normalized, row-based information into usable reports often requires the use of what is commonly referred to as a "pivot query" (referring to the "pivot" motion required when changing the data's orientation). For example, Banner stores student test information in a series of rows for each student:

PIDM   TEST   SCORE
1      S1     560
1      S2     650
1      S1     600

Unfortunately, this is typically not the way one would prefer to report this information. More often, we'd like to see each person's test scores in a single line with multiple columns for each test:

PIDM   S1    S2
1      600   650

The second example is much more accessible, allowing the removal of duplicated students, and provides the reader with a more meaningful way to view institutional data (in this case, test scores).

So, How Can this Be Done?!

Well, SQL provides all the tools necessary to make this happen. The two most important tools from our SQL tool belt that will be used for this particular issue will be the DECODE (or CASE) and MAX functions. Let us first take a look at the raw data:

Query 1 - Basic Test Data
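A hedged reconstruction of this query, assuming Banner's baseline test score table (SORTEST) and its standard columns:

select sortest_pidm,
       sortest_tesc_code,
       sortest_test_score
  from sortest
 order by sortest_pidm, sortest_tesc_code;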

Results 1 - Basic Test Data

As you can see, test information is stored in a manner that is difficult to report, but it won't be for long! The next step is to switch the view of the information from row to columnar form. To do this, we will check the test code to create each column:

Query 2 - Perform the Pivot
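Again as a hedged reconstruction against SORTEST (the 'S1' and 'S2' test codes are illustrative), the DECODE step looks something like this:

select sortest_pidm,
       decode(sortest_tesc_code, 'S1', sortest_test_score) s1,
       decode(sortest_tesc_code, 'S2', sortest_test_score) s2
  from sortest;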
Results 2 - Performing the Pivot

The DECODE function, whose syntax is DECODE(column_name,'X','A','Y','B','Z','C','D'), operates as follows: it checks the column_name value to see if it is equal to 'X'. If it is equal to 'X' then it replaces itself with the value 'A'. If it is not equal to 'X' then it checks to see if it is equal to 'Y'. If it is equal to 'Y' then it replaces itself with the value of 'B' and so on until we get to 'D'. As you probably noticed, 'D' is the odd man out; it has no corresponding return value. This is referred to as the "Default". If column_name doesn't equal 'X','Y', OR 'Z', then the decode statement will return 'D' by "Default". As you can see in the above example, this only got us half way there; we still have more than one row per individual. Also, with test codes, it's possible to have more than one score with the same code. This problem is easily solved with the MAX() aggregate function. If we surround each column (other than the PIDM) with the MAX function, it will only return the highest test score for each test, leaving one row per individual. This is exactly what we want! See the following example:

Query 3 - Finalize the Pivot
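A hedged reconstruction of the finished pivot (the original used seven test columns, but two are enough to show the pattern):

select sortest_pidm,
       max(decode(sortest_tesc_code, 'S1', sortest_test_score)) s1,
       max(decode(sortest_tesc_code, 'S2', sortest_test_score)) s2
  from sortest
 group by sortest_pidm;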
Output 3 - Final Pivot
As we expected, this final output is much easier to read than the first (row-style) output. However, this technique does have its drawbacks and limitations:

  1. You have to know in advance what you're going to key off to perform the pivot - in this case, we used the test codes.
  2. You have to make a column for every row you want to view as a column. In this case, we only chose 7 tests, but what if you wanted to see every test a student has taken?
Thank you for checking out my blog. Be sure to check back next week for "Race and Ethnicity"! Happy hunting!