Tuesday, December 22, 2009

Argos - Where to Code your Business Logic

If you're anything like me, your first reaction was "wait, isn't 'business logic' an oxymoron!?". I suppose sometimes it is, but here we're dealing with a different issue: business logic, in this case, refers to the ever-present technical representation of business process within the tools we use for reporting. For example, we might want to report on students who have taken a remedial course. An indicator must be created or found in Banner that will identify those students. The location of the logic responsible for identifying those students is the topic of this posting. Because I have the most experience with the Argos reporting tool, I will use it as the example, but I am positive the same concepts apply to other tools. In general, the goals guiding this discussion are as follows:

- Increase Efficiency (both technically and functionally)
- Increase Accuracy
- Maintain Flexibility

Each of the areas that can "house" business logic will be evaluated using these three criteria. Now that that's out of the way, let's talk terminology. In Argos, there are three main arenas that can "contain" business logic: Reports, DataBlocks, and the database (in the form of stored functions and procedures). Generally speaking, as your organization matures, you'll want to move further away from storing your business logic at the report level.

The three goals stated above are distinctly affected the further you move from the database. For example:

Let's say we're trying to report on current clerical employees. Well, it wouldn't be difficult to write a query that retrieves all employees:

select pebempl_pidm
  from pebempl;

Not bad! From a DataBlock design standpoint, this is really easy! However, it puts the responsibility for finding current clerical employees on the report writer, forcing them to add whatever conditions identify such employees at the report level. For the sake of argument, let's assume that an employee class of 'CL' and a status of 'A' indicate a current clerical employee. This means the report writer would have to hard-code those conditions into the report object to get the appropriate information.

Now, imagine there are 50 different employee-related reports. Each one of those reports must now be maintained and filtered according to whatever subgroup of employees the report writer is looking for. Each of those reports is now serving a single purpose. Additionally, the logic loses a performance edge: it has to be run each time an individual report is invoked, which keeps the code from taking advantage of some of Oracle's built-in performance features (buffer cache, etc).

So, let's imagine removing that logic from the report level and giving the user a toggle in the datablock that indicates if they want current clerical employees or not. This is good because we can then use a modified query:

select pebempl_pidm
  from pebempl
 where pebempl_ecls_code = :ecls
   and pebempl_empl_status = :status;

This now gives the report writer greater flexibility, allowing them to choose different options with which to run a single report; a single report "template" can be used for any combination of status and employee class! This is quite a bit more flexible (as you can see); however, you're still lacking some of the performance benefits of server-side code, so the final step would be to move the logic to the database in the form of a stored function! In that case, the query in the DataBlock might look like this:

select pebempl_pidm
from pebempl
where f_include_employee(pebempl_pidm,'flag1','flag2') = 'Y';

In this case, the function f_include_employee() does all the work and simply returns a Y or N to let the system know whether to include the employee. This is extensible, reusable code that only has to be written once. Additionally, it is pre-compiled and stored in Oracle, so it takes full advantage of Oracle's built-in performance features! The code inside the function is completely up to the institution and can contain whatever logic you wish!
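As a starting point, here is a minimal sketch of what such a function might look like (the conditions inside are just the placeholder rules from our example; substitute your institution's own logic):

create or replace function f_include_employee (p_pidm   in number,
                                               p_ecls   in varchar2,
                                               p_status in varchar2)
  return varchar2
as
  v_count number;
begin
  -- placeholder rule: include the employee when a PEBEMPL record
  -- matches the requested employee class and status
  select count(*)
    into v_count
    from pebempl
   where pebempl_pidm = p_pidm
     and pebempl_ecls_code = p_ecls
     and pebempl_empl_status = p_status;

  if v_count > 0 then
    return 'Y';
  end if;
  return 'N';
end f_include_employee;

If you'd like examples of other functions that may be useful, let me know! Happy hunting!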

Wednesday, November 4, 2009

Elusive Data: Addresses and Phone Numbers

First of all, I must apologize again, as I have not updated this blog in quite a while. As it turns out, I'm not very good at regular writing... good to know! :) Anyway, on to today's topic.

Much of Banner's data is stored in a manner that can make it difficult to find the most up-to-date information. Good examples of this situation are addresses and phone numbers. It can be very difficult to find an individual's most current (let alone correct) address and phone number, especially when poor data standards are used for entering that information. So, first, one must put proper procedures in place to control accurate entry of contact information. To aid in this effort, here are several "audit queries" that will help identify common business problems in our address and phone number records.

Audit Your Data!


In my experience, address and phone errors fall into several different areas:
  1. Addresses that have a "To" date in the past, but are still marked as active.
  2. More than one active address of the same address type.
  3. The "To" date is NULL, but the address is inactive.
  4. The "From" date is NULL.
  5. More than one active telephone number for the same telephone type.
  6. Having an inactive phone number marked as primary.
  7. More than one primary telephone number per individual.

I'm sure many institutions will have other common "opportunities" when it comes to their contact information, but in any case, after those issues are identified, it simply becomes a matter of querying the database on a regular basis (be it weekly, monthly, etc) to locate those records that are "out of compliance". The following queries will aid in finding the issues mentioned above:


--past to dates for active addresses
select spraddr_pidm,
       count(*) address_count
  from spraddr
 where (spraddr_status_ind is null or spraddr_status_ind = 'A')
   and spraddr_to_date < sysdate
 group by spraddr_pidm;


--more than one active address
select spraddr_pidm,
       spraddr_atyp_code,
       count(*) address_count
  from spraddr
 where spraddr_status_ind is null or spraddr_status_ind = 'A'
 group by spraddr_pidm, spraddr_atyp_code
 having count(*) > 1;


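--inactive addresses with null to dates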
select spraddr_pidm,
       count(*) address_count
  from spraddr
 where spraddr_to_date is null
   and spraddr_status_ind <> 'A'
   and spraddr_status_ind is not null
 group by spraddr_pidm;


--null from date
select spraddr_pidm,
       count(*) address_count
  from spraddr
 where spraddr_from_date is null
 group by spraddr_pidm;


--multiple active phones per phone type
select sprtele_pidm,
       sprtele_tele_code,
       count(*) tele_count
  from sprtele
 where sprtele_status_ind is null
    or sprtele_status_ind = 'A'
 group by sprtele_pidm, sprtele_tele_code
 having count(*) > 1;


--individuals with inactive primary tele numbers
select sprtele_pidm,
       count(*) tele_count
  from sprtele
 where sprtele_status_ind is not null
   and sprtele_status_ind <> 'A'
   and sprtele_primary_ind = 'Y'
 group by sprtele_pidm;


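--more than one primary phone number per individual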
select sprtele_pidm,
       count(*) tele_count
  from sprtele
 where sprtele_primary_ind = 'Y'
 group by sprtele_pidm
 having count(*) > 1;


As you can see, the queries are all very simple, but provide powerful results. With the list of pidms generated by each query, one could build an automated audit process for addresses and phone numbers. This would obviously need to be modified to include the business rules at your institution, but the concept is the same. Additionally, there is an excellent product out there called Clean Address (by Runner Technologies) that is specifically designed to comb your addresses and verify their accuracy. It's moderately priced, but well worth it when you consider how much money is wasted on postage for correspondence that never reaches the intended recipient.

In any case, now that we've made a concerted effort to clean our data and put business processes in place to ensure the most consistent and accurate input possible, it's time to figure out how to query the contact information for an individual and get their most up-to-date information.

The info is IN, now how do we get it OUT?!


This is a difficult topic to cover in a broad sense because the approach you take depends completely on your goals. For example, if you're simply looking for the most current mailing address (typically atyp code MA), it's a relatively simple process (assuming you've cleaned things up as described above). At that point, the query simply looks for the active address for each entity with the atyp code equal to MA (see below)!


select spraddr_pidm,
       spraddr_street_line1,
       spraddr_street_line2,
       spraddr_city,
       spraddr_stat_code,
       spraddr_zip
  from spraddr
 where spraddr_atyp_code = 'MA'
   and (spraddr_status_ind is null
     or spraddr_status_ind = 'A');


Assuming you've checked to make sure each individual only has ONE active address per address type, this query will work like a charm! You only have to get creative (using from/to dates and sequence numbers) when this isn't the case! NOTE: the same idea holds for telephone numbers.

Unfortunately, we typically want to bring back an address based on a hierarchy. This can be a bit tricky to make happen, but here are a few ideas for a solution! Firstly, Banner provides a form called GTVSDAX in which you can set up institutional hierarchies for things like addresses and telephone numbers. Once that's done, querying the database using a hierarchy becomes trivial; check it out!


select spraddr_pidm,
       spraddr_street_line1,
       spraddr_street_line2,
       spraddr_city,
       spraddr_stat_code,
       spraddr_zip
  from spraddr
 where (spraddr_status_ind is null
     or spraddr_status_ind = 'A')
   and spraddr.rowid = f_get_address_rowid(spraddr_pidm,'ADMDDR','A',SYSDATE,1,'S','');


As you can see, it's simply a matter of using the predefined Banner function and hierarchy to find the correct address for each individual. Alas, some schools don't use or know about the GTVSDAX form and as such require more "technical" answers to this dilemma. I happened to work for one such school myself! While there, I developed an Oracle function that would allow me to get the correct address information for each person based on a hierarchy. Bear in mind that I wrote this a long while ago, so it's not my best work, but it should point you in the right direction!


create or replace FUNCTION F_GET_ADDR_INFO (V_PIDM NUMBER, V_ATYP1 VARCHAR2,
V_ATYP2 VARCHAR2, V_ATYP3 VARCHAR2, V_ATYP4 VARCHAR2, WHICH VARCHAR2) RETURN VARCHAR2 AS

 V_OUTPUT VARCHAR2(500);

BEGIN
  SELECT DECODE(WHICH,'ADDRESS',SPRADDR_STREET_LINE1 || ' ' || SPRADDR_STREET_LINE2 || ' ' || SPRADDR_STREET_LINE3,
                      'CSZ', SPRADDR_CITY || ' ' || SPRADDR_STAT_CODE || ', ' || SPRADDR_ZIP,
                      'ATYP',SPRADDR_ATYP_CODE) INTO V_OUTPUT
    FROM (SELECT *
            FROM SPRADDR
           WHERE SPRADDR_PIDM = V_PIDM
             AND SPRADDR_ATYP_CODE IN (V_ATYP1, V_ATYP2, V_ATYP3, V_ATYP4)
           -- honor the hierarchy: prefer ATYP1, then ATYP2, ATYP3, ATYP4
           ORDER BY DECODE(SPRADDR_ATYP_CODE, V_ATYP1, 1, V_ATYP2, 2, V_ATYP3, 3, 4))
   WHERE ROWNUM = 1;
  RETURN V_OUTPUT;
EXCEPTION
  -- return NULL when the person has no address of any requested type
  WHEN NO_DATA_FOUND THEN
    RETURN NULL;
END F_GET_ADDR_INFO;


For telephone numbers, we're typically looking for a specific telephone type rather than one type in a hierarchy. That being the case, you can use a "pivot" query to produce the desired results (see my article regarding pivot queries from last month)!
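For example, assuming your institution uses telephone types like CELL and HOME (check STVTELE for your actual codes), a sketch of such a pivot against SPRTELE might look like this:

select sprtele_pidm,
       --one column per telephone type; DECODE returns NULL for non-matching rows
       max(decode(sprtele_tele_code, 'CELL', sprtele_phone_area || sprtele_phone_number)) cell_phone,
       max(decode(sprtele_tele_code, 'HOME', sprtele_phone_area || sprtele_phone_number)) home_phone
  from sprtele
 where sprtele_status_ind is null
    or sprtele_status_ind = 'A'
 group by sprtele_pidm;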

Well, that's all for now. I appreciate your stopping by! Remember, as the old adage goes, "garbage in, garbage out", so always start your data search with an evaluation of your current data and business processes (when possible). This may be painful in the short term, but before long you'll be reaping vast rewards! Happy hunting!!!

Monday, September 7, 2009

Turning Rows into Columns: Pivot Queries

Moving normalized, row-based information into usable reports often requires the use of what is commonly referred to as a "pivot query" (referring to the "pivot" motion required when changing the data's orientation). For example, Banner stores student test information in a series of rows for each student:

PIDM  TEST  SCORE
1     S1    560
1     S2    650
1     S1    600

Unfortunately, this is typically not the way one would prefer to report this information. More often, we'd like to see each person's test scores in a single line with multiple columns for each test:

PIDM  S1   S2
1     600  650

The second example is much more accessible: it allows the removal of duplicated students and gives the reader a more meaningful way to view institutional data (in this case, test scores).

So, How Can this Be Done?!

Well, SQL provides all the tools necessary to make this happen. The two most important tools from our SQL tool belt that will be used for this particular issue will be the DECODE (or CASE) and MAX functions. Let us first take a look at the raw data:

Query 1 - Basic Test Data

Results 1 - Basic Test Data
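If you'd like to follow along in Banner, the raw rows above come from the SORTEST table, with something like this sketch:

select sortest_pidm,
       sortest_tesc_code,
       sortest_test_score
  from sortest;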

As you can see, test information is stored in a manner that is difficult to report, but it won't be for long! The next step is to switch the view of the information from row to columnar form. To do this, we will check the test code to create each column:

Query 2 - Perform the Pivot
Results 2 - Performing the Pivot
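In SQL, that pivot might be sketched like this (using the fictional test codes S1 and S2 from above):

select sortest_pidm,
       --one column per test code; DECODE returns NULL for non-matching rows
       decode(sortest_tesc_code, 'S1', sortest_test_score) s1,
       decode(sortest_tesc_code, 'S2', sortest_test_score) s2
  from sortest;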

The DECODE function, whose syntax is DECODE(column_name,'X','A','Y','B','Z','C','D'), operates as follows: it checks the column_name value to see if it is equal to 'X'. If so, it replaces itself with the value 'A'. If not, it checks whether it is equal to 'Y'; if so, it replaces itself with 'B', and so on until we get to 'D'. As you probably noticed, 'D' is the odd man out; it has no corresponding return value. This is referred to as the "default": if column_name doesn't equal 'X', 'Y', or 'Z', the DECODE statement returns 'D' by default.

As you can see in the above example, this only got us halfway there; we still have more than one row per individual. Also, with test codes, it's possible to have more than one score with the same code. This problem is easily solved with the MAX() aggregate function. If we surround each column (other than the PIDM) with the MAX function and group by PIDM, the query will return only the highest score for each test, leaving one row per individual. This is exactly what we want! See the following example:

Query 3 - Finalize the Pivot
Output 3 - Final Pivot
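Again as a sketch, the finished pivot simply wraps each test column in MAX and groups by PIDM:

select sortest_pidm,
       max(decode(sortest_tesc_code, 'S1', sortest_test_score)) s1,
       max(decode(sortest_tesc_code, 'S2', sortest_test_score)) s2
  from sortest
 group by sortest_pidm;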
As we expected, this final output is much easier to read than the first (row-style) output. However, this technique does have its drawbacks and limitations:

  1. You have to know in advance what you're going to key off to perform the pivot - in this case, we used the test codes.
  2. You have to make a column for every value you want to view as a column. In the sketches above, we only pivoted a couple of test codes, but what if you wanted to see every test a student has taken?
Thank you for checking out my blog. Be sure to check back next week for "Race and Ethnicity"! Happy hunting!

Sunday, September 6, 2009

Writing Series: Banner Tips of the Month

Hello everyone. I apologize for the lack of writing over the last couple of months. To help remedy the situation, I will be establishing a monthly series: Banner Tips of the Month. Every first Monday, I will post a new entry in the series, so come back soon! Happy hunting!!!

Thursday, May 14, 2009

Report Query Development Cookbook

Report development, independent of the tool, can be a very daunting task. The following pages are designed to give you a step-by-step process for developing report queries that (in my experience) has led to fewer errors, a more efficient development process, and more accurate output. Before we get into it, though, I apologize for the lack of material over the past couple of months; starting a blog is one thing, keeping it up is quite another. I hope that you find the information in this post useful! Happy hunting!

The Example


The following request is a recent request I worked on with a client. It is moderately complex but still lends itself well to the topic at hand. The request was as follows:
Financial Aid Applicant Profile: All financial aid applicants for a selected term and aid year. Please include the applicant's name, ID, ISIR received date, verification message, EFC, college, student level, package complete date, dependency status, campus, offered amount, accepted amount, paid amount, unmet need, gender, ethnicity, total credits, and overall GPA.
As you've probably already surmised, this query was not trivial. However, when approached using the framework in the following pages, it becomes a much less daunting endeavor!

Step 1 - The Population


It is imperative that we always begin any query with the population. Simply (hah! yeah right!) identify a list of PIDMs that exactly corresponds to the group of individuals you're trying to report. In the case above, this means anyone who has applied for financial aid during a given aid year. The following query will gather the group in question:

select spriden_id ID,
spriden_last_name LastName,
spriden_first_name FirstName
from rcrapp1, spriden
where spriden_pidm = rcrapp1_pidm
and spriden_change_ind is null
and rcrapp1_aidy_code = :aidy
and rcrapp1_infc_code = 'ISIR'
and rcrapp1_curr_rec_ind = 'Y';

As you can see, the query is very simple, although it's missing all the attributes (college, level, etc) that make this report useful. However, we now have the foundation against which all additions can be compared. Additionally, we can "spot check" a couple of IDs to make sure that this population is indeed correct (it may even be a good idea to contact the requestor and have them "sniff test" the number in the group before proceeding further). The last thing to do before adding information is to count the number of records returned. Often, this can be done using built-in functionality in whatever tool you're using to write your query. For the purposes of this example, let's assume that the query above returned 1200 applicants for 0809. Proceed to step 2!

Step 2 - Adding Attributes


After completing step one, the rest of the report becomes a matter of "bolting on" additional information. This should be done very methodically and should start with requested attributes that can be found in the table(s) already being used to gather the population. In this case, we're already using RCRAPP1 and SPRIDEN, so we should go through the request and look for any other attributes that come from either of those two tables. In this case, there are a couple of attributes that can be added from the RCRAPP1 table:

  • ISIR Received Date (RCRAPP1_RCPT_DATE)

  • and Verification Message (RCRAPP1_VERIFICATION_MSG)


Because the information requested already exists in the table(s) used, there is no need for additional join conditions. This makes adding the data itself trivial:

select spriden_id ID,
spriden_last_name LastName,
spriden_first_name FirstName,
rcrapp1_rcpt_date ISIR_Recd_Date,
rcrapp1_verification_msg Ver_Msg
from rcrapp1, spriden
where spriden_pidm = rcrapp1_pidm
and spriden_change_ind is null
and rcrapp1_aidy_code = :aidy
and rcrapp1_infc_code = 'ISIR'
and rcrapp1_curr_rec_ind = 'Y';

Notice, there have been no changes to the WHERE clause of the statement and no new tables have been joined to the query, so the number of individuals selected should still be the same. SHOULD BE THE SAME. At this point, it's time to check: rerun the query and make sure that your record count has not changed. This process will be repeated after each new set of attributes is added. Now, let's see what we have left:

  • EFC (RCRAPP4_SAR_EFC)

  • college (SGBSTDN_COLL_CODE)

  • student level (SGBSTDN_LEVL_CODE)

  • package complete date (RORSTAT_PCKG_COMP_DATE)

  • dependency status (RCRAPP2_C_DEPEND_STATUS)

  • campus (SGBSTDN_CAMP_CODE)

  • offered amount (RPRAWRD_OFFER_AMT)

  • accepted amount (RPRAWRD_ACCEPT_AMT)

  • paid amount (RPRAWRD_PAID_AMT)

  • unmet need (RORSTAT_UNMET_NEED)

  • gender (SPBPERS_SEX)

  • ethnicity (SPBPERS_ETHN_CODE)

  • total credits (SUM(SFRSTCR_CREDIT_HR))

  • and overall GPA (SHRLGPA_GPA)


Notice in the above list that I also tabulated the fields in which the data in question can be found. It may be beneficial to do this yourself, as that task can often be a difficult one. Having the list ready and waiting will make things much smoother when it comes to writing the actual SQL.
At this point, I would typically pick out what I feel are the easiest tables to add and start with them. In most cases, tables with a "B" in the third character are a good place to start. However, SGBSTDN is an exception to that rule (go figure) because it is EFFECTIVE DATED. Have no fear though, SPBPERS is very easy to add because the PIDM is the primary key, meaning that only one instance of any given PIDM (student) can be in the table at a time. That being the case, the only thing you have to worry about when adding SPBPERS is joining SPBPERS_PIDM to another PIDM already in the query. In this case, we'll join to SPRIDEN as shown below:

select spriden_id ID,
spriden_last_name LastName,
spriden_first_name FirstName,
rcrapp1_rcpt_date ISIR_Recd_Date,
rcrapp1_verification_msg Ver_Msg,
spbpers_sex Gender,
spbpers_ethn_code Ethnicity

from rcrapp1, spriden, spbpers
where spriden_pidm = rcrapp1_pidm
and spbpers_pidm = spriden_pidm
and spriden_change_ind is null
and rcrapp1_aidy_code = :aidy
and rcrapp1_infc_code = 'ISIR'
and rcrapp1_curr_rec_ind = 'Y';

Now, as we did before, we need to run the query again and make sure that the number of records returned is still 1200. Assuming that's the case, it's time to move on to our next set of attributes. This time, I'm going to bite the bullet and add SGBSTDN. See the query below (the additions are the SGBSTDN columns and the join conditions on SGBSTDN).

select spriden_id ID,
spriden_last_name LastName,
spriden_first_name FirstName,
rcrapp1_rcpt_date ISIR_Recd_Date,
rcrapp1_verification_msg Ver_Msg,
spbpers_sex Gender,
spbpers_ethn_code Ethnicity,
sgbstdn_coll_code College,
sgbstdn_levl_code StudentLevel,
sgbstdn_camp_code Campus

from rcrapp1, spriden, spbpers, sgbstdn a
where spriden_pidm = rcrapp1_pidm
and spbpers_pidm = spriden_pidm
and spriden_pidm = a.sgbstdn_pidm
and a.sgbstdn_term_code_eff = (select max(b.sgbstdn_term_code_eff)
from sgbstdn b
where a.sgbstdn_pidm = b.sgbstdn_pidm)

and spriden_change_ind is null
and rcrapp1_aidy_code = :aidy
and rcrapp1_infc_code = 'ISIR'
and rcrapp1_curr_rec_ind = 'Y';

The order and tables used for the join conditions are not terribly critical unless you're trying to increase the performance of your query (in which case, there are certain rules that can be applied). That being the case, as shown above, I have joined SGBSTDN_PIDM to SPRIDEN_PIDM. Additionally, note the correlated subquery used to find the student's most current SGBSTDN record. Alright, let's take stock of where we are at this point:

  • EFC (RCRAPP4_SAR_EFC)

  • package complete date (RORSTAT_PCKG_COMP_DATE)

  • dependency status (RCRAPP2_C_DEPEND_STATUS)

  • offered amount (RPRAWRD_OFFER_AMT)

  • accepted amount (RPRAWRD_ACCEPT_AMT)

  • paid amount (RPRAWRD_PAID_AMT)

  • unmet need (RORSTAT_UNMET_NEED)

  • total credits (SUM(SFRSTCR_CREDIT_HR))

  • and overall GPA (SHRLGPA_GPA)


To help get this article finished (for Pete's sake!!!), I'm going to add everything except the registered credits in the next step. In practice, however, the tables should be added one at a time and tested at each step to make sure that the record set has not changed. In any case, the query with those fields can be seen below:

select spriden_id ID,
spriden_last_name LastName,
spriden_first_name FirstName,
rcrapp1_rcpt_date ISIR_Recd_Date,
rcrapp1_verification_msg Ver_Msg,
spbpers_sex Gender,
spbpers_ethn_code Ethnicity,
sgbstdn_coll_code College,
sgbstdn_levl_code StudentLevel,
sgbstdn_camp_code Campus,
rcrapp4_sar_efc EFC,
rorstat_pckg_comp_date PackgCompDate,
rcrapp2_c_depend_status DependencyStatus,
sum(rprawrd_offer_amt) Offered,
sum(rprawrd_accept_amt) Accepted,
sum(rprawrd_paid_amt) Paid,
rorstat_unmet_need UnmetNeed,
shrlgpa_gpa GPA

from rcrapp1, spriden, spbpers, sgbstdn a,
rcrapp4, rorstat, rcrapp2, shrlgpa,
rprawrd
where spriden_pidm = rcrapp1_pidm
and spbpers_pidm = spriden_pidm
and spriden_pidm = a.sgbstdn_pidm
and spriden_pidm = shrlgpa_pidm
and sgbstdn_levl_code = shrlgpa_levl_code
and spriden_pidm = rorstat_pidm
and rorstat_aidy_code = rcrapp1_aidy_code
and rcrapp1_pidm = rcrapp2_pidm
and rcrapp1_aidy_code = rcrapp2_aidy_code
and rcrapp1_infc_code = rcrapp2_infc_code
and rcrapp1_pidm = rcrapp4_pidm
and rcrapp1_aidy_code = rcrapp4_aidy_code
and rcrapp1_infc_code = rcrapp4_infc_code
and rorstat_pidm = rprawrd_pidm

and rorstat_aidy_code = rprawrd_aidy_code
and a.sgbstdn_term_code_eff = (select max(b.sgbstdn_term_code_eff)
from sgbstdn b
where a.sgbstdn_pidm = b.sgbstdn_pidm)
and spriden_change_ind is null
and rcrapp1_aidy_code = :aidy
and rcrapp1_infc_code = 'ISIR'
and rcrapp1_curr_rec_ind = 'Y'
group by spriden_id, spriden_last_name, spriden_first_name,
rcrapp1_rcpt_date, rcrapp1_verification_msg, spbpers_sex,
spbpers_ethn_code, sgbstdn_coll_code, sgbstdn_levl_code,
sgbstdn_camp_code, rcrapp4_sar_efc, rorstat_pckg_comp_date,
rcrapp2_c_depend_status, rorstat_unmet_need, shrlgpa_gpa;

Finally, the only thing left is to calculate the total credits for each student. This can be done in two different ways:

  • Join to SFRSTCR and sum(credits)

  • Write a SCALAR SUBQUERY to do the calculation


In cases where only one or two fields are required from a new table, I prefer the second method (the SCALAR SUBQUERY). It acts like an outer join (returning NULL when no rows match) and performs better in many cases.
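As a sketch, the scalar subquery version might look like this (:term is an assumed parameter for the term in question, and any registration status rules are omitted):

select spriden_id ID,
       spriden_last_name LastName,
       --scalar subquery: sums registered credits per student, NULL if none
       (select sum(sfrstcr_credit_hr)
          from sfrstcr
         where sfrstcr_pidm = spriden_pidm
           and sfrstcr_term_code = :term) TotalCredits
  from spriden
 where spriden_change_ind is null;

Happy hunting!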

Wednesday, March 11, 2009

To Join or not to Join, that is the Question!

One of the most difficult aspects of SQL queries is identifying and accurately expressing the relationships between database objects. Let me try that again, in English: when querying more than one table at a time (in a relational database), the database needs the developer (you) to identify the relationship between the objects. That identification is difficult at times and leads to many of the problems with report results. In this post, I talk about relational databases, Cartesian products, inner joins, and outer joins. Before I get into the tips around joins, let's get into a little more depth regarding the theory behind join requirements.

Relational Database

A relational database is just like it sounds: a database whose contents are related. How they are related is the crux of the join issue. Take the following example:

SPRIDEN and SPBPERS are two Banner tables containing general person information. SPRIDEN contains ID and Name data, while SPBPERS contains demographic information (such as SSN, Birth Date, Citizenship, Ethnicity, etc). Referencing the first image below, we can see that the data stored in the tables is very organized, but how does the system know which SSN and Gender to associate with Mary Johnson?


As we can see, the individual's PIDM exists in both tables. If we connect (JOIN) the two, we will establish the required relationship, indicating to the system how the two records (and by extension, the two tables) are related. Clear as mud, right? Hopefully the image below will help explain:


I don't want to spend too much time on theory, so I'm going to keep this brief. Hopefully you now have an idea of why joins are required. The next step is to understand that there are a few different kinds of joins. For more information on the types, how and when they're used, and their syntax, click here. One thing I do want to discuss before moving on to more practical matters is the concept of a cartesian join, or cartesian product.

Cartesian Join/Product

You have created a cartesian join when you query more than one object and DO NOT specify the join condition. You would know you've done that by several signs:

  1. The query may bring down the database
  2. If the results do get returned, there are likely duplicate records
  3. The combination of information from the queried objects will make no sense; Mary Johnson's name will appear with other people's demographic information.
  4. Your query contains more than one object in the FROM clause, with no reference to their relationship in the WHERE clause.

As you probably have guessed, cartesian joins are VERY bad. The problem is that Oracle assumes that each row in the first object should be connected to each row in the second. As an example, reference the two tables below. Assuming that they display all the data in each object, the following query would yield the results below.



select ID, Lname, Gender
  from object1, object2;

As we would expect, we have 27 rows of results (3 from Object2 times 9 from Object1). Now, extrapolate; imagine two tables that contain hundreds of thousands of rows!

200,000 * 300,000 = 60,000,000,000 records returned...ouch!

It has the potential to crash your system!!! What we want to do is specify the join condition as demonstrated above, but before we do that, we need to talk about the different kinds of joins.




Inner Joins

An inner join only returns the "SELECTED" information if both sides of the join condition match. Observe the following example:

SELECT spriden_id, spbpers_ssn
FROM spriden, spbpers
WHERE spriden_pidm = spbpers_pidm;
*NOTE - the case is not important. I use UPPER case here to highlight the "clauses" of the select statement.

In the above example, the ID and SSN are only returned where SPRIDEN_PIDM and SPBPERS_PIDM are the same. The implication revolves around the question, "What if someone doesn't have a record in SPBPERS?". In that case, the individual would be REMOVED from the query results!


To help deal with this issue, we can use what's called an "outer join".

Outer Joins

There are a few different types of outer joins, but for the purposes of this discussion, we really only need to consider the reason you would use an outer vs. an inner join and what the possible repercussions might be. For more information on outer joins (syntax, etc), click here. First off, why would you use an outer join? The most important reason is to combat the issue mentioned above under "Inner Joins": if both sides of the join condition do not equate, the record is dropped. In many cases (addresses, email, telephone, and others), you don't want to lose the record just because the individual in question doesn't have a record in a certain area. In those cases we use an outer join, as in the script below:

SELECT spriden_id, spbpers_ssn
FROM spriden, spbpers
WHERE spriden_pidm = spbpers_pidm(+);

Notice the "(+)" at the end of SPBPERS_PIDM in the join condition. This indicates that Oracle should return ALL ROWS from spriden as well as the rows from SPBPERS where the PIDM matches. So, again looking at our sample SPRIDEN and SPBPERS tables below, we would get the following results from this query:


One word of caution: if you include a condition on a table that is outer joined, you MUST outer join on that condition as well!!! For example, let's say we only wanted to look at GENDER = 'M'. Normally, we would use the following to do so:

SELECT spriden_id, spbpers_ssn
FROM spriden, spbpers
WHERE spriden_pidm = spbpers_pidm(+)
and spbpers_sex = 'M';

However, because many individuals have no record in SPBPERS, this query would return ONLY one row: the row corresponding to the fellow with an 'M' for gender. As such, we must outer join on the condition as well as the join if we want to include folks with no SPBPERS record:

SELECT spriden_id, spbpers_ssn
FROM spriden, spbpers
WHERE spriden_pidm = spbpers_pidm(+)
and spbpers_sex(+) = 'M';

Notice the "(+)" by the spbpers_sex condition as well as the join condition! So as a general rule: any conditions against a table that is outer joined, must be outer join conditions!

Now that we have an idea of what an outer join is used for, let's discuss its implications. Basically, using an outer join introduces overhead to your query. That being the case, it's best to stick with inner joins whenever possible. Other than that, good luck! I hope this has given you a little insight into join theory, especially with regard to Banner and Oracle. Please post questions as needed! Until next time, happy hunting!

Friday, February 27, 2009

Effective Dating and Sub Queries

Effective dating (no, not the romantic kind!) is one of the most common topics of discussion when reporting against Banner (Baseline, especially; the ODS and EDW go a long way toward removing effective date and term code logic requirements). Most people are aware of the issue, but do not fully understand why things are the way they are, and even when the "why" is understood, "how" to go about solving the issue can sometimes be very confusing. There are two main examples I like to use when discussing this subject: job history and student records. Job history deals with "effective dates" (NBRJOBS_EFFECTIVE_DATE) while the student base record deals with "effective term" (SGBSTDN_TERM_CODE_EFF). The following entry discusses both "why" this situation is important and "how" it must be dealt with.

Why is this such a BIG deal!?!?


The most important thing to keep in mind is the two dimensional nature of the way data is stored in Banner (or any database, really). You can think of the Banner database as a really complex and robust series of spreadsheets that hold all your Banner data. Jobs and student information are no different in terms of storage; the difference is in the fact that both store historical information. Let's take a look at a relatively simple example. Assume for a second that John Doe has only one job currently, but has moved around within the school and has been promoted a couple of times. The way his data is stored in the system would look something like this:


Figure 1 - NBRJOBS

As you can see, all job history is stored in the Banner system. If one wishes to look at the most current job information for John, one MUST find that record associated with his most current (or MAX) effective date. As we can see, that would correspond to the last record in the table, for position B2354 with a salary of $45,000.

Now let's take a look at the student side (SGBSTDN). Even though we are now dealing with effective terms instead of effective dates, the concept is still the same (and can be applied to any situation in which you have to figure out the most current record for a given set of records). As shown in Figure 2, which is very similar to Figure 1, the student information corresponding to John Doe in SGBSTDN can change at random terms throughout his experience in school. If his major or any other attribute (level, type, etc) changes during his tenure, a new record is added to SGBSTDN reflecting the change with a new effective term.


Figure 2 - SGBSTDN

As shown, the problem is the same. Mr. Doe's status changed from new freshman (N in my simply coded universe) to continuing (C) after his first term. Then, after much thought, he changed his major from psychology (PSYCH) to sociology (SOC) in fall 2008 (200910). NOTE: all codes are fictional and do not reflect real Banner data. For your school's codes, reference the validation tables associated with each data element. As with the job records, we would simply need to isolate John's most current record by looking for the record associated with his most current (or MAX) effective term. In either case, we must do this through the use of a correlated subquery.

Alright, alright, now tell me HOW!


Before we get into the details of how a subquery can be used to facilitate our needs, I'd like to briefly discuss why a subquery is required at all. This question of "why subquery" is a constant source of confusion among report writers everywhere. Typically, what I hear is: let's just include the MAX effective date or term in our query, as in the following script:
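Something like this sketch (I'll use the PIDM in place of ID and name to keep it short, and NBRJOBS_ANN_SALARY as the salary column); note that it does NOT do what we want:

select nbrjobs_pidm,
       nbrjobs_posn,
       nbrjobs_suff,
       nbrjobs_ann_salary,
       max(nbrjobs_effective_date) last_eff_date
  from nbrjobs
 group by nbrjobs_pidm, nbrjobs_posn, nbrjobs_suff, nbrjobs_ann_salary;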



Whenever an aggregate function (such as COUNT, AVG, MAX, or MIN) is used in a SQL query, all fields NOT being aggregated must appear in the GROUP BY clause. What that translates into (in English) is that the aggregation occurs around all other fields being selected. So, in the case of the above script, it's looking for the maximum effective date for each combination of PIDM, position, suffix, and salary. Referencing Figure 1, we can see that this would be fine for the first position, because it would isolate the terminated record (all selected fields except the date are the same), indicating the most current record for that position. However, for the currently active position, it would return a record for each salary (because it's looking at the maximum effective date for each different salary!!!). The crux of the issue is the fact that Oracle is relatively "dumb"; it simply takes the statement you write and returns the results according to pre-established rules. It cannot "guess" what we are trying to do. That being the case, it looks at each row in turn. For this reason, we must use a subquery to match the effective date in the current row to the MAX effective date for the individual, position, and suffix. Now, on to the solution!!!

A subquery is simply a query within a query. A correlated subquery, as we will see shortly, is a subquery in the WHERE clause that usually references the same table more than once.
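Here is a sketch of the modified query (again simplified to NBRJOBS alone, with NBRJOBS_STATUS assumed as the status column):

select nbrjobs_pidm,
       nbrjobs_posn,
       nbrjobs_suff,
       nbrjobs_ann_salary,
       nbrjobs_status,
       nbrjobs_effective_date
  from nbrjobs
       -- keep only the most current record per PIDM, POSITION, and SUFFIX
 where nbrjobs_effective_date = (select max(table2.nbrjobs_effective_date)
                                   from nbrjobs table2
                                  where table2.nbrjobs_pidm = nbrjobs.nbrjobs_pidm
                                    and table2.nbrjobs_posn = nbrjobs.nbrjobs_posn
                                    and table2.nbrjobs_suff = nbrjobs.nbrjobs_suff);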



There are several things that need to be pointed out about this modified query. First, take note of the subquery; it is the key that tells the system we want to keep only the most current record for each PIDM, POSITION, and SUFFIX combination. Hence, as the results show, the query returns both the most recent terminated AND active records. Additionally, note that the subquery contains only one table (NBRJOBS with the TABLE2 alias). The fact that two "instances" of the same table reference each other is what makes the subquery "correlated". An alias for at least one of the two instances is required; otherwise the system has no way to tell them apart. Now that we have restricted the result set, we can eliminate the individual's terminated positions by simply adding that condition to the MAIN QUERY'S WHERE CLAUSE (not the subquery) as follows:
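And the same sketch with terminated positions excluded ('T' is my fictional terminated status code):

select nbrjobs_pidm,
       nbrjobs_posn,
       nbrjobs_suff,
       nbrjobs_ann_salary,
       nbrjobs_status
  from nbrjobs
 where nbrjobs_status <> 'T' -- the new condition goes in the MAIN query
   and nbrjobs_effective_date = (select max(table2.nbrjobs_effective_date)
                                   from nbrjobs table2
                                  where table2.nbrjobs_pidm = nbrjobs.nbrjobs_pidm
                                    and table2.nbrjobs_posn = nbrjobs.nbrjobs_posn
                                    and table2.nbrjobs_suff = nbrjobs.nbrjobs_suff);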



Alright, sit back and take a deep breath; I know a whole new world of SQL querying has been opened to you :). Now it's time to put it to practice. Using the script below, test yourself by running it on your system to get a list of student pidms who have more than one record on SGBSTDN. Then, write a script that will get ALL of one student's records based on one of the PIDMs in the list using the techniques described above. If you need help or want to compare answers, you can find my solution to this vexing problem in the first comment on this blog entry, but don't be tempted to cheat!!! :)

SELECT SGBSTDN_PIDM, COUNT(*)
FROM SGBSTDN
GROUP BY SGBSTDN_PIDM
HAVING COUNT(*) > 1
ORDER BY SGBSTDN_PIDM;

I hope this has been helpful, let me know what you think!!!

Monday, February 23, 2009

Getting Started in Banner Reporting

The first thing one must understand before even thinking about building a report is the underlying data structure from which one must extract the information required. In Baseline Banner, there are over 4,000 objects that store various pieces of data. The Operational Data Store (ODS) and Enterprise Data Warehouse (EDW), while they have far fewer objects, contain their own set of storage objects, all of which one must understand before being able to produce a valid and accurate report. So, with close to 5,000 database objects to choose from (assuming one has access to both the ODS and EDW), how does one figure out where to get the data they need? Well, the goal of this posting is to give you a good idea of where to start.

Deciding where to get your data.


As mentioned above, SunGard provides three areas from which you can draw information for reporting: Baseline, the ODS, and the EDW. Below, I have included a blurb about why you might choose to use one over another. Happy hunting!

Baseline is the database architecture that everyone typically thinks of when they say "Banner". It is the thing that stores the hundreds (if not thousands) of transactions that occur daily at your institution and what is typically used for reporting (at least until recently). The key question when deciding to use Baseline vs. the ODS for standard reporting is whether or not the report needs to contain real-time data. The ODS is cloned nightly so any reports drawing data from it are inherently a day old. For things like up-to-the-minute enrollment statistics and current financial statements, Baseline may be required.

The Operational Data Store (ODS) is SunGard's answer to the difficulties that arise when using Baseline for reporting. It is a series of database views that "flatten" the information stored in Banner Baseline. This makes getting a report of logically related information much easier as one can get all the required information from one reporting view instead of having to grab disparate pieces of data from multiple sources (in the case of Baseline, database tables).

The Enterprise Data Warehouse (EDW) is SunGard's solution for analytical reporting. There is a movement in business intelligence circles toward more dynamic, analytical database tools that can be used for real decision making. This new area of reporting has requirements that are completely different from "standard" reporting: aggregated information, data organized in ways that make it easier to analyze, and most importantly, historical data (used for trend analysis).

Source ID'd, now for the detail!


So, you've figured out the source for your report. Now you have to find the individual objects (tables and/or views), and even the columns within each object, that contain the data you need. As with the section above, this one is organized by source, so if you're looking for more information on reporting against Baseline, click here; the ODS, click here; and the EDW, click here.

Baseline Detail


SunGard uses a naming convention for Baseline objects that is not very useful EXCEPT in the following ways:
  1. Object names are always 7 characters long
  2. The first letter corresponds to the module

    • S = Saturn (Student Related Information)
    • G = General (Information Common Across Modules)
    • F = Finance
    • R = Financial Aid
    • T = Student Accounts
    • N = Position Control (HR Information)
    • P = Payroll
    • A = Advancement (Alumni and Development)

  3. The second and third letters, if "TV", identify a validation table. So, if you're looking for the validation table for academic term (student information), the first three letters of the object will be "STV" (S = Saturn, TV = Validation Table).
  4. The last four characters are a little harder to nail down, but the majority of the time they indicate the type of data stored. For example, the last four characters for the validation table for academic term (as discussed above) are "TERM". Put that together with #3 and you get STVTERM. Likewise, the validation table for student major is STVMAJR, and for financial accounts, FTVACCT.
  5. Another little nugget of information is in regards to the third character. If it's an R, that means the object has "repeating" records, i.e. there is more than one record per person or other key data element. For example, SFRSTCR is the repeating table for course registration information (S = Student, R = Repeating, STCR = STudent Course Registration) and contains a record for each course a student is taking (the student is repeated). On the other hand, if the third character is a B, it's a base table and will contain NO repeating rows for the key data element. NBBPOSN is the base table for position information and contains only one record per position and suffix combination. Student, Employee, and Advancement each have a base table that one must be in to be considered the respective entity type:

    • APBCONS - Contains a record for EVERY constituent
    • PEBEMPL - Contains a record for EVERY employee
    • SGBSTDN - Contains a record for EVERY student

    If an individual does not have a record in SGBSTDN, they are NOT considered a student, etc.
So, with the above information, it is my hope that you will have the tools you need to get started reporting against Banner Baseline tables. Additionally, don't forget that the Banner forms can give you hints regarding the source of the data they contain. To find those hints, click into a data field of interest, click "Help", then "Dynamic Help Query". The window that appears contains the TABLE and the FIELD in which that data element is stored. There is a good chance that related information on the same form resides in the same table. Good luck!!!

ODS Detail


The key to the ODS is the metadata provided by SunGard during the purchase process. To locate the documents associated with your institution's metadata, ask your Banner administrator(s). Depending on your starting point, there are a couple different ways to go about navigating the documents contained in the ODS metadata files:

  1. If you know the source in Baseline, use the "source reports" to find its location in the ODS:

    • On the main page, choose the module that contains the data from the links on the left side.
    • Scroll to the very bottom of the subsequent page where it says "Source Reports".
    • In the list of source reports, locate the name of the table in which the data element in question is stored in baseline and click its link.

  2. If you don't have the starting point specified above, start in the ODS Concept Diagrams provided in the ODS metadata:

    • Starting on the main page of the metadata, click the link to the concept diagrams at the bottom center of the page.
    • Using the menu on the left, navigate through the different functional modules and select areas related to the data for which you are looking.


EDW Detail


As I mentioned above, the EDW is built with a completely different reporting methodology in mind. As such, it is organized in what are referred to as "star schemas". These schemas are designed to make reporting aggregated, measured data easier. As shown in the article linked above, star schemas are organized in what are known as "fact" and "dimension" tables. The fact tables contain the "measures" while the dimensions contain the "buckets" by which you wish to measure the "measures":). Clear as mud, I know. So, what can you do with this information? Well, first of all, you should only be using the EDW if you are trying to report measured data. If that's the case, there is a series of metadata documents (very similar to the ODS metadata) that will help in figuring out where to get the data you require.