Friday, February 27, 2009

Effective Dating and Sub Queries

Effective dating (no, not the romantic kind!) is one of the most common topics of discussion when reporting against Banner (Baseline, especially: the ODS and EDW go a long way to removing effective date and term code logic requirements). Most people are aware of the issue, but do not fully understand why things are the way they are and even if the "why" is understood, "how"
to go about solving the issue can sometimes be very confusing. There are two main examples I like to use when discussing this subject; job history and student records. Job history deals with "effective dates" (NBRJOBS_EFFECTIVE_DATE) while the
student base record deals with "effective term" (SGBSTDN_TERM_CODE_EFF). The following entry discusses both questions of "why" this situation is important and "how" it must be dealt with.

Why is this such a BIG deal!?!?


The most important thing to keep in mind is the two dimensional nature of the way data is stored in Banner (or any database, really). You can think of the Banner database as a really complex and robust series of spreadsheets that hold all your Banner data. Jobs and student information are no different in terms of storage; the difference is in the fact that both store historical information. Let's
take a look at a relatively simple example. Assume for a second that John Doe has only one job currently, but has moved around within the school and has been promoted a couple of times. The way his data is stored in the system would look something like this:


Figure 1 - NBRJOBS

As you can see, all job history is stored in the Banner system. If one wishes to look at the most current job information for John, one MUST find that record associated with his most current (or MAX) effective date. As we can see, that would correspond to the last record in the table, for position B2354 with a salary of $45,000.

Now let's take a look at the student side (SGBSTDN). Even though we are now dealing with effective terms, instead of effective dates, the concept is still the same (and can be applied to any situation in which you're dealing with having to figure out the most current record for a given set of records). As shown in Figure 2 which is very similar to Figure 1, the student information corresponding to John Doe in SGBSTDN can change a random terms throughout their experience in school. If their major or any other attribute (level, type, etc) changes during their tenure, a new record is added to SGBSTDN reflecting the change with a new effective term.


Figure 2 - SGBSTDN

As shown, the problem is the same. Mr. Doe's status changed from new freshmen (N in my simply coded universe) to continuing (C) after his first term. Then, after much thought, he changed his major from psychology (PSYCH) to sociology (SOC) in fall, 2008 (200910). NOTE: all codes are fictional and do not reflect real Banner data. For your school's codes, reference the validation tables associated with each data element. As with the job records, we would simply need to make sure to isolate John's most current record by looking for the record associated with his most current (or MAX) effective term. In either case, we must do this through the use of a correlated subquery.

Alright, alright, now tell me HOW!


Before we get into the details of how a subquery can be used to facilitate our needs, I'd like to briefly discuss why a subquery is required. This question of "why subquery" is a constant source of confusion among report writers everywhere. Typically, what I hear is, let's just include the MAX effective date or term in our query as in the following script:



Whenever an aggregate function (such as COUNT, AVG, MAX, or MIN) is used in a SQL query, all fields NOT being aggregated must exist in the where clause. What that translates into (in english) is that the aggregation occurs around all other fields being selected. So, in the case of the above script, it's looking for the maximum effective date for each combination of ID, Name, Position, Suffix, and Salary. Referencing Figure 1, we can see that this would be fine for the first position, because it would isolate around the terminated record (all selected fields except the date are the same), indicating the most current record for that position. However, for their currently active position, it would return a record for each salary (because it's looking at the maximum effective date for each different salary!!!). The crux of the issue, is the fact that Oracle is relatively "dumb", it simply takes the statement you write and returns the results according to pre-established rules. It cannot "guess" what we are trying to do. That being the case, it looks at each row in turn. For this reason, we must use a sub query to match the effective date in the current row to the MAX effective date for the individual, position, and suffix. Now, on to the solution!!!

A subquery is simply a query within a query. A correlated subquery, as we will see shortly, is a query in the WHERE CLAUSE, usually referencing the same table mroe than once.



There are several things that need to be pointed out about this modified query. First, take note of the subquery; it is the key that tells the system we want to keep only the most current record for each PIDM, POSITION, and SUFFIX combination. Hence, as the results show, the query shows both the most recent terminated AND active records. Additionally, note that the subquery contains only one table (NBRJOBS with the TABLE2 alias). The fact that two "instances" of the same table are referencing each other is what makes the subquery "correlated". An alias for at least one of the two tables is required, otherwise the system has no way to tell them apart. Now that we have excluded the result set, we can eliminate the individual's terminated positions by simply adding that condition to the MAIN QUERY'S WHERE CLAUSE (not the subquery) as follows:



Alright, sit back and take a deep breath, I know a whole new world of SQL querying has been opened to you:). Now, it's time to put it to practice. Using the script below, test yourself by running it on your system to get a list of student pidms who have more than one record on SGBSTDN. Then, write a script that will get ALL of one student's records based on one of the PIDMs in the list using the techniques described above. If you need help or want to compare answers, you can get my solution to this vexing problem in the first comment on this blog entry, but don't
be tempted to cheat!!!:)

SELECT SGBSTDN_PIDM, COUNT(*)
FROM SGBSTDN
HAVING COUNT(*) > 1
ORDER BY SPRIDEN_PIDM

I hope this has been helpful, let me know what you think!!!

Monday, February 23, 2009

Getting Started in Banner Reporting

The first thing one must understand before even thinking about building a report is the underlying data structure from which one must extract the information required. In Baseline Banner, there are over 4,000 objects that store various pieces of data. The Operational Data Store (ODS) and Enterprise Data Warehouse (EDW), while they have far fewer objects, contain their own set of storage objects, all of which one must understand before being able to produce a valid and accurate report. So, with close to 5,000 database objects to choose from (assuming one has access to both the ODS and EDW), how does one figure out where to get the data they need? Well, the goal of this posting is to give you a good idea of where to start.

Deciding where to get your data.


As mentioned above, SunGard provides three areas from which you can draw information for reporting: Baseline, the ODS, and the EDW. Below, I have included a blurb about why you might choose to use one over another. Happy hunting!

Baseline is the database architecture that everyone typically thinks of when they say "Banner". It is the thing that stores the hundreds (if not thousands) of transactions that occur daily at your institution and what is typically used for reporting (at least until recently). The key question when deciding to use Baseline vs. the ODS for standard reporting is whether or not the report needs to contain real-time data. The ODS is cloned nightly so any reports drawing data from it are inherently a day old. For things like up-to-the-minute enrollment statistics and current financial statements, Baseline may be required.

The Operational Data Store (ODS) is SunGard's answer to the difficulties that arise when using Baseline for reporting. It is a series of database views that "flatten" the information stored in Banner Baseline. This makes getting a report of logically related information much easier as one can get all the required information from one reporting view instead of having to grab disparate pieces of data from multiple sources (in the case of Baseline, database tables).

The Enterprise Data Warehouse (EDW) is SunGard's solution for analytical reporting. There is a new movement in business intelligence circles towards more dynamic, analytical, database tools for reporting that can be used for real decisions. This new area of reporting has requirements that are completely different from "standard" reporting: aggregated information, data organized in ways that make it easier to analyze, and most importantly, historical data (used for trend analysis).

Source ID'd, now for the detail!


So, you've figured out the source for your report. Now you have to find the individual objects (tables and/or views) and even column fields within the object that contain the data you need. As with the section above, this one is organized by source, so if you're looking for more information on reporting against Baseline, click here, the ODS, click here, and the EDW, click here.

Baseline Detail


SunGard uses a naming convention for Baseline objects that is not very useful EXCEPT in the following ways:
  1. Object names are always 7 characters long
  2. The first letter corresponds to the module

    • S = Saturn (Student Related Information)
    • G = General (Information Common Across Modules)
    • F = Finance
    • R = Financial Aid
    • T = Student Accounts
    • N = Position Control (HR Information)
    • P = Payroll
    • A = Advancement (Alumni and Development)

  3. The second two letters if "TV" identify a validation table. So, if you're looking for a validation table with academic term (student information), the first three letters of the object will be "STV = S - Saturn, TV - Validation Table"
  4. The last four characters are a little harder to nail down, but the majority of the time they indicate the type of data stored. For example, the last four characters for the validation table for academic term (as discussed above) are "TERM". Put that together with #3 and you get STVTERM. Likewise, the validation table for student major is STVMAJR, and for financial accounts, FTVACCT.
  5. Another little nugget of information is in regards to the third character. If it's an R, that means that the object has "repeating" records. That means that there is more than one record per person or other key data element. For example, SFRSTCR is the repeating table for course registration information (S=Student, R=Repeating, STCR=STudent Course Registration), and contains a record for each course a student is taking (the student is repeated). On the other Hand, if the third character is a B, that means it's a base table and will contain NO repeating rows for the key data element. NBBPOSN is the base table for position information and contains only one record per position and suffix combination. Student, Employee, and Advancement all have a base table that one must be in to be considered each respective entity type:

    • APBCONS - Contains a record for EVERY constitutent
    • PEBEMPL - Contains a record for EVERY employee
    • SGBSTDN - Contains a record for EVERY student

    If an individual does not have a record in SGBSTDN, they are NOT considered a student, etc.
So with the above information, it is my hope that you will have the tools you need to get started reporting against Banner Baseline tables. Additionally, don't forget that the Banner Forms can give you hints regarding the source of the data they contain. To find those hints, click into a data field of interest, click "help", then "dynamic help query". The window that ensues contains the TABLE and the FIELD in which that data element is stored. There is a good chance that related information on the same form resides in the same table. Good luck!!!

ODS Detail


The key to the ODS is the metadata provided by SunGard during the purchase process. To locate the documents associated with your institution's metadata, ask your Banner administrator(s). Depending on your starting point, there are a couple different ways to go about navigating the documents contained in the ODS metadata files:

  1. If you know the source in Baseline, use the "source reports" to find its location in the ODS:

    • On the main page, choose the module that contains the data from the links on the left side.
    • Scroll to the very bottom of the subsequent page where it says "Source Reports".
    • In the list of source reports, locate the name of the table in which the data element in question is stored in baseline and click its link.

  2. If you don't have the starting point specified above, start in the ODS Concept Diagrams provided in the ODS metadata:

    • Starting on the main page of the metadata, click the link to the concept diagrams at the bottom center of the page.
    • Using the menu on the left, navigate through the different functional modules and select areas related to the data for which you are looking.


EDW Detail


As I mentioned above, the EDW is built with a completely different reporting methodology in mind. As such, it is organized in what are referred to as "star schemas". These schemas are designed to make reporting aggregated, measured data easier. As shown in the article linked above, star schemas are organized in what are known as "fact" and "dimension" tables. The fact tables contain the "measures" while the dimensions contain the "buckets" by which you wish to measure the "measures":). Clear as mud, I know. So, what can you do with this information? Well, first of all, you should only be using the EDW if you are trying to report measured data. If that's the case, there is a series of metadata documents (very similar to the ODS metadata) that will help in figuring out where to get the data you require.