I first formed a spreadsheet based on PDF reports posted on the school system's web site. Each year, the PDFs reported actual revenues and their sources. I left the expenses off for now. I copied the values from each year into a column of the spreadsheet. I didn't attempt to normalize any of the revenue sources since they are all from the same school system. Here is a copy of the spreadsheet: FCPS Revenues.
The target data model is dimensional. This means that each focal point of analysis -- year, school system, and revenue source -- will have a row in a dimension (a table). A fact table contains the actual revenue value (a U.S. dollar amount). The fact table will refer to the dimensions. The data model is presented below.
![]() | |||
Dimensional Data Model for FCPS Revenues |
Here is a data dictionary describing the tables.
- SR_DIM_SYSTEM. A governmental school system. A REVENUE must be reported by just one other SYSTEM. A SYSTEM may report one or more REVENUES.
- SR_DIM_YEAR. A fiscal year. A YEAR may record one or more REVENUES. A REVENUE must be recorded by just one other YEAR.
- SR_DIM_REVENUE_SOURCE. A line item in the revenue summary. A REVENUE_SOURCE may classify one or more REVENUES. A REVENUE must be classified by just one other REVENUE_SOURCE.
- SR_FACT_REVENUE. A value for the REVENUE.
While useful for personal productivity, previous tutorials showed simple cases of loading an Excel file into a MySQL table or converting a text file to JSON. Integration tools become far more beneficial with more substantive data work. This spreadsheet and E-R diagram are the basis for a more significant example.
The processing that goes with this is here: http://bekwam.blogspot.com/2011/01/tutorial-loading-dimensions-with-talend.html.
ReplyDeleteLook for a revision here: Extending the School Data Mart Project.
ReplyDelete