A DaaS implementation will include some or all of the following functions:
- Cleansing and normalization,
- Security, and
- Formatting and customization.
Role of Excel
Gathering data is the starting point for DaaS. In any DaaS business, a growing data set increases both the business's value and its number of potential customers. One way to get an implementation up and running quickly is to use Excel as the primary means of entering data into the system.
Ideally, data comes into the system through well-defined interfaces (web services, file transfers, etc.). However, when gathering a new dataset -- say from a collection of PDFs -- there may not be an application written to support the intake. A spreadsheet is viable because it can be worked on iteratively and archived for historical record and audit.
A data store that warehouses the data coming into the system is recommended, and an RDBMS is one possible choice. The schema defined in the RDBMS governs the business: the different workbooks must all conform to that schema.
This data store is also important because it provides a logical and runtime division between the intake function (data loading processes) and the presentation function (such as XML-rendering jobs).
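As an illustration of the intake side, here is a minimal sketch that loads one worksheet, exported from Excel as CSV, into a relational store. It assumes SQLite as the RDBMS and a made-up `product` table with `sku`, `name`, and `price` columns; the function name `load_sheet` and the cleansing rules are likewise hypothetical, not part of any particular DaaS product.

```python
import csv
import io
import sqlite3

# Hypothetical schema: a real implementation would define its own tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS product (
    sku   TEXT PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL CHECK (price >= 0)
)
"""


def load_sheet(conn, csv_text):
    """Load one exported worksheet; return (loaded, rejected) row counts."""
    conn.execute(SCHEMA)
    loaded = rejected = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            # Light cleansing and normalization before the insert.
            record = (
                row["sku"].strip().upper(),
                row["name"].strip(),
                float(row["price"]),
            )
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT OR REPLACE INTO product (sku, name, price) "
                    "VALUES (?, ?, ?)",
                    record,
                )
            loaded += 1
        except (KeyError, ValueError, TypeError, AttributeError,
                sqlite3.IntegrityError):
            # Rows that do not fit the schema are counted, not loaded.
            rejected += 1
    return loaded, rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sheet = "sku,name,price\n ab-1 ,Widget,9.50\nab-2,Gadget,not a number\n"
    print(load_sheet(conn, sheet))
```

Because rows that violate the schema are rejected rather than loaded, the workbook can be corrected and reloaded iteratively, and the archived workbook remains the audit record of what was submitted.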
Presentation includes the connection, transport, and formatting of the data. XML or JSON files rendered from a SQL query and served by a web server are examples of presentation. More dynamic examples include parameterized reports, analysis tools, and web services that accept queries. As much as possible, new external interface requirements should be handled by adding to an ever-expanding set of presentation jobs. Don't try to chain several presentation jobs together, as they're likely to change at different times.
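A minimal sketch of two such presentation jobs follows. It assumes SQLite as the data store and a hypothetical `product` table; the function names are illustrative only. Each job reads directly from the data store rather than from the other job's output, so either one can change without affecting the other.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical query against an assumed "product" table.
QUERY = "SELECT sku, name, price FROM product ORDER BY sku"


def fetch_rows(conn):
    """Run the query and return each row as a column-name -> value dict."""
    cur = conn.execute(QUERY)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur]


def render_json(conn):
    """Presentation job 1: render the query result as a JSON document."""
    return json.dumps(fetch_rows(conn), indent=2)


def render_xml(conn):
    """Presentation job 2: render the same query result as XML."""
    root = ET.Element("products")
    for row in fetch_rows(conn):
        item = ET.SubElement(root, "product")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (sku TEXT, name TEXT, price REAL)")
    conn.execute("INSERT INTO product VALUES ('AB-1', 'Widget', 9.5)")
    print(render_json(conn))
    print(render_xml(conn))
```

Writing the returned strings to files in a directory served by a web server would complete the static-file style of presentation described above.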
Data Flow Diagram
The following data flow diagram shows a DaaS implementation divided into two parts: Intake and Presentation.
|A DaaS Architecture|
Once the data is in the database, the presentation jobs -- each tailored to one format -- are run. This may happen immediately after a data load or at a convenient later time.
The state of the processing can be studied by running reports against the RDBMS.
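One way such a report could look is sketched below. It assumes that each load writes a row to a bookkeeping table, here called `load_log`; that table, its columns, and the function name are hypothetical and would need to match whatever the intake process actually records.

```python
import sqlite3

# Hypothetical bookkeeping table written to by each load.
LOAD_LOG_SCHEMA = """
CREATE TABLE IF NOT EXISTS load_log (
    source        TEXT NOT NULL,     -- e.g. the workbook file name
    loaded_at     TEXT NOT NULL,     -- ISO-8601 timestamp of the load
    rows_loaded   INTEGER NOT NULL,
    rows_rejected INTEGER NOT NULL
)
"""


def processing_report(conn):
    """Per source: number of runs, rows loaded, rows rejected, latest run."""
    return conn.execute(
        """
        SELECT source,
               COUNT(*)           AS runs,
               SUM(rows_loaded)   AS loaded,
               SUM(rows_rejected) AS rejected,
               MAX(loaded_at)     AS last_run
        FROM load_log
        GROUP BY source
        ORDER BY source
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(LOAD_LOG_SCHEMA)
    # Made-up sample entries, for demonstration only.
    conn.executemany(
        "INSERT INTO load_log VALUES (?, ?, ?, ?)",
        [
            ("prices.xlsx", "2024-01-01T09:00:00", 10, 1),
            ("prices.xlsx", "2024-01-02T09:00:00", 12, 0),
            ("vendors.xlsx", "2024-01-01T10:00:00", 5, 2),
        ],
    )
    for line in processing_report(conn):
        print(line)
```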
It's important for a DaaS implementation to anchor onto a data store that is separate from both the intake function (Excel workbook) and the output (XML, JSON). This separation allows operational flexibility: the load schedule can be adjusted independently of the presentation. The central data store is also a hub to which many different pieces can be added without depending on each other.