In the modern information age it is imperative for the organization to have its information at its
fingertips. To meet this need for information, additions to systems are often developed to give
consolidated management overviews of the operational information. Often, consolidated views
require information from several operational systems and this leads to separate systems, often
referred to as MIS (management information systems) or decision support systems. Sometimes the
consolidated management information is represented in a specialized system often referred to as a
data warehouse.
A definition: A data warehouse is a system that contains the data of an organization and is
structured specifically for query, analysis and reporting.
In this particular philosophy the key to this definition is the words, "structured specifically", which
implies that the way the information is stored in a data warehouse is different from normal
systems. The primary reason for this is that we need data models that can speed up querying and
reporting.
The technique used for modelling data warehouses is called multidimensional modelling and uses a
particular data construct known as a star schema. The star schema (sometimes referenced as star
join schema) is the simplest data warehouse schema where a single "fact table" containing a
compound primary key, with one segment for each "dimension" and with additional columns of
additive, numeric facts. A fact table consists of the measurements, metrics or facts of a business
process. It is often located at the centre of a star schema, surrounded by dimension tables. A
dimension table is one of the set of companion tables to a fact table and normally contains
attributes or (fields) used to constrain and group data when performing data warehousing queries.
Modelling data warehouses is a daunting task and often focuses on establishing a data model only.
Often the user requirements are not formalized in a specification. The rest of this article will be
dedicated to discussing the analysis tasks to be performed in establishing the requirements and
initial models for the data entities.
Rambling: A sales manager has the requirement to have a small decision support system in the
form of a data mart. To support this requirement she needs to report on sales per customer, per
region, per organization, per location, per product, per time period, or any combination of the
above.
Note. For simplicity, we are excluding the ETL (extraction, transformation and load) from the scope
of the model:
Simple approach
Step one: Define the scope of the data warehouse.
The data warehouse at the highest level will always have the following components:
- Manage reporting – This typically includes all of the end user functionality whereby
reports and queries may be managed
- Maintain data warehouse – The maintenance of the warehouse focuses on the loading of
information into the data warehouse
- Administer data warehouse – This includes all of the tasks necessary for a data
warehouse administrator to manage the data warehouse
At the highest level the data warehouse may be represented as follows in a package diagram:
Figure 1: Package diagram
Step two: Per package, define the functions necessary to the data warehouse
For package: Manage reporting, after discussion with the sales manager, we have determined that
the following functions are required:
- Display sales by:
- Customer
- Region
- Product
- Organization
- Time period
- Region, product, organization, customer, time period
- Optionally a report may be generated
The use case diagram for this may look like the following:
Figure 2: Use case diagram
Alternatively the extended use cases may be described as scenarios against the base use case:
Display sales.
Step three: from the use case diagram we can then derive the class diagram and after further
consultation with the sales manager also the attributes and operations. This may lead to the
following class diagram (also applied the star schema using the fact table and the dimension tables):
Figure 3: Class diagram
Conclusion
Using the steps above a model of the data warehouse can be created rapidly to be used as a
requirements specification for a commercial product, or for the detailed design of a data warehouse
to be developed in-house.
Making life easy!
References: Data warehouse definitions sourced from Wikipedia