In the modern information age it is imperative for
the organization to have its information at its fingertips. To meet
this need for information, additions to systems are often developed
to give consolidated management overviews of the operational
information. Often, consolidated views require information from
several operational systems and this leads to separate systems,
often referred to as MIS (management information systems) or
decision support systems. Sometimes the consolidated management
information is represented in a specialized system often referred
to as a data warehouse.
A definition: A data warehouse is a system that contains the
data of an organization and is structured specifically for query,
analysis and reporting.
In this particular philosophy the key to this
definition is the words, "structured specifically", which implies
that the way the information is stored in a data warehouse is
different from normal systems. The primary reason for this is that
we need data models that can speed up querying and
reporting.
The technique used for modelling data warehouses
is called multidimensional modelling and uses a particular data
construct known as a star schema. The star schema (sometimes
referenced as star join schema) is the simplest data warehouse
schema where a single "fact table" containing a compound primary
key, with one segment for each "dimension" and with additional
columns of additive, numeric facts. A fact table consists of the
measurements, metrics or facts of a business process. It is often
located at the centre of a star schema, surrounded by dimension
tables. A dimension table is one of the set of companion tables to
a fact table and normally contains attributes or (fields) used to
constrain and group data when performing data warehousing
queries.
Modelling data warehouses is a daunting task and
often focuses on establishing a data model only. Often the user
requirements are not formalized in a specification. The rest
of this article will be dedicated to discussing the analysis tasks
to be performed in establishing the requirements and initial models
for the data entities.
Rambling: A sales manager has the requirement to have a
small decision support system in the form of a data mart. To
support this requirement she needs to report on sales per customer,
per region, per organization, per location, per product, per time
period, or any combination of the above.
Note. For simplicity, we are excluding the ETL
(extraction, transformation and load) from the scope of the
model:
Simple approach
Step one: Define the scope of the data
warehouse.
The data warehouse at the highest level will
always have the following components:
-
Manage
reporting – This typically includes all of the end user
functionality whereby reports and queries may be
managed
-
Maintain
data warehouse – The maintenance of the warehouse focuses on
the loading of information into the data
warehouse
-
Administer data warehouse – This includes
all of the tasks necessary for a data warehouse administrator to
manage the data warehouse
At the highest level the data warehouse may be
represented as follows in a package diagram:
Figure 1: Package
diagram
Step two: Per package, define the functions necessary to the
data warehouse
For package: Manage reporting, after discussion
with the sales manager, we have determined that the following
functions are required:
-
Display
sales by:
-
Customer
-
Region
-
Product
-
Organization
-
Time
period
-
Region,
product, organization, customer, time period
-
Optionally a report may be
generated
The use case diagram for this may look like the
following:
Figure 2: Use case
diagram
Alternatively the extended use cases may be
described as scenarios against the base use case: Display
sales.
Step three: from the use case diagram we can then derive the
class diagram and after further consultation with the sales manager
also the attributes and operations. This may lead to the following
class diagram (also applied the star schema using the fact table
and the dimension tables):
Figure 3: Class diagram
Conclusion
Using the steps above a model of the data
warehouse can be created rapidly to be used as a requirements
specification for a commercial product, or for the detailed design
of a data warehouse to be developed in-house.
Making life easy!
References: Data warehouse definitions sourced
from Wikipedia