Modelling the Data Warehouse using the UML
 
In the modern information age it is imperative for the organization to have its information at its fingertips. To meet this need for information, additions to systems are often developed to give consolidated management overviews of the operational information. Often, consolidated views require information from several operational systems and this leads to separate systems, often referred to as MIS (management information systems) or decision support systems. Sometimes the consolidated management information is represented in a specialized system often referred to as a data warehouse.
A definition: A data warehouse is a system that contains the data of an organization and is structured specifically for query, analysis and reporting.
In this particular philosophy the key to this definition is the words, "structured specifically", which implies that the way the information is stored in a data warehouse is different from normal systems. The primary reason for this is that we need data models that can speed up querying and reporting.
The technique used for modelling data warehouses is called multidimensional modelling and uses a particular data construct known as a star schema. The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema where a single "fact table" containing a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts. A fact table consists of the measurements, metrics or facts of a business process. It is often located at the centre of a star schema, surrounded by dimension tables. A dimension table is one of the set of companion tables to a fact table and normally contains attributes or (fields) used to constrain and group data when performing data warehousing queries.
Modelling data warehouses is a daunting task and often focuses on establishing a data model only. Often the user requirements are not formalized in a specification.  The rest of this article will be dedicated to discussing the analysis tasks to be performed in establishing the requirements and initial models for the data entities.
 
Rambling: A sales manager has the requirement to have a small decision support system in the form of a data mart. To support this requirement she needs to report on sales per customer, per region, per organization, per location, per product, per time period, or any combination of the above.
Note. For simplicity, we are excluding the ETL (extraction, transformation and load) from the scope of the model:
 
Simple approach
Step one: Define the scope of the data warehouse.
The data warehouse at the highest level will always have the following components:
  • Manage reporting – This typically includes all of the end user functionality whereby reports and queries may be managed
  • Maintain data warehouse – The maintenance of the warehouse focuses on the loading of information into the data warehouse
  • Administer data warehouse – This includes all of the tasks necessary for a data warehouse administrator to manage the data warehouse
 
At the highest level the data warehouse may be represented as follows in a package diagram:
graphic
Figure 1: Package diagram
Step two: Per package, define the functions necessary to the data warehouse
For package: Manage reporting, after discussion with the sales manager, we have determined that the following functions are required:
  • Display sales by:
    • Customer
    • Region
    • Product
    • Organization
    • Time period
    • Region, product, organization, customer, time period
  • Optionally a report may be generated
 
The use case diagram for this may look like the following:
graphic
Figure 2: Use case diagram
Alternatively the extended use cases may be described as scenarios against the base use case: Display sales.
Step three: from the use case diagram we can then derive the class diagram and after further consultation with the sales manager also the attributes and operations. This may lead to the following class diagram (also applied the star schema using the fact table and the dimension tables):
graphic
Figure 3: Class diagram
Conclusion
Using the steps above a model of the data warehouse can be created rapidly to be used as a requirements specification for a commercial product, or for the detailed design of a data warehouse to be developed in-house.
Making life easy!
References: Data warehouse definitions sourced from Wikipedia
Google
 
Web xpdian.com