Basic Data Warehouse Terminology

Tuesday, September 6, 2011
by jsalvo

Here’s some definitions of common Data Warehousing terminology:

Dimensions

The analytical objects in BI that allow for slicing-and-dicing of data in the data warehouse/data mart.  Dimensions provide the qualitative information related to the facts.  Dimensions consists of attributes.  As an example, attributes of a product include its color, size, and price. Dimensions are stored in dimension tables.

Dimensions have both a natural key and a surrogate key.  The natural key is the business key, usually derived from the source system.  The surrogate key is an integer key value in the data warehouse / data mart that joins the dimensions to facts (measures).

Special Cases

Degenerate Dimensions: Columns in a fact table that are needed for analysis but do not relate to any existing dimension.  An example is a sales transaction number.

Junk Dimensions: In dimensional modeling, there are often attributes that do not belong to any specific dimension.  Generally these attributes have a limited range of values.  Often it doesn’t make sense to put the attribute in its own dimension table, but they still need to be a part of the model.  Junk dimensions are dimensions that contain attributes that do not belong to any other dimension.

Types of Slowly Changing Dimensions

Type 1: Only the most recent value of the attribute is retained.

Type 2: A new record is created in the dimension table whenever the attribute changes.  Historic facts are still associated with their original dimension record.

Type 3: Tracks the most recent historic value of an attribute.  A field is added to the dimension table to track the last value of the attribute before updating.  Very rarely used.

Factless Fact Tables (aka Bridge Tables)

Bridge tables are used to model many-to-many relationships between two dimension tables.  The only purpose of bridge tables is to represent relationships between dimensions.

Facts

An event that is being tracked or measured.   Facts are numeric values that may be aggregated for data analysis and reporting purposes.   An example of a fact is the sale of a product.  Fact tables are often related to several dimension tables via surrogate keys.  Facts are stored in fact tables.

Snowflake Schema

Data Warehousing schema that consists of fact tables directly related to dimension tables that contain relationships to other dimension tables.  More complex than a star schema (described below).

Star Schema

The simplest type of data warehouse schema.  Fact tables are directly related to one or more dimensions tables.  No relationships exist between the dimension tables.  When the dimension tables are placed around the fact table, the shape resembles a star.

Comments

comments powered by Disqus