Data Vault Modeling is a database modeling method that is designed to
provide historical storage of data coming in from multiple operational
systems. It is also a method of looking at historical data that, apart
from the modeling aspect, deals with issues such as auditing, tracing of
data, loading speed and resilience to change.
Data Vault Modeling focuses on several things. First, it emphasizes
the need to trace of where all the data in the database came from.
Second, it makes no distinction between good and bad data ("bad" meaning
not conforming to business rules),[1] leading to "a single version of the facts" versus "a single version of the truth",[2]
also expressed by Dan Linstedt as "all the data, all of the time".
Third, the modeling method is designed to be resilient to change in the
business environment where the data being stored is coming from, by
explicitly separating structural information from descriptive
attributes.[3] Finally, Data Vault is designed to enable parallel loading as much as possible,[4] so that you can scale out for very large implementations. An alternative (and seldom used) name for the method is "Common Foundational Integration Modelling Architecture."[5]
Basic notions
Data Vault attempts to solve the problem of dealing with change in
the environment by separating the business keys (that do not mutate as
often, because they uniquely identify a business entity) and the
associations between those business keys, from the descriptive
attributes of those keys. The business keys and their associations are structural attributes,
forming the skeleton of the data model. The Data Vault method has as one
of its main axioms that real business keys only change when the
business changes and are therefore the most stable elements from which
to derive the structure of a historical database. If you use these keys
as the backbone of a Data Warehouse, you can organize the rest of the
data around them. This means that choosing the correct keys for the Hubs
is of prime importance for the stability of your model.[13] The keys are stored in tables with a few constraints on the structure. These key-tables are called Hubs.
Hubs
Hubs contain a list of unique business keys with low propensity to
change. Hubs also contain a surrogate key for each Hub item and metadata
describing the origin of the business key. The descriptive attributes
for the information on the Hub (such as the description for the key,
possibly in multiple languages) are stored in structures called
Satellite tables which will be discussed below. The Hub contains at least the following fields:[14]
a surrogate key, used to connect the other structures to this table.
a business key, the driver for this hub. The business key can consist of multiple fields.
the record source, can be used to see where the business keys come
from and if the primary loading system has all of the keys available in
other systems as well.
optionally, you can also have metadata fields with information about manual updates (user/time) and the extraction date.
A Hub is not allowed to contain multiple business keys, except when
two systems deliver the same business key but with collisions that have
different meanings. Hubs should normally have at least one satellite.[14]
Hub example
This is an example for a Hub-table containing Cars, surprisingly
called "Car" (H_CAR). The driving key is Vehicle Identification Number.
No comments:
Post a Comment