9: Resources

Data Modeling History

 
This is optional additional information about the evolution of data modeling as a field, and is not necessary to the practical understanding or utilization of data modeling theory and techniques.

Relational Theory's Founding Fathers

 
  • Dr. E.F. Codd, creator of the Relational Model published these two seminal papers:
    • A Relational Model of Data for Large Shared Data Banks, 6/70
    • Extending the Database Relational Model to Capture More Meaning, 12/79
  • C.J. Date: An associate of Dr. Codd, who co-created the Relational Institute with him. Published the first book on Relational Theory:
    • An Introduction to Database Systems, 1974

1960's: Real Men Automate Processes!

 
  • Data design is seen as a nuisance.
  • Tool development concentrates on automating flow charts.
  • Processes are individually automated.

1970's: Come On People, Now, Let's Get Together!

 
  • Integrating the design of processes and use of System Development Methodologies (SDMs) is declared to be the savior of poor development performance.
  • Tool development concentrates on automating process design.
  • Normalization is born but spends an unnoticed adolescence.

1980's: We Don't Need No Stinking Process Models!

 
  • Two strong camps appear: the "data devotees" and the "process people".
  • Automated normalization tools and data synthesis are touted as "the saviors of poor development performance" (deja vu all over again?).
  • Tool development concentrates on automating both normalization and "picture drawing" of data models.
  • Data decomposition appears and grows.

1990's: Come On People, Now, Let's Get Together! (redux)

 
  • The need to integrate business process analysis and data modeling is recognized.
  • Tool development concentrates on integrating both techniques.
  • Excellent tools finally appear.
  • Data synthesis disappears and data decomposition wins out.
  • Hardware and software DBMS platforms provide sufficient performance to begin to support implementation of normalized physical data models.

For More Info

 
  • The Data Modeling Handbook Michael Reingruber, William W. Gregory, John Wiley & Sons, 1994, ISBN 0471052906
  • Data Modeling for Information Professionals
    Bob Schmidt (Editor), Prentice Hall, 1998, ISBN 0130804509
  • Mastering Data Modeling-A User-Driven Approach John Vincent Carlis, Joseph Maguire, Addison-Wesley Pub Co, 2000, ISBN 020170045X
  • Handbook of Relational Database Design Candace Fleming and Barbara von Halle, Addison-Wesley, 1989,
    ISBN 0-201-11434-8
  • Data Modeling Essentials Graeme Simsion, Van Nostrand Reinhold, 1994, ISBN 1850328773

Glossary

 
This Glossary of Logical Data Modeling terminology includes industry standard definitions.

 
1st Normal Form
Each non-key attribute in the entity is a single-occurrence fact that is functionally dependent upon the PK. There should be a 1:1 relationship between the PK and each attribute in an entity. There should be no repeating groups.

2nd Normal Form
Each non-key attribute in the entity is functionally dependent upon the entire PK. This form applies only to tables with multi-column PKs. No attribute should relate to one part of the PK directly, and indirectly or not at all to the rest of the key: The attributes depend on the whole key.

3rd Normal Form
No attribute in the entity is transitively dependent upon the PK. No extraneous attributes (attributes that relate to other attributes more strongly than to the PK or relate to a missing entity more than to the PK): The attributes depend on nothing but the key.

4th Normal Form
The entity's PK represents a single multi-valued fact that requires all PK attributes be present for proper representation.

5th Normal Form
The entity represents, in its key, a single multi-valued fact and has no unresolved symmetric constraints. A 4NF entity is also in 5NF if no symmetric constraints exist.

Aggregate Attribute: SEE: Derived/summary attribute.

Aggregate Entity
A logical object that appears to be an entity but which contains only derived/summary attributes. It is, therefore, not a true logical data model entity.

Application Data
The data commonly thought of as "business data", such as account numbers, customer names, dates of birth, transactions, etc.

As-Is
As-Is Data Environment: The current data environment. The existing legacy systems along with data sources which might be purchased or available through on-line data providers.

Assertion: SEE: Constraint

Associative Entity
An entity which contains information relating to instances of two or more other entities or between different instances of a single entity. Also called Intersection Entity. Models a M:M relationship.

Atomic Data Object
A logical or physical data object whose meaning and/or value is not based on other data objects. For example, Unit Price is atomic, Gross Margin is not.

Attribute
An elementary logical data object that describes or provides details about an entity. A column is the physical implementation of an attribute. The terms column and attribute are commonly used interchangeably. Many people also call an attribute a field, as well as a column.

Availability (of data)
The level to which data objects should be available to business end users. That is, Availability addresses the question "How easy is it to access these data?".

BDO: SEE: Business Data Object

Big Bang Data Warehouse
A data warehouse that is created in one iteration and contains all the data available in the environment. It is impossible to create such a data warehouse in most enterprises for lack of available resources. It is inadvisable in any case because it would include a large amount of data that has no business value.

Business Data Item: SEE: Business Data Object

Business Data Object
A data object that is visible to and of use to the business end user. A BDO may be atomic or derived.

Business Process Model
A graphical and textual document which describes how business processes relate with respect to data interchange.

Business Question
A question that a business person needs answered in order to perform her/his business function or which, if answered, can provide insight into a business situation.

Business Question Analysis
The technique of determining the business data objects of interest to the enterprise by requesting each Business SME to submit a set of business questions, ranked by value. These business questions are used to identify business data objects.

Business SME
A subject matter expert on the business processes and operations of the enterprise. SEE: Subject Matter Expert and Data SME

Candidate Key
An attribute or set of attributes that can be used to uniquely identify instances of the entity. One of these candidate keys is chosen as the key of the entity. If no appropriate candidate exists, a system-assigned incremented number is used.

Cardinality

  • Table: The number of rows in the table
  • Relationship: A restriction on the number of instances of the dependent entity which appear for each instance of the independent entity. Example: 1:0-M. "1" and "0-M" are cardinality statements.

Categorization: SEE: Generalization.

Cell
The intersection of a row and a column in a table where an individual data value resides.

Column: SEE: Attribute

Complete Generalization
A generalization in which all possible types of the super-type appear in the logical data model as sub-types. SEE: Generalization and Incomplete Generalization

Compound Key
A key which contains more than one attribute.

Constraint
A business rule that applies to one or more attributes in one or more tables and defines limitations on appropriate values for the attribute. Example: Birth Date must be earlier than Hire Date.

Contradicting Definitions
One attribute or entity that appears to have two or more definitions that are contradictory. For example, Employee might be "anyone receiving pay from our company" or "anyone with an employment contract with our company".

Coverage (of data)
The level to which data objects represent the existence of their corresponding real-world object.

Data Accuracy
The level to which data objects represent the correctness of their corresponding real-world object.

Data Definition Language
DDL is the subset of SQL commands used to define tables, columns, indices, data space, etc.

Data Demographics
Descriptions of the characteristics of data expected to appear in the database. This is used to help size the physical database platform. It includes data type and size, indexes, nullability, variable lengths, update frequency and number of rows in the table. These data are qualified by time, including average growth/shrinkage and peak growth/shrinkage Example: Right before and after historical data purge.

Data Object
A discrete item of logical or physical data, such as an entity, table, attribute, instance, row, or index.

Data Manipulation Language
DML is the subset of SQL commands used to retrieve, update and delete data in the database.

Data SME
A subject matter expert on the physical and logical data environment of the enterprise. It is common that there are two subsets of Data SMEs, the people familiar with the logical data (reports and summaries prepared, usually, by data analysts and/or IT) and those familiar with the physical data (files, tables, etc.). SEE: Subject Matter Expert and Business SME.

Data Source
A physical table, file, etc. from which data serves as a source for a target business, logical or physical data object. SEE: Source Data Object and Data Sourcing Model.

Data Sourcing Model
The definition of the target business, logical and physical data objects, the source physical data objects and the mapping algorithms used in deriving the target(s) from the source(s). Included is the frequency of updates. SEE: Data Source and Source Data Object.

Data Warehouse
The physical and logical structures built to provide to the business end user a single logical view of the business data objects.

Database
A collection of related business data and the metadata that describes it.

DDL: SEE: Data Definition Language.

Decision Support

  • The act of making strategic and tactical decisions by drawing inferences from past, existing and projected states (as compared to making objective decisions such as inventory replenishment)
  • The environment which supports making the above-described decisions (e.g., data warehouses, ad hoc query user interface)

Degree

  • Table: The number of columns in a table
  • Key: The number of attributes in the key.

Demographics: SEE: Data Demographics and Query Demographics

Dependent Entity
The thing modeled exists only if some other entity exists, and depends on that entity for part of its identity. E.g., Line Items exist only within Purchase Orders, never on their own. Dependent entities have multi-column PKs.

Derived Data: SEE: Derived/Summary Attribute.

Derived/ Summary Attribute
An attribute whose values are algorithmically determined from the values of other attributes or whose definition is based on the definitions of other attributes. Example: UnitMargin is calculated from UnitPrice minus Unit Cost.

Dimension
As used by multi-dimensional databases (MDDs): A distinct logical hierarchical grouping, such as "geography" (e.g., Store, District, Region) which can be selected and used by the end user to display aggregations at any of the levels in the hierarchy.

DML: Data Manipulation Language.

Domain
The valid value set from which an attribute's values are drawn.

DSS: Decision Support System.

DW: Data Warehouse or Data Warehousing.

Dynamic Materialization (of a BDO)
On the fly creation (derivation) of the value of a non-atomic (derived) business data object. There usually is a performance penalty for dynamic materialization. SEE: Static Materialization.

Element
In physical data modeling, one column in a table or field in a file. Also called: Data Element.

Entity
An logical data object (person, thing, idea, concept, etc.) about which information must be maintained by the enterprise and which is itself described by other data objects (attributes). A table is the physical implementation of an entity. The terms entity and table are commonly used interchangeably.

Entity Pool
A collection of ideas about what might be entities. This is created in a brainstorming session early on in the logical data modeling project and serves as a means to stimulate conversation on entity possibilities and to serve as the basis for the first LDM.

Entity Relationship Diagram
An industry standard visual representation of an LDM, showing Entities, Relationships and Attributes.

Entity State Null
A NULL where the attribute instance might or might not be NULL, depending on the state of the particular entity instance in which it occurs. An identifiable subset of the rows in the table may have NULLs in this attribute.

Entity Type
The formalized term for an Entity. SEE: Entity.

Entity Type Null
A NULL where the attribute instance is always NULL for the same types of instances of the entity. An identifiable subset of the rows in a table must have NULLs in this attribute.

ERD: Entity Relationship Diagram.

Facet
A grouping of entities in an LDM that represent a single business functional area. SEE: Subject Area.

FK: Foreign Key.

Foreign Key
An attribute or set of attributes in an entity that appear in that entity due to key migration from another entity via a relationship. Notation is FK.

Generalization
A construct in which one entity (the super-type) is defined as consisting of a set of other entities (the sub-types) and in which the sub-types inherit the PK of the super-type. For example, an Insurance Policy (super-type) can be an Auto Policy, a Life Policy, a Homeowners Policy, etc (the sub-types). Generalizations are used when each sub-type carries different attributes from the other sub-types. Generalizations are also called categorizations or Super/Sub-types or Sub-Entities. SEE: Complete Generalization and Incomplete Generalization.

Hierarchy Relationship
A recursive relationship in which instances relate to other instance in a hierarchical fashion (e.g., ownership of each level by the next higher level applies). Example: A manufacturing concern that makes parts and also assemblies made of those parts. SEE: Recursive Relationship.

Homograph
Two words, both with the same construction, but with different meanings, Example: Lead can be either a very heavy silvery metal or to show someone the way. Synonym: Homonym.

IDEF0
A process modeling technique and syntax developed for the United States DoD in the early-mid 1980s.

IDEF1X
A logical data modeling technique and syntax developed for the United States DoD in the early-mid 1980s. Based on IDEF-1.

Identifying Relationship
A relationship in which the key attributes of the independent entity migrate into the dependent entity as key attributes.

Incomplete Generalization
A generalization in which there are types of the super-type which do not appear in the logical data model as sub-types. SEE: Complete Generalization.

Independent Entity
The person, place, thing, etc. modeled in this entity has existence in and of itself. Example: A Customer is a person who exists separate from the company. Independent entities have a single-column PK.

Industry LDM
An LDM that describes a specific industry as a whole. It describes the business functional data for the entire industry.

Instance (of a data object)

  • PDM: A row or record on a physical table or file, or a value in a column or field.
  • LDM: A hypothetical data value or set of data values that might appear in the attribute or entity. Logical data model instances are used to aid in the understanding of the logical objects. Also referred to as "occurrence" or "sample data".

Instance Diagram
A graphical technique used in logical data modeling or physical data modeling to show how instances of entities or rows in tables are related to each other. Also called "occurrence diagram".

Intersection Entity: SEE: Associative Entity.

Isomorphic LDM
A model that correctly represents the physical, logical or theoretical system it is meant to describe.

Join
In a physical database, the act of retrieving information from two tables based on a matched value in their related same-domain attribute. Example: Join the Department table to the Employee table on the Dept# column.

Key Attribute
An attribute that participates in uniquely identifying instances of the within an entity. SEE: Primary Key.

Key Migration
An independent entity's key attributes appear in a dependent entity and define its relationship to the independent entity. These attributes might be either key or non-key attributes in the dependent entity, depending on the type of relationship. The process of a Primary Key becoming a Foreign Key. SEE: Primary Key and Foreign Key.

Latency (of data)
The level to which data objects reflect changes to their corresponding real-world object, with respect to time. That is, Latency addresses the question "How soon is a change to the real-world object available to the business user who accesses this object?".

LDM: Logical Data Model.

LDO: Logical Data Object.

Logical Data Item: SEE: Logical Data Object.

Logical Data Model
A graphical representation of the way a business organizes its information. It is not a database, and is platform-independent. It is a logical and unbiased representation of the business data. The data model defines which individual data elements exist and how they relate to one another. It provides an accurate map of business information. The LDM is designed to answer all of the required business questions, both known and future that may arise in the business.

Logical Data Object
An entity or attribute from an LDM.

Many-to-Many Relationship
A single instance in entity A may relate to zero, one or many instances in entity B, and vice versa. Notation is M:M.

Metadata
Metadata is information that describes the data structure of a system. The LDM is a visual representation of metadata and is not implemented. When a database is implemented from the PDM, the data definitions for the tables, columns and relationships, as well as their physical characteristics (data type and length and indexing, etc.) are stored physically on the system. These are metadata. The tables that house these structural descriptions are also considered metadata because they define and the implemented system.

MRA: SEE: Multi-relationship Assertion.

Multi-relationship Assertion
An inter-entity assertion where the constraint statement spans more than one relationship.

Network Relationship
A recursive relationship in which any instance can relate to any other instance. Example: Origin and Destination cities for a Freight Forwarding company. SEE: Recursive Relationship.

Non-identifying Relationship
A relationship in which the key attributes of the independent entity migrate into the dependent entity as non-key attributes.

Normalization
Normalization is a set of rules and a methodology for making sure that the attributes in a design are carried in the correct entity to map accurately to reality, eliminate data redundancy and minimize update anomalies.

Null
An unknown or indeterminate value. It is not equivalent to zero or blank. A null is not even necessarily equivalent to another null.

Occurrence: SEE: Instance and Row.

Occurrence Diagram: SEE: Instance Diagram.

One-to-Many Relationship
A single instance in entity A may relate to zero, one or many instances in entity B, but not vice versa. Notation is 1:M.

One-to-One Relationship
Zero or one instance in entity A exists for each instance in entity B. Notation is 1:1. 1:1 relationships tend to become 1:M over time.

Orphan
A database row that violates referential integrity or its corresponding business rule. Example: An Employee who is assigned to a Department that no longer exists.

Orthogonal
Orthogonal variables are completely unrelated. They are independent. Changes in one have no effect on the other.

PDM: Physical Data Model.

PDO: Physical Data Object.

Physical Data Item: SEE: Physical Data Object.

Physical Data Model
What is actually implemented on a hardware platform. It is derived from the LDM to include changes that accommodate specific platform performance issues such as storage size and access performance.

Physical Data Object
A table, column, file, segment, field, etc. which exists on a physical device. The definitions of the objects may also be referred to as physical data objects.

PK: Primary Key.

Populate
Insert the application data into a defined database.

Populated
A database that contains application data. SEE: Populate.

Primary Key

  • LDM: The attribute or set of attributes that uniquely identify each instance in an entity. Notation is PK.
  • PDM: The column or set of columns that are used to establish the physical storage location of the row. NOTE: This is the generally used term in the industry, but is incorrect for Teradata. In Teradata, this is called the Primary Index (PI) specifically to differentiate it from the logical modeling term PK, since it may in fact be made of attributes other than the ones defined for the PK in the LDM and may not be unique.

Quality (of data)
A collective name for the measures of the relevant characteristics of data objects. Quality includes Accuracy, Availability, Coverage and Latency.

Query
An SQL SELECT statement that retrieves rows that meet specified criteria from the database.

Query Demographics
Descriptions of what queries will be run against the database. This is used in sizing processors and network links in the system. They are intelligent guesses about what queries will run from where, how often, by whom, expected response times, size (rows returned) and anticipated peak usage. Dynamic (as opposed to batch) updates are usually included as a separate group. SEE: Data Demographics.

Recursive Relationship
Data hierarchies and networks are recursive relationships, where an entity has a relationship to itself. Example: Part/Component Part. A part may be made of many component parts (an assembly), and it may in turn be a component part of some larger assembly. SEE: Network Relationship and Hierarchy Relationship.

Redundant Data
Data that appears more frequently than need be in a model (e.g., department name if carried in the employee entity would result in the same character string being repeated in many instances), or which can be derived from other elements in the model. SEE: Derived/Summary Attribute.

Referential Integrity
The state where an FK value relates to a PK value elsewhere in the design. A breach of referential integrity is the opposite: An FK value exists that does not also exist as a PK value elsewhere in the design. Example: An employee's department number as carried in the Employee entity does not exist in the Department entity.

Relation
Synonym for Entity in formalized relational theory.

Relationship
A statement about how two or more entities are related and the restrictions placed on that relationship. Example, a Department employs zero, one or many Employees and an Employee can work for one And only one Department (italics identify the relationship between Department and Employee).

Relationship Name
A text description of a relationship shown on an ERD next to the line that connects two Entities to visually represent their relationship.

Report
The result of a query.

Row
The data representing a single member of an Entity. Example: John Smith is represented by a row in the Employee table. Synonyms: Instance, Tuple.

Select
A query chooses specific data from one or more tables for a report, based on specific criteria (e.g., values in attributes). Example: Select all Men, age 50 or older, who make more than 50K annually.

Set Theory
A field of mathematics that deals with groupings of data. It is one of the foundations of relational theory.

SME: Subject Matter Expert.

Snow Flake Schema
A model consisting of multiple star schemas. SEE: Star Schema.

Source Data Item: SEE: Source Data Object.

Source Data Object
A physical data object (e.g., table, column, file, field, etc.) used in creating a data warehouse data object. SEE: Data Source and Data Sourcing Model.

Star Schema
A model where the tables are denormalized in a very specific way, usually for a specific application or set of queries. It optimizes specific sets of joins and is a PDM technique that should not be used during LDM where normalization is the ideal. Also known as dimensional modeling. The basic premise of the Star Schema is that information can be classified into 2 groups:

  • Facts: Core elements being analyzed e.g., point-of-sale (POS) data
    • There are no cyclic or recursive join paths
    • Denormalized to avoid multi table joins (for performance)
  • Dimensions: Attributes of the facts e.g., product, market, promotion ,time
    • Most if not all analysis is based upon dimensions
    • Every PK of a dimension table is an FK in the fact table

Static Materialization (of a BDO)
Physical creation (derivation) and storage of the value of a non-atomic (derived) business data object. There usually is a performance benefit for static materialization. There are only two reasons to do this:

1) Performance (reduces constant recalculation and full table scans)
2) Explicit documentation of the algorithm to prevent change.

SEE Dynamic Materialization.

Sub-Entity: SEE: Generalization.

Subject Area
A grouping of Facets into a larger business functional area. SEE: Facet.

Subject Matter Expert
A person highly knowledgeable of the subject at hand.

Sub-type: SEE: Generalization.

Summary Attribute: SEE: Derived/Summary Attribute.

Super-type: SEE: Generalization.

Table: SEE: Entity.

To-Be
The business process environment which is hoped to exist after whatever process redesign is complete.

Tuple: SEE: Row.

View
A pre-defined subset of one or more tables. Views are used to control access to the underlying tables and simplify access to data. All DBMSs support views, but Teradata materializes the view.

Visible
Data that can be seen by a user of the data warehouse regardless of where originate. Visible data may dynamically calculated, or may be stored in the current or some other system and dynamically accessed by the data warehouse facility.