1: LDM Overview

Objectives

 
This module enables the student to:
  • Define the term Logical Data Model (LDM)
  • Articulate the importance of an LDM
  • Describe the importance of creating a good LDM
  • Identify the differences between an LDM and a Physical Data Model (PDM)

What is an LDM?

 
A Logical Data Model (LDM) is a graphical representation of the way a business organizes its information. It is not a database, and is platform-independent. It is a logical and unbiased representation of the business data. The data model defines which individual data elements exist and how they relate to one another. It provides an accurate model (or map) of the business information. The LDM allows the user to determine if all their business questions can be answered.

LDM Deliverable
An LDM is the deliverable that results from the logical data modeling process. It must have no unnecessary components and therefore minimal data redundancy. It represents data requirements in a correct, sharable, stable and flexible format:

  • Stable designs can service many application and business requirements over the long term
  • Correct designs map the data one-to-one to the business as it is in the real world
  • Sharable designs reflect no bias toward a particular application or business requirement
  • Flexible designs change little or not at all as the business environment changes

Industry LDMs

 
Teradata Solution Industry LDMs
Logical data models (LDMs) are critical in the design, creation and maintenance of data warehouses and are the heart of many Teradata solution offers. Teradata has created industry specific LDMs that are designed to support the specific business needs of that industry. Each industry LDM is comprehensive and ensures that the appropriate data will be included in the warehouse for those critical business needs.

The major Teradata Solution Industry specific LDMs are Communications, Financial, Retail, and Travel. The Teradata Division has invested many person years developing and implementing these models so that they require minimal modification for a customer implementation. This means the data modelers can get the data warehouse designed in minimum time with maximum functionality. Using a model that is already tested, accurate, and complete gives the customer a sense of security and confidence in our ability to deliver. Training for each specific industry LDM is available.

Benefits and Differentiators

 
Overview
There are many reasons to use the Teradata Solution Industry LDMs. One of the most common is faster warehouse implementation, as noted above. Teradata's LDMs jumpstart the planning, design and implementation process of a data warehouse. It is a template or starting point for the warehouse.

Industry LDM Differentiators

  • Cost Savings: The cost is low compared to competitors. It enables companies to ask business questions within several months instead of years, saving significant money and resource expenditures.
  • ROI: Faster implementation means achieving ROI sooner by implementing better, more insightful business decisions within a shorter time frame.
  • Reduces Risk:
    • Proven at other existing customer locations
    • Current with best practice
    • Supports known industry business requirements
    • Provides a long term foundation
    • Teradata is expert at modeling
    • Proves business question feasibility and data requirements
  • 1:1 Mapping: Teradata supports near-one-to-one mapping from the LDM to the physical design, which isn't possible with competitors' products.
  • Low Maintenance: Near-one-to-one mapping saves significant time and money initially, and even more over time as it reduces the need for re-design of both the data warehouse and its applications when additional functionality is required.
  • Communication: Creates a common business language that lets people from all walks of the enterprise, from most to least technical, communicate during warehouse implementation. This makes buy-in easier, because each party can see that what they want and need is addressed.
  • Flexible/Extensible Modularity: Gives the customer the flexibility to implement whatever functional portions of the LDM are useful to their business when they start, and scale up later with additional modules.
  • Cross Functional: The LDM provides a single, unbiased customer view across all products and functions of an enterprise.
  • Customer Centric: Allows a customer to be viewed in any and all ways the organization defines that customer. Many of the industry LDMs support an unlimited number and depth of customer hierarchies and relationships to reflect different views of the customer.

Before we can modify an industry specific LDM, we first need to understand why LDMs are needed and understand the basic structure and philosophy behind their design.

Why Create an LDM?

 
Benefits of the LDM to the Customer
  • Creates an accurate, consistent model of the way the business views its data
  • Explicitly states business rules
  • Provides a roadmap for data integration across the enterprise
  • Creates a foundation of understanding about what data requirements exist, not how these data are used
  • Cross validates with the process model
  • Provides consistent communication of business meanings across all levels of participants
  • Looks beyond current system limitations
  • Provides an objective means of IT development project prioritization
  • Starts with a complete, accurate picture of the business before attempting to determine the best data warehouse implementation approach
  • Improves probability of success by creating stability.

PS and the LDM
While our solution LDMs are already created, during implementation some level of modification is always required. Usually, Teradata associates handle this process, from gathering information on what changes are desired, to actually implementing them. A full understanding of the existing LDM as well as modeling principles is essential, so that the changes they define continue to uphold the integrity of the original model. Teradata uses the LDM during an engagement to communicate with the customer about the content and progress of the project, as well as to document decisions, and guide implementation tasks.

Sales and the LDM
While the LDM is marketed as an integral part of the Solution, it's important to know what it buys the customer so that you can discuss the advantages cogently and leverage them.

Building an LDM for a large enterprise wide system is an immense task. It takes many person years. Although building an enterprise LDM is a huge task, it is typically implemented in phases, and doesn't have to be completed before any other work can be done. Because it is so complex, it takes many iterations to fine tune and be sure that nothing is missed. The good news is that our customers don't need to build their own LDM for any industry where we've developed a solution LDM to go with our Teradata Solution. Using a solution LDM can save the customer years of effort and millions of dollars.

Analogies Help Explain the LDM

 

Foundation Analogy: The LDM is the foundation on which the physical design, implementation, and applications are built. The foundation of a building determines its overall structural integrity and ultimately its usefulness and longevity (and costs associated with later repairs). The LDM does the same for a data warehouse (DW). If the LDM is not designed correctly, the entire decision support capability is compromised. If the foundation is faulty, the building fails, causing huge cost at the least and disaster at the worst.


Map Analogy: An LDM is like a map that shows what and where all the data is. And like a map, it is only valid and useful to the extent that it reflects reality. If a map does not show a particular city, the city may still exist, but a traveler won't be able to find it using that map. An LDM should be detailed and reflect every aspect of the business accurately. For example, try finding Zurich on this map


Communication Analogy: Imagine trying to build a house without a blueprint! How would the various contractors communicate with each other? What kinds of inconsistencies, errors, expenses and emergencies would arise from the lack of a clear, consistent plan? The house might never be completed at all. If it were completed, it would probably not match the buyer's expectations.


Engine Analogy: The LDM is like the engine of a car: you don't need to understand exactly how it works, or ever see or touch it yourself. But without it, the car won't run at all, and it's best to have an expert build it for you. A well designed LDM is essential to the success of the data warehouse in supporting the business.

LDM vs. PDM

 

An LDM is NOT the same as a PDM!

A Physical Data Model (PDM) is what you actually implement on a hardware platform, and may be designed to accommodate specific platform performance issues. The LDM is exclusively a logical and unbiased representation of the business data. It does not address platform or performance considerations at all (e.g., It is platform and hardware independent). The PDM may be created from the LDM, but only after the LDM is complete and validated.

  LDM PDM
IncludesEntities and AttributesTables and Columns
Rows Identified ByPrimary KeyPrimary Index
Names UsedBusiness NamesNames may be limited by DBMS requirements
NormalizationNormalized to 3rd Normal Form (discussed in a later module) May be denormalized for performance and easier programming depending on platform
RedundancyNo redundant dataMay include redundancy
Derived DataNo derived dataMay include summarization
Who Develops ItBusiness experts and logical designersPhysical design experts

Simplified Scenario for a Complete DW Project:

  1. The data required to answer the business questions is originally identified, documented and validated in the LDM, without reference to any platform.
  2. A PDM is developed from the LDM to define the data warehouse physical structures (tables, columns, indexes, etc). The PDM is generally somewhat different from the LDM because it is platform-specific and is optimized for performance.
  3. The PDM is implemented on the platform, at which point an empty data warehouse exists.
  4. The data warehouse is populated with the company's source data.
  5. User applications are created to access the data warehouse so that the users can answer their business questions.

Relational Theory

 
Purpose & Foundation
Relational Theory is the basis of all aspects of logical data modeling, from the modeling process, to normalization rules to the resulting design. Relational Theory is the basis for the underlying structure of relational databases as well as SQL (Structured Query Language). SQL is a set-processing programming language designed specifically to work with relational databases. It is not necessary to understand Relational Theory in depth in order to do good logical data modeling. However, these core concepts from Relational Theory underlie the techniques you will be learning.

Relational Theory stresses:

  • Simple, meaningful data representation
  • Structure
  • Integrity
  • Easy, full data access and manipulation

Relational Structure

  • The basic unit of structure in a relational system is a "Relation"
  • A relation is a two dimensional table made of rows and columns
  • Rows are also called Tuples
  • Columns are also called Attributes
  • Tables are also called Entities

More in-depth discussions of these and related concepts are included throughout the rest of this course.

ERD Definition

 
An LDM is ultimately documented in an ERD (Entity-Relationship Diagram). The ERD is a standard modeling technique for defining and graphically representing an LDM:
  • Entities: Represent people, places, things, concepts, or events for which the business has the means and the will to collect and store data.
  • Relationships: Business associations that link occurrences of one or more entities and tell how they relate to one another.
  • Attributes: Data facts about an entity or a relationship. It is the lowest level of information that still has business meaning without further decomposition.
The following synonyms are used interchangeably in this course, as they are in the industry:
  • Entity = Table
  • Attribute = Column
  • Assertion = Constraint (business rule)
  • Instance = Row
ERD Conventions
All of the concepts below are much more fully defined in the following modules.
  • Entities are shown as boxes with their Key Attributes above the horizontal line within a box.
  • Relationships are shown as labeled lines between boxes.
  • Non-Key Attributes are shown below the horizontal line within a box.
  • A line indicates a relationship.
  • A solid dot represents the "many" side of a one-to-many relationship
Note: The conventions described in the sample below are based on IDEF1x notation.

Simple ERD Example (See graphic on left)
An ERD gives you a high level overall view of the system. In actual use, LDM ERDs usually do not use abbreviations. They are used in this course to reduce scrolling for your convenience.

Note: ERDs can be LDMs or PDMs. In actual use, LDM ERDs don't usually have abbreviations, but PDM ERDs frequently do due to the physical database name size limitations. Abbreviations are used in this course to reduce scrolling for your convenience.

Not-So-Simple ERDs (See a small part of an LDM)
Real ERDs for industry solutions usually include many dozens of entities and several hundred attributes. They are extremely complex, resembling a circuit board diagram, and can often cover many feet of a wall when printed and displayed in full. By the end of this course you will be able to read and understand an industry solution LDM. ERD conventions are discussed in detail in Module 8 of this course.

ERD vs Table Layouts

 
Instead of ERDs, a table layout format is used throughout this course to facilitate learning. Since people have a great deal of prior personal experience looking at tables in a wide variety of life contexts, they are very easy and intuitive to understand. ERDs are much less so.

The final module of the course covers ER Diagramming conventions in detail, and shows how the table formats translate into ERDs. ERDs are not used in this course for teaching the modeling process. However, at a few key points in this course, ERD versions of the table layouts will be available for comparison. ERDs help people understand the big picture, while table layouts make the details clear.

Table Layouts

 
Table layouts give you a more detailed definition of the tables. They are an alternate way to document an LDM that is especially useful while learning about data modeling. Table layouts show entities (sometimes called tables), attributes (sometimes called columns) and relationships, as does the ERD. But, by adding sample data values, table layouts provide a level of detail that make many concepts easier to grasp visually and intuitively. Here is the Table layout equivalent of the ERD just shown:

Table Layout Conventions
Table and column names are abbreviated to minimize scrolling. For visual clarity and optimum learning, the following standards are used:

  • Entities including their Key Attributes (marked PK for Primary Key) are shown in Black.
  • Relationships including their Key Attributes (marked FK for Foreign Key) are shown in Blue.
  • Non-Key Attributes are shown in Red.
  • The vertical double line divides the PK (left) from all other attributes (right)
  • The horizontal double line divides the column headers and constraints (top) from the sample data (bottom)

NOTE: ERDs and how they are used at customer engagements are discussed more comprehensively at the end of this course.

Modeling Process Overview

 
The Big Picture
Data modeling is the process of identifying, defining, validating and documenting all of the entities, relationships, and attributes and assertions that accurately represent the business data to fulfill the business requirements.

It is a very detailed and complex task that can take many person-months or even years for a large enterprise. That's the bad news. The good news is that the process is highly logical and intuitive, and is made of many small reasonably simple tasks. The complexity of data modeling is in the number of pieces to be defined, not in the process of defining them. Using a simple business example makes learning the process fairly easy.

Modeling Steps
These are the high level steps and sequence used to create a viable LDM. Details are covered in the next module.

  • Step 1: Define Business Requirements/Scope
  • Step 2: Define Entities
  • Step 3: Define Relationships
  • Step 4: Define Attributes
  • Step 5: Verify the Model

Try It: LDM vs. PDM

 
Try It! Exercises
Don't skip the interactive exercises that are sprinkled throughout this WBT. They are not only the most fun part of the course, but the richest for your learning. Data modeling is a big, complex task, and these exercises are designed to give you hands-on experience with each aspect of it. In addition to the small exercises, there are two larger ones, that may take you up to 30 minutes each. We recommend that you do them, as they bring all the prior learning together by simulating the entire data modeling process for a small enterprise.

Try It: LDM vs. PDM

 
Directions
The following statements were overheard at a customer site. Identify whether each speaker is referring to an LDM or a PDM:
IT Director:
Jim Watanabe

It's a complete model as far as being an unbiased representation of the business data. But it doesn't take into account the platform issues we must deal with.

This customer is referring to a:
LDM
PDM
Marketing Director:
Carmen Smith

I still have a few business questions that I'm not sure I can answer with this model.

This customer is referring to a:
LDM
PDM
End User:
Waneta Rogers

Thanks! This model has made it much easier to communicate with the people over in accounting about what we need in the new system!

This customer is referring to a:
LDM
PDM
DBA:
Doug Fortuna

I've got the model modified so that it will perform well and give users what they need in the timeframe they need it.

This customer is referring to a:
LDM
PDM

 2: Defining Entities

Objectives

 
This module enables the student to:
  • Define what an entity is in the context of an LDM
  • Identify entities from a user requirements description

Entity

 

What is an Entity?

An entity is something about which information must be maintained: A person, place, thing, concept, idea, etc. It is simplest to visualize an entity as a table with rows and columns.

Rules for Identifying Entities

 
The following are rules for identifying entities:
  • Must pass these tests:
    • It is within the scope of the project
    • It can be clearly and uniquely defined
    • Each instance can be uniquely differentiated from all other instances
    • Is not dependant on the existence of another entity
  • Usually there is other information that describes it
  • Entity name must be:
    • Unique within the model
    • A singular noun, possibly modified by adjectives
  • Relational Rules:
    • The relative position of entities on a diagram is irrelevant
    • The relative position of rows within a table is irrelevant

Entity Definition Problems

 
People tend to be fairly casual in the way they think about and describe their business needs and processes. Ask questions to specify the meaning of names that are under consideration as entity names. Here are two common situations to be alert for, and to resolve explicitly when they arise.
  • Synonyms: Different names which mean the same thing (e.g., Employee vs. Worker)
  • Homonyms: The same name means different things to different users

  • Account for the Accounting Department
  • Account for the Sales Department

Try It: Entities

 
Directions: A hardware vendor has the following business requirements. Using the rules for identifying entities from the previous page (a copy is provided for your reference below), identify the three main entities you would define.

OK. Here's what we need. We want to track purchase orders (POs) for customers. Our POs have to be at the line item level so that we can tell what customers ordered what specific parts, and when, and how many of each and what we charged on the PO. And we track the customer's name and address for shipping purposes, though shipping itself is not important right now. We also keep track of various types of customers, for example, individuals and corporations. By the way, some of our parts are composed of other parts we sell. We often include special instructions on a PO, that we write into the remarks box on the form.

The following are rules for identifying entities:

  • Must pass these tests:
    • It is within the scope of the project
    • It can be clearly and uniquely defined
    • Each instance can be uniquely differentiated from all other instances
    • Is not dependant on the existence of another entity
  • Usually there is other information that describes it
  • Entity name must be:
    • Unique within the model
    • A singular noun, possibly modified by adjectives
  • Relational Rules:
    • The relative position of entities on a diagram is irrelevant
    • The relative position of rows within a table is irrelevant

Primary Key

 
What Is a Primary Key?

Remember the rule for defining entities: Each instance of an entity must be uniquely differentiated from all other instances? Primary Keys are the way we enforce this rule.

A PK is a unique identifier. It is a column or group of columns whose combined values are unique for every row in the entity table. PK columns are traditionally shown as the first columns in a table (for human convenience, though they need not be in any order according to relational theory), and are called out explicitly as the PK in both ER Diagrams and Table Layouts.

Rules for Primary Keys

 
For each entity, you must determine what unique identifier to use (there may be several candidates for PK), and fill it in on your model. Make sure it adheres to all of these rules for PKs:
Rule Reason for Rule
Must be one per table Otherwise rows could not be uniquely identified.
Only one per table While there may be more than one column or set of columns that are unique, only one (set) is defined and used as the PK for modeling consistency.
Value must be unique If duplicates were allowed, it would no longer uniquely identify the row.
Value cannot be NULL NULL means unknown. If the value is unknown, theoretically it could be equal to some other value and therefore be a duplicate. NULL also means that a value for that attribute does not apply for that instance.
Value should not be changed History is tracked by PK. If the values in the cells change over time history for the original value becomes invalid.
May be made of any number of columns It may take a combination of several columns to achieve uniqueness. This is called a compound key.

PK Definition Issue: Uniqueness

 
Identifying a key with unique values can be troublesome. Many times a user already has identifying numbers they use for their major entities. Often the currently used numbering does not enforce uniqueness, or has changed format over time, so that a single row has more than one identifier. To make it worse, users tend to be emotionally and habitually attached to their legacy numbering schemes. Always ask explicitly if there are any known duplicates, or if the numbering scheme has changed in the past. If the answer is yes, often the best answer is to assign an incremental number, named to match the entity. That way uniqueness is assured in all cases. You can still carry the other historical numbers for reference, without using them for identification.

Intelligent Keys

 
PK Definition Issue: Intelligent Keys

An intelligent key is one that is made up of several parts, each of which has meaning to the user. Examples include phone number (for telecommunication companies) and Social Security Number for the IRS. Whenever the user offers an intelligent key as a candidate for PK, advise them to use a uniquely assigned number and carry the intelligent key as an attribute because there are several serious problems with this kind of key:

  • Over time the meaning of the components may change
  • Over time the values of the component may change for a row
  • Often they are composed of more than one type of data (alpha plus numeric)
  • Because of the kinds of changes noted above, intelligent keys usually already include:
    • Duplicates (more than one row with the same value)
    • Multiples (a single row that has had more than one value over time)

Intelligent Key Disaster Example

 
Our hardware vendor has always identified their parts in the past by the warehouse Aisle, Level and Bin number: A-3-47, so that they can instantly find whatever part the customer is asking for. An uninformed data modeler gives them a system that retains this PK for their Part table, as requested. Six months later they acquire another hardware concern and start selling twice as many types of parts. They expand their floor space and reorganize their inventory for customer convenience. Now they have to reassign new IDs to almost every part! It's a huge job, but they do it. Then a customer calls in with a question on a PO from before the change. Mass confusion results because the part numbers on the PO now refer to entirely different parts! They must create a translation table, or suffer endless confusion. All of their historical data has this problem. And it will all start over if they ever do another acquisition.
Before MergerAfter Merger
Key
A-1-1
A-1-2
A-1-3
A-1-4
Description
Bolt
Hammer
Nail
Pliers
Key
A-1-1
A-1-2
A-1-3
A-1-4
A-1-5
Description
Awl
Bolt
Hammer
Nail
Pliers
What is Part A-1-3?
What is the correct number for a Hammer?

PK Sample Data

 
Table layouts provide space for sample data values, which can help validate with the user that the combination of attribute values is in fact unique. For multi-column keys, it is best to put duplicate values in each key column, and vary them so that the combinations are unique. This makes it graphically clear to designer and user why multiple columns are needed for the PK. It may also unearth errors in assumptions about valid data values.

Try It: PKs

 
Directions: Identify a unique PK for each of the main entities defined for the hardware vendor by choosing the best PK candidate column from the related drop down list.

1.


2.


3.


Show completed model (Displays the complete model for this exercise.)

4. Do all of the entities shown correspond to a specified business need?
Yes   No   Some  

 3: Modeling Relationships

Objectives

 
This module enables the student to:
  • Define what a relationship is in the context of an LDM
  • Identify relationships for a specific LDM in table layout format from a given user requirements description

Relationships

 
What is a Relationship?
Functions in a business are naturally related in the real world, so the data that represents them is also related. A relationship is a statement about how two entities interact with one another: Customers buy Products; Employees work in Departments. Defining the relationships between tables is a core component of relational design.

Direct and Indirect Relationships

 
You must interview the user to determine if relationships are direct or indirect, because that decision depends on business rules.
  • Direct: An entity relates directly to another entity without reference to any other relationships. Example: A Customer initiates a Purchase Order.
  • Indirect: An entity relates to another entity through one or more other relationships. Ultimately all indirect relationships are constructed from direct relationships at the lowest level. Example: In our hardware scenario, a Customer orders Parts, but not directly. The customer actually initiates a Purchase Order on which each Part appears. PO/Customer is a direct relationship. PO/Part is a direct relationship. The relationship between Customer and Part is through the existence of the PO, so it is indirect.
If you model the direct relationships completely, the indirect ones are taken care of. To tell if a relationship is direct or indirect, ask the question: Does anything else have to happen for this relationship to exist? If the answer is no, the relationship is direct. If the answer is yes, the relationship is indirect.

Rules for Defining Relationships

 
Each relationship:
  • Has a name which is unique when combined with the names of the tables involved
  • Must apply equally to ALL rows in the tables
  • Has a Cardinality Statement: Dictates the number of rows of one table that can be related to a single row in the other table

Steps for Defining Relationships

 
There are basically three steps to defining the relationship in an LDM:
  1. Identify/document which pairs of tables have relationships between them
    • Is there a relationship?
    • Is it direct or indirect?
    • Name the direct relationships.
  2. Identify the cardinality of each direct relationship(discussion follows)
  3. Place Foreign Keys to model the relationships

Step 1: Relationship Matrix

 
The purpose of the matrix is to reduce the work to a minimum while guaranteeing that no possible relationships are missed. The matrix defines the nature and name of the direct relationships between each possible pair of entities. In an actual data modeling engagement, the matrix is not generally used, but for learning purposes it makes the underlying process clear.

Entity Aâ Entity BàCustomerPartPO
Customer   
Part   
PO   

How to Complete a Relationship Matrix

 
The following tasks describe how to complete a relationship matrix:
  • Alphabetize and count the entities identified
  • Create a matrix that is 1 column and 1 row bigger than the number of entities identified
  • Put all of the entities alphabetically down the left hand column of the matrix (Entity A)
  • Repeat them alphabetically across the top of the matrix (Entity B)
  • Shade the bottom half of the grid, below the top-left-to-bottom-right diagonal
    • Shading indicates that these cells can be ignored from now on
    • Shaded cells represent relationships already handled in the unshaded cells
  • For each unshaded cell, answer these questions:
    • Does a member of entity A relate to a member of entity B?
      • If the answer is no, shade the cell. Go to the next cell.
      • If the answer is yes, go to the next question.
    • Does anything else have to happen for this relationship to exist?
      • If the answer is no, this is a direct relationship. Go to the next question
      • If the answer is yes, shade the cell. This is an indirect relationship.
    • What is the relationship?
      • Describe it in one sentence.
      • Name the cell based on the verb that describes the relationship in that sentence.

Example

 
Possible Paths through the Flowchart:
  • Customer to Customer: Purple Arrows
  • Customer to PO: Green Arrows
  • Customer to Part: Red Arrows

QuestionY/NAction
Does Entity A (Customer) relate to Entity B (another Customer)?NoShade the cell.
Does Entity A (Customer) relate to Entity B (Part)?YesAsk 2nd question.
Does anything else have to happen for this relationship to exist?Yes*Shade the cell
Does Entity A (Customer) relate to Entity B (a PO)?YesAsk 2nd question.
Does anything else have to happen for this relationship to exist?NoAsk 3rd question.
What is the relationship? Answer: A customer initiates a PO. Name the cell.

* The customer must order the part (e.g. generate a PO).

Resulting Matrix

Entity Aâ Entity BàCustomerPartPO
Customer  Initiates
Part   
PO   

Try It: Matrix

 
Ask the following questions for the remaining unshaded cells, and choose the most appropriate action for each:

Entity Aâ Entity Bà CustomerPartPO
Customer  Initiates
Part 
Shade the cell
Name the cell
Shade the cell
Name the cell
PO   

Step 2: Cardinality

 
Types of Relationship: Cardinality

There are three main types of relationships between entities which are differentiated by their degree of cardinality:

TypeDefinitionExample
1:1 Zero or one row in table A exists for each row in table B. This is extremely rare. 1:1 relationships tend to become 1:M over time. The only valid examples of 1:1 relationships exist where time is permanently irrelevant. Heart Donors to Hearts.

Notice that Heart Recipients to Hearts might not necessarily stay 1:1 over time!

1:M A single row in table A may relate to zero, one or many rows in table B, but not vice versa. This is common. A specific Customer may have many POs over time, but a specific PO belongs to only one Customer.
M:M A single row in table A may relate to zero, one or many rows in table B, and vice versa. This is also common. A specific Employee may work in many Departments concurrently, and a specific Department will have many Employees.

Step 2: Defining Relationship Cardinality

 
Once you know how the entities are related, it remains to determine the degree of cardinality.
  • Define cardinality only for unshaded, named cells (e.g. direct relationships)
  • Ask and answer two questions, substituting the relationship name to clarify the question:
    • An instance of Entity A can have, at most, how many instances of Entity B concurrently?
    • An instance of Entity B can have, at most, how many instances of Entity A concurrently?
  • Answer each question with either "One" or "Many"
  • If both answers are "One" it is a 1:1 relationship. Ask further if this will always be true over time within the scope of the system. If the answer is no, it is really 1:M.
  • If both answers are "Many" it is a M:M relationship.
  • If one answer is "One" and the other is "Many" it is a 1:M relationship.
  • Post the result in the cell following the relationship name. Use 1:1, 1:M, or M:M notation.
See Flow Chart

Example

 
Example: Customer/PO Relationship Cardinality
QuestionResult
A Customer can specify, at most, how many POs concurrently?Many
A PO can be specified by, at most, how many Customers?One

Try It: Cardinality

 
Directions: Assume the same hardware store scenario for which you've already defined entities. Go through the matrix below and for each filled-in cell, designate the relationship cardinality (1:1, 1:M, or M:M).  
Entity Aâ Entity Bà Customer Part PO
Customer   
1:1
1:M
M:M
Part 
1:1
1:M
M:M
1:1
1:M
M:M
PO   

Step 3: Foreign Keys

 
What Is an FK?

Remember the rule for defining relationships: Cardinality dictates the number of instances of one entity that can be related to a single instance of the other entity? Foreign Keys are the way to model and enforce this rule and the way to model relationships. A PK identifies uniqueness while an FK identifies relationship. FK columns are documented explicitly (marked as FK) in both ER Diagrams and Table Layouts.

An FK is a column or group of columns whose combined values appear elsewhere in the model as a PK. Where it appears as a PK it is called the "parent" side of the relationship. Where it appears as an FK it is called the "child" side of the relationship.

FK Constraints: ND and NN

  • Duplication: An FK may or may not have all unique values, depending on the situation. If an FK must be unique, it is identified as ND (No Duplicates) in a table layout next to the FK designation.
  • NULLs: An FK may or may not have null values, depending on the situation. If an FK must have a known value, it is identified as NN (No NULLs) in a table layout next to the FK designation.
  • In an ERD these designations are not shown. They are covered in the attribute definitions.

Example

 
Home Construction Enterprise Example:
Lot# 14 is a PK value in the Lot table. It is an FK value in the House table. These two designations together tell us that there is a relationship between a Lot and a House. It also shows that Lot is the parent (the side where the PK is), while Lot# in the House table is the child (the side where the FK is). The PK is said to "migrate" from the parent to the child table.

Rules for Foreign Keys

 
You must document each direct relationship correctly in your model, and there is a different way to document each of the three types of relationships. Make sure it adheres to all of the rules for FKs:
Rule Reason for Rule
May be zero, one or more per table All tables in the model should be related to other tables, otherwise they are outside the scope of the system. However, a table may be exclusively on the PK end of all its relationships, and carry no FKs to other tables. This is frequently the case with minor entities which list valid values used elsewhere in the system. An example might be State table in the USA.
Value may be non unique In 1:M or M:M relationships, there are duplicates by definition on the M side of the relationship. An example might be a customer with several POs. On the PO table, the Cust# for that customer would appear multiple times.
Value may be NULL A particular row may not have a current relationship where one is possible. An example might be an employee who is currently not assigned to a specific department. The employee would have a NULL in Dept#.
Value may be changed Relationships do change over time. An example might be that an employee changes departments. In the Employee table the value for Dept# would change for this employee.
May be made of any number of columns Since an FK cannot exist without a matching PK elsewhere in the system, this rule which governs PKs also governs FKs. This is called a compound key.
Both the attribute and the value within it must exist as a PK in a related table If this rule is broken it is called a Referential Integrity Violation. An example might be: In the real world, an employee is assigned to a department. If the department is disbanded, and the employee's row is not updated to show the change, the employee appears to continue to work in a department that does not exist. The reference between tables that identifies the relationship is no longer valid.

Modeling 1:1 Relationships

 
The following are tasks for modeling 1:1 relationships:
  • Place the PK from either table as an FK in the other table.
  • Decide which entity is the parent and which is child based on the direction it is most likely to become 1:M over time.
  • An FK for a 1:1 relationship is always constrained as ND (no duplicates) because duplicates would, by definition turn it into a 1:M relationship. If it becomes 1:M over time, simply remove the ND constraint, since you've already modeled it in the correct direction to support that eventuality.
  • Fill in sample data that tests all known constraints.

Example: House to Lot (1:1 relationship) for a contracting enterprise. Note: True 1:1 relationships tend to migrate into 1:M over time (as this one does). However, it is a valid way to model a relationship that the enterprise does not expect to change over a long period of time (i.e. longer than they are interested in tracking it).

Notice that there are two valid ways to model this relationship. Always model in the direction most likely to become 1:M over time. Since this is a contractor, and not a house moving company, the Lot is parent and the House is child. A contractor might build a second house on the same lot due to fire, for example.

Modeling 1:M Relationships

 
The following are tasks for modeling 1:M relationships:
  • Place the PK from "1 side" of the relationship as an FK in the "many side" of the relationship.
  • An FK for a 1:M relationship may be constrained as NN (No NULLs. NULLs are discussed later in this module)
  • Duplicates must, by definition, be allowed, otherwise it is a 1:1 relationship.
  • Fill in sample data that tests all known constraints
Example: Departments to Employees (1:M relationship) for a retail enterprise. The business rule for this enterprise is that each employee is assigned to only one department at a time, and every employee must be assigned to a department at all times. Naturally, a department will normally have more than one employee assigned to it. There is only one valid way to model this relationship: Put the DeptID as an FK into the Employee table.

Notice that the sample data makes it obvious which way is correct:

Left Example Right Example
  • Any given department could only have one employee at a time
  • An employee might work in more than one department at a time
  • An employee might not be assigned to a department
  • Any given department can have multiple employees
  • An employee can work in only one department
  • An employee must be assigned to a department at all times

Modeling M:M Relationships (Associative Entities)

 
The following describes aspects of modeling M:M relationships (associative entities):
  • Modeling a M:M relationship requires the creation of a new entity.
  • An entity that models a M:M relationship is called an Associative Entity.
  • Place the PK from each entity as an FK within the PK of a new entity.
  • Name the new entity for its parent entities.
  • There is no definite parent or child "side" of the relationship. Both original tables can be considered parents, and the new table can be considered the child.
  • Both FKs are automatically constrained as NN because they are part of the PK.
  • Each FK may have duplicates, but the combination (e.g. the PK) may not.
  • Fill in sample data that tests all known constraints. Show duplicates in each FK.
Example: Departments to Employees (M:M relationship) for a manufacturing enterprise. The business rule for this enterprise is that each employee may be assigned to multiple departments at any one time, and every employee must be assigned to at least one department at all times. Naturally, a department will normally have more than one employee assigned to it. There is only one valid way to model this relationship: Create a new associative entity with a PK made of the two FKs, naming the new entity for its parent entities.

Try It: FKs

 
It is important at each step to document the model in order to validate its completeness and accuracy. Below is the model as it exists with the entities and PKs we've defined.

Do this exercise first on paper. Draw the three main entities (Customer, PO, and Part), identify the three direct relationships, and then place FKs with appropriate constraints (NN, ND) based on what you've just learned about how to model relationships. Then, do the exercise here, using your written model as the basis for your answers.

  1. For each entity pair shown, check all of the items that describe how you modeled it on paper. When done, click the Show Model button to see the complete model.
Entity1/Entity2 Items for Model
Customer/PO
1:M
(Choose three.)
Carry PK from Entity1 as FK in Entity2
Carry PK from Entity2 as FK in Entity1
Create a new Associative Entity
FKs within PK of entity that received it
FKs outside PK of entity that received it
FK is constrained to NN
FK is constrained to ND
PO/Part
M:M
(Choose two.)
Carry PK from Entity1 as FK in Entity2
Carry PK from Entity2 as FK in Entity1
Create a new Associative Entity
FKs within PK of entity that received it
FKs outside PK of entity that received it
FK is constrained to NN
FK is constrained to ND
Part/Part
M:M
(Choose two.)
Carry PK from Entity1 as FK in Entity2
Carry PK from Entity2 as FK in Entity1
Create a new Associative Entity
FKs within PK of entity that received it
FKs outside PK of entity that received it
FK is constrained to NN
FK is constrained to ND
Show Model (Displays the model with relationships defined.)

2. Click the Show Model button above, and answer the questions below related to it to validate that the indirect relationships we discovered are properly handled by the direct relationships we modeled:
What parts did customer 1 order? 12
91
12 & 91
Can't tell
What parts were ordered on PO 1104? 12
91
12 & 91
Can't tell

3. Do all relationships shown correspond to a specified business need?

Recursive Relationships

 
What Is a Recursive Relationship?

Good news: You've already done this! It models data hierarchies and networks. Another way to look at it is that recursive relationships are ones where an entity has a relationship to itself .

M:M Example: Part/Component Part
A part may be made of many component parts (e.g., be an assembly), and it may in turn be a component part of some larger assembly. Each part, from the most atomic (not composed of any other parts) to the most complex (composed of the most component parts and other assemblies) has its unique Part#, and is associated in the Part/Component Part entity with each of its component parts. This is also called a Bill Of Materials or BOM model, and reflects a network structure rather than just a hierarchy.

1:M Example: Employee/Manager
The Manager, is of course, also an Employee. This would be a 1:M relationship if each employee could only report to one manager, while a manager could manage multiple employees. In this case, the MgrEmp# would be carried as an FK in each Employee's row (identified by PK=Emp#) in the Employee table. Each row would therefore have two occurrences of Emp#: One identifies the employee, and one identifies the employee's manager. This represents a hierarchy, rather than a network structure.

Referential Integrity

 
What Is Referential Integrity?

Referential integrity ensures that the relationships (PK/FK references) are intact and valid.

For example, it is not possible in the real world for an employee to:

  • Work in a department that doesn't exist (e.g. a department was closed, and removed from the department table, but the employees who had worked in that department were not updated in the data warehouse to reflect that they were reassigned to other departments)
  • Report to a manager who is not an employee of the company (e.g. a manager resigned, and was removed from the employee table, but the employees who had reported to that manager were not updated in the data warehouse to reflect their new assignments to other managers)

Example

 
The PK/FK relationships defined in the LDM theoretically ensure that this could never happen, by invoking the rule: The value of an FK must always already exist in its related PK. This linkage is in fact an integral part of the definition of an FK. The implemented DBMS should automatically keep this from ever happening.

 4: Modeling Attributes

Objectives

 
This module enables the student to:
  • Define what attributes and constraints are in the context of an LDM
  • Identify attributes and constraints for a specific LDM in table layout format from a given user requirements description
  • Discuss NULLs and how they are generally addressed in data modeling

Attributes

 
What is an Attribute?

A relevant, elementary data item which describes or provides detail about an entity (e.g., a column).

Rules for Identifying Attributes

 
Rules for Identifying Attributes
  • Must pass these tests:
    • It is within the scope of the project
    • It can be clearly and uniquely defined
    • It is not described by other attribute(s) (e.g., only entities are described by attributes)
    • Its definition is not based on or derived from other attribute(s)
    • It is atomic in this context (e.g., cannot be broken down into smaller units of meaning)
    • It is drawn from a single domain of values
    • Appears in only one entity (except as a foreign key)
  • Attribute name must be:
    • Unique within a table
    • A singular noun, possibly modified by adjectives
  • Relational Rules:
    • The relative position of an attribute within an entity is irrelevant
    • Each attribute should be generally relevant to all instances of the entity

Types of Attributes

 
The following are types of attributes:
  • Key Attributes (covered above):
    • Primary Key (PK): Uniquely identifies each row in a table
    • Foreign Key (FK): Identifies the relationship between tables
  • Non-Key Attributes: All other attributes that are not part of any key. They are descriptive only, and do not define uniqueness (PK) or relationship (FK).
  • Derived Attributes: An attribute whose value can be calculated or otherwise derived from other existing attributes. Example: NetPay is derived by calculating GrossPay - TaxAmt.

Derived Attribute Issues

 
The attributes from which derived attributes are calculated are in the design, so carrying the derived attribute, in addition, creates redundant data. Derived attributes may be identified and defined in order to validate that the model can in fact deduce them, but should not be shown in the ER Diagram. The reason for this is that carrying redundant data goes against relational design theory and principles.

There are several good reasons to avoid carrying redundant data:

  • The data must be maintained in two places, which involves extra work, time and expense
  • There is a risk (likelihood) of the copies getting out of sync with each other, causing data inconsistency
  • It takes more physical storage
Strictly as a physical design decision, when the PDM is created from the LDM, the enterprise may decide to carry some derived data on the system for:
  • Increased comfort. Note: This is actually counterproductive because of the update synchronization issue. That is, the user may feel more comfortable, while the data is actually inaccurate!
  • Documentation of the derivation algorithm
  • Speed and ease of access for frequently accessed items that require large amounts of I/O to derive.

Linguistic Issues

 
People tend to be fairly casual in the way they think about and describe their business needs and processes. Ask questions to specify the meaning of names that are under consideration as attribute names. Here are two common situations to be alert for, and to resolve explicitly when they arise.
  • Synonyms: Different names which mean the same thing (e.g., JobCode vs WorkOrder#)
  • Homonyms: The same name means the different things to different users:
    • Acct# for the Accounting Department
    • Acct# for the Sales Department

Try It: Attributes

 
A hardware vendor has the following business requirements. Use the rules for identifying attributes on the previous page (a copy is provided below for your reference) to identify the attributes you would define.

OK. Here’s what we need. We want to track purchase orders (POs) for customers. Our POs have to be at the line item level so that we can tell what customers ordered what specific parts, and when, and how many of each and what we charged on the PO. And we track the customer’s name and address for shipping purposes, though shipping itself is not important right now. We also keep track various types of customers, for example, individuals and corporations. By the way, some of our parts are composed of other parts we sell. We often include special instructions on a PO, that we write into the remarks box on the form.

Rules for Identifying Attributes:

  • Must pass these tests:
    • It is within the scope of the project
    • It can be clearly and uniquely defined
    • It is not described by other attribute(s) (e.g., only entities are described by attributes)
    • Its definition is not based on or derived from other attribute(s)
    • It is atomic in this context (e.g., cannot be broken down into smaller units of meaning)
    • It is drawn from a single domain of values
    • Appears in only one entity (except as a foreign key)
  • Attribute name must be:
    • Unique within a table
    • A singular noun, possibly modified by adjectives
  • Relational rules:
    • The relative position of an attribute within an entity is irrelevant
    • Each attribute should be generally relevant to all instances of the entity

Domains

 
What Is a Domain?

A domain defines the valid values for an attribute: The actual data value in that column within each row in a table. An attribute should contain only one domain of data, but many attributes may be drawn from the same domain. A few examples of very common domains are:

  • Date
  • Time
  • Quantity
  •        
  • Amount (Currency)
  • Yes/No
  • Ranges of numeric values
  • A domain is defined by these primary characteristics:

    • Format (though the technical side of format is actually a PDM consideration, format is an LDM consideration in terms of valid data values for an attribute during design):
      • Data type (physical representation)
      • Length (storage format)
    • Values: The set of valid values that may appear in that column. All other values are considered errors and are prohibited from appearing in the column.
      • The value set is named.
      • It may change over time.
      • When the list changes, all values in the column must conform to the new list.
    • Atomic meaning: A domain has only one meaning within the system context.

    Keep Domains Atomic

     
    The same data may be atomic with respect to one system, but not to another, depending on whether subsets of the data have intrinsic meaning to that enterprise. An example is Phone Number (12 digits: 03-827-456-2837):
    • Non-Telcos: One domain since only the entire number taken as a whole has meaning
    • Telcos: Four domains, each with its different meaning, and should be modeled as four separate attributes

    To verify if an attribute (and its underlying domain) is atomic, ask the user if they would query against portions of it. If the answer is yes, break the attribute down into the smallest components that would be queried. Example: Would you ever want to report on any pieces of Customer Name? It is 2-5 domains depending on the user's answer:

    • Title (Mr., Ms., Dr. etc.)
    • First name
    • Middle name or initial
    • Last name
    • Degree (MD, DDS, LLB, etc.)

    While title and degree are not really part of customer's name, when a user mentions "customer name" in terms of their business needs, they may actually have these attributes in mind as well. Asking a question such as the one above will ferret out these additional attributes.

    Anticipated searches on subsets of the data do not indicate that the attribute should be broken out into its components in these cases:

    • Wild card or partial searches that do not indicate that fixed portions of the attribute mean different things
    • Date and Time: These are intrinsically constructed of three domains each (Day, Month, Year for Date, and Hours, Minutes, Seconds for Time), but people think of them and use them as a single piece of information about an entity. Also, standardized manipulation algorithms exist across enterprises, industries and most countries making it practical for implementation to treat these as three domains internally while actually storing them as a single attribute for convenience.

    Attribute Placement

     
    Overview

    Once you've identified the attributes, the question is which ones belong in which entities? A non-key attribute should be placed in only one entity. There is a formalized science to this process, called normalization, which is covered in detail later on in this course.

    However, to minimize work in actual implementation, the first cut for identifying and assigning attributes is generally by intuition. This results in a good model that can be validated and corrected by applying the formalized normalization rules.

    Assigning attributes intuitively is valid because in most cases the name of the attribute, or the way it's referred to tells you the answer. For example, customer name, address and type obviously belong in the Customer entity. Here's how they look after we've placed them:

    Validate: Atomic Domains?

     
    Notice that the user originally specified only two attributes: name and address. When we apply the rules and concepts presented above (especially about domains) we discover that each of the original attributes breaks into multiple attributes in order to get to the atomic level.

    Use realistic sample data to help the user validate the attributes in terms of domain:

    • Atomic level appropriate to the business context.
    • Meaning of each attribute. Example: Account# sample value is 123. The user expects ABC. It becomes clear that communication was incomplete, but the error is easy and inexpensive to correct at this point.
    • General data type (alpha/numeric). Above example applies.

    Validate: Missing Attributes?

     
    Users rarely just happened to mention everything they need at the atomic level. Always ask if there is more, and validate what is done so far. The easiest way to do this is to show them the table layout with sample data and solicit feedback. It's very likely here that the user would also want a start date, a phone number, perhaps e-mail, etc. for their customers. To save space and scrolling in this course, we're postulating that they are satisfied with the attributes shown. It is also a good idea to look at existing reports, screen layouts and file layouts to help identify attributes.

    Validate: Missing Minor Entities?

     
    Sample data can help identify if entities are missing. Remember the relational rule that seeks to minimize redundant data? When the same descriptive data is repeated within a column, where only a few values are valid, it is generally advised to create minor entities. A minor entity is a table, usually with a code value as its PK and a single description attribute to define the code. The code is carried elsewhere in the system as an FK, and the full description is accessed from the minor entity if needed for reporting. Minor entities are a good way to maintain data accuracy and reduce redundancy. Example: A Gender code table. Clues to missing minor entities include:
    • Attributes that include the word Type, Code, Category, Group or ID as part of their name. They represent various ways of categorizing the instances in an entity.
    • A small number of valid values in a column:
      • Repeated data values (especially long descriptions) in a column.
      • Repeated code values in a column. These indicate that a longer description exists which should be documented in a minor entity.

    Example

     
    Example: Employees may be paid by monthly salary or hourly wage. If there are many rows (say 50K employees), these long descriptions for each employee's PayType can cause problems:
    • During Modeling:
      • Redundant data: Goes against relational theory rules.
      • Inconsistency: Over time, the descriptions are input with various spellings (Mo. Sal., Mo Sal, Monthly Sal, etc). It becomes impossible to query by PayType with any accuracy.
    • For Implementation: Storage and maintenance savings.


    The Solution
    • Create a code to carry on each Employee row.
    • Create a minor entity to document the valid code values
    • Carry the minor entity's PK as an FK in the Employee table

    Quick Quiz

     
    1. What attribute above might contain multiple domains?  

    2. What attribute(s) above might indicate the need for a minor entity? (Choose three.)
      LastName
      FirstName
      Address
      City
      State
      Zip
      Type

    Constraints

     
    Constraint Concept Definition

    A constraint or assertion is a rule that defines limits on valid entity relationships or attribute values, based on business rules. Domain level constraints apply to all attributes in the domain. Attribute level constraints apply only to the specific attribute. Constraints can also apply across multiple entities and attributes.

    Constraint Examples

    • Date Domain: Must be a valid calendar date YYYY/MM/DD
    • BirthDate: Must indicate the person is older than 21.
    • GoldClubIndicator: Yes or No
    • Required Attribute (Be careful when defining this one, as it causes people to input incorrect data rather than allow the system to impede their work)
    • No duplicates allowed (ND)
    • Only numeric values are valid
    • An employee must belong to an existing department
    • An employee may be assigned to only one department
    • An employee may not work on any project that is not assigned to her/his department
    • HealthPlanCode must be: J, R, A or S
    • Only salaried employees may sign up for Health Plan "S"
    • If HireDate > ten years from today, VestedIndicator = Y

    Validate and Document

     
    Some constraints are explicitly described in the LDM (e.g., a relationship or a domain) while others are only described in documentation associated with the LDM. It is important to validate the entire model against all of the identified constraints to make sure that they are supported by the design when it is complete. Sample data and table layouts can be very useful in validating constraints with the user.

    NULLs Concept Definition

     
    NULL indicates that the value is unknown. NULL is not a blank or zero value. Reasons for NULLs in an attribute might indicate:
    • Not currently known but presumed to exist (e.g., marital status)
    • Refused, but presumed to exist (e.g., social security number)
    • Not relevant to this specific type of instance of the entity (e.g., type of hoof for a fish)
    • Not currently relevant (e.g., name of spouse for an single or divorced employee)
    • The attribute may not apply for all instances such as an end date where the instance has not yet ended (e.g. an active product with a null end date)
    NOTES:
    • Best practice is to not put in dummy values if the information is not know (such as a phony date if a person's birth date is not known.)
    • Different enterprises use the concept of NULLs in different ways. While one uses it for unknown, another may use it for not applicable. It is important to get these definitions clearly stated so that everyone is working from the same assumptions.

    NULL vs Zero or Blank

     
    Here are some examples of possible differences in meaning (per the enterprise's business rules):
    Attribute NameNULL meansZero MeansBlank Means
    SocialSecurityNumberUnknownPerson has no SS#Refused to supply it
    EmployeeAddressMissing dataNot validNot valid
    CharityContribAmtUnknownContributed $0.00Not valid
    SpouseNameNot relevantNot validNot supplied

    Issues with NULLs

     
    While the following are implementation issues, rather than logical modeling issues, they are the underlying reasons that the rules for NULLs were included in relational theory.
    • Aggregation functions across NULLs are undefined and generally give error messages:
      • What is the average of the following set of values: 20 + 30 + unknown + 40?
      • The average is 30 if NULLs are excluded and 18 if they are included
      • A division operation against values that include NULLs generally gives a divide check
    • Boolean operators (EQ, LT, GT, NE, etc.) on NULLs are clearly defined, but produce unpredictable results. (One NULL is not less or greater than another, but is never equal)
    • Joins generally ignore rows with NULLs in the join fields, considering them unmatched. This impacts outer join accuracy.

    Validate: Missing Sub-Entities?

     
    There is a relational rule that all of the attributes in an entity should describe all of its rows. Large numbers of NULLs in the sample data may show that some attributes in fact apply to only a defined subset of the rows. This may indicate the need for a new entity, called a sub-entity or generalization. Sub-entities that are split out from one entity have:
    • The same PK as the original entity and as each other
    • The PK for each is also an FK to the original entity
    • Different attributes from one another
    • No overlap in rows in the two sub-entities
    Note: Sub-entities are also called sub-types, with the original entity called a super-type.

    Example

     
    Example: Notice that salaried employees have a monthly salary amount and bonus percent, while hourly employees have an hourly amount and overtime authorization. All employees have a PayType. The presence of NULLs in so many rows and columns indicates the need for sub-entities.

    The Solution: Sub-Entities

    Create two more sub-entities from the original such that the NULLs no longer appear:

    Try It: Place Attributes

     
    It is important at each step to document the model in order to validate its completeness and accuracy. Below is the model as it exists with the entities and relationships we've defined.

    1. For each blue column name (Attribute) from the list on the left click the name of the entity it would be placed in as a non-key attribute from the drop down menu to its right. When done, click the Show Model button to see the complete model.
      Attributes 
      Address
      City
      CustType
      CustTypeDesc
      FirstName
      LastName
      OrderQty
      PartName
      PurchDate
      Remarks
      State
      UnitPrice
      Zip
      Show Model (Displays the complete model.)

    2. For which tables might you ask the user if there are additional attributes or constraints? Click the Check button to check the answer. Select the Cheat button to see the correct answer.
      A. N/A
      B. Customer
      C. Customer Type
      D. Part
      E. Part/Component
      F. PO/Part
      G. Purchase Order

    3. Do all of the entities shown correspond to a specified business need?  

    Congratulations

     
    Congratulations! You've just created your first mini-LDM.

    An ERD is provided for students who are already conversant in that format. Detailed discussion of ERDs is reserved for a later module.

    See an ER Diagram of the model.

     5: Bringing It All Together

    Objectives

     
    This module enables the student to describe and follow the high level steps for creating an LDM.

    LDM Process Review

     
    Essentially the previous modules have taken you through this process, but without focusing on its formalized structure. Below is a summary of the entire process as you would do it on a real life project.

    Step 1: Define Business Requirements/Scope

     
    The business requirements define the scope. As we develop the LDM we constantly validate it against the business requirements. Any component that does not relate directly to a business requirement is outside of scope for the project and is not included in the data model.

    Make sure all of the business requirements are understood!

    • What problems and issues need to be addressed? Usually these are related to increasing revenues or reducing costs.
    • What business questions must the model be able to answer?
    • What business functions must be handled?
    • What business constraints exist?
    • Is everyone who needs to be involved available to share their business requirements?
    Determine the method(s) of collecting requirements:
    • Review of existing material (such as existing reports)
    • Interviews
    • A combination of these methods

    Step 2: Define Entities

     
    • Develop the initial entity pool
    • Develop definitions for the entities
    • Eliminate entities that are outside the scope of the project
    • Identify a PK for each remaining viable entity
    • Document the viable entities
      • Optional: Use table layouts with sample data for verification with users
      • Required: Use ER Diagram for final official version

    Step 3: Define Relationships

     
    • Identify the relationships between entities
    • Use a relationship matrix to avoid missing any
      • Eliminate relationships that are outside the scope of the project
      • Eliminate indirect relationships
      • Name the remaining viable relationships
      • Identify the viable relationships' cardinality (1:1, 1:M, M:M)
    • Document the viable relationships in the model, using FKs
      • Optional: Use table layouts with sample data for verification with users
      • Required: Use ER Diagram for final official version

    Step 4: Define Non-Key Attributes

     
    • Identify and define relevant non-key attributes
    • Eliminate attributes that are outside the scope of the project
    • Use intuition to place each remaining viable attribute into only one table
    • Verify placement of viable attributes
    • Document the viable attributes in the model
      • Optional: Use table layouts with sample data for verification with users
      • Required: Use ER Diagram for final official version
    • Capture constraints in a document for final model validation

    Step 5: Verify the Model

     
    • Repeat all steps as necessary
      • Multiple iterations are usually required
      • Any change in model causes a ripple effect and so requires very careful review:
        • Changes in entities often impact relationship definition and attribute placement
        • Changes in relationships often impact attribute placement
        • Changes in attribute placement may impact other attribute placements
    • Constantly verify scope by asking:
      • What does this component mean, in terms of the business?
      • What business requirement does this component address?
    • Verify that all business requirements, questions and constraints are handled by the model
    • Do not take into consideration any physical implementation issues!
    • Stop refining the model when all data necessary to answer the business requirements are present

    What is Metadata?

     
    Metadata is information that describes the data structure of a system. An LDM is metadata. Metadata can be a difficult concept to grasp initially. Envision an LDM that describes a specific data model. Below is an example of how this would look for our hardware enterprise data model.

    Notice the Entity that describes the hardware enterprise entities is called "Entity". This is why metadata is difficult to understand and describe: It is self-referent. "Entity" describes multiple levels of meaning in the sentence above. The graphic is much easier to understand.

    Metadata and the PDM

     
    Just as there are two designs (LDM and PDM), there are two types of metadata. The LDM is a visual representation of metadata and is not implemented. When a database is implemented from the PDM, the data definitions for the tables, columns and relationships, as well as their physical characteristics (data type and length and indexing, etc.) are stored physically on the system. The tables that house these structural descriptions are also considered metadata because they define the implemented system.

    Try It: Create an LDM

     
    Step 1: Business Requirements
    Podunk University (known affectionately as PU) needs an HR system to manage faculty and staff. We have done a Business Discovery and revealed these business questions for you:

    • Which employees are staff, and which are faculty?
    • Which faculty member has what degrees from which universities, when, in what subjects?
    • What are faculty members versed in (as indicated by degree subject or post-grad project or publication)?
    • What organizations are sponsors for post-grad projects and publications for faculty? Who did the work, on what topics, and when? Only one organization sponsors any one project, but multiple faculty members can work on a project together.
    • What has each faculty member published (title, date, and publisher)? Did they edit or write it?
    • What job skills does each of our staff members possess and when did they acquire each one?
    • Who is assigned to which department(s) and when did they start there?

    NOTE: For more in-depth learning, you may want to create the complete model on your own on paper from these business requirements before doing this exercise.

    Step 2: Define Entities


    Identify the seven major entities described in the business requirements above by checkmarking on them on the list below:
    Degree
    Degree Date
    Degree Description
    Degree Subject
    Degree Type
    Degree University
    Department
    Dept Name
    Dept Start Date
    Employee
    Employee Name
    Job Skill
    Skill Description
    Post Grad Project
    Project Topic
    Project Start Date
    Project End Date
    Publication
    Author/Editor
    Publication Date
    Publication Title
    Publisher
    Skill Acquired Date
    Sponsor
    Sponsor Name
    Employment Type

    Step 3a: Identify Direct Relationships


    For each possible relationship (e.g., unshaded cell in the matrix), choose the appropriate cardinality from the drop down menu. Choose N/A for any cell where no relationship exists, or the relationship is indirect.

     DegreeDepartmentEmployeeSkillProjectPublicationSponsor
    Degree
    Department 
    Employee  
    Skill   
    Project    
    Publication     
    Sponsor      

    Step 3b: Model Direct Relationships


    Use this legend to choose one strategy for each direct relationship shown.

    ACarry the PK from Entity 1 as an FK in Entity 2
    B Carry the PK from Entity 2 as an FK in Entity 1
    CCreate a new Associative Entity

    Entity 1/Entity2Modeling Strategy
    Employee/Degree A   B   C  
    Employee/Department A   B   C  
    Employee/Skill A   B   C  
    Employee/Project A   B   C  
    Employee/Publication A   B   C  
    Project/Sponsor A   B   C  
    Publication/Sponsor A   B   C  
    Show Step 3 Model

    Step 4: Define Non-Key Attributes


    For each attribute, choose the table it belongs in from the drop down menu provided.
    AttributeTableAttributeTable
    Author/ Editor Project End Date
    Degree Date Project Start Date
    Degree Description Project Topic
    Degree Subject Publication Date
    Degree University Publication Title
    Dept Name Publisher
    Dept Start Date Skill Description
    Employee Name Skill Acquire Date
    Employment Type Sponsor Name
    Show Step 4 Model

    Step 5: Verify the Model


    Does the model answer all of the business questions?

    Business QuestionAnswer
    Which employees are staff, and which are faculty? Yes No
    Which faculty have what degrees from which universities, when, in what subjects? Yes No
    What subjects are faculty members versed in? Yes No
    What organizations are sponsors for post-grad projects and publications for faculty? Who did the work, on what topics, and when? Yes No
    What has each faculty member published? Did they edit or write it? Yes No
    What job skills do our staff members possess and when did they acquire each one? Yes No
    Who is assigned to which department(s) and when did they start there? Yes No

     6: Normalization

    Objectives

     
    This module enables the student to:
    • Define the different normal forms
    • Identify the advantages of each normal form for a given situation
    • Articulate the down-side of failing to normalize at each level
    • Describe how to adjust a model to conform to each normal form

    Overview

     
    What is Normalization?

    Normalization is a set of rules and a methodology for making sure that the attributes in a design are carried in the correct entity to map accurately to reality, eliminate data redundancy and minimize update anomalies. Most LDMs adhere first, second and third normal form, because a model in third normal form is as close to the real world conditions as is necessary or useful for most situations.

    Stated simply: One Fact, One Place!

    • 1st Normal Form (1NF), 2nd Normal Form (2NF) and 3rd Normal Form (3NF) are progressively more refined and apply to non-key attributes regarding their dependency on PK attributes.
    • 4th Normal Form (4NF) and 5th Normal Form (5NF) apply to dependencies between or among PK attributes.
    Normalization provides a rigorous, relational- theory-based way to identify and eliminate most data problems:
    • Provides precise identification of unique data values
    • Creates data structures which have no anomalies for access and maintenance functions

    The details of when to use each of these forms is covered in the following pages of this module.

    Power to the Data!

     
    The net benefit of normalization is that the power of a system is built into the data structure itself, rather than the applications. That is why it is important to design LDMs to be functionally unbiased. A normalized system gives the users control over their data. When they want more flexibility, they can enter more data in the existing design, rather than having to involve designers or programmers for enhancements and applications development. This makes the system much easier and less expensive to maintain over time.

    Easy to Remember!

     
    This mnemonic can help you remember the purpose for each of the first three levels of normalization:

    Every non-key attribute in an entity must depend on:

    The KEY - 1st Normal Form (1NF)
    The WHOLE Key - Second Normal Form (2NF)
    And NOTHING BUT the Key - Third Normal Form (3NF)

                                                                  -- E. F. Codd

    These three statements will be explained as we come to each normal form in turn.

    Intuition and Normalization!

     
    Placing attributes by intuition generally results in a model that is very close to normalized, and is much less work than formal normalization. However, the rules of normalization should be applied as a final check, and the model should be adjusted to comply with 3NF if discrepancies are identified.

    Redesign and Normalization

     
    In many engagements, data modeling is done from the current "model", which may never have gone through the normalization process, or any design process for that matter, and may or may not have been relational. Often the designers must work from a combination of user interviews and current system (e.g., PDM) documentation. These situations generally include many normalization errors that must be identified and corrected. Legacy "designs" may be so riddled with denormalizations that it may be easier to design from scratch, and then go back and validate that all the original requirements and functionality are handled.

    Earlier Is Better

     
    The earlier a design error is identified and corrected the less involved, and therefore the less expensive it is. For example, an error that requires redesign is many times more expensive if it occurs after implementation because it is more complex and interrupts production work, requiring data re-loading and having possible impact on application code and report generation (among other possible ripple effects). This is one of the main reasons for normalization.

    Sample Data: The Designer's Best Friend

     
    Sample data is the easiest, most efficient, and simplest way to detect design errors early in the process when they can be quickly corrected in the LDM. It is best to do this on a related set of tables (perhaps by functional area) rather than on the entire model, which may involve hundreds of entities and be too complex to grasp as a whole.

    The issue is early validation and correction. Attribute names (if done carefully) make the model understandable, but allow users to assume that they contain the expected data. And two users' expectations may not be the same (e.g., Acct# for the accountant is a 9 digit number, for sales it is really a customer identifier that is a 5 digit alpha value). Someone is going to be disappointed when the first report is printed. Someone (probably you) will have to take up the slack when the error is discovered. If they'd seen the table with sample data in it, they would have clarified their needs instantly and the design would have been corrected before implementation.

    1st Normal Form

     
    What is First Normal Form (1NF)

    Each non-key attribute should have a 1:1 relationship with the PK of the entity. There should be no repeating groups (multiple occurrences of attribute(s) based on the same domain).

    Repeating Groups

    An example here might be if you kept employee pay information on the Employee table. You'd need pay date, pay amount, paycheck number, etc. And you'd need to add a new occurrence of that group of information every pay period.

    Example: Mixed Domain Names

     
    Here is how a repeating attribute looks, when domain name is not included in the attribute name:

    This is a very common example, these days, as more and more phone options become available to people. You can see that the user may want to break "phone" into home phone and office phone, and probably will want to add voice mail phone, cell phone, and likely others over time. How would you know how many occurrences to plan for? No matter what you decide, over time it's likely to outgrow the planning.

    The Solution

     
    It's best to avoid the problem altogether by removing the attribute(s) that repeat into their own entity:

    With this model there is no limit to the number of phones for a customer. If a customer adds 20 or even 50 phones over time, all that is required is data entry rather than design change and implementation. This is what we mean by "putting the power in the data". This model (CustomerPhone table) is also more flexible. It could support information about each phone number, such as when the customer prefers each to be used, etc. by adding those attributes to the CustomerPhone table during the design phase.

    Food for Thought

    Notice that this model would not support a customer having more than one of a given type of phone number (e.g., multiple voice mails). How would you modify this model to do that?

    Repeating Groups

     
    Why it Matters: Data Problems

    The number of repetitions tends to grow over time:

    • No matter how many occurrences you design into the model, in time, you run out
    • Extra coding is required to handle retrieval and update for the occurrences, and occurrences with NULLs in them
    • Eventually requires redesign, and re-loading of tables already populated in the data warehouse to accommodate more occurrences
    • Is very expensive, inconvenient (interrupts production) and completely needless if 1NF is observed in the original design.
    Recognizing the Pattern

    Look for an attribute (or set of attributes) based on the same underlying domain(s):

    • Attributes with similar names
    • Attributes with different names but the same underlying domain of values
    • Attributes or attribute groups with NULLs in many rows
    • Ask: Might we ever need more columns like this one (or group) for a row in this table?
    • Example: Might we need more types of phone numbers for a customer?
    Avoiding the Problem
    • Name attributes to end with their domain name to make repeating groups easy to identify (BirthDate and HireDate instead of Date of Birth and Date Hired)
    • Use sample data during design to identify this problem early
    Correcting the Model
    • Single attribute that repeats:
      • Create a new entity
      • Carry its PK as an FK in the original entity where the violation occurred
    • Remove a repeating group of attributes to its own entity (existing or new) and carry its PK as an FK in the original entity where the violation occurred.

    The Origin of 1NF

     
    When relational theory was first developed, the existing systems often included variable length records, which table structures do not permit. 1NF was developed to address this issue, and states that all rows will have the same record layout, with no repeating items or groups, and likewise that all instances should have data for all attributes defined for an entity. There are two ways in which a record length might vary, if repeating groups are permitted:
    • Variable Record Length: Different instances in the table might have different numbers of the repetitions of the group (such as paycheck information on employees, where some employees have been there a long time, and some a short time). This is simply not permitted in relational design. Tables are structurally incapable of handling it.
    • Fixed Record Length with set number of occurrences. Tables can be designed to do this, but it creates serious problems:
      • Not every instance will actually use all the defined occurrences (e.g., 5 phone number fields are defined, and not all customers have five phone numbers)
      • Additional programming is needed to handle the occurrences, and handle occurrences that have NULLs in them
      • The number of needed occurrences grows over time, eventually requiring record re-design and data load (e.g., No one used to have 5 phone numbers. Now many people do, and the options are expanding rapidly.

    Try It: 1NF

     
    In the model below, identify the four repeating attribute(s) that you would remove from this table by clicking them in the list below. Click the Check button when you've check-marked all you intend to. Click Cheat to see the correct answer.

    House#
    Addr
    #BedRm
    #Baths
    #DishWshr
    #Ofcs
    Price
    N/A

    Show Model (Displays the complete model for this exercise.)  

    2nd Normal Form

     
    What is 2nd Normal Form (2NF)?
    Each non-key attribute must relate directly to each column of the PK. The attributes depend on the whole key.

    Example: 2NF Violation

     
    Here is how attributes that violate 2NF look. Notice that social security number relates exclusively to employee, regardless of department. Also, department manager relates exclusively to department, regardless of what employees work there. As a result, both columns (SSN and DeptMgr) have redundant data in them. Percent time, however, relates to a specific employee in a specific department, and is a valid attribute for this associative entity.

    Problems that arise from this 2NF violation:

    • Social security number has to be constantly repeated
    • Manager has to be entered each time an employee is assigned to a department
    • Can't find a manager's name unless there are employees assigned
    • Can't find an employee's social security number unless they're assigned to a department
    • Could incorrectly enter social security number as different values for the same employee
    • Could incorrectly enter the department manager's name as different spellings for the same manager
    • Counting employees by manager could be inaccurate if the spelling varied
    • If a manager changed names, it would have to be updated on many records, and accuracy of update depends on consistent spelling.

    The Whole Key

     
    Why it Matters: Data Problems

    Redundant data (e.g., many occurrences of the same description):

    • Data must be input many times, costing time and storage
    • There is no way to pre-define values for validation
    • Updating is more complex when an attribute is in more than one place
    • Data becomes inconsistent over time (no control over spelling)
    Recognizing the Pattern
    • Look for redundant data
    • Sample data is the easiest way to make this violation visible
    • Ask yourself: Does this column relate more to PK column A than to PK column B?
    • Example: Does Social Security Number relate more to Employee than to Department?
    Avoiding the Problem: Use sample data during design to identify this problem early.

    Correcting the Model

    • If possible, move the attribute into an existing table where it relates directly to the entire PK
    • If necessary, create a minor entity:
      • Define a PK the attribute does relate directly to (usually a code)
      • Carry the attribute as a description of the code value
      • Carry the code only in the entity where the code relates to the entire PK

    Try It: 2NF

     
    Part 1 Directions: Which columns violate 2NF? And what tables would you move them to? Select your answers from the drop down menus.

    ColumnMove?
    Emp#
    Dept#
    SSN
    DeptMgr
    %Time
    Show Model (Displays complete model for Part 1.)

    Part 2 Directions: Did this Solve All of the Problems? Choose from the drop-down menu for each issue.
    Social security number has to be constantly repeated
    Manager has to be entered each time a an employee is assigned to a department
    Can't find a manager's name unless there are employees assigned.
    Can't find an employee's social security number unless they're assigned to a department
    Could miss-enter social security number as different values for the same employee
    Could miss-enter the department manager's name as different spellings for the same manager
    Counting employees by manager could be inaccurate if the spelling varied
    If a manager changed names, it would have to be updated on many records, and accuracy of update depends on consistent spelling

    3rd Normal Form

     
    What is 3rd Normal Form?
    Every attribute describes the entity's PK rather than describing another attribute or the PK of a missing entity: The attributes depend on nothing but the key.

    This rule is essentially the same as 2NF, except that it applies equally to entities with single-column PKs. Its purpose is similar, and the issues that arise related to violations are similar to 2NF. Likewise, the 2NF methods for correcting it are appropriate.

    Example 3NF Violation

     
    Here is how attributes that violate 3NF look. Notice that department name relates exclusively to department, not to employee at all. As a result it has redundant data for every employee in the same department.

    Problems with this 3NF violation:

    • Department name has to be constantly repeated
    • The same department name can be spelled many ways
    • A department can't exist unless there are employees assigned
    • Counting employees by department could be inaccurate if the department name spelling varied
    • If a department changed names, it would have to be updated on many records, and accuracy of update depends on consistent spelling

    Extraneous Attributes

     
    Why it Matters: Data Problems
    Redundant data (e.g., many occurrences of the same description):
    • Data must be input many times, costing time and storage
    • Updating is more complex when an attribute is in more than one place
    • Data becomes inconsistent over time (no control over spelling)

    Recognizing the Pattern
    Look for redundant data

    • Sample data is the easiest way to make this violation visible
    • Ask: Does this column relate more to any non-key attribute than to the PK?
    • Example: Does department name relate more to department number than to employee number?
    • Ask: Does this column relate more to something other than the PK?
    • Example: Does department name relate more to something other than employee number?

    Avoiding the Problem: Use sample data during design to identify this problem early.

    Correcting the Model

    • If possible, move the attribute into an existing table where it relates directly to the PK
    • Create a minor entity:
      • Define a PK the attribute does relate directly to (usually a code)
      • Carry the attribute as a description of the code value
      • Carry the code only as an FK in the entity where the code relates directly to the PK

    Try It: 3NF

     
    Assume for this model (unlike the Podunk University model) that an employee may work in only one department. Move the attributes that relate exclusively to a portion of the PK to the entity that has that PK:

    Column Move?
    Employee.Emp#
    Employee.SSN
    Employee.DeptName
    Department.Dept#
    Department.DeptName

    Show Model (Displays the complete model.)

    Directions: Did this Solve All of the Problems? Choose from the drop-down menu for each issue.
    Department name has to be constantly repeated.
    The same department name can be spelled many ways.
    A department can't exist unless there are employees assigned.
    Counting employees by department could be inaccurate if the department name spelling varied.
    If a department changed names, it would have to be updated on many records, and accuracy of update depends on consistent spelling.

    4th and 5th Normal Forms

     
    4NF and 5NF are mentioned here only for your information. The vast majority of models never apply these levels. Essentially these NFs are designed to impose the same level of consistency within a PK composed of more than two columns as the first 3NFs impose on attributes outside the PK.

    Entities with more than two columns in the PK often contain no non-key attributes. If non-key attributes do exist, 4NF and 5NF violations are unlikely because bringing the model into 3NF compliance precludes them.

    Usually 4NF and 5NF violations occur when the definition of the information to be represented is ambiguous (e.g., the user has either not really understood what they are asking for, or they have failed to state it clearly enough for the designer to understand it). 4NF and 5NF really represent two flip sides of the same issue: The PK must contain the minimum number of attributes that accurately describe all of the business rules.

    Formal Definitions

     
    • 4NF: The entity's PK represents a single multi-valued fact that requires all PK attributes be present for proper representation.
    • 5NF: The entity represents, in its key, a single multi-valued fact and has no unresolved symmetric constraints. A 4NF entity is also in 5NF if no symmetric constraints exist.

    Normalization Review

     
    All NFs
    The normalization rules are a precise way to insure that all of the attributes are placed to:
    • Map ideally to the PK of the entity they are in (e.g., be placed in the most appropriate entity)
    • Map as closely as possible to the real world
    • Require the least possible redundant data entry
    • Support data consistency over time
    • Put the power of the design into the data (e.g., data entry) rather than application programs

    Summary of NFs

     
    1NF: No repeating attributes or groups of attributes.
    2NF: Attributes must not relate to a portion of the PK.
    3NF: Attributes must not relate to attributes outside of the PK.

    Summary of Corrective Measures

     
    Whenever a violation is identified:
    • Remove the offending attribute(s) from the entity where the violation appears
    • Place it into the existing entity or a newly created entity whose PK it optimally describes
    • Create a PK/FK relationship between the entities where the attribute was and where it is now
      • 1:1 or 1:M relationship: Carry the PK of the entity that houses the attribute as an FK in the entity where the attribute was in violation
      • M:M relationship: Create an associative entity

    Try It: Normalization

     
    The Challenge: We're being asked to migrate the hardware enterprise's legacy system into the new design. It is one big Excel spreadsheet, that maps directly to the PO, so it has several normalization violations.

    Directions: For each NF, click the checkbox for each attribute that violates that NF. Click N/A if there is no violation for that NF.

    1NF: Repeating Groups
    (Choose three.)
    Cust#
    CustName
    OrdQty
    Part#
    PartName
    PO#
    N/A
    2NF: The Whole Key
    (Choose one.)
    Cust#
    CustName
    OrdQty
    Part#
    PartName
    PO#
    N/A

    3NF: Nothing But
    the Key (Choose one.)
    Cust#
    CustName
    OrdQty
    Part#
    PartName
    PO#
    N/A

     7: Modeling Time

    Objectives

     
    This module enables the student to create a logical model that includes the historical data.

    Modeling History

     
    So far, we've looked at models that reflect the current state of a business, ignoring the passage of time. In reality, businesses (and their LDMs) must track history of all their major functions. Often the real on-going value in a data warehouse depends on the historical detail available. As in all other aspects of modeling, it is crucial to understand the business requirements.

    History Example

     
    Our hardware enterprise currently includes the Part table shown on the left. Notice that while we can tell the price of a part, we have no way of tracking changes in the price over time.

    To track history, we'll clearly need to add the date each price becomes effective for each part. We can't add multiple dates into an existing Part table row, because it would violate 1NF (no repeating attributes). Instead we create a history table in which the PK is composed of the Part# (FK from the Part table) and the part price effective date (a non-FK component of the PK). This essentially creates a M:M relationship between Part and Calendar Date (an implied entity). Now we can track as many price changes as we need to.

    Current Status vs. History

     
    Notice that we removed UnitPrice from the Part table. The most recent date in the history table is the current price. If kept in the Part table it would be redundant derived data, which we always strive to avoid in the LDM. During PDM the designer may elect to carry a "current status" attribute in the original table as a way of reducing processing overhead. It is not correct to model it into the LDM.

    History Modeling Summary

     
    • Move attributes that need to be recorded on successive dates into the History Entity
    • Date becomes an attribute (though not an FK) in the History Entity's PK
    • This modeling technique works equally well for entities with multiple-column PKs:
      • Associative Entity histories
      • Dependent Entity histories

    Try It: Model History

     
    Podunk University has decided that they need to track all department assignments across time, including cases where someone works in the same department more than once with a time gap in between. Right now their model only shows the original start date for each employee in each department, but no end date. And while they're making changes, they'd like to track percent time for each employee in each department assignment. How would you modify their model (below)?

    Recommendation: To save time, effort and errors do this exercise first on paper. Then, do the exercise on-line, using your written model as the basis for your answers.

    Directions: For each purple column name below click as many checkboxes as required to indicate all of its role(s) in a more appropriately modeled Employee/Department History entity. When you're done, click the Check button.

    Dept#
    PK
    FK
    Non-Key Attribute
    Emp#
    PK
    FK
    Non-Key Attribute
    EndDate
    PK
    FK
    Non-Key Attribute
    StartDate
    PK
    FK
    Non-Key Attribute
    %Time
    PK
    FK
    Non-Key Attribute

     8: Using ER Diagrams

    Objectives

     
    This module enables the student to read and interpret an LDM in ER Diagram format.

    Overview

     
    Types of Models
    LDMs may be used at varying levels of detail:
    • Corporate/Enterprise Model: This is a high-level model commonly called an E-R model. Includes entities and relationships, but no attributes. M:M relationships are allowed.
    • Key-Based Model: Includes entities, relationships and key attributes. M:M relationships are not allowed.
    • Fully Attributed Model: This is a complete model with all entities, relationships and attributes defined.

    ERD Definition Review

     
    An LDM is ultimately documented in an ERD (Entity-Relationship Diagram). The ERD is a standard modeling technique for defining and graphically representing an LDM:
    • Entities: Represent people, places, things, concepts, or events for which the business has the means and the will to collect and store data. Also known as Tables.
    • Relationships: Business associations that link occurrences of one or more entities and tell how they relate to one another.
    • Attributes: Data facts about an entity or a relationship. It is the lowest level of information that still has business meaning without further decomposition. Also known as columns.

    Note: ERDs can be LDMs or PDMs. In actual use, LDM ERDs don't usually have abbreviations, but PDM ERDs frequently do due to the physical database name size limitations. Abbreviations are used in this course to reduce scrolling for your convenience.

    Translating from Tables to ERDs

     
    Throughout this module ERDs are shown for various data modeling scenarios covered earlier in the course. For each, a button is provided so that you can see the equivalent model using table layouts for comparison. This course is using the IDEF1x notation. There are other notation styles such as Information Engineering (uses the crows foot to designate the many side of a relationship.)

    ERD Symbols

     
    ERD Development Tools
    There are several commercially available tools for developing ERDs, and each has a slightly different set of standards and symbols. We will be using ERwin in this course because it is the most commonly used in the field.

    ERD Conventions

     
    The following conventions are used in this course, and in ER diagramming in general. The modeling notation shown is IDEF1X and is ERwin compatible. Note: Some customers and modelers prefer using IE notation.
    Independent entity. An independent entity does not depend on another entity for its identification. It should have a single-column PK. PK attribute appears above the horizontal line.
    Dependent entity. A dependent entity depends on one or more other entities for its identification. It generally has multiple columns in its PK, one or more of which is also an FK. All PK attributes appear above the horizontal line.
    One-to-Zero, One, or Many occurrences (1:0-1-M). Solid lines indicate a relationship (join path) between two entities. The dot identifies the child end of a parent-child relationship between two entities. The PK of the parent is part of the PK of the child.
    The dotted line indicates that the child does not depend on the parent for identification.
    One-to-At least One or More occurrences (1:1-M)
    One-to-Zero, or at most One occurrence (1:0-1)
    Zero or One-to-Zero, One, or Many occurrences (0-1:0-1-M). The diamond shape on the originating end indicates the relationship is optional. Physically, this means that a NULL value can exist for an occurrence of any row of the entity positioned at the terminating end (filled dot) of the relationship.
    Zero, One or Many-to-Zero, One, or Many occurrences (M:M or 0-1-M:0-1-M). The diamond shape on the originating end indicates the relationship is optional. Physically, this means that a NULL value can exist for an occurrence of any row of the entity positioned at the terminating end (filled dot) of the relationship.
    This is an incomplete subtype symbol which indicates that NOT all possible subtype entities to the supertype are shown.
    This is a complete subtype symbol which indicates that all possible subtype entities to the supertype are shown.

    Dependent Entity

     
    Entity Types
    • An Independent Entity: Does not rely on another entity for its identification. Its single PK attribute appears above the horizontal line within a square box.
    • A Dependent Entity: Does rely on another entity for its identification. Its multi-attribute PK appears above the horizontal line within a rounded box and includes at least one FK from a parent table. It may or may not include PK components that are not FKs. History, sub-entities and associative entities are considered dependent.

    Dependent Entity Example

     

    In our hardware enterprise model, we defined a M:M relationship for PO/Part, shown in an associative entity. In that model, a customer can order the same part only once on any given PO. This means that if a customer initiates a PO and orders 5 widgets, then calls back a day later (before the PO is filled) and wants to add 6 more widgets to the PO, the order entry person must update the current PO/Part record and change the order quantity from 5 to 11. This may in fact be fine. Or it may not, in which case a change to the model is required (shown below). If the hardware vendor wants to track this kind of change, they may implement a constraint that says that order quantity may not be changed, but that a new line item should be added to the PO instead to reflect changes. This would allow the same customer to order the same part multiple times on the same PO. The model would look like this: PurchaseOrder is an independent entity. POLineItem is a dependent entity. A line item has no existence without its PO. Note that there is still a M:M relationship between PO and Part, but that now it is mediated (indirect) through PO/LineItem.


    See Table Equivalent

    Sub-Entity

     
    Review: Sub-Entity = Sub-Type
    Sub-entities are used to facilitate the relational rule that all of the attributes in an entity (a "Super-Entity") should describe all of its rows. If some attributes in fact apply to only a defined subset of the rows it may be modeled as a new entity, called a sub-entity or sub-type. Sub-entities that are split out from one entity have:
    • The same PK as the original entity and as each other
    • The PK for each is also an FK from the original entity
    • Different attributes from one another
    • No overlap in rows in the two sub-entities

    Sub-Entity Example

     
    The sub-types for an entity are shown in rounded boxes below a sub-type symbol.


    See Table Equivalent

    Types of Sub-Types

     
    • Complete: There are sub-entities for all of the possible sub-types. Every instance of the parent table must participate in one and only one of the possible sub-types defined. The ERD symbol for this is a circle with two parallel horizontal lines below it: In the example above, every employee is either hourly or salaried.
    • Incomplete: There are sub-entities for only a subset of the possible sub-types. Instances of the parent table may or may not participate in one of the possible sub-types defined. The ERD symbol for this is a circle with one horizontal line below it: In the example below an employee might be something other than salaried or hourly (e.g. contract), and there is no additional sub-type information for such employees.

    See Table Equivalent

    ERDs and NULLS

     
    In ERwin, you can't show NULL conditions in the logical model, but you can in the physical model. So, this can be a way to tell if an ERD is an LDM or a PDM. There is no way to show in ERwin notation that NULLs are or are not allowed in a given attribute. Conditions related to NULLs are documented in the constraints list, rather than the diagram. This is one place where table layouts with sample data can be very helpful.

    Entity State NULLs

     
    An entity state NULL occurs if the attribute instance might or might not be NULL, depending on the state of the particular entity instance in which it occurs. In practical terms this means that an identifiable subset of the rows in a table may have NULLs in this attribute. This condition is a warning flag that a sub-entity may need to be defined.

    Entity Type NULLs

     
    An entity type NULL occurs if the attribute instance is always NULL for the same types of instances of the entity. In practical terms this means that an identifiable subset of the rows in a table must have NULLs in this attribute. This condition indicates that a sub-entity does need to be defined. An example would be the Hourly/Salaried Employees as shown above. Hourly Employees always have NULL in MoAmt and Bonus. Salaried Employees always have NULL in HrAmt and Overtime.

    Relationship Types

     
    Identifying vs. Non-Identifying
    • An Identifying Relationship: The PK of the independent entity migrates to the PK of the dependent entity. Depicted as a solid line.
    • A Non-Identifying Relationship: The PK of the independent entity migrates to the area outside of the PK of the dependent entity. Depicted as a dotted line.
    Note: Sometimes identifying and non-identifying relationships are called defining and non-defining.

    Identifying Relationship Example

     
    PurchaseOrder to POLineItem is a dependent relationship. POLineItem to Part is an independent relationship. Notice that this is the same model shown above to distinguish independent from dependent entities.


    See Table Equivalent

    ERD and M:M

     
    Several Ways
    There are several ways to show M:M relationships on an ERD. Which one you use depends on the nature of the M:M relationship and whether or not you need to track attributes for the association.

    M:M without Non-key Attributes

     

    See Table Equivalent

    M:M with Non-key Attributes

     

    See Table Equivalent

    M:M Recursive Relationship without Non-key Attributes

     

    NOTE:
    This model could also be shown "resolved" as is the above example. In that case it would show a Part/Component Part entity, without additional attributes. This is the way M:M relationships are shown in many models. It is the only way to show it if there are attributes for the Part/Component Part entity.
    See Table Equivalent

    ERDs Minimize M:M Relationships

     
    There will be many M:M relationships in every business, and these relationships can get extremely complex (the number of FKs in the PK is theoretically unlimited). To simplify the ERD the designer may choose to include associative entities only for relationships where actual data about the association is being tracked, while showing unattributed M:M relationships as a line with double labels (see example above). When using table layouts, all associative entities are always shown, for clarity, consistency and accuracy.

    ERD vs. Table Layouts

     
    FormatBenefitsProblems
    ERD
    • More flexibility in showing variations of cardinality
    • Fits more entities per page of print
    • Industry standard for LDM documentation
    • Shows the big picture which makes it easier to understand the relationships
    • Errors are sometimes invisible because detail is lost (no sample data)
    • Complex: Looks like a circuit board if there are many entities
    • Special training required
    Tables
    • Identifies data modeling errors early by showing sample data
    • More intuitive to understand
    • Easier for non-technical people to use for communication with each other and with technical people
    • Takes too much space
    • Takes more time to populate sample data
    • Precise cardinality not as easy to see
    • Not the industry accepted standard for documentation
    • Harder to understand the big picture and to see the relationships.

    In Practical Reality

    Often design teams do not have time to specify sample data and draw out tables. They usually model directly into an ERD and just discuss sample data along the way as possible. Sample data may not always be available. Frequently the designers have to do the best they can with what they've got and then validate.

    Try It: ER Diagrams

     
    Step 1 of 3
    Take the table layout that we created for Podunk University (PU) and translate it into ERD format.

    For each table below, choose the type of entity it represents from the drop-down menu provided. Use table layouts shown below for reference.

    Degree Faculty/Publication
    Department Job Skill
    Employee Project
    Employee/Department Publication
    Faculty/Degree Sponsor
    Faculty/Project Staff/Skill

    Step 2 of 3

     
    As a result of Step 1, we've drawn the ERD below, showing independent entities in square boxes, and associative entities in rounded boxes. The next step is to modify the diagram to show PKs.

    Directions: For each table, choose the the PK by clicking on the bottom-most attribute that is part of the PK.

    Employee
    Emp#
    EmpType
    EmpName
    Faculty/Degree

    Emp#
    DegType
    Subject
    Univ
    AwardDate

    Degree
    DegType
    DegDescr

    Employee/Dept

    Emp# (FK)
    Dept# (FK)
    StartDate

    Department
    Dept#
    DeptName

    Staff/Skill

    Emp# (FK)
    Skill# (FK)
    AcquireDate

    Job Skill
    Skill#
    Descr

    Faculty/Project

    Emp# (FK)
    Proj# (FK)

    Project
    Proj#
    Spon# (FK)
    Topic
    StartDate
    EndDate

     


    Sponsor
    Spon#
    Name

    Faculty/Pub

    Emp# (FK)
    Pub# (FK)
    Auth/Ed

    Publisher
    Pub#
    Spon# (FK)
    Title
    Publisher
    PubDate

    Step 3 of 3

     
    Symbol Legend:
    A
    B
    C
    D
    Directions: Using the symbol legend to the right, for each set of related tables below, choose the correct relationship symbol (and direction) to apply to the ERD. Use table layouts shown above for reference. Hint: It may be easier to do this on paper first for reference.




    Employee to Employee/Department
    A   B   C   D
    Employee to Faculty/Degree
    A   B   C   D
    Employee to Faculty/Project
    A   B   C   D
    Employee to Faculty/Publication
    A   B   C   D
    Employee to Staff/Skill
    A   B   C   D
    Employee/Department to Department
    A   B   C   D
    Faculty/Degree to Degree
    A   B   C   D
    Faculty/Project to Project
    A   B   C   D
    Faculty/Publication to Publication
    A   B   C   D
    Sponsor to Publication
    A   B   C   D
    Sponsor to Project
    A   B   C   D
    Staff/Skill to Job Skill
    A   B   C   D

    Show completed model (Displays the complete model for this exercise.)

     9: Resources

    Data Modeling History

     
    This is optional additional information about the evolution of data modeling as a field, and is not necessary to the practical understanding or utilization of data modeling theory and techniques.

    Relational Theory's Founding Fathers

     
    • Dr. E.F. Codd, creator of the Relational Model published these two seminal papers:
      • A Relational Model of Data for Large Shared Data Banks, 6/70
      • Extending the Database Relational Model to Capture More Meaning, 12/79
    • C.J. Date: An associate of Dr. Codd, who co-created the Relational Institute with him. Published the first book on Relational Theory:
      • An Introduction to Database Systems, 1974

    1960's: Real Men Automate Processes!

     
    • Data design is seen as a nuisance.
    • Tool development concentrates on automating flow charts.
    • Processes are individually automated.

    1970's: Come On People, Now, Let's Get Together!

     
    • Integrating the design of processes and use of System Development Methodologies (SDMs) is declared to be the savior of poor development performance.
    • Tool development concentrates on automating process design.
    • Normalization is born but spends an unnoticed adolescence.

    1980's: We Don't Need No Stinking Process Models!

     
    • Two strong camps appear: the "data devotees" and the "process people".
    • Automated normalization tools and data synthesis are touted as "the saviors of poor development performance" (deja vu all over again?).
    • Tool development concentrates on automating both normalization and "picture drawing" of data models.
    • Data decomposition appears and grows.

    1990's: Come On People, Now, Let's Get Together! (redux)

     
    • The need to integrate business process analysis and data modeling is recognized.
    • Tool development concentrates on integrating both techniques.
    • Excellent tools finally appear.
    • Data synthesis disappears and data decomposition wins out.
    • Hardware and software DBMS platforms provide sufficient performance to begin to support implementation of normalized physical data models.

    For More Info

     
    • The Data Modeling Handbook Michael Reingruber, William W. Gregory, John Wiley & Sons, 1994, ISBN 0471052906
    • Data Modeling for Information Professionals
      Bob Schmidt (Editor), Prentice Hall, 1998, ISBN 0130804509
    • Mastering Data Modeling-A User-Driven Approach John Vincent Carlis, Joseph Maguire, Addison-Wesley Pub Co, 2000, ISBN 020170045X
    • Handbook of Relational Database Design Candace Fleming and Barbara von Halle, Addison-Wesley, 1989,
      ISBN 0-201-11434-8
    • Data Modeling Essentials Graeme Simsion, Van Nostrand Reinhold, 1994, ISBN 1850328773

    Glossary

     
    This Glossary of Logical Data Modeling terminology includes industry standard definitions.

     
    1st Normal Form
    Each non-key attribute in the entity is a single-occurrence fact that is functionally dependent upon the PK. There should be a 1:1 relationship between the PK and each attribute in an entity. There should be no repeating groups.

    2nd Normal Form
    Each non-key attribute in the entity is functionally dependent upon the entire PK. This form applies only to tables with multi-column PKs. No attribute should relate to one part of the PK directly, and indirectly or not at all to the rest of the key: The attributes depend on the whole key.

    3rd Normal Form
    No attribute in the entity is transitively dependent upon the PK. No extraneous attributes (attributes that relate to other attributes more strongly than to the PK or relate to a missing entity more than to the PK): The attributes depend on nothing but the key.

    4th Normal Form
    The entity's PK represents a single multi-valued fact that requires all PK attributes be present for proper representation.

    5th Normal Form
    The entity represents, in its key, a single multi-valued fact and has no unresolved symmetric constraints. A 4NF entity is also in 5NF if no symmetric constraints exist.

    Aggregate Attribute: SEE: Derived/summary attribute.

    Aggregate Entity
    A logical object that appears to be an entity but which contains only derived/summary attributes. It is, therefore, not a true logical data model entity.

    Application Data
    The data commonly thought of as "business data", such as account numbers, customer names, dates of birth, transactions, etc.

    As-Is
    As-Is Data Environment: The current data environment. The existing legacy systems along with data sources which might be purchased or available through on-line data providers.

    Assertion: SEE: Constraint

    Associative Entity
    An entity which contains information relating to instances of two or more other entities or between different instances of a single entity. Also called Intersection Entity. Models a M:M relationship.

    Atomic Data Object
    A logical or physical data object whose meaning and/or value is not based on other data objects. For example, Unit Price is atomic, Gross Margin is not.

    Attribute
    An elementary logical data object that describes or provides details about an entity. A column is the physical implementation of an attribute. The terms column and attribute are commonly used interchangeably. Many people also call an attribute a field, as well as a column.

    Availability (of data)
    The level to which data objects should be available to business end users. That is, Availability addresses the question "How easy is it to access these data?".

    BDO: SEE: Business Data Object

    Big Bang Data Warehouse
    A data warehouse that is created in one iteration and contains all the data available in the environment. It is impossible to create such a data warehouse in most enterprises for lack of available resources. It is inadvisable in any case because it would include a large amount of data that has no business value.

    Business Data Item: SEE: Business Data Object

    Business Data Object
    A data object that is visible to and of use to the business end user. A BDO may be atomic or derived.

    Business Process Model
    A graphical and textual document which describes how business processes relate with respect to data interchange.

    Business Question
    A question that a business person needs answered in order to perform her/his business function or which, if answered, can provide insight into a business situation.

    Business Question Analysis
    The technique of determining the business data objects of interest to the enterprise by requesting each Business SME to submit a set of business questions, ranked by value. These business questions are used to identify business data objects.

    Business SME
    A subject matter expert on the business processes and operations of the enterprise. SEE: Subject Matter Expert and Data SME

    Candidate Key
    An attribute or set of attributes that can be used to uniquely identify instances of the entity. One of these candidate keys is chosen as the key of the entity. If no appropriate candidate exists, a system-assigned incremented number is used.

    Cardinality

    • Table: The number of rows in the table
    • Relationship: A restriction on the number of instances of the dependent entity which appear for each instance of the independent entity. Example: 1:0-M. "1" and "0-M" are cardinality statements.

    Categorization: SEE: Generalization.

    Cell
    The intersection of a row and a column in a table where an individual data value resides.

    Column: SEE: Attribute

    Complete Generalization
    A generalization in which all possible types of the super-type appear in the logical data model as sub-types. SEE: Generalization and Incomplete Generalization

    Compound Key
    A key which contains more than one attribute.

    Constraint
    A business rule that applies to one or more attributes in one or more tables and defines limitations on appropriate values for the attribute. Example: Birth Date must be earlier than Hire Date.

    Contradicting Definitions
    One attribute or entity that appears to have two or more definitions that are contradictory. For example, Employee might be "anyone receiving pay from our company" or "anyone with an employment contract with our company".

    Coverage (of data)
    The level to which data objects represent the existence of their corresponding real-world object.

    Data Accuracy
    The level to which data objects represent the correctness of their corresponding real-world object.

    Data Definition Language
    DDL is the subset of SQL commands used to define tables, columns, indices, data space, etc.

    Data Demographics
    Descriptions of the characteristics of data expected to appear in the database. This is used to help size the physical database platform. It includes data type and size, indexes, nullability, variable lengths, update frequency and number of rows in the table. These data are qualified by time, including average growth/shrinkage and peak growth/shrinkage Example: Right before and after historical data purge.

    Data Object
    A discrete item of logical or physical data, such as an entity, table, attribute, instance, row, or index.

    Data Manipulation Language
    DML is the subset of SQL commands used to retrieve, update and delete data in the database.

    Data SME
    A subject matter expert on the physical and logical data environment of the enterprise. It is common that there are two subsets of Data SMEs, the people familiar with the logical data (reports and summaries prepared, usually, by data analysts and/or IT) and those familiar with the physical data (files, tables, etc.). SEE: Subject Matter Expert and Business SME.

    Data Source
    A physical table, file, etc. from which data serves as a source for a target business, logical or physical data object. SEE: Source Data Object and Data Sourcing Model.

    Data Sourcing Model
    The definition of the target business, logical and physical data objects, the source physical data objects and the mapping algorithms used in deriving the target(s) from the source(s). Included is the frequency of updates. SEE: Data Source and Source Data Object.

    Data Warehouse
    The physical and logical structures built to provide to the business end user a single logical view of the business data objects.

    Database
    A collection of related business data and the metadata that describes it.

    DDL: SEE: Data Definition Language.

    Decision Support

    • The act of making strategic and tactical decisions by drawing inferences from past, existing and projected states (as compared to making objective decisions such as inventory replenishment)
    • The environment which supports making the above-described decisions (e.g., data warehouses, ad hoc query user interface)

    Degree

    • Table: The number of columns in a table
    • Key: The number of attributes in the key.

    Demographics: SEE: Data Demographics and Query Demographics

    Dependent Entity
    The thing modeled exists only if some other entity exists, and depends on that entity for part of its identity. E.g., Line Items exist only within Purchase Orders, never on their own. Dependent entities have multi-column PKs.

    Derived Data: SEE: Derived/Summary Attribute.

    Derived/ Summary Attribute
    An attribute whose values are algorithmically determined from the values of other attributes or whose definition is based on the definitions of other attributes. Example: UnitMargin is calculated from UnitPrice minus Unit Cost.

    Dimension
    As used by multi-dimensional databases (MDDs): A distinct logical hierarchical grouping, such as "geography" (e.g., Store, District, Region) which can be selected and used by the end user to display aggregations at any of the levels in the hierarchy.

    DML: Data Manipulation Language.

    Domain
    The valid value set from which an attribute's values are drawn.

    DSS: Decision Support System.

    DW: Data Warehouse or Data Warehousing.

    Dynamic Materialization (of a BDO)
    On the fly creation (derivation) of the value of a non-atomic (derived) business data object. There usually is a performance penalty for dynamic materialization. SEE: Static Materialization.

    Element
    In physical data modeling, one column in a table or field in a file. Also called: Data Element.

    Entity
    An logical data object (person, thing, idea, concept, etc.) about which information must be maintained by the enterprise and which is itself described by other data objects (attributes). A table is the physical implementation of an entity. The terms entity and table are commonly used interchangeably.

    Entity Pool
    A collection of ideas about what might be entities. This is created in a brainstorming session early on in the logical data modeling project and serves as a means to stimulate conversation on entity possibilities and to serve as the basis for the first LDM.

    Entity Relationship Diagram
    An industry standard visual representation of an LDM, showing Entities, Relationships and Attributes.

    Entity State Null
    A NULL where the attribute instance might or might not be NULL, depending on the state of the particular entity instance in which it occurs. An identifiable subset of the rows in the table may have NULLs in this attribute.

    Entity Type
    The formalized term for an Entity. SEE: Entity.

    Entity Type Null
    A NULL where the attribute instance is always NULL for the same types of instances of the entity. An identifiable subset of the rows in a table must have NULLs in this attribute.

    ERD: Entity Relationship Diagram.

    Facet
    A grouping of entities in an LDM that represent a single business functional area. SEE: Subject Area.

    FK: Foreign Key.

    Foreign Key
    An attribute or set of attributes in an entity that appear in that entity due to key migration from another entity via a relationship. Notation is FK.

    Generalization
    A construct in which one entity (the super-type) is defined as consisting of a set of other entities (the sub-types) and in which the sub-types inherit the PK of the super-type. For example, an Insurance Policy (super-type) can be an Auto Policy, a Life Policy, a Homeowners Policy, etc (the sub-types). Generalizations are used when each sub-type carries different attributes from the other sub-types. Generalizations are also called categorizations or Super/Sub-types or Sub-Entities. SEE: Complete Generalization and Incomplete Generalization.

    Hierarchy Relationship
    A recursive relationship in which instances relate to other instance in a hierarchical fashion (e.g., ownership of each level by the next higher level applies). Example: A manufacturing concern that makes parts and also assemblies made of those parts. SEE: Recursive Relationship.

    Homograph
    Two words, both with the same construction, but with different meanings, Example: Lead can be either a very heavy silvery metal or to show someone the way. Synonym: Homonym.

    IDEF0
    A process modeling technique and syntax developed for the United States DoD in the early-mid 1980s.

    IDEF1X
    A logical data modeling technique and syntax developed for the United States DoD in the early-mid 1980s. Based on IDEF-1.

    Identifying Relationship
    A relationship in which the key attributes of the independent entity migrate into the dependent entity as key attributes.

    Incomplete Generalization
    A generalization in which there are types of the super-type which do not appear in the logical data model as sub-types. SEE: Complete Generalization.

    Independent Entity
    The person, place, thing, etc. modeled in this entity has existence in and of itself. Example: A Customer is a person who exists separate from the company. Independent entities have a single-column PK.

    Industry LDM
    An LDM that describes a specific industry as a whole. It describes the business functional data for the entire industry.

    Instance (of a data object)

    • PDM: A row or record on a physical table or file, or a value in a column or field.
    • LDM: A hypothetical data value or set of data values that might appear in the attribute or entity. Logical data model instances are used to aid in the understanding of the logical objects. Also referred to as "occurrence" or "sample data".

    Instance Diagram
    A graphical technique used in logical data modeling or physical data modeling to show how instances of entities or rows in tables are related to each other. Also called "occurrence diagram".

    Intersection Entity: SEE: Associative Entity.

    Isomorphic LDM
    A model that correctly represents the physical, logical or theoretical system it is meant to describe.

    Join
    In a physical database, the act of retrieving information from two tables based on a matched value in their related same-domain attribute. Example: Join the Department table to the Employee table on the Dept# column.

    Key Attribute
    An attribute that participates in uniquely identifying instances of the within an entity. SEE: Primary Key.

    Key Migration
    An independent entity's key attributes appear in a dependent entity and define its relationship to the independent entity. These attributes might be either key or non-key attributes in the dependent entity, depending on the type of relationship. The process of a Primary Key becoming a Foreign Key. SEE: Primary Key and Foreign Key.

    Latency (of data)
    The level to which data objects reflect changes to their corresponding real-world object, with respect to time. That is, Latency addresses the question "How soon is a change to the real-world object available to the business user who accesses this object?".

    LDM: Logical Data Model.

    LDO: Logical Data Object.

    Logical Data Item: SEE: Logical Data Object.

    Logical Data Model
    A graphical representation of the way a business organizes its information. It is not a database, and is platform-independent. It is a logical and unbiased representation of the business data. The data model defines which individual data elements exist and how they relate to one another. It provides an accurate map of business information. The LDM is designed to answer all of the required business questions, both known and future that may arise in the business.

    Logical Data Object
    An entity or attribute from an LDM.

    Many-to-Many Relationship
    A single instance in entity A may relate to zero, one or many instances in entity B, and vice versa. Notation is M:M.

    Metadata
    Metadata is information that describes the data structure of a system. The LDM is a visual representation of metadata and is not implemented. When a database is implemented from the PDM, the data definitions for the tables, columns and relationships, as well as their physical characteristics (data type and length and indexing, etc.) are stored physically on the system. These are metadata. The tables that house these structural descriptions are also considered metadata because they define and the implemented system.

    MRA: SEE: Multi-relationship Assertion.

    Multi-relationship Assertion
    An inter-entity assertion where the constraint statement spans more than one relationship.

    Network Relationship
    A recursive relationship in which any instance can relate to any other instance. Example: Origin and Destination cities for a Freight Forwarding company. SEE: Recursive Relationship.

    Non-identifying Relationship
    A relationship in which the key attributes of the independent entity migrate into the dependent entity as non-key attributes.

    Normalization
    Normalization is a set of rules and a methodology for making sure that the attributes in a design are carried in the correct entity to map accurately to reality, eliminate data redundancy and minimize update anomalies.

    Null
    An unknown or indeterminate value. It is not equivalent to zero or blank. A null is not even necessarily equivalent to another null.

    Occurrence: SEE: Instance and Row.

    Occurrence Diagram: SEE: Instance Diagram.

    One-to-Many Relationship
    A single instance in entity A may relate to zero, one or many instances in entity B, but not vice versa. Notation is 1:M.

    One-to-One Relationship
    Zero or one instance in entity A exists for each instance in entity B. Notation is 1:1. 1:1 relationships tend to become 1:M over time.

    Orphan
    A database row that violates referential integrity or its corresponding business rule. Example: An Employee who is assigned to a Department that no longer exists.

    Orthogonal
    Orthogonal variables are completely unrelated. They are independent. Changes in one have no effect on the other.

    PDM: Physical Data Model.

    PDO: Physical Data Object.

    Physical Data Item: SEE: Physical Data Object.

    Physical Data Model
    What is actually implemented on a hardware platform. It is derived from the LDM to include changes that accommodate specific platform performance issues such as storage size and access performance.

    Physical Data Object
    A table, column, file, segment, field, etc. which exists on a physical device. The definitions of the objects may also be referred to as physical data objects.

    PK: Primary Key.

    Populate
    Insert the application data into a defined database.

    Populated
    A database that contains application data. SEE: Populate.

    Primary Key

    • LDM: The attribute or set of attributes that uniquely identify each instance in an entity. Notation is PK.
    • PDM: The column or set of columns that are used to establish the physical storage location of the row. NOTE: This is the generally used term in the industry, but is incorrect for Teradata. In Teradata, this is called the Primary Index (PI) specifically to differentiate it from the logical modeling term PK, since it may in fact be made of attributes other than the ones defined for the PK in the LDM and may not be unique.

    Quality (of data)
    A collective name for the measures of the relevant characteristics of data objects. Quality includes Accuracy, Availability, Coverage and Latency.

    Query
    An SQL SELECT statement that retrieves rows that meet specified criteria from the database.

    Query Demographics
    Descriptions of what queries will be run against the database. This is used in sizing processors and network links in the system. They are intelligent guesses about what queries will run from where, how often, by whom, expected response times, size (rows returned) and anticipated peak usage. Dynamic (as opposed to batch) updates are usually included as a separate group. SEE: Data Demographics.

    Recursive Relationship
    Data hierarchies and networks are recursive relationships, where an entity has a relationship to itself. Example: Part/Component Part. A part may be made of many component parts (an assembly), and it may in turn be a component part of some larger assembly. SEE: Network Relationship and Hierarchy Relationship.

    Redundant Data
    Data that appears more frequently than need be in a model (e.g., department name if carried in the employee entity would result in the same character string being repeated in many instances), or which can be derived from other elements in the model. SEE: Derived/Summary Attribute.

    Referential Integrity
    The state where an FK value relates to a PK value elsewhere in the design. A breach of referential integrity is the opposite: An FK value exists that does not also exist as a PK value elsewhere in the design. Example: An employee's department number as carried in the Employee entity does not exist in the Department entity.

    Relation
    Synonym for Entity in formalized relational theory.

    Relationship
    A statement about how two or more entities are related and the restrictions placed on that relationship. Example, a Department employs zero, one or many Employees and an Employee can work for one And only one Department (italics identify the relationship between Department and Employee).

    Relationship Name
    A text description of a relationship shown on an ERD next to the line that connects two Entities to visually represent their relationship.

    Report
    The result of a query.

    Row
    The data representing a single member of an Entity. Example: John Smith is represented by a row in the Employee table. Synonyms: Instance, Tuple.

    Select
    A query chooses specific data from one or more tables for a report, based on specific criteria (e.g., values in attributes). Example: Select all Men, age 50 or older, who make more than 50K annually.

    Set Theory
    A field of mathematics that deals with groupings of data. It is one of the foundations of relational theory.

    SME: Subject Matter Expert.

    Snow Flake Schema
    A model consisting of multiple star schemas. SEE: Star Schema.

    Source Data Item: SEE: Source Data Object.

    Source Data Object
    A physical data object (e.g., table, column, file, field, etc.) used in creating a data warehouse data object. SEE: Data Source and Data Sourcing Model.

    Star Schema
    A model where the tables are denormalized in a very specific way, usually for a specific application or set of queries. It optimizes specific sets of joins and is a PDM technique that should not be used during LDM where normalization is the ideal. Also known as dimensional modeling. The basic premise of the Star Schema is that information can be classified into 2 groups:

    • Facts: Core elements being analyzed e.g., point-of-sale (POS) data
      • There are no cyclic or recursive join paths
      • Denormalized to avoid multi table joins (for performance)
    • Dimensions: Attributes of the facts e.g., product, market, promotion ,time
      • Most if not all analysis is based upon dimensions
      • Every PK of a dimension table is an FK in the fact table

    Static Materialization (of a BDO)
    Physical creation (derivation) and storage of the value of a non-atomic (derived) business data object. There usually is a performance benefit for static materialization. There are only two reasons to do this:

    1) Performance (reduces constant recalculation and full table scans)
    2) Explicit documentation of the algorithm to prevent change.

    SEE Dynamic Materialization.

    Sub-Entity: SEE: Generalization.

    Subject Area
    A grouping of Facets into a larger business functional area. SEE: Facet.

    Subject Matter Expert
    A person highly knowledgeable of the subject at hand.

    Sub-type: SEE: Generalization.

    Summary Attribute: SEE: Derived/Summary Attribute.

    Super-type: SEE: Generalization.

    Table: SEE: Entity.

    To-Be
    The business process environment which is hoped to exist after whatever process redesign is complete.

    Tuple: SEE: Row.

    View
    A pre-defined subset of one or more tables. Views are used to control access to the underlying tables and simplify access to data. All DBMSs support views, but Teradata materializes the view.

    Visible
    Data that can be seen by a user of the data warehouse regardless of where originate. Visible data may dynamically calculated, or may be stored in the current or some other system and dynamically accessed by the data warehouse facility.