Objectives
|
This module enables the student to:
- Define the term Logical Data Model (LDM)
- Articulate the importance of an LDM
- Describe the importance of creating a good LDM
- Identify the differences between an LDM and a Physical Data Model (PDM)
|
What is an LDM?
|
A Logical Data Model (LDM) is a graphical representation of the way a business organizes its information. It is not a database, and is platform-independent. It is a logical and unbiased representation of the business data. The data model defines which individual data elements exist and how they relate to one another. It provides an accurate model (or map) of the business information. The LDM allows the user to determine if all their business questions can be answered.
LDM Deliverable
An LDM is the deliverable that results from the logical data modeling process. It must have no unnecessary components and therefore minimal data redundancy. It represents data requirements in a correct, sharable, stable and flexible format:
- Stable designs can service many application and business requirements over the long term
- Correct designs map the data one-to-one to the business as it is in the real world
- Sharable designs reflect no bias toward a particular application or business requirement
- Flexible designs change little or not at all as the business environment changes
|
Industry LDMs
|
Teradata Solution Industry LDMs
Logical data models (LDMs) are critical in the design, creation and maintenance of data warehouses and are the heart of many Teradata solution offers. Teradata has created industry specific LDMs that are designed to support the specific business needs of that industry. Each industry LDM is comprehensive and ensures that the appropriate data will be included in the warehouse for those critical business needs.
The major Teradata Solution Industry specific LDMs are Communications, Financial, Retail, and Travel. The Teradata Division has invested many person years developing and implementing these models so that they require minimal modification for a customer implementation. This means the data modelers can get the data warehouse designed in minimum time with maximum functionality. Using a model that is already tested, accurate, and complete gives the customer a sense of security and confidence in our ability to deliver. Training for each specific industry LDM is available.
|
Benefits and Differentiators
|
Overview
There are many reasons to use the Teradata Solution Industry LDMs. One of the most common is faster warehouse implementation, as noted above. Teradata's LDMs jumpstart the planning, design and implementation process of a data warehouse. It is a template or starting point for the warehouse.
Industry LDM Differentiators
- Cost Savings: The cost is low compared to competitors. It enables companies to ask business questions within several months instead of years, saving significant money and resource expenditures.
- ROI: Faster implementation means achieving ROI sooner by implementing better, more insightful business decisions within a shorter time frame.
- Reduces Risk:
- Proven at other existing customer locations
- Current with best practice
- Supports known industry business requirements
- Provides a long term foundation
- Teradata is expert at modeling
- Proves business question feasibility and data requirements
- 1:1 Mapping: Teradata supports near-one-to-one mapping from the LDM to the physical design, which isn't possible with competitors' products.
- Low Maintenance: Near-one-to-one mapping saves significant time and money initially, and even more over time as it reduces the need for re-design of both the data warehouse and its applications when additional functionality is required.
- Communication: Creates a common business language that lets people from all walks of the enterprise, from most to least technical, communicate during warehouse implementation. This makes buy-in easier, because each party can see that what they want and need is addressed.
- Flexible/Extensible Modularity: Gives the customer the flexibility to implement whatever functional portions of the LDM are useful to their business when they start, and scale up later with additional modules.
- Cross Functional: The LDM provides a single, unbiased customer view across all products and functions of an enterprise.
- Customer Centric: Allows a customer to be viewed in any and all ways the organization defines that customer. Many of the industry LDMs support an unlimited number and depth of customer hierarchies and relationships to reflect different views of the customer.
Before we can modify an industry specific LDM, we first need to understand why LDMs are needed and understand the basic structure and philosophy behind their design.
|
Why Create an LDM?
|
Benefits of the LDM to the Customer
- Creates an accurate, consistent model of the way the business views its data
- Explicitly states business rules
- Provides a roadmap for data integration across the enterprise
- Creates a foundation of understanding about what data requirements exist, not how these data are used
- Cross validates with the process model
- Provides consistent communication of business meanings across all levels of participants
- Looks beyond current system limitations
- Provides an objective means of IT development project prioritization
- Starts with a complete, accurate picture of the business before attempting to determine the best data warehouse implementation approach
- Improves probability of success by creating stability.
PS and the LDM
While our solution LDMs are already created, during implementation some level of modification is always required. Usually, Teradata associates handle this process, from gathering information on what changes are desired, to actually implementing them. A full understanding of the existing LDM as well as modeling principles is essential, so that the changes they define continue to uphold the integrity of the original model. Teradata uses the LDM during an engagement to communicate with the customer about the content and progress of the project, as well as to document decisions, and guide implementation tasks.
Sales and the LDM
While the LDM is marketed as an integral part of the Solution, it's important to know what it buys the customer so that you can discuss the advantages cogently and leverage them.
Building an LDM for a large enterprise wide system is an immense task. It takes many person years. Although building an enterprise LDM is a huge task, it is typically implemented in phases, and doesn't have to be completed before any other work can be done. Because it is so complex, it takes many iterations to fine tune and be sure that nothing is missed. The good news is that our customers don't need to build their own LDM for any industry where we've developed a solution LDM to go with our Teradata Solution. Using a solution LDM can save the customer years of effort and millions of dollars.
|
Analogies Help Explain the LDM
|
 Foundation Analogy: The LDM is the foundation on which the physical design, implementation, and applications are built. The foundation of a building determines its overall structural integrity and ultimately its usefulness and longevity (and costs associated with later repairs). The LDM does the same for a data warehouse (DW). If the LDM is not designed correctly, the entire decision support capability is compromised. If the foundation is faulty, the building fails, causing huge cost at the least and disaster at the worst.
 Map Analogy: An LDM is like a map that shows what and where all the data is. And like a map, it is only valid and useful to the extent that it reflects reality. If a map does not show a particular city, the city may still exist, but a traveler won't be able to find it using that map. An LDM should be detailed and reflect every aspect of the business accurately. For example, try finding Zurich on this map
 Communication Analogy: Imagine trying to build a house without a blueprint! How would the various contractors communicate with each other? What kinds of inconsistencies, errors, expenses and emergencies would arise from the lack of a clear, consistent plan? The house might never be completed at all. If it were completed, it would probably not match the buyer's expectations.
 Engine Analogy: The LDM is like the engine of a car: you don't need to understand exactly how it works, or ever see or touch it yourself. But without it, the car won't run at all, and it's best to have an expert build it for you. A well designed LDM is essential to the success of the data warehouse in supporting the business.
|
LDM vs. PDM
|
An LDM is NOT the same as a PDM!
A Physical Data Model (PDM) is what you actually implement on a hardware platform, and may be designed to accommodate specific platform performance issues. The LDM is exclusively a logical and unbiased representation of the business data. It does not address platform or performance considerations at all (e.g., It is platform and hardware independent). The PDM may be created from the LDM, but only after the LDM is complete and validated.
| | LDM | PDM |
| Includes | Entities and Attributes | Tables and Columns |
| Rows Identified By | Primary Key | Primary Index |
| Names Used | Business Names | Names may be limited by DBMS requirements |
| Normalization | Normalized to 3rd Normal Form (discussed in a later module) | May be denormalized for performance and easier programming depending on platform |
| Redundancy | No redundant data | May include redundancy |
| Derived Data | No derived data | May include summarization |
| Who Develops It | Business experts and logical designers | Physical design experts |
Simplified Scenario for a Complete DW Project:
- The data required to answer the business questions is originally identified, documented and validated in the LDM, without reference to any platform.
- A PDM is developed from the LDM to define the data warehouse physical structures (tables, columns, indexes, etc). The PDM is generally somewhat different from the LDM because it is platform-specific and is optimized for performance.
- The PDM is implemented on the platform, at which point an empty data warehouse exists.
- The data warehouse is populated with the company's source data.
- User applications are created to access the data warehouse so that the users can answer their business questions.
|
Relational Theory
|
Purpose & Foundation
Relational Theory is the basis of all aspects of logical data modeling, from the modeling process, to normalization rules to the resulting design. Relational Theory is the basis for the underlying structure of relational databases as well as SQL (Structured Query Language). SQL is a set-processing programming language designed specifically to work with relational databases. It is not necessary to understand Relational Theory in depth in order to do good logical data modeling. However, these core concepts from Relational Theory underlie the techniques you will be learning.
Relational Theory stresses:
- Simple, meaningful data representation
- Structure
- Integrity
- Easy, full data access and manipulation
Relational Structure
- The basic unit of structure in a relational system is a "Relation"
- A relation is a two dimensional table made of rows and columns
- Rows are also called Tuples
- Columns are also called Attributes
- Tables are also called Entities
More in-depth discussions of these and related concepts are included throughout the rest of this course.
|
ERD Definition
|
An LDM is ultimately documented in an ERD (Entity-Relationship Diagram). The ERD is a standard modeling technique for defining and graphically representing an LDM:
- Entities: Represent people, places, things, concepts, or events for which the business has the means and the will to collect and store data.
- Relationships: Business associations that link occurrences of one or more entities and tell how they relate to one another.
- Attributes: Data facts about an entity or a relationship. It is the lowest level of information that still has business meaning without further decomposition.
The following synonyms are used interchangeably in this course, as they are in the industry:
- Entity = Table
- Attribute = Column
- Assertion = Constraint (business rule)
- Instance = Row
ERD Conventions
All of the concepts below are much more fully defined in the following modules.
- Entities are shown as boxes with their Key Attributes above the horizontal line within a box.
- Relationships are shown as labeled lines between boxes.
- Non-Key Attributes are shown below the horizontal line within a box.
- A line indicates a relationship.
- A solid dot represents the "many" side of a one-to-many relationship
Note: The conventions described in the sample below are based on IDEF1x notation.
 |
Simple ERD Example (See graphic on left)
An ERD gives you a high level overall view of the system. In actual use, LDM ERDs usually do not use abbreviations. They are used in this course to reduce scrolling for your convenience.
Note: ERDs can be LDMs or PDMs. In actual use, LDM ERDs don't usually have abbreviations, but PDM ERDs frequently do due to the physical database name size limitations. Abbreviations are used in this course to reduce scrolling for your convenience.
Not-So-Simple ERDs
(See a small part of an LDM)
Real ERDs for industry solutions usually include many dozens of entities and several hundred attributes. They are extremely complex, resembling a circuit board diagram, and can often cover many feet of a wall when printed and displayed in full. By the end of this course you will be able to read and understand an industry solution LDM. ERD conventions are discussed in detail in Module 8 of this course. |
|
ERD vs Table Layouts
|
Instead of ERDs, a table layout format is used throughout this course to facilitate learning. Since people have a great deal of prior personal experience looking at tables in a wide variety of life contexts, they are very easy and intuitive to understand. ERDs are much less so.
The final module of the course covers ER Diagramming conventions in detail, and shows how the table formats translate into ERDs. ERDs are not used in this course for teaching the modeling process. However, at a few key points in this course, ERD versions of the table layouts will be available for comparison. ERDs help people understand the big picture, while table layouts make the details clear.
|
Table Layouts
|
Table layouts give you a more detailed definition of the tables. They are an alternate way to document an LDM that is especially useful while learning about data modeling. Table layouts show entities (sometimes called tables), attributes (sometimes called columns) and relationships, as does the ERD. But, by adding sample data values, table layouts provide a level of detail that make many concepts easier to grasp visually and intuitively. Here is the Table layout equivalent of the ERD just shown:

Table Layout Conventions
Table and column names are abbreviated to minimize scrolling. For visual clarity and optimum learning, the following standards are used:
- Entities including their Key Attributes (marked PK for Primary Key) are shown in Black.
- Relationships including their Key Attributes (marked FK for Foreign Key) are shown in Blue.
- Non-Key Attributes are shown in Red.
- The vertical double line divides the PK (left) from all other attributes (right)
- The horizontal double line divides the column headers and constraints (top) from the sample data (bottom)
NOTE: ERDs and how they are used at customer engagements are discussed more comprehensively at the end of this course.
|
Modeling Process Overview
|
The Big Picture
Data modeling is the process of identifying, defining, validating and documenting all of the entities, relationships, and attributes and assertions that accurately represent the business data to fulfill the business requirements.
It is a very detailed and complex task that can take many person-months or even years for a large enterprise. That's the bad news. The good news is that the process is highly logical and intuitive, and is made of many small reasonably simple tasks. The complexity of data modeling is in the number of pieces to be defined, not in the process of defining them. Using a simple business example makes learning the process fairly easy.
Modeling Steps
These are the high level steps and sequence used to create a viable LDM. Details are covered in the next module.
- Step 1: Define Business Requirements/Scope
- Step 2: Define Entities
- Step 3: Define Relationships
- Step 4: Define Attributes
- Step 5: Verify the Model
|
Try It: LDM vs. PDM
|
Try It! Exercises
Don't skip the interactive exercises that are sprinkled throughout this WBT. They are not only the most fun part of the course, but the richest for your learning. Data modeling is a big, complex task, and these exercises are designed to give you hands-on experience with each aspect of it. In addition to the small exercises, there are two larger ones, that may take you up to 30 minutes each. We recommend that you do them, as they bring all the prior learning together by simulating the entire data modeling process for a small enterprise.
|
Try It: LDM vs. PDM
|
Directions
The following statements were overheard at a customer site. Identify whether each speaker is referring to an LDM or a PDM:
 |
IT Director: Jim Watanabe
It's a complete model as far as being an unbiased representation of the business data. But it doesn't take into account the platform issues we must deal with.
| This customer is referring to a:
|
 |
Marketing Director:
Carmen Smith
I still have a few business questions that I'm not sure I can answer with this model.
| This customer is referring to a:
|
 |
End User: Waneta Rogers
Thanks! This model has made it much easier to communicate with the people over in accounting about what we need in the new system!
| This customer is referring to a:
|
 |
DBA: Doug Fortuna
I've got the model modified so that it will perform well and give users what they need in the timeframe they need it.
| This customer is referring to a:
 |
|
Objectives
|
This module enables the student to:
- Define what an entity is in the context of an LDM
- Identify entities from a user requirements description
|
Entity
|
What is an Entity?
An entity is something about which information must be maintained: A person, place, thing, concept, idea, etc. It is simplest to visualize an entity as a table with rows and columns.
|
Rules for Identifying Entities
|
The following are rules for identifying entities:
- Must pass these tests:
- It is within the scope of the project
- It can be clearly and uniquely defined
- Each instance can be uniquely differentiated from all other instances
- Is not dependant on the existence of another entity
- Usually there is other information that describes it
- Entity name must be:
- Unique within the model
- A singular noun, possibly modified by adjectives
- Relational Rules:
- The relative position of entities on a diagram is irrelevant
- The relative position of rows within a table is irrelevant
|
Entity Definition Problems
|
People tend to be fairly casual in the way they think about and describe their business needs and processes. Ask questions to specify the meaning of names that are under consideration as entity names. Here are two common situations to be alert for, and to resolve explicitly when they arise.
- Synonyms: Different names which mean the same thing (e.g., Employee vs. Worker)
- Homonyms: The same name means different things to different users
- Account for the Accounting Department
- Account for the Sales Department
|
Try It: Entities
|
Directions: A hardware vendor has the following business requirements. Using the rules for identifying entities from the previous page (a copy is provided for your reference below), identify the three main entities you would define.
 OK. Here's what we need. We want to track purchase orders (POs) for customers. Our POs have to be at the line item level so that we can tell what customers ordered what specific parts, and when, and how many of each and what we charged on the PO. And we track the customer's name and address for shipping purposes, though shipping itself is not important right now. We also keep track of various types of customers, for example, individuals and corporations. By the way, some of our parts are composed of other parts we sell. We often include special instructions on a PO, that we write into the remarks box on the form. The following are rules for identifying entities:
- Must pass these tests:
- It is within the scope of the project
- It can be clearly and uniquely defined
- Each instance can be uniquely differentiated from all other instances
- Is not dependant on the existence of another entity
- Usually there is other information that describes it
- Entity name must be:
- Unique within the model
- A singular noun, possibly modified by adjectives
- Relational Rules:
- The relative position of entities on a diagram is irrelevant
- The relative position of rows within a table is irrelevant
|
Primary Key
|
What Is a Primary Key?
Remember the rule for defining entities: Each instance of an entity must be uniquely differentiated from all other instances? Primary Keys are the way we enforce this rule.
A PK is a unique identifier. It is a column or group of columns whose combined values are unique for every row in the entity table. PK columns are traditionally shown as the first columns in a table (for human convenience, though they need not be in any order according to relational theory), and are called out explicitly as the PK in both ER Diagrams and Table Layouts.
|
Rules for Primary Keys
|
For each entity, you must determine what unique identifier to use (there may be several candidates for PK), and fill it in on your model. Make sure it adheres to all of these rules for PKs:
| Rule | Reason for Rule |
| Must be one per table | Otherwise rows could not be uniquely identified. |
| Only one per table | While there may be more than one column or set of columns that are unique, only one (set) is defined and used as the PK for modeling consistency. |
| Value must be unique | If duplicates were allowed, it would no longer uniquely identify the row. |
| Value cannot be NULL | NULL means unknown. If the value is unknown, theoretically it could be equal to some other value and therefore be a duplicate. NULL also means that a value for that attribute does not apply for that instance. |
| Value should not be changed | History is tracked by PK. If the values in the cells change over time history for the original value becomes invalid. |
| May be made of any number of columns | It may take a combination of several columns to achieve uniqueness. This is called a compound key. |
|
PK Definition Issue: Uniqueness
|
Identifying a key with unique values can be troublesome. Many times a user already has identifying numbers they use for their major entities. Often the currently used numbering does not enforce uniqueness, or has changed format over time, so that a single row has more than one identifier. To make it worse, users tend to be emotionally and habitually attached to their legacy numbering schemes. Always ask explicitly if there are any known duplicates, or if the numbering scheme has changed in the past. If the answer is yes, often the best answer is to assign an incremental number, named to match the entity. That way uniqueness is assured in all cases. You can still carry the other historical numbers for reference, without using them for identification.
|
Intelligent Keys
|
PK Definition Issue: Intelligent Keys
An intelligent key is one that is made up of several parts, each of which has meaning to the user. Examples include phone number (for telecommunication companies) and Social Security Number for the IRS. Whenever the user offers an intelligent key as a candidate for PK, advise them to use a uniquely assigned number and carry the intelligent key as an attribute because there are several serious problems with this kind of key:
- Over time the meaning of the components may change
- Over time the values of the component may change for a row
- Often they are composed of more than one type of data (alpha plus numeric)
- Because of the kinds of changes noted above, intelligent keys usually already include:
- Duplicates (more than one row with the same value)
- Multiples (a single row that has had more than one value over time)
|
Intelligent Key Disaster Example
|
Our hardware vendor has always identified their parts in the past by the warehouse Aisle, Level and Bin number: A-3-47, so that they can instantly find whatever part the customer is asking for. An uninformed data modeler gives them a system that retains this PK for their Part table, as requested. Six months later they acquire another hardware concern and start selling twice as many types of parts. They expand their floor space and reorganize their inventory for customer convenience. Now they have to reassign new IDs to almost every part! It's a huge job, but they do it. Then a customer calls in with a question on a PO from before the change. Mass confusion results because the part numbers on the PO now refer to entirely different parts! They must create a translation table, or suffer endless confusion. All of their historical data has this problem. And it will all start over if they ever do another acquisition.
| Before Merger | After Merger |

|
Key A-1-1 A-1-2 A-1-3 A-1-4 |
Description Bolt Hammer Nail Pliers |
Key A-1-1 A-1-2 A-1-3 A-1-4 A-1-5 |
Description Awl Bolt Hammer Nail Pliers |
What is Part A-1-3?
What is the correct number for a Hammer?
|
PK Sample Data
|
Table layouts provide space for sample data values, which can help validate with the user that the combination of attribute values is in fact unique. For multi-column keys, it is best to put duplicate values in each key column, and vary them so that the combinations are unique. This makes it graphically clear to designer and user why multiple columns are needed for the PK. It may also unearth errors in assumptions about valid data values.
|
Try It: PKs
|
Directions: Identify a unique PK for each of the main entities
defined for the hardware vendor by choosing the best PK candidate column
from the related drop down list.
|
|
3: Modeling Relationships
|
Objectives
|
This module enables the student to:
- Define what a relationship is in the context of an LDM
- Identify relationships for a specific LDM in table layout format from a given user requirements description
|
Relationships
|
What is a Relationship?
Functions in a business are naturally related in the real world, so the data that represents them is also related. A relationship is a statement about how two entities interact with one another: Customers buy Products; Employees work in Departments. Defining the relationships between tables is a core component of relational design.
|
Direct and Indirect Relationships
|
You must interview the user to determine if relationships are direct or indirect, because that decision depends on business rules.
- Direct: An entity relates directly to another entity without reference to any other relationships. Example: A Customer initiates a Purchase Order.
- Indirect: An entity relates to another entity through one or more other relationships. Ultimately all indirect relationships are constructed from direct relationships at the lowest level. Example: In our hardware scenario, a Customer orders Parts, but not directly. The customer actually initiates a Purchase Order on which each Part appears. PO/Customer is a direct relationship. PO/Part is a direct relationship. The relationship between Customer and Part is through the existence of the PO, so it is indirect.
If you model the direct relationships completely, the indirect ones are taken care of. To tell if a relationship is direct or indirect, ask the question: Does anything else have to happen for this relationship to exist? If the answer is no, the relationship is direct. If the answer is yes, the relationship is indirect.
|
Rules for Defining Relationships
|
Each relationship:
- Has a name which is unique when combined with the names of the tables involved
- Must apply equally to ALL rows in the tables
- Has a Cardinality Statement: Dictates the number of rows of one table that can be related to a single row in the other table
|
Steps for Defining Relationships
|
There are basically three steps to defining the relationship in an LDM:
- Identify/document which pairs of tables have relationships between them
- Is there a relationship?
- Is it direct or indirect?
- Name the direct relationships.
- Identify the cardinality of each direct relationship(discussion follows)
- Place Foreign Keys to model the relationships
|
Step 1: Relationship Matrix
|
The purpose of the matrix is to reduce the work to a minimum while guaranteeing that no possible relationships are missed. The matrix defines the nature and name of the direct relationships between each possible pair of entities. In an actual data modeling engagement, the matrix is not generally used, but for learning purposes it makes the underlying process clear.
| Entity Aâ Entity Bà | Customer | Part | PO |
| Customer | | | |
| Part | | | |
| PO | | | |
|
How to Complete a Relationship Matrix
|
The following tasks describe how to complete a relationship matrix:
- Alphabetize and count the entities identified
- Create a matrix that is 1 column and 1 row bigger than the number of entities identified
- Put all of the entities alphabetically down the left hand column of the matrix (Entity A)
- Repeat them alphabetically across the top of the matrix (Entity B)
- Shade the bottom half of the grid, below the top-left-to-bottom-right diagonal
- Shading indicates that these cells can be ignored from now on
- Shaded cells represent relationships already handled in the unshaded cells
- For each unshaded cell, answer these questions:
- Does a member of entity A relate to a member of entity B?
- If the answer is no, shade the cell. Go to the next cell.
- If the answer is yes, go to the next question.
- Does anything else have to happen for this relationship to exist?
- If the answer is no, this is a direct relationship. Go to the next question
- If the answer is yes, shade the cell. This is an indirect relationship.
- What is the relationship?
- Describe it in one sentence.
- Name the cell based on the verb that describes the relationship in that sentence.

|
Example
|
 |
Possible Paths through the Flowchart:
- Customer to Customer: Purple Arrows
- Customer to PO: Green Arrows
- Customer to Part: Red Arrows
|
| Question | Y/N | Action |
| Does Entity A (Customer) relate to Entity B (another Customer)? | No | Shade the cell. |
| Does Entity A (Customer) relate to Entity B (Part)? | Yes | Ask 2nd question. |
| Does anything else have to happen for this relationship to exist? | Yes* | Shade the cell |
| Does Entity A (Customer) relate to Entity B (a PO)? | Yes | Ask 2nd question. |
| Does anything else have to happen for this relationship to exist? | No | Ask 3rd question. |
| What is the relationship? Answer: A customer initiates a PO. | | Name the cell. |
* The customer must order the part (e.g. generate a PO).
Resulting Matrix
| Entity Aâ Entity Bà | Customer | Part | PO |
| Customer | | | Initiates |
| Part | | | |
| PO | | | |
|
Try It: Matrix
|
Ask the following questions for the remaining unshaded cells, and choose the most appropriate action for each:
| Entity Aâ Entity Bà | Customer | Part | PO |
| Customer | | | Initiates |
| Part | |
|
|
| PO | | | |
|
Step 2: Cardinality
|
Types of Relationship: Cardinality
There are three main types of relationships between entities which are differentiated by their degree of cardinality:
| Type | Definition | Example |
| 1:1 |
Zero or one row in table A exists for each row in table B. This is extremely rare. 1:1 relationships tend to become 1:M over time. The only valid examples of 1:1 relationships exist where time is permanently irrelevant. |
Heart Donors to Hearts.
Notice that Heart Recipients to Hearts might not necessarily stay 1:1 over time! |
| 1:M |
A single row in table A may relate to zero, one or many rows in table B, but not vice versa. This is common. |
A specific Customer may have many POs over time, but a specific PO belongs to only one Customer. |
| M:M |
A single row in table A may relate to zero, one or many rows in table B, and vice versa. This is also common. |
A specific Employee may work in many Departments concurrently, and a specific Department will have many Employees.
|
|---|
|
Step 2: Defining Relationship Cardinality
|
Once you know how the entities are related, it remains to determine the degree of cardinality.
- Define cardinality only for unshaded, named cells (e.g. direct relationships)
- Ask and answer two questions, substituting the relationship name to clarify the question:
- An instance of Entity A can have, at most, how many instances of Entity B concurrently?
- An instance of Entity B can have, at most, how many instances of Entity A concurrently?
- Answer each question with either "One" or "Many"
- If both answers are "One" it is a 1:1 relationship. Ask further if this will always be true over time within the scope of the system. If the answer is no, it is really 1:M.
- If both answers are "Many" it is a M:M relationship.
- If one answer is "One" and the other is "Many" it is a 1:M relationship.
- Post the result in the cell following the relationship name. Use 1:1, 1:M, or M:M notation.
See Flow Chart
|
Example
|
Example: Customer/PO Relationship Cardinality
| Question | Result |
|
A Customer can specify, at most, how many POs concurrently? | Many |
|
A PO can be specified by, at most, how many Customers? | One |

|
Try It: Cardinality
|
Directions: Assume the same hardware store scenario for which you've already defined entities. Go through the matrix below and for each filled-in cell, designate the relationship cardinality (1:1, 1:M, or M:M).
| Entity Aâ Entity Bà |
Customer | Part | PO |
| Customer | | |
|
| Part | |
|
|
| PO | | | |
|
Step 3: Foreign Keys
|
What Is an FK?
Remember the rule for defining relationships: Cardinality dictates the number of instances of one entity that can be related to a single instance of the other entity? Foreign Keys are the way to model and enforce this rule and the way to model relationships. A PK identifies uniqueness while an FK identifies relationship. FK columns are documented explicitly (marked as FK) in both ER Diagrams and Table Layouts.
An FK is a column or group of columns whose combined values appear elsewhere in the model as a PK. Where it appears as a PK it is called the "parent" side of the relationship. Where it appears as an FK it is called the "child" side of the relationship.
FK Constraints: ND and NN
- Duplication: An FK may or may not have all unique values, depending on the situation. If an FK must be unique, it is identified as ND (No Duplicates) in a table layout next to the FK designation.
- NULLs: An FK may or may not have null values, depending on the situation. If an FK must have a known value, it is identified as NN (No NULLs) in a table layout next to the FK designation.
- In an ERD these designations are not shown. They are covered in the attribute definitions.
|
Example
|
Home Construction Enterprise Example:
Lot# 14 is a PK value in the Lot table. It is an FK value in the House table. These two designations together tell us that there is a relationship between a Lot and a House. It also shows that Lot is the parent (the side where the PK is), while Lot# in the House table is the child (the side where the FK is). The PK is said to "migrate" from the parent to the child table.
|
Rules for Foreign Keys
|
You must document each direct relationship correctly in your model, and there is a different way to document each of the three types of relationships. Make sure it adheres to all of the rules for FKs:
| Rule | Reason for Rule |
| May be zero, one or more per table |
All tables in the model should be related to other tables, otherwise they are outside the scope of the system. However, a table may be exclusively on the PK end of all its relationships, and carry no FKs to other tables. This is frequently the case with minor entities which list valid values used elsewhere in the system. An example might be State table in the USA. |
|
Value may be non unique |
In 1:M or M:M relationships, there are duplicates by definition on the M side of the relationship. An example might be a customer with several POs. On the PO table, the Cust# for that customer would appear multiple times. |
|
Value may be NULL |
A particular row may not have a current relationship where one is possible. An example might be an employee who is currently not assigned to a specific department. The employee would have a NULL in Dept#. |
|
Value may be changed |
Relationships do change over time. An example might be that an employee changes departments. In the Employee table the value for Dept# would change for this employee. |
|
May be made of any number of columns |
Since an FK cannot exist without a matching PK elsewhere in the system, this rule which governs PKs also governs FKs. This is called a compound key. |
|
Both the attribute and the value within it must exist as a PK in a related table |
If this rule is broken it is called a Referential Integrity Violation. An example might be: In the real world, an employee is assigned to a department. If the department is disbanded, and the employee's row is not updated to show the change, the employee appears to continue to work in a department that does not exist. The reference between tables that identifies the relationship is no longer valid. |
|
Modeling 1:1 Relationships
|
The following are tasks for modeling 1:1 relationships:
- Place the PK from either table as an FK in the other table.
- Decide which entity is the parent and which is child based on the direction it is most likely to become 1:M over time.
- An FK for a 1:1 relationship is always constrained as ND (no duplicates) because duplicates would, by definition turn it into a 1:M relationship. If it becomes 1:M over time, simply remove the ND constraint, since you've already modeled it in the correct direction to support that eventuality.
- Fill in sample data that tests all known constraints.
Example: House to Lot (1:1 relationship) for a contracting enterprise. Note: True 1:1 relationships tend to migrate into 1:M over time (as this one does). However, it is a valid way to model a relationship that the enterprise does not expect to change over a long period of time (i.e. longer than they are interested in tracking it).

Notice that there are two valid ways to model this relationship. Always model in the direction most likely to become 1:M over time. Since this is a contractor, and not a house moving company, the Lot is parent and the House is child. A contractor might build a second house on the same lot due to fire, for example.
|
Modeling 1:M Relationships
|
The following are tasks for modeling 1:M relationships:
- Place the PK from "1 side" of the relationship as an FK in the "many side" of the relationship.
- An FK for a 1:M relationship may be constrained as NN (No NULLs. NULLs are discussed later in this module)
- Duplicates must, by definition, be allowed, otherwise it is a 1:1 relationship.
- Fill in sample data that tests all known constraints
Example: Departments to Employees (1:M relationship) for a retail enterprise. The business rule for this enterprise is that each employee is assigned to only one department at a time, and every employee must be assigned to a department at all times. Naturally, a department will normally have more than one employee assigned to it. There is only one valid way to model this relationship: Put the DeptID as an FK into the Employee table.

Notice that the sample data makes it obvious which way is correct:
| Left Example | Right Example |
- Any given department could only have one employee at a time
- An employee might work in more than one department at a time
- An employee might not be assigned to a department
|
- Any given department can have multiple employees
- An employee can work in only one department
- An employee must be assigned to a department at all times
|
|
Modeling M:M Relationships (Associative Entities)
|
The following describes aspects of modeling M:M relationships (associative entities):
- Modeling a M:M relationship requires the creation of a new entity.
- An entity that models a M:M relationship is called an Associative Entity.
- Place the PK from each entity as an FK within the PK of a new entity.
- Name the new entity for its parent entities.
- There is no definite parent or child "side" of the relationship. Both original tables can be considered parents, and the new table can be considered the child.
- Both FKs are automatically constrained as NN because they are part of the PK.
- Each FK may have duplicates, but the combination (e.g. the PK) may not.
- Fill in sample data that tests all known constraints. Show duplicates in each FK.
Example: Departments to Employees (M:M relationship) for a manufacturing enterprise. The business rule for this enterprise is that each employee may be assigned to multiple departments at any one time, and every employee must be assigned to at least one department at all times. Naturally, a department will normally have more than one employee assigned to it. There is only one valid way to model this relationship: Create a new associative entity with a PK made of the two FKs, naming the new entity for its parent entities.
|
Try It: FKs
|
It is important at each step to document the model in order to validate its completeness and accuracy. Below is the model as it exists with the entities and PKs we've defined.
Do this exercise first on paper. Draw the three main entities (Customer, PO, and Part), identify the three direct relationships, and then place FKs with appropriate constraints (NN, ND) based on what you've just learned about how to model relationships. Then, do the exercise here, using your written model as the basis for your answers.
- For each entity pair shown, check all of the items that describe how you modeled it on paper. When done, click the Show Model button to see the complete model.
|
Recursive Relationships
|
What Is a Recursive Relationship?
Good news: You've already done this! It models data hierarchies and networks. Another way to look at it is that recursive relationships are ones where an entity has a relationship to itself .
M:M Example: Part/Component Part
A part may be made of many component parts (e.g., be an assembly), and it may in turn be a component part of some larger assembly. Each part, from the most atomic (not composed of any other parts) to the most complex (composed of the most component parts and other assemblies) has its unique Part#, and is associated in the Part/Component Part entity with each of its component parts. This is also called a Bill Of Materials or BOM model, and reflects a network structure rather than just a hierarchy.
1:M Example: Employee/Manager
The Manager, is of course, also an Employee. This would be a 1:M relationship if each employee could only report to one manager, while a manager could manage multiple employees. In this case, the MgrEmp# would be carried as an FK in each Employee's row (identified by PK=Emp#) in the Employee table. Each row would therefore have two occurrences of Emp#: One identifies the employee, and one identifies the employee's manager. This represents a hierarchy, rather than a network structure.
|
Referential Integrity
|
What Is Referential Integrity?
Referential integrity ensures that the relationships (PK/FK references) are intact and valid.
For example, it is not possible in the real world for an employee to:
- Work in a department that doesn't exist (e.g. a department was closed, and removed from the department table, but the employees who had worked in that department were not updated in the data warehouse to reflect that they were reassigned to other departments)
- Report to a manager who is not an employee of the company (e.g. a manager resigned, and was removed from the employee table, but the employees who had reported to that manager were not updated in the data warehouse to reflect their new assignments to other managers)
|
Example
|
The PK/FK relationships defined in the LDM theoretically ensure that this could never happen, by invoking the rule: The value of an FK must always already exist in its related PK. This linkage is in fact an integral part of the definition of an FK. The implemented DBMS should automatically keep this from ever happening.
|
Objectives
|
This module enables the student to:
- Define what attributes and constraints are in the context of an LDM
- Identify attributes and constraints for a specific LDM in table layout format from a given user requirements description
- Discuss NULLs and how they are generally addressed in data modeling
|
Attributes
|
What is an Attribute?
A relevant, elementary data item which describes or provides detail about an entity (e.g., a column).
|
Rules for Identifying Attributes
|
Rules for Identifying Attributes
- Must pass these tests:
- It is within the scope of the project
- It can be clearly and uniquely defined
- It is not described by other attribute(s) (e.g., only entities are described by attributes)
- Its definition is not based on or derived from other attribute(s)
- It is atomic in this context (e.g., cannot be broken down into smaller units of meaning)
- It is drawn from a single domain of values
- Appears in only one entity (except as a foreign key)
- Attribute name must be:
- Unique within a table
- A singular noun, possibly modified by adjectives
- Relational Rules:
- The relative position of an attribute within an entity is irrelevant
- Each attribute should be generally relevant to all instances of the entity
|
Types of Attributes
|
The following are types of attributes:
- Key Attributes (covered above):
- Primary Key (PK): Uniquely identifies each row in a table
- Foreign Key (FK): Identifies the relationship between tables
- Non-Key Attributes: All other attributes that are not part of any key. They are descriptive only, and do not define uniqueness (PK) or relationship (FK).
- Derived Attributes: An attribute whose value can be calculated or otherwise derived from other existing attributes. Example: NetPay is derived by calculating GrossPay - TaxAmt.
|
Derived Attribute Issues
|
The attributes from which derived attributes are calculated are in the design, so carrying the derived attribute, in addition, creates redundant data. Derived attributes may be identified and defined in order to validate that the model can in fact deduce them, but should not be shown in the ER Diagram. The reason for this is that carrying redundant data goes against relational design theory and principles.
There are several good reasons to avoid carrying redundant data:
- The data must be maintained in two places, which involves extra work, time and expense
- There is a risk (likelihood) of the copies getting out of sync with each other, causing data inconsistency
- It takes more physical storage
Strictly as a physical design decision, when the PDM is created from the LDM, the enterprise may decide to carry some derived data on the system for:
- Increased comfort. Note: This is actually counterproductive because of the update synchronization issue. That is, the user may feel more comfortable, while the data is actually inaccurate!
- Documentation of the derivation algorithm
- Speed and ease of access for frequently accessed items that require large amounts of I/O to derive.
|
Linguistic Issues
|
People tend to be fairly casual in the way they think about and describe their business needs and processes. Ask questions to specify the meaning of names that are under consideration as attribute names. Here are two common situations to be alert for, and to resolve explicitly when they arise.
- Synonyms: Different names which mean the same thing (e.g., JobCode vs WorkOrder#)
- Homonyms: The same name means the different things to different users:
- Acct# for the Accounting Department
- Acct# for the Sales Department
|
Try It: Attributes
|
A hardware vendor has the following business requirements. Use the rules for identifying attributes on the previous page (a copy is provided below for your reference) to identify the attributes you would define.
OK. Here’s what we need. We want to track purchase
orders (POs) for customers.
Our POs have to be at the line item level so that we can tell what
customers ordered what specific parts,
and when, and how
many of each and what we charged
on the PO. And we track the
customer’s name and address
for shipping purposes, though
shipping itself is not important right now. We also keep track various
types of customers,
for example, individuals and corporations. By the way, some of our
parts are composed of other
parts we sell. We often include
special instructions on a PO,
that we write into the remarks
box on the form.
Rules for Identifying Attributes:
- Must pass these tests:
- It is within the scope of the project
- It can be clearly and uniquely defined
- It is not described by other attribute(s) (e.g., only entities are described by attributes)
- Its definition is not based on or derived from other attribute(s)
- It is atomic in this context (e.g., cannot be broken down into smaller units of meaning)
- It is drawn from a single domain of values
- Appears in only one entity (except as a foreign key)
- Attribute name must be:
- Unique within a table
- A singular noun, possibly modified by adjectives
- Relational rules:
- The relative position of an attribute within an entity is irrelevant
- Each attribute should be generally relevant to all instances of the entity
|
Domains
|
What Is a Domain?
A domain defines the valid values for an attribute: The actual data value in that column within each row in a table. An attribute should contain only one domain of data, but many attributes may be drawn from the same domain. A few examples of very common domains are:
|
Date
Time
Quantity
| |
Amount (Currency)
Yes/No
Ranges of numeric values
|
A domain is defined by these primary characteristics:
- Format (though the technical side of format is actually a PDM consideration, format is an LDM consideration in terms of valid data values for an attribute during design):
- Data type (physical representation)
- Length (storage format)
- Values: The set of valid values that may appear in that column. All other values are considered errors and are prohibited from appearing in the column.
- The value set is named.
- It may change over time.
- When the list changes, all values in the column must conform to the new list.
- Atomic meaning: A domain has only one meaning within the system context.
|
Keep Domains Atomic
|
The same data may be atomic with respect to one system, but not to another, depending on whether subsets of the data have intrinsic meaning to that enterprise. An example is Phone Number (12 digits: 03-827-456-2837):
- Non-Telcos: One domain since only the entire number taken as a whole has meaning
- Telcos: Four domains, each with its different meaning, and should be modeled as four separate attributes
To verify if an attribute (and its underlying domain) is atomic, ask the user if they would query against portions of it. If the answer is yes, break the attribute down into the smallest components that would be queried. Example: Would you ever want to report on any pieces of Customer Name? It is 2-5 domains depending on the user's answer:
- Title (Mr., Ms., Dr. etc.)
- First name
- Middle name or initial
- Last name
- Degree (MD, DDS, LLB, etc.)
While title and degree are not really part of customer's name, when a user mentions "customer name" in terms of their business needs, they may actually have these attributes in mind as well. Asking a question such as the one above will ferret out these additional attributes.
Anticipated searches on subsets of the data do not indicate that the attribute should be broken out into its components in these cases:
- Wild card or partial searches that do not indicate that fixed portions of the attribute mean different things
- Date and Time: These are intrinsically constructed of three domains each (Day, Month, Year for Date, and Hours, Minutes, Seconds for Time), but people think of them and use them as a single piece of information about an entity. Also, standardized manipulation algorithms exist across enterprises, industries and most countries making it practical for implementation to treat these as three domains internally while actually storing them as a single attribute for convenience.
|
Attribute Placement
|
Overview
Once you've identified the attributes, the question is which ones belong in which entities? A non-key attribute should be placed in only one entity. There is a formalized science to this process, called normalization, which is covered in detail later on in this course.
However, to minimize work in actual implementation, the first cut for identifying and assigning attributes is generally by intuition. This results in a good model that can be validated and corrected by applying the formalized normalization rules.
Assigning attributes intuitively is valid because in most cases the name of the attribute, or the way it's referred to tells you the answer. For example, customer name, address and type obviously belong in the Customer entity. Here's how they look after we've placed them:

|
Validate: Atomic Domains?
|
Notice that the user originally specified only two attributes: name and address. When we apply the rules and concepts presented above (especially about domains) we discover that each of the original attributes breaks into multiple attributes in order to get to the atomic level.
Use realistic sample data to help the user validate the attributes in terms of domain:
- Atomic level appropriate to the business context.
- Meaning of each attribute. Example: Account# sample value is 123. The user expects ABC. It becomes clear that communication was incomplete, but the error is easy and inexpensive to correct at this point.
- General data type (alpha/numeric). Above example applies.
|
Validate: Missing Attributes?
|
Users rarely just happened to mention everything they need at the atomic level. Always ask if there is more, and validate what is done so far. The easiest way to do this is to show them the table layout with sample data and solicit feedback. It's very likely here that the user would also want a start date, a phone number, perhaps e-mail, etc. for their customers. To save space and scrolling in this course, we're postulating that they are satisfied with the attributes shown. It is also a good idea to look at existing reports, screen layouts and file layouts to help identify attributes.
|
Validate: Missing Minor Entities?
|
Sample data can help identify if entities are missing. Remember the relational rule that seeks to minimize redundant data? When the same descriptive data is repeated within a column, where only a few values are valid, it is generally advised to create minor entities. A minor entity is a table, usually with a code value as its PK and a single description attribute to define the code. The code is carried elsewhere in the system as an FK, and the full description is accessed from the minor entity if needed for reporting. Minor entities are a good way to maintain data accuracy and reduce redundancy. Example: A Gender code table. Clues to missing minor entities include:
- Attributes that include the word Type, Code, Category, Group or ID as part of their name. They represent various ways of categorizing the instances in an entity.
- A small number of valid values in a column:
- Repeated data values (especially long descriptions) in a column.
- Repeated code values in a column. These indicate that a longer description exists which should be documented in a minor entity.
|
Example
|
Example: Employees may be paid by monthly salary or hourly wage. If there are many rows (say 50K employees), these long descriptions for each employee's PayType can cause problems:
- During Modeling:
- Redundant data: Goes against relational theory rules.
- Inconsistency: Over time, the descriptions are input with various spellings (Mo. Sal., Mo Sal, Monthly Sal, etc). It becomes impossible to query by PayType with any accuracy.
- For Implementation: Storage and maintenance savings.
The Solution
- Create a code to carry on each Employee row.
- Create a minor entity to document the valid code values
- Carry the minor entity's PK as an FK in the Employee table
|
Quick Quiz
Constraints
|
Constraint Concept Definition
A constraint or assertion is a rule that defines limits on valid entity relationships or attribute values, based on business rules. Domain level constraints apply to all attributes in the domain. Attribute level constraints apply only to the specific attribute. Constraints can also apply across multiple entities and attributes.
Constraint Examples
- Date Domain: Must be a valid calendar date YYYY/MM/DD
- BirthDate: Must indicate the person is older than 21.
- GoldClubIndicator: Yes or No
- Required Attribute (Be careful when defining this one, as it causes people to input incorrect data rather than allow the system to impede their work)
- No duplicates allowed (ND)
- Only numeric values are valid
- An employee must belong to an existing department
- An employee may be assigned to only one department
- An employee may not work on any project that is not assigned to her/his department
- HealthPlanCode must be: J, R, A or S
- Only salaried employees may sign up for Health Plan "S"
- If HireDate > ten years from today, VestedIndicator = Y
|
Validate and Document
|
Some constraints are explicitly described in the LDM (e.g., a relationship or a domain) while others are only described in documentation associated with the LDM. It is important to validate the entire model against all of the identified constraints to make sure that they are supported by the design when it is complete. Sample data and table layouts can be very useful in validating constraints with the user.
|
NULLs Concept Definition
|
NULL indicates that the value is unknown. NULL is not a blank or zero value. Reasons for NULLs in an attribute might indicate:
- Not currently known but presumed to exist (e.g., marital status)
- Refused, but presumed to exist (e.g., social security number)
- Not relevant to this specific type of instance of the entity (e.g., type of hoof for a fish)
- Not currently relevant (e.g., name of spouse for an single or divorced employee)
- The attribute may not apply for all instances such as an end date where the instance has not yet ended (e.g. an active product with a null end date)
NOTES:
- Best practice is to not put in dummy values if the information is not know (such as a phony date if a person's birth date is not known.)
- Different enterprises use the concept of NULLs in different ways. While one uses it for unknown, another may use it for not applicable. It is important to get these definitions clearly stated so that everyone is working from the same assumptions.
|
NULL vs Zero or Blank
|
Here are some examples of possible differences in meaning (per the enterprise's business rules):
| Attribute Name | NULL means | Zero Means | Blank Means |
| SocialSecurityNumber | Unknown | Person has no SS# | Refused to supply it |
| EmployeeAddress | Missing data | Not valid | Not valid |
| CharityContribAmt | Unknown | Contributed $0.00 | Not valid |
| SpouseName | Not relevant | Not valid | Not supplied |
|
Issues with NULLs
|
While the following are implementation issues, rather than logical modeling issues, they are the underlying reasons that the rules for NULLs were included in relational theory.
- Aggregation functions across NULLs are undefined and generally give error messages:
- What is the average of the following set of values: 20 + 30 + unknown + 40?
- The average is 30 if NULLs are excluded and 18 if they are included
- A division operation against values that include NULLs generally gives a divide check
- Boolean operators (EQ, LT, GT, NE, etc.) on NULLs are clearly defined, but produce unpredictable results. (One NULL is not less or greater than another, but is never equal)
- Joins generally ignore rows with NULLs in the join fields, considering them unmatched. This impacts outer join accuracy.
|
Validate: Missing Sub-Entities?
|
There is a relational rule that all of the attributes in an entity should describe all of its rows. Large numbers of NULLs in the sample data may show that some attributes in fact apply to only a defined subset of the rows. This may indicate the need for a new entity, called a sub-entity or generalization. Sub-entities that are split out from one entity have:
- The same PK as the original entity and as each other
- The PK for each is also an FK to the original entity
- Different attributes from one another
- No overlap in rows in the two sub-entities
Note: Sub-entities are also called sub-types, with the original entity called a super-type.
|
Example
|
Example: Notice that salaried employees have a monthly salary amount and bonus percent, while hourly employees have an hourly amount and overtime authorization. All employees have a PayType. The presence of NULLs in so many rows and columns indicates the need for sub-entities.

The Solution: Sub-Entities
Create two more sub-entities from the original such that the NULLs no longer appear:

|
Try It: Place Attributes
|
It is important at each step to document the model in order to validate
its completeness and accuracy. Below is the model as it exists with the entities
and relationships we've defined.
- For each blue column name (Attribute) from the list on the left click
the name of the entity it would be placed in as a non-key attribute
from the drop down menu to its right. When done, click the Show Model
button to see the complete model.
|
Congratulations
|
Congratulations!
You've just created your first mini-LDM.
An ERD is provided for students who are already conversant in that format.
Detailed discussion of ERDs is reserved for a later module.
See an
ER Diagram
of the model.
|
|
5: Bringing It All Together
|
Objectives
|
This module enables the student to describe and follow the high level steps for creating an LDM.
|
LDM Process Review
|
Essentially the previous modules have taken you through this process, but without focusing on its formalized structure. Below is a summary of the entire process as you would do it on a real life project.
|
Step 1: Define Business Requirements/Scope
|
The business requirements define the scope. As we develop the LDM we constantly validate it against the business requirements. Any component that does not relate directly to a business requirement is outside of scope for the project and is not included in the data model.
Make sure all of the business requirements are understood!
- What problems and issues need to be addressed? Usually these are related to increasing revenues or reducing costs.
- What business questions must the model be able to answer?
- What business functions must be handled?
- What business constraints exist?
- Is everyone who needs to be involved available to share their business requirements?
Determine the method(s) of collecting requirements:
- Review of existing material (such as existing reports)
- Interviews
- A combination of these methods
|
Step 2: Define Entities
|
- Develop the initial entity pool
- Develop definitions for the entities
- Eliminate entities that are outside the scope of the project
- Identify a PK for each remaining viable entity
- Document the viable entities
- Optional: Use table layouts with sample data for verification with users
- Required: Use ER Diagram for final official version
|
Step 3: Define Relationships
|
- Identify the relationships between entities
- Use a relationship matrix to avoid missing any
- Eliminate relationships that are outside the scope of the project
- Eliminate indirect relationships
- Name the remaining viable relationships
- Identify the viable relationships' cardinality (1:1, 1:M, M:M)
- Document the viable relationships in the model, using FKs
- Optional: Use table layouts with sample data for verification with users
- Required: Use ER Diagram for final official version
|
Step 4: Define Non-Key Attributes
|
- Identify and define relevant non-key attributes
- Eliminate attributes that are outside the scope of the project
- Use intuition to place each remaining viable attribute into only one table
- Verify placement of viable attributes
- Document the viable attributes in the model
- Optional: Use table layouts with sample data for verification with users
- Required: Use ER Diagram for final official version
- Capture constraints in a document for final model validation
|
Step 5: Verify the Model
|
- Repeat all steps as necessary
- Multiple iterations are usually required
- Any change in model causes a ripple effect and so requires very careful review:
- Changes in entities often impact relationship definition and attribute placement
- Changes in relationships often impact attribute placement
- Changes in attribute placement may impact other attribute placements
- Constantly verify scope by asking:
- What does this component mean, in terms of the business?
- What business requirement does this component address?
- Verify that all business requirements, questions and constraints are handled by the model
- Do not take into consideration any physical implementation issues!
- Stop refining the model when all data necessary to answer the business requirements are present
|
What is Metadata?
|
Metadata is information that describes the data structure of a system. An LDM is metadata. Metadata can be a difficult concept to grasp initially. Envision an LDM that describes a specific data model. Below is an example of how this would look for our hardware enterprise data model.
Notice the Entity that describes the hardware enterprise entities is called "Entity". This is why metadata is difficult to understand and describe: It is self-referent. "Entity" describes multiple levels of meaning in the sentence above. The graphic is much easier to understand.
 |