Architecture-First: Producing the Data Model

Feb 16, 2022

Overview

The Data Model is an important artifact created on the way to producing the final database structure. It affects both the Business and Technical Architectures. A good Data Model can lead directly to either relational or NoSQL structures.

The Data Model should be taken seriously. It is not just one-time documentation; it should show up in some form in the resulting database structure.

If an issue found late in the project requires a significant database change, it should be considered a bug, since the Data Modeling effort should have caught it. That type of change can be costly and disruptive to the project. The Architecture-First goal is to prevent late business changes, or at least greatly reduce the chance of them occurring.

The Data Model is a key artifact for adhering to the Architecture-First Manifesto. The manifesto items will show up in both Business and Technical Architecture articles.

Architecture-First Manifesto

Architect the Ideal, Implement the Real
- This phrase means: base the architecture on the ideal vision regardless of constraints, but implement the various components based on the constraints
Use Concept Simplicity
- Choose concept simplicity and tune for performance
- Hard-to-understand code will be defect-filled code
Show the business logic prominently in the code
- The code should not be so convoluted with technical logic that the core business logic for the application is difficult to understand
Use the Federal Government / State Government Model
- Determine what code and rules belong in the architecture and what are part of individual implementations
  - The Architecture and Infrastructure represent the Federal Government’s role, while the Implementation represents a State’s role
5-minute Rule
- The cause of major runtime problems should be able to be diagnosed within 5 minutes
Backward Compatibility
- Code should be backward compatible by default
Self-healing Code if Distributed Code
- Develop resilient and robust code in a distributed environment, such as a microservice environment or the cloud.
Trust-Based Components
- Trust other components to do their jobs, but handle situations if problems arise
Use upstream decisions to prevent downstream problems
- Use good upstream techniques (planning, requirements, analysis, scenarios) to reduce costly downstream fixes (coding, test/fix cycles, data migrations)

This Manifesto lists the Architecture-First principles. Various articles will refer back to it as they apply the philosophy.

Data Model Mining Techniques

A good Data Model takes time and iterations to produce. The process of creating the Data Model should help to refine the requirements just like Use Cases do.

Some steps to produce a Data Model are the following:

Review Requirements
Review Use Cases
Review the Existing Database
Define Relationships
- cardinality is important
- leads to the Data Structure
Validate with the Stakeholders

Review Requirements

The Data Modeler (DM) should review the Requirements and look for key items. For instance, in the retail example, the DM should find items such as:

Product
Price
Inventory
Order
Shopping Cart

The DM should also find relevant categories, such as:

Merchandising
Billing
Order Management

Review Use Cases

The DM should review the Use Cases to get a general feel for what the application is doing. For instance, there is a different Data Model for a Content Management application versus a Retail application.

The DM should find the targets of the Use Cases and decide whether they should be modeled. For instance:

Customer
Product
Price
Discount

The review should guide the DM toward decisions, such as whether a Price is a first-class entity or merely an attribute on the Product. Key questions will help, such as:

Are Prices fixed or can they be in a monthly cycle?
Can the Product have a different Price for different Customers?
Can multiple Products be bundled with one Price?

From the Business Architecture standpoint, the decision on what should or should not be an entity rests on logical reasons, not technical ones such as database normalization. It is based on the logical importance of one or more entities.
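To make the Price questions above concrete, here is a minimal Python sketch contrasting the two options. All class and field names (SimpleProduct, customer_segment, price_for) are illustrative assumptions and do not come from an actual retail model.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


# Option 1: Price is merely an attribute on the Product.
# This is enough when every Customer always pays the same fixed amount.
@dataclass
class SimpleProduct:
    name: str
    price: Decimal


# Option 2: Price is a first-class entity.
# This is needed once a Price can vary by Customer.
@dataclass
class Price:
    amount: Decimal
    # Hypothetical field: None means the Price applies to everyone.
    customer_segment: Optional[str] = None


@dataclass
class Product:
    name: str
    prices: List[Price] = field(default_factory=list)

    def price_for(self, segment: Optional[str]) -> Decimal:
        """Return the segment-specific amount if one exists, else the default."""
        for p in self.prices:
            if p.customer_segment == segment:
                return p.amount
        for p in self.prices:
            if p.customer_segment is None:
                return p.amount
        raise LookupError(f"no price defined for {self.name}")
```

If the answer to "Can the Product have a different Price for different Customers?" is yes, something like Option 2 is required. The choice follows from the business answer, not from normalization rules.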

Review the Existing Database

Having an existing database to review is an ideal situation, because it is a working example of data relationships. Even so, the database structure should merely serve as conceptual input to the new Data Model. The goal is to produce an improved application and not just a similar database.

It is possible that three tables may converge into one conceptual entity. This is because tables are generally normalized into a relational structure for maintainability or other reasons. Therefore, there is typically no one-to-one mapping of tables to Data Model entities.
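As a rough illustration of several tables converging into one entity, the sketch below folds rows from three hypothetical normalized tables (customer, address, phone) into a single conceptual Customer. The table names, columns, and sample rows are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical rows from three normalized tables in an existing database.
customer_rows = [{"id": 1, "name": "Ada"}]
address_rows = [{"customer_id": 1, "city": "Boston"}]
phone_rows = [
    {"customer_id": 1, "number": "555-0100"},
    {"customer_id": 1, "number": "555-0101"},
]


@dataclass
class Customer:
    """One conceptual entity that spans all three tables."""
    name: str
    cities: List[str]
    phone_numbers: List[str]


def to_entities(customers, addresses, phones) -> Dict[int, Customer]:
    """Fold the normalized rows into one Customer entity per customer id."""
    entities = {}
    for c in customers:
        entities[c["id"]] = Customer(
            name=c["name"],
            cities=[a["city"] for a in addresses if a["customer_id"] == c["id"]],
            phone_numbers=[p["number"] for p in phones if p["customer_id"] == c["id"]],
        )
    return entities
```

In the Data Model diagram only Customer would appear; the split into three tables is a storage decision.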

Define Relationships

As the entities are found, they should be added to one or more diagrams. These entities should be modeled in a UML (or TML :) form.

Unlike the classic Entity Relationship Diagram (ERD), where there may be 50 to 100 tables, the Data Model should be contained in one or more simple diagrams. The goal is comprehension: non-technical stakeholders should be able to understand these diagrams.

The ERD is more like a map than a summary of relationships. It just depicts the way to find and use the data. In contrast, the Data Model diagrams focus on business entities and their relationships.

[Figure: Data Model]

For the retail example, a Data Model like the one above may be created. It is simpler than an ERD but has similar features. An open arrow denotes a 'has a' relationship. For instance, a Product has a Price.

A closed arrow denotes an 'is a' relationship. For instance, a Guest is a Customer and a Registered Customer is a Customer. They are two different entities, but they share some commonality that can be reused.

The next important piece of information is cardinality. It shows how many instances of one entity relate to another, which in turn affects the implementation in the database structure. The value is either a number or an asterisk (*), which means many. The relationship should be read in the direction of the arrow. For instance, a Customer can only have one Shopping Cart, but can have many Orders. A Product can have different Prices depending on the scenario, which is a one-to-many relationship.
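The 'has a', 'is a', and cardinality ideas map fairly directly onto code. The following is a minimal Python sketch, assuming hypothetical attributes such as order_number and email that do not come from the diagram.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShoppingCart:
    product_names: List[str] = field(default_factory=list)


@dataclass
class Order:
    order_number: int  # hypothetical attribute


@dataclass
class Customer:
    # 'has a' with cardinality 1: one Shopping Cart per Customer
    cart: ShoppingCart = field(default_factory=ShoppingCart)
    # 'has a' with cardinality *: many Orders per Customer
    orders: List[Order] = field(default_factory=list)


@dataclass
class Guest(Customer):
    """'is a' Customer; reuses the cart and orders fields."""


@dataclass
class RegisteredCustomer(Customer):
    """'is a' Customer with attributes of its own."""
    email: str = ""  # hypothetical attribute


alice = RegisteredCustomer(email="alice@example.com")
alice.orders.append(Order(order_number=1))
alice.orders.append(Order(order_number=2))
```

A single-valued field expresses a cardinality of 1, a list expresses *, and inheritance expresses the shared commonality between Guest and Registered Customer.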

In a relational database, this information will lead to determining which table has a foreign key to the other. It may also uncover the need for an association table.
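As a sketch of how cardinality drives the relational structure, the example below uses Python's built-in sqlite3 module with an in-memory database. The table and column names are illustrative, and the many-to-many relationship between Orders and Products is an assumption added here to show an association table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One Customer to many Orders: the 'many' side holds the foreign key.
CREATE TABLE customer_order (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);

CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Assumed many-to-many between Orders and Products:
-- neither table can hold the key, so an association table links them.
CREATE TABLE order_product (
    order_id INTEGER NOT NULL REFERENCES customer_order(id),
    product_id INTEGER NOT NULL REFERENCES product(id),
    PRIMARY KEY (order_id, product_id)
);
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer_order (id, customer_id) VALUES (10, 1)")
conn.execute("INSERT INTO customer_order (id, customer_id) VALUES (11, 1)")
```

An Order that points at a Customer that does not exist is rejected by the database, so the relationship from the Data Model is enforced rather than merely documented.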

Validate with the Stakeholders

The DM should not create the Data Model in isolation. The resulting Data Model should be reviewed with the stakeholders.

It is important that the DM provides a list of assumptions for validation, as well as a list of questions. The feedback should then be incorporated into the model.

Finishing Up

Producing an accurate Data Model is important. If there are wrong assumptions in the model, they will show up downstream in code and data, where they are costly to fix.

There should be traceability of the entities in the database and application back to the Data Model. Every level of the application should clearly show the business relationships.

Tony’s Technical Spot
