Database Normalization Rules - Eliminate Repeating Groups

With the large number of software applications in the wild, from desktop, to web, to mobile, the vast majority employ some form of database. As development teams get smaller and consist of fewer seasoned professionals, the databases that run these applications are often poorly designed, inflexible, and difficult to scale. By following a few core principles of database design, however, many of these databases could be substantially improved. The first and foremost rule of good database design: eliminate repeating groups.

A repeating group is, in essence, multiple columns that form a series. Take for example a spreadsheet with the following columns:

Author Name
Author Address
Author City
Author State
Book 1 Name
Book 2 Name
Book 3 Name
Book 4 Name

This spreadsheet, or table, represents a listing of authors and the books they have written. The weakness of repeating groups becomes immediately apparent once one of the authors writes a fifth book. As soon as that happens, the programmer needs to be called back in to expand the system and add a new column for that fifth book.

Another weakness of repeating groups is the excess data they require. Each row takes up 4 spaces, one for each book, even if the author only has 1 or 2 books total. This results in a database that takes up too much space on the disk and has slow performance.

Now, let’s say that for each book, we also want to store its published date. The number of columns grows even bigger:

Author Name
Author Address
Author City
Author State
Book 1 Name
Book 1 Publish Date
Book 2 Name
Book 2 Publish Date
Book 3 Name
Book 3 Publish Date
Book 4 Name
Book 4 Publish Date

Soon this table will be extremely difficult to manage.

The solution to this problem is to add ID columns, and create a second table, or spreadsheet, for the books:

Author Table

Author ID
Author Name
Author Address
Author City
Author State

Book Table

Book ID
Author ID
Book Name
Book Publish Date

Using two tables significantly reduces the space required, increases system flexibility, and results in a much better database foundation for the software. Although it takes a little more work in development up-front, this database design principle will pay significant dividends over the lifetime of the system. Very important in software development.

Written by Andrew Palczewski

About the Author
Andrew Palczewski is CEO of apHarmony, a Chicago software development company. He holds a Master's degree in Computer Engineering from the University of Illinois at Urbana-Champaign and has over ten years' experience in managing development of software projects.
Google+

Database Design Principles – Part 1 of 5: Eliminate Repeating Groups

Author Table

Book Table

One thought on “Database Design Principles – Part 1 of 5: Eliminate Repeating Groups”

Leave a Reply Cancel reply