I have been thinking recently about how the database can influence our design decisions, and perhaps makes them harder than they need to be in some cases. An example of this is the design of a system which stores data about people, specifically for this, their email addresses. A cut down version of the structure is this:
Which is represented in code like so:
While this works, it is heavily influenced by the storage technology. Email addresses by definition are unique already, so why do we need a primary key column? They are also associated with exactly one person, so the
person_id column is only here to facilitate that. Why not get rid of the emails table completely, and store the person's email addresses in a single column in the person table? This could be done with a simple csv, but it would be more fore-thinking to use json, so we can associate a little more data with each email address.
So before we get on to putting data in one column, what reasons we have to keep emails as a separate table?
- So they can be queried by ID.
- So we can put a constraint across
- So all emails of a given type can be found.
- So a person can be looked up by email.
- So we can attach rules to them.
The first three can be answered easily: you never query for an email address by its primary key, only by the address itself. The constraints are really just a safety net, and a nice to have at best - the collection which manages emails is well tested, implements all business rules properly, and everything which deals with emails uses it. Getting all emails of a particular type is a legitamate requirement, but can be gotten around in several ways: selecting the entire email column in a sql query, and doing additional filtering client side for the specific email types, or if you are using a database which supports json querying (such as postgres), using that to narrow the results down.
The final point is the most interesting, as it could be resolved with a few different designs. The current design has one additional table:
Person object has a method like this:
While this works, the design has a few problems:
- There is no foreign keying of
- So you have to remember to delete rules when deleting any entity
- You have to remember if an entity is valid for rules to be attached to
- If normalisation was your argument for an
emailstable, explain this table relationship...
There are two solutions to this problem:
The first is to change the rules table to just have a
target column, and put the unique data in there e.g. a rule for an email would have the email address in the
target column, a rule for a phone number would have the actual phone number in the
target column. While this works, it doesn't really improve the design of the system; we still have the existing joins and "remember to also" problems of before.
The second solution is to remove the
rules table entirely and implement rules as small collections on each target entity, and make the
person.Rules property a readonly aggregate. This has a few advantages: each entity explicitly has a rule collection if applicable, and we no longer need to remember to check another collection for updates/deletes.
The implementation of a
.Rules property on each entity is trivial - just a standard list property:
As we don't wish to repeat the logic on each collection of rules, we can add an extension method for checking if rules are in force:
And finally on the
Person object itself, we can make a simple aggregate property for all child entity's rules:
Personally I prefer the 2nd form of this, as it makes domain modelling a lot more straight forward - however like all things, you should consider all your requirements carefully - and don't let the database (sql or nosql variety) dictate your model.