I’m working on a server application and I’m fiddling with the database design at the moment, trying different ideas. It’s been a long time since I did any database design work (the Continua CI schema back in 2012) so I’m a bit rusty!
In the past I have usually normalised the tables (i.e. used relationships rather than duplicating data) - the benefits are:
- Data integrity
- Data size
The downsides are:
- it’s a pain to work with - joins and more joins and more…
- performance - all those joins slow things down.
Since this application will be heavily skewed towards reads (search) I’m leaning towards denormalising a bit (at the moment it’s completely normalised). There will be inserts of course but very little in the way of updates (initial processing after which the data doesn’t change).
FWIW, this is for my package manager project; the tables (well, the ones that matter) look like this:
Package (name) → PackageTargetPlatform (compiler, platform) → PackageVersion (version, metadata, etc.)
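
In rough SQL terms it's something like this (very much simplified - the column names and types here are just illustrative, the real tables carry a lot more metadata):

```sql
-- Current, fully normalised shape (simplified sketch, not the real DDL).
CREATE TABLE package (
    id    serial PRIMARY KEY,
    name  text NOT NULL UNIQUE
);

CREATE TABLE package_target_platform (
    id          serial PRIMARY KEY,
    package_id  integer NOT NULL REFERENCES package(id),
    compiler    text NOT NULL,   -- compiler/IDE version
    platform    text NOT NULL,   -- e.g. Win32, Win64, macOS, Android
    UNIQUE (package_id, compiler, platform)
);

CREATE TABLE package_version (
    id                  serial PRIMARY KEY,
    target_platform_id  integer NOT NULL REFERENCES package_target_platform(id),
    version             text NOT NULL,
    -- description, authors, licence and other metadata columns omitted
    UNIQUE (target_platform_id, version)
);
```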
Right now in testing there are only 36 rows in the Package table and a few hundred in the PackageTargetPlatform table, but the PackageVersion table has 6500 rows - the explosion comes from the combinations of compiler version, platform and package version (13 compiler versions, Win32/Win64/macOS/Android etc). So the PackageVersion table has the potential to blow out to quite a few rows over time.
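
The reads that matter are searches, and every one of them walks all three tables, roughly like this (values are just placeholders):

```sql
-- Typical search read: which versions of a package exist for a given
-- compiler/platform combination.
SELECT p.name, tp.compiler, tp.platform, v.version
FROM package p
JOIN package_target_platform tp ON tp.package_id = p.id
JOIN package_version v ON v.target_platform_id = tp.id
WHERE p.name = 'SomePackage'
  AND tp.compiler = '10.4'
  AND tp.platform = 'Win64'
ORDER BY v.version DESC;  -- naive ordering on a text column, just a sketch
```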
Using PostgreSQL 13, FWIW (I have used it for years with Continua CI).
I obviously need to do some testing with the different options, but I'm wondering what others do. How far do you go with denormalising (if at all), and how do you manage data integrity if you do?
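
To make "denormalising a bit" concrete, the kind of thing I'm weighing up is flattening the hierarchy into a single search-oriented table, along these lines (again just a sketch, not a final design):

```sql
-- Denormalised variant: duplicate the package name and compiler/platform
-- onto every version row so a search only touches one table.
CREATE TABLE package_version_search (
    id        serial PRIMARY KEY,
    name      text NOT NULL,
    compiler  text NOT NULL,
    platform  text NOT NULL,
    version   text NOT NULL,
    -- metadata columns omitted
    UNIQUE (name, compiler, platform, version)
);

-- The search above collapses to a single-table lookup.
SELECT version
FROM package_version_search
WHERE name = 'SomePackage' AND compiler = '10.4' AND platform = 'Win64';
```

The obvious cost is that name/compiler/platform get repeated on every version row, which is exactly where the data-integrity question comes in.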