Storing records with varying number of sub-items (e.g. trees) in RDBMS

I wanted to start collect recepies using a RDBMS. The recepies was primarily intended to be displayed by a webbrowser, so HTML was the target format. By using XML/XSLT some of the boring details of the conversion to HTML was abstracted away.

Each recepies has a general form which can be described like this:

  1. a header with info on what the recepie is for, and perhaps a few keywords or tags that describes this, like "drink" or "main dish".
  2. a list of ingredienses including information on how much of this

    ingredience should be used.

  3. a description on what to do with the ingrediences.

The tricky part here is the list of ingrediences, since it has an arbitrary number of items.

The wrong way of tackling this is to set up a maximum number of ingrediences and use that number of fields for the ingrediences in all recepies. That wastes fields and still doesn't let you include as many ingrediences as you may want.

Another bad way would be to dump all ingrediences into a single field since that would have to be converted into HTML by some additional algorithm that split the ingrediences on an character that by convention was used to (really) separate ingrediences.

The elegant solution to the problem is to not store the information of the ingrediences in the record of the rececpy, but to use another table with ingrediences and in the record of each ingredience store a reference to the recepie in which this ingredience is a part. So the recipy becomes a parent of the ingredience, or the particular combination of the ingredience and a certain amount of it. If you want to be able reuse the ingredience records, then store the "how much"-part in a third table which then refer to the table of ingrediences with a reference.

How much -> What -> Recipy
500 g -> Cheese -> Lasagne
3 dl -> Cream -> Lasagne
3 st -> Onion -> Lasagne
Select header,description from recepies where id=16;
Select howmuch,what from howmuch,what where what.id=howuch.id and howmuch.recipyid=16;

If you prefer to use a single sql-query it could be done, but will give overhead in form of duplicated data.

Select howmuch,what from howmuch,what where what.id=howuch.id and howmuch.recipyid=16, header,description from recepies where id=16;

If the reuse of the ingrediences doesn't turn you on then you don't need to completely separate "how much" from "what", you could have two fields in the same table for them.

Ingredience (howmuch, what) Recipy (Name)
500 g, Cheese -> Lasagne
3 dl, Cream -> Lasagne
3 st, Onion -> Lasagne
Select header,description from recepies where id=16;
Select howmuch, what from ingrediences where ingredience.recipyid=16;

If you really liked to be able to reuse for ingredience items in different recepies, and would like to be able to do the same with the "how-much part", then here is how:

Table Item

id ref to howmuch ref to what ref to recipy
1 143 423 16 (there should be "500g" "Cheese" in "Lasagne")
2 431 731 16 (there should be "3 dl" "cream" in "Lasagne")
3 739 901 16 (there should be "3 st" "onions" in "Lasagne")

Table Howmuch

id howmuch
143 "500g"
431 "3 dl"
739 "3 st"

Table What

id what
423 "Cheese"
731 "Cream"
901 "Onion"

Table Recipy

id title header description
16 "Lasagne" "dish" "Peel and chop the onions, gently fry them..."
23 "Orange juice" "drink" "Cut the organges in halves."
Select header,description from recepies where id=16;
Select howmuch from Howmuch where id in (select ref to howmuch where ref to recipy = 16), what from What where id in (select ref to what where ref to recipy = 16);

To generalise:

An object having a arbitrary number of properties can be effectively represented in a RDBMS in the following way:

Table A contains formalized (fixed-set) information about the particular object, including a unique identifier, typically also a name or a title (that is, a human meaningful unique identifier).

Table B only contains information on properties.

Table C contains information on connection between properties and object. Each record in C contains a pointer to a property and an object, meaning that the object has that property.

If properties is to be treated differently depending on their essence, or kind, then make a table Bx for properties of kind x and By for properties of kind y.

Question open for discussion: Should Table A ever hold any other fields than a numerical id and a name? (my answer: yes, all properties that are likely to be unique for an object, like the description of how to apply the ingrediences in a recipy, in short, every information that we will not try to reuse)

Implementation

Consider a (bad) implementation with a fixed number of ingrediences:

my $datalist = [

# field,     table,                   description (may contain HTML tags),  field name in object table (if empty, the field-name in the table of properties will be used)
'name',   'matratter.matratter',        'Maträtt: ',                       'id',
'name', 'matratter.ingredienser',     '<p>Basingrediens: ',                 'ingrediens1',
'name', 'matratter.ingredienser',     '<p>Ingrediens 2: ',                  'ingrediens2',
'name', 'matratter.ingredienser',     '<p>Ingrediens 3: ',                  'ingrediens3',
];

Here is how to structure the data better

my $datalist = [

# field,     table,	description,	            field name in the connection table,
'title',    'recipy',	'title of the recipy',	        '',
'header',   'recipy',	'a description of the thing,    '',
                         you can make with the recipy', '',
'type',     'types',    'what type of thing is that',   '',
'what',	    'what',	'list of ingrediences',	        'ref to what',
'howmuch',  'howmuch',	'list of amounts',	        'ref to howmuch',
'howto',    'recipy',   'What to do with the ingrediences', '', ]

The core of the solution is the connections table, which defines what properties belongs to what object. How do the system know that the list of ingrediences is to be paired with the list of amounts, rather than just being two different properties of the object(s)?

Let's generalise even further: introduce an arbitrary amount of lists, by using one connections table for each list (column "connection table" below)

my $datalist = [

# field,     table,	description,	            field name in the connection table,      connection table
'title',    'recipy',	'title of the recipy',	        '',                                  '',
'header',   'recipy',	'a description of the thing,    '',                                  '',
                         you can make with the recipy
                         perhaps also something on what
                         makes this version
                         better than others',           '',                                  '',
'type',     'types',    'what type of thing is that',   '',                                  '',
'what',	    'What',	'list of ingrediences',	        'ref to what',                       'listofingred',
'howmuch',  'Howmuch',	'list of amounts',	        'ref to howmuch',                    'listofingred',
'howto',    'recipy',   'What to do with the ingrediences', '',                              '',
'who',      'Who',      'list of persons',              'ref to persons',                    'listofratings',
'rating',   'Rating',   'list of ratings',              'ref to ratings',                    'listofratings',

]

Now, the core is decentered because we have two tables with connections: 'listofingred' for the amounts and ingrediences and 'listofratings' for a list of personal ratings.

The numerical id of the the recipy ties together all of it.

Here's how to get all info

First the fixed form information:

select title, header, types, howto from recipy,types where types.recipyid=recipy.id and recipy.id=16;

Then the list of ingrediences:

Select howmuch from Howmuch where id in (select ref_to_howmuch from listofingred where ref_to_recipy = 16), what from What where id in (select ref_to_what from listofingred where ref_to_recipy = 16);

And the list of ratings:

Select who from Who where id in (select ref_to_persons from listofratings where ref_to_recipy = 16), rating from Rating where id in (select ref_to_ratings from listofratings where ref_to_recipy = 16);

Here is the structure of the second connection table ('listofratings')

id ref_to_ratings ref_to_persons ref_to_recipy
1 2 423 16 (= person number 423 gave recipy 16 the rating "good")
2 4 731 16
3 3 901 16

Table Ratings

id rating
1 "very good"
2 "good"
3 "standard"
4 "will not do again"

Here is another example, this time a system for keeping record of mah-jong games. Each game is part of a match, and depending on the previous game in the match, different rules apply. So the order of the games is important. A match consists of at least 16 games, but most often there are more games than 16. So the problem here is to represent a match consisting of an arbitrary number of games.

The number of players can be three or four, but every game in a match must involve the same players, so information about the players really should be recorded at the match level.

Table matches

id ref_to_player1 ref_to_player2 ref_to_player3 ref_to_player4 wind first round (ref_to_winds)
1 1 3 5 NULL 2
2 3 4 5 2 0

Rather than having a special field for the pid of the player who starts as east, I have choosen to define player one as the player who starts as east, and player two as the player who plays east after player one and so on.

Table players

id name
1 Petra Andersson
2 Johannes Nilsson
3 Ulrika Berglund
4 Rolf Borjlind
5 Hanna Gunnarsson

Table winds

id name
1 east
2 north
3 west
4 south

Table games

id match_id order_in_the_match player1_results player2_results player3_results players4_results which player did mah-jong (ref_to_players)

comments powered by Disqus


Back to the index

Blog roll

R-bloggers, Debian Weekly
Valid XHTML 1.0 Strict [Valid RSS] Valid CSS! Emacs Muse Last modified: 2007-06-18