I wanted to start collect recepies using a RDBMS. The recepies was primarily intended to be displayed by a webbrowser, so HTML was the target format. By using XML/XSLT some of the boring details of the conversion to HTML was abstracted away.
Each recepies has a general form which can be described like this:
ingredience should be used.
The tricky part here is the list of ingrediences, since it has an arbitrary number of items.
The wrong way of tackling this is to set up a maximum number of ingrediences and use that number of fields for the ingrediences in all recepies. That wastes fields and still doesn't let you include as many ingrediences as you may want.
Another bad way would be to dump all ingrediences into a single field since that would have to be converted into HTML by some additional algorithm that split the ingrediences on an character that by convention was used to (really) separate ingrediences.
The elegant solution to the problem is to not store the information of the ingrediences in the record of the rececpy, but to use another table with ingrediences and in the record of each ingredience store a reference to the recepie in which this ingredience is a part. So the recipy becomes a parent of the ingredience, or the particular combination of the ingredience and a certain amount of it. If you want to be able reuse the ingredience records, then store the "how much"-part in a third table which then refer to the table of ingrediences with a reference.
How much -> | What -> | Recipy |
---|---|---|
500 g -> | Cheese -> | Lasagne |
3 dl -> | Cream -> | Lasagne |
3 st -> | Onion -> | Lasagne |
Select header,description from recepies where id=16; Select howmuch,what from howmuch,what where what.id=howuch.id and howmuch.recipyid=16;
If you prefer to use a single sql-query it could be done, but will give overhead in form of duplicated data.
Select howmuch,what from howmuch,what where what.id=howuch.id and howmuch.recipyid=16, header,description from recepies where id=16;
If the reuse of the ingrediences doesn't turn you on then you don't need to completely separate "how much" from "what", you could have two fields in the same table for them.
Ingredience (howmuch, what) | Recipy (Name) |
---|---|
500 g, Cheese -> | Lasagne |
3 dl, Cream -> | Lasagne |
3 st, Onion -> | Lasagne |
Select header,description from recepies where id=16; Select howmuch, what from ingrediences where ingredience.recipyid=16;
If you really liked to be able to reuse for ingredience items in different recepies, and would like to be able to do the same with the "how-much part", then here is how:
Table Item
id | ref to howmuch | ref to what | ref to recipy |
---|---|---|---|
1 | 143 | 423 | 16 (there should be "500g" "Cheese" in "Lasagne") |
2 | 431 | 731 | 16 (there should be "3 dl" "cream" in "Lasagne") |
3 | 739 | 901 | 16 (there should be "3 st" "onions" in "Lasagne") |
Table Howmuch
id | howmuch |
---|---|
143 | "500g" |
431 | "3 dl" |
739 | "3 st" |
Table What
id | what |
---|---|
423 | "Cheese" |
731 | "Cream" |
901 | "Onion" |
Table Recipy
id | title | header | description |
---|---|---|---|
16 | "Lasagne" | "dish" | "Peel and chop the onions, gently fry them..." |
23 | "Orange juice" | "drink" | "Cut the organges in halves." |
Select header,description from recepies where id=16; Select howmuch from Howmuch where id in (select ref to howmuch where ref to recipy = 16), what from What where id in (select ref to what where ref to recipy = 16);
To generalise:
An object having a arbitrary number of properties can be effectively represented in a RDBMS in the following way:
Table A contains formalized (fixed-set) information about the particular object, including a unique identifier, typically also a name or a title (that is, a human meaningful unique identifier).
Table B only contains information on properties.
Table C contains information on connection between properties and object. Each record in C contains a pointer to a property and an object, meaning that the object has that property.
If properties is to be treated differently depending on their essence, or kind, then make a table Bx for properties of kind x and By for properties of kind y.
Question open for discussion: Should Table A ever hold any other fields than a numerical id and a name? (my answer: yes, all properties that are likely to be unique for an object, like the description of how to apply the ingrediences in a recipy, in short, every information that we will not try to reuse)
Implementation
Consider a (bad) implementation with a fixed number of ingrediences:
my $datalist = [ # field, table, description (may contain HTML tags), field name in object table (if empty, the field-name in the table of properties will be used) 'name', 'matratter.matratter', 'Maträtt: ', 'id', 'name', 'matratter.ingredienser', '<p>Basingrediens: ', 'ingrediens1', 'name', 'matratter.ingredienser', '<p>Ingrediens 2: ', 'ingrediens2', 'name', 'matratter.ingredienser', '<p>Ingrediens 3: ', 'ingrediens3', ];
Here is how to structure the data better
my $datalist = [ # field, table, description, field name in the connection table, 'title', 'recipy', 'title of the recipy', '', 'header', 'recipy', 'a description of the thing, '', you can make with the recipy', '', 'type', 'types', 'what type of thing is that', '', 'what', 'what', 'list of ingrediences', 'ref to what', 'howmuch', 'howmuch', 'list of amounts', 'ref to howmuch', 'howto', 'recipy', 'What to do with the ingrediences', '', ]
(like the "type" of object) is coded in a table of its own (here "types").
The core of the solution is the connections table, which defines what properties belongs to what object. How do the system know that the list of ingrediences is to be paired with the list of amounts, rather than just being two different properties of the object(s)?
Let's generalise even further: introduce an arbitrary amount of lists, by using one connections table for each list (column "connection table" below)
my $datalist = [ # field, table, description, field name in the connection table, connection table 'title', 'recipy', 'title of the recipy', '', '', 'header', 'recipy', 'a description of the thing, '', '', you can make with the recipy perhaps also something on what makes this version better than others', '', '', 'type', 'types', 'what type of thing is that', '', '', 'what', 'What', 'list of ingrediences', 'ref to what', 'listofingred', 'howmuch', 'Howmuch', 'list of amounts', 'ref to howmuch', 'listofingred', 'howto', 'recipy', 'What to do with the ingrediences', '', '', 'who', 'Who', 'list of persons', 'ref to persons', 'listofratings', 'rating', 'Rating', 'list of ratings', 'ref to ratings', 'listofratings', ]
Now, the core is decentered because we have two tables with connections: 'listofingred' for the amounts and ingrediences and 'listofratings' for a list of personal ratings.
The numerical id of the the recipy ties together all of it.
Here's how to get all info
First the fixed form information:
select title, header, types, howto from recipy,types where types.recipyid=recipy.id and recipy.id=16;
Then the list of ingrediences:
Select howmuch from Howmuch where id in (select ref_to_howmuch from listofingred where ref_to_recipy = 16), what from What where id in (select ref_to_what from listofingred where ref_to_recipy = 16);
And the list of ratings:
Select who from Who where id in (select ref_to_persons from listofratings where ref_to_recipy = 16), rating from Rating where id in (select ref_to_ratings from listofratings where ref_to_recipy = 16);
Here is the structure of the second connection table ('listofratings')
id | ref_to_ratings | ref_to_persons | ref_to_recipy |
---|---|---|---|
1 | 2 | 423 | 16 (= person number 423 gave recipy 16 the rating "good") |
2 | 4 | 731 | 16 |
3 | 3 | 901 | 16 |
Table Ratings
id | rating |
---|---|
1 | "very good" |
2 | "good" |
3 | "standard" |
4 | "will not do again" |
Here is another example, this time a system for keeping record of mah-jong games. Each game is part of a match, and depending on the previous game in the match, different rules apply. So the order of the games is important. A match consists of at least 16 games, but most often there are more games than 16. So the problem here is to represent a match consisting of an arbitrary number of games.
The number of players can be three or four, but every game in a match must involve the same players, so information about the players really should be recorded at the match level.
Table matches
id | ref_to_player1 | ref_to_player2 | ref_to_player3 | ref_to_player4 | wind first round (ref_to_winds) |
---|---|---|---|---|---|
1 | 1 | 3 | 5 | NULL | 2 |
2 | 3 | 4 | 5 | 2 | 0 |
Rather than having a special field for the pid of the player who starts as east, I have choosen to define player one as the player who starts as east, and player two as the player who plays east after player one and so on.
Table players
id | name |
---|---|
1 | Petra Andersson |
2 | Johannes Nilsson |
3 | Ulrika Berglund |
4 | Rolf Borjlind |
5 | Hanna Gunnarsson |
Table winds
id | name |
---|---|
1 | east |
2 | north |
3 | west |
4 | south |
Table games
id | match_id | order_in_the_match | player1_results | player2_results | player3_results | players4_results | which player did mah-jong (ref_to_players) |
---|