Part 1, Part 2, Part 3

A wise man* once said: To build the Great Wall of China, you must start with a brick. In our previous articles we acquainted ourselves with our bricks; now it’s time to build. If we consider what we’ve learned so far – Select, Update, Insert, Delete – as unicellular organisms, what we’re about to do next is multicellular, big and beautiful – it’s like going from an amoeba to a bee, a butterfly, to Beyoncé.

[*that wise man was me]

Consider a real-world requirement that we might have of our Addressbook database; since it contains a list of our friends and their phone numbers, we will naturally want to see a list of their names and their phone numbers. Ah, but that presents a problem. Our friends’ names are in the FRIEND_NAME table, while their phone numbers are in the PHONE_NUMBER table. And complicating things further, we can only tell which number belongs to which friend by looking in the FRIEND_PHONE table. Aaargh!

Joins

We could, of course, get the information by running a series of queries: one select to find our friends’ names and their friend_id; a second to find the phone_id of the phone number linked to each friend in FRIEND_PHONE; and a third query to find the number from PHONE_NUMBER using the phone_id we identified in our second query. So yes, it can be done. But hey, you can probably ride a unicycle across Siberia – but that didn’t stop them from inventing the car.

What we need are joins. We need a select statement that can query multiple tables at the same time.

The syntax for a multi-table select statement is as follows:

SELECT <comma-separated column list>
FROM <comma-separated table list>
WHERE <conditions>

Let me translate that into an actual query for you.

SELECT FIRST_NAME, LAST_NAME, PHONE_NUMBER
FROM FRIEND_NAME, FRIEND_PHONE, PHONE_NUMBER;

Run that and see what you get. You’ve probably already guessed that I’m setting you up, but it’s important that we make these mistakes now, so we can learn about them. The above query will give you many, many rows that look identical; however, if you replace the column list with an asterisk (*) and rerun the query, you’ll notice that the records aren’t exactly identical, each has one column different.

What we have here is a Cartesian Product, and you’ve probably already guessed that it has something to do with our missing Where clause. You’re right. Whenever we don’t tell Oracle how our tables are related to each other it simply joins every record in every table to every record in every other table. This doesn’t only happen when our Where clause is completely missing; the same thing would happen if we were joining 3 tables, but only included 2 in our Where clause or if we joined the tables ambiguously (always join using key columns where possible).

So if you ever notice that your query is returning more rows that you anticipated, look for a Cartesian join. Got that? Good, let’s continue. Run the following statement:

SELECT FRIEND_ID, FIRST_NAME, LAST_NAME, PHONE_NUMBER
FROM FRIEND_NAME, FRIEND_PHONE, PHONE_NUMBER
WHERE FRIEND_NAME.FRIEND_ID = FRIEND_PHONE.FRIEND_ID
AND FRIEND_PHONE.PHONE_ID = PHONE_NUMBER.PHONE_ID;

I set you up again; sorry. Running the query will result in the following error:

ORA-00918: column ambiguously defined

The issue here is that if two or more columns of the same name (such as friend_id in our select list) exist in two or more of our tables, we must always tell Oracle which one we are referring to every time we use that column name. We do this by preceding the column name with the table name. In fact, it is good practice to do this with all columns in a multi-table statement. Makes it easier to read.

Although if you had to type out the full table names each time you referred to a column in a long statement, you’ll soon be in hospital with RSI. To get around this problem, you can use table aliases. These are short nicknames you can give to tables (to do this you follow the table name immediately with the alias in your From statement). Let me rewrite our query to illustrate this:

SELECT FN.FRIEND_ID, FN.FIRST_NAME, FN.LAST_NAME, PN.PHONE_NUMBER
FROM FRIEND_NAME FN, FRIEND_PHONE FP, PHONE_NUMBER PN
WHERE FN.FRIEND_ID = FP.FRIEND_ID
AND FP.PHONE_ID = PN.PHONE_ID
ORDER BY FN.FIRST_NAME, FN.LAST_NAME;

Run the above statement. It’s perfect, isn’t it?

SYSDATE

Not quite.

My data and yours might now be very different because of all the practising you’ve been doing (you have been practising, haven’t you? These articles might be good, but you may as well be reading a Dan Brown novel if you don’t put what you’re learning into practise). However, when I run the query, I get 2 rows for Rachel – one for her current phone number and another for her previous one.  But how can we tell which is which?

Of course we can simply add fp.start_date and fp.end_date to our column list and look to see which record is current. Alternatively, we can add the condition “AND FP.END_DATE IS NULL” to our Where clause and exclude end-dated rows in that way. But what if Rachel had told us that she was changing her phone number next month and we’d put in a future end-date? Excluding all end-dated rows now would give us the wrong result.

What we need to do is query against today’s date. But we don’t want to hard-code the date into our query, in case we want to rerun the query next week. What we need is SYSDATE.

Run the following query:

SELECT SYSDATE
FROM DUAL;

It should return today’s date. And no matter which day you run it, it should return the correct date.

Now we can rewrite our query as:

SELECT FN.FRIEND_ID, FN.FIRST_NAME, FN.LAST_NAME, PN.PHONE_NUMBER
FROM FRIEND_NAME FN, FRIEND_PHONE FP, PHONE_NUMBER PN
WHERE FN.FRIEND_ID = FP.FRIEND_ID
AND FP.PHONE_ID = PN.PHONE_ID
AND (FP.START_DATE IS NULL OR FP.START_DATE<= SYSDATE) 
AND (FP.END_DATE IS NULL OR FP.END_DATE > SYSDATE);

And that’s it. Perfect.

NVL()

Actually, maybe we can tighten it a little more. We’ve guarded against Cartesian joins, we’re using table aliases, and comparing our dates to SYSDATE, but we can use a function named NVL() to tidy up this condition: and (fp.end_date is null or fp.end_date > sysdate).

NVL() is a function used to test if a value is null and to replace it with an alternative value if it is. Its syntax is as follows:

NVL(value, replacement_value)

The NVL function will return value if it is not null, but will return replacement_value if it is. If both value and replacement_value are null, NVL will return null. Value and replacement­_value can be of whichever datatype you like.

Let us use NVL in our query.

SELECT FN.FRIEND_ID, FN.FIRST_NAME, FN.LAST_NAME, PN.PHONE_NUMBER
FROM FRIEND_NAME FN, FRIEND_PHONE FP, PHONE_NUMBER PN
WHERE FN.FRIEND_ID = FP.FRIEND_ID
AND FP.PHONE_ID = PN.PHONE_ID
AND NVL (FP.START_DATE,SYSDATE) <= SYSDATE 
AND NVL (FP.END_DATE,SYSDATE+1)  > SYSDATE;

Our query is a thing of beauty, but you must have noticed that it does not return a complete list of our friends. You may have already figured out why; it only returns friends that have a phone number. But what if we wanted to see a full list of our friends with a null if they do not have a phone number?

To do this I’ll need to tell you about outer joins.

Outer Join

The joins we’ve been using thus far are effectively simple inner joins.  When two tables are joined using a simple join (e.g. fn.friend_id = fp.friend_id) records in both tables must satisfy the condition to appear in our resultset. However, with an outer join we can ask Oracle to impose our rule on one of our tables and return nulls whenever the other table fails the test. In other words, we can say, we want to see all of our friends (all records in friend_name), and we don’t mind seeing nulls whenever they don’t have a phone number.

There are two types of outer joins; a left outer join allows nulls in the second table in our join, while a right outer join allows nulls in the first table (while showing all records from the table on the right).

The syntax is as follows:

SELECT <comma-separated column list>
FROM  <table1>
[left|right] outer join <table2>
ON <join condition>
[WHERE clause];

Our query is a little complex since it joins three tables; you might find it more helpful if I illustrate the point by starting small, with only two tables – FRIEND_NAME and FRIEND_PHONE.

SELECT FN.FRIEND_ID, FN.FIRST_NAME, FN.LAST_NAME, FP.PHONE_ID
FROM FRIEND_NAME FN
LEFT OUTER JOIN FRIEND_PHONE FP
ON FN.FRIEND_ID = FP.FRIEND_ID
and NVL (FP.START_DATE,SYSDATE) <= SYSDATE
AND NVL (FP.END_DATE,SYSDATE+1)  > SYSDATE
ORDER BY FN.FRIEND_ID;

Run the query and notice how the resultset now contains all your friends, including those who do not have a phone number.  Notice how, by using a left outer join, we got all the rows in friend_name (the table on the left of the join); change it to a right outer join and see how that changes the output.

Interesting, isn’t it?  But we need to add a third table – PHONE_NUMBER – to our query to make it useful. Let me show you how to do that.

SELECT FN.FRIEND_ID, FN.FIRST_NAME, FN.LAST_NAME, PN.PHONE_NUMBER
FROM FRIEND_NAME FN
LEFT OUTER JOIN FRIEND_PHONE FP
   ON FN.FRIEND_ID = FP.FRIEND_ID
   AND NVL (FP.START_DATE,SYSDATE) <= SYSDATE
   AND NVL (FP.END_DATE,SYSDATE+1)  > SYSDATE
LEFT OUTER JOIN PHONE_NUMBER PN
   ON FP.PHONE_ID = PN.PHONE_ID
ORDER BY FRIEND_ID;

Joining multiple tables in this way is a little like baking a cake; you apply your joins layer by layer. (NB: I’ve never baked a cake before, but I once saw someone bake a cake on TV, and I remember thinking, “Hey, that’s just like writing an outer join query!”)

If you’re up for a challenge, why don’t you add a few more layers to our query? Expand the query to display our friends’ addresses: to do this you’ll need to join the FRIEND_ADDRESS and ADDRESS tables.

While you’re doing that, I’ll go eat some cake. I’m suddenly hungry for some reason.

Next time we’ll look at PL/SQL – functions and procedures.

 

Next Article: Oracle for Absolute Beginners: Part 5 – PL/SQL

Tags: ,