Relational Algebra



Relational Algebra is the mathematical basis for the query language SQL


Introduction.

So now you have learn how to design good conceptual models to store information with the ER-model
And you also know how to turn an ER-model into a Relational model

Let us jump the next step – which is defining the Relational Database and entering the data into the database – and look at data manipulation

What can we do with the data ????





Relational Algebra: Overview

is a mathematical language for manipulating relations (ala mathematical definition of “relation” – which is a subset of some cartesian product)

contains:

set operators
relational database specific operators
set functions

Relational Algebra 关系代数-冯金伟博客园

is used to specify a result set that satisfies a certain “selection criteria”

the result of an operation in relational algebra is always a relation





Set operations

You should all know what set union is:

Suppose A = {i, j, k} and B = {k, x, y}
Then the union of A and B is {i, j, k, x, y}

Note that tuples in sets A and B must have the same FORMAT (same number and type of attributes) to be used in the union operation.

You should also have learned what set intersection is:

Suppose A = {i, j, k} and B = {k, x, y}
Then the intersection of A and B is {k}

Note that tuples in sets A and B must have the same FORMAT (same number and type of attributes) to be used in the intersection operation.

You should also have learned what set difference is:

Suppose A = {i, j, k} and B = {k, x, y}
Then the difference of A and B is {i, j}

Note that tuples in sets A and B must have the same FORMAT (same number and type of attributes) to be used in the set difference operation.

I have already discussed the cartesian productclick here 





Set Functions

Operates on a set of values and produce a single value

To illustrate the various set function, we apply the set functions on this set of value:

A = {1, 4, 9}

Results:

sum(A) = sum of all values of A (answer = 14)
avg(A) = average of all values in A (answer= 4.6666)
max(A) = max of all values in A (answer = 9)
min(A) = min of all values in A (answer = 1)
any(A) = test if set A has at least one element (answer = TRUE) (Note: result FALSE means set A is empty)
count(A) = cardinality of set A (answer = 3 (= number of elements in set))



Relational database specific operations



σ (Condition) (R)         (selection)

Selects tuples from a relation R that satisfies the condition Condition
The Condition expression contains constants and/or attributes of relation R (the condition expression cannot be a set !!!)
The result of σ (Condition) (R) is a subset of tuples of R that satisfies the boolean condition Condition

Examples:

Retrieve all employee tuples working for department number 4:

σ (dno = 4) (employee) 

The following diagram shows the result graphically (hopefully, it will illustrates the concept unambiguously):

Relational Algebra 关系代数-冯金伟博客园

Retrieve all employees earning more than $30,000:

σ (salary > 30000) (employee) 

Retrieve all employees working for department number 4 and eaning more than $30,000:

σ (dno = 4 and salary > 30000) (employee) 

OR:

σ (salary > 30000) ( σ (dno = 4) (employee) )




π (attribute-list)         (projection)

Selects out only the attribute values attribute-list from all tuples in relation R
The attribute-list contains the list of attributes in relation R that will be selected.
The result of π (attribute-list) (R) contains the is a subset of tuples of R that satisfies the boolean condition Condition

Examples:

Retrieve all SSN from the employees

π ssn (employee) 

The following diagram shows the result graphically (hopefully, it will illustrates the concept unambiguously):

Relational Algebra 关系代数-冯金伟博客园

Retrieve the sex information from all employees:

π sex (employee) 

The following diagram shows the result graphically (hopefully, it will illustrates the concept of projection unambiguously):

Relational Algebra 关系代数-冯金伟博客园

Notes:

The output of a Relational Algebra query is set.

set in Mathematics does not have duplicate elements

Therefore, the result of πsex (employee) is the set {F, M} because duplicates are removed


In SQL, the user will have the option to remove duplicates

That is because removing duplicates require longer time to process….

(The user has the option to incur the cost or not…)





Combining information from 2 relations

Consider the following query:

Find FName and LName of all employees in the “Research” department

Observation:

We will need to combine information in the employee and department relations to answer the query.

But we must be careful in combining the information:

   Employee:
   +-----------+--------+---------+-----------+-----+ 
   | ssn       | fname  | lname   | bdate     | dno | .... (other columns omitted)     
   +-----------+--------+---------+-----------+-----+ 
   | 123456789 | John   | Smith   | 09-JAN-55 |   5 | 
   | 333445555 | Frankl | Wong    | 08-DEC-45 |   5 | 
   | 999887777 | Alicia | Zelaya  | 19-JUL-58 |   4 | 
   | 987654321 | Jennif | Wallace | 20-JUN-31 |   4 | 
   | 666884444 | Ramesh | Narayan | 15-SEP-52 |   5 | 
   | 453453453 | Joyce  | English | 31-JUL-62 |   5 | 
   | 987987987 | Ahmad  | Jabbar  | 29-MAR-59 |   4 | 
   | 888665555 | James  | Borg    | 10-NOV-27 |   1 | 
   +-----------+--------+---------+-----------+-----+ 

   Department:
   +----------------+---------+-----------+--------------+
   | dname          | dnumber | mgrssn    | mgrstartdate |
   +----------------+---------+-----------+--------------+
   | Research       |       5 | 333445555 | 22-MAY-78    |
   | Administration |       4 | 987654321 | 01-JAN-85    |
   | Headquarters   |       1 | 888665555 | 19-JUN-71    |
   +----------------+---------+-----------+--------------+

How to “combine” tuples from Employee and Department:

Only combine a tuple from employee with a tuple from department when their department numbers are EQUAL


(Remember we used the department number in Employee as a foreign key !!!

The  department number in Employee “refers” to one tuple in Department


Solution:

First find the department number of the “Research” department:

R1 = π dnumber ( σ (dname = ‘Research’) ( department ) )

Graphically:

Relational Algebra 关系代数-冯金伟博客园

Then we combine the department and employee through a “mindless” cartesian product operation:

R2 = R1 x employee

The result of this operation is “tagging” an extra column “dnumber” to the employee relation:

Relational Algebra 关系代数-冯金伟博客园

Now we can use a select operation to select out the “right tuples” :

Because dnumber is now an attribute of the (new) relation R2 !!!

π(FName, LName) (σ (dno = dnumber) ( R2 ) )

Graphically:

Relational Algebra 关系代数-冯金伟博客园





Efficiency consideration of queries

Correctness is the most important property of a query

The second most important property of a query is efficiency


There maybe multiple queries that retrieve the correct set of tuples (answer)

The difference of the queries may be their execution speed


Consider the following solution:

π(FName, LName) (σ (dno = dnumber and dname = ‘Research’) ( employee x department ) )

Although this query is correct , it will take longer time to process because of the larger amount of intermediate data that the query produces:

Relational Algebra 关系代数-冯金伟博客园

The cartesian product employee x department will produce huge number of tuples – many more than in the previous solution.





Join operations

The join operation (bow tie symbol) is equivalent to:

Relational Algebra 关系代数-冯金伟博客园

Example:

Retrieve all employees from the “Research” department

slower (but correct) solution:

Relational Algebra 关系代数-冯金伟博客园

It is slower because the join operation will result in more tuples as intermediate results


The faster version is:

Relational Algebra 关系代数-冯金伟博客园




The “theta join” operation

The condition in the join operation is of the form:

   Condition

1

 and Condition

2

 and ....       
   

Each condition is of the form “x θ y”, where x and y are attributes or constants

And θ is one of the following relational operators:

==
!=
<
<=
>=
>

A join operation with a “general” join condition (where θ can be any relational operation), is called a theta join





The “equi-join” operation

Often (almost always :-)), we use the equal relational operation in the join operation:

     Condition

1

 and Condition

2

 and ....     

where each condition is of the form:

    x == y        

where x and y are attributes or constants


Equi-join:

join operation where the the join condition contain only equality compare operators is called a equi-join





The “natural join” (*) operation

The natural join operation is an equi-join operation follow by a projection operation that removes one of the duplicate attributes.

Since the equi-join will for tuples when the attribute values are equal, all tuples that produced by an equi-join will have the same attribute value under the join condition attributes.

The natural join operation is denoted by a star (*):

    R1 * attr1,attr2 R2       


Outer-join

Outer-joins:

An outer join does not require each record in the two joined tables to have a matching values

The rusulting joined table will retains records from one or both tables even if no other matching value exists.


The 3 flavors of outer-joins:

Left outer-join: A ⟕ B

The result of the left outer join (or simply left joinA ⟕ B always contains all records of the “left” table (A)even if the join-condition does not find any matching record in the “right” table (B).


unmatched tuple from A is processed “normally” (discuss in the join operation)

An  matched tuple from A is combined with a row of NULL value (so that the resulting tuple have the same columns as an matched tuple)


Example:

     A:                 B:

        X     Y            U     V
     +-----+-----+      +-----+-----+                  
     |  1  |  7  |      |  1  |  8  |
     +-----+-----+      +-----+-----+
     |  2  |  5  |      |  2  |  3  |
     +-----+-----+      +-----+-----+
     |  3  |  4  |      |  4  |  9  |
     +-----+-----+      +-----+-----+

  A ⟕A.X=B.U B:

        X     Y     U     V
     +-----+-----+-----+-----+ 
     |  1  |  7  |  1  |  8  |
     +-----+-----+-----+-----+
     |  2  |  5  |  2  |  3  |
     +-----+-----+-----+-----+
     |  3  |  4  | NULL| NULL|
     +-----+-----+-----+-----+

Right outer-join: A ⟖ B

The result of the right outer join (or simply right joinA ⟖ B always contains all records of the “right” table (B)even if the join-condition does not find any matching record in the “left” table (A).


matched tuple from B is processed “normally” (discuss in the join operation)

An  unmatched tuple from B is combined with a row of NULL value (so that the resulting tuple have the same columns as an matched tuple)


Example:

     A:                 B:

        X     Y            U     V
     +-----+-----+      +-----+-----+                  
     |  1  |  7  |      |  1  |  8  |
     +-----+-----+      +-----+-----+
     |  2  |  5  |      |  2  |  3  |
     +-----+-----+      +-----+-----+
     |  3  |  4  |      |  4  |  9  |
     +-----+-----+      +-----+-----+

  A ⟖A.X=B.U B:

        X     Y     U     V
     +-----+-----+-----+-----+ 
     |  1  |  7  |  1  |  8  |
     +-----+-----+-----+-----+
     |  2  |  5  |  2  |  3  |
     +-----+-----+-----+-----+
     | NULL| NULL|  4  |  9  |
     +-----+-----+-----+-----+

Full outer-join: A ⟗ B

The result of the full outer join (or simply outer joinA ⟗ B always contains all records of both tableseven if the join-condition does not find any matching record


matched tuple is processed “normally” (discuss in the join operation)

An  unmatched tuple in either table is combined with a row of NULL value (so that the resulting tuple have the same columns as an matched tuple)


Example:

     A:                 B:

        X     Y            U     V
     +-----+-----+      +-----+-----+                  
     |  1  |  7  |      |  1  |  8  |
     +-----+-----+      +-----+-----+
     |  2  |  5  |      |  2  |  3  |
     +-----+-----+      +-----+-----+
     |  3  |  4  |      |  4  |  9  |
     +-----+-----+      +-----+-----+

  A ⟗A.X=B.U B:

        X     Y     U     V
     +-----+-----+-----+-----+ 
     |  1  |  7  |  1  |  8  |
     +-----+-----+-----+-----+
     |  2  |  5  |  2  |  3  |
     +-----+-----+-----+-----+
     |  3  |  4  | NULL| NULL|
     +-----+-----+-----+-----+
     | NULL| NULL|  4  |  9  |
     +-----+-----+-----+-----+

Note:

If attribute names are the same in both relations, you can omit the join condition in outer joins

In that case, you will also remove one column of the same name from the resulting table


Example:

     A:                 B:

        X     Y            X     V
     +-----+-----+      +-----+-----+                  
     |  1  |  7  |      |  1  |  8  |
     +-----+-----+      +-----+-----+
     |  2  |  5  |      |  2  |  3  |
     +-----+-----+      +-----+-----+
     |  3  |  2  |      |  4  |  9  |
     +-----+-----+      +-----+-----+

  A ⟗ B:

        X     Y     V
     +-----+-----+-----+ 
     |  1  |  7  |  8  |
     +-----+-----+-----+
     |  2  |  5  |  3  |
     +-----+-----+-----+
     |  3  |  2  | NULL|
     +-----+-----+-----+
     |  4  | NULL|  9  |
     +-----+-----+-----+

Teaching notes:

I personally dislike NULL values….


Therefore, I will avoid using outer joins in my solutions….





More examples….

Find fname and lname of all employees working in the “Research” department that earn more than $50,000

Relational Algebra 关系代数-冯金伟博客园

Find fname and lname of John Smith’s supervisor

Relational Algebra 关系代数-冯金伟博客园

Find fname and lname of all employees that have dependents

Relational Algebra 关系代数-冯金伟博客园

Find fname and lname of all employees that do not have dependents

This solution is wrong:

Relational Algebra 关系代数-冯金伟博客园

The following diagram shows how the query works on an example database, which illustrates why the query is wrong:

Relational Algebra 关系代数-冯金伟博客园

The correct solution is:

Relational Algebra 关系代数-冯金伟博客园

Help1 = set of SSN of employees with dependents
Help2 = set of SSN of employees without any dependents


Find fname and lname of employees who has more than one dependents of the same sex (i.e., 2 or more boys or 2 or more girls):

Relational Algebra 关系代数-冯金伟博客园

NOTE: so join conditions do not always have to be “equal” !!!





A comment…

Clearly, I can easily formulate a gad-sillion different queries in the next day or so, so:

DO NOT

 prepare for tests with 

memorizing
 facts – you gotta understand what’s going on and do the queries “from scratch”….

And for those students that have never had me before: I have a reputation to challenge student’s grey cells to work overtime…. so you should expect some very interesting queries…. and plenty of opportunity to practice your skill (in the homeworks and SQL projects).

http://www.mathcs.emory.edu/~cheung/Courses/377/Syllabus/4-RelAlg/intro.html