A procedural language that tells the machine “How” to get what we want
Selection(select) Projection(get required column) Cartesian product(outer join) and others…
A non-procedural language that tells the machine “What” to get.
A basic SQL clause
SELECT, FROM WHERE
It usually looks like SELECT A1, A2…An FROM r1,…rm where P; SELECT is used to list desired attributes in the result A1…An are attributes FROM is a list of relations we need to access WHERE clause is a predicate(conditions) involving attributes of the relaion in the from clause.
SELECT doesn’t eliminate duplicates!
Does FROM internally calculates a Cartesian product?
for each tuple t1 in relation r1 ... for each tuple tm in relation rm comcatanate t1,....tm into a single tuple t, add t into the result relation
But we rarely used it. Instead we use where to restrict the combinations. If we have
SELECT a1, a2 FROM R1, R2 WHERE R1.a3 == R2.a3
We would only combine tuple in R1 with only those tuples in R2 that has the same a3. We only match tuples with the same a3.
Natural join will join two relations together. But will only concatenate tuples that have the same values for all attributes that appear in the schemas of both relations.
If R1 has a1, a2, a3 and R2 has a2, a3, a4. R1 NATURAL JOIN R2 will only concatenate tuple with identical a2 and a3. It will return relation with R3 = a1, a2, a3, a4.
Specifying what column should be equated
The above example shows that natural join could be error prone. To change it, use
R1 NATURAL JOIN R2 USING(a2)
This will only equate a2 as a condition for concatnating tuples in R1 and R2.
Reasons for rename
Identical attribute names from two relations
Arithmetic expression will result in attribute with no name
Just wanna change it
The syntax for as is old name as new name As can appear in both select ** and **from In select, we can rename attributes
SELECT name as person_name ...
We can also rename relations to simplify
SELECT A.name, B.id FROM banana_whatever_is_good as A, no_apple_is_the_best as B where A.id = B.id; Very helpful to compare tuple in the same relation. VERY USEFUL EG: Find all student whose credit hours completed is at least greater than at least one senior at our college. ```SQL SELECT DISTINCT S.name FROM student as S, student as C WHERE S.hour_complete > C.hour_complete and C.year = 'senior'
But C is not a copy of S. It is just an alisas.
Strings are in single quote.
% for matching substring
hello%' will match any string that starts with hello%hello$’ will match anystring that has hello
_(underscore) for matching any character
‘‘ will match any string with exact 3 chars ‘%%’ will match any string at least three chars long.
the result tuples will be in sorted order. (may it be numerical, alphabetical…) Can also be performed on multiple attributes
\* FROM student ORDER BY birthday desc, name asc; This is list all students info in descending birthday order. In case people have the same birthday, they will be ordered ascendingly based on name.
Combined tuples from two relations that have the same set of attributes
If want duplicate, you use UNION ALL
Find all tuples that exist in both relations.
NO duplicates as well.
Find all tuples that exist in first relation but not the second
Arithmetic with null
For + – * /, if one of the operand is null, the value is null.
If any comparison has one operand as NULL. Then the comparison is unknown. This creates a third logical value besides true and false.
In a where clause
true AND unknown = unknown. false AND unknown = false. true OR unknown = true, false OR unknown = unknown. NOT unknown = unknown
If in where a tuple is eval to false or unknown, it won’t be included.
They take a collection of values and return a single value. Such as AVG, MIN, MAX, SUM, COUNT
count unique student id. we need to use distinct
SELECT COUNT(DISTINCT id) FROM STUDENT
WE want to aggregate by group of sets of tuple. Such as calculating average GPA for different class man in college. We use group by. The attribute given in group by clause are used to form groups. Tuples with the same value on all attributes in the GROUP BY clause are placed in one group.
SELECT class_name, avg(GPA) as avg_GPA FROM student GROUP BY class
The attributes in SELECT must appear in GROUP BY
Having will state a condition on the group formed by GROUP BY. We used it after GROUP BY, when the groups have been formed. The predicate for HAVING is often an aggregate.
select name, avg (GPA) from student group by department having avg (GPA) > 3;
in WHERE clause
A common use of subqueries is to perform tests for set membership, make set comparisons, and determine set cardinality, by nesting subqueries in the where clause.
We can use IN to specify multiple value in WHERE clause. It is also a shorthand for multiple OR.
SELECT column_name(s) FROM table_name WHERE column_name IN (value1, value2, ...);
SELECT name FROM student WHERE name IN ('TOM', 'JOHN')
Find all the TOM and JOHN in the student body.
We can also put in subquery in IN.
SELECT column_name(s) FROM table_name WHERE column_name IN (SELECT STATEMENT);
If we want to find all the courses that are taught in 2009 and 2010. We can find it using intersect.
(SELECT course id FROM section WHERE semester = ’Fall’ AND year= 2009) INTERSECT (SELECT course id FROM section WHERE semester = ’Spring’ AND year= 2010);
(This example is from DB systems concept 6e)
We can also use IN
SELECT course id FROM section WHERE year = 2009 and course_id in (SELECT course id FROM section WHERE year = 2019)
For a query like “Find the names of all instructors whose salary is greater than at least one instructor in the Biology department.” We can use alias to refer to the same relation.
select distinct T.name from instructor as T, instructor as S where T.salary > S.salary and S.dept name = ’Biology’;
select name from instructor where salary > some (select salary from instructor where dept name = ’Biology’);
The subquery will generate all the salary of biology professor. Then the >SOME comparison is TRUE if the salary of that professor is greater than at least one member of all the salary values.
select name from instructor where salary > all (select salary from instructor where dept name = ’Biology’);
This query will find instructor’s name whose salary is greater than all salary in the bio department.
consider the query “Find the departments that have the highest average salary.”
select dept_name from prof_list group by dept_name having avg(salary) >=all (select avg(salary) from prof_list group by dept_name)
We first write the sub-query to find the avg salary of all the department. Then we will find select all the group that has their average salary higher than all department’s salary average.
Subqueries in FROM
A select-from-where expression in SQL returns a relation as a result. So we can insert select-from-where(a subquery) anywhere a relation can appear.
In an previous example, we tried to find the departments whose student has average gpa higher than 3.6. We used having on the group generated by GROUP BY.
select dept_name, avg(GPA) as avg_gpa from student_list group by dept_name having avg_gpa > 3.5
We can also express this with a sub-query in the from clause.
select dept_name, avg_gpa from (select dept_name, avg(GPA) as avg_gpa from dept_name group by dept_name) where avg_gpa > 3.6
In this query, the subquery first generate all dept with their average gpa.
Then we use a where clause to filter out needed tuples.
The having predicate is now in the where clause.
The attribute of the sub-query can be used in the outer query. As in the case for dept_name and avg_gpa.
This sub-query in from can be replaced by HAVING. But in some situations, HAVING can’t help. Notably aggregate functions like SUM that is not DESCRIPTIVE of the population.( I don’t even know how to describe it. Can someone help?)
EG: We want to find the max salary spending department.
select dept_name,max( total_salary) from (select dept_name, sum(salary) from dept group by dept_name) ##With Clause The with clause defines a temporary relation whose definition is available only to the query in which the WITH clause occurs. It gives a sub-query block a name. An example from geeksforgeeks: ```SQL WITH temporaryTable (averageValue) as (SELECT avg(Attr1) FROM Table), SELECT Attr1 FROM Table WHERE Table.Attr1 > temporaryTable.averageValue;
Modification of DB
We can only delete tuple. We can not delete particular attributes.
delete from relation where P
we will delete all tuples that satisfy the predicate.
We can make P the predicate very complex. Some example borrowed from the book:
delete all instructors who’s department is located in X building. instructor is stored in the instructor table and department is stored in the department table.
delete from instructor where dept_name in (select dept_name from department where dept_loc = 'X')
delete can only delete one relation at a time.
But we can refer to the relation under deletion in the where clause’s. EG: delete all students with below-average gpa.
delete from student where gpa < (select avg(gpa) from student);
We will first calculate the avg gpa. Then do the comparisons and delete all tuples that fail the test.
Either specify the tuple to be inserted or write a query that return a set of tuples to be inserted.
insert into course values (’CS-437’, ’Database Systems’, ’Comp. Sci.’, 4);
EG: add all students who have higher than 3.5 GPA to honor group
insert into honor select ID, name, GPA from student where GPA > 3.5
The select statement gives a set of tuple. Then it is insert into the relation.
Update some value in a tuple without changing all.
update student set grade = grade * 1.5 where grade < 60;
Give students who need help a curve.
update student set grade = grade * 1.5 where grade < (select avg(grade) from student);
We can also add cases for and if-elseif-else style update
update student set grade = case when grade < 60 then grade*1.5 when .... then ... else grade * 1.3 end