Exam 1 Key and Rubric

Report 11 Downloads 84 Views
Exam 1 CS x265

Spring 2017

Name: _____________________KEY_________________________

I will not use notes, other exams, or any source other than my own brain on this exam: __________________________________

To students: To request a regrade or to otherwise appeal your score, submit a one paragraph explanation of why you think the grade was “in error” and submit your explanation/request together with your exam to Doug after class. To graders: No question answer should receive less than 0, regardless of what rubric otherwise suggests. In general, be forgiving of minor syntax errors, but ask if there is any question

1

Exam 1 CS 265

Spring 2017

Name: _____________________KEY_________________________

I will not use notes, other exams, or any source other than my own brain on this exam: __________________________________

1. (2 pts) In the MoneyThink Mobile app “spend or save” challenge, choose one of the following groups of relations as most reasonable for inclusion in the database, with potentially still other relations, to support this app.

2 pts (a) Decision(PersonId, ItemId, ItemImage, DateOfDecision, Buy?), Friend(Person1Id, Person2Id) (b) Decision(PersonId, StockId, DateOfDecision, StockPrice, Buy?), Broker(BrokerId, ClientId)

1 pt (c) Decision(PersonId, ItemId, ItemImage, DateOfDecision, Buy?), Friend(Person1Id, Person2Id) i.e., no social information recorded (d) Decision(PersonId, StockId, Buy?), Broker(BrokerId, ClientId) (e) None of the above are reasonable choices. Explain: ______________________________________________

1 minute

2

2. Consider the following relational schema (underlined variables in each schema make up its key) Supplier ( sid: integer, sname: string, address: string, city: string) Catalog (sid: integer, pid: integer, cost: real) Part (pid: integer, pname: string, color: string) a) (2 pts) Give a relational algebra expression that implements the query specified as Find the pids and pnames of any green part. Project(pid, pname) (select(color = ‘green’) Part)

1 pt for a proper selec4on (sigma), including the base table (Part) and condi4on 1 point for a proper projec4on (pi)

b) (3 pts) Give a relational algebra expression that implements the query specified as Find the snames of Suppliers that supply any part that costs less than $5 (or just ‘5’), and list the pid along with the supplier’s name. Project(sname, pid) ( select(cost < 5) (Supplier njoin Catalog)) njoin means natural join

1 pt for a proper selec4on (sigma), including condi4on 1 point for a proper join, including both tables 1 point for proper projec4on

π (σ ( Sname, pid cost= 45 AND C.age = 100 ) AS Temp 10

7. (5 points) A health facility wants to measure the AVERAGE (AVG) spread in the weight of clients between the ages of 45 and 60, inclusive, who have at least 100 weight entries in the facility’s DB. The spread of a client’s weight is the maximum weight on record in the facility’s DB minus the minimum weight on record in the DB (regardless of age at the time the weight entries were made). The DB contains two tables representing these two relations (among others). Client (cid, name, age, address, phone, ...) DailyRecord(cid, date, weight, ...) The attributes making up the key of each table (relation) are bold-face and underlined. In the actual table definitions, the attribute of age is declared as NOT NULL (i.e., it will never be NULL), and weight can be NULL. Complete the following skeletal query to compute the AVERAGE (AVG) spread in the weight of clients between the ages of 45 and 60, inclusive, with at least 100 weight entries, by filling in the blanks.

1 pt

10 minutes

SELECT AVG(Temp.Flux) FROM (SELECT MAX(D.weight) – MIN(D.weight) AS Flux 1 pt FROM Client C, DailyRecord D Ok if weight NOT NULL check is missing, but not next 0me 1 pt for join condi0on WHERE C.id = D.id AND D.weight IS NOT NULL AND C.age >= 45 AND C.age = 100 ) AS Temp 11

7. (5 points) A health facility wants to measure the AVERAGE (AVG) spread in the weight of clients between the ages of 45 and 60, inclusive, who have at least 100 weight entries in the facility’s DB. The spread of a client’s weight is the maximum weight on record in the facility’s DB minus the minimum weight on record in the DB (regardless of age at the time the weight entries were made). The DB contains two tables representing these two relations (among others). Client (cid, name, age, address, phone, ...) DailyRecord(cid, date, weight, ...) The attributes making up the key of each table (relation) are bold-face and underlined. In the actual table definitions, the attribute of age is declared as NOT NULL (i.e., it will never be NULL), and weight can be NULL. Complete the following skeletal query to compute the AVERAGE (AVG) spread in the weight of clients between the ages of 45 and 60, inclusive, with at least 100 weight entries, by filling in the blanks.

Alternate (to previous page) SELECT AVG(Temp.MaxWeight) – AVG(Temp.MinWeight)(or AVG(Temp.MaxWeight-Temp.MinWeight)) FROM (SELECT MAX(D.weight) AS MaxWeight, MIN(D.weight) AS MinWeight FROM Client C, DailyRecord D WHERE C.id = D.id AND D.weight IS NOT NULL AND C.age >= 45 AND C.age = 100 ) AS Temp 12

8. Using the same relational schemas, write SQL queries for the following Customer ( SSN: integer, name: string, address: string, city: string) Account (SSN: integer, AccntNo: integer) Transaction (AccntNo: integer, ProductId: integer, date: string, quantity: integer) Product ( ProductId: integer, ProductName: string, cost: real) a) (3 pts) For each product, list the item’s ProductId, ProductName, and total (sum) of all item quantities purchased by Nashville customers. SELECT P.ProductId, P.ProductName, SUM(T.quantity) FROM Customers C, Accounts A, Transactions T, Products P WHERE C.city = ‘Nashville’ AND C.SSN = A.SSN AND A.AcctNo = T.AcctNo AND T.ProductId = P.ProductId GROUP BY P.ProductId, P.ProductName Don’t take off this time for missing ProductName in Group By, but -1 for each ommission from this standard

b) (2 pts) For EACH item, list the item’s ProductId, ProductName, and total (Sum) of all item quantities purchased by Nashville customers, BUT ONLY for those product’s with an individual cost > 50 and having a total purchased quantity of greater than 100. You may answer by stating the addition(s) to your query in (a) if you wish) SELECT P.ProductId, P.ProductName, SUM(T.quantity) FROM Customers C, Accounts A, Transactions T, Products P WHERE C.city = ‘Nashville’ AND C.SSN = A.SSN AND A.AcctNo = T.AcctNo AND T.ProductId = P.ProductId AND P.cost > 50 GROUP BY P.ProductId, P.ProductName 1 point for each addition in red (to whatever query HAVING SUM(T.quantity) > 100 they had from part (a), even if part (a) was incorrect

Could also use the answer to (a) as a nested query in the FROM clause, creating a temporary Table, with SUM(T.quantity) AS Total, for B, and checking whether Temp.Total > 100 in the WHERE clause of the outer query of B. 13

9. (5 pts) (Inspired by a Widom practice exercise) Consider a database of researchers and their works (like ResearchGate.com), with relational schema Researcher (ID, Name, Institution ) Collaborator (ID1, ID2 ) /* ID1 is a collaborator with ID2. Collaboration is symmetric, so if (abc, wxy) is in the Collaborator table, so is (wxy, abc) */ Follows (ID1, ID2 ) /* ID1 follows the posts of ID2, where Follows is not symmetric, so if (abc, wxy) is in Follows table, there is no guarantee that (wxy, abc) is also present. */ Consider finding all those researchers for whom all of those they Follow are at different institutions than themselves. Return the names and institutions of all such researchers. One way to write this query is SELECT R.Name, R.Institution 2 points if literal replacement of NOT EXISTS by NOT IN FROM Researcher R WHERE NOT EXISTS (SELECT * FROM Follows F, Researcher R2 WHERE R.ID = F.ID1 AND F.ID2 = R2.ID AND R2.Institution = R.Institution) Write the query that satisfies the same English specification using the “NOT IN” phrase rather than “NOT EXISTS”: All researchers who don’t follow anyone from the same institution The set of all researchers following someone who is at the same institution of R

SELECT R.Name, R.Institution FROM Researcher R WHERE R.ID NOT IN (SELECT F.ID1 FROM Follows F, Researcher R2 F.ID2 = R2.ID AND R.Institution = R2.Institution)

-1 point if Name used instead of ID; 2 points only if they have * in inner SELECT

14

9. (5 pts) (Inspired by a Widom practice exercise) Consider a database of researchers and their works (like ResearchGate.com), with relational schema Researcher (ID, Name, Institution ) Collaborator (ID1, ID2 ) /* ID1 is a collaborator with ID2. Collaboration is symmetric, so if (abc, wxy) is in the Collaborator table, so is (wxy, abc) */ Follows (ID1, ID2 ) /* ID1 follows the posts of ID2, where Follows is not symmetric, so if (abc, wxy) is in Follows table, there is no guarantee that (wxy, abc) is also present. */ Consider finding all those researchers for whom all of those they Follow are at different institutions than themselves. Return the names and institutions of all such researchers. One way to write this query is SELECT R.Name, R.Institution FROM Researcher R WHERE NOT EXISTS (SELECT * FROM Follows F, Researcher R2 WHERE R.ID = F.ID1 AND F.ID2 = R2.ID AND R2.Institution = R.Institution) Write the query that satisfies the same English specification using the “NOT IN” phrase rather than “NOT EXISTS”:

Basically same as previous, but for each researcher being considered in turn; will return empty set if R.ID isn’t following anyone at same institution

SELECT R.Name, R.Institution FROM Researcher R WHERE R.ID NOT IN (SELECT F.ID1 /* could be R.ID here */ FROM Follows F, Researcher R2 WHERE R.ID = F.ID1 AND F.ID2 = R2.ID AND R.Institution = R2.Institution)

This addition to previouspage query is what causes nested query to evaluate to NULL for Rs who arent following anyone at same institution. This isn’t needed, but full credit. 15

9. (5 pts) (Inspired by a Widom practice exercise) Consider a database of researchers and their works (like ResearchGate.com), with relational schema Researcher (ID, Name, Institution ) Collaborator (ID1, ID2 ) /* ID1 is a collaborator with ID2. Collaboration is symmetric, so if (abc, wxy) is in the Collaborator table, so is (wxy, abc) */ Follows (ID1, ID2 ) /* ID1 follows the posts of ID2, where Follows is not symmetric, so if (abc, wxy) is in Follows table, there is no guarantee that (wxy, abc) is also present. */ Consider finding all those researchers for whom all of those they Follow are at different institutions than themselves. Return the names and institutions of all such researchers. One way to write this query is SELECT R.Name, R.Institution FROM Researcher R WHERE NOT EXISTS (SELECT * FROM Follows F, Researcher R2 WHERE R.ID = F.ID1 AND F.ID2 = R2.ID AND R2.Institution = R.Institution) Write the query that satisfies the same English specification using the “NOT IN” phrase rather than “NOT EXISTS”:

IDs of all researchers who follow anyone from the same institution SELECT R.Name, R.Institution FROM Researcher R WHERE R.ID NOT IN (SELECT F.ID1 /* could be R1.ID here */ FROM Follows F, Researcher R1, Researcher R2 WHERE R1.ID = F.ID1 AND F.ID2 = R2.ID AND R1.Institution = R2.Institution)

A version that doesn’t use a correlated query – full credit 16

9. (5 pts) (Inspired by a Widom practice exercise) Consider a database of researchers and their works (like ResearchGate.com), with relational schema Researcher (ID, Name, Institution ) Collaborator (ID1, ID2 ) /* ID1 is a collaborator with ID2. Collaboration is symmetric, so if (abc, wxy) is in the Collaborator table, so is (wxy, abc) */ Follows (ID1, ID2 ) /* ID1 follows the posts of ID2, where Follows is not symmetric, so if (abc, wxy) is in Follows table, there is no guarantee that (wxy, abc) is also present. */ Consider finding all those researchers for whom all of those they Follow are at different institutions than themselves. Return the names and institutions of all such researchers. One way to write this query is SELECT R.Name, R.Institution FROM Researcher R WHERE NOT EXISTS (SELECT * FROM Follows F, Researcher R2 WHERE R.ID = F.ID1 AND F.ID2 = R2.ID AND R2.Institution = R.Institution) Write the query that satisfies the same English specification using the “NOT IN” phrase rather than “NOT EXISTS”:

Institutions of all researchers who are followed by a given R

5 pts

SELECT R.Name, R.Institution FROM Researcher R WHERE R.Institution NOT IN (SELECT R2.Institution FROM Follows F, Researcher R2 WHERE R.ID = F.ID1 AND R2.ID = F.ID2)

Correct answers also stem from variants on a different perspective, exemplified at left

All researchers who don’t follow anyone from the same institution 17

10. (5 pts) This question uses the same relational schema as the last problem. Researcher (ID, Name, Institution ) Collaborator (ID1, ID2 ) Follows (ID1, ID2 ) What is the average number of Collaborators per Researcher? Importantly, compute this average over only those Researchers who have more than 1 Collaborator. (Your result should be just one number.) . SELECT AVG(Temp.Cnt) Most will have something very much like this (5 pts) FROM (SELECT COUNT(*) AS Cnt -2 for each main missing element FROM Collaborator C GROUP BY C.ID1 /* or C.ID2 because Collaborator is symmetric */ HAVING COUNT(*) > 1) AS Temp Don’t need rename as Temp in any of these Some will have something more complicated, like this, but still (5 pts) SELECT AVG(Temp.Cnt) FROM (SELECT COUNT(*) AS Cnt FROM Researcher R, Collaborator C WHERE R.ID = C.ID1 /* or C.ID2 because Collaborator is symmetric */ GROUP BY C.ID1 /* or R.ID */ HAVING COUNT(*) > 1) AS Temp SELECT AVG(Temp.Cnt) FROM (SELECT COUNT(*) AS Cnt FROM Researcher R, Collaborator C WHERE R.ID = C.ID1 GROUP BY C.ID1) AS Temp WHERE Temp.Cnt > 1

18

11. (5 pts) Consider the relational schema HRel (ID, name)

+2 for one of correct answers selected, and 1 point each addi0onal op0on

10 minutes

Circle all queries below that return the tuples of HRel with the top 5 values of ID (a) SELECT H.ID, H.name FROM HRel H WHERE NOT EXISTS (SELECT * FROM HRel H1, HRel H2 WHERE H.ID = H1.ID AND H1.ID < H2.ID GROUP BY H1.ID HAVING COUNT(*) >= 5) ORDER BY ID DESC

(b) SELECT H.ID, H.name FROM HRel H There are less than WHERE (SELECT COUNT (*) 5 IDs greater than FROM HRel H2 WHERE H.ID < H2.ID) < 5 H.ID ORDER BY H.ID DESC

(c) SELECT H.ID, H.name FROM HRel H EXCEPT SELECT H.ID, H.name FROM HRel H WHERE (SELECT COUNT (*) FROM HRel H2 WHERE H.ID < H2.ID) >= 5 ORDER BY H.ID DESC

(d) SELECT H.ID, H.name FROM HRel H EXCEPT SELECT H1.ID, H1.name FROM HRel H1, HRel H2 WHERE H1.ID < H2.ID GROUP BY H1.ID, H1.name HAVING COUNT(*)+1 > 5 ORDER BY ID DESC

There are at least 5 IDs greater than H.ID

(e) None of the above

0 pts if op0on (e) selected, whether alone or in conjunc0on with any other answer (though unlikely anyone did that)

19

Question 11 comments. I had intended that ID was the key of HRel, and I think most everyone assumed this, even though I didn’t bold face and underline it. So, here appear to be almost no one affected by that omission, but the following comments are worth making and understanding.

In query (b), rows of HRel that would pass the WHERE test are 11, 8, 6, and all the corresponding rows would be returned (6 of them).

Even without specifying the primary key, these answers will be right using what I think is a reasonable interpretation (but not the only one).

SELECT H.ID, H.name FROM HRel H EXCEPT SELECT H1.ID, H1.name FROM HRel H1, HRel H2 WHERE H1.ID < H2.ID GROUP BY H1.ID, H1.name HAVING COUNT(*)+1 > 5

Lets say that we have repeats of ID, supposing its not the key. Consider this ID,name larger IDs 11,a 8,b 8,c 8,d 6,e 6,f 3,g 3,h 3,i 2,j 1,k

Place aka COUNT(*)+1

< 1 < 11 2 < 11 2 < 11 2 < 8,8,8,11 5 < 8,8,8,11 5 < 6,6,8,8,8,11 7 < 6,6,8,8,8,11 7 < 6,6,8,8,8,11 7 < 3,3,3,6,6,8,8,8,11 10 < 2,3,3,3,6,6,8,8,8,11 11

You’ve probably seen rankings (e.g., of colleges), in which ties receive that same number (e.g., ties for 2nd), but then the ordering picks up after the last tied value.

Consider (d), which I think is the next most intuitive

(will include groups (3,g), (3,h)…(1,k))

ORDER BY ID DESC Again, the right answer is returned under a reasonable interpretation that is often used. I intended ID to be the key, which would have resulted in much less potential for misunderstanding, but if you were one of the very few who didn’t assume that ID was key (and you didn’t assume the interpretation I just laid out), then see me. In particular, I think that a very few might have interpreted the question as asking to either (i) return exactly 5 rows with highest Ids, or (ii) return rows with the highest ID values regardless of ties, which would have been rows with values of 11 down to 2 in the example above.

Look at the simplest of the queries (b) SELECT H.ID, H.name FROM HRel H WHERE (SELECT COUNT (*) FROM HRel H2 WHERE H.ID < H2.ID) < 5 ORDER BY H.ID DESC

20