Table of Contents

Wikibooks SQL Exercise: Pieces and Proviers

Load the libraries

Utility Functions

Connect to postgres using psycopg2

Create Tables in the database

data: Pieces

data: Providers

data: Provides

data: combined data

merge two dataframes using pyspark

spark dataframe attributes

Create SQL table from spark dataframes

Global Temporary View

Temporary views in Spark SQL are session-scoped and will disappear if the session that creates it terminates. If you want to have a temporary view that is shared among all sessions and keep alive until the Spark application terminates, you can create a global temporary view. Global temporary view is tied to a system preserved database global_temp, and we must use the qualified name to refer it, e.g. SELECT * FROM global_temp.view1.

Exercise Questions

1. Select the name of all the pieces.

Select all the providers' data.

Obtain the average price of each piece (show only the piece code and the average price).

4. Obtain the names of all providers who supply piece 1.

5. Select the name of pieces provided by provider with code "HAL".

6. For each piece, find the most expensive offering of that piece and include the piece name, provider name, and price

(note that there could be two providers who supply the same piece at the most expensive price).

7. Add an entry to the database to indicate that "Skellington Supplies" (code "TNBC") will provide sprockets (code "1") for 7 cents each.

8. Increase all prices by one cent.

9. Update the database to reflect that "Susan Calvin Corp." (code "RBT") will not supply bolts (code 4).

10. Update the database to reflect that "Susan Calvin Corp." (code "RBT") will not supply any pieces (the provider should still remain in the database).

Time Taken