Table of Contents

Wikibooks SQL Exercise: Movie Theaters

Load the libraries

Connect to database using postgres

Utility Functions

Create Tables

Connect to postgres using psycopg2

Pandas and Spark dataframes

data: Movies

data: MovieTheaters

data: combined data

merge two dataframes using pyspark

spark dataframe attributes

Create SQL table from spark dataframes

Global Temporary View

Temporary views in Spark SQL are session-scoped and will disappear if the session that creates it terminates. If you want to have a temporary view that is shared among all sessions and keep alive until the Spark application terminates, you can create a global temporary view. Global temporary view is tied to a system preserved database global_temp, and we must use the qualified name to refer it, e.g. SELECT * FROM global_temp.view1.

Exercise Questions

1. Select the title of all movies.

2. Show all the distinct ratings in the database.

3. Show all unrated movies.

4. Select all movie theaters that are not currently showing a movie.

5. Select all data from all movie theaters and, additionally, the data from the movie that is being shown in the theater (if one is being shown).

6. Select all data from all movies and, if that movie is being shown in a theater, show the data from the theater.

7. Show the titles of movies not currently being shown in any theaters.

8. Add the unrated movie "One, Two, Three".

9. Set the rating of all unrated movies to "G".

10. Remove movie theaters projecting movies rated "NC-17".

Time Taken