How to Start with SQL for Education

Cameron Marsden and Jeff Pannell

As you come in

Introductions

We are Cameron Marsden and Jeff Pannell. We are senior data analysts at OSPI. We work on the Report Card. Combined we have 13 years of SQL experience.

Introduce yourself!

  • What is your name?

  • What do you do?

  • What is your experience with SQL?

  • How do you work with data?

    • What kind of data do you work with?

Agenda and Structure

  • Overview of what SQL is
    • SQL Flavors
    • Objects
    • Queries
  • Hands on exercises on how to do SQL coding
    • Basic Queries
    • Use Cases
  • There will be breaks!
  • Parking Lot
    • If you have questions, write it on a sticky note, and then put it on the big paper on the side of the room

Target Audience

  • New coders to late beginners.

If you find that you’re finishing the exercises early, try helping your neighbor (if asked). Trying to explain why what you did worked can help you learn.

Resources

If you want to learn more:

  • W3 Schools online tutorials (https://www.w3schools.com/)

  • Geeks For Geeks (https://www.geeksforgeeks.org)

  • Example Google Search “W3Schools SQL how to execute based off a condition.” Example

What is SQL?

  • SQL = Structured Query Language
  • You don’t need to know where the data is stored on the hard drive, you just need to know which table it’s in and how to describe it.
  • It comes in many flavors
    • DuckDB
    • PostgreSQL
    • Microsft SQL Server (Transact-SQL a.k.a. T-SQL)
    • SQLite
    • MySQL
    • Oracle SQL

SQL Definition (continued)

  • All of these flavors are based off the most recent version of an international standard. This makes it so that the vast majority of what you can do in one language, you can do in every language.
  • We will be focusing on the commonalities between the languages in this workshop.

SQL Use Cases

  • Centralized data storage solution

  • Data source for reports

  • Compute calculations on data and store for easy and convenient access

  • Find statistics on your data without external tools

  • Conduct data exploration to better understand your data

  • Other Advanced uses (Partitions, Conditional Statements, Running Tetris … seriously you can. But don’t)

Permissions

  • Permissions are how you are going to want to communicate with your IT department.
  • You will need permissions (at minimum) for the server and database you want to access.
  • You can ask for read only permissions. This will mean you can only see the data, but not change it.
  • If you don’t know what you need to ask for, try setting up a meeting to describe the outcomes you’re looking for and then determine what access you’d need to achieve this.

SQL Objects

  • Server
    • Contains at least one database and allows for data sequestering.
  • Database
    • Contains every other object.
  • Schema
    • Used for organizing tables and restricting access and permissions.
  • Tables

    • Contains the actual data (finally!).
  • Views

    • Like a Table but can be created on demand for the most recent data and can be made from finalized queries.
  • … and many more!

Exercise 1 : Setting up our SQL Environment

  • We will be using the DBeaver IDE
  • Our SQL server is built and will be run using DuckDB
  • Set up DBeaver and connect it to the DuckDB file

Steps

10:00
  1. Download DBeaver Community from: https://dbeaver.io/download/

  2. Open it on your computer.

  3. Only install it for your account OR just unzip it. 

  4. Download the wera-db.duckdb database file

  5. Open DBeaver.

  6. In the upper left corner, click on the blue plug with the green plus side.

  7. In the popup that appears, type in “duck”, and click on the only option. Then click “next”

  8. Click on “Open” and navigate to where you downloaded “wera-db.duckdb” and select that file. Then click Finish

  9. You may need to download the JDBC driver. If prompted, please do so.

SELECT Statements

  • SELECT Statements tell the query which column you want in your output.
  • You can either select some columns by naming them or all of them by using the ‘*’ character.
  • You can specify DISTINCT to give unique results from your query
  • Note that you can write SELECT or select or even sElEcT (please don’t). The convention is to use all capital letters for key words.
-- by the way, you can leave notes for yourself in the code by 
-- typing two hyphens next to each other.
SELECT *

SELECT SchoolYear,
       graduation

SELECT DISTINCT SchoolYear

FROM Statements

  • FROM statements tell the query which table to pull from
  • Your SELECT statement won’t run without a FROM statement, but must be written before the FROM statement.
/* Here's another way
to leave a 
comment over multiple lines */
 SELECT *
   FROM Table1
   
 SELECT SchoolYear
        ,graduationrate
   FROM graduation
   
 SELECT DISTINCT 
        SchoolYear
   FROM "wera-db".main.graduation

If you remember nothing else from this section

Important

Remember that you can get data from a table using a combination of a SELECT statement and a FROM clause.

Exercise 2 : Select and From (Part 1)

05:00
  1. Select all the rows and columns from the enrollment table.

Exercise 2 : Select and From (Part 2)

05:00
  1. Select all the rows and columns from the enrollment table.
  2. Which school years are included in the enrollment data set?

Exercise 2 : Select and From (Part 3)

05:00
  1. Select all the rows and columns from the enrollment table.
  2. Which school years are included in the enrollment data set?
  3. SELECT all districts and schools from enrollment without duplicates

Exercise 2 : Answers

-- Question 1
SELECT * 
  FROM "wera-db".main.enrollment;

-- Question 2
SELECT DISTINCT SchoolYear 
  FROM "wera-db".main.enrollment;
  
-- Question 3
SELECT DISTINCT 
       DistrictName
       ,SchoolName 
  FROM "wera-db".main.enrollment;

WHERE Statements

  • WHERE statements tell the query what filter conditions you want. WHERE clauses always come after the FROM Clause.

    • Number columns can be filtered with comparison operators =, >, <, !=
    SELECT *
      FROM "wera-db".main.graduation AS g -- The AS keyword is an alias. 
                                          -- You can rename something if you 
                                          -- don't want to have to type it 
                                          -- out all the time.
     WHERE g.GraduationRate < 0.1
    • Text Columns are based of characters and can also use the =, >, <, != operators. They can also use the LIKE keyword.
    SELECT *
      FROM "wera-db".main.enrollment AS e
     WHERE e.SchoolName LIKE '%Jefferson%'
  • The ‘%’ character is a wildcard. It means “anything can go where this character is”.

WHERE Statements Part 2

  • You can also use BETWEEENAND to find something in a range.
SELECT *
  FROM "wera-db".main.graduation AS g
 WHERE g.GraduationRate BETWEEN 0.1 AND 0.5
  • You can also use the AND keyword to use more than one filter. (and you can use more than one AND)
SELECT *
  FROM "wera-db".main.graduation AS g
 WHERE g.GraduationRate BETWEEN 0.1 AND 0.5
   AND g.StudentGroupType = 'All'

WHERE Statements Part 3

  • You can filter on whether something is NULL.
    • A NULL value says that there is no data present.
  • Filter by saying WHERE column IS NULL
SELECT DistrictName
FROM main.enrollment
WHERE SchoolName IS NULL
  • You can also filter something IS NOT NULL
SELECT DistrictName
FROM main.enrollment
WHERE SchoolName IS NOT NULL

If you remember nothing else from this section

Important

Remember that WHERE clauses FILTER data.

ORDER BY

  • Comes after where clause
  • Sorts the data
  • Can be either ASCending or DESCending
-- Most ways
  SELECT DISTINCT *
    FROM "wera-db".main.enrollment AS e
   WHERE e.SchoolName LIKE '%Jefferson%'
ORDER BY DistrictName DESC

Exercise 3 : WHERE things get interesting! (Part 1)

05:00
  1. Find all the schools in the Seattle School District.

Exercise 3 : WHERE things get interesting! (Part 2)

05:00
  1. Find all the schools in the Seattle School District.

  2. Find all schools in the Seattle School District that have “Elementary” in their name.

Exercise 3 : WHERE things get interesting! (Part 3)

05:00
  1. Find all the schools in the Seattle School District.

  2. Find all schools in the Seattle School District that have “Elementary” in their name.

  3. From that list of schools, make the largest student count be on top. (Extra credit! LIMIT the results to 5)

Exercise 3 : Answers

-- Question 1
SELECT DISTINCT 
       DistrictName 
       ,SchoolName 
  FROM "wera-db".main.enrollment
 WHERE DistrictName = 'Seattle School District No. 1';
 
-- Question 2
SELECT DISTINCT 
       DistrictName 
       ,SchoolName 
  FROM "wera-db".main.enrollment
 WHERE DistrictName = 'Seattle School District No. 1'
   AND SchoolName LIKE '%Elementary%';


-- Question 3
SELECT DiSTINCT 
       DistrictName,  
       SchoolName, 
       student_count
  FROM "wera-db".main.enrollment
 WHERE DistrictName = 'Seattle School District No. 1'
   AND SchoolName LIKE '%Elementary%'
   AND student_group = 'All Students'
   AND GradeLevel  = 'All Grades'
 ORDER BY student_count DESC;
 
-- Extra Credit! Use this as the final line on Question 3
LIMIT 5;

Inline Functions

  • Inline functions can be used in SELECT statements
  • Aggregation functions allow you to do one action to many rows at the same time.
  • Some of the common aggregation functions are:
    • SUM (adds together numbers of a numeric column over a set of grouped rows)
    • COUNT (counts the number of non-blank cells in a column over a set of grouped rows)
    • MIN (finds the smallest number of a numeric column over a set of grouped rows)
    • MAX (finds the largest number of a numeric column over a set of grouped rows)
    • AVG (finds the average of a numeric column over a set of grouped rows)
    • LEFT/RIGHT (extracts the left or right characters from text)
  • We recommend you use an alias to rename the column if you use one of these functions.

COUNT

  • You can count values in a specific column. This will count all non-NULL values in that column.
SELECT 
COUNT(SchoolName) as school_count 
--This will count all non-null values from SchoolName in the table
FROM main.enrollment
  • You can Count the number of all the rows with COUNT(*)
SELECT 
COUNT(*) as record_count 
--This will count all of the records in the table
FROM main.enrollment

GROUP BY

  • Allows for aggregation
  • You will need each column that is not aggregated to be in the group by statement.
  SELECT SchoolYear,
         DistrictName,
         SchoolName
    FROM main.enrollment
GROUP BY SchoolYear,
         DistrictName,
         SchoolName  

Aggregate Functions Examples

  SELECT DISTINCT 
         DistrictName
         ,SUM(student_count) AS student_count
    FROM "wera-db".main.enrollment
   WHERE OrganizationLevel = 'School' 
     AND student_group = 'All Students' 
     AND GradeLevel = 'All Grades'
GROUP BY DistrictName
ORDER BY record_count DESC;

If you remember nothing else from this section

Important

Remember that if you want to AGGREGATE, you will need to GROUP BY the other columns.

Exercise 4 (Part 1)

05:00
  1. Find the count of all districts with enrollments between 900 and 1100.

Exercise 4 (Part 2)

05:00
  1. Find the count of all districts with enrollments between 900 and 1100.
  2. Give the districts with counts of schools that have populations greater than 1000.

Exercise 4 (Part 3)

05:00
  1. Find the count of all districts with enrollments between 900 and 1100.
  2. Give the districts with counts of schools that have populations greater than 1000.
  3. Give the number of students by county as reported at the district level.

Exercise 4 : Answers

-- Question 1
SELECT COUNT(*) as DistrictCount
  FROM "wera-db".main.enrollment
 WHERE student_count BETWEEN 900 AND 1100 
   AND OrganizationLevel = 'District'  
   AND student_group = 'All Students'
   AND GradeLevel = 'All Grades';

-- Question 2
  SELECT DistrictName, COUNT(*) as BigSchoolCount
    FROM "wera-db".main.enrollment
   WHERE student_count > 1000 
     AND OrganizationLevel = 'School' 
     AND student_group = 'All Students' 
     AND GradeLevel = 'All Grades'
GROUP BY DistrictName 
ORDER BY BigSchoolCount DESC;

-- Question 3
  SELECT County
         ,SUM(student_count) as Student_Count 
    FROM "wera-db".main.enrollment
   WHERE OrganizationLevel = 'District'
     AND student_group = 'All Students' 
     AND GradeLevel = 'All Grades'
GROUP BY County 
ORDER BY Student_Count DESC;

BREAK

A Kit Kat bar

Image references may expire, unlike Kit Kat bars.

10:00

Back from Break

Review any Parking Lot Questions

A birds eye view of a mostly empty parking lot

JOINs

  • JOINs allow you to output data from multiple tables at the same time.
  • For the purposes of this workshop, we will focus on LEFT JOINs and INNER JOINs. There are also RIGHT JOINs and FULL OUTER JOINs.

LEFT JOIN                                                                         INNER JOIN

LEFT JOIN INNER JOIN

Join Sytax

  • Comes between FROM Clause and WHERE clause
  • Think of it as a way to expand what you’re querying from.
  • You need at least one column from each table to match ON.
SELECT *
  FROM table1
       LEFT JOIN table2 
              ON table1.SchoolName = table2.SchoolName

LEFT vs INNER

  • Use a LEFT JOIN when you want one of your tables to be a “base”.
  • That way, you can get all the information you know you want in your base table and then just add details.
  • Use an INNER JOIN when you want only those rows that are common to both.
  • For instance, if you did an INNER JOIN on enrollment and graduation, you would only have the data for schools with enrollment AND graduation data.

Exercise 5 (Part 1)

05:00
  1. Using your answer to 4.2 as a reference (you can use ours if you didn’t finish), join in to the public directory, find the administrator.
-- Example Syntax from 4.2

  SELECT DistrictName, COUNT(*) as BigSchoolCount
    FROM "wera-db".main.enrollment
   WHERE student_count > 1000 
     AND OrganizationLevel = 'School' 
     AND student_group = 'All Students' 
     AND GradeLevel = 'All Grades'
GROUP BY DistrictName 
ORDER BY BigSchoolCount DESC;

Exercise 5 (Part 2)

05:00
  1. Using your answer to 4.2 as a reference (you can use ours if you didn’t finish), join in to the public directory, find the administrator.
  2. Using your code from question 1, also find the top five schools by graduation rate that have an enrollment greater than 1000. Be sure to include the school year.

Exercise 5 (Part 3)

05:00
  1. Using your answer to 4.2 as a reference (you can use ours if you didn’t finish), join in to the public directory, find the administrator.
  2. Using your code from question 1, also find the top five schools by graduation rate that have an enrollment greater than 1000. Be sure to include the school year.
  3. Find schools that have enrollment but no graduation data. (Fun fact! You can’t use = to find NULL values. There must be some other way to find out what IS NULL. Hmmm…)

Exercise 5: Answers

-- Question 1
  SELECT E.DistrictName,
         COUNT(*) as BigSchoolCount,
         PD.Administrator,
    FROM "wera-db".main.enrollment E
         LEFT JOIN "wera-db".main.public_directory PD
                ON E.DistrictOrganizationId = PD.DistrictOrganizationId
                   AND PD.OrganizationLevel = 'District'
   WHERE student_count > 1000 AND
         E.OrganizationLevel = 'School' AND
         student_group = 'All Students' AND
         GradeLevel = 'All Grades'
GROUP BY E.DistrictName, PD.Administrator
ORDER BY BigSchoolCount DESC;
   
-- Question 2   
  SELECT DISTINCT
         e.SchoolYear
         ,e.SchoolName
         ,g.GraduationRate 
    FROM "wera-db".main.enrollment e
         LEFT JOIN "wera-db".main.graduation g 
                ON g.SchoolOrganizationId = e.SchoolOrganizationID 
   WHERE e.student_group = 'All Students'
     AND e.GradeLevel  = 'All Grades'
     AND e.SchoolName != 'District Total'
     AND g.GraduationRate IS NOT NULL
     AND g.Cohort  = 'Four Year'
     AND g.StudentGroupType  LIKE '%All%'
     AND e.student_count > 1000
ORDER BY g.GraduationRate DESC    

-- Question 3   

SELECT DISTINCT
         e.SchoolYear
         ,e.SchoolName
         ,g.GraduationRate 
    FROM "wera-db".main.enrollment e
         LEFT JOIN "wera-db".main.graduation g 
                ON g.SchoolOrganizationId = e.SchoolOrganizationID
   WHERE e.student_group = 'All Students'
     AND e.GradeLevel  = 'All Grades'
     AND e.SchoolName != 'District Total'    
     AND g.GraduationRate IS NULL;

Common Table Expressions (CTEs) (WITH Keyword)

  • This allows for doing queries off to the side, which is helpful when you want to do multiple things at once.
  • Goes before the SELECT statement.
  • Example (you can’t call an aggregate function inside of an aggregate function)
WITH step1 AS (
SELECT SchoolYear, 
       SUM(graduate)/SUM(finalcohort) AS grad_rate,
       SUM(continuing)/SUM(finalcohort) AS continuing_rate
  FROM "wera-db".main.graduation
  WHERE studentgrouptype = 'All'
    AND cohort = 'Four Year'
    AND organizationlevel = 'State'
GROUP BY SchoolYear)
SELECT SUM(grad_rate + continuing_rate) AS combined_rate
  FROM step1

Exercise 6

15:00
  1. Take the first query from Exercise 5, remove the order by clause, put it into a CTE, and then select everything from it.

Example 6: Answers

-- Question 1
WITH big_schools AS ( SELECT E.DistrictName,
         COUNT(*) as BigSchoolCount,
         PD.Administrator,
    FROM "wera-db".main.enrollment E
         LEFT JOIN "wera-db".main.public_directory PD
                ON E.DistrictOrganizationId = PD.DistrictOrganizationId
                   AND PD.OrganizationLevel = 'District'
   WHERE student_count > 1000 AND
         E.OrganizationLevel = 'School' AND
         student_group = 'All Students' AND
         GradeLevel = 'All Grades'
GROUP BY E.DistrictName, PD.Administrator)
SELECT *
FROM big_schools

Resources

If you want to learn more:

  • W3 Schools online tutorials (https://www.w3schools.com/)

  • Geeks For Geeks (https://www.geeksforgeeks.org)

  • Example Google Search “W3Schools SQL how to execute based off a condition.” Example

  • What data questions are do you encounter in your job?

  • How do you think you can answer these questions, or questions like these using SQL now?

  • For example, what types of functions would you use to:

    • Count how many students are regular attenders
    • Calculate discipline rates
    • Calculate graduation rates
    • Any others?

Thank you for attending!

Here is the link to a form where you can tell us what we did well and how we can improve. https://forms.office.com/r/gDm9wXPyrF

Advanced topics -

Having

  • Similar to WHERE statements, but they are used if you are filtering based off an aggregate function.

  • Comes after the GROUP BY clause and before the ORDER BY clause.

  • If you need to filter on a row-level criteria, use the WHERE clause.

SELECT COUNT(*) as DistrictCount
  FROM "wera-db".main.enrollment
 WHERE student_count BETWEEN 900 AND 1100
   AND OrganizationLevel = 'District'
   AND student_group = 'All Students'
   AND GradeLevel = 'All Grades'
HAVING COUNT(*) > 1;

Advanced Topics -

Which Order do these go in?

The list below contains all the possible keywords. We will get to all of them in this workshop.

  WITH cte_name AS ()
  SELECT DISTINCT
         column
         ,AGG_FUNC(column_or_expression), …
    FROM mytable
         JOIN another_table
           ON mytable.column = another_table.column
   WHERE constraint_expression
GROUP BY column
  HAVING constraint_expression
ORDER BY column ASC/DESC
   LIMIT count;

Advanced Topics -

Limit the number of results

  • This is one of those areas where SQL languages will differ.

  • Most languages will say “Put the LIMIT keyword after everything else and put the number of rows you want.

  • Microsoft SQL Products will have you put the TOP keyword after the SELECT Statement.

-- Most ways
SELECT DISTINCT *
  FROM "wera-db".main.enrollment AS e
 WHERE e.SchoolName LIKE '%Jefferson%'
 LIMIT 10

-- Microsoft way
SELECT DISTINCT TOP 10 *
  FROM [wera-db].main.enrollment AS e
 WHERE e.SchoolName LIKE '%Jefferson%'