Follow up on the previous article about SQL CRUD https://toddypet.medium.com/sql-create-insert-f9afe47316b5. In this article, I will go through the command you need while selecting data with SQL. I will first introduce a strategy for data analysis and then go through the SQL command which maps with each step.
Data analysis strategy
Almost every single thing we do can be recorded as data, businesses can empower their business decision by analyzing these data. However, have you ever encounter difficulties that reading a big set of data but don’t know what to do? It didn't matter which programming language, tools you are using(SQL, Python, Excel). You can easily clean up your data by following below 4 steps.
- Select: Select the data you need. Either from one table or is a combination of multiple tables
- Dimension: Decide how you are going to slice or group the data.
- Metrics: Decide what you want to measure. What you are calculating. Remember, you can only do it on numeric data
- Function: What function do you want to apply to the data
Let’s see how this strategy applies with SQL
Select
First of all, you need to get the set of data you need. You can do it in below ways:
- Select from one table
SELECT [column_name]
FROM [table_name]
2. Select from two tables — JOIN (column level)
we use this when you have a foreign key to other tables and join on the key
SELECT [table_name1].[column_name], [table_name2].[column_name] ...
FROM [table_name1]
JOIN [table_name2]
ON [table_name1].[column_name] = [table_name2].[column_name]
- Left join: Contain all information from the left-hand table
- Right join: Contain all information from the right-hand table
- Inner join: Contain information that exists in both tables
- Self join: join with itself often used
3. Select to filter out the data you need, below are some of the combition you can use on WHERE command
WHERE + Like + wildcard
SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;
WHERE + IN
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);
WHERE + BETWEEN (often used on date)
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
WHERE [condition 1] AND [consition 2]. AND operator display the value that all condition is true
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
WHERE [condition 1] OR [consition 2]. OR operator display if any of the condition is TRUE
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
Dimension(optional)
This step is like pivot in excel, let’s say you have a table of drinks order, you would like to see that which
GROUP BY + Having (on the aggregate function)
NOTE: whatever is in group by need to show in select
Metrics
After Group by, what you want to measure. Metrics are quantitive measurements, are the thing can be count.
For example as below:
we first group by the product_id, for each product_id we try to find the sum for all quantity * unit price
SELECT product_id, SUM(quantity * unit_price) AS total_sales
FROM sales_data
GROUP BY product_id;
//result
| product_id | total_sales |
|------------|-------------|
| 101 | 50.00 |
| 102 | 20.00 |
| 103 | 30.00 |
Function
Once you have chosen the Measure column, then you can think about what you are going to do with that number, below are some common commands you will put on the column
- Create a new column derives from an existing record
SELECT [column_name] AS [new_column_name]
FROM [table_name]
2. Common commands that has been used: MIN, MAX, COUNT, AVG, SUM
Practice
There are lots of websites provides some interview question for you to practice your SQL. Below is some recommend website:
- https://www.hackerrank.com/domains/sql?filters%5Bstatus%5D%5B%5D=unsolved&badge_type=sql
- https://sqlzoo.net/wiki/SQL_Tutorial
Ref