Learn SQL Application

Intro

SQL is a declarative language: we tell the machine what we want to get, and the machine needs to figure out the computation required to provide the queried result. In this post I put my notes and examples that help me understand different aspects of SQL. There is also an application for learning SQL, demonstrated in the Demo section, with its source code linked below.

Terminology and general aspects
  • Database Schema:
    - the structure around the data and the corresponding data relationships,
    - we can take two different approaches to a data schema:
      Schema on read: in an unstructured data store we apply a schema only when the data gets read - data lives in an unordered state, and we apply the structure in the query code.
      Schema on write: forces a structure as a condition before data gets written.
    - the method by which the schema gets defined impacts further behaviour:
      OLTP: a series of tables connected with PKs - the db is designed for reading and mostly for writing a single record at a time; used for transactional databases like Salesforce or any ERP.
      OLAP: a central table connected to the keys of the surrounding tables - designed to keep read output highly efficient; used for high-scale information delivery.
    - the DB schema has a big impact on querying time.
    - table schema is a different concept, concerning the way the fields are built and a data type is established for each one.
  • Primary key - a column or combination of columns that identifies a specific row uniquely. It implies a NOT NULL constraint and has to be UNIQUE. When multiple columns form the primary key, we deal with a so-called composite key, which can encapsulate even up to 32 columns.
  • Foreign key - a column that contains a primary key from another table's field. This way, we build the relationship between tables.

    customers
    - customer_id: pk integer
    - phone: string
    - email: string
    orders
    - order_id: integer
    - status: string
    - customer_id: fk integer
    - In the above example, pk stands for primary key and fk stands for foreign key.
    - In a one-to-many relationship, one customer can have many orders, so we can find the same customer multiple times in the orders table.
    - Conversely, in a one-to-many relationship one order cannot have multiple customers.
    orders table
    order_id | status    | customer_id
    1        | completed | 1
    2        | completed | 1
    3        | cancelled | 3
    4        | pending   | 1
  • The DB engine executes a query in the following sequence:
    1. FROM, JOIN - it determines data table(s).
    2. WHERE - records filtering.
    3. GROUP BY - records grouping.
    4. HAVING - groups filtering.
    5. SELECT - columns filtering.
    6. ORDER BY - results sorting.
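The evaluation order above can be seen in action with a minimal sketch using Python's built-in sqlite3 module; the orders table mirrors the example earlier (the data is made up for illustration):

```python
import sqlite3

# Hypothetical orders data to illustrate the logical evaluation order.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, customer_id INTEGER)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "completed", 1), (2, "completed", 1), (3, "cancelled", 3), (4, "pending", 1)],
)

# WHERE runs before GROUP BY (it drops the 'cancelled' rows first),
# HAVING runs after grouping (it keeps customers with 2+ remaining orders).
rows = con.execute("""
    SELECT customer_id, COUNT(*) AS n
    FROM orders
    WHERE status <> 'cancelled'
    GROUP BY customer_id
    HAVING COUNT(*) >= 2
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, 3)]
```

Customer 3 never reaches the grouping step: the WHERE clause removed their only (cancelled) order before GROUP BY ran.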
  • There are different types of SQL queries:
    1. DQL - Data Query Language:
         - SELECT.
    2. DML – Data Manipulation Language:
         - INSERT, UPDATE, DELETE.
    3. DDL – Data Definition Language:
         - CREATE, ALTER, DROP, RENAME, TRUNCATE.
    4. DCL – Data Control Language:
         - GRANT, REVOKE.
    5. TCL - Transaction Control Language:
         - COMMIT, ROLLBACK, SAVEPOINT.
  • Constraints - limits we put on the data stored in a table's columns. They can be specified in a CREATE TABLE or ALTER TABLE statement. Here they are:
    - NOT NULL - prevents keeping nulls in the column.
    - CHECK - limits the range of values that can be placed in the column, for instance:
       age INTEGER CHECK (age >= 18).
    - DEFAULT - provides a default value for the column unless another value is specified.
    - UNIQUE - prevents storing duplicates within a column.
    - PRIMARY KEY - sets a pk on a column.
    - FOREIGN KEY - sets an fk pointing at the pk of the other table in the relationship.
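A quick sqlite3 sketch of several of these constraints working together (the people table and its columns are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table combining several constraints from the list above.
con.execute("""
    CREATE TABLE people (
        person_id INTEGER PRIMARY KEY,
        email     TEXT NOT NULL UNIQUE,
        age       INTEGER CHECK (age >= 18),
        country   TEXT DEFAULT 'PL'
    )
""")
con.execute("INSERT INTO people (email, age) VALUES ('a@example.com', 30)")

# CHECK violation: age below 18 is rejected.
try:
    con.execute("INSERT INTO people (email, age) VALUES ('b@example.com', 15)")
except sqlite3.IntegrityError as e:
    print("CHECK failed:", e)

# UNIQUE violation: duplicate email is rejected.
try:
    con.execute("INSERT INTO people (email, age) VALUES ('a@example.com', 40)")
except sqlite3.IntegrityError as e:
    print("UNIQUE failed:", e)

# DEFAULT filled in the omitted country column.
print(con.execute("SELECT country FROM people").fetchone())  # ('PL',)
```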
  • Data integrity - defines the accuracy as well as the consistency of the data stored in a database. Once data integrity is ensured, any addition or deletion of data from a table will not create any mismatch in the relationships between the tables.
  • Stored procedure - saved SQL code so that it can be reused anytime with or without parameters:
    - stored procedure syntax:
    CREATE PROCEDURE procedure_name
    AS
    sql_statement
    GO
    
    - executing stored procedure:
    EXEC procedure_name
    
    - and here is how it looks with parameters:
    CREATE PROCEDURE SelectAllEmployees @City nvarchar(30), @PostalCode nvarchar(10)
    AS
    SELECT * FROM employees WHERE City = @City AND PostalCode = @PostalCode
    GO
    
    - calling it with parameters:
    EXEC SelectAllEmployees @City = 'Cracow', @PostalCode = '30-708';
    
    - Stored procedures enable modular programming: create once, store, and call many times whenever required.
    - This supports faster overall query performance.
    - There is also such a thing as a recursive stored procedure, which calls itself until it reaches some boundary condition. It allows the same code to be reused any number of times.
  • Stored procedure vs function:
    - Stored procedure:
      ✓ pre-compiled object: compiled the first time it runs and the compiled form is saved,
      ✓ returning a value is optional,
      ✓ can have input and output parameters,
      ✓ cannot be called from a function,
      ✓ allows SELECT as well as DML (INSERT/UPDATE/DELETE) statements,
      ✓ cannot be utilized in a SELECT statement,
      ✓ cannot be used in the WHERE/HAVING/SELECT line,
      ✓ an exception can be handled by a try-catch block,
      ✓ we can use transactions inside.
    - Function:
      ✓ gets compiled and executed every time it is called,
      ✓ must return a value,
      ✓ can have only input parameters,
      ✓ can be called from a procedure,
      ✓ allows only SELECT statements in it,
      ✓ can be embedded in a SELECT statement,
      ✓ can be used in the WHERE/HAVING/SELECT line,
      ✓ a try-catch block cannot be used,
      ✓ transactions cannot be used,
      ✓ can return tables that can be treated as another rowset and used in JOINs with other tables,
      ✓ cannot modify the data received as parameters.
Char vs varchar vs nvarchar
  • char:
    - string storage with a fixed size,
    - the max number of characters is 255,
    - because of the fixed size, it uses static memory allocation,
    - generally faster than varchar.
  • varchar:
    - string storage with variable size,
    - the max number of character can reach up to 65535 characters,
    - because of variable size, it uses dynamic memory allocation,
    - generally slower than char.
  • nvarchar:
    - while varchar stores data in non-Unicode encoding, nvarchar stores data in a Unicode encoding.
Normalization
  • Normalization is the db designing process where we aim for minimizing any data redundancy. Redundancy of data means there are multiple copies of the same information spread over multiple locations in the same database.
  • There are three main normal forms, where each consecutive normal form depends on the previous one:
    First normal form (1 NF),
    Second normal form (2 NF),
    Third normal form (3 NF),
    where a db is considered to be in third normal form if it meets the requirements of all three normal forms.
  • In 1 NF we aim to keep only atomic information. Here is an example which violates 1 NF, as it contains more than one value in the department column:

    Assignment table
    assign_id | emp_name | emp_age | department
    1         | Artur    | 20      | IT, DevOps
    2         | John     | 40      | IT, DevOps, DataLab
    - We should definitely avoid breaking the department column down into a set of new columns like department1, department2, department3. This is what we call repeating groups, and it's one of the biggest design errors.
    - In order to reach 1 NF, we need to split the table into two tables so that each record is unique and each cell contains an atomic, indivisible value:

    Departments table and assignment table
    dep_id | dep_name
    1      | IT
    2      | DevOps
    3      | DataLab

    assign_id | emp_name | emp_age | dep_id
    1         | Artur    | 20      | 1
    2         | Artur    | 20      | 2
    3         | John     | 40      | 1
    4         | John     | 40      | 2
    5         | John     | 40      | 3
    - We can move on and bring the table to 2 NF and 3 NF.
    - In 2 NF, every non-key column must depend on the whole primary key (no partial dependencies); a single-column primary key satisfies this automatically. In our example, the assignment table has a single-column primary key, i.e. assign_id.
    - In 3 NF, we get rid of any dependencies of non-key columns on other non-key columns; dependencies may exist only on the primary key. To achieve this, we move the non-key column emp_age, which depends on the non-key column emp_name, into a separate table named employees.

    Employees table and assignment table
    emp_id | emp_name | emp_age
    1      | Artur    | 20
    2      | John     | 40

    assign_id | emp_id | dep_id
    1         | 1      | 1
    2         | 1      | 2
    3         | 2      | 1
    4         | 2      | 2
    5         | 2      | 3
    - With that solution, what we get is a many-to-many relationship: an employee can be assigned to many departments, and a department can have many employees.
    - Assignment table plays then a role of the connector table that manages many-to-many relationship.
    - Assignment table has following columns:
       ✓  assign_id which is the primary key (unique),
       ✓  emp_id which is the foreign key (non-unique),
       ✓  dep_id which is the foreign key (non-unique).
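The normalized design can be rebuilt and queried with sqlite3 to confirm the connector table recovers the original assignments (same sample data as above):

```python
import sqlite3

# Rebuild the normalized example tables and join them back together.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees   (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_age INTEGER);
    CREATE TABLE departments (dep_id INTEGER PRIMARY KEY, dep_name TEXT);
    CREATE TABLE assignment  (assign_id INTEGER PRIMARY KEY,
                              emp_id INTEGER REFERENCES employees(emp_id),
                              dep_id INTEGER REFERENCES departments(dep_id));
    INSERT INTO employees   VALUES (1, 'Artur', 20), (2, 'John', 40);
    INSERT INTO departments VALUES (1, 'IT'), (2, 'DevOps'), (3, 'DataLab');
    INSERT INTO assignment  VALUES (1,1,1),(2,1,2),(3,2,1),(4,2,2),(5,2,3);
""")

# The connector table resolves the many-to-many relationship.
rows = con.execute("""
    SELECT e.emp_name, d.dep_name
    FROM assignment a
    JOIN employees   e ON a.emp_id = e.emp_id
    JOIN departments d ON a.dep_id = d.dep_id
    ORDER BY a.assign_id
""").fetchall()
print(rows[0])  # ('Artur', 'IT')
```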
  • When we keep tables normalized, we get a set of benefits:
    - better db organization,
    - more tables with smaller rows,
    - easy modification,
    - reduction of redundant and duplicate data.
    - ensuring data integrity.
One-to-many relationship
  • When one record from table A can have multiple corresponding records in table B.
  • Let's take the Products and Categories tables:
    source: sqlpedia.pl
    - one category can be assigned to many products, however one product cannot be assigned to many categories.
Many-to-many relationship
  • This kind of relationship is handled by a connector table that breaks the relationship into two one-to-many relationships. As an example, let's take an order system where each product can be ordered many times in different orders.
  • Here are the Orders and Products tables, connected by one-to-many relationships with the Order Details table.
    source: sqlpedia.pl
Variables
  • We can set variable data type, assign a value to it, and use it in the code:
    DECLARE @TestVariable AS VARCHAR(100)
    SET @TestVariable = 'This is an example string.'
    PRINT @TestVariable
    
  • We can display variable value with SELECT or PRINT statements.
  • We can also use a SELECT statement in order to assign a value to a variable:
    DECLARE @CarName AS NVARCHAR(50)
    SELECT @CarName = car_model
    FROM db.cars
    WHERE car_id = 1492
    PRINT @CarName
    
    - or:
    DECLARE @NumCars AS INT
    SET @NumCars = (SELECT COUNT(*) FROM db.cars)
    
  • When we assign a multiple-row SELECT result to a variable, the variable receives the value from the last row of the result set.
  • It's important to remember that a local variable's scope expires at the end of the batch, which is delimited by the GO statement.
  • Thus the lifecycle of the variable @TestVariable ends at the GO statement line. In other words, a variable declared above the GO line cannot be accessed below it.
  • Variable scopes:
    - Local variable: declared by the user; by default starts with @; exists only inside the function or batch where it's declared.
    - Global variable: maintained by the system and cannot be declared by a user; starts with @@; can be used throughout the program.
OLTP and OLAP
  • Online Transaction Processing (OLTP):
    - online database modifying system,
    - uses a traditional DBMS,
    - INSERT, UPDATE, and DELETE operations,
    - queries are standardized and simple,
    - tables are normalized,
    - transactions are the sources of data,
    - maintains data integrity constraints,
    - designed for real-time business operations.
  • Online Analytical Processing (OLAP):
    - online database data retrieving system,
    - uses the data warehouse,
    - mostly SELECT operations,
    - complex queries involving aggregations,
    - tables are not normalized,
    - different OLTP databases become the source of data,
    - doesn't get frequently modified, hence data integrity is not an issue,
    - designed for analysis of business measures.
Functions
  • SQL has many built-in functions that can be categorised into three groups:
    - Scalar function:
      ✓ returns a single value from the given input value,
      ✓ input: 1 record, output: 1 record,
      ✓ examples: LCASE(), UCASE(), LEN(), MID() / SUBSTRING(), LOCATE() / CHARINDEX(), ROUND(), NOW(), FORMAT(), INSTR(), CONCAT(), REPLACE(), YEAR().
    - Aggregate function:
      ✓ performs a calculation on a group of values and returns a single value,
      ✓ input: n records, output: 1 record,
      ✓ examples: SUM(), COUNT(), AVG(), MIN(), MAX(), FIRST(), LAST(), ...
    - Window function:
      ✓ computes values over a group of rows and returns a single result for each row,
      ✓ input: n records, output: n records.
  • Windows:
    - Executes a specific function based on the moving window that encapsulates one or N records at a time,
    - Includes an OVER clause, which defines a moving window of rows around the row being evaluated.
    SELECT
        [Month],
        [SalesTerritoryKey],
        [SalesAmount],
        [WindowFunction] = SUM([SalesAmount]) OVER (
            PARTITION BY [SalesTerritoryKey]
            ORDER BY [Month]
            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        )
    FROM [CTE_source]
    ORDER BY [SalesTerritoryKey], [Month];
    
    mssqltips.com
    - Used for computing moving averages, ranking items, and cumulative sums.
    - Can be put in the SELECT or ORDER BY clauses.
    - There are 3 main terms when working with analytic functions: partition, window and current record:

    mssqltips.com
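The same moving-window SUM can be sketched with sqlite3 (window functions require SQLite 3.25+; the sales table and its territory/month data are made up to mirror the query above):

```python
import sqlite3

# Hypothetical sales data for a two-row moving SUM per territory.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (territory TEXT, month INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", 1, 100.0), ("North", 2, 200.0), ("North", 3, 50.0),
    ("South", 1, 10.0),  ("South", 2, 20.0),
])

# PARTITION BY restarts the window per territory; the frame covers
# the previous row and the current row within each partition.
rows = con.execute("""
    SELECT territory, month, amount,
           SUM(amount) OVER (PARTITION BY territory
                             ORDER BY month
                             ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_sum
    FROM sales
    ORDER BY territory, month
""").fetchall()
for r in rows:
    print(r)
```

Note how month 3 of North sums only months 2 and 3 (200 + 50 = 250), and the South partition starts its own window from scratch.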
Window functions:
  • They allow referring to a value from an indicated record other than the current one.
  • LAG:
    - returns a value of a specific field from the preceding record of the current partition,
    - when LAG is called for the first record of the current partition, it returns NULL,
    - example:
    SELECT CAST(DueDate AS DATE) AS DueDate, OrderID, Total,
    		LAG(Total) OVER (PARTITION BY DueDate ORDER BY DueDate) AS PreviousDueDateTotal
    FROM tblSales
    ORDER BY DueDate
    
    - year to year comparison:
    WITH cte_company_orders AS(
      SELECT
        year,
        curr_year,
        LAG(curr_year, 1) OVER (ORDER BY year) AS prev_year
      FROM company_details)
    SELECT
      year,
      curr_year,
      prev_year,
      ROUND( (curr_year - prev_year) / (CAST(prev_year AS NUMERIC)) * 100 ) AS perct_diff
    FROM cte_company_orders
    
    - With LAG we take a value from previous records for each following record.
    - It serves for year-to-year growth calculations, where we can get the value for the current record and the preceding one.
    DueDate    | OrderId | Total     | PreviousDueDateTotal
    2008-06-14 | 71776   | 87.08     | NULL
    2008-06-14 | 71832   | 3953160.1 | 87.08
    2008-06-14 | 71833   | 22        | 3953160.1
    2008-06-17 | 71835   | 9102.11   | NULL
    2008-06-17 | 71835   | 5512.94   | 9102.11
  • LEAD:
    - returns a value of a specific field from the following record of the current partition,
    - when LEAD is called for the last record of the current partition, it returns NULL.
  • FIRST_VALUE:
    - returns the first value of a specific field of the current partition,
  • LAST_VALUE:
    - returns the last value of a specific field of a current partition.
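LAG and LEAD at partition edges can be verified with sqlite3 (SQLite 3.25+; the company_details figures are invented, and SQL NULL surfaces as Python None):

```python
import sqlite3

# LAG/LEAD behaviour at the edges of the (single) partition.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE company_details (year INTEGER, curr_year REAL)")
con.executemany("INSERT INTO company_details VALUES (?, ?)",
                [(2019, 100.0), (2020, 120.0), (2021, 150.0)])

rows = con.execute("""
    SELECT year, curr_year,
           LAG(curr_year)  OVER (ORDER BY year) AS prev_year,
           LEAD(curr_year) OVER (ORDER BY year) AS next_year
    FROM company_details
    ORDER BY year
""").fetchall()
print(rows[0])  # (2019, 100.0, None, 120.0) -- LAG is NULL on the first row
```

The first row has no predecessor, so LAG yields NULL; symmetrically, LEAD yields NULL on the last row.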
Rank functions:
  • ROW_NUMBER()
    SELECT *
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY dept_name ORDER BY emp_salary) r_numb
      FROM employees) e
    WHERE e.r_numb <= 2
    
    emp_id | emp_name  | emp_salary | dept_name | r_numb
    5      | Krzysztof | 7,100.00   | Admin     | 1
    3      | Johnny    | 9,600.00   | Admin     | 2
    4      | Rick      | 5,500.00   | HR        | 1
    1      | Artur     | 7,500.00   | IT        | 1
    6      | Caroline  | 15,500.00  | IT        | 2
    ...    | ...       | ...        | ...       | ...
    - the query shows the two employees with the smallest salary per department.
  • DENSE_RANK()
    SELECT customer_id, events
    FROM (
    	SELECT customer_id,
    	  COUNT(*) AS events,
    	  DENSE_RANK() OVER (ORDER BY COUNT(*)) AS rank
    	FROM fact_events
    	WHERE client_id = 'mobile'
    	GROUP BY customer_id) subquery
    WHERE rank <= 2
    ORDER BY 2 ASC
    
    - The above query finds the bottom 2 ranks of customers using dense rank.
    - DENSE_RANK() keeps counting in a consecutive sequence even when there are ties along the way.
    - The order of the ranking is ascending by default.
    - Sub-query outcome:
    customer_id | events | rank
    a           | 4      | 1
    b           | 4      | 1
    c           | 7      | 2
    d           | 14     | 3
    e           | 21     | 4
    - and we take the first 3 records from the sub-query to fulfil WHERE rank <= 2.
  • RANK and DENSE_RANK:


    source: techTFQ
    - RANK() skips rank values after every run of tied (duplicated) records, while DENSE_RANK() leaves no gaps.
Analytic function:
  • NTILE(x), where x is number of equal ranges in which we want to classify the records:
    SELECT
    	category_name,
    	month,
    	FORMAT(net_sales,'C','en-US') net_sales,
    	NTILE(4) OVER(
    		PARTITION BY category_name
    		ORDER BY net_sales DESC
    	) net_sales_group
    FROM
    	sales.vw_netsales_2017;
    
    - outcome:

    source: techTFQ
    - that way we get 4 evenly distributed buckets.
  • Getting the category with lowest value using ntile:
    SELECT shop_id, total_order
    FROM (
      SELECT
        shop_id,
        SUM(total) AS total_order,
        NTILE(50) OVER (ORDER BY SUM(total)) AS ntile
      FROM shops
      WHERE customer_placed_order BETWEEN '2020-05-01' AND '2020-05-31'
      GROUP BY shop_id
    ) AS t
    WHERE ntile = 1
    ORDER BY 2 DESC
    
    - With ORDER BY in OVER() we make sure SUM(total) is ordered ascendingly.
    - The entire ordered SUM(total) column is divided into 50 evenly distributed buckets.
    - Taking the first bucket ensures we take into account only the lowest-generating shops.
    - NTILE gets 50 because we want the shops in the bottom 2% of revenue (1/50).
    - If I had to take the bottom 1% of shops' revenue, I would go with NTILE(100).
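A scaled-down sqlite3 sketch of the same bucketing idea: NTILE(4) over eight invented shop totals produces four even buckets, and the first bucket holds the lowest earners:

```python
import sqlite3

# NTILE(4) splits 8 hypothetical rows into 4 even buckets of 2.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE shops (shop_id INTEGER, total REAL)")
con.executemany("INSERT INTO shops VALUES (?, ?)",
                [(i, i * 100.0) for i in range(1, 9)])

rows = con.execute("""
    SELECT shop_id, total,
           NTILE(4) OVER (ORDER BY total) AS bucket
    FROM shops
    ORDER BY total
""").fetchall()

lowest = [r[0] for r in rows if r[2] == 1]
print(lowest)  # [1, 2]  -- the lowest-revenue bucket
```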
Transaction
  • A single set of tasks performed on a database in a logical manner.
  • It includes operations like creating, updating, and deleting.
  • There are 4 transaction controls:
    - commit: saving all changes made by a transaction,
    - rollback: reverts all changes made by a transaction and the database remains as before,
    - set transaction: setting a transaction's name,
    - savepoint: sets the point where a transaction can roll back to.
Privileges
  • A db admin can GRANT or REVOKE privileges to or from users of database.
  • GRANT command provides database access to a user:
    GRANT SELECT
    ON employee
    TO user1
    
    - This command grants a SELECT permission on employee table to user1.
  • REVOKE removes database access from a user:
    REVOKE SELECT
    ON employee
    FROM user1
    
    - This command will REVOKE a SELECT privilege on employee table from user1.
    - user1 will lose the privilege on the employee table only when everyone who granted the permission revokes it.
SELECT INTO
  • Statement that copies data from a table to the new one.
    SELECT *
    INTO newtable [IN externaldb]
    FROM oldtable
    WHERE condition;
    
  • SELECT INTO vs INSERT INTO SELECT:
    - SELECT INTO: used when the target table doesn't exist; for single tasks like creating a temporary table, ad hoc reporting, or a backup table; copies all data along with data types to the newly-created table.
    - INSERT INTO SELECT: used when the target table exists; used regularly for appending rows to tables; data types in the source and target tables must match.
  • Here is how to use INSERT INTO SELECT:
    INSERT INTO Customers (CustomerName, City, Country)
    SELECT SupplierName, City, Country FROM Suppliers
    WHERE Country='Poland';
    
Temporary table:
  • Exists temporarily, storing a subset of data from a normal table.
  • You filter data once and store the result in a temporary table instead of filtering the data again and again to fetch the subset.
  • Temporary table needs to be named with '#' prefix.
  • We can make use of SELECT INTO to create a temporary table:
    SELECT name, age, gender
    INTO #FemaleStudents
    FROM student
    WHERE gender = 'Female'
    
  • We can keep it inside a stored procedure:
    CREATE TABLE #MaleStudents(
      name VARCHAR(50),
      age INT,
      gender VARCHAR (50)
    )
    
    CREATE PROCEDURE spInsertStudent (@Name VARCHAR(50), @Age INT,  @Gender VARCHAR(50))
    AS
    BEGIN
      INSERT INTO #MaleStudents VALUES (@Name, @Age, @Gender)
    END
    
    CREATE PROCEDURE spListStudent
    AS
    BEGIN
      SELECT * FROM #MaleStudents
      ORDER BY name ASC
    END
    
    EXECUTE spInsertStudent 'Bradley', 45, 'Male'
    EXECUTE spListStudent
    
    source: codingsight.com
    - The 1st procedure inserts a new student record with the name: Bradley, age: 45, gender: Male.
    - The 2nd procedure selects all the records from the table in ascending order of name.
  • When a temporary table is created, it gets a unique identifier in order to differentiate between temporary tables created by different connections. You can perform operations on a temporary table via the same connection that created it.
  • A temporary table is automatically deleted when the connection that created the table is closed.
LIKE
  • LIKE operator is used for pattern matching.
  • It can be used with wildcards:
    - "%" - matches zero or more characters.
    SELECT *
    FROM employees
    WHERE emp_surname LIKE 'S%'
    
    - "_" - matches exactly one character.
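Both wildcards can be checked quickly with sqlite3 (the surnames are made up; note that SQLite's LIKE is case-insensitive for ASCII by default):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (emp_surname TEXT)")
con.executemany("INSERT INTO employees VALUES (?)",
                [("Smith",), ("Stone",), ("Soto",), ("Brown",)])

# '%' matches any run of characters, '_' matches exactly one character.
starts_with_s = con.execute(
    "SELECT emp_surname FROM employees WHERE emp_surname LIKE 'S%' ORDER BY emp_surname"
).fetchall()
four_letters = con.execute(
    "SELECT emp_surname FROM employees WHERE emp_surname LIKE 'S___' ORDER BY emp_surname"
).fetchall()
print(starts_with_s)  # [('Smith',), ('Soto',), ('Stone',)]
print(four_letters)   # [('Soto',)] -- 'S' plus exactly three characters
```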
VIEW
  • It's a virtual table based on the result of any SQL statement.
  • A view contains rows and columns just like a real table, and its fields can come from one or more real tables.
  • While creating views, we can apply WHERE or JOIN statements so that the data appears as if it came from one single table.
  • Here is the syntax for view creating:
    CREATE VIEW view_name AS
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition;
    
    - and here is how to query a view:
    SELECT * FROM view_name
    
  • With views we can restrict users from being able to see confidential data. In a view we define which columns are visible; the columns we want to keep invisible are simply not included while creating the view. For example, let's say we have the employees table data schema:
       - id: numeric
       - first_name: string
       - last_name: string
       - salary: numeric
    Let's assume that the salary column is confidential and has to be restricted from viewing:
    CREATE VIEW employees_details AS
    SELECT first_name, last_name
    FROM employees
    WHERE last_name LIKE 'S%';
    
    - When we try to check the salary through the view:
    SELECT last_name, salary
    FROM employees_details
    WHERE last_name LIKE 'S%';
    
    - Querying the view employees_details for the salary field raises an error.
  • It also simplifies queries by hiding the complexity introduced with UNION, JOIN, etc.
  • Let's have a look at differences between a view and the actual table:
    - The difference between a view and a table is that views are definitions built on top of other tables (or views), and do not hold data themselves.
    - A view hides the complexity of the database tables from end users. Essentially we can think of views as a layer of abstraction on top of the database tables.
    - If data is changing in the underlying table, the same change is reflected in the view.
    - Views take very little space to store, since they do not hold actual data.
    - Views can include only certain columns in the table so that only the non-sensitive columns are included and exposed to the end user.
  • We can modify data through a view, but under the following conditions:
    - there is no grouping with GROUP BY in the view,
    - there is no DISTINCT in the view's definition,
    - there are no expressions in the view's definition,
    - only one base table is being updated through the view,
    - modified values comply with the base table's field constraints.
    In all views that can modify data, we should add the WITH CHECK OPTION clause so that users cannot make changes through the view that would fall outside the view's own filter.
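The column-hiding behaviour can be demonstrated with sqlite3 (hypothetical employees data; the view exposes only the name columns):

```python
import sqlite3

# A view that hides the confidential salary column.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (id INTEGER, first_name TEXT, last_name TEXT, salary NUMERIC);
    INSERT INTO employees VALUES (1, 'Anna', 'Smith', 9000), (2, 'Jan', 'Kowalski', 7000);
    CREATE VIEW employees_details AS
        SELECT first_name, last_name FROM employees;
""")

# The view behaves like a table containing only the exposed columns.
print(con.execute("SELECT * FROM employees_details").fetchall())

# Selecting the hidden column through the view raises an error.
try:
    con.execute("SELECT salary FROM employees_details")
except sqlite3.OperationalError as e:
    print("blocked:", e)  # no such column: salary
```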
INDEX
  • An index is used to speed up searching in the database by reducing records amount to be scanned.
  • An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with the UPDATE and the INSERT statements.
  • In order to set an index we use CREATE INDEX statement, which allows us to name the index as well as to specify a table and a column that we want to get indexed:
    CREATE INDEX index_name
    ON table_name (column_name);
    
  • Updating a table with indexes takes more time than updating a table without (because the indexes also need an update). So, only create indexes on columns that will be frequently searched against.
  • When not to use indexes:
    - on small tables,
    - on tables that have frequent, large updates or insert operations,
    - on columns that contain a large number of NULLs,
    - on columns that are frequently manipulated.
  • Here is how to use it:
    - creating index:
    CREATE INDEX employees_last_name_idx
    ON employees (last_name);
    
    - using indexed column in the select query:
    SELECT *
    FROM employees
    WHERE salary > 10000 AND last_name = 'Smith'
    
    - We use the indexed last_name column within the WHERE clause along with salary, which speeds up the query.
    - It works like an index in a book: the index key (the indexed column's value) points to each respective record holding the values of the remaining non-key columns. Importantly, index keys are sorted.
    - If the table is not indexed, the database server needs to go through all records to execute any query.
    - It's recommended to use at least one indexed column when grouping with GROUP BY.
    - A common rule of thumb is to keep no more than about 10 indexes on the same table.
  • We can also index multiple columns; however, it takes the db engine longer to build such an index.
  • We can say that there are two types of indexes:
    - Clustered index:
      ✓ defines the order in which data is stored in the table,
      ✓ as data can be physically stored in only one order, there can be only one clustered index per table; when creating a table, the primary key constraint automatically creates a clustered index on that column,
      ✓ only sorts the table, so it does not consume extra storage,
      ✓ fast performance.
    - Non-clustered index:
      ✓ doesn't define the order of data inside the table,
      ✓ is stored in a different place than the table data, which allows more than one non-clustered index per table,
      ✓ claims more storage space, being kept separately from the actual table,
      ✓ slower performance, as it requires an additional lookup between the index storage and the actual table.
    - Inside the table, the data is sorted by the clustered index; inside a non-clustered index, the entries are stored in the index's own specified order.
    - A non-clustered index contains the column values on which the index is created and the addresses of the corresponding records in the actual table.
  • Here is how it works:
    - When a query is run against indexed column, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address in the table and fetch other column values.
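This lookup can be observed in sqlite3 via EXPLAIN QUERY PLAN (the employees data is invented; the exact plan wording varies between SQLite versions, but the index name appears when the index is used):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (last_name TEXT, salary REAL)")
con.executemany("INSERT INTO employees VALUES (?, ?)",
                [("Smith", 12000.0), ("Smith", 8000.0), ("Jones", 9000.0)])
con.execute("CREATE INDEX employees_last_name_idx ON employees (last_name)")

# The query planner reports a SEARCH via the index instead of a full scan.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM employees WHERE last_name = 'Smith' AND salary > 10000
""").fetchall()
for row in plan:
    print(row)  # detail column mentions employees_last_name_idx
```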
DISTINCT
  • Used when we want to get unique values from a given column.
    SELECT DISTINCT name
    FROM students
    
  • Aggregating function COUNT() counts duplicates. To count only unique values we need to put DISTINCT prior to COUNT() argument:
    SELECT COUNT(DISTINCT client_region)
    FROM tblSales
    
    - all aggregating functions ignore NULL values by default,
    - however, with the syntax COUNT(*) we count rows including both duplicates and nulls,
    - all aggregating functions take all values including duplicated ones - we can prevent this using the DISTINCT keyword,
    - even though all aggregating functions can accept DISTINCT, the keyword is most often used with COUNT().
  • Using COUNT(DISTINCT col_name) to get first/second... max value:
    SELECT name, price
    FROM sales AS A
    WHERE 0 = (
        SELECT COUNT(DISTINCT price)
        FROM sales AS B
        WHERE B.price > A.price
    )
    
    - The sub-query in the WHERE clause is being performed for each and every record from the outer query.
    - For every record from the outer query, the WHERE clause checks: if 0 equals the sub-query outcome, the outer query record goes into the result set.
    - When we want to get the products with the second biggest price, we replace 0 with 1 in the WHERE clause, and so on.
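The correlated sub-query trick can be run end to end in sqlite3 (made-up sales data with a tie at the top price):

```python
import sqlite3

# A price is the maximum when 0 distinct prices are greater than it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (name TEXT, price REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("a", 10.0), ("b", 15.0), ("c", 20.0), ("d", 20.0)])

top = con.execute("""
    SELECT name, price FROM sales AS A
    WHERE 0 = (SELECT COUNT(DISTINCT price) FROM sales AS B WHERE B.price > A.price)
    ORDER BY name
""").fetchall()
second = con.execute("""
    SELECT name, price FROM sales AS A
    WHERE 1 = (SELECT COUNT(DISTINCT price) FROM sales AS B WHERE B.price > A.price)
""").fetchall()
print(top)     # [('c', 20.0), ('d', 20.0)] -- both rows tied at the max price
print(second)  # [('b', 15.0)]
```

Because the sub-query counts DISTINCT prices, ties at the top still count as a single price level, so "replace 0 with 1" really does step down one price level.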
JOIN
  • Join is the keyword needed when querying more than one table.
  • While joining, we need to define on which columns (of both tables) the tables will be joined.
  • Usually, we use primary key of one table matching it with its corresponding foreign key in another table.
  • There are a few kinds of joins:
    - (INNER) JOIN: returns rows when there is a match in rows between the tables.
    - LEFT (OUTER) JOIN: returns all the rows from the left table regardless if there is a match or a null from the right table.
    - RIGHT (OUTER) JOIN: returns all the rows from the right table regardless if there is a match or a null from the left table.
    - FULL (OUTER) JOIN: returns all the rows from the left-hand side table and all the rows from the right-hand side table.
    - CROSS JOIN: joins every row of one table with every row of another table. It becomes a very expensive operation when cross joining more than 2 tables, and it is demanding from a performance perspective.
  • Let's take two table schemas for an example:

    tblVideo
    - video_id: PRIMARY KEY integer
    - author_id: integer
    - video_duration: float
    tblView
    - video_id: FOREIGN KEY integer
    - viewer_id: integer
    - viewer_timewatch: float
  • Inner join example:
    How many publishers have at least one viewer?
    SELECT COUNT(DISTINCT vd.author_id)
    FROM tblVideo vd
    INNER JOIN tblView vw
    ON vd.video_id = vw.video_id
    
    - It doesn't count authors whose videos are not in the tblView.
    - If I wanted to count distinct author ids only, I wouldn't need any JOINs and would select only from tblVideo.
  • Left join example:
    - Here is the snippet of data schema:
    source: toptal.com
    SELECT i.Id, i.BillingDate, c.Name, r.Name AS ReferredByName
    FROM Invoices i
     LEFT JOIN Customers c ON i.CustomerId = c.Id
     LEFT JOIN Customers r ON c.ReferredBy = r.Id
    ORDER BY i.BillingDate ASC;
    
    - With LEFT JOIN we make sure that all the invoices will be returned no matter what (in case of any NULLs within Customers table).
    - The first LEFT JOIN matches the customer id fk in Invoices with the customer id pk in Customers.
    - The second LEFT JOIN matches the ReferredBy fk in Customers with the customer id pk in Customers (aliased r).
    - ORDER BY sorts the outcome by date; we can append ASC/DESC for ascending or descending ordering.
  • We can join multiple tables:
    SELECT column1, column2
    FROM TableA
    JOIN TableB ON TableA.Column3 = TableB.Column3
    JOIN TableC ON TableA.Column4 = TableC.Column4
    
  • We can also join a table with itself, which we call a self join:
    SELECT DISTINCT E.FullName
    FROM EmployeeDetails E
    INNER JOIN EmployeeDetails M
    ON E.EmpID = M.ManagerID;
    
    - We use different aliases to differentiate the two instances of the table in the join.
    - This way we fetch those employees whose ids are present in the ManagerID column.
  • Selecting the columns on which join is applied:
    SELECT t1.main_col AS col_A, t2.main_col AS col_B
    FROM tab1 t1
    FULL OUTER JOIN tab2 t2
    ON t1.main_col = t2.main_col
    
    - we just need to alias the columns properly to fetch them both separately.
    - result:
    col_A   | col_B
    NULL    | 1231235
    NULL    | 5234214
    4434431 | 4434431
    2221222 | 2221222
    9765675 | NULL
    7654565 | NULL
  • Cross join using 1 = 1 as join condition:
    - It's simply doing a cross join, which pairs every row of the first table with every row of the second and returns the cartesian product, i.e. all possible combinations.
    - Putting in 1=1 is like saying "this condition is always true, so don't eliminate anything".
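  • To see the cartesian product concretely, here is a runnable sketch using SQLite through Python's sqlite3 (the sizes/colors tables are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes(size TEXT);
    CREATE TABLE colors(color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M');
    INSERT INTO colors VALUES ('red'), ('blue');
""")
# ON 1 = 1 never filters anything out, so every row of sizes
# gets paired with every row of colors: the cartesian product.
rows = conn.execute("""
    SELECT s.size, c.color
    FROM sizes s
    JOIN colors c ON 1 = 1
    ORDER BY s.size, c.color
""").fetchall()
print(rows)   # 2 rows x 2 rows = 4 combinations
```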
  • Greater/lower than in the ON condition:
    - given the table below:
    name price
    a 10
    b 15
    c 20
    d 20
    and joining the table to itself
    SELECT *
    FROM products AS A
    JOIN products AS B
    ON A.price > B.price
    
    we get
    name price name price
    b 15 a 10
    c 20 a 10
    c 20 b 15
    d 20 a 10
    d 20 b 15
    - with such a condition in JOIN ON, every record of table A gets assigned all the records of table B with a lower price.
  • Multiple conditions in the ON clause:
    SELECT u1.*
    FROM users u1
    JOIN users u2
    ON u1.user_id_sender = u2.user_id_sender
    AND u1.user_id_receiver = u2.user_id_receiver
    
    - we self-join the users table,
    - we use the combination of two columns as a composite join key.
USING
  • The USING clause is used together with JOIN and can replace the ON clause.
  • The column named in USING must be present under the same name in both tables; the SELECT query then joins the tables automatically on that column.
  • Here comes the example:
    SELECT e.emp_id, e.last_name, d.loc_id
    FROM employees e
    JOIN departments d
    USING(dep_id);
    
    - dep_id is the column shared by the employees and departments tables.
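  • As a quick, runnable check of USING, here is a sketch with SQLite via Python's sqlite3 (the employees/departments data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees(emp_id INTEGER, last_name TEXT, dep_id INTEGER);
    CREATE TABLE departments(dep_id INTEGER, loc_id INTEGER);
    INSERT INTO employees VALUES (1, 'Smith', 10), (2, 'Jones', 20);
    INSERT INTO departments VALUES (10, 100), (20, 200);
""")
# USING (dep_id) builds the join condition on the column of that
# name present in both tables.
rows = conn.execute("""
    SELECT emp_id, last_name, loc_id
    FROM employees
    JOIN departments USING (dep_id)
    ORDER BY emp_id
""").fetchall()
print(rows)   # [(1, 'Smith', 100), (2, 'Jones', 200)]
```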
  • Another example, a DELETE query (PostgreSQL syntax):
    DELETE FROM tab1
    USING tab2
    WHERE tab1.k = tab2.k
    
UNION
  • Merges the results of two structurally compatible queries into a single combined result set, removing duplicated records.
  • There is also UNION ALL, which does the same thing but keeps duplicated records.
  • UNION ALL performs better than UNION, since UNION requires the server to do the additional work of removing duplicates.
  • Here is how to use it:
    SELECT employee_id, first_name, last_name, 'Employee' AS status
    FROM employees
    UNION
    SELECT visitor_id, first_name, last_name, 'Visitor' AS status
    FROM visitors
    
    - note that the columns selected from the two tables have to match in number and data type.
  • We can use multiple UNION statements in one query.
EXCEPT
  • Fetches values that are present in one table but not in another.
    CREATE TABLE test_a(id INTEGER);
    INSERT INTO test_a(id) VALUES (10);
    INSERT INTO test_a(id) VALUES (20);
    INSERT INTO test_a(id) VALUES (30);
    INSERT INTO test_a(id) VALUES (40);
    INSERT INTO test_a(id) VALUES (50);
    
    CREATE TABLE test_b(id INTEGER);
    INSERT INTO test_b(id) VALUES (10);
    INSERT INTO test_b(id) VALUES (30);
    INSERT INTO test_b(id) VALUES (50);
    
    SELECT * FROM test_a
    EXCEPT
    SELECT * FROM test_b;
    
    ----------------------
    20, 40
    
    - 20 and 40 are the values that are not present in test_b table.
INTERSECT
  • Fetches the records common to two tables:
    SELECT * FROM EmployeeSalary
    INTERSECT
    SELECT * FROM ManagerSalary;
    
LIMIT
  • Limits select query results to a specific number of records:
    SELECT *
    FROM (
    	SELECT *
    	FROM employees
    	ORDER BY salary DESC ) AS top_salaries
    LIMIT 10;
    
    - We fetch the 10 highest salaries in the employees table.
    - We use a subquery, meaning we embed one SELECT inside another.
TOP
  • We can select x top records:
    - let's assume we have a customers table with a customer_id field,
    - let's fetch the first 100 odd customer_id values,
    - here we use the WHERE clause to filter the fetched rows; this clause allows AND as well as OR operators,
    SELECT TOP 100 customer_id
    FROM customers
    WHERE customer_id % 2 = 1
    ORDER BY customer_id
    
    - The WHERE clause can also use the BETWEEN operator:
    SELECT *
    FROM employees
    WHERE date_of_joining BETWEEN '2020/01/01' AND '2020/12/31';
    
    - BETWEEN in WHERE clause is inclusive.
    - When it comes to dates, we can extract what we want:
    SELECT *
    FROM employees
    WHERE YEAR(date_of_joining) = 2020
    
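  • The inclusivity of BETWEEN can be verified with a small sketch (SQLite via Python's sqlite3; the table and dates are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees(name TEXT, date_of_joining TEXT);
    INSERT INTO employees VALUES
        ('a', '2019-12-31'), ('b', '2020-01-01'),
        ('c', '2020-06-15'), ('d', '2020-12-31'), ('e', '2021-01-01');
""")
# Both boundary dates are included; the days just outside are not.
rows = conn.execute("""
    SELECT name FROM employees
    WHERE date_of_joining BETWEEN '2020-01-01' AND '2020-12-31'
    ORDER BY name
""").fetchall()
print(rows)   # [('b',), ('c',), ('d',)]
```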
GROUP BY
  • Groups rows into categories within a column we indicate:
    SELECT COUNT(CustomerID), Country
    FROM Customers
    GROUP BY Country
    HAVING COUNT(CustomerID) > 3;
    
    and then we can run any aggregate function on each category.
    - We count all customers per country, but display only the countries where the count is bigger than 3.
    - We use GROUP BY whenever there is an aggregate function like COUNT, SUM or AVG in the select line.
    - When using GROUP BY, the select line can contain only aggregate functions and the columns listed in GROUP BY.
    - We can create sub-groups, for example: GROUP BY Country, Region
    - HAVING simply filters on the COUNT(CustomerID) outcome column.
    COUNT(CustomerID)	Country
    9			Brazil
    11			France
    11			Germany
    5			Mexico
    5			Spain
    7			UK
    13			USA
    4			Venezuela
    
  • WHERE vs HAVING:
    - When GROUP BY is not used, the WHERE and HAVING clauses work the same.
    - When GROUP BY is used, the WHERE clause filters rows before any grouping happens, while the HAVING clause filters the values of a group after aggregation.
  • We can even have some calculations within HAVING:
    WITH cte_pays AS(
    	SELECT
    	    d.date,
    	    SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END) AS paying_sum,
    	    SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) AS non_paying_sum
    	FROM user_dimension u
    	LEFT JOIN acc_dimension a ON u.acc_id = a.acc_id
    	LEFT JOIN download_facts d ON u.user_id = d.user_id
    	GROUP BY d.date)
    SELECT *
    FROM cte_pays p
    GROUP BY
    	p.date,
    	p.paying_sum,
    	p.non_paying_sum
    HAVING (p.non_paying_sum - p.paying_sum) > 0
    ORDER BY p.date ASC
    
    - with CASE WHEN and SUM() we derive new columns from the existing downloads column,
    - the first one sums all values where paying_customer = 'yes',
    - the other one sums all values where paying_customer = 'no'.
Referencing the same table more than once:
  • Here is the employees table:

    Emp_Id Emp_name Salary Manager_Id
    10 Anil 50000 18
    11 Vikas 75000 16
    12 Nisha 40000 18
    13 Nidhi 60000 17
    14 Priya 80000 18
    15 Mohit 45000 18
    16 Rajesh 90000 Null
    17 Raman 55000 16
    18 Santosh 65000 17
  • Let's fetch all rows with a WHERE clause on a.manager_id = b.emp_id:
    SELECT *
    FROM Employee a, Employee b
    WHERE a.manager_id = b.emp_id
    
    10 Anil 50000 18 18 Santosh 65000 17
    11 Vikas 75000 16 16 Rajesh 90000 Null
    12 Nisha 40000 18 18 Santosh 65000 17
    13 Nidhi 60000 17 17 Raman 55000 16
    14 Priya 80000 18 18 Santosh 65000 17
    15 Mohit 45000 18 18 Santosh 65000 17
    17 Raman 55000 16 16 Rajesh 90000 Null
    18 Santosh 65000 17 17 Raman 55000 16
    - Note there is no Rajesh in the outcome as he has no manager above him. Rajesh appears only as the manager for other employees.
  • Let's group by manager, computing the average salary of the employees under each manager:
    SELECT b.emp_id AS "Manager_Id", b.emp_name AS "Manager", AVG(a.salary) AS "Avg_Salary_Under_Manager"
    FROM employees a, employees b
    WHERE a.manager_id = b.emp_id
    GROUP BY b.emp_id, b.emp_name
    ORDER BY b.emp_id;
    
    Manager_Id Manager Avg_Salary_Under_Manager
    16 Rajesh 65000
    17 Raman 62500
    18 Santosh 53750
CASE
  • We can use CASE to apply some conditions:
    - assuming we have a students table with a lastname field, and we want to update some records:
    UPDATE dbo.students
    SET lastname =  CASE
    	 WHEN lastname = 'aaa' THEN 'bbb'
    	 WHEN lastname = 'ccc' THEN 'ddd'
    	 WHEN lastname = 'eee' THEN 'fff'
    	 ELSE lastname
    	END
    WHERE lastname IN ('aaa', 'ccc', 'eee')
    
    - it's very important to include the WHERE clause in such queries,
    - if we omit the WHERE clause, all records get updated.
  • Embedding CASE WHEN in SUM():
    - using orders table we have fields like ship_country and status,
    - we want to show the number of shipped orders by country,
    - the order is shipped if the status is 'shipped' or 'delivered':
    SELECT ship_country,
        SUM(CASE
          WHEN status = 'shipped' OR status = 'delivered' THEN 1
          ELSE 0
          END) AS order_shipped
    FROM orders
    GROUP BY ship_country;
    
    - the code assigns 1 to all records where the status is either 'shipped' or 'delivered',
    - the rest of the records, with any other status, get 0 assigned.
    - another example, computing a percentage:
    SELECT SUM(CASE
          WHEN CAST(order_cost AS INT) > 500
            THEN 1
          ELSE 0
          END) * 100.0 / COUNT(*) AS percentage_orders_over_500
    FROM orders
    
  • Embedding CASE WHEN in COUNT():
    SELECT
        COUNT(CASE WHEN status = 'open' THEN user_id ELSE NULL END) / CAST(COUNT(*) AS FLOAT) AS ratio
    FROM users
    WHERE country = 'USA'
    
    - if the status equals 'open', the user_id value goes into the COUNT() input.
    - if the status is anything other than 'open', NULL goes into the COUNT() input; COUNT() ignores NULLs, which gives the ratio's numerator.
    - COUNT(*), however, counts both values and NULLs, i.e. all records, which gives the ratio's denominator.
    - an integer divided by an integer truncates (here to 0), so we need to cast at least the numerator or the denominator to float in order to get a ratio between 0 and 1.
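  • The integer-division pitfall is easy to reproduce; here is a sketch with SQLite via Python's sqlite3 (the users data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users(user_id INTEGER, status TEXT);
    INSERT INTO users VALUES (1, 'open'), (2, 'open'), (3, 'closed'), (4, 'closed');
""")
# Without the CAST the ratio truncates to 0; with it we get 0.5.
naive, casted = conn.execute("""
    SELECT
        COUNT(CASE WHEN status = 'open' THEN user_id END) / COUNT(*),
        COUNT(CASE WHEN status = 'open' THEN user_id END) / CAST(COUNT(*) AS FLOAT)
    FROM users
""").fetchone()
print(naive, casted)   # 0 0.5
```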
IN
  • The IN operator lets us specify multiple values in a WHERE clause.
  • If a row's Col1 value is one of the listed values, the row is taken into the results.

  • Id Col1 Col2 Col3
    1 Apple Null Orange
    2 Orange Apple Strawberry
    3 Strawberry Apple Orange
    SELECT *
    FROM table
    WHERE Col1 IN ('Apple', 'Strawberry')
    
    - as the result, we receive 2 records: first and third one.
  • We can also exclude records by using NOT IN:
    SELECT *
    FROM table
    WHERE Col1 NOT IN ('Apple', 'Strawberry')
    
    - then we get the second record only.
OVER
  • A window function is an SQL function where the input values are taken from a "window" of one or more rows into the results set of a SELECT statement. Window functions are distinguished from other SQL functions by the presence of an OVER clause.
  • OVER clause has been added to SQL Server and can be used to determine which rows from the query are applied to the function.
  • The PARTITION BY clause divides the result set of the query into data subsets, or partitions. If PARTITION BY is not used, the entire result set is a single partition. The window function is applied to each partition separately, and the computation it performs restarts for each partition. Let's look at an example without and then with PARTITION BY applied:
    SELECT
        emp_id,
        emp_name,
        dept_name,
        emp_salary,
        MAX(emp_salary) OVER () as max_salary
    FROM employee
    

    emp_id emp_name emp_salary dept_name max_salary
    1 Artur 7,500.00 IT 15,500.00
    2 Robert 10,000.00 Admin 15,500.00
    3 Johnny 9,600.00 Admin 15,500.00
    4 Rick 5,500.00 HR 15,500.00
    5 Krzysztof 7,100.00 Admin 15,500.00
    6 Caroline 15,500.00 IT 15,500.00
    ... ... ... ... ...
    source: sqlservercentral.com
    - This query returns the maximum salary of all records for each record.
    - Since a PARTITION BY clause was not used, the entire result set was treated as a single partition.
    - Let's apply PARTITION BY clause:
    SELECT
        emp_id,
        emp_name,
        dept_name,
        emp_salary,
        MAX(emp_salary) OVER (PARTITION BY dept_name) as max_salary
    FROM employee
    ORDER BY dept_name, emp_salary
    
    emp_id emp_name emp_salary dept_name max_salary
    5 Krzysztof 7,100.00 Admin 10,000.00
    3 Johnny 9,600.00 Admin 10,000.00
    2 Robert 10,000.00 Admin 10,000.00
    4 Rick 5,500.00 HR 5,500.00
    1 Artur 7,500.00 IT 15,500.00
    6 Caroline 15,500.00 IT 15,500.00
    ... ... ... ... ...
    source: sqlservercentral.com
    - The query above has a PARTITION BY clause on the dept_name column, so the MAX function considers only the records of each particular department.
    - Additionally, the outer ORDER BY sorts the rows by department and then by salary ascending; note that putting ORDER BY inside OVER() would instead turn MAX into a running maximum within each partition, which is a different result.
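  • Here is a runnable version of the PARTITION BY example, sketched with SQLite via Python's sqlite3 (window functions need SQLite 3.25+; the data mirrors a few rows of the table above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee(emp_name TEXT, dept_name TEXT, emp_salary REAL);
    INSERT INTO employee VALUES
        ('Artur', 'IT', 7500), ('Caroline', 'IT', 15500),
        ('Robert', 'Admin', 10000), ('Johnny', 'Admin', 9600),
        ('Rick', 'HR', 5500);
""")
# Every row keeps its own values plus the maximum salary computed
# separately within that row's department.
rows = conn.execute("""
    SELECT emp_name, dept_name, emp_salary,
           MAX(emp_salary) OVER (PARTITION BY dept_name) AS max_salary
    FROM employee
    ORDER BY dept_name, emp_salary
""").fetchall()
for r in rows:
    print(r)
```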
OFFSET
  • The OFFSET clause specifies the number of rows to skip before starting to return rows from the query. The FETCH clause specifies the number of rows to return after the OFFSET clause has been processed.
  • We must use the OFFSET and FETCH clauses with the ORDER BY clause.
  • Skipping the first 10 products and selecting the next 10 products:
    SELECT product, price
    FROM products
    ORDER BY price, product
    OFFSET 10 ROWS
    FETCH NEXT 10 ROWS ONLY;
    
EXISTS
  • The EXISTS operator is used to test for the existence of any record in a subquery.
  • The EXISTS operator returns true if the subquery returns one or more records:
    SELECT name
    FROM products p1
    WHERE EXISTS (
      SELECT 1
      FROM products p2
      WHERE p1.name = p2.name AND p2.price > 15
    )
    
    - The above query lists the names of products for which at least one product with the same name is priced above 15.
    - In the subquery we need a reference to the table from the outer query; this is called a correlated subquery.
IS NULL
  • Sometimes we want to fetch the NULLs; then we use IS NULL in the WHERE clause:
    SELECT c.id AS c_id, pc.id AS pc_id
    FROM clients c
    FULL OUTER JOIN premium_clients pc
    ON c.id = pc.id
    WHERE pc.id IS NULL
    
    - checking which values of the id column are present in the clients table but not in the premium_clients table.
    - those missing records are displayed as NULLs in the pc_id column.
  • IS NULL in CASE WHEN:
    WITH cte_nulls_to_ints AS(
      SELECT
        *,
        CASE WHEN end_dt IS NULL THEN 1 ELSE 0 END AS int_end_dt
      FROM projects)
    SELECT
      e.id, MAX(e.salary) AS salary
    FROM Employees e
    JOIN cte_nulls_to_ints c
    ON e.id = c.emp_id
    GROUP BY e.id
    HAVING COUNT(c.project) = SUM(c.int_end_dt)
    
    - we emit 1 when there is NULL in the end_dt column.
    - the HAVING clause is the key here: we keep an employee's id only when their total project count equals the sum of those 1s, which means they have not finished any project so far.
    - We can do the exact same thing with NOT IN and IS NOT NULL:
    SELECT
      SUM(salary)
    FROM Employees e
    WHERE e.id NOT IN (
      SELECT emp_id
      FROM projects
      WHERE end_dt IS NOT NULL
    )
    
IS NOT NULL
  • In order to ensure not getting Nulls in a queried column we use IS NOT NULL in WHERE clause:
    SELECT emp_id, salary
    FROM employees
    WHERE salary IS NOT NULL
    
  • We can also filter Nulls from multiple columns:
    SELECT emp_id, salary, emp_position
    FROM employees
    WHERE salary IS NOT NULL AND emp_position IS NOT NULL
    
IFNULL
  • It's a control flow function that accepts two arguments:
    SELECT IFNULL(expression_1,expression_2);
    
    - function returns expression_1 if expression_1 is not NULL, otherwise, it returns expression_2.
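  • A minimal check of IFNULL, which SQLite implements natively (sketched via Python's sqlite3; other engines may offer COALESCE instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# IFNULL returns its first argument unless that argument is NULL,
# in which case it falls back to the second one.
a, b = conn.execute(
    "SELECT IFNULL(NULL, 'fallback'), IFNULL('value', 'fallback')"
).fetchone()
print(a, b)   # fallback value
```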
CAST
  • We can use CAST in order to convert data types into other ones.
    SELECT CAST(11.25 AS int)
    ------
    11
    
    - in the example above we convert float type into int type.
Unique records without DISTINCT
  • We can use GROUP BY in order to eliminate duplicates from a query result:
    SELECT surname
    FROM employees
    GROUP BY surname
    
    - it works because GROUP BY groups all the duplicated values of a specific column into one value (group).
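  • A quick sketch confirming that GROUP BY deduplicates like DISTINCT (SQLite via Python's sqlite3; the surnames are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees(surname TEXT);
    INSERT INTO employees VALUES ('Smith'), ('Jones'), ('Smith'), ('Brown'), ('Jones');
""")
# Each distinct surname forms one group, so duplicates collapse.
grouped = conn.execute(
    "SELECT surname FROM employees GROUP BY surname ORDER BY surname"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT surname FROM employees ORDER BY surname"
).fetchall()
print(grouped)              # [('Brown',), ('Jones',), ('Smith',)]
print(grouped == distinct)  # True
```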
  • Another option is self joining the table and using NOT EXISTS in the WHERE clause:
    SELECT emp_name, emp_address, emp_sex, emp_salary
    FROM employees e1
    WHERE NOT EXISTS (SELECT 1
             FROM employees e2
             WHERE e2.emp_name = e1.emp_name AND
                   e2.emp_address = e1.emp_address AND
                   e2.emp_sex = e1.emp_sex AND
                    e2.create_date > e1.create_date)
    
    - SELECT 1 in the subquery returns the constant 1 for every row that meets the condition in the subquery's WHERE clause,
    - selecting a constant is a cheap way to check whether any record matches the WHERE condition at all.
Common Table Expression (CTE):
  • It's a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, possibly multiple times.
  • Improves the readability of sql statements.
  • When CTE is helpful:
    - needing to reference a derived table multiple times in a single query,
    - an alternative to creating a view in the database,
    - performing the same calculation multiple times across multiple query components.
  • Starts with WITH keyword:
    -- define CTE:
    WITH Cost_by_Month AS(
    SELECT campaign_id AS campaign,
        TO_CHAR(created_date, 'YYYY-MM') AS month,
        SUM(cost) AS monthly_cost
    FROM marketing
    WHERE created_date BETWEEN NOW() - INTERVAL '3 MONTH' AND NOW()
    GROUP BY 1, 2
    ORDER BY 1, 2)
    SELECT
        campaign, avg(monthly_cost) as "Avg Monthly Cost"
    FROM Cost_by_Month
    GROUP BY campaign
    ORDER BY campaign
    
    source: chartio.com
  • Using CTEs to handle multiple steps of logic:
    WITH cte_combined_energy AS (
        SELECT *
        FROM eu_energy
        UNION All
        SELECT *
        FROM usa_energy
        UNION All
        SELECT *
        FROM na_energy),
    cte_energy_by_date AS (
        SELECT
            date,
            SUM(consumption) AS total_energy
        FROM cte_combined_energy
        GROUP BY date
        ORDER BY date ASC),
    cte_max_energy AS (
        SELECT
            MAX(total_energy) AS max_energy
        FROM cte_energy_by_date)
    SELECT
          cebd.date,
          cebd.total_energy
    FROM cte_max_energy cme
    JOIN cte_energy_by_date cebd
          ON cebd.total_energy = cme.max_energy
    
    - cte_combined_energy: unions all 3 tables (of various lengths),
    - cte_energy_by_date: aggregates the consumption column and groups the outcome by date,
    - cte_max_energy: takes the max value from the aggregation made in the previous CTE,
    - the last SELECT joins cte_energy_by_date and cte_max_energy, matching the daily total against the max value.

    date total_energy
    2021-01-08 6500
    2021-01-07 6500
    - This approach is much better than ordering descending and limiting the outcome to 1, as more than one record can hold the max_energy value.
SQL Delta:
  • It performs three main functions:
    - comparing and optionally synchronizing the structure (schema) of two SQL databases,
    - comparing the data in tables (and optionally views) of both the databases,
    - displaying the schema of a database.
  • SQL Delta can generate update scripts to synchronize one database with another, for both structure and data.
  • SQL Delta provides various structure and data comparison reports, which are saved as HTML. It presents an overview of the objects, highlighting those that are different, added or deleted.
Running Total
  • Here I want to apply running total without any window functions:
    - we apply the self inner join:
    SELECT t1.ID, t2.ID
    FROM demo t1
    INNER JOIN demo t2
    ON t1.ID >= t2.ID
    
    - there are two join conditions in order to get the proper set of data:
    1) T1.InvoiceID >= T2.InvoiceID pairs each invoice id with itself and all prior ones. If there were "=" in the join condition, we would get one-to-one assignments like: 1 - 1, 2 - 2, 3 - 3, etc.
    2) T1.TransactionDate = T2.TransactionDate restricts the pairing to records with the same transaction date.
    SELECT   T1.InvoiceID
             ,T2.InvoiceID
             ,T1.TransactionDate
    FROM     Sales.CustomerTransactions T1
             INNER JOIN Sales.CustomerTransactions T2
             ON T1.InvoiceID >= T2.InvoiceID
             AND  T1.TransactionDate = T2.TransactionDate
    ORDER BY T1.InvoiceID
    
    source: essentialsql.com
    InvoiceID InvoiceID TransactionDate
    1 1 2021-01-01
    2 1 2021-01-01
    2 2 2021-01-01
    3 1 2021-01-01
    3 2 2021-01-01
    3 3 2021-01-01
  • As we want to calculate the running total we need to get an amount for each invoice and all invoices before:
    SELECT   T1.InvoiceID
             ,T1.TransactionDate
             ,T1.TransactionAmount
             ,Sum(T2.TransactionAmount) RunningTotal
    FROM     Sales.CustomerTransactions T1
             INNER JOIN Sales.CustomerTransactions T2
             ON T1.InvoiceID >= T2.InvoiceID
             AND  T1.TransactionDate = T2.TransactionDate
    GROUP BY T1.InvoiceID
             ,T1.TransactionDate
             ,T1.TransactionAmount
    ORDER BY T1.InvoiceID
             ,T1.TransactionAmount
    
    source: essentialsql.com
  • We calculate the running total by summing T2.TransactionAmount (result table omitted; source: essentialsql.com).
  • Here is how we can achieve it with the OVER clause:
    SELECT   InvoiceID
             ,TransactionDate
             ,TransactionAmount
             ,SUM(TransactionAmount) OVER(PARTITION BY TransactionDate) RunningTotal
    FROM     Sales.CustomerTransactions T1
    WHERE    TransactionTypeID = 1
    ORDER BY InvoiceID
             ,TransactionAmount
    
    and here is the result (result table omitted; source: essentialsql.com) - without ORDER BY inside OVER(), SUM returns the per-date total on every row rather than a running total:
    SELECT   InvoiceID
             ,TransactionDate
             ,TransactionAmount
             ,SUM(TransactionAmount) OVER(PARTITION BY TransactionDate ORDER BY InvoiceID) RunningTotal
    FROM     Sales.CustomerTransactions T1
    WHERE    TransactionTypeID = 1
    ORDER BY InvoiceID
             ,TransactionAmount
    
    when ORDER BY is added inside OVER(), we get the expected running total (result table omitted; source: essentialsql.com)
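  • Both running-total approaches can be compared side by side; here is a sketch with SQLite via Python's sqlite3 (invoice data invented; a single transaction date is assumed, so the date condition is dropped):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions(invoice_id INTEGER, amount REAL);
    INSERT INTO transactions VALUES (1, 100), (2, 50), (3, 25);
""")
# Self-join version: each invoice is summed with itself and all prior invoices.
self_join = conn.execute("""
    SELECT t1.invoice_id, SUM(t2.amount) AS running_total
    FROM transactions t1
    JOIN transactions t2 ON t1.invoice_id >= t2.invoice_id
    GROUP BY t1.invoice_id
    ORDER BY t1.invoice_id
""").fetchall()
# Window version: ORDER BY inside OVER() makes SUM cumulative.
windowed = conn.execute("""
    SELECT invoice_id, SUM(amount) OVER (ORDER BY invoice_id) AS running_total
    FROM transactions
    ORDER BY invoice_id
""").fetchall()
print(self_join)               # [(1, 100.0), (2, 150.0), (3, 175.0)]
print(self_join == windowed)   # True
```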
Window alias
  • We use it when we want to keep our code DRY.
  • SELECT   TO_CHAR(CAST(created_at AS date), 'YYYY-MM') AS year_month,
             ROUND((SUM(value) - LAG(SUM(value),1) OVER(w)) / LAG(SUM(value),1) OVER(w) * 100,2) as r_diff
    FROM     transactions
    GROUP BY year_month
    WINDOW w AS (ORDER BY TO_CHAR(CAST(created_at AS date), 'YYYY-MM'))
    ORDER BY year_month ASC
    
    - The TO_CHAR() function converts the date into the 'YYYY-MM' format.
    - The CAST() function converts the value into the date type.
    - The LAG() function takes the value from a previous record.
    - In the WINDOW line we assign a repetitive code snippet to the alias w.
    - With the calculation:
    (current record value - previous record value) * 100 / previous record value
    we get the percentage difference between the current and the previous month.
CONCAT
  • Function used for string concatenation:
    SELECT *,
        CONCAT(first_name, '_', second_name) AS full_name
    FROM clients
    WHERE account = 'premium'
    
Working with Dates:
  • CAST:
    - converting an object/string into the date format:
    CAST('2020-02-10' AS DATE)
  • EXTRACT:
    SELECT EXTRACT(YEAR FROM CAST(i_date AS DATE)) AS year
    - we can extract a year from dates.
  • INTERVAL:
    - moving back 30 days from a given date:
    SELECT CAST('2020-02-10' AS DATE) - 30 * INTERVAL '1 day'
    - we can put it in the WHERE clause with the BETWEEN operator:
    WHERE created_at BETWEEN (CAST('2020-02-10' AS DATE) - 30 * INTERVAL '1 day')
    AND CAST('2020-02-10' AS DATE)
    - dates earlier than 10 months ago from now:
    WHERE transaction_start_date < NOW() - 10 * INTERVAL '1 month'
Cursors:
  • A cursor processes query results row by row instead of as a single set; here is a T-SQL example that applies each payment to an account balance:
  • DECLARE @AccountNumber int
    DECLARE @Payment decimal(18,2)
    
    DECLARE pmt_cursor CURSOR FOR
    SELECT Account_Number, Payment
    FROM [dbo].[Account_Payments]
    
    OPEN pmt_cursor
    FETCH NEXT FROM pmt_cursor INTO @AccountNumber, @Payment
    
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE ##AccountBalance
        SET Balance = Balance - @Payment
        WHERE Account_Number = @AccountNumber
    
        FETCH NEXT FROM pmt_cursor INTO @AccountNumber, @Payment
    END
    
    CLOSE pmt_cursor
    DEALLOCATE pmt_cursor
    
    - source: James Oliver
UNPIVOT
  • Having a pivot like below:
    week_ending_date sunday monday tuesday wednesday thursday friday saturday
    2020-09-05 ... ... ... ... ... ... ...
    2020-09-12 ... ... ... ... ... ... ...
    2020-09-19 ... ... ... ... ... ... ...
    2020-09-26 ... ... ... ... ... ... ...
    ... ... ... ... ... ... ... ...
    we want to unpivot it like following:
    week_ending_date day_of_week sales_amount
    2020-09-05 sunday ...
    2020-09-05 monday ...
    2020-09-05 tuesday ...
    2020-09-05 wednesday ...
    2020-09-05 thursday ...
    2020-09-05 friday ...
    2020-09-05 saturday ...
    2020-09-12 sunday ...
    2020-09-12 monday ...
    2020-09-12 tuesday ...
    2020-09-12 wednesday ...
    ... ... ...
    using UNPIVOT operator:
    SELECT week_ending_date, day_of_week, sales_amount
    FROM (
      SELECT week_ending_date, sunday, monday, tuesday, wednesday, thursday, friday, saturday
      FROM weekly_sales
    ) AS wkly_sls
    UNPIVOT (
      sales_amount FOR day_of_week IN
        (sunday, monday, tuesday, wednesday, thursday, friday, saturday)
    ) AS unpivot_sales;
    
    source: James Oliver
TRUNCATE TABLE
  • The command TRUNCATE TABLE table_name clears a table completely; it removes all the rows but does not drop the table itself.
  • With TRUNCATE there is no way to indicate which records to remove; it always empties the whole table.
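  • SQLite has no TRUNCATE TABLE; its closest analogue, DELETE without a WHERE clause, produces the same end state and can be sketched via Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logs(msg TEXT);
    INSERT INTO logs VALUES ('a'), ('b'), ('c');
""")
# Remove every row; the table itself remains and can still be queried.
conn.execute("DELETE FROM logs")
count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(count)   # 0
```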
UPDATE
  • We can put expressions in the SET clause so that we can update a field based on its previous value:
    UPDATE products
    SET price = price * 0.9
    WHERE date_end IS NOT NULL
    
  • We can also take advantage of the fact that, in standard SQL, the UPDATE statement is executed all at once against the old row values, so we can swap two fields without a temporary variable (MySQL is an exception: it applies SET assignments left to right, so this swap does not work there):
    UPDATE products
    SET price = second_price, second_price = price
    
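  • A sketch of an expression inside SET, using SQLite via Python's sqlite3 (the products data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products(name TEXT, price REAL, date_end TEXT);
    INSERT INTO products VALUES ('a', 100, '2021-01-01'), ('b', 200, NULL);
""")
# Each matching row's new price is computed from its own old price;
# rows where date_end IS NULL are left untouched by the WHERE filter.
conn.execute("UPDATE products SET price = price * 0.9 WHERE date_end IS NOT NULL")
rows = conn.execute("SELECT name, price FROM products ORDER BY name").fetchall()
print(rows)   # [('a', 90.0), ('b', 200.0)]
```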
ANY
  • ANY can replace IN in the WHERE clause.
  • The ANY operator is true when any of the values returned by the inner subquery fulfils the comparison condition.
    SELECT ProductName
    FROM Products
    WHERE ProductID = ANY
      (SELECT ProductID
      FROM OrderDetails
      WHERE Quantity = 10);
    
SQL performance
  • SELECT specific fields instead of using SELECT *:
    - return only the columns you need, instead of slowing the query down with unnecessary columns in the result.
  • SELECT more fields to make records unique instead of using SELECT DISTINCT.
  • Create joins with INNER JOIN, and not with WHERE clause.
  • Use WHERE instead of HAVING to define filters where possible. HAVING should only be used when filtering on an aggregated field.
  • Use wildcards at the end of a phrase only. So, going with 'Art%' is more efficient than '%Art%'.
  • Use LIMIT to sample results.
  • Run your query during off-peak hours.
  • The cost of joining tables is O(n^2), where n is the number of records. Indexing the join columns speeds this up towards O(n). A primary key column is already indexed, but for better performance we also need to put indices on the remaining foreign key columns.
  • Using indexes, especially in a row-based database like SQL Server:
    - clustered index - on the primary key / record id - the table is physically ordered by the primary key, like a book ordered by page number - when a query needs to find a specific id, the engine knows exactly where to look right away,
    - non-clustered index - like an appendix in a book - can be set on multiple columns - it is more of a reference: it does not make the table itself more structured the way a clustered index does - it creates appendix-like reference points, a separate lookup guide for the execution plan to go through.
  • Pre-filter the data set on columns whose values are relevant to the analysis (a specific date, specific regions, maybe a specific currency):
    - it narrows down the size of the data set,
    - we work only with what we really need, excluding everything else and speeding up the query.
  • Use temporary tables and CTEs to organize the query and isolate the data set filtered by the criteria the analysis needs:
    - each temp table / CTE stays focused on a single piece of business logic,
    - temporary tables are also good for debugging.
  • Surrogate primary key in dimension tables - a synthetic, lightweight key, typically a consecutive integer.
  • Avoiding JOIN and GROUP BY by using window functions instead.

Features

App includes following features:

  • SQL

Demo

Application:
  • The user clicks the Get Questions button, which pulls questions down to the application.
  • Each question and answer is hidden; clicking the proper links unhides a question.
  • The user can unhide a question first, formulate their own answer, then unhide the app's answer in order to compare.
Features:
  • NodeJS - server-side API service that handles the application's HTTP requests.
  • MongoDB Atlas - the server connects to cloud data storage where I keep the question-answer documents pulled into the application.
  • ReactJS - front-end.
  • ExpressJS - web framework that supports building API interfaces.

Setup

No specific installation required.
  • npm
  • ReactJS
  • ExpressJS

Source Code

You can view the source code: HERE