SQL cheat sheet for data analysis (2023)

group from

SQL cheat sheet for data analysis (1)

cumulative function

Calculate the quantity of the product:

choose numbering(*)fromproduct;

Count the number of products with a non-zero value:

choose numbering(price)fromproduct;

Count the number of unique onescategoryValues:

choose numbering(clearlycategory)fromproduct;

Get minimum and maximum product prices:

choose Min. (price),upper limit (price)fromproduct;

Find the total product price for each category:

choosecategory,and(price)fromproductgroup fromcategory;

Find the average price of each product category whose average price is above 3.0:

choosecategory,average value (price)fromproductgroup fromcategoryI have average value (price)> 3,0

Order by

Get product names sorted by orderpriceColumns sorted in ascending order by default:

chooseNamefromproductOrder byprice[ASC];

Get product names sorted by orderpriceColumns in descending order:

chooseNamefromproductOrder bypriceDESC;

calculate

use+,-,*,/Do basic math. To get the seconds of the week:

choose60*60*24*7;-- Result: 604800

round numbers

Round the numbers to the nearest whole number:

chooseround (1234.56789);-- Result: 1235

Round numbers to two decimal places:

chooseround ( avg ( value ), 2 )fromproductWhereclass_id = 21;-- Result: 124.56

troubleshooting

integer division

In PostgreSQL and SQL Server,/The operator performs integer division on integer arguments. If you don't see the expected number of decimal places, it's because you're dividing between two integers. Convert one to decimal:

123/2-- Result: 61throw (123 AS Decimal)/ 2-- Result: 61.5

divide by 0

To avoid this error, make sure the denominator is not0. you can use itNULLIF()function to replace0withinvalid, which leads toinvalidFor the whole expression:

counting /NULLIF(all counts, 0)

take part

JOIN is used to get data from multiple tables. To get the names of the products purchased in each order, use:

chooseOrder.OrderDate, Product NameasQuantityfromSeriestake partproductexistsproduct.id = order.product_id;

Learn more about JOIN in our interactionSQL connectioncourse.

enter

To insert data into the table, useenterSeries:

entercategoryvalues(1, "Home and Kitchen"), (2, "Clothes and Clothing");

You can specify the columns to add data to. The remaining columns are filled with predefined default values ​​orinvalidsmall.

entercategory name)values(“electronic product”);

renew

To update data in a table, userenewSeries:

renewcategoryI putis_active = true, name = 'office'Wherename = 'office';

delete

To delete data from a table, usedeleteSeries:

deleted bycategoryWhereNameVacuum;

Check out our interactive lessonsHow to insert, update and delete data in SQL.

date and time

There are 3 main types related to time:date,year, andtime stamp. Times are represented using a 24-hour clock and can be as vague as hours and minutes (for example,15:30– 3:30 PM) or to the nearest microsecond and time zone (see below):

SQL cheat sheet for data analysis (2)

14:39:53.662522-05It's almost 2:40pm. CDT (eg in Chicago, 7:40 PM UTC). The letters in the example above represent:

In the date part:

YYYY– 4 digit year. mm– Months with zero investment (01——January to12-December). DD– Day with zero investment.

part of time:

fearful– Zero hours on a 24-hour clock. mm- Minutes. SS- The number of seconds.can be omitted. ssssssss– Small fraction of a second – can be represented using 1 to 6 digits.can be omitted. ±TZ- Time zone. It must start with any of the following+the-and use two digits relative to UTC.can be omitted.

current date and time

Find out what time it is:

choose current time;

Get today's date:

choose Today's Date;

In SQL Server:

choose getDate();

Get the timestamp of the current date and time:

choose CURRENT_TIMESTAMP;

Creation date and time value

To create a date, time, or timestamp, write the value to a string and convert it to the correct type.

choose throw ('31-12-2021'asdate);choose throw ('15:31'asyear);choose throw ('2021-12-31 23:59:29+02'astime stamp);choose throw ('15:31.124769'asyear);

Note the last example - it interprets as 15 minutes 31 seconds 124769 microseconds! It's always a good idea to explicitly write 00 for the time:'00:15:31.124769'.

chronological order

useOrder bySort rows chronologically from oldest to newest in datetime columns:

chooseOrder Date, Product, Quantityfromsales volumeOrder bydate of order;
date of orderproductquantity
22-07-2023laptop2
23-07-2023biceps3
24-07-2023sports shoes10
24-07-2023Jeans3
25-07-2023mixer2

Sort from newest to oldest using descending order:

chooseOrder Date, Product, Quantityfromsales volumeOrder bydate of orderDESC;

Compare date and time values

You can use comparison operators<,<=,>,>=, and=Compare date and time values. Earlier dates are less than later dates. For example,2023-07-05less than2023-08-05.

Find sales for July 2023:

chooseOrder date, product name, quantityfromsales volumeWhereOrder date >= '07-01-2023'andorder_date < '08-01-2023';

Find customers signed up for July 2023:

chooseRegistration timestamp, emailfromcustomerWhereRecord Timestamp >= '07-01-2023'andRecord Timestamp < '2023-08-01';

notes:Note the due date in the query. upper limit'01-08-2023'Not included. time stamp'01-08-2023'it is actually a timestamp'01-08-2023 00:00:00.0'. comparison operator<Used to ensure that the selection is made for all timestamps less than'01-08-2023 00:00:00.0', meaning all timestamps in July 2023, even those near midnight on August 1, 2023.

space

Intervals measure the difference between two points in time. For example, the space between2023-07-04and06-07-2023It's 2 days.

To set a range in SQL, use the following syntax:

space"1"heaven

The pension consists of three elements:spacekeywords, quoted values, and time segment keywords. You can use the following time parts:Year,moon,heaven,Time,minute, thin, andsecond.

Add spaces to date and time values

you can use it+the-The interval to add or subtract from a date or timestamp value.

minus one year2023-07-05:

choose throw ('05-07-2023'astime stamp)-space"1"Year;-- Result: 2022-07-05 00:00:00

Find customers who placed their first order within a month of signing up:

chooseID cardfromcustomerWhereDate of first order > Date of registration +space"1"moon;

Filter events within the last 7 days

To find scheduled deliveries in the last 7 days, use:

choosedelivery date, addressfromsales volumeWheredelivery date <=Today's Date anddelivery date >=Today's Date-space"7"heaven;

notes:In SQL Server, intervals are not implemented - usedate added()anddate difference()Mode.

Filter events within the last 7 days in SQL Server

To find discounts for the last 7 days, use:

choosedelivery date, addressfromsales volumeWheredelivery date <=getDate() anddelivery date >=date added (day,-7,getDate());

extract partial date

The standard SQL syntax for getting a part of a date is

choose refining (Yearfromdate of order)fromSales volume;

You can export the following fields:
Year,moon,heaven,Time,minute, thin, andsecond.

Standard syntax does not work in SQL Server. useDATEPART (part, date)operation instead.

choose date part (year, date of order)fromSales volume;

group by year and month

Find the number of sales per month:

choose refining (Yearfromdate of order) asYear,refining (moonfromdate of order) asmoon,numbering(*) asnumberingfromsales volumegroup fromyearsOrder byyears;
Yearmoonnumbering
2022851
2022958
20221062
20221176
20221285
2023171
2023269

Note that you must also group by year and month.Delivery (months from order date)Output only the month digits (1, 2, ..., 12). To distinguish months from different years, you must also group by year.

For more information about using date and time values ​​in our interactiveStandard SQL functionscourse.

case when

case whenallows you to pass conditions likeWhereclause), evaluate them in order and return the value that satisfies the first condition.

chooseName,case neverPrice > 150afterward'High quality'nevervalue > 100afterward"middle class"but'role model'end of ASprice_categoryfromproduct;

Here, all products with prices above 150 getHigh qualitytag, users with a value above 100 (and below 150) will receive.middle classlabel, the rest receiverole modelInscription.

Cases and group v

you can combinecase whenandgroup fromCalculates object statistics in categories you specify.

choose case neverPrice > 150afterward'High quality'nevervalue > 100afterward"middle class"but'role model'end of ASprice_category,numbering(*) asproductfromproductgroup fromprice_category;

Calculate the bulk order quantity per customer usingcase whenandand():

chooseCustomer's code,and( case whenQuantity > 10afterward1but0end ) asLarge Ordersfromsales volumegroup fromCustomer's code;

...or usecase whenandnumbering():

chooseCustomer's code,numbering( case whenQuantity > 10afterwardorder numberend ) asLarge Ordersfromsales volumegroup fromCustomer's code;

Learn more in our interactionCreate basic SQL reportscourse.

group by extension

grouping set

grouping setAllows you to specify multiple sets of columns to group by in a query.

choosearea, product,numbering(order number)fromsales volumegroup from grouping set((region, product),()

SQL cheat sheet for data analysis (3)

cube

cubeCreate groupings for all possible subsetsgroup fromList.

choosearea, product,numbering(order number)fromsales volumegroup per cube(region, product);

SQL cheat sheet for data analysis (4)

wrapped

wrappedAdded new grouping levels for subsets and aggregates.

choosearea, product,numbering(order number)fromsales volumegroup with summary(region, product);

SQL cheat sheet for data analysis (5)

merge

mergereplace the first oneinvalidArgument with given value. Usually used to display labelsgroup fromexpand.

choosearea,merge(product, "all"),numbering(order number)fromsales volumegroup with summary(region, product);
areaproductnumbering
USAlaptop10
USAbiceps5
USAall15
UNITED KINGDOM.laptop6
UNITED KINGDOM.all6
allall21

Check out our practical lessonsGROUP BY extension.

common array expression

A common array expression (CTE) is a named temporary result set that can be referenced in a larger query. They are especially useful for complex assemblies and for breaking large queries into more manageable parts.

andTotal product salesas(chooseproduct,and(profit) asTotal profitfromsales volumegroup fromproduct)choose average value (Total profit)fromtotal product sales;

Check out our practical lessonscommon array expression.

window mode

Window functions calculate their results based on a sliding window frame (a set of rows relative to the current row). Unlike aggregate functions, window functions do not collapse rows.

Calculate the percentage of the total within the group

chooseProduct, Brand, Profit, (100.0 * Profit /and(profit) Exceed(Partition basebrand))asperchlorethylenefromSales volume;
productbrandprofitperchlorethylene
knifekitchen100025
containerkitchen300075
toy dollToiz200040
carToiz300060

ranking

To sort products by price:

choose rank() over (Order byprice), Namefromproduct;
ranking function
  • class– Give equal rank to associated values, leaving spaces.
  • DENSE_RANK– Give the same rank to bound values ​​without spaces.
  • ROW_NUMBER– Give consecutive numbers without spaces.
NameclassIntensive levelRow number
Jeans111
gaiters222
gaiters223
sports shoes434
sports shoes435
sports shoes436
Short-sleeved blouse747
running total

A running total is the cumulative sum of a given value and all previous values ​​in the column.

choosedate, amount,and(quantity) Exceed(Order bydate) asrunning totalfromSales volume;
moving average

moving average (akaThe running average is a technique for analyzing trends in time series data. It is the average of the current price and a specified number of previous prices.

choosedate, price,average value (price) Exceed( Order bydatebetween the lines2forward and the current row) asmoving averagefromshare price?
Difference (Delta) between two series
chooseAnnual income,back(income) Exceed(sorted by year) asRevenue_previous_year, Revenue -back(income) Exceed(sorted by year) asdifference annuallyfromannual indicators;

Learn about SQL window functions in our interactivewindow modecourse.

References

Top Articles
Latest Posts
Article information

Author: Van Hayes

Last Updated: 29/07/2023

Views: 6170

Rating: 4.6 / 5 (46 voted)

Reviews: 93% of readers found this page helpful

Author information

Name: Van Hayes

Birthday: 1994-06-07

Address: 2004 Kling Rapid, New Destiny, MT 64658-2367

Phone: +512425013758

Job: National Farming Director

Hobby: Reading, Polo, Genealogy, amateur radio, Scouting, Stand-up comedy, Cryptography

Introduction: My name is Van Hayes, I am a thankful, friendly, smiling, calm, powerful, fine, enthusiastic person who loves writing and wants to share my knowledge and understanding with you.