Awk

Awk is a powerful language used mainly for data extraction. I mostly use it to pull specific records or fields out of large data files before analyzing them further in Excel or a text editor. Some data sets I have worked with were too large for Excel to load, so I first had to extract only the records or fields I needed.

I will be collecting various awk command examples as I run into the need to create them. The examples include grep-like filtering, separating columns of data, printing only certain columns, and performing aggregate functions.

Printing Columns

Awk command to print columns from a file. -F "," sets the column separator, $1 selects the field (column) you want to print, and $0 prints the entire row of data.

awk -F "," '{print $1}' poplation_by_zipcode.csv
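As a small extension of the command above, here is a sketch that prints more than one column and picks the last column without counting. The two sample rows and the temporary file are assumptions made up for illustration; the real file may look different. OFS and NF are standard awk built-in variables.

```shell
# Hypothetical sample rows; the real poplation_by_zipcode.csv may look different.
sample=$(mktemp)
cat > "$sample" <<'EOF'
08/21/2020,24014,15
08/22/2020,24014,16
EOF

# Print the first and third columns. OFS is the separator print puts between fields.
cols=$(awk -F "," -v OFS="," '{print $1, $3}' "$sample")
echo "$cols"

# NF is the number of fields in the row, so $NF is always the last column.
last=$(awk -F "," '{print $NF}' "$sample")
echo "$last"

rm -f "$sample"
```

Without -v OFS="," the printed columns would be separated by a single space, which is awk's default output separator.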

Subtracting Values

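Awk command for subtracting the previous row's value from the current row's value to return the difference. The first awk keeps only the rows for ZIP code 24014. The second awk appends the difference, printing 0 for the first row because it has no previous value.

awk -F "," '{if($2 == 24014) {print $1, $2, $3}}' poplation_by_zipcode.csv | awk 'NR==1{print $1, $2, $3, 0; p=$3; next}{print $1, $2, $3, $3-p; p=$3}'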

Example Source Data:

Report Date   ZIP Code   Population
08/21/2020    24014      15
08/22/2020    24014      16
08/23/2020    24014      16
08/24/2020    24014      16
08/25/2020    24014      17
08/26/2020    24014      17
08/27/2020    24014      17
08/28/2020    24014      17

Resulting Output of awk command:

Report Date   ZIP Code   Population   Difference
08/21/2020    24014      15           0
08/22/2020    24014      16           1
08/23/2020    24014      16           0
08/24/2020    24014      16           0
08/25/2020    24014      17           1
08/26/2020    24014      17           0
08/27/2020    24014      17           0
08/28/2020    24014      17           0
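The filter and the subtraction can probably also be done in one awk call instead of a pipe. The sketch below is one way to do that, not the only one. The sample file, the extra 24015 row, and the variable names prev, seen and d are all made up for illustration.

```shell
# Hypothetical comma-separated sample, including a header and one row for another ZIP code.
data=$(mktemp)
cat > "$data" <<'EOF'
Report Date,ZIP Code,Population
08/21/2020,24014,15
08/22/2020,24014,16
08/22/2020,24015,40
08/23/2020,24014,16
08/25/2020,24014,17
EOF

# For each 24014 row, print the row plus the change since the previous 24014 row.
# "seen" stays empty until the first match, so the first difference is 0.
diffs=$(awk -F "," '
  $2 == 24014 {
    d = (seen ? $3 - prev : 0)
    print $1, $2, $3, d
    prev = $3
    seen = 1
  }' "$data")
echo "$diffs"

rm -f "$data"
```

The header row and the 24015 row are skipped because their second field does not equal 24014, so only matching rows update prev.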

Adding Line Number

awk '{print NR" "$0}' file.txt

Example Source Data:

$ cat file.txt
First Line of file
Another line in file
This is a last line in file

Resulting Output of awk command:

$ awk '{print NR" "$0}' file.txt
1 First Line of file
2 Another line in file
3 This is a last line in file
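If the numbers should line up or sort cleanly, printf can be used in place of print. This is a sketch only: the zero-padded width of 3 is an arbitrary choice, and the temporary file just recreates the three sample lines above.

```shell
# Recreate the three sample lines in a temporary file.
f=$(mktemp)
printf '%s\n' 'First Line of file' 'Another line in file' 'This is a last line in file' > "$f"

# NR is the current line (record) number; %03d pads it with zeros to three digits.
numbered=$(awk '{printf "%03d %s\n", NR, $0}' "$f")
echo "$numbered"

rm -f "$f"
```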

Calculating Sum of Values

Awk command for adding all values in the third column together for a column total.

awk -F',' '{sum+=$3;} END{print sum;}' sample_data.csv
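As a variation on the command above, the same idea can be used to get one total per state by keeping the sums in an awk array keyed on the second column. This is a sketch under assumptions: it assumes the file is comma-separated with a header row, it rebuilds the sample data in a temporary file, and the names sum, s, totals and grand are mine.

```shell
# Temporary copy of the sample data, assumed to be comma-separated with a header row.
csv=$(mktemp)
cat > "$csv" <<'EOF'
Report Date,State,Value
2024/04/01,Florida,1
2024/04/02,Florida,3
2024/04/03,Florida,4
2024/04/04,Florida,2
2024/04/01,Virginia,18
2024/04/02,Virginia,9
2024/04/03,Virginia,6
2024/04/04,Virginia,4
EOF

# NR > 1 skips the header. sum[$2] keeps one running total per state.
# The order of "for (s in sum)" is not defined, so the result is piped through sort.
totals=$(awk -F',' 'NR > 1 {sum[$2] += $3} END {for (s in sum) print s, sum[s]}' "$csv" | sort)
echo "$totals"

# Overall column total, skipping the header explicitly.
grand=$(awk -F',' 'NR > 1 {sum += $3} END {print sum}' "$csv")
echo "$grand"

rm -f "$csv"
```

Skipping the header with NR > 1 is optional here because awk treats the non-numeric text "Value" as 0 when adding, but being explicit avoids surprises if a header ever starts with a digit.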

Example Source Data:

Report Date State Value
2024/04/01 Florida 1
2024/04/02 Florida 3
2024/04/03 Florida 4
2024/04/04 Florida 2
2024/04/01 Virginia 18
2024/04/02 Virginia 9
2024/04/03 Virginia 6
2024/04/04 Virginia 4

Resulting Output of awk command: