Awk
Awk is a powerful language mainly for data extraction. My main use has been to extract specific records or fields from large data files before analyzing them further in Excel or a text editor. Some data sets I have worked with were too large for Excel to load, so I first had to extract only the records or fields I needed.
I will be collecting various awk command examples as I run into the need to create them. The examples include filtering lines much as grep does, separating columns of data, printing only certain columns, and performing aggregate functions.
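As a minimal sketch of the grep-like use mentioned above: a bare pattern with no action prints every matching line. The temporary file and its contents are assumptions for illustration, borrowing a few rows from the sample data used later on this page and assuming they are stored as comma-separated text.

```shell
# Hypothetical sample file, assumed to be comma-separated with a header row.
data=$(mktemp)
cat > "$data" <<'EOF'
Report Date,State,Value
2024/04/01,Florida,1
2024/04/02,Florida,3
2024/04/01,Virginia,18
2024/04/02,Virginia,9
EOF

# Like `grep Virginia`: print every line that matches the regular expression.
matched=$(awk '/Virginia/' "$data")
echo "$matched"

# Unlike grep, the test can be limited to a single field:
# here the second column must be exactly "Florida".
exact=$(awk -F "," '$2 == "Florida"' "$data")
echo "$exact"
```

The field-based form is the one I would reach for when the same text could appear in more than one column.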
Printing Columns
Awk command to print columns from a file. -F "," defines the column separator. $1 selects the field (column) to print, here the first one. $0 prints the entire row of data.
awk -F "," '{print $1}' poplation_by_zipcode.csv
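Printing more than one column works the same way. The sketch below assumes a comma-separated file with a header row (the temporary file is hypothetical) and prints the first and third columns while skipping the header.

```shell
# Hypothetical sample file, assumed to be comma-separated with a header row.
data=$(mktemp)
cat > "$data" <<'EOF'
Report Date,ZIP Code,Population
08/21/2020,24014,15
08/22/2020,24014,16
EOF

# NR > 1 skips the header row.
# OFS sets the separator awk writes between the printed fields.
cols=$(awk -F "," -v OFS="," 'NR > 1 {print $1, $3}' "$data")
echo "$cols"
```

Without -v OFS="," the two fields would be separated by a single space, which is awk's default output separator.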
Subtracting Values
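Awk command for subtracting the previous value from the current value to return the difference. The first awk keeps only the rows for ZIP code 24014; the second appends the change in population since the previous row, which is 0 for the first row.
awk -F "," '{if($2 == 24014) {print $1, $2, $3}}' poplation_by_zipcode.csv | awk 'NR==1{p=$3}{print $1, $2, $3, $3-p; p=$3}'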
Example Source Data:
Report Date | ZIP Code | Population |
---|---|---|
08/21/2020 | 24014 | 15 |
08/22/2020 | 24014 | 16 |
08/23/2020 | 24014 | 16 |
08/24/2020 | 24014 | 16 |
08/25/2020 | 24014 | 17 |
08/26/2020 | 24014 | 17 |
08/27/2020 | 24014 | 17 |
08/28/2020 | 24014 | 17 |
Resulting Output of awk command:
Report Date | ZIP Code | Population | Difference |
---|---|---|---|
08/21/2020 | 24014 | 15 | 0 |
08/22/2020 | 24014 | 16 | 1 |
08/23/2020 | 24014 | 16 | 0 |
08/24/2020 | 24014 | 16 | 0 |
08/25/2020 | 24014 | 17 | 1 |
08/26/2020 | 24014 | 17 | 0 |
08/27/2020 | 24014 | 17 | 0 |
08/28/2020 | 24014 | 17 | 0 |
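The same differences can also be computed in a single awk invocation. This is a sketch rather than the only way to do it; it assumes the data is comma-separated with a header row, and the variable names seen and prev are my own.

```shell
# Hypothetical sample file holding the first rows of the source data above.
data=$(mktemp)
cat > "$data" <<'EOF'
Report Date,ZIP Code,Population
08/21/2020,24014,15
08/22/2020,24014,16
08/23/2020,24014,16
08/24/2020,24014,16
08/25/2020,24014,17
EOF

# For rows whose ZIP code is 24014, print the row plus the change since the
# previous matching row. seen is unset (false) for the first match, so its
# difference is reported as 0. The header row never matches the filter.
diffs=$(awk -F "," '$2 == 24014 {
    print $1, $2, $3, (seen ? $3 - prev : 0)
    prev = $3
    seen = 1
}' "$data")
echo "$diffs"
```

Doing the filter and the subtraction in one program avoids the second awk process and keeps the comma separator handling in one place.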
Adding Line Number
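Awk command to prefix each line with its line number. NR holds the number of the current record (line).
awk '{print NR" "$0}' file.txt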
Example Source Data:
$ cat file.txt
First Line of file
Another line in file
This is a last line in file
Resulting Output of awk command:
$ awk '{print NR" "$0}' file.txt
1 First Line of file
2 Another line in file
3 This is a last line in file
Calculating Sum of Values
Awk command for adding all values in the third column together for a column total.
awk -F',' '{sum+=$3;} END{print sum;}' sample_data.csv
Example Source Data:
Report Date | State | Value |
---|---|---|
2024/04/01 | Florida | 1 |
2024/04/02 | Florida | 3 |
2024/04/03 | Florida | 4 |
2024/04/04 | Florida | 2 |
2024/04/01 | Virginia | 18 |
2024/04/02 | Virginia | 9 |
2024/04/03 | Virginia | 6 |
2024/04/04 | Virginia | 4 |
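For the sample data above the column total is 47. The sketch below reproduces that total and also shows a per-state total using an awk array. It assumes the table is stored as comma-separated text with a header row; the temporary file and the variable names are hypothetical.

```shell
# Hypothetical sample file holding the source data above as CSV.
data=$(mktemp)
cat > "$data" <<'EOF'
Report Date,State,Value
2024/04/01,Florida,1
2024/04/02,Florida,3
2024/04/03,Florida,4
2024/04/04,Florida,2
2024/04/01,Virginia,18
2024/04/02,Virginia,9
2024/04/03,Virginia,6
2024/04/04,Virginia,4
EOF

# Column total, skipping the header row explicitly with NR > 1.
total=$(awk -F',' 'NR > 1 {sum += $3} END {print sum}' "$data")
echo "$total"

# Per-state totals: the array sum is indexed by the value of the second column.
# The order of a for-in loop is unspecified in awk, so the output is sorted.
by_state=$(awk -F',' 'NR > 1 {sum[$2] += $3} END {for (s in sum) print s, sum[s]}' "$data" | sort)
echo "$by_state"
```

Skipping the header is optional for the plain total, since awk treats the non-numeric header cell as 0, but it is needed for the per-state version so that "State" does not show up as a group.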