How I program in AWK
43 points by xonix
After running into several incompatibility issues on Solaris and AIX systems about a decade ago while using sed and awk, I learnt to stick to the POSIX-specified feature set as much as possible. I always keep a link to the POSIX specification bookmarked so I can quickly look up command-line options, regular expression syntax and the sed and awk specifications. An updated POSIX specification was published in 2024 and is available here: https://pubs.opengroup.org/onlinepubs/9799919799/.
The pages I find myself going back to over and over again are:
The nice thing about the second link is that it documents both Basic Regular Expression (BRE) and Extended Regular Expression (ERE) on a single page, which is convenient while switching between grep and grep -E or sed and sed -E. Fortunately, awk uses ERE by default.
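To make the BRE/ERE difference concrete, here is a minimal sketch. The sample input is invented for illustration, and it assumes only a POSIX-ish grep and awk on the PATH. It runs the same "one or more repetitions of ab" pattern through grep (BRE), grep -E (ERE) and awk (ERE), then shows that an unescaped + in BRE is just a literal character.

```shell
# Sample input (made up for this example): one candidate string per line.
input='ab
abab
a+b
abc'

# BRE (plain grep): grouping and intervals need backslashes to act as operators.
bre=$(printf '%s\n' "$input" | grep -c '^\(ab\)\{1,\}$')

# ERE (grep -E): the same idea, written with unescaped ( ) and +.
ere=$(printf '%s\n' "$input" | grep -E -c '^(ab)+$')

# awk patterns are ERE, so the grep -E pattern carries over unchanged.
awk_ere=$(printf '%s\n' "$input" | awk '/^(ab)+$/ { n++ } END { print n + 0 }')

# In BRE an unescaped + is an ordinary character, so this only matches "a+b".
literal=$(printf '%s\n' "$input" | grep -c '^a+b$')

echo "$bre $ere $awk_ere $literal"
```

The first three counts should agree (the lines ab and abab), while the last one counts only the line containing a literal plus sign.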
Reminds me of the beautiful things @thingskatedid used to post on Twitter back in the day. https://x.com/thingskatedid/status/1286559756967002113
I'm always impressed with folks who write programs in AWK. I know there are several variants of varying popularity. GAWK is what I'm familiar with, and what I imagine most people get when they type awk into the terminal. Are there any advantages to the non-GNU AWKs?
I really need to find my copy of the awk book and re-read it...
MAWK is very fast if you don't need Unicode support. When I implemented a higher-level scripting language in AWK as a learning exercise, MAWK made the AWK implementation faster than the JavaScript one in certain cases.
Sure, GAWK has lots of features over POSIX AWK: https://www.gnu.org/software/gawk/manual/html_node/POSIX_002fGNU.html.
For my programs I personally stick to the POSIX subset for maximum portability. For example, for Makesure I ensure compatibility with the following versions:
One true awk: awk version 20251225
mawk 1.3.3 20090920, Copyright (C) Michael D. Brennan
mawk 1.3.4 20200120
GNU Awk 5.1.1, API: 3.1
GNU Awk 5.2.2, API 3.2, PMA Avon 8-g1
GNU Awk 5.3.2, API 4.0, PMA Avon 8-g1
Goawk v1.31.0
busybox awk: BusyBox v1.31.0 (2019-06-10 15:13:14 CEST) multi-call binary.
For one-liners and throwaway programs I do use some GAWK-specific features. For example, "The optional third argument to the match() function for capturing text-matching subexpressions within a regexp" is a must-have when you need regexp-based value extraction from a line instead of the standard AWK field splitting.
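For comparison, here is a rough sketch of what the same kind of extraction looks like when staying within POSIX awk. The log line and its "id=" field are hypothetical. Two-argument match() sets RSTART and RLENGTH, and substr() then cuts out the value; the GAWK three-argument form is shown only in a comment and is not run.

```shell
# Hypothetical log line; the "user=... id=..." layout is invented for illustration.
line='2024-05-01 login user=alice id=42 ok'

# Portable POSIX awk: match() sets RSTART and RLENGTH, and substr() cuts the value out.
# gawk-only equivalent (not run here): match($0, /id=([0-9]+)/, m) { print m[1] }
id=$(printf '%s\n' "$line" | awk '
    match($0, /id=[0-9]+/) {
        # Skip the 3-character "id=" prefix and keep only the digits.
        print substr($0, RSTART + 3, RLENGTH - 3)
    }')

echo "$id"
```

The offset arithmetic is manageable with one capture, but it gets fiddly with several, which is where the GAWK array argument earns its keep.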
Another feature I liked is the coprocess via |&. It can help you speed up a script: https://maximullaris.com/gawk_coprocess_speedup.html.
I would certainly write small scripts in AWK, especially as a way of consolidating certain poorly conceived shell scripts. It is quite useful, but a bit idiosyncratic.
The limiting factor is when one needs structures, at which point AWK quickly becomes unwieldy, and another language is often a better choice. These days, my 'other' language will often be Go, as it is quite quick to turn around in a script-like fashion.
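As an illustration of the structures point, this is a small sketch of the usual POSIX awk workaround: faking records with composite array keys. The "name age city" input and the person array are made up for the example.

```shell
# Hypothetical input: "name age city" records, invented for illustration.
records='alice 30 Oslo
bob 25 Lima'

out=$(printf '%s\n' "$records" | awk '
    {
        # POSIX awk has no nested arrays or structs; person[NR, "age"] is really
        # a flat array keyed by the string NR SUBSEP "age".
        person[NR, "name"] = $1
        person[NR, "age"]  = $2
        person[NR, "city"] = $3
        count = NR
    }
    END {
        for (i = 1; i <= count; i++)
            printf "%s (%d) lives in %s\n", person[i, "name"], person[i, "age"], person[i, "city"]
    }')

echo "$out"
```

This works for flat records, but anything nested (a list of addresses per person, say) means inventing more key conventions by hand, which is roughly where reaching for another language starts to pay off.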