100 Useful Command-Line Utilities
by Oliver; 2014
100. datamash
Note: datamash is not a default shell program. You have to download and install it.
GNU datamash is a great program for crunching through text files and collapsing rows on a common ID or computing basic statistics. Here are some simple examples of what it can do.
Collapse rows in one column based on a common ID in another column:
$ cat file.txt
3 d
2 w
3 c
4 x
1 a
$ cat file.txt | datamash -g 1 collapse 2 -s -W
1 a
2 w
3 d,c
4 x
The -g flag names the ID (grouping) column; collapse 2 joins the values of the second column into a comma-separated list; the -s flag sorts the input before grouping; and the -W flag treats whitespace as the field delimiter.
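If datamash is not installed, the same collapse can be approximated with sort and awk. The following is only a rough sketch of what the grouping does, not how datamash is implemented; the function name collapse_by_id is made up for illustration, and the output is space-separated.

```shell
#!/bin/sh
# collapse_by_id is a hypothetical helper, not part of datamash.
# It joins column 2 with commas for every distinct value of column 1.
collapse_by_id() {
    # Sort on column 1 only; -s keeps rows that share an ID in input order.
    sort -s -k1,1 "$@" |
    awk '
        {
            key = $1 ""                 # force a string comparison of IDs
            if (NR > 1 && key == id) {  # same ID as the previous row
                vals = vals "," $2
                next
            }
            if (NR > 1) print id, vals  # ID changed: flush the finished group
            id = key
            vals = $2
        }
        END { if (NR > 0) print id, vals }
    '
}

# Same rows as file.txt in the example above.
printf '3 d\n2 w\n3 c\n4 x\n1 a\n' | collapse_by_id
```

Sorting first is what lets awk compare each row only with the previous one, which is the same reason the datamash command needs -s on unsorted input.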
Average rows in one column on a common ID:
$ cat file.txt
A 1 3 SOME_OTHER_INFO
A 1 4 SOME_OTHER_INFO2
B 2 30 SOME_OTHER_INFO4
A 2 5 SOME_OTHER_INFO3
B 1 1 SOME_OTHER_INFO4
B 2 3 SOME_OTHER_INFO4
B 2 1 SOME_OTHER_INFO4
$ cat file.txt | datamash -s -g 1,2 mean 3 -f
A 1 3 SOME_OTHER_INFO 3.5
A 2 5 SOME_OTHER_INFO3 5
B 1 1 SOME_OTHER_INFO4 1
B 2 30 SOME_OTHER_INFO4 11.333333333333
In this case, the ID is the combination of columns one and two, and the mean of column 3 is appended as an additional column. The -f flag prints a full input line for each group rather than only the ID columns.
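To make the grouping logic concrete, here is a sort and awk sketch of the same per-group mean. It assumes the input is tab-separated (the datamash command above does not pass -W), and the function name group_mean is hypothetical. awk's default number formatting prints fewer decimal places than datamash does.

```shell
#!/bin/sh
# group_mean is a hypothetical helper, not part of datamash.
# It groups on columns 1 and 2 and appends the mean of column 3
# to the first line seen in each group.
group_mean() {
    tab=$(printf '\t')
    # Stable sort on the two key columns so each group is contiguous
    # and keeps its rows in input order.
    sort -s -t "$tab" -k1,1 -k2,2 "$@" |
    awk -F '\t' -v OFS='\t' '
        {
            key = $1 FS $2                  # combined ID, compared as a string
            if (NR > 1 && key != prev) {    # new group: flush the previous one
                print first, sum / n
                sum = 0
                n = 0
            }
            if (n == 0) first = $0          # remember the first line of the group
            prev = key
            sum += $3
            n++
        }
        END { if (n > 0) print first, sum / n }
    '
}

# Same rows as file.txt in the example above, written with tabs.
printf 'A\t1\t3\tSOME_OTHER_INFO\nA\t1\t4\tSOME_OTHER_INFO2\nB\t2\t30\tSOME_OTHER_INFO4\nA\t2\t5\tSOME_OTHER_INFO3\nB\t1\t1\tSOME_OTHER_INFO4\nB\t2\t3\tSOME_OTHER_INFO4\nB\t2\t1\tSOME_OTHER_INFO4\n' | group_mean
```

Keeping a running sum and count per group is enough for a mean; statistics such as the median would need all values of a group in memory, which is where a dedicated tool like datamash is more convenient.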
Simply sum a file of numbers:
$ cat file.txt | datamash sum 1
Hat tip: Albert