100 Useful Command-Line Utilities

by Oliver; 2014

25. sort

From An Introduction to the Command-Line (on Unix-like systems) - sort: As you guessed, the command sort sorts files. It has a large man page, but we can learn its basic features by example. Let's suppose we have a file, testsort.txt, such that:
$ cat testsort.txt 
vfw	34	awfjo
a	4	2
f	10	10
beb	43	c
f	2	33
f	1	?
Then:
$ sort testsort.txt 
a	4	2
beb	43	c
f	1	?
f	10	10
f	2	33
vfw	34	awfjo
What happened? By default, sort puts the rows of a file in dictionary order, comparing each line character by character from the left. For a file like this one, that amounts to sorting by the first column, then the second column, and so on. Where the first column has the same value (f in this example), the values of the second column determine the order of the rows. Dictionary sort means that things are sorted as they would be in a dictionary, so the numbers 1, 2, 10 come out as 1, 10, 2. The exact ordering can also depend on your locale settings. If you want to do a numerical sort, use the -n flag; if you want to sort in reverse order, use the -r flag. You can also sort according to a specific column. The notation for this is:
sort -kn,m
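where n and m are numbers, and the sort key runs from column n through column m. If you leave off the ,m part, the key runs from column n to the end of the line, which is rarely what you want. In practice, it is usually easiest to use a single column rather than a range. For example: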
sort -k2,2
means sort by the second column (technically from column 2 to column 2).
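As a rough illustration of the difference between dictionary and numerical order, the sketch below recreates testsort.txt in a scratch directory and sorts on the second column. The scratch directory and the LC_ALL=C setting are assumptions added here: the first keeps the example from overwriting anything, and the second pins the ordering so that it does not vary with your locale.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)"

# Recreate the tab-separated example file from the text.
printf 'vfw\t34\tawfjo\na\t4\t2\nf\t10\t10\nbeb\t43\tc\nf\t2\t33\nf\t1\t?\n' > testsort.txt

# Dictionary sort on column 2 only: the second column comes out as 1, 10, 2, 34, 4, 43.
LC_ALL=C sort -k2,2 testsort.txt

# The second column on its own, in dictionary, numerical, and reverse numerical order.
cut -f2 testsort.txt | LC_ALL=C sort       # 1 10 2 34 4 43
cut -f2 testsort.txt | LC_ALL=C sort -n    # 1 2 4 10 34 43
cut -f2 testsort.txt | LC_ALL=C sort -nr   # 43 34 10 4 2 1
```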

To sort numerically by the second column:
$ sort -k2,2n testsort.txt 
f	1	?
f	2	33
a	4	2
f	10	10
vfw	34	awfjo
beb	43	c
As is often the case in Unix, we can combine flags as much as we like.

Question: what does this do?
$ sort -k1,1r -k2,2n testsort.txt
vfw	34	awfjo
f	1	?
f	2	33
f	10	10
beb	43	c
a	4	2
Answer: the file has been sorted first by the first column, in reverse dictionary order, and then—where the first column is the same—by the second column in numerical order. You get the point!
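To see that each key carries its own flags, here is a variation on the same idea, offered as a sketch: the first column in ordinary dictionary order and, within ties, the second column in reverse numerical order. The scratch directory and LC_ALL=C are assumptions added for reproducibility.

```shell
# Work in a scratch directory and recreate the example file from the text.
cd "$(mktemp -d)"
printf 'vfw\t34\tawfjo\na\t4\t2\nf\t10\t10\nbeb\t43\tc\nf\t2\t33\nf\t1\t?\n' > testsort.txt

# Key 1: column 1, dictionary order. Key 2: column 2, numerical and reversed.
# The three f rows therefore come out with 10 first, then 2, then 1.
LC_ALL=C sort -k1,1 -k2,2nr testsort.txt
```

The flags attached to one -k apply only to that key, which is why the r here does not reverse the first column.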

Sort uniquely, keeping one copy of each distinct line:
$ sort -u testsort.txt
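Since every line of testsort.txt is already distinct, the effect of -u is easier to see on input that contains repeats. The fruit names below are made-up sample data, and LC_ALL=C is an assumption added to pin the ordering.

```shell
# Five input lines, three distinct values: prints apple, fig, pear.
printf 'pear\napple\npear\nfig\napple\n' | LC_ALL=C sort -u

# For whole-line comparisons, this gives the same result as sorting and then filtering with uniq.
printf 'pear\napple\npear\nfig\napple\n' | LC_ALL=C sort | uniq
```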
Sort using a designated tmp directory:
$ sort -T /my/tmp/dir testsort.txt
Behind the curtain, when the input is too large to sort in memory, sort does its work by making temporary files, and it needs a place to put those files. By default, this is the directory set by TMPDIR (or /tmp if TMPDIR is not set), but if you have a giant file to sort, you might have reason to instruct sort to use another directory, and that's what this flag does.
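A minimal sketch of both ways to redirect the temporary files follows. The scratch variable is a hypothetical name, and mktemp -d merely stands in for a directory on a disk with plenty of free space. The input here is far too small for sort to need temporary files, so this only shows the syntax.

```shell
# Hypothetical scratch directory; in practice, pick one on a disk with plenty of room.
scratch=$(mktemp -d)

# Option 1: name the directory with the -T flag.
printf '3\n1\n2\n' | sort -n -T "$scratch"

# Option 2: set TMPDIR for this one command only.
printf '3\n1\n2\n' | TMPDIR="$scratch" sort -n

# Clean up the scratch directory.
rm -r "$scratch"
```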

Sort numerically if the columns are in scientific notation:
$ sort -g testsort.txt
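The difference from -n shows up on a few sample values. The sketch below assumes a sort that supports -g, such as GNU sort, and uses made-up numbers; LC_ALL=C is added so that number parsing does not depend on the locale.

```shell
# -n reads only the leading plain number, so 1e3 counts as 1 and 2e1 as 2.
printf '1e3\n5\n2e1\n' | LC_ALL=C sort -n   # 1e3, 2e1, 5

# -g understands the exponent, so the values compare as 5, 20, 1000.
printf '1e3\n5\n2e1\n' | LC_ALL=C sort -g   # 5, 2e1, 1e3
```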
sort works particularly well with uniq. Because uniq only compares adjacent lines, its input generally needs to be sorted first. For example, look at the following list of numbers:
$ echo "2 2 2 1 2 1 3 4 5 6 6" | tr " " "\n" | sort
1
1
2
2
2
2
3
4
5
6
6
Find the duplicate entries:
$ echo "2 2 2 1 2 1 3 4 5 6 6" | tr " " "\n" | sort | uniq -d
1
2
6
Find the non-duplicate entries:
$ echo "2 2 2 1 2 1 3 4 5 6 6" | tr " " "\n" | sort | uniq -u
3
4
5
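Another common pairing, sketched here with the same list of numbers, is to count how often each value occurs with uniq -c and then sort the counts. The choice of keys in the final sort is one reasonable option, not the only one. The last command shows why the first sort matters: uniq only notices repeats on adjacent lines.

```shell
# Count each value, then order by count (largest first) and by value within ties.
# Each output line is a count followed by the value: 4 2, 2 1, 2 6, 1 3, 1 4, 1 5.
echo "2 2 2 1 2 1 3 4 5 6 6" | tr " " "\n" | sort | uniq -c | sort -k1,1nr -k2,2n

# Without sorting first, the two 1s are not adjacent, so uniq -d reports nothing.
printf '1\n2\n1\n' | uniq -d
```

The counts printed by uniq -c are padded with leading spaces, and the amount of padding can differ between systems.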
