Sunday 29 January 2012

sort


sort, as the name suggests, sorts the lines of a file or the output of a command. Various options determine the sort criteria, which are called sort keys. If multiple files are passed as arguments, sort reads them as one concatenated input and sorts all their lines together.
By default, sorting is done in ascending lexicographic order, i.e. the way words appear in a dictionary. How digits, uppercase letters and lowercase letters rank against each other depends on the locale: in the C locale (plain byte order) digits come first, then uppercase letters, then lowercase letters.
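
The relative order of digits and letters can be checked directly. The following is a minimal sketch, assuming a POSIX shell and forcing the C locale with LC_ALL=C so that lines are compared by byte value. The sample words are made up for illustration.

```shell
# Sort a mix of numbers, capitalised and lowercase words under the C locale,
# where comparison is by byte value: digits < uppercase < lowercase.
mixed_sorted=$(printf '%s\n' banana 42 Apple cherry 7 | LC_ALL=C sort)

# Show the resulting order, one entry per line.
printf '%s\n' "$mixed_sorted"
```

Under another locale the same input may come out in a different order, so it can help to set LC_ALL=C whenever a script depends on byte order.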

Example 1)
Consider a file containing values as shown.

/home/jones$  cat  flowers.txt
rose
lily
tulip
marigold
hibiscus
Chrysanthemum
/home/jones$  sort   flowers.txt
Chrysanthemum
hibiscus
lily
marigold
rose
tulip

To sort it in reverse order, use the -r option:
/home/jones$ sort -r flowers.txt
tulip
rose
marigold
lily
hibiscus
Chrysanthemum

Example 2)
To make the sort case insensitive, use sort -f.
/home/jones$ cat flowers.txt
rose
lily
Chrysanthemum
tulip
marigold
Rose
hibiscus
chrysanthemum

The default sort (without any option) would give
/home/jones$ sort flowers.txt
Chrysanthemum
Rose
chrysanthemum
hibiscus
lily
marigold
rose
tulip

Here it sorted the lines starting with an uppercase letter first and then the lowercase ones. Now use sort -f:
/home/jones$ sort -f flowers.txt
Chrysanthemum
chrysanthemum
hibiscus
lily
marigold
Rose
rose
tulip


Example 3)
A file may contain duplicate lines. To remove the duplicate lines while sorting, use sort -u.
/home/jones $ grep   error  cron.log
error code 45:invalid time
error code 35:Invalid name
error code 45:invalid time
error code 25:Invalid email-id
error code 25:Invalid email-id
error code 35:Invalid name

/home/jones $ grep   error  cron.log | sort  -u
error code 25:Invalid email-id
error code 35:Invalid name
error code 45:invalid time
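
sort -u gives the same distinct lines as piping a plain sort into uniq, and uniq -c additionally reports how often each line occurred. The following is a minimal sketch, assuming a POSIX shell. The log lines are hypothetical stand-ins for the grep output above, not the contents of a real cron.log.

```shell
# Hypothetical log lines standing in for the grep output above.
log_lines='error code 45:invalid time
error code 35:Invalid name
error code 45:invalid time
error code 25:Invalid email-id'

# sort -u: sort and drop repeated lines in one step.
unique_lines=$(printf '%s\n' "$log_lines" | LC_ALL=C sort -u)

# sort | uniq -c: the same distinct lines, each prefixed with its count.
counted_lines=$(printf '%s\n' "$log_lines" | LC_ALL=C sort | uniq -c)

printf '%s\n' "$unique_lines"
printf '%s\n' "$counted_lines"
```

The counting form is handy when you want to know which error is the most frequent and not just which errors exist.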

Example 4)
You may need to sort on a particular field when the fields of each line are separated by a delimiter.
/home/jones $ cat  detailed-list.csv
dolphin|mammal|12
giraffe|mammal|7
kingfisher|aves|3
moth|insecta|1
shark|fish|6
viper|reptile|2

Now, the output of the default sort command is
dolphin|mammal|12
giraffe|mammal|7
kingfisher|aves|3
moth|insecta|1
shark|fish|6
viper|reptile|2
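The lines are in alphabetical order of the first field (fields being separated by '|'), because the default sort compares whole lines starting from the first character. Suppose that, for a change, you need to sort on another column.
/home/jones$ sort -t '|' -k2 detailed-list.csv
kingfisher|aves|3
shark|fish|6
moth|insecta|1
dolphin|mammal|12
giraffe|mammal|7
viper|reptile|2

Here -t '|' tells the sort command that fields are delimited by the '|' character; the quotes stop the shell from treating it as a pipe. If you do not use the -t option, a run of blank characters (spaces or tabs) is taken as the field separator. -k2 instructs sort to ignore the first field and compare from the second field to the end of the line. Older versions of sort wrote this as +1, counting fields from zero; that form is obsolete and is not guaranteed to work everywhere, so -k is the portable choice.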
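
A key can also be limited to a single field. With the POSIX -k option, -k2,2 means "start at field 2 and end at field 2", so the third field no longer takes part in the comparison. The following is a minimal sketch, assuming a POSIX shell. The file is recreated in a temporary directory (the workdir name is only for illustration) so that the snippet is self-contained.

```shell
# Recreate the sample file in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/detailed-list.csv" <<'EOF'
dolphin|mammal|12
giraffe|mammal|7
kingfisher|aves|3
moth|insecta|1
shark|fish|6
viper|reptile|2
EOF

# -t '|'  : fields are separated by the pipe character
# -k2,2   : the sort key is the second field only
by_class=$(LC_ALL=C sort -t '|' -k2,2 "$workdir/detailed-list.csv")

printf '%s\n' "$by_class"
rm -rf "$workdir"
```

Lines whose second fields are equal (the two mammals here) are then ordered by a final comparison of the whole line.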

Similarly, if you wanted to sort on the third column, just using -k3 (+2 in the obsolete notation) instead of -k2 (+1) might not work as expected in the given example. The output of that sort command would be as follows.

/home/jones$ sort -t '|' -k3 detailed-list.csv
moth|insecta|1
dolphin|mammal|12
viper|reptile|2
kingfisher|aves|3
shark|fish|6
giraffe|mammal|7
It sorted the third column according to its alphabetic value and not its arithmetic value. To sort numerically, you must add the -n option:

/home/jones$ sort -n -t '|' -k3 detailed-list.csv
moth|insecta|1
viper|reptile|2
kingfisher|aves|3
shark|fish|6
giraffe|mammal|7
dolphin|mammal|12
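
The numeric and reverse modifiers can also be attached to the key itself. The following is a minimal sketch, assuming a POSIX shell, that lists the same records with the largest third field first. The animals variable is only a convenience for keeping the snippet self-contained.

```shell
# The sample records from this example, kept in a variable.
animals='dolphin|mammal|12
giraffe|mammal|7
kingfisher|aves|3
moth|insecta|1
shark|fish|6
viper|reptile|2'

# -k3,3nr : key is the third field only, compared numerically (n),
#           in descending order (r)
largest_first=$(printf '%s\n' "$animals" | sort -t '|' -k3,3nr)

printf '%s\n' "$largest_first"
```

Attaching n and r to the key, instead of passing -n and -r globally, keeps the modifiers from affecting any other keys you may add later.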

Example 5)
Sorting based on fields, or on parts of fields, can be achieved using sort with the -k option. Consider a file containing numbers.
/home/jones$ cat  num-luck
6758 987 456
2586 324 934
0437 235 417
2812 624 208

Suppose you want to sort this list on a key that starts at the 3rd column of the 1st field and ends at the 4th column of the 1st field. Here fields are separated by one or more spaces, and columns refer to characters.

/home/jones$ sort   -k1.3,1.4   num-luck
2812 624 208
0437 235 417
6758 987 456
2586 324 934
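
To see what sort is actually comparing, the key characters can be pulled out with cut. The following is a minimal sketch, assuming a POSIX shell. The four lines are the ones that appear in the sorted output above, and because the first field starts at the beginning of each line, characters 3-4 of the line are the same as columns 3-4 of field 1.

```shell
# The four lines from the sorted output above, in arbitrary order.
num_luck='6758 987 456
2586 324 934
0437 235 417
2812 624 208'

# The characters that -k1.3,1.4 compares: columns 3 and 4 of the first field.
keys=$(printf '%s\n' "$num_luck" | cut -c3-4)

# Sort on exactly those two characters.
sorted=$(printf '%s\n' "$num_luck" | LC_ALL=C sort -k1.3,1.4)

printf '%s\n' "$keys"
printf '%s\n' "$sorted"
```

The keys come out as 58, 86, 37 and 12, which is why the line starting with 2812 sorts first and the one starting with 2586 sorts last.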

To sort on a key that runs from the 2nd column of the 1st field to the 3rd column of the 2nd field, in reverse order:

/home/jones$ sort   -k1.2,2.3r   num-luck
2812 624 208
6758 987 456
2586 324 934
0437 235 417

Similarly, to sort lines on the 1st field and, where the 1st fields are equal, on the 3rd field, use
sort -k1,1 -k3,3 <filename>
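
When several keys are given, a later key is consulted only for lines that compare equal on the earlier ones. The following is a minimal sketch, assuming a POSIX shell and reusing the pipe-delimited records of Example 4 so that there is a tie to break. It sorts by the second field and then numerically by the third.

```shell
# Records from Example 4; the two mammals tie on the second field.
animals='dolphin|mammal|12
giraffe|mammal|7
kingfisher|aves|3
moth|insecta|1
shark|fish|6
viper|reptile|2'

# -k2,2  : primary key, the second field only
# -k3,3n : secondary key, the third field compared numerically
by_class_then_number=$(printf '%s\n' "$animals" | LC_ALL=C sort -t '|' -k2,2 -k3,3n)

printf '%s\n' "$by_class_then_number"
```

Each key is closed with an end field (2,2 and 3,3). A key written as just -k2 would run to the end of the line and leave nothing for the second key to decide.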


Example 6)
You may need to sort a particular file and replace it with its sorted contents. A simple command of the form
sort filename > filename will not work and is dangerous: the shell empties the file before sort gets to read it, so you end up with an empty file. Use sort with the -o option instead.

/home/jones$ sort -o flowers.txt flowers.txt

The syntax is sort -o <sorted-file> <file>
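
The difference between redirecting onto the input file and using -o can be tried safely on scratch copies. The following is a minimal sketch, assuming a POSIX shell with default redirection behaviour (noclobber not set). The file names inside the temporary directory are only for illustration.

```shell
# Work on throwaway copies in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' rose lily tulip > "$workdir/flowers.txt"
cp "$workdir/flowers.txt" "$workdir/clobbered.txt"

# The shell truncates the redirect target before sort starts reading it,
# so sort sees an empty file and writes nothing back.
sort "$workdir/clobbered.txt" > "$workdir/clobbered.txt"
if [ -s "$workdir/clobbered.txt" ]; then clobbered=no; else clobbered=yes; fi

# With -o, sort reads all of its input before it writes the output file.
LC_ALL=C sort -o "$workdir/flowers.txt" "$workdir/flowers.txt"
in_place=$(cat "$workdir/flowers.txt")

echo "file emptied by redirect: $clobbered"
printf '%s\n' "$in_place"
rm -rf "$workdir"
```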

Example 7)
sort uses a lot of temporary space while sorting huge files, and by default it puts its temporary files in the /tmp directory. If /tmp does not have enough free space, the command aborts. The -T option lets you name an alternative directory for storing the temporary files.

The sort command from Example 4 can then be written as follows (-k2 is the POSIX spelling of the older +1):
sort -t '|' -k2 -T /backup/jones detailed-list.csv
This will use the /backup/jones directory for storing temporary files.
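
The -T option can be tried without a real /backup volume. The following is a minimal sketch, assuming a POSIX shell. The directory and file names under the scratch directory are hypothetical, and an input this small will probably never spill to disk, so the snippet only shows the invocation.

```shell
# Scratch area with a dedicated directory for sort's temporary files.
workdir=$(mktemp -d)
mkdir "$workdir/sort-tmp"
printf '%s\n' 3 1 2 > "$workdir/numbers.txt"

# -T names the directory sort should use if it needs temporary files.
sorted_with_T=$(sort -n -T "$workdir/sort-tmp" "$workdir/numbers.txt")

printf '%s\n' "$sorted_with_T"
rm -rf "$workdir"
```

The directory passed to -T must already exist and be writable by the user running sort.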


2 comments:

  1. This is Really useful to me, learned new things of sorting any position in a column

  2. great post! really useful! tnx :)
