Sunday 29 January 2012

wc


wc is a very useful utility that counts the lines, words, characters and bytes in a plain text file
or on its standard input. In shell scripts, it is often handy to store the line count of a file, or of the output of
a command, in a variable for later use. The examples below show how these things can be done.
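A minimal sketch of storing a count in a variable, assuming a POSIX-like shell with mktemp available; the sample file and the variable name total_lines are hypothetical. Feeding the file on standard input (wc -l < file) makes wc print only the number, without the file name, which is what you usually want in a variable.

```sh
#!/bin/sh
# Hypothetical sample file with three lines.
sample=$(mktemp)
printf 'first line\nsecond line\nthird line\n' > "$sample"

# With a file operand, wc prints the count followed by the file name.
wc -l "$sample"

# Reading from standard input, wc prints only the number. Wrapping it in
# $(( )) drops the leading blanks that some wc versions add.
total_lines=$(( $(wc -l < "$sample") ))
echo "The file has $total_lines lines"

rm -f "$sample"
```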

Example 1)
To get the counts of lines, words and bytes of a file, use wc as follows.

/home/mark$ wc Bulk-SMS-file.txt
 45  140  990 Bulk-SMS-file.txt

The first number (45) is the number of lines, the second (140) the number of words and the third (990) the number of bytes.

The individual values could be extracted as fields with awk, but wc also has options that print a single count:

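wc -l  :  Total lines

wc -w  :  Total words

wc -c  :  Total bytes

wc -m  :  Total characters (differs from -c only when the file contains multibyte characters, e.g. UTF-8 text in a UTF-8 locale)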
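The two approaches side by side, as a sketch with a hypothetical sample file: awk picks the second field (the word count) out of the full wc output, while wc -w prints only that count.

```sh
#!/bin/sh
# Hypothetical sample file: 2 lines, 5 words.
sample=$(mktemp)
printf 'one two three\nfour five\n' > "$sample"

# Full output is "lines words bytes name"; field 2 is the word count.
words_awk=$(wc "$sample" | awk '{print $2}')

# The -w option prints just the word count (read here from standard input).
words_opt=$(( $(wc -w < "$sample") ))

echo "$words_awk $words_opt"
rm -f "$sample"
```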

wc treats a word as a non-empty sequence of characters delimited by white space, and it counts lines by counting newline characters, so a last line that does not end with a newline is not counted.
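A small sketch that demonstrates both rules with printf-generated input (the strings are arbitrary examples):

```sh
#!/bin/sh
# Extra blanks and tabs between words do not create extra words.
words=$(( $(printf 'one   two\tthree\n' | wc -w) ))
echo "words: $words"    # prints words: 3

# Lines are counted by counting newline characters, so the final
# "second" below, which has no trailing newline, is not counted.
lines=$(( $(printf 'first\nsecond' | wc -l) ))
echo "lines: $lines"    # prints lines: 1
```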


Example 2)
Using wc on multiple files.

/home/mark$ ls file_list-201111*
file_list-20111105.txt
file_list-20111113.txt
file_list-20111120.txt
file_list-20111128.txt

/home/mark$ wc -l file_list-201111*
   98164 file_list-20111105.txt
  531665 file_list-20111113.txt
  527303 file_list-20111120.txt
  564207 file_list-20111128.txt
 1721339 total

This gives you the number of lines in each file as well as the sum of all the individual line counts.

The same command can be run without the -l option to get the line, word and byte counts of each file, together with their totals.
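If you only need the grand total, one option is to concatenate the files and count standard input. The sketch below assumes mktemp -d is available and uses two hypothetical files; since wc -l counts newline characters and cat passes every byte through unchanged, the result equals the "total" line of wc -l.

```sh
#!/bin/sh
# Two small hypothetical files in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/file_list-1.txt"
printf 'd\ne\n'    > "$dir/file_list-2.txt"

# One line per file plus a final "total" line.
wc -l "$dir"/file_list-*.txt

# Only the grand total: concatenate the files and count standard input.
grand_total=$(( $(cat "$dir"/file_list-*.txt | wc -l) ))
echo "grand total: $grand_total"

rm -rf "$dir"
```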

Example 3)
wc can also count these values on its standard input, for example the output of another command passed through a pipe.

/home/mark$ echo "I Love You" | wc
      1       3      11

The byte count includes the newline that echo adds at the end of the string.

Hmm! wc achieved something cool this time…

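Similarly, you can use wc -l as an alternative to grep -c.
/home/mark$ grep -c tremendous Director-speech.txt

is the same as

/home/mark$ grep tremendous Director-speech.txt | wc -l

Both print the number of lines that contain "tremendous", not the number of times the word occurs.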
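A sketch that checks the equivalence on a hypothetical stand-in for Director-speech.txt. It also shows one way to count occurrences rather than lines, using grep -o, which GNU and BSD grep support but POSIX does not require.

```sh
#!/bin/sh
# Hypothetical stand-in for Director-speech.txt.
speech=$(mktemp)
printf 'a tremendous year\nsteady growth\ntremendous effort, tremendous team\n' > "$speech"

# Both commands count matching lines, so the third line is counted once.
by_grep_c=$(grep -c tremendous "$speech")
by_wc_l=$(( $(grep tremendous "$speech" | wc -l) ))

# To count occurrences instead, print each match on its own line with -o.
occurrences=$(( $(grep -o tremendous "$speech" | wc -l) ))

echo "$by_grep_c $by_wc_l $occurrences"
rm -f "$speech"
```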


Example 4)
wc -l may take a lot of time to count the lines of a huge file, because it has to read the whole file.
If we are sure that every line in the file contains the same number of characters, there is a faster method.

Assume that you have a fixed-length file, i.e. a file which has the same number of characters (bytes) in every line.
/home/mark$ ls -lrt All_Customers.lst
-rw-r--r--    1 mark   Administ    1321655460912 Dec  16 14:37 All_Customers.lst

/home/mark$ wc -l All_Customers.lst
23456898 All_Customers.lst
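The method rests on one relation. Writing S for the file size, b for the bytes per line (newline included) and N for the line count (symbols introduced here for illustration), a file whose lines all have the same length satisfies N = S / b. With the figures in this example, where the 1000-line sample used in the steps below is 56344000 bytes:

```latex
b = \frac{56\,344\,000}{1000} = 56\,344, \qquad
N = \frac{S}{b} = \frac{1\,321\,655\,460\,912}{56\,344} = 23\,456\,898
```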

The following steps explain the method.

Step 1)
Save the first thousand lines of the file in a separate file.
/home/mark$ head -n 1000 All_Customers.lst > All_Customers_1000.lst

Step 2)
Divide the size of this sample file by its line count (1000) to get the number of bytes per line.
/home/mark$ ls -lrt All_Customers_1000.lst
-rw-r--r--    1 mark   Administ    56344000 Dec  16 14:37 All_Customers_1000.lst

/home/mark$ fsize=`ls -lrt All_Customers_1000.lst | awk '{print $5}'`
/home/mark$ fratio=`expr $fsize / 1000`
/home/mark$ echo $fratio
56344

Now fratio holds the number of bytes per line.

Step 3)
Now divide the total size of All_Customers.lst by fratio.
/home/mark$ tot_lines=`expr 1321655460912 / $fratio`
/home/mark$ echo $tot_lines
23456898

This is the same count as the one reported by wc -l.
All these steps can be put in a shell script and used for any file of this type.
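One possible way to package Steps 1 to 3, as a sketch: the function name fast_line_count is hypothetical, it uses shell arithmetic $(( )) instead of expr, and it measures the sample with wc -c instead of writing it to a separate file. Like the method itself, it only gives the right answer when every line really has the same length.

```sh
#!/bin/sh
# fast_line_count FILE
# Sketch: line count of a file whose lines all have the same length,
# from the file size (taken from ls -l) and a 1000-line sample.
fast_line_count() {
    file=$1
    # Bytes and lines in the sample (fewer than 1000 lines for small files).
    sample_bytes=$(( $(head -n 1000 "$file" | wc -c) ))
    sample_lines=$(( $(head -n 1000 "$file" | wc -l) ))
    if [ "$sample_lines" -eq 0 ]; then
        echo 0
        return
    fi
    fratio=$(( sample_bytes / sample_lines ))     # bytes per line
    fsize=$(ls -l "$file" | awk '{print $5}')      # total size in bytes
    echo $(( fsize / fratio ))
}

# Usage with a hypothetical fixed-length file: 2500 lines of 10 bytes each.
demo=$(mktemp)
yes abcdefghi | head -n 2500 > "$demo"
fast_line_count "$demo"     # same number as wc -l would report
rm -f "$demo"
```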

If there are several files of this type (such as file_list-201111*) and they all have the same fratio (as defined above), you can write a script that gives output similar to that of
wc -l file_list-201111*

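The script is as follows.
fratio=313      # bytes per line, the same for all the files
tot_val=0
for stream in file_list-201111*
do
    # Turn the size column of ls -l into the expression "SIZE + 0"
    str=`ls -l "$stream" | awk '{print $5 " + "}' | tr -d "\n" | sed 's/$/0/'`
    # Line count = file size / bytes per line
    val=`echo "($str)/$fratio" | bc`
    echo "$val $stream"
    tot_val=`expr $tot_val + $val`
done
echo "$tot_val total"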


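Here, awk places a ' + ' after the file size reported by ls, tr removes the newline and sed appends a 0, giving an expression of the form 'SIZE + 0'. bc evaluates this expression and divides the result by fratio, just as explained in the three steps above. Because the sizes are joined with '+', the same pipeline would also add up the sizes if ls were given several files at once.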
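The same loop can also be written with shell arithmetic instead of bc and expr. This is a sketch under the same assumption that all files share one bytes-per-line value; the function name fast_wc_l is hypothetical. For the files above it would be called as fast_wc_l 313 file_list-201111*.

```sh
#!/bin/sh
# fast_wc_l FRATIO FILE...
# Sketch: print a line count per file and a grand total, like wc -l, for
# fixed-length files that all share the same bytes-per-line value FRATIO.
fast_wc_l() {
    fratio=$1
    shift
    tot_val=0
    for stream in "$@"
    do
        size=$(ls -l "$stream" | awk '{print $5}')   # file size in bytes
        val=$(( size / fratio ))                     # lines in this file
        echo "$val $stream"
        tot_val=$(( tot_val + val ))
    done
    echo "$tot_val total"
}

# Usage with two hypothetical files whose lines are 4 bytes ("abc" + newline).
dir=$(mktemp -d)
yes abc | head -n 3 > "$dir/file_list-1.txt"
yes abc | head -n 5 > "$dir/file_list-2.txt"
fast_wc_l 4 "$dir"/file_list-*.txt
rm -rf "$dir"
```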
