Unix Commands Tutorial: wc

wc is a handy utility that counts lines, words, characters and bytes in a plain text file or on standard input. In shell scripts, it is often used to store the number of lines in a file, or in the output of a command, in a variable for later use. The examples below show how these things can be achieved.
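As a minimal sketch of the variable-capture idea mentioned above (the temporary file and the variable name line_count are made up for illustration), redirecting the file into wc keeps the file name out of the captured value:

```shell
# Create a small throwaway file with three lines (mktemp picks the name).
sample=$(mktemp)
printf 'one\ntwo\nthree\n' > "$sample"

# Redirecting the file into wc makes it print only the number, not the file name.
line_count=$(wc -l < "$sample")

# Some wc implementations pad the number with blanks; arithmetic expansion drops them.
line_count=$((line_count))

echo "The file has $line_count lines"
rm -f "$sample"
```

The same pattern works for the output of a command: pipe the command into wc -l inside the $( ) instead of redirecting a file.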

Example 1)

To get the counts of lines, words and bytes of a file use wc as follows.

/home/mark$ wc Bulk-SMS-file.txt

45 140 990
|  |   +------> No. of bytes
|  +----------> No. of words
+-------------> No. of lines

The individual values can be extracted as fields with awk, but wc also provides an option for each count:

wc -l : Total Lines

wc -w : Total words

wc -c : Total bytes

wc -m : Total characters (the POSIX option; some systems, such as AIX, also accept -k)
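To compare the two approaches, here is a small sketch that gets the word count once as an awk field and once with the -w option. The temporary file and the variable names are illustrative only.

```shell
# A throwaway file with five words on two lines.
sample=$(mktemp)
printf 'first line\nsecond line here\n' > "$sample"

# Field 2 of the default "lines words bytes name" output is the word count.
words_awk=$(wc "$sample" | awk '{ print $2 }')

# The -w option reports the same number directly.
words_opt=$(wc -w < "$sample")
words_opt=$((words_opt))   # drop any padding blanks

echo "awk field: $words_awk, wc -w: $words_opt"
rm -f "$sample"
```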

The wc command treats a word as a non-empty string of characters delimited by white space, and it counts a line each time a newline character occurs.
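These two rules have consequences that are easy to check. The sketch below (the sample text is invented) shows that runs of blanks or tabs do not create extra words, and that a last line without a trailing newline is not counted as a line:

```shell
# Runs of blanks and a tab still separate only three words.
words=$(printf 'alpha   beta\n\tgamma\n' | wc -w)

# Two newline characters mean two lines.
lines=$(printf 'alpha   beta\n\tgamma\n' | wc -l)

# Without the trailing newline, the last line is not counted.
lines_unterminated=$(printf 'alpha   beta\n\tgamma' | wc -l)

# The byte count includes the blanks, the tab and both newlines.
bytes=$(printf 'alpha   beta\n\tgamma\n' | wc -c)

echo "words=$((words)) lines=$((lines)) unterminated=$((lines_unterminated)) bytes=$((bytes))"
```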

Example 2)

Using wc on multiple files.

/home/mark$ ls file_list-201111*

file_list-20111105.txt

file_list-20111113.txt

file_list-20111120.txt

file_list-20111128.txt

/home/mark$ wc -l file_list-201111*

98164 file_list-20111105.txt

531665 file_list-20111113.txt

527303 file_list-20111120.txt

564207 file_list-20111128.txt

1721339 total

This prints the number of lines in each file, followed by the sum of the individual line counts.

The same command can be run without the -l option to get the line, word and byte counts of each file along with their totals.
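As a rough illustration of the no-option form, the sketch below builds two small files in a temporary directory (the names first.txt and second.txt are hypothetical) and picks the sums out of the final "total" row:

```shell
# Two throwaway files in a temporary directory.
dir=$(mktemp -d)
printf 'a b\nc\n' > "$dir/first.txt"
printf 'd e f\n' > "$dir/second.txt"

# Without options, wc prints lines, words and bytes per file plus a "total" row.
wc "$dir"/first.txt "$dir"/second.txt

# The last row carries the sums; awk selects them by field position.
total_lines=$(wc "$dir"/*.txt | tail -n 1 | awk '{ print $1 }')
total_words=$(wc "$dir"/*.txt | tail -n 1 | awk '{ print $2 }')
total_bytes=$(wc "$dir"/*.txt | tail -n 1 | awk '{ print $3 }')

echo "total: $total_lines lines, $total_words words, $total_bytes bytes"
rm -rf "$dir"
```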

Example 3)

wc can also count these values from standard input, for example when the output of another command is piped into it.

/home/mark$ echo "I Love You" | wc

1 3 11

Hmm! wc achieved something cool this time…

Similarly, you can use wc -l as an alternative to grep -c; both report the number of matching lines.

/home/mark$ grep -c tremendous Director-speech.txt

is the same as

/home/mark$ grep tremendous Director-speech.txt | wc -l
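The equivalence can be checked on a small invented file. Note that both forms count matching lines, not occurrences, so a line containing the word twice is counted once:

```shell
# Three lines; the word appears on two of them (twice on the last one).
speech=$(mktemp)
printf 'a tremendous year\nnothing to see\ntremendous, truly tremendous\n' > "$speech"

# grep counts the matching lines itself...
count_grep=$(grep -c tremendous "$speech")

# ...or prints them so that wc -l can count them.
count_wc=$(grep tremendous "$speech" | wc -l)
count_wc=$((count_wc))   # drop any padding blanks

echo "grep -c: $count_grep, grep | wc -l: $count_wc"
rm -f "$speech"
```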

Example 4)

wc -l can take a long time on a very large file, because it has to read the whole file to count the newlines.

If every line in the file is known to contain the same number of bytes, there is a faster way to get the line count.

Assume that you have a continuous file, i.e. a file which has the same number of characters (or bytes) in each line, newline included.

/home/mark$ ls -lrt All_Customers.lst

-rw-r--r-- 1 mark Administ 1321655460912 Dec 16 14:37 All_Customers.lst

/home/mark$ wc -l All_Customers.lst

23456898

The following steps explain the method.

Step1)

Save the first thousand lines of the file in a separate file.

/home/mark$ head -n 1000 All_Customers.lst > All_Customers_1000.lst

Step2)

Get the ratio of the sample file's size to its line count (1000), i.e. the number of bytes per line.

/home/mark$ ls -lrt All_Customers_1000.lst

-rw-r--r-- 1 mark Administ 56344000 Dec 16 14:37 All_Customers_1000.lst

/home/mark$ fsize=`ls -lrt All_Customers_1000.lst | awk '{ print $5 }'`

/home/mark$ fratio=`expr $fsize / 1000`

/home/mark$ echo $fratio

56344

Now fratio stores the number of bytes per line.

Step3)

Now divide the total size of All_Customers.lst by fratio.

/home/mark$ tot_lines=`expr 1321655460912 / $fratio`

/home/mark$ echo $tot_lines

23456898

This is the same value that wc -l reported.

All of these steps can be put into a shell script for any given file of this type.
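One possible way to wrap the three steps into a reusable function is sketched below. The function name estimate_lines and its optional sample-size argument are inventions for this sketch, and wc -c is used to read the sizes instead of parsing ls output. It assumes every line, including the last, has the same length in bytes.

```shell
# estimate_lines FILE [SAMPLE_LINES]
# Estimates the line count of a fixed-width file from its size in bytes.
# The body runs in a subshell, so its variables do not leak to the caller.
estimate_lines() (
    file=$1
    sample=${2:-1000}

    # Total size, plus size and line count of the first few lines.
    total_bytes=$(wc -c < "$file")
    sample_bytes=$(head -n "$sample" "$file" | wc -c)
    sample_lines=$(head -n "$sample" "$file" | wc -l)

    # An empty file (or one without any newline) has no complete lines.
    if [ "$((sample_lines))" -eq 0 ]; then
        echo 0
        return 0
    fi

    # Bytes per line (the "fratio" of the steps above), then the estimate.
    bytes_per_line=$(( sample_bytes / sample_lines ))
    echo $(( total_bytes / bytes_per_line ))
)

# Demo on a generated file: 2500 lines of 10 bytes each (9 digits + newline).
demo=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 2500; i++) printf "%09d\n", i }' > "$demo"
estimated=$(estimate_lines "$demo")
actual=$(wc -l < "$demo")
echo "estimated=$estimated actual=$((actual))"
rm -f "$demo"
```

If the lines are not all the same length, the result is only an approximation, so it is worth comparing it with wc -l once on a representative file.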

If there are several files of this type (such as file_list-201111*) and they all have the same fratio (defined above), you can write a script that produces output similar to that of

wc -l file_list-201111*

The script is as shown.

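# Bytes per line, worked out beforehand as in the steps above

fratio=313

tot_val=0

for stream in file_list-201111*

do

# Build the expression "SIZE + 0" from the size column of ls

str=`ls -lrt $stream|awk '{print $5 " + "}'|tr -d "\n"|sed 's/$/0/'`

# Divide the size by the bytes per line to get the line count

val=`echo "($str)/$fratio"|bc`

echo "$val $stream"

tot_val=`expr $tot_val + $val`

done

echo "$tot_val total"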

Here awk appends ' + ' to each file size reported by ls, and sed adds a 0 at the end, so that the result is a valid expression. It is passed to bc, and its value (the sum) is divided by the ratio, just as explained in the three steps above.
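The same report can also be produced without building an expression for bc. The following is only a sketch of an alternative, not the script above: the function name estimate_many is hypothetical, sizes come from wc -c rather than ls, and the arithmetic is done by the shell. The bytes-per-line value is passed as the first argument and is assumed to be correct for every file.

```shell
# estimate_many BYTES_PER_LINE FILE...
# Prints an estimated line count per file and a final total, like wc -l.
estimate_many() (
    ratio=$1
    shift
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f")        # file size in bytes
        val=$(( size / ratio ))     # estimated number of lines
        echo "$val $f"
        total=$(( total + val ))
    done
    echo "$total total"
)

# Demo with two small generated files whose lines are 10 bytes long.
dir=$(mktemp -d)
printf 'aaaaaaaaa\nbbbbbbbbb\nccccccccc\n' > "$dir/part1.txt"
printf 'ddddddddd\neeeeeeeee\n' > "$dir/part2.txt"
report=$(estimate_many 10 "$dir"/part*.txt)
echo "$report"
rm -rf "$dir"
```

Passing the files as arguments lets the shell expand a pattern such as file_list-201111* directly, which avoids looping over the output of ls.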


Sunday, 29 January 2012
