wc
wc is a handy utility that counts the lines, words, characters and bytes in a plain text file
or on standard input. In shell scripts, it is often used to store the number of lines in a file, or in the output of
a command, in a variable for later use. The examples below show how this can be done.
Example 1)
To get the counts of lines, words and bytes of a file use wc as follows.
/home/mark$ wc Bulk-SMS-file.txt
45 140 990 Bulk-SMS-file.txt
|  |   |
|  |   +---- No. of bytes
|  +-------- No. of words
+----------- No. of lines
The individual values can be retrieved as fields with awk, but wc also provides an option for each one.
wc -l : Total Lines
wc -w : Total words
wc -c : Total bytes
wc -m : Total characters (some systems, such as AIX, also accept -k)
The wc command treats a word as a non-empty string of characters delimited by white space. A line is counted each time a newline character occurs, so a final line without a trailing newline is not counted.
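The counting rules can be checked with a few short pipelines. This is a minimal sketch; the sample strings and variable names are only illustrative.

```shell
# Three newline characters, so wc -l reports 3 lines.
with_newline=$(printf 'one\ntwo\nthree\n' | wc -l)

# Same text without the final newline: only 2 newline characters are seen.
without_newline=$(printf 'one\ntwo\nthree' | wc -l)

# Runs of spaces, tabs and newlines each act as one delimiter between words.
words=$(printf 'a  b\tc\nd\n' | wc -w)

echo "lines with trailing newline:    $with_newline"
echo "lines without trailing newline: $without_newline"
echo "words:                          $words"
```

Some wc implementations pad the numbers with leading spaces, so the echoed values may not line up exactly.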
Example 2)
Using wc on multiple files.
/home/mark$ ls file_list-201111*
file_list-20111105.txt
file_list-20111113.txt
file_list-20111120.txt
file_list-20111128.txt
/home/mark$ wc -l file_list-201111*
98164 file_list-20111105.txt
531665 file_list-20111113.txt
527303 file_list-20111120.txt
564207 file_list-20111128.txt
1721339 total
This gives the number of lines in each file as well as the sum of all the individual line counts.
The same command can be run without the -l option to get the line, word and byte counts of each file along with their totals.
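When a script needs only the grand total, it can either pick the last line of the report or count the concatenated files once. The sketch below assumes hypothetical sample files (part-1.txt and part-2.txt) created in a temporary directory; the names are not from the examples above.

```shell
# Hypothetical sample files created in a temporary directory for illustration.
dir=$(mktemp -d)
printf 'a\nb\n'    > "$dir/part-1.txt"
printf 'c\nd\ne\n' > "$dir/part-2.txt"

# Option 1: take the last line of the report, which holds the total
# when more than one file is given.
total_from_report=$(wc -l "$dir"/part-*.txt | tail -n 1 | awk '{print $1}')

# Option 2: concatenate the files and count once; this also works when
# the pattern matches a single file, where wc prints no "total" line.
total_from_cat=$(cat "$dir"/part-*.txt | wc -l)

echo "from report: $total_from_report, from cat: $total_from_cat"
rm -rf "$dir"
```

The second form is usually the safer choice in scripts, since it does not depend on how many files the pattern matches.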
Example 3)
wc can also read from standard input, so it can count the output of another command through a pipe.
/home/mark$ echo "I Love You" | wc
1 3 11
Hmm! wc achieved something cool this time…
Similarly you can use wc -l as an alternative to grep -c.
/home/mark$ grep -c tremendous Director-speech.txt
is the same as
/home/mark$ grep tremendous Director-speech.txt | wc -l
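Storing a count in a variable for later use can look like the sketch below. The sample file and the variable names (line_count, match_count) are assumptions made for illustration.

```shell
# Hypothetical sample file for illustration.
file=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$file"

# Reading from standard input keeps the file name out of the output,
# so the variable holds only the number.
line_count=$(wc -l < "$file")

# The output of any command can be counted the same way.
match_count=$(grep 'ta' "$file" | wc -l)

if [ "$line_count" -gt 2 ]; then
    echo "file has more than two lines ($line_count)"
fi
echo "lines containing 'ta': $match_count"
rm -f "$file"
```

Redirecting the file into wc avoids having to strip the file name with awk or cut afterwards.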
Example 4)
wc -l may take a long time to count the lines of a very large file, because it has to read the whole file.
If we are sure that every line in the file contains the same number of bytes, there is a faster method.
Assume that you have a continuous file, i.e. a file which has the same number of characters (or bytes) in each line.
/home/mark$ ls -lrt All_Customers.lst
-rw-r--r-- 1 mark Administ 1321655460912 Dec 16 14:37 All_Customers.lst
/home/mark$ wc -l All_Customers.lst
23456898 All_Customers.lst
The following steps explain the method.
Step1)
Save the first thousand lines of the file in a separate file.
/home/mark$ head -n 1000 All_Customers.lst > All_Customers_1000.lst
Step2)
Get the ratio of the size of this sample file to its line count (1000).
/home/mark$ ls -lrt All_Customers_1000.lst
-rw-r--r-- 1 mark Administ 56344000 Dec 16 14:37 All_Customers_1000.lst
/home/mark$ fsize=`ls -lrt All_Customers_1000.lst | awk '{ print $5 }'`
/home/mark$ fratio=`expr $fsize / 1000`
/home/mark$ echo $fratio
56344
Now fratio stores the number of bytes per line.
Step3)
Now divide the total size of All_Customers.lst by fratio.
/home/mark$ tot_lines=`expr 1321655460912 / $fratio`
/home/mark$ echo $tot_lines
23456898
This is the same as the value calculated by wc -l.
All these steps can be used in a shell script for any given file of this type.
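The three steps can be sketched as one shell function. The function name estimate_lines, its optional sample-size argument and the generated demo file are assumptions for illustration, and the sample is piped through wc rather than saved to a separate file. The result is only correct when every line really has the same number of bytes.

```shell
# estimate_lines FILE [SAMPLE]
# Estimates the line count of a file whose lines all have the same length.
# The function name and the default sample of 1000 lines are illustrative.
estimate_lines() {
    file=$1
    sample=${2:-1000}

    # Total size in bytes, taken from the fifth column of ls -l.
    total_bytes=$(ls -l "$file" | awk '{print $5}')

    # Bytes and lines in the first $sample lines (fewer if the file is shorter).
    sample_bytes=$(head -n "$sample" "$file" | wc -c)
    sample_lines=$(head -n "$sample" "$file" | wc -l)
    [ "$sample_lines" -gt 0 ] || { echo 0; return; }

    # Bytes per line, then total size divided by bytes per line.
    bytes_per_line=$((sample_bytes / sample_lines))
    echo $((total_bytes / bytes_per_line))
}

# Demo on a generated file of 5000 lines, each 9 digits plus a newline.
demo=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 5000; i++) printf "%09d\n", i }' > "$demo"
estimated=$(estimate_lines "$demo")
actual=$(wc -l < "$demo")
echo "estimated=$estimated actual=$actual"
rm -f "$demo"
```

Only the sample is read in full; the total size comes from the directory listing, which is where the time saving over wc -l comes from.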
If there are multiple files of this type (such as file_list-201111*) and all of them have the same fratio (defined above), you can write a script that produces output similar to that of
wc -l file_list-201111*
The script is shown below.
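# bytes per line, identical for every file
fratio=313
tot_val=0
for stream in file_list-201111*
do
    # build the expression "<size> + 0" from the size column of ls -l
    str=`ls -l "$stream" | awk '{print $5 " + "}' | tr -d "\n" | sed 's/$/0/'`
    # divide the size by the bytes per line to get the line count
    val=`echo "($str)/$fratio" | bc`
    echo "$val $stream"
    tot_val=`expr $tot_val + $val`
done
echo "$tot_val total"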
Here a ' + ' is appended to the size reported by ls and a 0 is placed at the end, so that the result is a valid expression for the bc command. Its value (the sum) is then divided by the
ratio, just as explained in the three steps above.
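The same per-file division can also be done with shell arithmetic instead of bc. The following is a sketch of that variant, not the script above; it generates three hypothetical fixed-width files in a temporary directory and assumes a ratio of 10 bytes per line.

```shell
# Hypothetical setup: three files of fixed-width lines
# (9 digits plus a newline = 10 bytes per line).
dir=$(mktemp -d)
for n in 3 5 7; do
    awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) printf "%09d\n", i }' \
        > "$dir/file_list-$n.txt"
done

fratio=10      # bytes per line, assumed identical for every file
tot_val=0
for stream in "$dir"/file_list-*; do
    # Size in bytes from the fifth column of ls -l.
    size=$(ls -l "$stream" | awk '{print $5}')
    # Integer division gives the line count of this file.
    val=$((size / fratio))
    echo "$val $stream"
    tot_val=$((tot_val + val))
done
echo "$tot_val total"
rm -rf "$dir"
```

Because each file is divided on its own, no " + 0" padding is needed to form the expression.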