One morning, I felt the urge to count the frequency of byte occurrences in a text file. I also saw this as an excellent opportunity to try my hand at C, since I've recently been living in the land of fancy-pants object-oriented scripting languages like Python.
Here is an example of the program in use. It combines nicely with standard UNIX tools like sort and column:
    % ./countbytes -zeros countbytes.c | sort -r | column
    1764  32       123  41 )      30  37 %       9 119 w       3  55 7
     460 105 i     123  40 (      29  92 \       9  58 :       2 124 |
     418 116 t     116  59 ;      27  60 <       9  51 3       2 113 q
     407 101 e     111 109 m      27  49 1       9  39 '       2  78 N
     309 115 s      96 117 u      27  46 .       7  69 E       2  70 F
     265 110 n      94 103 g      27  42 *       7  52 4       1  84 T
     263  10        86  61 =      23 125 }       6 107 k       1  83 S
     261 114 r      81  98 b      23 123 {       6 106 j       1  77 M
     240 111 o      78  48 0      22 122 z       6  76 L       1  74 J
     203  97 a      75  44 ,      17 120 x       5  73 I       1  71 G
     184 102 f      74  34 "      14  54 6       5  56 8       1  68 D
     136  47 /      64 121 y      13  50 2       4  65 A       1  66 B
     132 108 l      60  95 _      11  62 >       4  35 #       1  63 ?
     129 104 h      52  93 ]      11  38 &       4  33 !
     129 100 d      52  91 [      10  82 R       3  85 U
     127  99 c      41 118 v      10  53 5       3  67 C
     124 112 p      31  43 +      10  45 -       3  57 9
It can count the occurrences of individual bytes and of individual bit positions that are turned on (in each byte), and, as seen above, it can omit entries whose count is zero.
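The counting itself can be sketched in a few lines of C. This is only an illustration of the idea, not the code in countbytes.c: the names `struct counts`, `count_buffer` and `count_stream`, the 4096-byte read buffer, and the choice to number bit 0 as the least significant bit are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout; the real countbytes.c may be organised differently. */
struct counts {
    unsigned long bytes[256]; /* bytes[v]: how often byte value v occurred */
    unsigned long bits[8];    /* bits[p]: how many bytes had bit p set
                                 (assumption: bit 0 is the least significant) */
};

/* Tally every byte in buf, plus every bit position that is set in it. */
void count_buffer(struct counts *c, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        c->bytes[buf[i]]++;
        for (int p = 0; p < 8; p++) {
            if (buf[i] & (1u << p))
                c->bits[p]++;
        }
    }
}

/* Read a stream in chunks and tally it.
 * Returns 0 on success, -1 if a read error occurred. */
int count_stream(struct counts *c, FILE *fp)
{
    unsigned char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        count_buffer(c, buf, n);
    return ferror(fp) ? -1 : 0;
}
```

A caller would zero-initialise a `struct counts` (for example with `struct counts c = {0};`), open each file in binary mode, and pass it to `count_stream`. Using `unsigned char` for the buffer matters here: a plain `char` may be signed, and a negative value would index outside the `bytes` array.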
The usage() of this program is as follows:
    Usage: ./countbytes [-bytes] [+bits] [-zeros] [+v] file ...
    Counts and displays bit and byte occurances in one or more files.
      -bytes  omit byte counts.
      +bits   display bit counts.
      -zeros  omit count=0 entries.
      +v      verbose.
When called with only the option +bits, the program emits at most 264 lines: one for each of the 256 possible byte values, plus one for each of the 8 bit positions. Each line contains either two or three values: the frequency of occurrence, the byte (or bit) number, and an ASCII representation of the byte if its value lies between 32 and 126, inclusive. The options -bytes and -zeros reduce the number of lines printed accordingly.
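The line format described above is easy to express in C. The helper below is a hypothetical sketch, not taken from countbytes.c: the name `format_byte_line`, its parameters, and the single-space field separators are assumptions, and the real program may pad its columns differently.

```c
#include <stdio.h>

/* Format one output line for byte value `value`, seen `count` times.
 * Hypothetical helper; field spacing is an assumption.
 * Returns the number of characters written, or 0 when the line is
 * suppressed because omit_zeros is set and count is zero. */
int format_byte_line(char *out, size_t outsz,
                     unsigned long count, int value, int omit_zeros)
{
    if (omit_zeros && count == 0)
        return 0;

    /* Printable ASCII (32..126) gets a third field showing the character. */
    if (value >= 32 && value <= 126)
        return snprintf(out, outsz, "%lu %d %c\n", count, value, value);

    /* Everything else (control characters, bytes >= 127) gets two fields. */
    return snprintf(out, outsz, "%lu %d\n", count, value);
}
```

Calling this once for each byte value from 0 to 255 yields at most 256 lines, and fewer when `omit_zeros` is set, which mirrors the effect of -zeros described above.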
You can download the source code, countbytes.c, for yourself.
https://michal.guerquin.com/countbytes.html, updated 2005-06-19 19:59 EDT