Count bytes

One morning, I felt the urge to count how often each byte value occurs in a text file. I also saw this as an excellent opportunity to practice my C, since I have recently been living in the land of fancy-pants object-oriented scripting languages like Python.

Here is an example of the program in use. It combines nicely with standard UNIX tools like sort and column:

% ./countbytes -zeros countbytes.c | sort -r | column
1764  32         123  41 )        30  37 %         9 119 w         3  55 7
 460 105 i       123  40 (        29  92 \         9  58 :         2 124 |
 418 116 t       116  59 ;        27  60 <         9  51 3         2 113 q
 407 101 e       111 109 m        27  49 1         9  39 '         2  78 N
 309 115 s        96 117 u        27  46 .         7  69 E         2  70 F
 265 110 n        94 103 g        27  42 *         7  52 4         1  84 T
 263  10          86  61 =        23 125 }         6 107 k         1  83 S
 261 114 r        81  98 b        23 123 {         6 106 j         1  77 M
 240 111 o        78  48 0        22 122 z         6  76 L         1  74 J
 203  97 a        75  44 ,        17 120 x         5  73 I         1  71 G
 184 102 f        74  34 "        14  54 6         5  56 8         1  68 D
 136  47 /        64 121 y        13  50 2         4  65 A         1  66 B
 132 108 l        60  95 _        11  62 >         4  35 #         1  63 ?
 129 104 h        52  93 ]        11  38 &         4  33 !
 129 100 d        52  91 [        10  82 R         3  85 U
 127  99 c        41 118 v        10  53 5         3  67 C
 124 112 p        31  43 +        10  45 -         3  57 9

It can count occurrences of individual byte values and how often each bit position is set across all bytes, and (as seen above) it can omit entries whose count is zero.
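The counting itself needs little more than two arrays of tallies. Below is a minimal sketch, not the author's countbytes.c: the function names count_buffer and count_file are hypothetical, and the sketch assumes 8-bit bytes, a file read in binary mode in fixed-size chunks, and that each byte increments one of 256 byte counters plus one of 8 bit counters for every bit that is set.

```c
#include <stdio.h>

/* Add the bytes of one buffer to the running tallies:
   byte_counts[v] counts bytes equal to v (0..255), and
   bit_counts[k] counts bytes in which bit k (value 1 << k) is set.
   Assumes 8-bit bytes. */
void count_buffer(const unsigned char *buf, size_t len,
                  unsigned long byte_counts[256],
                  unsigned long bit_counts[8])
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        byte_counts[c]++;
        for (int k = 0; k < 8; k++)
            if (c & (1u << k))
                bit_counts[k]++;
    }
}

/* Add the contents of one file to the tallies. The file is opened in
   binary mode ("rb") so no newline translation alters the counts.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int count_file(const char *path,
               unsigned long byte_counts[256],
               unsigned long bit_counts[8])
{
    unsigned char buf[4096];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        count_buffer(buf, n, byte_counts, bit_counts);
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : 0;
}
```

Because count_file adds into arrays owned by the caller, calling it once per file argument yields combined totals. Whether the real program reports per-file or combined counts is not stated here, so treat that as a choice of this sketch.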

The program's usage() message is as follows:

Usage: ./countbytes [-bytes] [+bits] [-zeros] [+v] file ...

Counts and displays bit and byte occurances in one or more files.

  -bytes    omit byte counts.
  +bits     display bit counts.
  -zeros    omit count=0 entries.
  +v        verbose.
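The options above use a + or - prefix to switch a feature on or off relative to its default: byte counts and zero entries are shown unless turned off, while bit counts and verbose output are off unless turned on. One hypothetical way to parse them is sketched below; struct options and parse_options are illustrative names, and the sketch assumes that option words come before the file names and that the first unrecognized argument starts the file list.

```c
#include <string.h>

/* Hypothetical option set mirroring the usage message. */
struct options {
    int show_bytes;   /* on by default, cleared by -bytes */
    int show_bits;    /* off by default, set by +bits */
    int show_zeros;   /* on by default, cleared by -zeros */
    int verbose;      /* off by default, set by +v */
};

/* Fill *opt from the leading option words of argv and return the index
   of the first argument that is not a recognized option (the first file). */
int parse_options(int argc, char *argv[], struct options *opt)
{
    int i;

    opt->show_bytes = 1;
    opt->show_bits = 0;
    opt->show_zeros = 1;
    opt->verbose = 0;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-bytes") == 0)
            opt->show_bytes = 0;
        else if (strcmp(argv[i], "+bits") == 0)
            opt->show_bits = 1;
        else if (strcmp(argv[i], "-zeros") == 0)
            opt->show_zeros = 0;
        else if (strcmp(argv[i], "+v") == 0)
            opt->verbose = 1;
        else
            break;   /* first non-option: file names start here */
    }
    return i;
}
```

For `./countbytes -zeros countbytes.c`, parse_options returns 2, so argv[2] onward are the files to count.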

When called with only the +bits option, the program emits at most 264 lines: 256 byte counts plus 8 bit counts. Each line contains two or three values: the number of occurrences, the byte (or bit) number, and, for a byte between 32 and 126 inclusive, its ASCII character. The -bytes and -zeros options reduce the number of lines printed accordingly.
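A sketch of the output stage, assuming the column layout of the sample output above: the `%4lu %3d %c` format reproduces lines such as ` 460 105 i` and `1764  32  `, while bytes outside 32..126 get only the first two columns. How the real program labels bit lines is not shown here, so this sketch simply prints them with two columns as well. format_entry and print_counts are hypothetical names.

```c
#include <stdio.h>

/* Format one output line into buf and return its length (as snprintf does).
   The character column is added only for byte values 32..126 inclusive;
   bit lines (is_byte == 0) never get one. */
int format_entry(char *buf, size_t size, unsigned long count, int value,
                 int is_byte)
{
    if (is_byte && value >= 32 && value <= 126)
        return snprintf(buf, size, "%4lu %3d %c", count, value, value);
    return snprintf(buf, size, "%4lu %3d", count, value);
}

/* Print the byte table and/or the bit table and return the number of lines.
   With both tables on and zero entries kept, this is 256 + 8 = 264 lines. */
int print_counts(FILE *out, const unsigned long byte_counts[256],
                 const unsigned long bit_counts[8],
                 int show_bytes, int show_bits, int show_zeros)
{
    char line[32];   /* room for a 20-digit count plus the other columns */
    int lines = 0;

    if (show_bytes) {
        for (int v = 0; v < 256; v++) {
            if (!show_zeros && byte_counts[v] == 0)
                continue;
            format_entry(line, sizeof line, byte_counts[v], v, 1);
            fprintf(out, "%s\n", line);
            lines++;
        }
    }
    if (show_bits) {
        for (int k = 0; k < 8; k++) {
            if (!show_zeros && bit_counts[k] == 0)
                continue;
            format_entry(line, sizeof line, bit_counts[k], k, 0);
            fprintf(out, "%s\n", line);
            lines++;
        }
    }
    return lines;
}
```

The right-aligned count column is what lets a plain `sort -r` order lines by count, as long as every count fits in four digits and sort compares bytes (as in the C locale); `sort -rn` compares the counts numerically and does not depend on the padding.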

You can download the source code countbytes.c for yourself.


This is https://michal.guerquin.com/countbytes.html, updated 2005-06-19 19:59 EDT

Contact: michalg at domain where domain is gmail.com