Information is a measure of how much a message grants an observer the ability to predict the world. For a formal description of what this means, see Information: Formalization. Information is observer-dependent: if someone tells both of us your age, you’d learn nothing, while I’d learn how old you are. Information theory gives us tools for quantifying and studying information.

Information is measured in shannons, which are also the units used to measure uncertainty and entropy. Given that you’re about to observe a coin that certainly came up either heads or tails, one shannon is the difference between utter uncertainty about which way the coin came up and total certainty that the coin came up heads. Specifically, the amount of information in an observation is quantified as the logarithm of the reciprocal of the probability that the observer assigned to that observation.

For a version of the previous sentence written in English, see Measuring information. For a discussion of why this quantity in particular is called “information,” see Information: Intro.
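As a minimal sketch of that definition, the following Python snippet (the function name `shannons` is mine, not from the text) computes the information carried by an observation from the probability the observer assigned to it:

```python
import math

def shannons(p: float) -> float:
    """Information, in shannons, carried by an observation
    that the observer assigned probability p."""
    return math.log2(1 / p)

# A fair coin: the observer assigns probability 1/2 to "heads".
print(shannons(0.5))   # 1.0 — one shannon

# A less likely observation carries more information.
print(shannons(0.25))  # 2.0

# A certain observation carries no information at all.
print(shannons(1.0))   # 0.0
```

Observing heads on a fair coin is exactly the one-shannon difference described above: it takes the observer from utter uncertainty to total certainty.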

Information vs Data

The word “information” has a precise, technical meaning within the field of information theory. Information is not to be confused with data, which is a measure of how many messages a communication medium can distinguish in principle. For example, a series of three ones and zeros is 3 bits of data, but the amount of information those three bits carry depends on the observer. Unfortunately, in colloquial usage (and even in some texts on information theory!) the word “information” is used interchangeably with the word “data”; matters are not helped by the fact that the standard unit of information is sometimes called a “bit” (the name for the standard unit of data), despite the fact that these units are distinct. The proper name for a binary unit of information is a “shannon.”

That said, there are many links between information and data (and between shannons and bits). For instance:

  • An object with a data capacity of \(n\) bits can carry anywhere between 0 and \(\infty\) shannons of information (depending on the observer), but the maximum amount of information an observer can consistently expect from observing the object is \(n\) shannons. For details, see Expected info capacity.

  • The number of shannons an observer gets from an observation is equal to the number of bits in the encoding for that observation in their ideal encoding. In other words, shannons measure the number of bits of data you would use to communicate that observation to someone who knows everything you know (except for that one observation). For details, see Ideal encoding and Information as encoding length.

For more on the difference between information and data, see Info vs data.
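The observer-dependence of the 3-bit example can be made concrete. In this sketch (the observers and their probabilities are illustrative assumptions, not from the text), two observers receive the same three bits of data but gain different amounts of information:

```python
import math

def info(p: float) -> float:
    """Shannons of information in an observation assigned probability p."""
    return math.log2(1 / p)

message = "101"  # three bits of data

# Observer A considers all eight 3-bit strings equally likely.
info_a = info(1 / 8)
# Observer B was already 90% sure the message would be "101".
info_b = info(0.9)

print(info_a)  # 3.0 shannons — as much as the 3-bit data capacity allows
print(info_b)  # ≈ 0.152 shannons — B learns almost nothing
```

The data is fixed at 3 bits either way; only the information varies, and observer A’s 3 shannons is the maximum that 3 bits of data capacity can deliver.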

Information and Entropy

Entropy is a measure on probability distributions which, intuitively, quantifies the total uncertainty of the distribution. Specifically, entropy is the number of shannons of information that the distribution expects to gain by being told the actual state of the world. As such, it can be interpreted as a measure of how much information the distribution says it is missing about the world. See also Information and entropy.
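That “expected information gain” reading translates directly into code. A minimal sketch (the `entropy` helper below is my naming, not from the text) weights each outcome’s information by its probability:

```python
import math

def entropy(dist: dict) -> float:
    """Expected number of shannons gained on learning the true outcome:
    the probability-weighted average of log2(1/p) over the distribution."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.9, "tails": 0.1}

print(entropy(fair_coin))    # 1.0 — one full shannon of missing information
print(entropy(biased_coin))  # ≈ 0.469 — less uncertainty, less to learn
```

The biased coin’s lower entropy reflects that the distribution already “knows” most of what the observation will tell it.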


The amount of information an observation carries to an observer is the logarithm of the reciprocal of the probability that they assigned to that observation. In other words, given a probability distribution \(\mathrm P\) over a set \(O\) of possible observations, an observation \(o \in O\) is said to carry \(\log_2\frac{1}{\mathrm P(o)}\) shannons of information with respect to \(\mathrm P\). A different choice for the base of the logarithm corresponds to a different unit of information; see also Converting between units of information. For a full formalization, see Information: Formalization. For an understanding of why information is logarithmic, see Information is logarithmic. For a full understanding of why we call this quantity in particular “information,” see Information: Intro.
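The choice of logarithm base can be sketched as follows. Base 2 gives shannons, base \(e\) gives nats, and base 10 gives hartleys; converting between units is just a change of base (the variable names here are my own):

```python
import math

p = 0.25  # probability the observer assigned to the observation

in_shannons = math.log(1 / p, 2)   # base 2  -> shannons
in_nats     = math.log(1 / p)      # base e  -> nats
in_hartleys = math.log(1 / p, 10)  # base 10 -> hartleys

print(in_shannons)             # 2.0
# Dividing nats by ln(2) converts back to shannons:
print(in_nats / math.log(2))   # 2.0
```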