Information is a measure of how much a message grants an observer the ability to predict the world. For a formal description of what this means, see Information: Formalization. Information is observer-dependent: If someone tells both of us your age, you’d learn nothing, while I’d learn how old you are. Information theory gives us tools for quantifying and studying information.

Information is measured in shannons, which are also the units used to measure uncertainty and entropy. Suppose you are about to observe a coin that certainly came up either heads or tails. One shannon is the difference between utter uncertainty about which way the coin came up (probability 1/2 on each side) and total certainty that the coin came up heads. Specifically, the amount of information in an observation is quantified as the Logarithm of the reciprocal of the Probability that the observer assigned to that observation.

For a version of the previous sentence written in English, see Measuring information. For a discussion of why this quantity in particular is called “information,” see Information: Intro.
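
As a rough illustration of the logarithmic definition above, the following Python sketch computes the information in an observation from the probability the observer assigned to it. The function name `information_in_shannons` is made up for this example and is not taken from any library.

```python
import math

def information_in_shannons(probability: float) -> float:
    """Information, in shannons, carried by an observation to which the
    observer assigned the given probability (hypothetical helper)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # Logarithm (base 2) of the reciprocal of the assigned probability.
    return math.log2(1.0 / probability)

# A coin comes up heads and the observer had assigned probability 1/2.
print(information_in_shannons(0.5))    # 1.0
# An observation the observer was already certain of carries nothing.
print(information_in_shannons(1.0))    # 0.0
# An observation the observer thought had a 1-in-8 chance.
print(information_in_shannons(0.125))  # 3.0
```

The second call mirrors the age example: an observer who already knows the answer assigned it probability 1 and so gains 0 shannons.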

Information vs Data

The word “information” has a precise, technical meaning within the field of Information theory. Information is not to be confused with data, which is a measure of how many messages a communication medium can distinguish in principle. For example, a series of three ones and zeros is 3 bits of data, but the amount of information those three bits carry depends on the observer. Unfortunately, in colloquial usage (and even in some texts on information theory!) the word “information” is used interchangeably with the word “data”; matters are not helped by the fact that the standard unit of information is sometimes called a “bit” (the name for the standard unit of data), despite the fact that these units are distinct. The proper name for a binary unit of information is a “shannon.”

That said, there are many links between information and data (and between shannons and bits). For instance:

An object with a Data capacity of \(n\) bits can carry anywhere between 0 and \(\infty\) shannons of information (depending on the observer), but the maximum amount of information an observer can consistently expect from observing the object is \(n\) shannons. For details, see Expected info capacity.
The number of shannons an observer gets from an observation is equal to the number of bits in the encoding for that observation in their ideal encoding. In other words, shannons measure the number of bits of data you would use to communicate that observation to someone who knows everything you know (except for that one observation). For details, see Ideal encoding and Information as encoding length.
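
Both points can be checked numerically. The sketch below is only an illustration: the two observers, their probabilities, the symbols `A` to `D`, and the prefix code are all made-up assumptions, and the helper names `info` and `expected_info` are hypothetical.

```python
import math
from itertools import product

def info(p: float) -> float:
    """Shannons carried by an observation assigned probability p."""
    return math.log2(1.0 / p)

def expected_info(dist: dict) -> float:
    """Probability-weighted average information over a distribution."""
    return sum(p * info(p) for p in dist.values() if p > 0)

# All eight messages a 3-bit object can distinguish: 3 bits of data.
messages = ["".join(bits) for bits in product("01", repeat=3)]

# Observer A (assumed) considers every message equally likely.
uniform = {m: 1 / 8 for m in messages}
# Observer B (assumed) is fairly sure the message will be "000".
skewed = {m: (0.5 if m == "000" else 0.5 / 7) for m in messages}

print(info(uniform["111"]))       # 3.0 shannons for observer A
print(info(skewed["111"]) > 3)    # True: more than 3 shannons for B
print(expected_info(uniform))     # 3.0, the most one can expect from 3 bits
print(expected_info(skewed) < 3)  # True: B expects less than 3 shannons

# Second point: for probabilities 1/2, 1/4, 1/8, 1/8, a prefix code can
# give each observation a codeword whose bit length equals its shannons.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
for symbol, p in probs.items():
    print(symbol, len(code[symbol]), info(p))
```

The probabilities in the second part were chosen to be powers of 1/2 so that every codeword length is a whole number of bits. For other distributions the information in an observation is generally not an integer.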

For more on the difference between information and data, see Info vs data.

Information and Entropy

Entropy is a function of a probability distribution which, intuitively, quantifies the total uncertainty in that distribution. Specifically, entropy is the number of shannons of information that the distribution expects to gain by being told the actual state of the world. As such, it can be interpreted as a measure of how much information the distribution says it is missing about the world. See also Information and entropy.
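
Read as "expected information," entropy can be computed by weighting the information in each possible observation by its probability. The following is a minimal sketch under that reading; the function name `entropy_in_shannons` and the example coins are assumptions made for illustration.

```python
import math

def entropy_in_shannons(distribution: dict) -> float:
    """Expected information, in shannons, of learning the true outcome.

    Outcomes with probability 0 are skipped: they never occur, so they
    contribute nothing to the expectation.
    """
    return sum(p * math.log2(1.0 / p) for p in distribution.values() if p > 0)

fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.9, "tails": 0.1}
known_coin = {"heads": 1.0, "tails": 0.0}

print(entropy_in_shannons(fair_coin))              # 1.0
print(round(entropy_in_shannons(biased_coin), 3))  # 0.469
print(entropy_in_shannons(known_coin))             # 0.0
```

The more confident the distribution already is, the less information it expects to be missing: the fair coin is missing a full shannon, the biased coin under half a shannon, and the known coin none.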

Formalization

The amount of information an observation carries to an observer is the Logarithm of the reciprocal of the Probability that they assigned to that observation. In other words, given a probability distribution \(\mathrm P\) over a set \(O\) of possible observations, an observation \(o \in O\) is said to carry \(\log_2\frac{1}{\mathrm P(o)}\) shannons of information with respect to \(\mathrm P\). A different choice for the base of the logarithm corresponds to a different unit of information; see also Converting between units of information. For a full formalization, see Information: Formalization. For an understanding of why information is logarithmic, see Information is logarithmic. For a full understanding of why we call this quantity in particular “information,” see Information: Intro.
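
To make the remark about logarithm bases concrete, here is a small sketch that reports the same observation in three units. The names "nat" (base \(e\)) and "hartley" (base 10) are the standard names for those units rather than terms used in this article, and the function name `information_in_units` is hypothetical.

```python
import math

def information_in_units(probability: float) -> dict:
    """Information of an observation under three logarithm bases."""
    reciprocal = 1.0 / probability
    return {
        "shannons": math.log2(reciprocal),   # base 2
        "nats": math.log(reciprocal),        # base e
        "hartleys": math.log10(reciprocal),  # base 10
    }

result = information_in_units(1 / 8)
print(result["shannons"])  # 3.0

# Changing the base only rescales the quantity by a constant factor:
# one shannon is ln(2) nats and log10(2) hartleys.
print(math.isclose(result["nats"], result["shannons"] * math.log(2)))        # True
print(math.isclose(result["hartleys"], result["shannons"] * math.log10(2)))  # True
```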