Reliable prediction

Most statistical learning theory settings (such as PAC learning and online learning) do not provide good enough bounds in realistic high-stakes settings, where the test data does not always look like the training data and a single mistaken prediction can cause catastrophe. It is important to design predictors that can either reliably predict verifiable facts (such as human behavior over a short time scale) or indicate when their predictions might be unreliable.


Imagine that we have a powerful (but not especially reliable) prediction system for predicting human answers to binary questions. We would like to use it to predict human answers whenever we expect these predictions to be reliable, and to avoid outputting predictions whenever we expect them to be unreliable (so we can gather training data instead).

Active learning

In [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)), the learner decides which questions to query the human about. For example, it may query the human about the most ambiguous questions, which provide the most information about how the human answers questions. Unfortunately, this may result in asking the human “weird” questions rather than actually informative ones (this problem is documented in this literature survey). There are some strategies for reducing this problem, such as only asking questions sampled from some realistic generative model for questions.

KWIK (“knows what it knows”) learning

In KWIK learning (a variant of online selective sampling, which is itself a hybrid of [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)) and online learning), a learner sees an arbitrary sequence of questions. The learner has some class of hypotheses for predicting the answers to questions, one of which is good (efficient relative to the learner). For each question, the learner may either output a prediction or ⊥. If the learner outputs a prediction, the prediction must be within ε of a good prediction. If the learner outputs ⊥, then it receives the answer to the question.

This is easy to imagine in the case where there are 100 experts, one of which outputs predictions that are efficient relative to the other experts. Upon receiving a question, the learner asks each expert for its prediction of the answer to the question (as a probability). If all predictions by experts who have done well so far are within ε of each other, then the learner outputs one of these predictions. Otherwise, the learner outputs ⊥, sees the human’s answer to the question, and rewards/penalizes the experts according to their predictions. Eventually, all experts either output good predictions or get penalized for outputting enough bad predictions over time.
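The expert-based scheme above can be sketched in code. This is an illustrative simplification, not the formal KWIK algorithm: actual analyses use more careful loss-based penalties, whereas here an expert is simply eliminated after a fixed number of predictions that land on the wrong side of the observed answer.

```python
class KWIKExperts:
    """Simplified KWIK-style learner over a finite set of experts."""

    def __init__(self, n_experts, epsilon=0.1, max_mistakes=3):
        self.alive = set(range(n_experts))  # experts that have done well so far
        self.epsilon = epsilon
        self.max_mistakes = max_mistakes
        self.mistakes = [0] * n_experts

    def predict(self, expert_probs):
        """expert_probs[i] is expert i's probability for the answer.

        If all surviving experts agree to within epsilon, output their
        (averaged) prediction; otherwise output None, standing in for ⊥.
        """
        probs = [expert_probs[i] for i in self.alive]
        if max(probs) - min(probs) <= self.epsilon:
            return sum(probs) / len(probs)
        return None  # ⊥: ask the human instead of predicting

    def update(self, expert_probs, answer):
        """After outputting ⊥, observe the true answer (0 or 1) and
        penalize surviving experts whose prediction was badly off."""
        for i in list(self.alive):
            self.mistakes[i] += abs(expert_probs[i] - answer) > 0.5
            if self.mistakes[i] >= self.max_mistakes:
                self.alive.discard(i)
```

Each ⊥ round either confirms agreement later or eliminates mistake budget from some bad expert, so with finitely many experts the learner can only output ⊥ a bounded number of times.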

Unfortunately, it is hard to show KWIK-learnability for hypothesis classes more complicated than a small finite set of experts or a class of linear predictors.

Further reading

Active learning for opaque, powerful predictors


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.