Correcting for misinformation using machine learning∗
We propose several algorithms that reduce the classification errors caused by imperfect training sets. A classic example is the misallocation of scarce funds intended for poor households, which arises when income is unobserved and there is data misreporting or corruption at the administrative level. Suppose there are several training samples in which the targets are misclassified at a certain rate. Because the errors are imperfectly correlated across training samples, observations that are wrongly classified in one sample will share characteristics with observations that are classified differently in other samples. We use this insight to develop several machine learning algorithms that reduce this classification error. We apply the algorithms to aid programs that target poor households with limited knowledge of their true income and poverty ranking. In our main specification of the problem, our main algorithm improves the allocation of funds by 37%.

∗This paper is an extended version of a project developed in a machine learning course at Stanford.
†Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305, adugalic@stanford.edu.
‡Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305, tram@stanford.edu.
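The insight in the abstract can be illustrated with a toy simulation. The sketch below is not one of the algorithms proposed in this paper; it is a minimal illustration under strong assumptions that are ours alone: a single observed characteristic `x`, a household that is truly poor when `x < 0`, four samples, and recorded labels flipped independently at a 20% rate (a special case of imperfectly correlated errors). All function and variable names are hypothetical. A simple cutoff rule is fitted to the noisy labels of each sample, and the households in one sample are then relabeled by a majority vote of the rules fitted on the other samples.

```python
import random


def make_sample(n, flip_rate, rng):
    """Simulate one sample (hypothetical data-generating process).

    Each household has one characteristic x and is truly poor if x < 0.
    The recorded label is the true label flipped with probability flip_rate.
    """
    sample = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        true_poor = x < 0.0
        flipped = rng.random() < flip_rate
        recorded_poor = (not true_poor) if flipped else true_poor
        sample.append((x, true_poor, recorded_poor))
    return sample


def fit_threshold(sample):
    """Choose the cutoff t (on a coarse grid) that best matches the
    recorded labels under the rule 'poor if x < t'."""
    best_t, best_correct = 0.0, -1
    for t in [i / 10 for i in range(-20, 21)]:
        correct = sum((x < t) == recorded for x, _, recorded in sample)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def error_rate(labels, sample):
    """Share of labels that disagree with the true poverty status."""
    wrong = sum(label != true for label, (_, true, _) in zip(labels, sample))
    return wrong / len(sample)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
samples = [make_sample(500, 0.2, rng) for _ in range(4)]
thresholds = [fit_threshold(s) for s in samples]

# Relabel the first sample using only rules learned on the other samples.
target = samples[0]
other_thresholds = thresholds[1:]
recorded = [rec for _, _, rec in target]
corrected = [
    sum(x < t for t in other_thresholds) * 2 > len(other_thresholds)
    for x, _, _ in target
]

recorded_err = error_rate(recorded, target)
corrected_err = error_rate(corrected, target)
print(f"error of recorded labels:  {recorded_err:.3f}")
print(f"error of corrected labels: {corrected_err:.3f}")
```

In this toy setting the label noise does not depend on household characteristics, so the correction is much easier than in real targeting data, where misreporting may well be correlated with those characteristics.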