We consider an active search problem in intensionally specified structured spaces. The ultimate goal in this setting is to discover structures from structurally different partitions of a fixed but unknown target class. An example of such a process is that of computer-aided de novo drug design. In the past 20 years several Monte Carlo search heuristics have been developed for this process. Motivated by these hand-crafted search heuristics, we devise a Metropolis--Hastings sampling scheme where the acceptance probability is given by a probabilistic surrogate of the target property, modeled with a maximum-entropy conditional model. The surrogate model is updated in each iteration upon the evaluation of a selected structure. The proposed approach is consistent, and the empirical evidence indicates that it achieves a large structural variety of discovered targets.
We present an effective method for supervised feature construction. The main goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity -- large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. We achieve this goal with a greedy procedure that constructs features by empirically fitting squared error residuals. The proposed constructive procedure is consistent and can output a rich set of features. We evaluate the approach empirically by fitting a linear ridge regression model in the constructed feature space; the results indicate that it outperforms competing methods.
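The greedy residual-fitting idea above can be sketched as follows. The feature class used here (randomly initialised sigmoid units tuned by gradient descent on the current residuals) and all names are illustrative assumptions, not the paper's exact construction:

```python
# Sketch: greedily build features that fit squared-error residuals,
# then refit ridge regression in the constructed feature space.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_unit(X, r, steps=200, lr=0.5):
    """Tune one sigmoid feature f(x) = sigmoid(x @ w + b) so that its
    best linear scaling approximates the current residuals r."""
    n, d = X.shape
    w, b = rng.normal(size=d), 0.0
    for _ in range(steps):
        f = sigmoid(X @ w + b)
        # optimal least-squares scaling of the feature onto r
        c = (f @ r) / max(f @ f, 1e-12)
        err = c * f - r
        # gradient of the squared error w.r.t. the unit parameters
        g = err * c * f * (1.0 - f)
        w -= lr * (X.T @ g) / n
        b -= lr * g.mean()
    return w, b

def construct_features(X, y, n_features=20, ridge=1e-3):
    feats = []
    Phi = np.ones((len(y), 1))       # bias column
    resid = y - y.mean()
    beta = np.array([y.mean()])
    for _ in range(n_features):
        w, b = fit_unit(X, resid)    # fit a new feature to the residuals
        feats.append((w, b))
        Phi = np.column_stack([Phi, sigmoid(X @ w + b)])
        # ridge regression in the constructed feature space
        beta = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]),
                               Phi.T @ y)
        resid = y - Phi @ beta
    return feats, beta

# toy regression problem
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
feats, beta = construct_features(X, y)
```

Each round adds one feature chosen to explain what the current linear model still gets wrong, so the training error is driven down greedily feature by feature.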
We investigate a novel approach for intuitive interaction with a data set for explorative data analysis. The key idea is that a user can directly interact with a two- or three-dimensional embedding of the data and actively place data points at desired locations. To achieve this, we propose a variant of semisupervised kernel PCA which respects the placement of control points and maximizes the variance of the unlabelled data along the ‘directions’ of the embedding.
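A simplified stand-in for this interaction can be sketched as follows: compute an ordinary kernel PCA embedding and then affinely align its leading components so that the user's control points land at their chosen 2-D locations. This is not the paper's constrained-variance formulation -- it only illustrates the interaction loop -- and the RBF kernel and all names are illustrative assumptions:

```python
# Sketch: kernel PCA followed by affine alignment to user-placed
# control points (a simplified stand-in for semisupervised kernel PCA).
import numpy as np

def rbf_kernel(X, gamma=0.5):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_pca(K, n_components=5):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    # embedding coordinates: eigenvectors scaled by sqrt(eigenvalue)
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def place(Z, control_idx, targets):
    """Affine map of the kernel PCA coordinates that puts the control
    points (rows control_idx of Z) at the user-given 2-D targets."""
    A = np.column_stack([Z[control_idx], np.ones(len(control_idx))])
    W, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return np.column_stack([Z, np.ones(len(Z))]) @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Z = kernel_pca(rbf_kernel(X))
# the user pins three points to chosen screen locations
ctrl = [0, 1, 2]
targets = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
E = place(Z, ctrl, targets)
```

With only a few control points the affine fit is exact, so the pinned points sit precisely where the user placed them while the remaining points follow the variance-maximizing kernel PCA directions.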
We investigate the problem of constructing novel representatives of a fixed but unknown concept over a structured, discrete domain, such as the set of unlabeled graphs. In particular, we consider an online scenario in which the learning algorithm immediately observes the true label of each proposed structure. This problem is highly relevant in many application domains and has so far been tackled mostly by special-purpose algorithms. We propose a general algorithm based on iteratively sampling novel structures from the domain and updating a conditional probability model of the concept based on the observed labels. The conditional probability model is used by the sampling procedure to focus on representatives of the desired concept. An empirical study of our approach demonstrates its effectiveness on a variety of problems.

In this paper, we investigate an iterative process for de novo construction of structured concept representatives, focusing on the construction of graphs with desirable properties. In each step, we first sample a candidate from the posterior distribution p(x | y) of instances x given the desired label y and obtain its true label from the oracle. We then update the conditional distribution p(y | x) by fitting a probability model to the current set of examples. The posterior distribution is initially equal to some given instrumental distribution and is in further iterations updated by the learned conditional probability model. The procedure to obtain samples from the updated posterior resembles a Metropolis-Hastings chain in which the instrumental distribution serves as the proposal distribution and the acceptance probability is computed from the conditional probability model. A simple argument shows that the stationary distribution of this chain is exactly the posterior p(x | y). To be able to discover concept representatives, the support of the instrumental distribution must contain the desired concept.
The uniform distribution is generally a safe choice, as it guarantees that the entire space will eventually be discovered. In the absence of any other information about the desired concept, we advocate the uniform distribution from a maximum-entropy standpoint. However, when domain-specific knowledge is available, the instrumental distribution can be modified appropriately. This can be especially useful when the concept class is small relative to the entire search space, in which case a uniform sampler may require a very large number of samples before it encounters a positive candidate.
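The sampling-and-update loop described above can be sketched on a toy binary-vector domain. The logistic-regression surrogate stands in for the maximum-entropy conditional model; the oracle, the uniform proposal, and all names are illustrative assumptions rather than the paper's setup:

```python
# Sketch: Metropolis-Hastings-style sampling where the acceptance
# probability comes from a surrogate conditional model that is
# refit after every oracle evaluation.
import numpy as np

rng = np.random.default_rng(0)
DIM = 10

def oracle(x):
    # Hypothetical target concept: binary vectors with more ones than zeros.
    return int(x.sum() > DIM // 2)

def propose():
    # Uniform instrumental (proposal) distribution over {0, 1}^DIM.
    return rng.integers(0, 2, size=DIM)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

class Surrogate:
    """Logistic regression (a maximum-entropy conditional model),
    warm-started and refit on all labeled examples after each query."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.b = 0.0

    def prob(self, x):
        return sigmoid(x @ self.w + self.b)

    def fit(self, X, y, steps=50, lr=0.1):
        X, y = np.asarray(X, float), np.asarray(y, float)
        for _ in range(steps):
            g = sigmoid(X @ self.w + self.b) - y   # gradient of the log loss
            self.w -= lr * (X.T @ g) / len(y)
            self.b -= lr * g.mean()

model = Surrogate(DIM)
X_seen, y_seen = [], []
x_cur = propose()

for _ in range(500):
    x_new = propose()
    # With a symmetric (uniform) proposal, the Metropolis-Hastings
    # acceptance ratio reduces to a ratio of surrogate probabilities.
    accept = min(1.0, model.prob(x_new) / max(model.prob(x_cur), 1e-12))
    if rng.random() < accept:
        x_cur = x_new
    # Evaluate the selected structure and update the surrogate.
    X_seen.append(x_cur.copy())
    y_seen.append(oracle(x_cur))
    model.fit(X_seen, y_seen)

hit_rate = float(np.mean(y_seen[-100:]))  # fraction of recent positives
```

Because the proposal is symmetric, the Hastings correction cancels and only the surrogate ratio appears in the acceptance step; as the surrogate sharpens, the chain concentrates on representatives of the target concept.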