One of the most unique and exciting aspects of machine learning is its multidisciplinary nature which brings
together researchers from different backgrounds and often creates diverse areas of research. To cope with such a
multifarious nature, my research methodology is to approach a problem from the theoretical point of view and
develop applicable algorithms by incorporating the relevant domain knowledge. The research areas of interest
include statistical learning theory, reinforcement learning, semi-supervised learning, kernel methods, Bayesian
inference, and Bayesian nonparametric. By utilizing the aforementioned machine learning techniques, the main goal
of my research is to understand the phenomena in complex and large-scale data including, but not limited to, an
analysis of networked data, computational biology, bioinformatics, computer vision, natural language processing,
and climate informatics. Briefly, my research interest lies in the area of theoretical machine learning and its
applications in a variety of fields.
Examples of research topics that I am interested in include:
It is basically the theory that governs problems of estimating the functional dependency from a set of empirical
data, which is quite broad and closely related to classical problems in statistics including discriminant analysis and
the density estimation problem. More specifically, we are given N pairs of observations {(x_1,y_1),...,(x_N,y_N)} called
training set, which are independently and identically distributed (i.i.d.) according to some unknown but fixed
probability distribution function F(x,y). The goal of the learning machine is to construct an appropriate approximation
to the functional dependency f:X->Y using only data in the training set, where x_i in X are the input to function and
y_i in Y are the output value of f for the corresponding data.
Many problem settings in machine learning lack a direct interaction with the environment and its simulation is
usually to complex to implement. Instead, it may be easier to collect empirical data from which the desire models
can be learnt. Most algorithms that learn from empirical data generally fall into two broad categories namely
supervised and unsupervised learning. Many researches in this framework have been devoted to solve classification,
regression, and dimensionality reduction problems.
In difficult classification and regression problems, the real world data often require non-linear methods to
achieve a successful application. Instead of constructing the non-linear model explicitly, the kernel method allows
implicit mapping of data from the input space to the kernel-induced space. A linear model in this new space
could correspond to non-linear model in the input space. Consequently, a powerful non-linear model can be built
efficiently using kernel methods. Since a kernel constitutes the prior knowledge about a task, the choice of kernel
functions plays a crucial role in the success of the algorithms.
In addition to the successful application in SVM, the kernel methods have been shown to be very fruitful in
regularization framework. One of the most powerful results is the Representer theorem. Roughly speaking, it
states that the solutions to certain regularization problems have a finite representation. This result helps reduce
the complexity of learning algorithms considerably as one can efficiently find the optimal model in high-dimensional
feature space through much simpler representation of the solutions. However, the drawback of this approach is
that strong constraints have to be met, e.g., the smoothness penalty to monotonic function of the RKHS norms,
for its solution to admit the finite representation. As a result, this precludes the models based on l1 -penalty such as
LASSO. Since the regularization has been shown to have a strong connection with Bayesian analysis, it would be
interesting to approach the kernel methods from Bayesian perspective. More specifically, the regularizer, for
example, acts as a constraint for functions contained in the reproducing kernel Hilbert space (RKHS) induced
by the kernel. This constraint is in fact the prior knowledge incorporated into the process of estimation. The
Bayesian approach would possibly allow the relaxation of these constraints and enable estimates of the confidence
and reliability without additional methods such as bootstrapping or cross-validation.
Learning algorithms in supervised learning have been used successfully in many applications such as hand-written
recognition, object detection, and speech recognition. However, most of these techniques suffer from the poverty of
labelled examples. Labelled examples in supervised learning are often difficult to obtain, as opposed to unlabelled
examples that are easier to collect. Thus far, attempts have been made to exploit unlabelled data in addition
to labelled data. Semi-supervised learning (SSL) is one of the machine learning techniques that copes with this
problem. With few labelled examples, the goal of SSL is to create as accurate as or even better classifiers than
what we would obtain from traditional supervised learning.
The exploitation of large number of unlabelled examples in conjunction with few labelled examples will become
increasingly important for many learning algorithms as illustrated in SSL. Most successful applications of SSL,
however, often rely on strong assumptions. For example, the optimal classifier is assumed to go through a low
density region of the data space. In general, these assumptions assert that there are some non-trivial relationships
between the distributions of labelled and unlabelled data. However, they do not always guarantee the superior
performance of the classifiers. It is sometimes the case that when assumptions are made but do not hold, it can
degrade the performance and can be worse than supervised learning. Moreover, the advantage of unlabelled data
in SSL is still unclear, so there should be more theoretical analysis regarding this issue under more general problem
setting.
In conclusion, learning problems are becoming more complex and require increasingly sophisticate algorithms
and huge amount of data. Even though such data are abundant and cheap to obtain, their conversion into
valid training data for a specific task requires human intervention which could be expensive and time consuming.
Therefore, semi-supervised learning will certainly become one of the most significant learning algorithms in the
future.
When the supervision is not available, the goal of machine learning is instead to discover structures underlying
the data. Many tasks in unsupervised learning are closely related to several problems in statistics where the
probabilistic distributions are used to model uncertainty about the world and objects of interest. These probabilities
are attributed to encode the subjective levels of belief. Bayesian inference, as opposed to frequentist inference,
allows one to update the levels of belief by combining the prior knowledge and observation evidence. This process
of updating the probabilities reflects many situations in our everyday life. Moreover, there is also a growing
connection between Bayesian methods and Markov chain Monte Carlo (MCMC) simulations since complex models are often
intractable.
In Bayesian parametric models, the use of probability distributions restricts the class of objects that can be
modeled to the finite-dimensional objects. The framework of Bayesian nonparametric models, on the other hand,
provides the flexible probabilistic representations that allow us to work with more general objects such as lists,
trees, and partitions. To achieve this flexibility, the prior and posterior are no longer restricted to the probability
distributions, but are general stochastic processes. Moving from simple probability distributions to stochastic
processes provides the possibility to manipulate the objects in infinite-dimensional space. The Gaussian process
(GP), Chinese restaurant process (CRP), and Indian buffet process (IBP) are well-known Bayesian nonparametric
models that have been used in many successful applications.
To conclude, my research interest lies in general machine learning and the problems I have discussed so far. In
my opinion, it is impossible for anyone to master machine learning as its core knowledge is an overlap of several
disciplines and its applications range across a wide variety of fields. Although recent machine learning research is
producing impressive advances, there remains a great deal of work to be done. As a result, I am always open to
any challenging problems other than what I have mentioned that will facilitate in understanding the real-world
phenomena.
If you are interested, feel free to contact me as exciting collaborations are always welcome.