In my research I effectively use computers to solve challenging decision-making and control problems. In particular, I focus on developing general tools that enable a machine to adaptively learn a solution. This is a far superior method than executing a hand-coded solution, because the problems we face today contain uncertainty, and thus do not admit static solutions.
I build on the strength of the current computer technology to manipulate numbers and complex data structures extremely quickly. In extracting statistics from huge data sets, evaluating complex functions, and solving large systems of equations machines routinely outperform people. I am exploiting this computational power in conjunction with solid statistical methods to provide a basis for handling the inherent uncertainty and nondeterminism that characterizes most outstanding decision-making problems.
In joint work with Michael Littman at Duke and AT&T Labs, I focused on learning methods for the algorithm selection problem (adaptively and recursively choosing the appropriate algorithm for each incoming problem instance). Eventually, I came to realize that there is need for more efficient and powerful reinforcement-learning methods, so during my Ph.D. work with Ronald Parr at Duke, I focused on general-purpose reinforcement-learning algorithms with goals in two directions: efficient utilization of data and scaling to large problems. I used techniques from linear algebra, function approximation, linear programming, monte-carlo estimation, and classification to devise and propose learning algorithms that meet the challenges of a wide range of decision making problems.
I am currently doing research in this area at Georgia Tech and apply my methods to industrial problems (disassembly planning, routing in re-entrant lines) with a plan to continuing on problems from operations research (economic markets and auctions, scheduling), networking (dynamic packet routing, router/server reconfiguration), robotics (coordination, planning), and meta--computation (using learning and reasoning to build ``smarter'' computers that make better use of their resources, as opposed to simply faster computers). During this process, I expect to raise more fundamental and/or theoretical questions. In parallel, I collaborate with medical doctors from Emory University on supervised learning methods for computer-aided diagnosis of gastrointestinal bleeding.
In the past, I have established a solid basis in neural systems (Hopfield networks, neural maps), robotics (path planning, motion control), human-computer interaction (speech interfaces), and DNA computation (self--assembly). I still maintain a strong interest in all these fields and I am actively seeking collaborations along these lines.
My long-term vision is two-fold and bidirectional in terms of interdisciplinary collaboration. On one hand, I want to spread the machine learning technology by using it to solve outstanding problems in other disciplines. Computational biology, medical diagnosis, operations research, and robotics are just a few sample areas where uncertainty is inherent and machine learning can make a diŽerence. On the other hand, I want to bring new tools, developed in other disciplines, into machine learning. I believe that mathematics, statistics, control theory, and physics will be some of the main such collaborators in the years to come. Tools, such as wavelets, experimental design, adaptive control, and chaotic dynamics, have the potential of contributing significantly to the core challenges of machine learning. Under this view, I envision my research career crossing the narrow borders of strict specialization while maintaining a scholarly expertise on a very promising field.
My Ph.D. dissertation is on "Efficient Approximate Policy Iteration Methods for Sequential Decision Making in Reinforcement Learning". Briefly, I developed two algorithms for the control problem in reinforcement learning. Least-Squares Policy Iteration (LSPI) uses approximate value function representations and Rollout Classification Policy Iteration (RCPI) uses approximate policy representations. Both algorithms are based on the approximate policy iteration framework and therefore inherit certain stability and soundness properties.
Here is my Ph.D. dissertation committee:
Check out the publications page for
related published papers.
An online copy of my thesis will appear on this page soon.
As part of my Ph.D. dissertation I worked on the use of "Least-Squares Methods in Reinforcement Learning
for Control". Briefly, I used linear architectures and
least-squares projections for value function approximation in order to
solve large-scale control problems using relatively few training data.
The core algorithm, Least-Squares Policy
Iteration (LSPI), is simple, stable, data-efficient, and
widely applicable. Using LSPI and its variations I was able to obtain
very good results in several collaborative/competitive single-/multi-
agent domains.
Model-Free Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Proceedings of NIPS*2001: Neural Information Processing Systems: Natural and Synthetic
Vancouver, BC, December 2001, pp. 1547-1554.
Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Journal of Machine Learning Research, 4, 2003, pp. 1107-1149.
Check out the publications page for more
related published papers.
For additional information on LSPI and a code distribution go to the LSPI page.
Algorithm Selection using Reinforcement Learning
Michail G. Lagoudakis and Michael L. Littman
Proceedings of the 17th International Conference on Machine Learning
Stanford, CA, June 2000, pp. 511-518
Learning to Select Branching Rules in the DPLL Procedure for Satisfiability
Michail G. Lagoudakis and Michael L. Littman
Electronic Notes in Discrete Mathematics (ENDM), Vol. 9, Elsevier
LICS 2001 Workshop on Theory and Applications of Satisfiability Testing (SAT 2001)
Boston, MA, June 14-15, 2001 (ENDM Volume 9 online)
You can also check out RLSAT,
a DPLL like solver for SAT and #SAT problems with learning
capabilities.
2D DNA Self-Assembly for Satisfiability
Michail G. Lagoudakis and Thomas H. LaBean
Proceedings of the 5th DIMACS International Meeting on DNA Based Computers
MIT, Boston, MA, June 1999, pp. 139-152
Here is my M.Sc. thesis committee:
And here is a poster describing
the main points of my thesis.
Neural Maps for Mobile Robot Navigation
Michail G. Lagoudakis and Anthony S. Maida
Proceedings of the 1999 IEEE International Joint Conference on Neural Networks
Washington, D.C., July 1999
Mobile Robot Local Navigation with a Polar Neural Map
M.Sc. Thesis
Center for Advanced Computer Studies, University of Louisiana, Lafayette, 1998