Friday, May 6, 2016

Real-time machine learning

In this post, I give an overview of my recent research efforts in the area of real-time machine learning. I assume a basic familiarity with machine learning concepts; for readers who would find it helpful, I've written a brief and friendly introduction to machine learning that can be read before proceeding with this post.

Definitions

Arthur Samuel, in his classic 1959 paper "Some studies in machine learning using the game of Checkers", stated the goals of machine learning in the following terms:
...it is necessary to specify methods of problem solution in minute and exact detail, a time-consuming and costly procedure. Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.
In Liu and Layland's 1973 paper "Scheduling algorithms for multiprogramming in a hard-real-time environment", a key characteristic ("Assumption 4", in their description) of a task that is amenable to real-time scheduling is:
Run-time for each task is constant for that task and does not vary with time. Run-time refers to the time which is taken by a processor to execute the task without interruption.
Putting these two ideas together, then, my current focus is on finding appropriate algorithms and data structures for real-time machine learning. Real-time machine learning data structures have these characteristics:
  • By learning from experience, they reduce programmer effort in creating and deploying a software system.
  • They update themselves to incorporate each new experience in constant time.
There are a couple of existing data structures with these characteristics that I'd like to highlight:
  • Q-learning maintains a two-dimensional array as its core data structure. This array is indexed by states and actions. For each state-action pair, an expected-reward ("Q") value is stored. This value is what the system expects to receive for applying the given action in the given state. 
    • Assuming that a mapping from sensor values to states can be performed in constant time, this is a real-time algorithm.
    • By adjusting the Q-values based on experience, the agent's behavior should become more consistent with the system design goals (a sketch of this update appears after this list).
  • The Kohonen Self-Organizing Map (SOM) is an unsupervised learning algorithm that maintains a two-dimensional array of clusters. In its traditional formulation, these clusters are formed from multiple passes over the input data. 
    • By following the example of Touzet 1997 ("Neural reinforcement learning for behavior synthesis"), the SOM can be reconfigured to incorporate one input at a time, thus becoming a real-time machine learning algorithm (a sketch of this single-input update appears below).
    • The specific task Touzet addressed was for the SOM to cluster sonar readings. Each cluster then became a state designation for Q-learning.
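To make the constant-time claim concrete, here is a minimal sketch of the tabular Q-learning update in Java. The class and parameter names (QTable, alpha, gamma) are my own illustration rather than code from any particular system; the point is simply that each experience touches one table cell plus a single scan over the actions.

// Minimal sketch of tabular Q-learning. q[state][action] holds the
// expected-reward ("Q") value for applying that action in that state.
public class QTable {
    private final double[][] q;
    private final double alpha;   // learning rate
    private final double gamma;   // discount factor

    public QTable(int numStates, int numActions, double alpha, double gamma) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // One experience = one cell update plus one scan over the actions,
    // so the cost per experience is constant for a fixed action set.
    public void update(int state, int action, double reward, int nextState) {
        double best = q[nextState][0];
        for (int a = 1; a < q[nextState].length; a++) {
            best = Math.max(best, q[nextState][a]);
        }
        q[state][action] += alpha * (reward + gamma * best - q[state][action]);
    }

    // Greedy action choice for the given state.
    public int bestAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) {
                best = a;
            }
        }
        return best;
    }
}
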
One reason I find unsupervised learning algorithms like the SOM intriguing is that the concept of a cluster can be used to model the formation of invariant representations in the neocortex. For each concept we know, we have a sensory representation in the neocortex. For example, for the concept of an "apple", the neocortex stores a representation of what we expect to perceive through the senses. Unsupervised learning algorithms provide a means of modeling this computationally. From this perspective, the application of the SOM to Q-learning could be interpreted as a learning architecture that models a cognitive process.
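Similarly, here is a minimal sketch of the single-input SOM update mentioned in the list above: each arriving vector nudges the best-matching unit and its grid neighbors toward itself, one input at a time. The class name, the Gaussian neighborhood function, and the fixed learning rate are simplifying assumptions of mine, not details taken from Touzet's formulation.

// Illustrative sketch of a SOM that incorporates one input at a time.
// weights[row][col] is the reference vector for the unit at (row, col).
public class OnlineSOM {
    private final double[][][] weights;
    private final double learningRate;        // assumed fixed for simplicity
    private final double neighborhoodRadius;  // assumed fixed for simplicity

    public OnlineSOM(int rows, int cols, int dim,
                     double learningRate, double neighborhoodRadius) {
        this.weights = new double[rows][cols][dim];
        this.learningRate = learningRate;
        this.neighborhoodRadius = neighborhoodRadius;
    }

    // For a fixed grid and vector size, each call takes constant time.
    public void update(double[] input) {
        // Find the best-matching unit.
        int bestR = 0, bestC = 0;
        double bestDist = Double.MAX_VALUE;
        for (int r = 0; r < weights.length; r++) {
            for (int c = 0; c < weights[r].length; c++) {
                double d = squaredDistance(input, weights[r][c]);
                if (d < bestDist) { bestDist = d; bestR = r; bestC = c; }
            }
        }
        // Pull every unit toward the input, weighted by grid distance to the winner.
        for (int r = 0; r < weights.length; r++) {
            for (int c = 0; c < weights[r].length; c++) {
                double gridDistSq = (r - bestR) * (r - bestR) + (c - bestC) * (c - bestC);
                double influence = Math.exp(-gridDistSq /
                        (2 * neighborhoodRadius * neighborhoodRadius));
                for (int i = 0; i < input.length; i++) {
                    weights[r][c][i] += learningRate * influence * (input[i] - weights[r][c][i]);
                }
            }
        }
    }

    private static double squaredDistance(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return sum;
    }
}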

Bounded Self-Organizing Clusters

In a later paper (Touzet 2006, "Modeling and simulation of elementary robot behaviors using associative memories"), Touzet developed an alternative to Q-learning using two SOMs. Intrigued by the concept, I attempted to replicate it. However, I became frustrated with the inadequate coverage of the input space that the SOM gave me.

What do I mean by "inadequate coverage"? The performance metric I have chosen for this purpose is to minimize the Sum-of-Squared-Differences (SSD) between each training input and the reference input for its best-matching cluster. Informally speaking, I'd like every input to be "reasonably close" to the reference input of some cluster.
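Written out as code, the metric looks like the following. I assume here that both the training inputs and the cluster reference inputs are stored as double arrays of equal length; the class and method names are illustrative.

// Coverage metric: the sum, over all training inputs, of the squared
// distance to the reference input of the best-matching cluster.
// Lower totals mean better coverage of the input space.
public class CoverageMetric {
    public static double totalSSD(double[][] inputs, double[][] references) {
        double total = 0.0;
        for (double[] input : inputs) {
            double best = Double.MAX_VALUE;
            for (double[] ref : references) {
                double ssd = 0.0;
                for (int i = 0; i < input.length; i++) {
                    double diff = input[i] - ref[i];
                    ssd += diff * diff;
                }
                best = Math.min(best, ssd);   // SSD to the best-matching cluster
            }
            total += best;
        }
        return total;
    }
}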

To address this problem, I devised a novel real-time clustering algorithm, Bounded Self-Organizing Clusters (BSOC). The core of the idea is as follows (a sketch appears after this list):
  • Initialize with a fixed maximum number M of clusters.
  • Until M clusters are reached, save every input as a reference input for a new cluster.
  • Once M clusters are in place, as each new input arrives merge the two closest reference inputs into a new reference input, thus creating space for the new arrival.
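Here is the sketch promised above. It follows the three bullets literally; the plain element-wise averaging used for the merge, along with the class and method names, are simplifying assumptions of mine rather than the exact algorithm given in the paper.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the BSOC core idea: keep at most M reference
// inputs, and once full, merge the two closest references to make room
// for each new arrival.
public class BoundedClusters {
    private final int maxClusters;                               // M
    private final List<double[]> references = new ArrayList<>();

    public BoundedClusters(int maxClusters) {
        this.maxClusters = maxClusters;
    }

    public void insert(double[] input) {
        if (references.size() == maxClusters) {
            mergeClosestPair();                  // frees one slot
        }
        references.add(input.clone());           // every input becomes a reference
    }

    private void mergeClosestPair() {
        int bestI = 0, bestJ = 1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < references.size(); i++) {
            for (int j = i + 1; j < references.size(); j++) {
                double d = squaredDistance(references.get(i), references.get(j));
                if (d < bestDist) { bestDist = d; bestI = i; bestJ = j; }
            }
        }
        double[] a = references.get(bestI);
        double[] b = references.get(bestJ);
        double[] merged = new double[a.length];
        for (int k = 0; k < a.length; k++) {
            merged[k] = (a[k] + b[k]) / 2.0;     // simple unweighted merge (assumption)
        }
        references.remove(bestJ);                // remove the higher index first
        references.set(bestI, merged);
    }

    private static double squaredDistance(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return sum;
    }
}

Because M is fixed, the pairwise scan in the merge is bounded, so the cost of incorporating each input does not grow with the number of inputs seen, which is what qualifies the data structure as real-time in the sense defined above.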
In my paper "Real-time unsupervised clustering", presented at MAICS-2016, I describe the algorithm in full and report an experimental assessment showing that it outperforms the SOM on three data sets I assembled. The talk I gave at the conference is also available for viewing, for anyone who is interested.

My implementation is available via Github. It is an Eclipse project that should be ready to run, and it includes several things, among them the VideoView demonstration program discussed below.

Real-Time Performance

A relevant issue that I did not address in the paper is the actual frame rate achievable when running this algorithm on physical hardware with images as the input representation. I presently have two such implementations: one for the Lego Mindstorms EV3, and the other running on my MacBook Pro Retina.

Device    Resolution   # Clusters   Frames/second
EV3       160x120      16           0.84
EV3       80x60        16           2.85
EV3       40x30        16           9.19
EV3       40x30        32           5.52
MacBook   176x144      16           5.37
MacBook   88x72        16           12.02
MacBook   88x72        32           10.60
MacBook   88x72        64           6.02

With a modest number of clusters and a small but usable image size, we can see that real-time processing is achievable with this data structure.

Demonstration of Video Clustering

This demonstration, the second video discussed in the paper, can be replicated by running the VideoView program in the distributed code. It demonstrates the following features of BSOC:
  • When driving straight, the cluster assignment for each image in the sequence is relatively stable.
  • The reference inputs are reasonable interpolations of the input video frames from which they were created.

Next Steps

Supervised Learning

This summer, I will work with a Hendrix undergraduate student to use BSOC as the basis for a supervised learning algorithm. BSOC will serve as a back-end for an implementation of the k-nearest-neighbor (kNN) algorithm. The standard kNN algorithm retains every labeled training input, so as the number of training samples grows, retrieval time increases proportionately, preventing it from working as a real-time machine learning algorithm. With BSOC as a kNN back-end, we hope to characterize the tradeoff between meeting real-time deadlines and the quality of the learned classifier.
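To make the plan a bit more concrete, here is a speculative sketch of kNN classification over a bounded set of labeled reference inputs. The LabeledReference type, the unweighted voting, and the question of how labels would survive cluster merges are all assumptions and open design questions, not decisions we have already made.

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Speculative sketch: classify a query by majority vote among the k
// closest labeled reference inputs retained by the bounded clusterer.
public class BoundedKNN {

    public static class LabeledReference {
        final double[] values;
        final String label;
        public LabeledReference(double[] values, String label) {
            this.values = values;
            this.label = label;
        }
    }

    // Assumes refs is non-empty and k >= 1.
    public static String classify(double[] query, LabeledReference[] refs, int k) {
        LabeledReference[] sorted = refs.clone();
        Arrays.sort(sorted, Comparator.comparingDouble(
                (LabeledReference r) -> squaredDistance(query, r.values)));
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < Math.min(k, sorted.length); i++) {
            votes.merge(sorted[i].label, 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    private static double squaredDistance(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return sum;
    }
}

Because the number of retained references is bounded by M, the sort above runs over a fixed-size set, which is what keeps retrieval time from growing with the number of training samples.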

Improved Cluster Matching with Bag-of-Words

As the number of clusters and the image size increase, time performance degrades rapidly. We plan to investigate using BRIEF descriptors to create image-vocabulary trees, following the approach described by Gálvez-López and Tardós in their paper "Bags of Binary Words for Fast Place Recognition in Image Sequences." By indexing the clusters based on the visual words they contain, we hope to be able to ignore large numbers of irrelevant reference inputs when finding the best match. An additional possibility would be to redefine the distance metric itself in terms of the visual words contained in the images.
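As a rough illustration of the indexing idea (not of the vocabulary-tree construction itself), an inverted index from visual words to clusters would let the matcher consider only clusters that share at least one word with the query image. Everything here, including how word ids would be obtained from BRIEF descriptors, is an assumption on my part.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Speculative sketch of an inverted index from visual word ids to the
// clusters whose reference inputs contain those words.
public class VisualWordIndex {
    private final Map<Integer, Set<Integer>> invertedIndex = new HashMap<>();

    public void addCluster(int clusterId, int[] wordIds) {
        for (int word : wordIds) {
            invertedIndex.computeIfAbsent(word, w -> new HashSet<>()).add(clusterId);
        }
    }

    // Candidate clusters sharing at least one visual word with the query;
    // clusters outside this set could be skipped when finding the best match.
    public Set<Integer> candidates(int[] queryWordIds) {
        Set<Integer> result = new HashSet<>();
        for (int word : queryWordIds) {
            Set<Integer> clusters = invertedIndex.get(word);
            if (clusters != null) {
                result.addAll(clusters);
            }
        }
        return result;
    }
}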

Behavior Learning

I mentioned earlier that this work was inspired by my attempts to replicate the work described in (Touzet 2006). I still haven't completed that project. In particular, I want to adapt the system he describes to use image input instead of sonar input for defining reactive behaviors.

Collaboration?

If anyone is interested in experimenting with these ideas, let me know. I'd be happy to collaborate with others who have synergistic goals. If nothing else, we could probably work together to assemble a diverse suite of data sets for testing and experimentation.

Recent Revision and Update

After the conference, I reviewed my source code and found that it had diverged a bit from the pseudocode in the paper. In particular, there was a rather Byzantine scheme in place for determining whether a newly presented input ought to be merged into an existing cluster, or whether two existing clusters ought to be merged. I simplified it as follows (a short sketch follows this list):
  • Maintain an array with M+1 elements.
  • When adding a new reference input fills the array, find the two closest clusters and merge them, bringing the array back down to M elements.
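In terms of the illustrative BoundedClusters sketch earlier in this post, the simplified scheme amounts to adding first and merging only when the array overflows; as before, the names are mine and not those used in the repository.

// Simplified insertion: the backing list may briefly hold M+1 references,
// and overflowing it triggers a single merge of the two closest clusters.
public void insert(double[] input) {
    references.add(input.clone());
    if (references.size() > maxClusters) {
        mergeClosestPair();               // back down to M clusters
    }
}
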
This implementation was much closer to the pseudocode in the paper and took fewer lines of code. It also improved the performance of the BSOC2 algorithm. I'm including revised versions of Figures 8, 9, 10, and 11 below. As it happens, the revisions did not affect the performance of BSOC1 at all. All variations are retained in the Github repository.


