Tasks to do

  • Implement both types of deep networks for in depth understanding of the model, for future testing and for extension to segmentation.
  • Understand the expressive power of RBM and what a 3rd order RBM (Gated BM) brings to the table
  • How can feedback connections be used for segmentation?


10/16/2009 DBN is terrible at handling noise/occlusions. A DBN which achieves 1.35% error on 10,000 test set achieves 1.48%, 2.55%, 10.57% and 31.4% error respectively as we add white pixels to its 1 to 4 borders below: This is simply due to the fact that DBN is trained on the foreground as well as the background, it's looking for black borders on all digit images! 10/14/2009 Studying and implementing Annealed Importance Sampling for RBMs in order to estimate the partition function as well as the true probability of observed images 9/28/2009 Implementing sparsity to RBM to allow for more robustness to noise, especially on the borders 9/22/2009 DBN recognition rate drops from 1.35% to 32% when a 1 pixel wide white border is added to the test image set 8/23/2009 first GPU implementation for Up-Down algorithm finally finished, speed up is 10x 8/17/2009 GPU implementation for nonlinear neighborhood component analysis finished, speed up allows for learning finish in reasonable time using mini-batch conjugate gradient updates 6/10/2009 Setting up CUDA for GPU processing 6/5/2009 Finished implementation of deep belief network 5/15/2009 - Read up on Mean Field Contrastive Divergence learning, testing a 3rd order sparse (weight matrix) BM model for segmentation. 5/12/2009 - Investigating the representational abilities of RBM. Some test result show that RBM still have trouble encoding 's' shaped 2D distributions; maybe need nonparametric energy function as well as learn non-spherical Gaussian visible units. 5/7/2009 - Investigating feature selection and segmentation of objects in images. DBN or other generative models model the entire input space, which is 28x28, but often background pixels are just noise and shouldn't be modeled. Human can ignore the background and just learn the variations of the foreground without a teacher segmenting out which regions to learn. Computer Vision can not. using a coherence model (hack before a Bayesian model), with backward "greedy piecewise" feature selection, the noisy 2's with background and gaussian noise is found to have the masks on the right, which is similar to the true mask on the left side. Notes on further improvement: 1. use low entropy as a constraint 2. robust statistics (rid of outliers - salt & peppers noise) 3. why(if) is it convex loss function for feature selection optimization problem 4. integrate with a teacher writing the digit 5. or better yet, maybe this is cheating, but optimize the parameter of the model to match some supervised masks for each digit The problem here is that for the class of 2's, they are different, whereas in this example the 2's are all the same. Now the trick is to use invariance in CNN or DBN to solve this problem. 4/28/2009 - Debugging finished, all the partial derivatives of the network (weights as well as nodes) are very close to the numerical counterparts. Now, training must be done to see if Levenberg Marquadt algorithm is needed or if it is enough to stick with plain and simple gradient descent. LeCun argues against conjugate gradient, but conjugate gradient is much faster. 4/26/2009 - Debugging continues and adding code to preprocess the input (MNIST digits) There are several ways to go about this. 4/24/2009 - Old ideas just keep repeating themselves. In this paper, where LeNet 5 is described, the approach to learn F6 layer activation is reminiscent of the recent nonlinear embedding methods! 4/23/2009 - Begin implementing a Conv Neural Net (LeNet 5), which is Type II of the deep networks. What is unique about the CNN is that it uses backpropagation to fine tune the weights in the earlier layers. This is not typically seen in other hierarchical vision models. (Regarding biologically plausibility of backprop: Hinton doesn't believe evolution isn't able to come up with a way to tune lower layer weights if it helps improve recognition at an higher layer ref. Another advantage of CNN is its weight sharing and downsampling ideas. But this makes implementation slightly complicated. CNN is also a misnomer, since all that is necessary is filtering or dot product. I'm guessing the reason it was named "Convolution" is for the property of filtering across the 2D image. In image processing, convolution and filtering (correlation) is often the same since the kernel is symmetric.