A Basic Recipe for Machine Learning

Ever since wrapping up the three Deep Learning courses by Andrew Ng, I've been meaning to write down some of the gems he highlighted throughout them.

One of the nicer ones that I felt needed to be written down is his general recipe for approaching a deep learning algorithm/model.
I've basically summarized it in a flowchart below (because everybody loves a flowchart, right?)

What are bias and variance? The diagram below is the typical explanation that I'm sure most of us are used to.

How can we tell whether we have high bias or high variance?
For high bias, we can look at the training set performance. Poor performance there indicates a poor fit, and signals that we could try a bigger network to get a better fit.
For high variance, we can look at the validation set (or dev set, as Andrew calls it) performance. Poor performance on this set (while getting a good result on the training set) is an indicator that our model…
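The two checks above can be sketched in a few lines of Python. This is a minimal sketch, assuming errors are given as fractions; the `baseline_error` (for example a human-level error) and the 2-percentage-point `gap` threshold are illustrative assumptions of mine, not values from the course.

```python
def diagnose(train_error, dev_error, baseline_error=0.0, gap=0.02):
    """Return (high_bias, high_variance) from error rates given as fractions.

    baseline_error: assumed reference error (e.g. human-level), 0.0 by default.
    gap: hypothetical threshold for when a difference counts as "large".
    """
    # Poor training-set performance relative to the baseline suggests high bias.
    high_bias = (train_error - baseline_error) > gap
    # A dev-set error well above the training error suggests high variance.
    high_variance = (dev_error - train_error) > gap
    return high_bias, high_variance


if __name__ == "__main__":
    examples = {
        "underfitting": (0.15, 0.16),
        "overfitting": (0.01, 0.11),
        "both": (0.15, 0.30),
        "looks fine": (0.005, 0.01),
    }
    for label, (train_err, dev_err) in examples.items():
        high_bias, high_variance = diagnose(train_err, dev_err)
        print(f"{label}: high bias={high_bias}, high variance={high_variance}")
```

If the first flag comes back true, the recipe's move is the one described above: try a bigger network, then re-check both numbers.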

Reviewing Andrew Ng's Deep Learning Course: Neural Networks and Deep Learning

I'm feeling rather good about myself as I write this, having just completed the first course of Andrew Ng's latest Deep Learning specialization on Coursera. I've been meaning to learn about Deep Learning for quite a while now, but for the longest time I haven't been able to wrap my head around the theory.
Previously, my forays into deep learning have been via Udacity's Deep Learning materials, random internet articles, and the Deep Learning textbook.

Yes. THE textbook. 
I bought it from Amazon a few months ago and am still working through the pages. It's tough to find the time for a few pages between the day job and sorting out the kids at night. From what I've gone through so far, I imagine I'll need to brush up on my rusty math to fully appreciate the book.
I have a confession to make though.
I never really did go through Andrew Ng's first ML course (gasps!). I know, I know... the course is …

Setting Up Docker for Windows 10

I didn't really have any use for Docker until today. I was trying to follow a course via Safari Online and, long story short, I'd probably need Docker to simplify setting up all the infrastructure.

Except that setting up Docker itself turned out to be quite a problem for me.
Here's how I got it to work.
1. Downloaded Docker (Community Edition) from their website.
2. Installed it.
3. Checked whether virtualization (which Hyper-V needs) is enabled. (Go to Task Manager -> Performance -> CPU and you should see a section reading "Virtualization: Enabled".) [1]
4. Opened up PowerShell.
5. Used 'docker-machine create ' to create a virtual machine. I named mine 'box'.
6. Configured the shell (refer to image).

Reference:  [1] :
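For anyone who'd rather script the machine setup, here is a rough Python sketch that builds the `docker-machine` commands and runs them through `subprocess`. It assumes the Hyper-V driver (`--driver hyperv`) and, optionally, an existing Hyper-V virtual switch; the post doesn't show which options were actually used, so treat the option values and the helper names as hypothetical.

```python
import shutil
import subprocess


def create_machine_cmd(name="box", driver="hyperv", virtual_switch=None):
    """Build the argument list for `docker-machine create`.

    The Hyper-V driver and the optional virtual switch are assumptions,
    not the exact options used in the post.
    """
    cmd = ["docker-machine", "create", "--driver", driver]
    if virtual_switch:
        # Name of an existing Hyper-V virtual switch (hypothetical example value).
        cmd += ["--hyperv-virtual-switch", virtual_switch]
    cmd.append(name)  # the machine name comes last
    return cmd


def shell_env_cmd(name="box"):
    """Argument list that prints the environment settings PowerShell needs."""
    return ["docker-machine", "env", "--shell", "powershell", name]


def run(cmd):
    """Run a command and return its stdout, failing loudly if it is missing or errors."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Print the commands instead of running them; call run(...) to execute.
    print(" ".join(create_machine_cmd()))
    print(" ".join(shell_env_cmd()))
```

Inside PowerShell itself, the shell-configuration step is usually done by piping the output of `docker-machine env box` into `Invoke-Expression`.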

Book Review: Weapons of Math Destruction (Cathy O'Neil)

This post marks my first attempt at forcing myself to gain a better understanding of the books I've read. Previously, I'd find myself reading book after book without being able to recall the important things I'd learned.

It's rather frustrating to be honest.

So I'm trying this out as a way to push myself to understand each book and synthesize the various concepts and ideas it conveys.

A disclaimer: my reviews will not attempt to be neutral or unbiased, as I feel that any attempt on my part to write that kind of blog post would result in something dry and boring.

I guess you could say it'll probably be more of a rant than a review.

Moving on.

I bought the book from Amazon back in April, and it sat on the shelf for quite some time as I was reading another book then. The outline is rather interesting, as it highlights the pitfalls of big data implementation from a first person poi…

A Retrospective Look On What It Means To Be A Data Scientist

I've talked about this subject in some of my posts from my earlier years of working as a data scientist, namely in these 2 blog posts:

1. Journey In Data Science
2. Hindsight, 8 Months Down The Analytics Road

So now, 3 years down the road, I guess I'm a little more knowledgeable on the matter, and a little bit wiser.

Back to that definition I was talking about: recently there have been two articles which I think provide a good description of the skills needed to become a data scientist, and of the role a data scientist plays day to day. In the second half of this post, I'll add my 2 cents on the articles and how they relate to my daily work.

The Skills [1]

Picking it up from Forbes (which in turn picked it up from Quora), the top 5 skills are:

1. Programming
I guess this is pretty much a no-brainer. Programming skills come in handy especially when you're trying to (1) massage data and (2) automate repetitive tasks.

2. Quantitative Analysis

Research Sample Size

Sometimes part of being a data scientist requires that you actually do act like a "scientist" (obviously).

In this post, we're going to look at the not-so-popular subject of determining the right sample size, one that allows you to draw a proper conclusion about the population you're interested in.

More often than not, people assume that a sample size needs to bear some proportional relationship to the size of the population from which it is drawn. This is not necessarily the case.
Rather, beyond a certain point, collecting more samples does not meaningfully increase the accuracy of your analysis.

What this means is that you don't need to gather as many samples as possible in order to reach a reasonable conclusion that applies to the population at large.
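To make the point concrete, here is Cochran's standard sample-size formula for estimating a proportion, followed by the finite population correction. This is a textbook formula brought in for illustration, not something from the original post: z is the critical value for the chosen confidence level, p the assumed proportion (0.5 is the most conservative choice), E the margin of error, and N the population size.

```latex
n_0 = \frac{z^2\, p\,(1 - p)}{E^2},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}

% Worked example: z = 1.96 (95% confidence), p = 0.5, E = 0.05; n is rounded up
n_0 = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} = 384.16

\begin{aligned}
N &= 1{,}000         &&\Rightarrow\; n = \frac{384.16}{1 + 383.16/1{,}000}         \approx 277.74 \;\to\; 278\\
N &= 10{,}000        &&\Rightarrow\; n = \frac{384.16}{1 + 383.16/10{,}000}        \approx 369.98 \;\to\; 370\\
N &= 1{,}000{,}000   &&\Rightarrow\; n = \frac{384.16}{1 + 383.16/1{,}000{,}000}   \approx 384.01 \;\to\; 385
\end{aligned}
```

A hundredfold increase in the population, from 10,000 to 1,000,000, adds only 15 people to the required sample, and as N grows without bound n simply approaches n_0.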

What matters much more is the absolute size of the sample. The required size depends mainly on the variation in the population parameters under study and the amount of estimated pre…
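A minimal sketch of that dependence, assuming we want to estimate a population mean, that the population standard deviation `sigma` is known (or estimated from a pilot sample), and that the normal approximation holds. The function name and parameters are illustrative, and the formula is the standard n = (z * sigma / margin)^2 rather than anything specific to this post.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n whose confidence interval for a mean has half-width <= margin.

    Uses the normal approximation n = (z * sigma / margin) ** 2 and assumes
    sigma (the population standard deviation) is known or estimated beforehand.
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up so the margin is guaranteed, not just approximately met.
    return math.ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    for sigma, margin in [(15, 2), (15, 1), (30, 2)]:
        print(f"sigma={sigma}, margin={margin}: n={sample_size_for_mean(sigma, margin)}")
```

With sigma = 15, a margin of 2 needs 217 observations at 95% confidence, while a margin of 1 needs 865: halving the margin (or doubling the variation) roughly quadruples n. The population size does not appear anywhere in the formula.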

Setting Up Tensorflow (with CUDA) for Windows 10

Below are some of my rough notes on how I set up my Windows 10 laptop to use Tensorflow with CUDA.

My reference: the following NVIDIA drivers:
1. CUDA Drivers (currently CUDA Toolkit 8.0). These can be installed using the downloaded installer.
2. cuDNN, the CUDA Deep Neural Network library (currently version 5.1). Once extracted, place the files in their respective directories alongside the other CUDA files in the NVIDIA Toolkit folder.

Setting up Tensorflow (CPU)

Setting up Tensorflow (GPU)
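Since the GPU setup depends on the cuDNN files landing in the right place, a quick sanity check can save some head-scratching. This is a sketch under assumptions: it reads the `CUDA_PATH` environment variable that the CUDA installer sets on Windows, assumes the cuDNN 5.1 Windows file names (`cudnn64_5.dll`, `cudnn.h`, `cudnn.lib`), and uses `tf.config.list_physical_devices` where it exists (newer TensorFlow releases), falling back to `tf.test.is_gpu_available()` on older 1.x builds. The helper names are hypothetical.

```python
import importlib.util
import os
from pathlib import Path

# Assumed cuDNN 5.1 file layout on Windows; other versions use different names.
CUDNN_FILES = (
    Path("bin") / "cudnn64_5.dll",
    Path("include") / "cudnn.h",
    Path("lib") / "x64" / "cudnn.lib",
)


def missing_cudnn_files(cuda_root=None):
    """List the expected cuDNN files that are not inside the CUDA Toolkit folder.

    Defaults to the CUDA_PATH environment variable set by the CUDA installer.
    """
    root = cuda_root or os.environ.get("CUDA_PATH")
    if not root:
        raise ValueError("pass cuda_root or set CUDA_PATH")
    return [str(rel) for rel in CUDNN_FILES if not (Path(root) / rel).is_file()]


def gpu_names(tf):
    """Return the GPU devices that the given tensorflow module can see."""
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return [device.name for device in config.list_physical_devices("GPU")]
    # Older 1.x releases lack tf.config.list_physical_devices; use a yes/no check.
    return ["GPU"] if tf.test.is_gpu_available() else []


if __name__ == "__main__":
    if os.environ.get("CUDA_PATH"):
        print("Missing cuDNN files:", missing_cudnn_files() or "none")
    if importlib.util.find_spec("tensorflow") is None:
        print("tensorflow is not installed in this environment")
    else:
        import tensorflow as tf
        print("GPUs visible to TensorFlow:", gpu_names(tf))
```

Running it from inside the GPU environment rather than the CPU one is the useful case: an empty GPU list there usually points back at the driver or cuDNN step.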

Note: It's 22/2/2017 now and Google has recently released Tensorflow 1.0, which might have rendered the above guide obsolete (I haven't tested it yet).

Update (23/2/2017):
The above basically creates 2 new Anaconda environments for you to play with. I like this approach since i…