Showing posts from June, 2017

A Retrospective Look On What it Means To Be A Data Scientist

I've talked about this subject in some of my posts in my earlier years of working as a data scientist, namely in these 2 blog posts:

1. Journey In Data Science
2. Hindsight, 8 Months Down The Analytics Road

So now 3 years down the road, I guess I am a little more knowledgeable on the matter, a little bit wiser.

Back to that definition I was talking about: recently there have been two articles which I think provide a good description of the skills needed to become a data scientist, and of the role that a data scientist plays in a day-to-day setting. In the second half of this post, I'll include my 2 cents on the articles and how they relate to my daily work.

The Skills[1]

Picking it up from Forbes (which in turn picked it up from Quora), the top 5 skills are:

1. Programming
I guess this is pretty much a no-brainer. Programming skills come in handy, especially when you're trying to (1) massage data and (2) automate repetitive tasks.

2. Quantitative Analysis

Research Sample Size

Sometimes being a data scientist requires that you actually act like a "scientist" (obviously).

In this post, we're going to look at a not-so-popular subject: determining the sample size that allows you to draw a proper conclusion about the population that you're interested in.

People often assume that a sample size needs to bear some proportional relationship to the size of the population from which it is drawn. This need not be the case.
Rather, beyond some point, having more samples need not mean noticeably greater accuracy in your analysis.

What this means is that you really don't need to gather as many samples as possible in order to come up with a reasonable conclusion that can be applied to the population at large.
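Here is a minimal Python sketch of that idea. It is only an illustration under assumptions that are not part of the discussion above: a simple random sample, a proportion as the quantity being estimated, a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and the standard finite population correction. The function name `margin_of_error` is made up for this example.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions (illustrative only): normal approximation, worst-case p = 0.5,
    z = 1.96 for roughly 95% confidence, and a finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # The correction shrinks the error only when n is a sizeable share of the population.
    finite_population_correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * finite_population_correction


# Same sample size, very different population sizes.
for population in (10_000, 1_000_000, 100_000_000):
    moe = margin_of_error(1_000, population)
    print(f"population={population:>11,}  n=1,000  margin of error={moe:.4f}")

# Same population, growing sample size: each extra sample buys less and less.
for n in (100, 400, 1_600, 6_400):
    moe = margin_of_error(n, 100_000_000)
    print(f"population=100,000,000  n={n:>5,}  margin of error={moe:.4f}")
```

With 1,000 samples the margin of error comes out at roughly 3 percentage points whether the population is ten thousand or a hundred million, which is why the absolute sample size matters far more than its share of the population. The second loop shows the diminishing returns: under these assumptions you need about four times as many samples to cut the margin of error in half.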

The absolute size of a sample is much more important. The size is pretty much dependent on the variation in the population parameters under study and the amount of estimated pre…