Posts

Showing posts from November, 2014

Building a Data Product

"Do you have differentiation or competitive advantage? Ultimately, anyone can create analytics on commodity data. Analytics, unless they are really fantastic, don’t differentiate data products. What does differentiate is proprietary data. If you have data no one else has, and you can create useful analytics on it, you’ve got the key to long-term competitive success with data products."
Some excerpts from a nice Wall Street Journal article. The article lists a few important questions that need to be answered before you even think of venturing into the data products business. Full article available here.

Information is Beautiful

Taking a break from writing for today.

However, I did come across a few neat blogs on data science and data visualization.

Of particular interest is Information is Beautiful. They're not really doing dashboards (only a few) or anything interactive per se (most of the stuff on the blog is infographics), but I do like to see visually informative stuff from time to time, and it's quite easy on the eyes.


But just because it's not a dashboard or interactive does not mean that it can't be done! Sometimes all that people need is just some inspiration and off they go.

Visualizing Social Network - Part 1

An idea is worthless unless implemented, right? You can have all the data in the world, but if you can't articulate it or visualize it for others to understand, then it's really just more data in your data warehouse.

Over the last couple of months, my colleagues and I were working on social networks: basically trying to understand how the subscribers are interconnected and identifying who the influencers are for targeted marketing - or so we thought that's how it should be.

Business justification aside - it was an interesting topic to dive into. After a few days, we managed to come up with our edge and node list, and later ran a few centrality algorithms to measure each individual within that network. Specifically, we were measuring:
Degree - The number of direct connections that one has, i.e. how many direct friends does he have?
Closeness centrality - How close (in hops) is a person to every other person in their network?
Betweenness centrality - Identifying who acts as…
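
To make the first two measures concrete, here is a minimal sketch in plain Python. The edge list and subscriber names are made up for illustration, and this is not the tooling we actually used; betweenness is left out to keep it short.

```python
from collections import deque

# Hypothetical edge list: each pair is two subscribers who are directly connected.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]


def build_graph(edge_list):
    """Turn an undirected edge list into an adjacency mapping."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph):
    """Degree: the number of direct connections per node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def hop_distances(graph, source):
    """Breadth-first search: hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closeness(graph):
    """Closeness: reachable nodes divided by total hops to them (higher is closer)."""
    scores = {}
    for node in graph:
        dist = hop_distances(graph, node)
        total_hops = sum(dist.values())
        scores[node] = (len(dist) - 1) / total_hops if total_hops else 0.0
    return scores


graph = build_graph(edges)
degree_scores = degree(graph)
closeness_scores = closeness(graph)
```

In this toy network "A" has the most direct friends and is also the fewest hops away from everyone else, so it scores highest on both measures.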

Person Movement Prediction Using Hidden Markov Models

Humans typically act in a certain habitual pattern, however, they sometimes interrupt their behavior pattern and they sometimes completely change the pattern. Our aim is to relieve people of actions that are done habitually without determining a person’s action. The system should learn habits automatically and reverse assumptions if a habit changes. The predictor information should therefore be based on previous behavior patterns and applied to speculate on the future behavior of a person.

You can get the complete article here.
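
As a rough illustration of learning habits from previous behavior, here is a minimal sketch that uses a plain first-order Markov chain, which is a simpler technique than the hidden Markov models in the article. The room names and the visit history are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sequence of rooms visited by one person, in order.
history = [
    "office", "kitchen", "office", "lab",
    "office", "kitchen", "office", "kitchen",
]


def learn_transitions(sequence):
    """Count how often each location is followed by each other location."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


transitions = learn_transitions(history)
prediction = predict_next(transitions, "office")
```

Re-learning the counts from a recent window of the history is one simple way to let the predictor drop an assumption when a habit changes.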

Modelling the Ebola Outbreak using Wolfram

The recent outbreak of the Ebola virus disease (EVD) has shown how quickly diseases can spread in human populations. This threat is, of course, not limited to EVD; there are many pathogens, such as various types of influenza (H5N1, H7N9, etc.) with the potential to cause a pandemic. Therefore, mathematical modeling of the transmission pathways becomes ever more important. Health officials need to make decisions as to how to counter the threat. There are a large number of scientific publications on the subject, such as the recent Science publication by Dirk Brockmann, which is available here. Professor Brockmann also produced videos to illustrate the research, which can be found on YouTube (video1, video2, video3). It would be interesting to reproduce some of the results from that paper and generally explore the subject with Mathematica.

Full article here: Modeling a Pandemic like Ebola with the Wolfram Language
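
For a feel of what modelling transmission looks like in code, here is a minimal sketch of a generic SIR (susceptible, infected, recovered) compartmental model in Python. This is not necessarily the approach taken in the Science paper or the Wolfram article, and the parameter values are illustrative assumptions, not Ebola estimates.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with simple Euler steps.

    beta  - assumed transmission rate per day
    gamma - assumed recovery rate per day
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Illustrative values only.
trajectory = simulate_sir(
    population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=160
)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
```

`peak_time` and `peak_infected` give the day and size of the epidemic peak under these assumed rates; changing `beta` is a crude stand-in for countermeasures that reduce contact.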

Data extraction - Build your own phone using Raspberry Pi

This is not an original post per se - just a repost and a reminder to self of sorts about how I could perhaps someday build my own phone and collect a whole bunch of other stuff that most people wouldn't dream of collecting (but of course they have already right? :) )

The article: Lifehacker - Build your own phone

The video, by David Hunt

Google Places API - Part 2

A follow-up to my previous post. After some digging and experimentation, I was finally able to come up with a way to mass-search the coordinates of a specific location type (e.g. shopping malls across KL).

Basically there are a couple of services that you could use, depending on the output that you want. More on that here. So based on the guide, I know that what I need would probably be more or less covered with Radar and Nearby - since I basically need the coordinates and on a mass scale.

In essence (since the full description is available on their website and I'd rather not overcomplicate things), the Radar Search service can give you up to 200 search results at a go, given a coordinate to start with. Sounds good - yes. So I gave the example a go. But it only gives you the coordinates in the result list - without the name of the place. You also get a place id for each of those coordinates, but I somehow wasn't able to use that id to retrieve the place's name - so I suppose …
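
A minimal sketch of what such a request and its result list might look like in Python. The endpoint, the parameter names and the response layout are written from memory of the documentation and should be treated as assumptions (the Radar Search endpoint may no longer be available), and the sample response and place ids below are made up. No network call is made.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the Radar Search web service.
RADAR_URL = "https://maps.googleapis.com/maps/api/place/radarsearch/json"


def radar_search_url(lat, lng, radius_m, place_type, api_key):
    """Build a Radar Search request URL around a starting coordinate."""
    params = {
        "location": f"{lat},{lng}",
        "radius": radius_m,
        "types": place_type,
        "key": api_key,
    }
    return f"{RADAR_URL}?{urlencode(params)}"


def extract_coordinates(response_text):
    """Pull (place_id, lat, lng) out of a response; no place names are present."""
    payload = json.loads(response_text)
    rows = []
    for result in payload.get("results", []):
        location = result["geometry"]["location"]
        rows.append((result.get("place_id"), location["lat"], location["lng"]))
    return rows


# Made-up response in the assumed shape, standing in for a real API reply.
sample_response = json.dumps({
    "status": "OK",
    "results": [
        {"place_id": "hypothetical-place-id-1",
         "geometry": {"location": {"lat": 3.1466, "lng": 101.7113}}},
        {"place_id": "hypothetical-place-id-2",
         "geometry": {"location": {"lat": 3.1180, "lng": 101.6770}}},
    ],
})

url = radar_search_url(3.139, 101.6869, 5000, "shopping_mall", "YOUR_API_KEY")
rows = extract_coordinates(sample_response)
```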

Google Places API

For the past few days I've been trying to gather coordinates of places around KL, in an attempt to tag those places to our cell towers for a more in-depth market analysis of our consumers.
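
One simple way to do the tagging, sketched in Python, is to assign each place to its nearest tower by great-circle (haversine) distance. This is only an assumption about how the tagging could work; the tower ids, the place and all coordinates below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_tower(place, towers):
    """Return the tower closest to the given place."""
    return min(
        towers,
        key=lambda t: haversine_km(place["lat"], place["lng"], t["lat"], t["lng"]),
    )


# Hypothetical towers and place, roughly in the KL area.
towers = [
    {"id": "T1", "lat": 3.1500, "lng": 101.7000},
    {"id": "T2", "lat": 3.1000, "lng": 101.6000},
]
place = {"name": "hypothetical mall", "lat": 3.1480, "lng": 101.7100}

tagged = nearest_tower(place, towers)["id"]
```

Nearest-tower assignment ignores real coverage areas, so treat it as a first approximation.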

While one could, in a sense, get those coordinates manually from Google Maps and jot them down on a spreadsheet, it seemed like a "not-so-smart" solution and troublesome to do in the long run (e.g. your boss asks you to find the number of customers visiting a particular shopping mall today, and tomorrow he wants to know the number of customers that visit golf courses on the outskirts of town. In such a case, jotting down the coordinates manually would be very time consuming - not to mention crazy).

Initial attempts included web scraping and looking into the source code of the map (more on that topic later). Somehow or other, those didn't work as well as expected.

Hence now I turn to the Google API, or more specifically - Places.

There is actually a lot of stuff that you can…

Journey in Data Science

Working in a team of data science enthusiasts can have its benefits. In our team of a few, none of us can claim we're good at data science and/or big data analytics, as we're pretty much new to this field. Data scientists are hard to come by, and most articles out there can attest to that. Accenture suggested a nice idea in their article "The Team Solution to the Data Scientist Shortage": if a data scientist is hard to find, why not build a team that has the necessary skills of a "data scientist"?
Inspired by this, I've set myself a goal to at least master some of the necessary skills that make up a data science guy.

Which makes sense, really. To be a master of all the above-mentioned areas would consume an insane amount of time. Thus, being able to split the tasks up and focus on achieving the end goal - together - as a team would mean that we could get stuff done faster and reap the results sooner rather than wait for that…

Dual in Netezza

When you're used to doing queries in Oracle SQL it can be confusing at times when you start working in Netezza.
While most of the time the syntax is the same, some functions that you're used to using are just not there anymore.

One of them - is dual.

Luckily, a quick google on the topic points you straight to the answer: "_v_dual".

E.g.: select sysdate from _v_dual;

Another interesting thing that I recently discovered while working in Netezza is the existence (I'm a really really new user to Netezza) of a spatial module that contains a lot of geo-related functions.
Can't wait to try those out. Will share any stuff that I find soon.

Models

When we talk about big data analytics, the usual topic of discussion would normally revolve around Hadoop, Pig, MapReduce, NoSQL - platforms, basically. Granted, those technologies are the enablers of big data, for without them one can't store big data in a data warehouse.

For now though, I'd like to focus on the math.

The models, to be precise. Without them, one can't derive any usable use cases from all the data that you have (you might be able to pick the low-hanging fruit, but over the years you will have to put on your math hat as well).

Below is a link to a course on Coursera which I find to be perfect for beginners like myself who are rather new to statistics and the different models that can be used to answer different questions. The reason I'm mentioning this course in particular is that I like the pace at which it is going and how the presenter is able to articulate complex ideas in simple words.

https://class.coursera.org/model…