SNA in Netezza

Can Netezza do network analysis?

That is the question I have been stuck with for the past couple of weeks. Given the hardware that I (read: my company) have on hand, I'm supposed to use it to calculate the usual set of social network measures (i.e. degree, betweenness, closeness, eigenvector centrality, etc.).

Two weeks have passed, and while degree centrality was relatively easy to calculate, betweenness has proven to be quite a challenge. So far I've been able to translate Dijkstra's shortest-path algorithm to find the shortest paths between nodes, and to use those paths to determine betweenness (refer back to the formula for betweenness centrality if you're lost here). I tested the results on a small network with 10 vertices, and the values match the ones given by Gephi, so initially I was quite confident that I could simply pump in the actual data from my telecom network.
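For anyone who wants to see the idea outside the database, here is a minimal Python sketch of the same approach: find shortest paths between every pair of nodes, then count how often each node sits on them. It is not my stored procedure code. It assumes an unweighted, undirected graph (so a plain breadth-first search stands in for Dijkstra), counts each pair of endpoints once, and does not normalise the scores. The names `build_graph`, `bfs_counts` and `betweenness`, and the four-node example network, are made up for illustration.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u is a predecessor of w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum over pairs (s, t) of the fraction of shortest s-t paths passing through v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):    # each unordered pair once
        dist_s, sigma_s = info[s]
        if t not in dist_s:              # s and t are not connected
            continue
        dist_t, sigma_t = info[t]
        for v in adj:
            if v == s or v == t or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path iff the two legs add up exactly
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


# Hypothetical example: a simple chain a - b - c - d
chain = build_graph([("a", "b"), ("b", "c"), ("c", "d")])
scores = betweenness(chain)
```

On the chain, the two middle nodes each sit on two shortest paths and the end nodes on none. The pairwise loop is roughly cubic in the number of nodes, which hints at why this style of calculation struggles once the network gets large.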

That, however, didn't go as smoothly. The amount of memory needed proved to be too large, and my coding was probably to blame as well: it relied on so many loops that it hung my DB.

The alternatives, then:

1. Optimize my algorithm (which I've written in stored procedures).
2. Simplify the calculation - the work by Blondel et al. (Fast unfolding of communities in large networks) seems to provide a clue.
3. Use a different platform, like Neo4j, Giraph or GraphX.
