Gephi is my tool of choice for graph visualization, while Hadoop stores all the data: in HDFS or HBase, accessible directly, through Hive and HBase tables, or via Impala with SQL-like queries. Time-dependent graph data in particular can be partitioned efficiently with Hive, and all the heavy processing is done in Apache Giraph.
But the data still has to get from Hadoop into Gephi. I regularly created graph files and transferred them to my workstation, repeating these steps manually every time, and I also had to remember which parameters had been applied while processing or generating each graph. Over time this became critical, and handling all the files manually was not an option in the long term.
So I created the Gephi-Hadoop-Connector, which uses the JDBC interface provided by Impala and Hive to load edge lists and node lists.
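A minimal sketch of the JDBC side, assuming an edge-list table whose first two columns hold the source and target node IDs. The classes `Edge` and `EdgeListLoader` are hypothetical illustrations, not the connector's actual API; only standard `java.sql` calls are used, so the same code should work against Hive or Impala once a suitable JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** One directed edge read from an edge-list table (hypothetical schema: source, target). */
final class Edge {
    final String source;
    final String target;

    Edge(String source, String target) {
        this.source = source;
        this.target = target;
    }

    @Override
    public String toString() {
        return source + " -> " + target;
    }
}

/** Hypothetical sketch of loading an edge list over JDBC; not the connector's own code. */
final class EdgeListLoader {

    private EdgeListLoader() {}

    /** Runs the query and collects its first two columns as (source, target) pairs. */
    static List<Edge> loadEdges(Connection conn, String sql) throws SQLException {
        // try-with-resources closes the statement and result set even if reading fails
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            return readEdges(rs);
        }
    }

    /** Converts each row into an Edge; JDBC column indexes are 1-based. */
    static List<Edge> readEdges(ResultSet rs) throws SQLException {
        List<Edge> edges = new ArrayList<>();
        while (rs.next()) {
            edges.add(new Edge(rs.getString(1), rs.getString(2)));
        }
        return edges;
    }
}
```

With the Hive JDBC driver available, a connection could be opened with `DriverManager.getConnection("jdbc:hive2://<host>:10000/default", user, password)` and passed to `EdgeListLoader.loadEdges(conn, "SELECT source, target FROM edges")`, where the table `edges` and its columns are assumptions. A node list can be read the same way with a second query.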
One important Gephi feature is its support for time-dependent analysis and visualization of networks. To build such a timeline, an individual query can be defined for each time frame of each layer. If the data is already partitioned by time, the whole procedure is very efficient, since each query then only reads the partitions belonging to its own time frame.
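Defining one query per time frame and layer can be sketched as template expansion. This Java example assumes, hypothetically, that each layer's query is a template containing a `${day}` placeholder and that the underlying Hive table is partitioned by a `day` column; the connector's real configuration format may differ.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: expand one SQL template per layer into one query per time frame. */
final class TimeFrameQueries {

    private TimeFrameQueries() {}

    /**
     * For each layer, substitutes the placeholder "${day}" in its template with every day
     * in [start, end] (inclusive). Returns layer -> chronologically ordered queries.
     */
    static Map<String, List<String>> expand(Map<String, String> templatesByLayer,
                                            LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> layer : templatesByLayer.entrySet()) {
            List<String> queries = new ArrayList<>();
            for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
                // LocalDate.toString() yields ISO-8601 (e.g. 2014-03-01), used as partition value;
                // String.replace is a literal replacement, so "$" needs no escaping
                queries.add(layer.getValue().replace("${day}", day.toString()));
            }
            result.put(layer.getKey(), queries);
        }
        return result;
    }
}
```

A `LinkedHashMap` keeps the layers in the order they were defined, and each layer's list is chronological, so the queries can be executed frame by frame to assemble the timeline.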
Finally, we have to think about how to handle all the metadata of a time-dependent multilayer graph. For this we use the Etosha-Graph-Metastore, which will be released soon.
Please clone this repository or download the zip file.
The Gephi-Hadoop-Connector was built by @kamir.