ETOSHA

Contextual Information for Open Science

View the Project on GitHub kamir/etosha

Welcome to the Etosha project repository!

Etosha aims to build a bridge between the Big Data and Linked Data domains to support integrated, data-driven research projects.

We create tools for contextualization, metadata extraction, and integration.

You can contribute to the Etosha dataset graph! Our project wiki collects and connects metadata in both human-readable and machine-readable formats.

The project is sponsored by Atlassian and managed via our public JIRA.

Example: Relevance Studies in Wikipedia

Based on open data provided by the Wikimedia Foundation, we calculate characteristic measures for individual Wikipedia pages. The measures take the neighborhood of the selected pages into account, so the results have to be embedded in the right context. Using the Etosha project wiki, we provide references to the raw data, to our methodology, and to the results obtained. These results are accessible via an API built into Semantic MediaWiki.
  1. Study Description (project page in Etosha wiki)
  2. Tools and Tutorials
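
Results stored in a Semantic MediaWiki installation can typically be retrieved through the `ask` module of the MediaWiki API. The following is a minimal sketch of how such a query URL could be assembled. The endpoint `https://wiki.example.org/api.php`, the category `Study`, and the property names `Has raw data` and `Has method` are placeholders chosen for illustration; they are assumptions, not the actual names used in the Etosha wiki.

```python
from urllib.parse import urlencode


def build_ask_url(api_endpoint, condition, printouts, limit=10):
    """Build a URL for the Semantic MediaWiki `ask` API module.

    An ask query consists of a condition such as [[Category:Study]],
    followed by `|?Property` printouts and `|name=value` parameters.
    """
    parts = [condition]
    parts += [f"?{name}" for name in printouts]
    parts.append(f"limit={limit}")
    params = {"action": "ask", "query": "|".join(parts), "format": "json"}
    return f"{api_endpoint}?{urlencode(params)}"


# Hypothetical endpoint, category, and property names (assumptions).
url = build_ask_url(
    "https://wiki.example.org/api.php",
    "[[Category:Study]]",
    ["Has raw data", "Has method"],
    limit=5,
)
print(url)
```

The sketch only builds the request URL; fetching it with any HTTP client should return the matching pages and the requested properties as JSON, provided the wiki exposes the API publicly.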

How Etosha Enables the Global Data Village

The Etosha dataset graph connects multiple linked data sources. It embeds databases and related metadata into non-technical contexts via semantic links. Embedding them into the global linked data graph in this way allows plausibility checks and supports the interpretation and comparison of research results, while access control always stays in the user's hands. Data does not have to be moved to a public provider; all facts can be published from a private environment.
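
The idea of linking a dataset to its context while keeping some facts private can be illustrated with a small in-memory graph. This is only a sketch under stated assumptions, not Etosha's actual data model: the class names `Link` and `DatasetGraph`, the predicates, and all identifiers such as `dataset:wikipedia-click-counts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One semantic link: subject --predicate--> target."""
    subject: str
    predicate: str
    target: str
    public: bool = True  # private links never leave the local environment


class DatasetGraph:
    """Toy dataset graph (hypothetical, for illustration only)."""

    def __init__(self):
        self._links = []

    def add(self, subject, predicate, target, public=True):
        self._links.append(Link(subject, predicate, target, public))

    def context_of(self, subject):
        """Return all (predicate, target) pairs known for a subject."""
        return [(link.predicate, link.target)
                for link in self._links if link.subject == subject]

    def publishable(self):
        """Return only the links that the owner marked as public."""
        return [link for link in self._links if link.public]


graph = DatasetGraph()
dataset = "dataset:wikipedia-click-counts"  # hypothetical identifier
graph.add(dataset, "describedBy", "wiki:Study_Description")
graph.add(dataset, "derivedFrom", "source:wikimedia-page-views")
# The storage location is an assumed private detail and stays local.
graph.add(dataset, "storedAt", "hdfs://private-cluster/data/clicks", public=False)

for link in graph.publishable():
    print(link.subject, link.predicate, link.target)
```

In this sketch the owner sees the full context locally through `context_of`, while `publishable` exposes only the metadata links, which mirrors the point that the data itself does not have to move to a public provider.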

Etosha can operate as a metadata gateway between emerging data hubs.

Only active linking between raw data, technical or operational metadata, and domain knowledge allows a 360° view of any dataset. Public access to social media systems and related usage data, such as the Wikipedia click count dataset, is one of the fundamental aspects of computational social science. Besides this, processing resources have become widely available during the last decade, especially since the breakthrough of the Hadoop ecosystem and its commercial recognition.

But real insights require more than just numbers or facts. Therefore, the Etosha project provides scalable context management services.

Efficient communication is a key to success in general. Digital communication tools are widely accepted, but there is an obvious lack of standards that would allow the growth of a dataset network comparable to Facebook's social graph or the CrunchBase business graph. Such a dataset graph allows context embedding in collaboration environments and cluster-spanning dataset discovery in a global dataset catalog. Such a catalog can be considered one of the driving forces towards a new era of data economy, based on data analysis and data-driven business.

We build the Etosha graph to arrange, visualize, filter, and share dataset context information using a class of ontologies described as DOAx files. Etosha follows the path shown years ago by the DOAP project and applies the concept of interlinked project life cycle management to context-sensitive dataset life cycle management.

Authors and Contributors

Etosha was initiated by Mirko Kämpf (@kamir), Eric Tessenow, Jan W. Kantelhardt, and Dror Y. Kenett.
