ETA’s network processing model

ETA creates networks from information it extracts from documents. A network is a visual summary, generated by ETA, of the information in one or more documents. There are two ways of creating one:

  1. Automatic network creation as documents are ingested, enabled by selecting Network Update in your Document Processing configuration.
  2. Batch network creation after document processing has completed in a collection, initiated from the network page. A collection is a container for storing and organising ingested files and documents. Only the textual content is stored in collections, not the original files and documents.

Processing steps in network creation

ETA creates networks in three steps:

  1. Extraction of document markup (entities etc.) into the network
  2. Resolution of entities across documents
  3. Extraction of communities

Each step relies on the previous one, but the first two can be done together.
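The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not ETA's implementation: the input format, the function names, the co-occurrence linking rule, the case-folding resolution rule, and the use of connected components as a stand-in for community extraction are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: entity mentions already marked up per document.
DOCS = {
    "doc1": ["BP", "Gulf of Mexico", "B.P."],
    "doc2": ["bp", "Deepwater Horizon"],
    "doc3": ["Acme Corp", "John Smith"],
}

def extract_edges(docs):
    """Step 1: link every pair of mentions that co-occur in a document."""
    edges = set()
    for mentions in docs.values():
        for a, b in combinations(sorted(set(mentions)), 2):
            edges.add((a, b))
    return edges

def normalise(name):
    """Toy resolution key (an assumption): lower-case, drop punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ")

def resolve_entities(edges):
    """Step 2: merge mentions across documents that share a resolution key."""
    resolved = set()
    for a, b in edges:
        ra, rb = normalise(a), normalise(b)
        if ra != rb:  # drop links between two spellings of one entity
            resolved.add(tuple(sorted((ra, rb))))
    return resolved

def extract_communities(edges):
    """Step 3: connected components, a simple stand-in for communities."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, communities = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        communities.append(component)
    return communities

def build_network(docs):
    """Run the three steps in order; each consumes the previous step's output."""
    edges = resolve_entities(extract_edges(docs))
    return edges, extract_communities(edges)
```

In this sketch, steps 1 and 2 could be fused by normalising each mention as it is extracted, which mirrors the note that the first two steps can be done together. Step 3 needs the resolved network, so it has to run last.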

The overall data model spanning document collections and networks is described here.

The network data model is described here.

Example: analysis of Twitter feeds via ETA networks

You can use networked relationships to detect community structures on Twitter for both user interaction and hashtag interaction. You can also use networks to detect individuals that are highly relevant to an issue or entity.

This also lets you create alarms and warnings, so that you can respond strategically to ongoing events.

Here's an example: a network created from Twitter traffic in response to a leak from BP in October 2012. The network view was built by following tweets related to the #BP hashtag, together with any web pages those tweets referenced.

With a single click, you can use the graph to identify those with the highest levels of both positive and negative influence. A single click also takes you directly to a document or tweet so that you can review its content.
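As a rough illustration of how individuals relevant to an issue might be surfaced from tweet data, the sketch below ranks users by how many distinct authors mention them in tweets carrying a given hashtag. The sample tweets are invented, and the function name and the ranking rule are assumptions made for this example; ETA's own influence measures are not described here and may work quite differently.

```python
import re
from collections import Counter

# Invented sample data as (author, text) pairs; not real tweets.
TWEETS = [
    ("alice", "@carol any update on the spill? #BP #oilspill"),
    ("bob", "Agree with @carol on this #BP"),
    ("carol", "Thanks @alice #BP"),
    ("dave", "Lunch time #food"),
]

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")

def rank_users_for_tag(tweets, tag):
    """Rank users by the number of distinct authors who mention them
    in tweets that carry the given hashtag (case-insensitive)."""
    mentioners = {}
    for author, text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if tag.lower() not in tags:
            continue  # tweet is not about this issue
        for user in MENTION.findall(text):
            mentioners.setdefault(user.lower(), set()).add(author)
    counts = Counter({user: len(authors) for user, authors in mentioners.items()})
    return counts.most_common()

print(rank_users_for_tag(TWEETS, "BP"))
```

Counting distinct authors rather than raw mentions stops one very active account from inflating another user's score. It is only one of many possible relevance measures.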

What's the purpose of networks?

When you're looking for a single reference, search is the way forward. ETA's search function works much like Google or other search engines: you enter a term, and the matching results from your documents are displayed in a list.

The trouble with searches like this is that they don't give you any context for your results. (The same is true of Google.) You get a list of results, but you need to drill down into them and read the content in order to place them in the context of other terms or ideas.

Networks help you create context. They show how terms relate to one another, so you can make connections quickly and understand relationships in a simple way.

This can be used in a wide variety of ways, but let's take an example. A police service seizes a hard drive from a suspect and imports the contents of that drive as a document collection. Investigators could search the drive for term after term and try to build up a picture of what the individual has been doing. This has long been the traditional method of investigating documentation obtained in this manner.

However, it's a slow process. It's also subject to human error: it's hard to read hundreds of documents and reliably make mental connections between them. A network establishes many of those relationships automatically. It saves time and removes much of the chance of error that comes with establishing these relationships by hand. This allows the investigative team to draw conclusions quickly and progress their investigation rather than being tied up examining documentation, which makes it more likely that the team will get a result.