ETA Harvester Guide
ETA Harvester is a configurable Google Chrome/Chromium extension that extracts text from web pages and sends it to an ETA collection. A collection is a container for storing and organising ingested files and documents. Only the textual content is stored in collections, not the original files and documents.
It offers two ways to harvest text: batch harvesting and page harvesting.
Batch harvesting enables you to automatically harvest text from multiple websites and send it to an ETA collection in one operation. It is useful for harvesting text from news and social media sites, and from sites related to search terms of interest to you. See Batch harvesting.
Page harvesting enables you to manually select the elements you want to harvest on a web page and then send the selection to an ETA collection. It is particularly useful when you are conducting an investigation by browsing the internet, as you can harvest as much or as little as you like: for example, the title and abstract of an article, a social media profile or a few paragraphs. See Page harvesting.
New ETA projects automatically contain a number of pre-defined rule sets for harvesting text from news sites, wikis, forums and Google searches, and from specific domains such as Twitter, Facebook and LinkedIn. Each rule set is designed to maximise the content that is harvested and minimise the boilerplate elements. In ETA Harvester, content is the text you want to harvest from a web page, such as headings, authors, dates, captions and paragraphs (as opposed to the text you want to ignore from menus, sidebars and other boilerplate elements). Boilerplate elements are the elements on websites other than the content, such as navigation bars, side bars, footers, menus and advertisements. The rule sets are in a configuration titled ‘Harvester Rule Sets’. You can customise these rule sets, delete them and/or create your own. See Rule sets.
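The distinction between content and boilerplate can be illustrated with a minimal Python sketch. It is not ETA Harvester's rule set format or implementation; the tag lists, the `ContentExtractor` class and the `harvest_text` function are hypothetical names chosen for this illustration, and the sketch assumes well-formed HTML in which every tag is closed.

```python
from html.parser import HTMLParser

# Hypothetical tag lists for illustration only; these are NOT ETA Harvester rule sets.
CONTENT_TAGS = {"h1", "h2", "h3", "p", "figcaption", "time"}
BOILERPLATE_TAGS = {"nav", "aside", "footer", "menu"}


class ContentExtractor(HTMLParser):
    """Collect text inside content tags, skipping anything nested in boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.boilerplate_depth = 0  # how many boilerplate elements we are inside
        self.content_depth = 0      # how many content elements we are inside
        self.chunks = []            # harvested text fragments, in page order

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.boilerplate_depth += 1
        elif tag in CONTENT_TAGS:
            self.content_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.boilerplate_depth > 0:
            self.boilerplate_depth -= 1
        elif tag in CONTENT_TAGS and self.content_depth > 0:
            self.content_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when it is inside a content tag and outside all boilerplate.
        if text and self.content_depth > 0 and self.boilerplate_depth == 0:
            self.chunks.append(text)


def harvest_text(html):
    """Return the list of content text fragments found in an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


SAMPLE_PAGE = """
<html><body>
<nav><p>Home</p><p>About</p></nav>
<h1>Example headline</h1>
<p>First paragraph of the article.</p>
<aside><p>Advertisement</p></aside>
<footer><p>Contact us</p></footer>
</body></html>
"""

print(harvest_text(SAMPLE_PAGE))
```

In this sketch the headline and article paragraph are kept, while the text in the navigation bar, side bar and footer is ignored, even though it also sits inside paragraph tags.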
Note: Harvester can be used to extract text from .onion sites using Tor. For more information, see Harvesting content from the dark web.
To install ETA Harvester, see Installing ETA Harvester.
This guide is for ETA end users and system administrators. It describes how to install and use ETA Harvester, and how to create rule sets.
Many of the features of ETA, and access to them, are configurable. For this reason there may be small variations between the screen images in this guide and your installation of ETA, and you may not be able to access all the features described.
Most of the screen captures in this guide appear as thumbnails. To expand an image, click on it. To collapse it, click on it again.
Green text indicates a glossary term. Hover over the term to display the definition.
© SC2 Corp 2018
This work is copyright. No part of this publication or the accompanying software may be reproduced by any process or disclosed to any third party without prior written permission from SC2 Corp.
All rights reserved.
SC2 Corp makes no representation or warranties with respect to the contents of this publication and specifically disclaims any implied warranties of merchantable quality or fitness for any particular purpose. SC2 Corp reserves the right to revise this publication from time to time without the obligation to notify any person or organisation of such revision. The screen illustrations in this publication are intended to be representations, not exact duplicates, of the screen layouts generated by the software.