WebJigsaw Tutorial

1. Overview and Purpose

WebJigsaw is a visual analytics system designed to help people browse, explore, analyze, understand and make sense of collections of text documents. WebJigsaw presents multiple visualizations of the documents and the entities within them, with a special focus on showing connections between entities (entities that appear together in a document).

This web version of Jigsaw is designed to work best with collections of relatively short documents. In terms of corpus size, it is intended for collections of up to about 2,500-5,000 documents. Ideally, each document should be about 1-6 paragraphs long, that is, up to a page or two. What matters most, however, is the number of named entities per document: for WebJigsaw to be most helpful, this number should likely stay below about 50-75 entities.

WebJigsaw is not designed to analyze a small number of extremely large documents like books or academic papers. These types of documents should be broken into smaller units such as sections, subsections, or pages, and then each of these units becomes its own document.

Because WebJigsaw provides many different visualizations of the documents and entities, you should ideally have plenty of screen real estate to show the views in the browser. While you can still run the system on a smaller monitor, you may be limited in the number of views you can easily manipulate.

We have tried to keep this Tutorial relatively concise so that you can easily read and browse it, while still communicating the most important information necessary to effectively use the system.

2. Getting Started with WebJigsaw

This section will help you to quickly familiarize yourself with how to run WebJigsaw.

2.1 System Requirements

WebJigsaw should run well on all modern browsers, but the recommended browsers are Chrome and Firefox.

2.2 Initiating a Session

Go to the WebJigsaw URL in your browser. It should bring up the tab for uploading documents to the server.

2.3 Reading in a Set of Documents

WebJigsaw can read in (and store) documents in a variety of formats. It can read original documents such as plain text and csv files, as well as files in the Jigsaw Datafile format, an xml-based format we created.

To import source documents that have not been processed yet, click the "Select Files" button. This brings up the browser's Import dialog box for selecting the documents to be read in. Alternatively, drag the files into the "Drop files" area.

The main tab here is File Import. It allows you to read in plain text (.txt), comma-separated value (.csv), and Jigsaw Datafiles (.jig). You can read in multiple .txt files, either all at once or by selecting each separately. Alternatively, you can upload a Zip (.zip) file, that is, a compressed collection of .txt files. We support this because uploading a single zipped file to the server is generally faster than uploading many individual .txt files. For csv files, a special mapping process begins that allows you to specify what each column in the file means (see Importing CSV Files below). We hope to soon add the ability to read in pages and sites from web crawls, pages from web searches, and bibliographic style pages.
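If your documents live in a folder of .txt files, the short Python sketch below bundles them into a single .zip archive for upload. It assumes WebJigsaw accepts a flat archive with the .txt files at its top level; this tutorial does not say whether subfolders are allowed, and `zip_text_files` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def zip_text_files(src_dir, zip_path):
    """Bundle every .txt file directly inside src_dir into one .zip archive.

    Returns the names of the files that were added.
    """
    # Collect the file list before creating the archive.
    txt_files = sorted(p for p in Path(src_dir).glob("*.txt") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in txt_files:
            # Store each file under its bare name (assumed: a flat archive).
            zf.write(path, arcname=path.name)
    return [p.name for p in txt_files]
```

For example, `zip_text_files("my_docs", "my_docs.zip")` produces an archive you can then select with the "Select Files" button.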

The files you import can be simple ASCII text or they can be Unicode. Because WebJigsaw can now read Unicode, text from international (non-English) languages can be handled in WebJigsaw too.

We have created a simple proprietary xml file format for Jigsaw. If you have some specific type of data that you would like to analyze in WebJigsaw, one option is to first translate it into WebJigsaw's Datafile format. We have included a few sample Jigsaw Datafiles in the distribution, such as the documents from the 2007 VAST Symposium contest, all the paper abstracts from InfoVis and VAST papers, a sample of PubMed papers about breast cancer, and the Bible. More about this is presented in the next section and in the Appendix.

When you import a document or a set of documents, you can also choose to perform entity identification on them. This is done on the Entity Analysis tab, which appears after you click the "Next" button. You can leave the default selection or choose the options that suit your needs. (If you have many files and they are relatively large, entity identification can be time-consuming, so be patient.) To learn more about entity identification, see Section 4.1.

Additionally, you can choose to perform computational analyses on the documents. By default, WebJigsaw provides document summarization, document similarity, document clustering, and sentiment analysis. These analyses are available on the Computations tab, which appears after you click the "Next" button once you are done with entity identification. To learn more about computations, see Section 5.4.

When WebJigsaw imports a set of documents, it builds an analysis database for those documents on the server so that it can scale up to large document collections. Note, however, that uploading the documents and building this database can be time-consuming, possibly taking quite a few minutes. The resulting analysis database is called a WebJigsaw Project.

2.4 Displaying Views

To begin analysis, you likely want to start with a set of views. Once the preprocessing is done, you will be directed to the Visualization tab. This contains a menu of the various Views. You can choose whichever ones you want. Note that you can create multiple instances of any view type. We recommend having at least one Document View open all the time.

2.5 Start Analysis and Exploration

To begin exploration, you can perform a search query, make a selection, or execute a command in a view. When you enter a search term in the search box, WebJigsaw looks for that text and displays a Document View containing the documents in which it occurs. The Documents search mode is useful when you want to search for a plain word (e.g., dog, car) that is not necessarily an entity; in this mode WebJigsaw acts more like a simple search engine, bringing up the documents that include the search string.

2.6 Saving a Session

You can save an analysis session that is already underway as a Jig file, using the commands available in the right-hand menu. Note that a Jig file contains only an encoded version of the documents with their identified entities. It does not contain the computational analyses performed or any information about views.

3. Importing and Saving Documents

3.1 Importing Documents

WebJigsaw can import several types of text files. Presently, it can read in ASCII or Unicode text files (.txt) and comma-separated value files (.csv). Plain ASCII or Unicode text files are the most reliable type of file to import, so we recommend that you use text files, or transform your documents into text files, whenever possible.

Note

WebJigsaw treats source documents as purely textual content: in general, any text within the file is viewed as the body of the document. There are two exceptions to this, however. If WebJigsaw finds the string Date: or Source: followed by some other text on a line within the first five lines of a file, it interprets that line as special metadata and uses the trailing string as the document's <DocDate> or <DocSource> field, respectively.
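To check how a plain-text file will be interpreted before importing it, the following Python sketch applies the rule above: it scans only the first five lines for Date: or Source: and separates those lines from the body. Treating a line as metadata only when it starts with the keyword, and dropping such lines from the body, are our assumptions about the exact behavior; the function names are hypothetical.

```python
# Keyword at the start of a line -> metadata field it fills (per the rule above).
KEYS = {"Date:": "DocDate", "Source:": "DocSource"}


def match_metadata(line):
    """Return (field, value) if the line is a metadata line, else None."""
    stripped = line.strip()
    for prefix, field in KEYS.items():
        if stripped.startswith(prefix):
            value = stripped[len(prefix):].strip()
            if value:  # the keyword must be followed by some other text
                return field, value
    return None


def split_metadata(text, max_lines=5):
    """Split a document into a metadata dict and its body text."""
    meta, body = {}, []
    for i, line in enumerate(text.splitlines()):
        hit = match_metadata(line) if i < max_lines else None
        if hit and hit[0] not in meta:
            meta[hit[0]] = hit[1]
        else:
            body.append(line)
    return meta, "\n".join(body)
```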

To read in multiple files at once, simply select multiple files in the File Chooser dialog box using the shift- or control- mouse selection operation for your particular browser and Operating System.

Importing CSV Files

WebJigsaw can also import .csv files. If your data is in .xls or .xlsx files, it is simple to export them to .csv.

Because the primary unit of analysis in WebJigsaw is a document, you may wonder how spreadsheet data is handled. In general, WebJigsaw treats each row of a sheet as a separate document. The columns in a spreadsheet can specify attributes of the document (row), such as its ID, date, or body text, or they can hold a type of entity. It is your responsibility to set up the mapping from columns to the relevant attributes. When you initially import a spreadsheet file or files, you will be presented with the contents of the csv file as is.

When defining a mapping, you specify the attribute held in each column by selecting from the pull-down menu above that column. The menu contains items for the Document ID, date, text, and for common entity types such as person, place, and organization. This menu also allows you to create a new type of entity to be specified in a column. At the top of the dialog box, you can specify the row in which the actual data begins, so that any header rows are ignored.

Some important points about spreadsheet import:

WebJigsaw can read only CSV files (.csv), not Excel files (.xls or .xlsx). We recommend converting Excel files to .csv and importing those instead.
If you create a new entity type in a column, that entity type's name can contain only letters and numbers, and it must start with a letter. No other characters are allowed.
If some of your cells are empty, the results may be unpredictable. Most of the time we believe empty cells will simply be skipped and the import will work “correctly”, but to be safe, try to have contents in all cells.
If possible, specify the Document ID and Document text attributes. Even choosing a simple text column to serve as the Document text will help. You might even add a new column to your spreadsheet that combines the text of several other columns.
If WebJigsaw finds duplicate Document IDs in a sheet being read, the last row with that ID is used and the earlier ones are ignored.
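The row-to-document rules above can be summarized in a small Python sketch: header rows are skipped, each remaining row becomes one document, empty cells are skipped, and a later row with a duplicate Document ID replaces the earlier one. The mapping dictionary, the role names "id", "date", and "text", and the fallback ID for rows without one are hypothetical conventions for illustration, not WebJigsaw's internal format.

```python
import csv
import io


def rows_to_documents(csv_text, mapping, first_data_row=1):
    """Turn each CSV row into a document dict using a column mapping.

    mapping: column index -> "id", "date", "text", or an entity type name
    first_data_row: 0-based index of the first data row (earlier rows are headers)
    """
    docs = {}
    for row_num, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if row_num < first_data_row:
            continue  # skip header rows
        doc = {"id": None, "date": None, "text": "", "entities": []}
        for col, role in mapping.items():
            value = row[col].strip() if col < len(row) else ""
            if not value:
                continue  # empty cells are simply skipped
            if role in ("id", "date", "text"):
                doc[role] = value
            else:
                doc["entities"].append((role, value))
        if doc["id"] is None:
            doc["id"] = f"row{row_num}"  # hypothetical fallback ID
        docs[doc["id"]] = doc  # a later duplicate ID overwrites the earlier one
    return list(docs.values())
```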

Jigsaw Datafiles

We have created a proprietary file format for storing collections of documents that uses xml. In addition to the text contents of a document, this format can contain meta-information about the document such as an ID and a date, and it can hold a list of identified entities for each document. We call these proprietary files 'Jigsaw Datafiles' (.jig). We have included a number of samples on the website for you to examine.

If you have your own data, perhaps in some xml format, in a database, or in another format, it's not too difficult to translate it into WebJigsaw's Datafile format. See the Appendix of this tutorial for more information about that and for instructions on how to work with your own data. Trust us -- it's really not too bad. We have done this to convert other xml files into WebJigsaw's format and to scrape web pages and make 'Jigsaw Datafiles' from them. Remember, however, that this is xml, so you cannot have characters such as &, %, <, or > in your text. The Appendix also has more information about this.

The first line of a Jigsaw Datafile can be an xml declaration that specifies the file's character encoding (for example, Unicode UTF-8). WebJigsaw will read this specification and interpret the file correctly.

If you create your own Jigsaw Datafile and importing it fails or hangs, the file most likely contains a syntax error such as an illegal character, a missing bracket, or a mismatched open/close tag.
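Before uploading a hand-made Jigsaw Datafile, you can catch most such syntax errors locally. This Python sketch uses the standard library's `xml.etree.ElementTree` parser to check that the file is well-formed xml, that the outermost element is <documents>, and that every <document> has a <docID>. Passing this check does not guarantee that WebJigsaw will accept the file; `check_jig` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def check_jig(path):
    """Return None if the file passes the basic checks, else an error description."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return f"not well-formed: {err}"
    if root.tag != "documents":
        return f"unexpected root element <{root.tag}>"
    for i, doc in enumerate(root.findall("document"), start=1):
        if doc.find("docID") is None:
            return f"document #{i} has no <docID>"
    return None
```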

As another option, if you have your own specific data file format and are not sure how to get it into WebJigsaw, please get in touch with us; we may be able to advise you on how to write a translator from that format into WebJigsaw's Datafile format.

Note

If you have imported documents from text files, spreadsheets, etc., and you would like to see them in Jigsaw Datafile format, you can use the Export command for writing out the current project as a Jigsaw Datafile.

3.2 WebJigsaw Projects

When a set of documents has been read in successfully, and entity identification has possibly been performed on it, this set of information is called a project. A WebJigsaw project encapsulates a set of documents that have been read into WebJigsaw along with any entity identification performed on them. You can save a project as a Jig file and reopen it in later runs of the WebJigsaw system.

4. Identifying and Working with Entities

The preprocessing page contains an Entity Analysis tab that includes operations for the different entity processes described below.

4.1 Entity Identification

When importing text files or spreadsheets, you can choose to have the system automatically identify entities. Presently, WebJigsaw provides three main mechanisms to identify entities in documents. First, it includes third-party software libraries for automated (statistical) entity identification. Second, it includes the capability to do some basic pattern matching of text to identify entity types such as dates, phone numbers, zip codes, email addresses, URLs, and IP addresses. Third, it allows you to provide an entity type (name) and a list of the values of that entity type. Below we describe each of these in a little more detail.

For automated entity identification, WebJigsaw can apply one of two possible packages. Polyglot and Stanford NER are included with the distribution, so in these cases the entity identification is done on the server. Both packages have strengths and weaknesses, so we recommend trying each to see which works best for your documents. We generally use the Polyglot or spaCy NER system and have found it to be quite fast in general.

WebJigsaw also contains functionality that can help you identify particular types of strings such as dates, phone numbers, zip codes, email addresses, URLs, and IP addresses in documents' text. This code does some basic regular expression matching, so it is not perfect. For example, any 5-digit number will be identified as a zip code; we do not validate it against the list of actual zip codes in the United States.
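To give a feel for this kind of pattern matching and its limits, here is a Python sketch with deliberately simple regular expressions. These patterns are our own assumptions, not the ones WebJigsaw uses, and dates are left out because they are much harder to match. As described above, the zip code pattern accepts any 5-digit number.

```python
import re

# Deliberately simple, hypothetical patterns; NOT WebJigsaw's actual rules.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "url": re.compile(r"\bhttps?://[^\s<>\"]+"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),  # no check against real zip codes
}


def find_pattern_entities(text):
    """Return (entity_type, matched_string) pairs, grouped by pattern."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group()))
    return found
```

With the text "Call (404) 555-1234 or email jo@example.com; server 10.0.0.1 in zip 30332 and 99999." the sketch reports the phone number, the email address, the IP address, and both 30332 and 99999 as zip codes, since nothing checks whether either is a real zip code.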

Finally, WebJigsaw allows you to create a new entity type and specify all of the valid strings that are instances of that entity type. For instance, you could create a new entity type “Car” and specify a set of possible values such as “Ford”, “Chevrolet”, “Honda”, “Hyundai”, etc. To do this, you need to create a text file (.txt) that has each possible entity value on a separate line. (Note that an entity value need not be just one word; it can have multiple words.)

To add this new entity type to WebJigsaw, use the bottom region of the entity identification tab. Enter the entity type name on the left and then browse for the text file containing the list of entity values. Note that entity type names (such as “Car” in the example above) are case-sensitive, can contain only letters and numbers, and must start with a letter.
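The value file is plain text with one value per line, for example Ford, Chevrolet, Honda, and Hyundai on four separate lines. The Python sketch below shows one plausible way such a list could be matched against document text: whole-word, case-sensitive matching that also allows multi-word values. These matching rules and the function names are assumptions for illustration; the tutorial does not describe how WebJigsaw matches list values.

```python
import re


def load_entity_list(path):
    """Read one entity value per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def match_entity_list(text, values):
    """Return the listed values that occur in text as whole words or phrases."""
    found = []
    for value in values:
        # (?<!\w) and (?!\w) stop "Ford" from matching inside "Fordham".
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(value)
    return found
```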

Entity identification can be run at the start of an investigation, after the documents are initially imported.

4.2 Correcting Erroneous Entity Identification

The process of automated entity identification is not perfect. Many false positives (identifying entities that really are not entities) and false negatives (completely missing some valid entities) can occur, especially in documents with many spelling errors introduced by processes like OCR.

WebJigsaw lets you fix incorrect entity identification once you are in the Visualization tab. In the Document View, you can double-click an entity and use the menu at the top to change its type. Furthermore, you can mouse-click-drag over words in a document to select them, and then use the menu to add the word(s) as an entity, choosing one of the existing entity types or creating a new one.

The List View also includes a Delete command in its right-click menu that lets you correct erroneous entity identification by removing an entity or entities. You can select multiple entities with shift- or control-click to remove several at one time.

New entity type names are case-sensitive and cannot contain spaces or other special characters: they can include only letters and numbers, and they must begin with a letter.
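Because the same naming rule appears in several places in this tutorial, it can help to check a name before typing it in. This Python sketch encodes the rule as a regular expression; reading "letters" as the ASCII letters A-Z and a-z is our assumption, and `is_valid_entity_type` is a hypothetical helper.

```python
import re

# Letters and numbers only, and the first character must be a letter.
ENTITY_TYPE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_entity_type(name):
    """True if name satisfies the entity type naming rule."""
    return ENTITY_TYPE_NAME.fullmatch(name) is not None
```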

4.3 Entity Aliasing

WebJigsaw also allows you to create aliases for entities. Suppose that a person's name is spelled three different ways in a document collection, but you know that they are all the same person. Alternatively, suppose that a person is using an alias, that is, there is another name that they go by. WebJigsaw allows entities to be aliased in order to handle either of these situations. Entity aliases can be defined interactively through the List View.

To create an alias interactively, select two or more entities in the List View and then right-click to open a menu containing the Make Aliases command. Choose it, and the system will ask which of the entity names should be the main one for this alias. Once you have done that, all the other subordinate entities are removed from the views and only the main entity name is used. That “winning” entity name is shown in an underlined font to indicate that it has aliases. Moving the mouse cursor over such an entity brings up a pop-up showing the other aliases.
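Conceptually, aliasing maps each subordinate name to the main name, so that views show only the main name. The Python sketch below illustrates that effect on per-document entity lists; the data layout and `resolve_aliases` are hypothetical and do not describe how WebJigsaw stores aliases internally.

```python
def resolve_aliases(doc_entities, aliases):
    """Replace subordinate entity names with their main name.

    doc_entities: document ID -> list of entity names found in it
    aliases: subordinate name -> main ("winning") name
    """
    resolved = {}
    for doc_id, names in doc_entities.items():
        seen = []
        for name in names:
            main = aliases.get(name, name)
            if main not in seen:  # collapse duplicates created by aliasing
                seen.append(main)
        resolved[doc_id] = seen
    return resolved
```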

5. Exploring and Analyzing Document Collections

Once you have imported a document collection, you are ready to explore, investigate, and analyze the documents and their entities. You will likely want to create a number of different views to show the documents and entities. Remember that you can have any number of views of each view type.

5.1 General Tips

Views show entity-document and entity-entity connections. A document and an entity are connected if the entity appears in the document. Two entities are considered to be connected if they appear in at least one document together. As the number of documents in which they appear together increases, so does the quantitative connection strength.
A single mouse click on an item (document or entity) selects that item. All the other visible items then update their appearance to show how they relate to that selected item. User mouse actions such as selections and expansions also are transmitted to other active views, which update their representation appropriately too.
You can turn event listening off or on in each view by clicking on the little satellite dish in the upper right. Turning off listening essentially freezes the view: user actions such as clicks and double-clicks in other views will not affect it. This capability is very useful for locking a view in an interesting state. Note that frozen views also are not affected by the Clear All Views command in the Views menu.
To examine a document or the set of documents containing an entity in an empty new Document View, right-click on the item and use the Show in new Document View command.
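The connection rule above can be sketched in Python by counting, for every pair of entities, the number of documents in which both appear. Using this raw document count as the connection strength is an assumption; the tutorial only says that strength grows with the number of shared documents.

```python
from collections import Counter
from itertools import combinations


def connection_strengths(doc_entities):
    """Count, for each pair of entities, the documents in which both appear.

    doc_entities: document ID -> list of entity names in that document
    """
    strengths = Counter()
    for entities in doc_entities.values():
        # set() so an entity mentioned twice in one document counts once
        for a, b in combinations(sorted(set(entities)), 2):
            strengths[(a, b)] += 1
    return strengths
```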

5.2 Search Tips

In Documents mode, which is accessed by selecting the Documents checkbox, WebJigsaw simply retrieves documents that contain words from the search query somewhere in the document text.
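A minimal Python sketch of this kind of retrieval follows. It returns the documents containing any word of the query; case-insensitive, whole-word matching with OR semantics is our assumption, since the tutorial does not specify how multi-word queries are combined.

```python
import re


def search_documents(docs, query):
    """Return IDs of documents whose text contains any word of the query.

    docs: document ID -> document text
    """
    words = [w.lower() for w in re.findall(r"\w+", query)]
    hits = []
    for doc_id, text in docs.items():
        doc_words = set(re.findall(r"\w+", text.lower()))
        if any(w in doc_words for w in words):
            hits.append(doc_id)
    return hits
```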

5.3 View-specific Use Tips

The sections below briefly describe some of the uses, commands, and capabilities of the different views in WebJigsaw.

Note that each view has its own menus at the top that provide useful operations for that view. For example, some of the views have filtering operations that allow you to limit what is shown. All views provide the Change Title, Minimize/Maximize, and Open in new Tab commands, as well as the event-listening toggle.

Document View

The Document View is the core view in WebJigsaw for reading document contents. The list on the lower left holds the set of documents that have been loaded into this view; by default, all documents are placed there. A Document View can also be populated in response to control panel search queries, by Show commands from other views, or by Expand commands issued in other views. Additionally, the Add All button in the lower left brings all documents in the collection into the view. Be careful about using this command with extremely large document collections.

Click on any document name to select it and show its text in the focus area to the right. The number next to the document ID indicates how many times that document has been viewed. All of the documents listed in this view contribute to the word cloud at the top, which shows the key words used throughout that set of documents.

In the region above the document content is the “document summary,” the one sentence from the document that WebJigsaw has selected as best exemplifying what the document is about. This can be useful for fast triage of multiple long documents.
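The tutorial does not say how WebJigsaw picks this sentence. One simple, common extractive approach, shown here as a Python sketch, scores each sentence by the average document-wide frequency of its content words and picks the highest-scoring one; the stopword list and the scoring rule are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "was",
             "were", "be", "for", "that", "it", "with", "as", "at", "by", "this"}


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def one_sentence_summary(text):
    """Pick the sentence whose content words are most frequent in the document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average, not sum, so long sentences are not automatically favored.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    return max(sentences, key=score) if sentences else ""
```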

Within the document focus region, the body text for the document is shown at the top, and below it are listed any affiliated entities that do not occur in the document text. Entities are colored in a pastel shade of their default color. Clicking on an entity selects it. You can perform manual entity identification by mouse-dragging over a word or words to select them and then using the menu to add the selection as a new entity. Similarly, you can right-click on an existing entity to access commands for removing it as an entity, changing its entity type, or opening a new Document View with only the documents containing that entity.

Note

As documents get larger and larger, they tend to load much more slowly in the Document View.

List View

We find the List View to be the most powerful and useful view in WebJigsaw. It provides very easy browsing, selection, filtering, and exploration of all the entities and documents in the collection being analyzed.

The view begins showing three columns, but you can add/remove lists (columns) via commands from the Lists menu in the view so that you can fill out a wide view with as many lists as you want. The view will scroll horizontally if there is not enough room.

Each column holds entities of a particular type - the type can be changed through the menu at the top of each list. The same entity type can be put into different columns too. Be careful with very large document collections with many, many entities of a particular type, however. That may generate a very long scrolling list.

The bar to an entity's left is a frequency counter across the entire document collection. By moving the mouse pointer over this small bar you can find the exact number of documents in the collection in which that entity appears.

The buttons and menus above a column control how that particular list appears. The first three buttons sort the list in different ways: 1) alphabetically, 2) by frequency of appearance in the entire collection, or 3) by connection strength to the selected item(s). Other buttons control the alignment of entities and allow you to clear a list.

Clicking on an entity selects it; shift-click and control-click allow multiple entities to be selected. Selected entities are drawn in yellow. Entities connected to the selected entities are drawn in orange, with darker shades indicating stronger connections. Unconnected entities are drawn with a white background. When multiple entities are selected, the four buttons at the top control how entity connections are determined, for example by OR-ing or AND-ing the selected entities. In "And" mode, connected entities (those shown in orange) must co-occur in some document(s) with all of the selected entities.
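The difference between the two modes can be sketched in Python: for each selected entity, collect the entities that co-occur with it in some document, then take the union (Or) or the intersection (And) of those sets. Reading "And" mode as allowing each selected entity to co-occur with the candidate in a different document is our interpretation of the description above, and the function names are hypothetical.

```python
def neighbors(doc_entities, entity):
    """Entities that share at least one document with the given entity."""
    result = set()
    for entities in doc_entities.values():
        if entity in entities:
            result.update(entities)
    result.discard(entity)
    return result


def connected_entities(doc_entities, selected, mode="or"):
    """Entities connected to the selection under "or" or "and" mode."""
    sets = [neighbors(doc_entities, e) for e in selected]
    if not sets:
        return set()
    combined = set.union(*sets) if mode == "or" else set.intersection(*sets)
    return combined - set(selected)  # selected entities are shown as selected, not connected
```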

A right-mouse-button click on a selected entity or entities provides a menu with a number of useful operations including Show, Alias, and Delete.

WordTree View

This view is a version of the WordTree visualization introduced by IBM through the Many Eyes visualization site and their 2008 IEEE InfoVis paper. Here, the WordTree applies to all documents in the collection. This view helps you understand the context of different words in the collection.

When you enter a term in the upper text entry region, the system shows all the words (context) that follow it in any document. You can constrain the view to compress all the string
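The structure behind a word tree can be sketched as a prefix tree of the words that follow each occurrence of the search term. The Python sketch below builds one as nested dictionaries; lowercasing and the fixed depth limit are simplifying assumptions, not WebJigsaw's behavior.

```python
import re


def word_tree(docs, root, depth=3):
    """Nested dict of the words that follow `root`, up to `depth` words deep."""
    tree = {}
    for text in docs:
        words = re.findall(r"\w+", text.lower())
        for i, word in enumerate(words):
            if word != root.lower():
                continue
            node = tree
            # Walk (and extend) one branch per occurrence of the root word.
            for nxt in words[i + 1:i + 1 + depth]:
                node = node.setdefault(nxt, {})
    return tree
```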

Document Grid View

This view is useful for seeing a sorted and shaded grid of all the documents in the collection, where the order and shading can communicate different metrics about the documents. The view begins empty, but documents can be added via Show operations in other views, search queries, or the Add All button in this view. Each document is represented as a small rectangle within the view, and the documents are sorted by row from the top-left to the bottom-right. You can apply different metrics to control this order and the shading of each document's rectangle. Mousing over a document rectangle shows its Document ID and its value for the metric used to control the sorted order. Presently, only a few metrics are available: the size of a document, the number of entities in a document, the document date, the document's sentiment, and the document's similarity to a selected document. By selecting the checkbox in the upper left, you can organize the documents by cluster (if clustering has been computed), with the documents ordered and sorted within each cluster.

The Document Grid View has a menu command at the top for writing all of the documents in the view to a file, in the order in which they appear and with the metric value for each.

If you have performed a document clustering computation, this view has the capability to lay out all of the documents in their respective clusters rather than the default grid.
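The grid ordering described above can be sketched in Python: sort documents by the chosen metric, fill cells left to right and top to bottom, and normalize the metric to a 0-1 shade. Ascending order, the column count, and the linear shading are assumptions for illustration.

```python
def grid_layout(metric, columns):
    """Sort documents by a metric and assign (row, col, shade) to each.

    metric: document ID -> numeric value (e.g. length or entity count)
    shade is the metric rescaled to [0, 1].
    """
    if not metric:
        return {}
    # Ties are broken by document ID so the layout is deterministic.
    ordered = sorted(metric, key=lambda d: (metric[d], d))
    lo, hi = min(metric.values()), max(metric.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    layout = {}
    for index, doc_id in enumerate(ordered):
        row, col = divmod(index, columns)  # fill left to right, then top to bottom
        layout[doc_id] = (row, col, (metric[doc_id] - lo) / span)
    return layout
```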

5.4 Automated Computational Analysis

WebJigsaw provides a number of different automated computational analyses that can help you explore the document collection. It provides four important capabilities: document summarization, document similarity, document clustering, and sentiment analysis.

To do this, choose the appropriate command(s) in the preprocessing stage under the "Computational Analysis" tab. If you want to employ these analyses, we strongly recommend that you calculate them after performing entity identification. By default, the clustering uses a cluster size that depends on the number of documents. Note that while the computational analyses run, WebJigsaw is blocked and you cannot perform any other operations. The analyses can also take a significant amount of time: for a collection of five thousand documents, or for larger documents, they may take hours. In a situation like this, we recommend that you start the analyses and then do something else in the interim, perhaps even running them overnight and returning to your investigation the next day. Below we describe each of the analyses and how WebJigsaw presents it.

Document Summarization

Document summarization is integrated into WebJigsaw in several ways. The Document View shows a word cloud (at the top) of the selected documents loaded in the view. The word cloud helps you quickly understand themes and concepts within the documents by presenting the most frequent words across them. WebJigsaw removes frequent, simple words but does not combine words like “make”, “makes”, and “making” (stemming), so that identified entities can still be highlighted in the word cloud. The number of words shown can be adjusted interactively with the slider above the cloud. Additionally, the Document View provides a one-sentence summary (the most significant sentence) of the displayed document. This one-sentence summary is available in all other WebJigsaw views as well, as a tooltip wherever a document is represented by a symbol or by its name. The cluster view option of the Document Grid View also provides keyword summaries for the clusters.
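The word-cloud counting described above, with common words removed and no stemming, can be sketched in Python as follows. The tiny stopword list is an assumption (real lists are much longer), and the `n` parameter plays the role of the slider.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "was",
             "were", "be", "for", "that", "it", "with", "as", "at", "by", "this"}


def word_cloud_terms(texts, n=10):
    """Most frequent non-stopword words across the documents, without stemming."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)
    return counts.most_common(n)
```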

Document Similarity

In WebJigsaw, document similarity can be measured based on the complete document text or on just the entities connected to a document. These different similarity measures are of particular interest for semi-structured document collections, such as publications, in which metadata-related entities (e.g., authors or conferences) are not mentioned in the actual document text. The Document Grid View can provide an overview of all the documents' similarity to a selected document via the order and color of the documents in the grid. To do this, click on a document to select it, open the right-click menu, and choose the command that makes it the basis for similarity. Then use the controls in the upper right to base the order and/or the shading of documents in the grid on similarity. In all other views, the five most similar documents can be retrieved with a right-click command on a document's representation. Note that we have found that the entity-based similarity computation sometimes crashes if some of the documents have few (or no) entities.
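The tutorial does not specify WebJigsaw's similarity measure. A common choice is cosine similarity between term-count vectors, sketched below in Python; for entity-based similarity you would build each vector from a document's entities instead of its words. The zero-norm guard shows one way to handle documents with no terms or no entities, and `most_similar` with `k=5` mirrors the five-most-similar command. All names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Word counts for a document (use Counter(entity_list) for entity-based similarity)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    # An empty vector has zero norm; return 0.0 instead of dividing by zero.
    return dot / norm if norm else 0.0


def most_similar(vectors, target, k=5):
    """IDs of the k documents most similar to target (ties broken by ID)."""
    scores = [(cosine(vectors[target], vec), doc_id)
              for doc_id, vec in vectors.items() if doc_id != target]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scores[:k]]
```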

Document Clustering by Topics

WebJigsaw also can group similar documents together. Like the calculation for document similarity, document clustering also can be based on either the document text or on the entities connected to a document. Computed clusters can be shown in the Document Grid View. Within the Grid View, there is a chooser for selecting which clustering is to be shown in the view. Each cluster is labeled by three words/terms that describe some of the main concepts within the cluster. Within the Grid View, select the option in the upper right to organize documents within the grid by cluster.
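The clustering algorithm is not named here. As an illustration only, the Python sketch below swaps in plain k-means over term-frequency vectors and then labels each cluster with its most frequent terms (three by default, matching the three-term labels described above). The naive initialization, whitespace tokenization, and lack of stopword removal are simplifications.

```python
import math
from collections import Counter


def tf_vectors(texts):
    """Dense term-frequency vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        words = t.lower().split()
        total = len(words) or 1  # avoid dividing by zero for empty documents
        vectors.append([words.count(w) / total for w in vocab])
    return vectors, vocab


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns one cluster index per vector.

    The first k vectors serve as initial centroids (deterministic but naive).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def label_clusters(texts, labels, n_terms=3):
    """Label each cluster with its n most frequent words."""
    counts = {}
    for text, lab in zip(texts, labels):
        counts.setdefault(lab, Counter()).update(text.lower().split())
    return {lab: [w for w, _ in c.most_common(n_terms)] for lab, c in counts.items()}
```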

Document Sentiment/ Subjectivity Analysis

A document's sentiment is its general tone or mood: is it positive and upbeat, or negative and angry? Subjectivity analysis classifies a sentence or a clause of a sentence as subjective or objective. Metrics about a document's sentiment, subjectivity, and polarity can be displayed in the Document Grid View; choose the appropriate metric from the menu selections in the upper right. One metric can be represented by the order of the documents, and a second metric (or the first metric again) can be encoded by the document color. WebJigsaw represents positive documents in blue (darker blue indicates more positive) and negative documents in red.
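The following Python sketch illustrates the idea with a toy lexicon-based polarity score and the blue/red color encoding described above. The lexicon, the score range, and the color formula are all assumptions; the tutorial does not describe WebJigsaw's sentiment method.

```python
import re

# Toy polarity lexicon (an assumption); real lexicons hold thousands of scored words.
LEXICON = {"good": 1, "great": 2, "happy": 1, "upbeat": 1,
           "bad": -1, "sad": -1, "angry": -2, "terrible": -2}


def sentiment_score(text):
    """Average polarity of the lexicon words in text, in [-2, 2]; 0.0 if none occur."""
    scores = [LEXICON[w] for w in re.findall(r"[a-z]+", text.lower()) if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def sentiment_color(score, max_abs=2.0):
    """Blue for positive, red for negative; stronger scores give darker, purer shades."""
    strength = min(abs(score) / max_abs, 1.0)
    fade = round(255 * (1 - strength))  # 255 leaves the other channels white, 0 gives pure color
    if score > 0:
        return f"#{fade:02x}{fade:02x}ff"
    if score < 0:
        return f"#ff{fade:02x}{fade:02x}"
    return "#ffffff"  # neutral documents stay white
```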

8. Help/Comments

To read more about how the desktop version of Jigsaw works and to see a video demo, please refer to the Jigsaw web page. The web pages there, in particular the System Views page, tell more about the views. We recommend reading the 2008 Information Visualization and the 2013 IEEE Trans. on Visualization and Computer Graphics journal papers about the system (available at the website above) for further explanation of Jigsaw's purpose and how it works. The overview, example scenario, and tutorial videos on the top Jigsaw web page also should be especially useful in understanding how the system and views work (although the overview video is a bit dated now). The Tutorial Videos page on the Jigsaw website has many useful how-to videos about the system.

If you would like help using WebJigsaw, please send email to stasko@cc.gatech.edu.

We would definitely like to hear comments and thoughts about the system. We are particularly interested to hear about the way that you are using the system and if it is beneficial to you. Please do let us know about this.

9. Appendix

WebJigsaw Datafile Format

Jigsaw Datafiles (with suffix .jig) are xml files that encapsulate a set of one or more documents. Presently, for each document the file contains the document ID, its date, any other documents it references, the document's source, and the actual text contents of the document, along with any entities that have been identified in the document.

A Jigsaw Datafile contains an outermost <documents> tag that encloses multiple <document> items. Each <document> should contain a <docID>, and it can have an optional <docDate> and other reference fields such as <docSource>. The plain text contents of the document go in the <docText> field, followed by the identified entity values in elements such as <date>, <time>, <money>, <place>, <person>, and <organization>. Note that you can add other entity types to that section as well.

There are some rules to follow for entity types, values, and other text in Jigsaw Datafiles. Entity types cannot have spaces in them. Entity values and document text should not contain the raw &, <, >, and % characters. To put those characters into text regions, use the following abbreviations.

& - &amp;
> - &gt;
< - &lt;
% - &#37;

The first line of a .jig file can specify the file's character encoding, for example Unicode UTF-8, with an xml declaration such as <?xml version="1.0" encoding="UTF-8"?>, as is typically done for xml files.

An example of a Jigsaw Datafile with one document in it is shown below. Look in the datafiles folder for other larger examples.

<documents>
    <document>
        <docID>20040216-2_30</docID>
        <docDate>Feb 18 2004</docDate>
        <docSource/>
        <docText>
In the first action of its kind this winter, 18 bison were captured outside Yellowstone National Park on Tuesday and were being tested for brucellosis. Those that have signs of the disease will be sent to slaughter and the rest will be marked and set free, according to Karen Cooper, a spokeswoman for the Montana Department of Livestock. 
The bison, a mix of calves, yearlings and adults, were hazed into a pen just before noon Tuesday near Horse Butte, west of Yellowstone. The bison were then loaded onto trailers and trucked to another holding pen to be tested for brucellosis. 
Cooper said some of the bison had been hazed back into the park on Jan. 28, Feb. 5 and Feb. 13. "These were some of the same animals. We could not get them back in the park so today it was a capture operation," Cooper said. 
Several agencies participated in the capture, including the Department of Livestock, Montana Fish, Wildlife and Parks, National Park Service and the U.S. Forest Service. Through a state and federal bison management plan, government agents haze and sometimes capture bison that leave Yellowstone. The plan is intended to reduce the risk that bison will transmit brucellosis to cattle in the area. 
</docText>
        <date>Feb. 13</date>
        <date>Feb. 5</date>
        <date>Jan. 28</date>
        <date>Tuesday</date>
        <date>this winter</date>
        <date>today</date>
        <time>noon</time>
        <place>Yellowstone</place>
        <place>Yellowstone National Park</place>
        <person>Karen Cooper</person>
        <organization>Department of Livestock</organization>
        <organization>Montana Department of Livestock</organization>
        <organization>National Park Service</organization>
        <organization>U.S. Forest Service</organization>
        <place>Montana</place>
        <person>Cooper</person>
    </document>
</documents>
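If you are converting your own data, a script can produce this layout for you. The Python sketch below writes a Jigsaw Datafile with the standard library's `xml.etree.ElementTree`, which escapes &, <, and > automatically; it also replaces % with &#37;, since % is on this appendix's list of characters to avoid. The input dictionary layout and `write_jig` are hypothetical, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def write_jig(docs, path):
    """Write documents to a Jigsaw Datafile in the layout shown above.

    docs: list of dicts with "id" and "text", optional "date" and "source",
    and "entities" as a list of (entity_type, value) pairs (hypothetical layout).
    """
    root = ET.Element("documents")
    for doc in docs:
        d = ET.SubElement(root, "document")
        ET.SubElement(d, "docID").text = doc["id"]
        if doc.get("date"):
            ET.SubElement(d, "docDate").text = doc["date"]
        ET.SubElement(d, "docSource").text = doc.get("source")  # None gives an empty tag
        ET.SubElement(d, "docText").text = doc["text"]
        for entity_type, value in doc.get("entities", []):
            # Entity types become tag names, so they must follow the naming rules.
            ET.SubElement(d, entity_type).text = value
    ET.indent(root, space="    ")  # pretty-print; Python 3.9+
    body = ET.tostring(root, encoding="unicode")  # escapes &, < and >
    body = body.replace("%", "&#37;")  # numeric character reference for %
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(body + "\n")
```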

1. Overview and Purpose

2. Getting Started with WebJigsaw

2.1 System Requirements

2.2 Initiating a Session

2.3 Reading in a Set of Documents

2.4 Displaying Views

2.5 Start Analysis and Exploration

2.6 Saving a Session

3. Importing and Saving Documents

3.1 Importing Documents

Importing CSV Files

Jigsaw Datafiles

3.2 WebJigsaw Projects

4. Identifying and Working with Entities

4.1 Entity Identification

4.2 Correcting Erroneous Entity Identification

4.3 Entity Aliasing

5. Exploring and Analyzing Document Collections

5.1 General Tips

5.2 Search Tips

5.3 View-specific Use Tips

Document View

List View

WordTree View

Document Grid View

5.4 Automated Computational Analysis

Document Summarization

Document Similarity

Document Clustering by Topics

Document Sentiment/ Subjectivity Analysis

8. Help/Comments

9. Appendix

WebJigsaw Datafile Format