Available: on-premises, cloud
Overview
The Table Extraction plugin extracts tabular data from the documents in a batch. For each table, the user defines basic table information and a set of table columns.
Ephesoft Transact uses one or more rules to perform table extraction. Note the following factors for table extraction:
- There may be multiple table extraction rules defined for a table.
- The extraction rule that provides the best and most valid data from table columns is chosen to show table extraction results.
- The validity of extracted columns is based on one or more validation patterns for each column combined with table validation rules that are applied to each row of extracted table data.
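The selection among several extraction rules can be pictured with a minimal sketch. The scoring used here (counting the cells that fully match their column's validation pattern) is an assumption for illustration; the documentation only states that the rule yielding the best and most valid data is chosen, and the sketch leaves table validation rules out. All function and rule names are hypothetical.

```python
import re

def count_valid_cells(rows, validation_patterns):
    """Count cells whose text fully matches the validation pattern of their column."""
    valid = 0
    for row in rows:
        for column, value in row.items():
            pattern = validation_patterns.get(column)
            if pattern and re.fullmatch(pattern, value):
                valid += 1
    return valid

def pick_best_rule(results_by_rule, validation_patterns):
    """Return the name of the rule whose extracted rows contain the most valid cells."""
    return max(
        results_by_rule,
        key=lambda name: count_valid_cells(results_by_rule[name], validation_patterns),
    )

# Hypothetical validation patterns and per-rule extraction results.
patterns = {"Qty": r"\d+", "Total": r"\d+\.\d{2}"}
results = {
    "rule_by_header": [{"Qty": "2", "Total": "10.00"}, {"Qty": "x", "Total": "5.00"}],
    "rule_by_regex": [{"Qty": "2", "Total": "10.00"}, {"Qty": "1", "Total": "5.00"}],
}
best = pick_best_rule(results, patterns)
```

Here the second rule wins because every cell it extracted matches its validation pattern.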
Characteristics
- For each document, consisting of one or more pages, the table extraction algorithm will extract all tables defined for a document type.
- The document is parsed from the first page to the last page to identify tables.
- One table may span one or more pages.
- A table defined for a document type consists of table columns, table extraction rules, and table validation rules. At minimum, table columns and one table extraction rule are required for table extraction to return results.
- A table extraction rule contains a start pattern and an end pattern that mark the boundaries of the table data to extract. The table extraction API of an extraction rule is a combination (using AND or OR operators) of three kinds of validation:
- Column Coordinates Validation
- Column Header Validation
- Regex Validation
This API combination determines the algorithm used to extract data for every table column in a row.
- Each table extraction rule has table column extraction rules, that is, one column extraction rule for each table column. A column extraction rule holds the information that the table extraction APIs use for column extraction, such as column pattern, column header pattern, start coordinate, end coordinate, multiline anchor, and required.
The following table summarizes which column extraction rule fields each table extraction API uses:
Table extraction API | Table column extraction rule fields used |
Column header validation | Uses the column header pattern. The header is located either by matching the pattern as a string with some fuzziness or by taking the best match of the pattern as a regex on the page. The coordinates of the matched header are then used to extract the data beneath it. Text in close proximity to the left or right of the text beneath the header is also appended to the extracted column value. |
Column coordinate validation | Uses the start coordinate and end coordinate as the vertical boundaries of the column data on the page. To set them, click the Set Coordinates button, upload a sample image, and draw an overlay for each column. Clicking OK saves the start and end coordinates to the column extraction rule. |
Regex validation | Uses the Column Pattern, Between Left pattern, and Between Right pattern to find the best-matched text in each row for the column data. |
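As an illustration of column coordinate validation, the following minimal sketch assigns the words of one row to columns by their horizontal position. Comparing only the left x coordinate of each word against the start and end coordinates is an assumption; the documentation does not state how word boxes are compared with the column boundaries. The function name and sample data are hypothetical.

```python
def assign_words_to_columns(words, column_bounds):
    """Place each word in the first column whose [start, end] range contains its x position.

    words: list of (text, x) tuples for one table row, x in pixels.
    column_bounds: dict mapping column name -> (start_coordinate, end_coordinate).
    """
    row = {name: [] for name in column_bounds}
    for text, x in words:
        for name, (start, end) in column_bounds.items():
            if start <= x <= end:
                row[name].append(text)
                break
    return {name: " ".join(parts) for name, parts in row.items()}

# Hypothetical column boundaries and one row of OCR words.
bounds = {"Description": (0, 400), "Qty": (401, 500), "Total": (501, 700)}
line = [("Blue", 20), ("widget", 80), ("3", 430), ("29.97", 560)]
row = assign_words_to_columns(line, bounds)
```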
Configuration
The following properties are configurable for the plugin in dcma-tablefinder.properties, located at {EphesoftHome}\WEB-INF\classes\META-INF\dcma-table-finder*:
Configurable property | Type of value | Value options | Description |
tablefinder.gap_between_column_words | Integer | NA | Gap, in pixels, between words that belong to the same column value. Used during column header extraction. The default is 60. |
tablefinder.rule_removal_invalid_characters | List of values separated by semicolons (;) | NA | Invalid characters in an extracted column value that need to be ignored before the table rules are applied to the columns. |
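The effect of tablefinder.rule_removal_invalid_characters can be sketched as follows. The property value "$;,;#" is a made-up example, not a shipped default, and the helper names are hypothetical; the sketch only shows the idea of dropping the configured characters before a rule sees the value.

```python
def parse_invalid_characters(property_value):
    """Split the semicolon-separated property value into individual tokens."""
    return [token for token in property_value.split(";") if token]

def strip_invalid_characters(value, invalid):
    """Remove every configured invalid token from an extracted column value."""
    for token in invalid:
        value = value.replace(token, "")
    return value

# Hypothetical property value: ignore "$", "," and "#".
invalid = parse_invalid_characters("$;,;#")
cleaned = strip_invalid_characters("$1,200#", invalid)
```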
Table Configuration
Add/Delete Table Info
The user can add or delete table information by clicking the corresponding buttons on the following UI:
Clicking Add presents the following UI, where the user can enter a value for each property:
Test Table
The Test Table feature lets the user check, without running a batch, whether the table configuration extracts the tabular data correctly. The user can upload a valid image file or place the image file at the following path:
{base-folder}batch-class-id test-table
Test Table output will be shown at the following UI:
Configurable Properties
The following configurable properties are available:
Configurable property | Type of value | Value options | Description |
Name | String | NA | Name for the data table. |
Validation Rule Operator | List of values | AND, OR | With AND, a table row is valid if and only if it satisfies all of the defined table validation rules. With OR, a table row is valid if it satisfies at least one of the validation rules. |
Remove Invalid Rows | Boolean | | Whether to remove rows that are invalid according to the table validation rules from the table result data. |
Currency | List of values | Ephesoft supported currencies. | Name of the currency on which the validation rules for the table are based. Every table column whose Currency field is checked in its column extraction rule undergoes currency extraction based on this value before the validation rules are applied. |
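The interaction between Validation Rule Operator and Remove Invalid Rows can be sketched as below. Rules are modelled as plain Python callables purely for illustration; this is not how Ephesoft Transact stores them, and every name is hypothetical.

```python
def row_is_valid(row, rules, operator):
    """Combine rule outcomes: AND needs every rule to pass, OR needs at least one."""
    outcomes = [rule(row) for rule in rules]
    return all(outcomes) if operator == "AND" else any(outcomes)

def apply_table_validation(rows, rules, operator, remove_invalid_rows):
    """Return (row, is_valid) pairs, dropping invalid rows when requested."""
    checked = [(row, row_is_valid(row, rules, operator)) for row in rows]
    if remove_invalid_rows:
        checked = [(row, ok) for row, ok in checked if ok]
    return checked

# Two hypothetical validation rules over numeric columns.
rules = [
    lambda r: r["qty"] * r["price"] == r["total"],
    lambda r: r["total"] > 0,
]
rows = [
    {"qty": 2, "price": 5, "total": 10},
    {"qty": 1, "price": 5, "total": 7},
]
kept_and = apply_table_validation(rows, rules, "AND", remove_invalid_rows=True)
kept_or = apply_table_validation(rows, rules, "OR", remove_invalid_rows=True)
flagged = apply_table_validation(rows, rules, "AND", remove_invalid_rows=False)
```

With AND, the second row fails the first rule and is removed; with OR, it survives because its total is positive. When rows are not removed, the invalid row stays in the result and is only flagged.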
Table Column Configuration
Add/Delete Table Column Info
Table column information can be added or deleted by clicking the corresponding button on the following UI:
- Clicking the Add button presents the following UI, where the user can add table column fields:
Configurable Properties
The following configurable properties are available:
Configurable property | Type of value | Value options | Description |
Column Name | String | NA | Name of the column. |
Description | String | NA | Description of the column. |
Validation Pattern | String | NA | Validation pattern of the column. This pattern validates the extracted column data for each table row. |
Alternate Values | String | NA | A semicolon-separated list of values entered by the user. These values appear as suggestions for the column in the table view on the validation screen. |
Table Extraction Rule Configuration
Add/Delete Table Extraction Rule
A table extraction rule can be added or deleted by clicking the corresponding button on the following UI:
- Clicking the Add button presents the following UI, where the user can add table extraction rule fields:
Test Table Extraction Rule
The Test Table Extraction Rule feature lets the user check, without running a batch, whether a table extraction rule configuration extracts the tabular data correctly. The user can upload or drag and drop a valid image file, or place the image file at the following path:
{base-folder}batch-class-id test-table
Test Table Extraction Rule output will be shown at the following UI:
Configurable Properties
The following configurable properties are available:
Configurable property | Type of value | Value options | Description |
Rule Name | String | NA | Unique name of table extraction rule. |
Start Pattern | String | A keyword or a valid regular expression. | Either a keyword matched as a string with some fuzziness (configurable in the property file) or a regex pattern that marks the beginning of the table on a page. A correct start pattern must be specified for table data to be extracted. It can be validated using the check button. |
End Pattern | String | A keyword or a valid regular expression. | Either a keyword matched as a string with some fuzziness (configurable in the property file) or a regex pattern that marks the end of the table. It can be validated using the check button. |
Table Extraction API | Combination of Boolean values using AND and OR operators. | A combination of the selected table extraction APIs (column header validation, column coordinate validation, and regex validation) with AND/OR operators that determines the algorithm used to extract table columns. |
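The role of the start and end patterns can be illustrated with a minimal sketch that treats both as regular expressions over the text lines of a page. Fuzzy keyword matching is not shown, and excluding the boundary lines themselves from the table body is an assumption. The function name and sample page are hypothetical.

```python
import re

def table_lines(page_lines, start_pattern, end_pattern):
    """Return the lines strictly between the first start match and the next end match."""
    inside = False
    collected = []
    for line in page_lines:
        if not inside:
            # Keep scanning until the start pattern is seen.
            if re.search(start_pattern, line):
                inside = True
            continue
        if re.search(end_pattern, line):
            break
        collected.append(line)
    return collected

page = [
    "Invoice 1001",
    "Item Qty Total",
    "Widget 2 10.00",
    "Gadget 1 5.00",
    "Grand Total 15.00",
]
body = table_lines(page, r"Item\s+Qty", r"Grand\s+Total")
```

If the start pattern never matches, the function returns an empty list, which mirrors the requirement that a correct start pattern is needed for any table data to be extracted.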
Column Extraction Rule Configuration
Edit Column Extraction Rule
A column extraction rule can be updated on the following UI:
- Clicking the Edit button presents the following UI, where the user can edit column extraction rule fields:
Configurable Properties
Configurable property | Type of value | Value options | Description |
Column Name | String | NA | Name of the column. Non-editable field, shown only for reference to the table column of the table. |
Column Pattern | Regular Expression | Valid regular expression | The regex pattern for column data. |
Between Left | Regular Expression | Valid regular expression | The regex pattern for data to the left of the searched column. |
Between Right | Regular Expression | Valid regular expression | The regex pattern for data to the right of the searched column. |
Column Header Pattern | Regular Expression | A keyword or a valid regular expression. | Either a keyword searched as a string with some fuzziness on the page, or a regex pattern whose best-matched value on the page is taken as the column header. |
Start Coordinate | Integer | NA | Start Coordinate for the column. |
End Coordinate | Integer | NA | End Coordinate for the column. |
Multiline Anchor | Boolean | | Marks the column as a required column and as the anchor that denotes the start of a new row in the table on the page. This is useful when one table row spans multiple lines in the document. |
Required | Boolean | | If the radio button is checked, each extracted table row must contain valid data for that column. If invalid data is extracted for the column, the corresponding row is not added to the table data. |
Extract data from column | Dropdown list | Names of the other columns of the table; the selected name fills the text box that holds the column used for extraction. | The table column from which the current column’s data is extracted when using regular expression-based extraction. If left empty, it is not applicable. |
Currency | Boolean | For example: $ 12,000.00 is treated as 12000.00 for validations. EURO 12.000,00 is treated as 12000.00 for validations. | Specifies whether the column is a currency field. If it is, validation rules are applied according to the currency representation, and the value is converted based on the currency chosen at the Table Info level. If this field is unchecked, no currency extraction is done for the column, irrespective of the value chosen at the Table Info level. |
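The currency manipulation described for the Currency field can be sketched as follows, using the two examples from the table. Passing the decimal separator explicitly is an assumption made for the sketch; in the product the representation follows from the currency chosen at the Table Info level. The function name is hypothetical.

```python
import re

def normalize_amount(text, decimal_separator):
    """Strip currency symbols and grouping marks, returning a plain decimal string."""
    # Keep only digits and the two possible separator characters.
    digits = re.sub(r"[^0-9.,]", "", text)
    # Whichever separator is not the decimal one is treated as a grouping mark.
    grouping = "," if decimal_separator == "." else "."
    digits = digits.replace(grouping, "")
    return digits.replace(decimal_separator, ".")

usd = normalize_amount("$ 12,000.00", decimal_separator=".")
eur = normalize_amount("EURO 12.000,00", decimal_separator=",")
```

Both calls yield the same normalized text, so a numeric validation rule can treat the two representations alike.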
Table Validation Rule Configuration
Add/Delete Table Validation Rule
A table validation rule operates on operands (table columns) whose extracted data must be numerical. Table validation rules are applied to each row of the extracted table data. When multiple rules are defined, they are combined with AND or OR, as set by the Validation Rule Operator at the table information level. An invalid row is shaded orange in the extraction results if Remove Invalid Rows is not selected at the table information level; if it is selected, the invalid row is removed from the extraction results.
A table validation rule can be added or deleted by clicking the corresponding button on the following UI:
- Clicking the Add button presents the following UI, where the user can add table validation rule fields:
The first drop-down list contains the operands (table column names).
The second drop-down list contains the valid mathematical operators for a rule.
- Clear: This button clears the rule.
Configurable property | Type of value | Value options | Description |
Rule | String | NA | A mathematical rule that applies to a combination of column values and governs the validity of a table row. |
Description | String | NA | The rule description. It is shown in the table view when the user selects a row that does not satisfy the rule. |
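A mathematical rule built from column operands and operators might be evaluated along the following lines. The five-part rule shape used here (left operand, arithmetic operator, right operand, comparison operator, expected column) is a hypothetical simplification and not the rule syntax of Ephesoft Transact; it only illustrates why the operand columns must hold numerical data.

```python
import operator

ARITHMETIC = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
COMPARISON = {"==": operator.eq, "<": operator.lt, ">": operator.gt}

def evaluate_rule(rule, row):
    """Evaluate a rule of the form (left, arithmetic_op, right, comparison_op, expected).

    Each operand is a column name; its extracted text is converted to float first,
    so a non-numeric value raises ValueError.
    """
    left, arith, right, compare, expected = rule
    computed = ARITHMETIC[arith](float(row[left]), float(row[right]))
    # Round to two decimals so that float noise does not break equality checks.
    return COMPARISON[compare](round(computed, 2), round(float(row[expected]), 2))

# Hypothetical rule: Qty * Unit Price must equal Total.
rule = ("Qty", "*", "Unit Price", "==", "Total")
good_row = {"Qty": "3", "Unit Price": "9.99", "Total": "29.97"}
bad_row = {"Qty": "3", "Unit Price": "9.99", "Total": "30.00"}
good = evaluate_rule(rule, good_row)
bad = evaluate_rule(rule, bad_row)
```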
Column Header Based Extraction
Enter the column header regex pattern on the following UI:
[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Extraction Rule]>>[Table Column Extraction Rule]
The user can set the Column Header Pattern field for each table column extraction rule.
There is a configurable property for table extraction using the column header in
{ephesoft-home}\WEB-INF\classes\META-INF\dcma-table-finder*
tablefinder.gap_between_column_words=60
This value is specified in pixels. In addition to the words directly below the column header, words to their left or right are also extracted for the column if the gap between them and the extracted data is less than the value of tablefinder.gap_between_column_words.
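The gap rule can be sketched as follows: starting from the word found beneath the header, neighbouring words are appended while the horizontal gap stays below the threshold. Growing the value word by word in both directions is an assumption about how the gap is applied, and the word tuples are a hypothetical stand-in for hOCR word boxes.

```python
GAP_BETWEEN_COLUMN_WORDS = 60  # pixels, mirrors tablefinder.gap_between_column_words

def collect_column_words(words, seed_index, max_gap=GAP_BETWEEN_COLUMN_WORDS):
    """Grow a column value left and right from the word found beneath the header.

    words: list of (text, x_start, x_end) tuples for one line, sorted left to right.
    seed_index: index of the word located directly beneath the column header.
    """
    first = last = seed_index
    # Extend to the left while the gap to the previous word is below the threshold.
    while first > 0 and words[first][1] - words[first - 1][2] < max_gap:
        first -= 1
    # Extend to the right while the gap to the next word is below the threshold.
    while last < len(words) - 1 and words[last + 1][1] - words[last][2] < max_gap:
        last += 1
    return " ".join(text for text, _, _ in words[first:last + 1])

line = [("INV-7", 10, 90), ("Blue", 300, 350), ("widget", 365, 440), ("29.97", 700, 760)]
value = collect_column_words(line, seed_index=2)
```

Here "Blue" joins "widget" because the two are 15 pixels apart, while "INV-7" and "29.97" lie more than 60 pixels away and are left to their own columns.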
Regex Based Extraction
A table extraction rule must be defined with valid start and end patterns, and Regex validation must be selected in the table extraction API combination.
The user must enter a valid column pattern for each column; the Between Left and Between Right patterns are optional.
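One way to picture how the three patterns cooperate is the sketch below: the Between Left and Between Right matches narrow the row text, and the column pattern is then searched inside that window. Taking the first match instead of a "best" match, and the exact way the window is cut, are assumptions; the function name and sample row are hypothetical.

```python
import re

def extract_column(row_text, column_pattern, between_left=None, between_right=None):
    """Return the first column_pattern match between the optional left and right matches."""
    start, end = 0, len(row_text)
    if between_left:
        left = re.search(between_left, row_text)
        if left is None:
            return None
        start = left.end()
    if between_right:
        right = re.search(between_right, row_text[start:])
        if right is None:
            return None
        end = start + right.start()
    match = re.search(column_pattern, row_text[start:end])
    return match.group(0) if match else None

row = "Widget 3 x 9.99 = 29.97"
unit_price = extract_column(row, r"\d+\.\d{2}", between_left=r"x", between_right=r"=")
total = extract_column(row, r"\d+\.\d{2}", between_left=r"=")
```

The same column pattern yields different values for the two columns because the surrounding patterns select different parts of the row.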
Select table extraction technique to be used
Select a table extraction API by combining the three techniques with AND or OR operators, as shown below:
[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Extraction Rule]
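The AND/OR combination of techniques can be read as a Boolean expression over the outcome of each technique. The sketch below is only an illustration under stated assumptions: each technique is reduced to a True/False outcome for a column, the expression is a flat list, and operators are applied strictly left to right. The documentation does not specify the precedence or how the product evaluates the combination, and all names are hypothetical.

```python
def combine(results, expression):
    """Evaluate a flat expression such as ["header", "AND", "regex", "OR", "coordinates"].

    results maps technique name -> bool (True when that technique found column data).
    Operators are applied strictly left to right; real precedence may differ.
    """
    outcome = results[expression[0]]
    for op, name in zip(expression[1::2], expression[2::2]):
        if op == "AND":
            outcome = outcome and results[name]
        else:
            outcome = outcome or results[name]
    return outcome

# Hypothetical outcomes of the three techniques for one column.
results = {"header": True, "regex": False, "coordinates": True}
strict = combine(results, ["header", "AND", "regex"])
lenient = combine(results, ["header", "AND", "regex", "OR", "coordinates"])
```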
Dependencies
The Table Extraction plugin has the following dependencies:
- RECOSTAR_HOCR
- TESSERACT_HOCR
One of the above plugins must be ON for table extraction, because these plugins extract data from the image and create the hOCR file that table extraction requires.
Troubleshooting
The following are common troubleshooting areas for the Table Extraction plugin:
S no. | Error message | Possible root cause |
1 | Table info list is null or empty. | No table is configured for the document type. |
2 | Table Columns Info list is null or empty. | No table column is defined for table. |
3 | Table Extraction Rule List is null or empty. | No table extraction rule is defined for table. |
4 | Exception occurred while validating rule for a table row. | Table validation rules could not be applied properly on extraction results. |
5 | Skipping Table extraction. Switch set as off. | Table extraction switch is set to OFF. |
Copy Table
Overview
This feature makes a copy of an existing table.
A table has the following configuration fields: name, validation rule operator, remove invalid rows, and currency. Because each table in a document must have a different name, the copied table is renamed automatically.
Steps to copy a table
- Open the document from the document type list under the batch class in which the table is to be copied. Select Tables in the batch class tree view on the left of the screen and click the Copy button at the top of the screen.
- A new row is added to the existing table list.
- After completing the table configuration, click Apply.
Table Import/Export
Overview
This feature allows a user to export and import existing tables between documents, batch classes, or even different Ephesoft Transact instances. Transferring the exact table information to another Ephesoft application running on a remote system saves the time otherwise needed to reconfigure the tables there.
Export Tables
Exporting tables transfers the exact configuration of the tables on one system to another. This also helps in testing and debugging issues in a configuration-dependent environment.
Steps for Exporting Tables:
- On the Table Listing screen, select the table to be exported from the grid using its checkbox, and then click the “Export” button.
The exported zipped table file can now be transferred to any other system and imported there.
Please note: All changes should be saved before exporting; otherwise, an error pop-up asks you to save your pending changes. The screenshot below shows this case, in which the user has added a new table but has not saved it. The user can also export multiple tables at a time.
Data exported with a table: When a table is exported, the complete table hierarchy defined in the database is exported in a zip file.
Import Table
Importing a table recreates the exact configuration of a table that was exported from another system.
Steps for Importing Table
Prerequisites:
- Exported zipped table
Steps:
- On the ‘Table Listing’ UI, click the “Import Table” link in the Import Table(s) panel, or drag and drop the zip file of exported tables into the bottom panel, as shown below:
After the upload completes, the user is shown a success message.
Please note: The user can upload only one zip file at a time, but a zip file may contain multiple tables.