TEIMMA Annotator


Important Instructions before annotating

This demo is only a showcase and is not intended as a service for annotating your own data. Hence, the demo UI provides only one already converted document pair for recording annotations. Any uploaded documents will not be converted or used in the UI. Please also note that if multiple people are testing the UI at once, recording might be slow and inaccurate. The demo version does not save annotations to a database directly (the repository installation provides a PostgreSQL database for this). However, you can download the annotations by clicking the "Download annotations" button after viewing all recorded cases with the "view all recorded cases" button.

For detailed installation instructions, please refer to: https://anonymous.4open.science/r/TEIMMA-Annotator-7271

The documents provided in the demo are:

Document 1: José M. Sánchez. 2018. Leibniz algebras with a set grading. RETRACTED. Uzb. Math. J. 2018, 2 (2018), 74–92. https://doi.org/10.29229/uzmj.2018-2-7

Document 2: Antonio J. Calderón Martín. 2014. Lie algebras with a set grading. Linear Algebra Appl. 452 (2014), 7–20. https://doi.org/10.1016/j.laa.2014.03.031

Please also note that the document pair shown in the UI is just an example. We do not comment on the authenticity of the content and therefore do not categorize it as plagiarism or as any other suspicious reuse. The final decision on reuse is up to the domain expert.

Annotation Procedure

1. Choose a document under inspection (left side upload) and a potential source document (right side upload). Each file must be a LaTeX, PDF, or .txt file.

a. File(s) with any other extension will throw an error.

b. Also, make sure you choose both files in the correct columns; otherwise, the wrong document names will be recorded for the document under inspection and the potential source.

2. After choosing both files from a local directory, click on the upload button.

a. This might take a while, as the LaTeX-to-HTML5 conversion is performed with LaTeXML.

b. You can observe the progress on the terminal.

c. Upon successful conversion, both documents will be shown on the main UI page. You can verify if the content is converted correctly and if the document under inspection and potential source document are correct.

d. The generated HTML5 files are cached until the next upload overwrites them. If you upload the same files again, the UI uses the saved HTML files from the database.

e. You can scroll within each document to see further content.

3. To start recording a case of similar content, click the Start Recording button, then select a part of the document under inspection and a part of the potential source document (either document may be selected first).

a. You will see that both selected texts are highlighted with the same background color.

b. If the color is not assigned, please redo the selection by clicking on Start Recording again.

4. Select an option from Content type.

a. If no option is selected, Text is used as the default.

b. You can check more than one box to annotate a case with multiple content types.

5. Select the appropriate Obfuscation from the drop-down menu. If none of the available choices matches the obfuscation present in the text, enter your own option in the Enter custom name field.

6. Finally, click on Finish Recording to save the recorded case. After this, the page will refresh, and your saved case will be shown with the same assigned background color.

7. To highlight existing similarities, choose one of the available algorithms under the UI option Highlight similarity.

a. You can also enter the algorithm's parameters in the provided input. For example, for Threshold for LCS, you can enter the minimum number of word tokens a match must contain to be shown.
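
To illustrate what a token threshold for LCS might mean, here is a minimal Python sketch. It assumes that LCS refers to the longest common substring over word tokens and that matches shorter than the threshold are simply hidden; the tool's actual implementation may differ. The names tokenize and common_token_runs are hypothetical and are not part of TEIMMA.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def common_token_runs(inspected: str, source: str, threshold: int = 3):
    """Return maximal runs of consecutive word tokens shared by both texts.

    Each result is (start_in_inspected, start_in_source, tokens).
    Runs shorter than `threshold` tokens are dropped.
    """
    a, b = tokenize(inspected), tokenize(source)
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    runs = []
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # The run is maximal when the next pair of tokens does not match.
                at_end = i == len(a) or j == len(b) or a[i] != b[j]
                if at_end and curr[j] >= threshold:
                    n = curr[j]
                    runs.append((i - n, j - n, a[i - n:i]))
        prev = curr
    return runs


inspected = "we study Leibniz algebras with a set grading in detail"
source = "Lie algebras with a set grading are studied"
for start_a, start_b, tokens in common_token_runs(inspected, source, threshold=3):
    print(start_a, start_b, " ".join(tokens))
```

With a threshold of 3, the only run reported for these two sentences is "algebras with a set grading" (five tokens). Raising the threshold to 6 hides it, which mirrors how a higher value in the UI would show fewer and longer matches.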