Syras has deduplication tools to help when merging imports from different journal sources, or updates from refreshed journal database searches.
Deduplication with Syras means running a two-step process one or more times:
- Scan across your corpus to identify suspected duplicates.
- Review the suspected duplicates and select which ones you want to delete.
The process can be run once, or a few times, depending on how confident you need to be in finding ALL possible duplicates in your corpus.
There are three methods of deduplication. Each successive method is more likely to catch all duplicates, but requires more care when reviewing results, because it is also more likely to return false positives. Scans will also take longer.
Recommended duplicate scanning workflow:
- Exact + Delete - Run an exact match scan, and automatically delete the duplicates it finds.
- SRA + Review - Run an SRA scan, then assess the suspected duplicates yourself.
- This normally catches the vast majority of duplicates. If you still suspect duplicates, you can try other settings to increase the potential number of suspected duplicates.
Exact Match method
This is a strict comparison of titles and abstracts for an exact match. It’s likely to find only a few duplicates, but with good confidence.
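As an illustration of what a strict title-and-abstract comparison involves, here is a minimal Python sketch. It is not the Syras implementation: the record layout (`id`, `title`, `abstract` keys) and the function name `find_exact_duplicates` are assumptions made for this example only.

```python
from collections import defaultdict


def find_exact_duplicates(records):
    """Group record ids whose title AND abstract are identical, character for character.

    `records` is a hypothetical list of dicts with "id", "title" and "abstract" keys.
    """
    groups = defaultdict(list)
    for rec in records:
        # No normalisation at all: a change in case or punctuation breaks the match.
        key = (rec["title"], rec["abstract"])
        groups[key].append(rec["id"])
    # Only keys shared by two or more records are suspected duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"id": 1, "title": "Aspirin for headache", "abstract": "A trial."},
    {"id": 2, "title": "Aspirin for headache", "abstract": "A trial."},
    {"id": 3, "title": "Aspirin for Headache", "abstract": "A trial."},
]
duplicate_groups = find_exact_duplicates(records)
```

In this sketch records 1 and 2 are grouped together, while record 3 is missed because of a single capital letter. That is why an exact scan finds few duplicates, but the ones it finds are safe to delete automatically.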
SRA method
Bond University provide an open-source matching algorithm that checks a number of common patterns seen in articles that look slightly different, but which are in fact duplicates. Differences in capitalisation, abbreviation, and many other subtle variations are caught here. This finds a longer list of suspected duplicates, but with slightly less confidence than Exact Match.
The third method loosely matches on titles only, ignoring differences in case, diacritics, punctuation, etc. It only assesses the title and ignores everything else, so of the three methods it is the most likely to return false positives.
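The loose title matching described above can be pictured as reducing each title to a simplified key and grouping records that share a key. The Python sketch below is one plausible way to do that, not the algorithm Syras uses: the names `title_key` and `find_title_duplicates`, and the record layout, are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def title_key(title):
    """Reduce a title to a loose comparison key (assumed normalisation rules)."""
    # Split accented characters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = no_marks.casefold()
    # Turn ASCII punctuation into spaces, then collapse runs of whitespace.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(lowered.translate(table).split())


def find_title_duplicates(records):
    """Group record ids by loose title key; abstracts and other fields are ignored."""
    groups = defaultdict(list)
    for rec in records:
        groups[title_key(rec["title"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"id": 1, "title": "Ménière Disease: a review", "abstract": "Full review."},
    {"id": 2, "title": "Meniere disease - A Review.", "abstract": "Full review."},
    {"id": 3, "title": "Meniere Disease: A Review", "abstract": "Reply to the authors."},
]
suspected = find_title_duplicates(records)
```

All three records land in one group here, even though record 3 has a different abstract and may be a different article. This is the kind of false positive that makes the review step important for this method.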
Another scan can only be run after all duplicates have been reviewed
When a scan has been completed, if you chose the "review" option, all suspected duplicates must be reviewed and resolved (removed, or retained) before you can run another scan.