Duplication Detector

Duplication Detector, created for Wikipedia:Copyright problems on the English Wikipedia, is a tool that compares any two web pages to identify text that has been copied from one to the other. Either, neither, or both pages may be current or old revisions of a Wikipedia article.

Please supply the URLs of the two web pages to compare (in the advanced version, you can instead upload either document from your computer). The tool supports text, HTML, and PDF documents. For other document types, look for an HTML version in Google's cache by searching Google for "cache:URL". To make the tool run faster on very large documents, increase the minimum number of words to 3. For source documents containing scattered numerals, you may have to check "Remove numbers" to get the best matches.
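
To illustrate how the "Minimum number of words", "Minimum number of characters", and "Remove numbers" options can affect matching, here is a minimal Python sketch. It is not the tool's actual PHP implementation: the function names, the default values, and the word-level matching strategy are all assumptions made for illustration.

```python
import re


def tokenize(text, remove_numbers=False):
    """Lowercase the text and split it into words.

    With remove_numbers=True, runs of digits are dropped first, so
    scattered numerals no longer interrupt an otherwise matching phrase.
    """
    if remove_numbers:
        text = re.sub(r"\d+", " ", text)
    return re.findall(r"\w+", text.lower())


def shared_phrases(text1, text2, min_words=2, min_chars=1, remove_numbers=False):
    """Return word sequences that occur in both texts (hypothetical helper).

    Only runs of at least min_words words and min_chars characters are
    reported. The defaults here are illustrative, not the tool's defaults.
    """
    a = tokenize(text1, remove_numbers)
    b = tokenize(text2, remove_numbers)
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    found = set()
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # Report the run only where it cannot be extended further.
                run_ends = i == len(a) or j == len(b) or a[i] != b[j]
                if run_ends and cur[j] >= min_words:
                    phrase = " ".join(a[i - cur[j]:i])
                    if len(phrase) >= min_chars:
                        found.add(phrase)
        prev = cur
    # Longest phrases first, ties broken alphabetically.
    return sorted(found, key=lambda p: (-len(p.split()), p))


if __name__ == "__main__":
    doc1 = "Founded in 1990 the city grew"
    doc2 = "Founded in the city grew fast"
    print(shared_phrases(doc1, doc2))
    print(shared_phrases(doc1, doc2, remove_numbers=True))
```

In this sketch the numeral splits the match into two shorter phrases unless numbers are removed, and raising `min_words` filters out the shorter phrase entirely. The comparison is quadratic in the number of words, which is acceptable for a demonstration but would be slow on very large documents.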

Note: On May 22, 2019, I moved dupdet from the Grid Engine backend to the Kubernetes backend. I hope this change will reduce the tool's downtime by allowing it to restart automatically after an error 500. If there are any issues, please leave a message on my talk page.

Duplication Detector can see article text hidden by templates like {{copyvio}}, since the text is still in the HTML page source, but it cannot see text that has been removed from the article. To compare removed text, use the URL of an old revision that still contains it.
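
As a small illustration, an old-revision URL on a MediaWiki site such as Wikipedia can be built from the page title and a revision ID using the `oldid` parameter of `index.php`. The helper name, the example title, and the revision ID below are hypothetical.

```python
from urllib.parse import urlencode


def old_revision_url(title, oldid, host="en.wikipedia.org"):
    """Build the URL of a specific old revision of a wiki page.

    title: page title (spaces are converted to underscores).
    oldid: numeric revision ID, as shown in the page history.
    """
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"https://{host}/w/index.php?{query}"


if __name__ == "__main__":
    # Hypothetical title and revision ID, for illustration only.
    print(old_revision_url("Example article", 123456))
```

The resulting URL can then be supplied as Document 1 or Document 2.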

Simple version (generates pages that can be linked to):

  • Document 1 (URL)
  • Document 2 (URL)
  • Minimum number of words
  • Minimum number of characters
  • Remove quotations
  • Remove numbers

Advanced version (allows uploads):

  • Document 1 (URL) or Document 1 (Upload)
  • Document 2 (URL) or Document 2 (Upload)
  • Minimum number of words
  • Minimum number of characters
  • Remove quotations
  • Remove numbers

Things to do in the future:

  • Cache results for repeated queries
  • Use a statistical model to rule out common phrases and proper names
  • Show side-by-side comparison of phrases in original context with original capitalization and punctuation
  • Detect copying of long phrases with minor modifications such as removed/added/modified words
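
As one possible direction for the last item, the following Python sketch scores how similar two phrases are at the word level using the standard library's `difflib.SequenceMatcher`. This is only an assumption about how such detection might be approached, not a description of planned or existing behavior, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def phrase_similarity(phrase1, phrase2):
    """Return a word-level similarity score between 0.0 and 1.0.

    The score is SequenceMatcher's ratio: twice the number of matching
    words divided by the total number of words in both phrases.
    """
    words1 = phrase1.lower().split()
    words2 = phrase2.lower().split()
    return SequenceMatcher(None, words1, words2, autojunk=False).ratio()


if __name__ == "__main__":
    original = "the committee approved the new budget on friday"
    modified = "the committee approved the revised budget on friday"
    # One word out of eight was changed, so the score stays high.
    print(phrase_similarity(original, modified))  # 0.875
```

A threshold on this score (for example, treating anything above some chosen cutoff as a likely copy) would have to be tuned against real examples.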

The PHP source for Duplication Detector is available under the Simplified BSD License and was originally written by Derrick Coetzee. It does not require Tool Labs to run, so feel free to download it and run it yourself on your own web server or with the PHP command-line tool. (.tar.gz) (.zip) The latest version is available from GitHub.

© 2014-2021 ЯВИКС. All rights reserved.