|Turnitin collaboration pages|
|This page in a nutshell: Turnitin, a plagiarism-detection company, is interested in checking all of English Wikipedia for copyright violations using its algorithms and content database, free of charge, on an informal, non-exclusive basis. In return, it would like to receive attribution on the off-Wikipedia pages where Turnitin reports are hosted and to be able to say publicly that it has collaborated with Wikipedia. Turnitin is willing to adapt its software specifically for Wikipedia, test its efficacy in a pilot program, and provide access to its servers on an ongoing basis. The community must discuss this idea and ultimately decide whether to pursue it. Please see the Request for Comment on whether we can run a trial of Turnitin's software.|
Turnitin is an Internet-based plagiarism-detection service run by iParadigms. Universities, schools, and professional researchers and writers submit documents to Turnitin's websites. These sites check the writing for originality against three sources: an index built by a comprehensive web crawler, a database of proprietary content, and prior submissions. Managing copyrighted content is both a major focus and a major problem for Wikipedia. This page lays out the concept of a potential collaboration with Turnitin as a way to combine our strengths and to improve or resolve a major issue in Wikipedia's content oversight.
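Turnitin's matching algorithm is proprietary and is not described on this page. The sketch below illustrates the general idea of checking a document for originality against a corpus of web pages, publisher content, and prior submissions. It uses word n-gram shingling with a containment score, which is the share of the document's n-word runs that also appear in each source. This is a minimal sketch under stated assumptions. The technique, the function names `shingles` and `match_report`, the choice of n = 5, and the sample texts are all hypothetical and chosen for illustration only.

```python
import re


def shingles(text: str, n: int = 5) -> set:
    """Return the set of lowercase word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def match_report(submission: str, sources: dict, n: int = 5) -> dict:
    """Map each source name to the fraction of the submission's shingles found in it.

    Illustrative containment score, not Turnitin's method: 1.0 means every
    n-word run in the submission also appears in that source, 0.0 means none
    do. Sources with no overlap are left out of the result.
    """
    sub = shingles(submission, n)
    if not sub:
        return {}  # submission is too short to form a single n-gram
    report = {}
    for name, text in sources.items():
        shared = sub & shingles(text, n)
        if shared:
            report[name] = len(shared) / len(sub)
    return report


if __name__ == "__main__":
    # Hypothetical example texts, not real Wikipedia or Turnitin content.
    article = "The river rises in the northern hills and flows south for about forty miles to the sea."
    corpus = {
        "example web page": "Locals say the river rises in the northern hills and flows south for about forty miles.",
        "prior submission": "A completely unrelated essay about medieval trade routes.",
    }
    print(match_report(article, corpus))
```

A production service would index its corpus in advance rather than re-shingle every source on each query. This version favors readability over speed.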
Comments are welcome on the talk page.

Background: Turnitin

Turnitin checks and archives millions of papers and uses its database and algorithms to identify plagiarized material.
- Submissions are compared against over 17 billion web pages, 200 million student papers, and over 100 million additional articles from content publishers. These publisher sources include library databases, textbooks, digital reference collections, subscription-based publications, homework-helper sites, and books. (Turnitin says its web index has since grown to 25 billion pages.)
- Globally, Turnitin evaluates about 40 million papers a year. During final-exam periods, the site processes 400 new submissions per second.
- As of 2012, Turnitin serves about 10,000 institutions in 126 countries, in 13 languages.
- More than 2,500 higher-education institutions use Turnitin. These include 69 percent of the top 100 colleges and universities on the U.S. News and World Report Best Colleges list.
- Almost 5,000 middle and high schools use Turnitin. These include 56 percent of the top 100 high schools on U.S. News and World Report's America's Best High Schools list.
- In Colorado, Turnitin is used by 100 schools (both secondary and higher education) and more than 200,000 students.
- More than 100 colleges use Turnitin to detect plagiarism in application essays.
- Turnitin's parent company, iParadigms, employs almost 100 people and is backed by the private equity firm Warburg Pincus. It has 8 international offices serving almost 130 countries and is headquartered in Oakland, California.

Background: Copyright investigations on Wikipedia
There are several ongoing efforts to deal with copyright on Wikipedia:
- CorenSearchBot – This is the most sophisticated tool we currently use. It checks new Wikipedia articles against the results of a web search. It then tags likely copies with an appropriate message and alerts the relevant copyright forums.
  The bot has several limitations. It does not check existing Wikipedia content. It checks only web pages rather than a content database. It has no corpus of prior submissions. Its algorithm may also be less developed than Turnitin's proprietary code.
  The bot is limited to one check every 5 seconds. That works out to about 6.3 million checks a year (365 × 24 × 3,600 s ÷ 5 s ≈ 6.3 million), which is enough to cover English Wikipedia almost twice. However, it is not clear whether that level of operation is feasible.
  The bot also does not generate an itemized report that lets editors see and compare plagiarized sections or identify the various sources behind a match (for recent reports, see User:CorenSearchBot/manual).
  In exploring a collaboration with Turnitin, a necessary question is whether Coren's bot is sufficient. If it is not, it could be expanded rather than superseded by Turnitin's system. There are also possible areas of synergy where the two could complement each other. For example, Coren's search bot could tag articles that score high in Turnitin's copyright detection.
  Also of interest is the bot's list of excluded sites, which includes Wikipedia mirrors. This list could be used to help Turnitin optimize its algorithm for Wikipedia (see also Wikipedia:Mirrors and forks and the Mirror filter).
  Note that Coren has not been active since 31 December 2011. His bot has been mirrored and replaced by User:MadmanBot.
- Duplication Detector – This Toolforge tool compares two web pages directly and identifies areas of overlap. It does not run automatically or query a database.
- Contribution Surveyor – This tool was created for Wikipedia:Contributor copyright investigations on the English Wikipedia. It analyzes the contributions of users with a history of copyright violations and isolates the contributions most likely to be violations. However, it lists them by size rather than by likelihood of a violation. So while it helps prioritize the largest potential offenses, it does not emphasize how likely each one is to be an actual violation.
- Copyvio Detector – Another Toolforge tool for detecting copyright violations.
- WikiProject Contributor Copyright Investigations – This on-Wikipedia group investigates and fixes multiple and large-scale copyright violations. Its important work is largely manual and generally tedious.
- Wikipedia:Copyright problems – A help page for investigating single or small-scale copyright issues.

Background: Corporate collaborations

The idea of informal relationships with corporations is not without precedent, although it is still relatively new. In 2010 and 2011, Credo Reference donated 400 free "Credo 250" accounts to Wikipedia editors (project page). In 2012, HighBeam Research offered up to 1,000 free one-year accounts to editors (project page).

Wikipedia is an immense and precious global asset. Anything perceived to compromise its neutrality should not be undertaken lightly, if at all. Wikipedia is not a commercial project; it is explicitly, even fiercely, non-commercial. Thousands of companies would love to attach their logo or brand to Wikipedia, but Wikipedia's independence is a primary concern. In many ways it is simply non-negotiable.

Although Wikipedia maintains strict neutrality and independence in its operations, collaborations with corporations have the potential to enhance the core mission of the encyclopedia.
If they are done right, they can be beneficial and pragmatic. They can address major areas of site operations without compromising Wikipedia's objectivity or giving undue privileges to any company.

Overview

Principles
- Respecting copyright is required by law and is core policy on Wikipedia, which aims to be a truly free work for all to use, modify, repurpose, or even sell.
- Current tools for identifying copyright violations are limited, sometimes manual, not comprehensive, and inefficient.
- Turnitin provides paid access to a comprehensive copyright and plagiarism database. Wikipedians would find it useful both in their regular content work and in their copyright violation investigations.
- Turnitin is not inexpensive and would be unaffordable for most of the volunteer editors who work on the encyclopedia.
- A collaboration between Turnitin and Wikipedia would be mutually beneficial.

What's in it for Wikipedia?
- Access to a leading service for plagiarism and copyright detection
- Increased efficiency and scalability for dealing with copyright violations
- Ability to prioritize and oversee copyright investigations and cleanup
- An opportunity to analyze every Wikipedia article using a sophisticated algorithm, which could revolutionize the way we manage our content
- Enhanced community relations with a provider of education resources
- Another tool in the community's and editors' toolkit for monitoring and improving articles

What's in it for Turnitin?
- A high-profile collaboration that would solidify the software's status as the standard in its field
- A tremendous amount of user feedback from a community known for giving feedback
- An opportunity to improve the content of the largest encyclopedia in the world
- Visibility within the community for having helped with an essential aspect of site operations
- Promotion of this collaboration throughout the community, in line with policies
- Pending discussion, attribution on Turnitin's off-Wikipedia reports
- Greater awareness among editors that Turnitin exists and provides a useful service
- Potential for Turnitin to advertise that it is used to "check Wikipedia"

What it's not
- A formal partnership or contractual relationship
- An endorsement of Turnitin over other similar and competing services
- An agreement to continue using Turnitin's services if a free, competing, or open-source version of comparable software becomes available

Working plan
- Turnitin reports would be linked on the talk pages of articles that meet a level of text matching the community determines to be useful. The links would be generic and anonymous; the name Turnitin would not be mentioned on talk pages.
- Turnitin's report pages would be rebranded as something like "WikipediaCheck".
- At the bottom of Turnitin's reports would be a small icon reading "Powered by iThenticate". iThenticate is a plagiarism-detection product of Turnitin's parent company, iParadigms.
- Turnitin's reports would be integrated with a new or existing bot. The bot would periodically query the Turnitin database during Turnitin's off-peak hours and write a report to the article's talk page or a subpage.
- A central project page, talk page, or possibly even the article page could be updated with results or appropriate tags.

Attribution
One of the key issues is whether, how, and when to give Turnitin attribution or credit for the services it provides. Here is one example: a notice or banner that could be placed on article talk pages.
|This article was checked for text matches against other websites and articles on March 24, 2012. Click here to see the report. This is only a starting point for an investigation into potential copyright or plagiarism issues.|
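The working plan sets a rule for posting a report: link it only when an article meets a community-set level of text matching, and do not name the vendor. That rule and the example banner can be combined in a small helper that a bot might call for each article. This is a minimal sketch, not an agreed design or Turnitin's report format. The following are all hypothetical assumptions:
- the function name `talk_page_notice`
- a match score on a 0 to 100 scale
- the 50 percent default threshold
- the example.org report URL

```python
from datetime import date
from typing import Optional

# Hypothetical placeholder value: the working plan leaves the real cutoff
# to the community, and no number is given on this page.
DEFAULT_THRESHOLD_PERCENT = 50.0


def talk_page_notice(match_percent: float, checked_on: date, report_url: str,
                     threshold: float = DEFAULT_THRESHOLD_PERCENT) -> Optional[str]:
    """Return banner text for an article's talk page, or None below the cutoff.

    match_percent is assumed to be a 0-100 text-match score for the article.
    The notice names no vendor, following the generic/anonymous linking
    described in the working plan.
    """
    if not 0.0 <= match_percent <= 100.0:
        raise ValueError("match_percent must be between 0 and 100")
    if match_percent < threshold:
        return None  # below the community-chosen level: post nothing
    when = f"{checked_on:%B} {checked_on.day}, {checked_on.year}"
    return (
        "This article was checked for text matches against other websites and "
        f"articles on {when}. Report: {report_url}. This is only a starting "
        "point for an investigation into potential copyright or plagiarism issues."
    )


if __name__ == "__main__":
    # example.org stands in for wherever the off-Wikipedia report would live.
    print(talk_page_notice(72.5, date(2012, 3, 24), "https://example.org/report/123"))
    print(talk_page_notice(8.0, date(2012, 3, 24), "https://example.org/report/456"))
```

The helper returns None rather than an empty banner when the score is below the threshold. This keeps that case explicit, so the bot can simply skip the talk-page edit.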
Signed on:
- Ocaasi 16:22, 25 March 2012 (UTC)
- Andrew G. West, computer science PhD student at UPenn who studies wiki security
- Doc James (if I write on your page reply on mine) 01:01, 8 May 2014 (UTC)
- Fuhghettaboutit – Admin with years of experience with copyright violations, detecting them, investigating backwards copying and related issues. Love the idea and glad to help if I can. I have no programming skills, however, and so will be useless on that end of matters.
Bot programmers:
- User:ValHallASW
- User:Eran