Wikipedia: who’s copying whom?

In July 2006, another Wikipedia user left a message on my talk page to say that he had begun an article on renowned Leicester space physicist Ken Pounds, and noting that I was a Leicester physicist, invited me to contribute to the article. Although I was reluctant to write about someone I vaguely knew, the article as it stood wasn’t brilliant, so I decided to expand it. As when writing any other article for Wikipedia, I took information from a number of sources on the web and provided links to them at the end of the article.

A couple of weeks ago, the question came up of whether Ken Pounds was the first chief executive of PPARC. I couldn’t remember, but knew the answer would be in his Wikipedia article. I was quite surprised to find the article had been deleted. I checked the log for the article, and found it had been deleted by a Wikipedia administrator called Refdoc with the comment that is was a “blatant copyright violation”, and giving a link to a report by the Irish Higher Education Authority entitled Research Infrastructure in Ireland – Building for Tomorrow.

Initially, I suspected someone might have pasted a load of text into the article from the report, but wondered why Refdoc couldn’t have reverted to an older version of the article. However, when I checked the Irish HEA’s report, I discovered that they had actually copied the Wikipedia article word for word and used it as a biography for Prof Pounds, who is on the steering committee that produced the report. Thus the biography in the report uses my words to explain that it’s a “rare distinction” to be awarded an honorary degree by the institution one works at (or at least I think they’re my words – it seems the University have also been reading Wikipedia, re-worded in their case, for material promoting the Alumni Association Lecture 2008). The report also contains the slightly obscure statement that someone added to Wikipedia, that “one of [Ken’s] many discoveries is that Black holes are common in the Universe.”

I know I wrote the article in summer 2006, shortly after I was contacted by the other user. I checked the Internet Archive Wayback Machine, and indeed, it had archived the article in September 2006. From this archived version, it is clear that the text is identical to that in the HEA report. Yet the report was only published in December 2006! While I personally already knew the text was originally from Wikipedia, I thought the evidence I’d found would surely convince anyone else. Yet Refdoc refused the re-instate the article, accusing me of being a user with “poor respect for copyrights” and saying he had little reason to believe me. He clearly hadn’t looked at the internet archive version. That comes as no surprise, as a little investigation before he deleted the article in the first place would have revealed that the Wikipedia article existed before the HEA report. The Wikipedia edit history would also have shown that it was written and improved over a number of edits by different people, rather than copied wholesale from anywhere else. Therefore Refdoc is clearly not a particularly thorough administrator on Wikipedia.

I have now initiated Wikipedia’s Deletion review process, and at the time of writing there is a consensus that the article was wrongly deleted. Hopefully the article will be reinstated by next week.

This episode brings up an interesting point about Wikipedia. They have a strict policy when it comes to copyright, where copying directly from another source is not allowed. But as more and more documents copy text from Wikipedia, it’s going to become harder to tell which was the original source. I think the Irish HEA’s report probably violates the GFDL, as they are supposed to credit Wikipedia, and also to license any document that builds upon a GFDL’d document under the same licence. They haven’t done so, and while I’m sure that I and the other contributors don’t object to our work being used, failure to include an acknowledgement has ultimately resulted in it being us who were accused of breaching copyright.

As for Wikipedia administrators, candidates for that role will have to be prepared to put in a lot of work to investigate suspected copyright violations, as more and more will be false alarms. Anyone who isn’t prepared to do that isn’t fit to be an administrator. In the future it’s going to be more important to check the history of an article, and look for any other evidence to determine who has copied whom.

Now, I wonder how many of my other contributions to Wikipedia have been deleted while I wasn’t looking…

Update: the article on Ken Pounds was reinstated the same day.

Leave a comment


By browsing this site, you agree to its use of cookies. More information. OK