This site is devoted to copyright and issues of 'intellectual property,' particularly the issue's analytical aspects. It also concerns itself with the gap between public perception and the true facts, and with the significant lag time between the coverage on more technical sites and the mainstream press.

Wednesday, October 10, 2007

VA to no longer release cancer data. Not exactly a typical topic for here, but it does relate to openness. Epidemiologists are notorious for hoarding data. Do one study, sit on the data, and release a trickle of papers. Your career is made after that without much work. The incredibly important Framingham study is just being opened up, which is a good step. Death data by county is still being hoarded, though, and has been closed off since 2001. The excuse is always "privacy," but magically, if you work with one of their researchers you can always manage to find a way.

Increasingly, I think the Open Access fight is being won, but the open data and open code fights are not even on the radar. There are examples to the contrary, but they are few and far between, at least in the basic sciences and public health.

Part of the problem is that privacy claims still have a veneer of legitimacy. Mind you, I think privacy concerns are incredibly important, but they're becoming a catch-all excuse for hoarding. There need to be standards--both technical and behavioral--to ensure privacy. Maybe someone could even come up with a super-secure box: a server filled with hard drives, hardened against attack, that stores the sensitive data. Any analysis that needs non-anonymized data would run inside the box, so that only anonymized results ever leave it. Want to do research on our data? Buy an ISOxxxx-compliant box and make sure the data never leaves it in non-anonymized form.



Ethan said...

You run into the same problem DRM does -- if the box can get the data unanonymized, then the user can track what the box does and do the same thing. You can slow down the process -- encrypt the hard drive and store the key in separate Flash memory or something -- but I think a more appropriate solution would be to get a big box and have, say, NIH host it.

Another question is, what kind of API would serve for access to data in this way? Some kind of SQL server that does a check to make sure its results aren't unanonymized?
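One way to picture the kind of API asked about above is a query gateway that only returns grouped counts and drops any group too small to be safe. The sketch below is an illustration under stated assumptions, not a vetted privacy mechanism: the `deaths` table, the `grouped_counts` function, the column whitelist, and the threshold of 5 are all hypothetical, and small-cell suppression is just one commonly used rule, not something proposed in the post or the comment.

```python
import sqlite3

# Hypothetical policy values; a real data custodian would set its own.
MIN_CELL_SIZE = 5
ALLOWED_COLUMNS = {"county", "cause"}


def build_demo_db():
    """Create an in-memory table of made-up records for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deaths ("
        "person_id INTEGER PRIMARY KEY, county TEXT, cause TEXT)"
    )
    rows = (
        [("Adams", "cancer")] * 7
        + [("Adams", "stroke")] * 2
        + [("Baker", "cancer")] * 5
    )
    conn.executemany("INSERT INTO deaths (county, cause) VALUES (?, ?)", rows)
    return conn


def grouped_counts(conn, group_by, min_cell_size=MIN_CELL_SIZE):
    """Return {group_key: count}, omitting groups below min_cell_size.

    Callers never see row-level data: they may only name whitelisted
    columns to group by, and the gateway builds the SQL itself.
    """
    cols = list(group_by)
    if not cols or any(c not in ALLOWED_COLUMNS for c in cols):
        raise ValueError("group_by must name one or more allowed columns")
    col_sql = ", ".join(cols)  # safe: every name was checked against the whitelist
    query = (
        f"SELECT {col_sql}, COUNT(*) FROM deaths "
        f"GROUP BY {col_sql} HAVING COUNT(*) >= ?"
    )
    return {tuple(row[:-1]): row[-1] for row in conn.execute(query, (min_cell_size,))}


if __name__ == "__main__":
    conn = build_demo_db()
    # The two stroke deaths in Adams fall below the threshold and are suppressed.
    print(grouped_counts(conn, ["county", "cause"]))
```

A check like this is only a starting point. Suppressing small cells does not by itself stop someone from combining several permitted queries and subtracting the results to recover a small group, so a real service would need further safeguards, such as query auditing or added noise.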


2:38 PM  
