Data Accessibility in the 21st Century

I was reminded at the weekend that there are accessible scientists and there is accessible data, and occasionally both. I recently had a reprint request for a paper I co-authored in 1999 and a dataset request for a paper published in 2010. I took time out of my weekend to respond to both scientists, from Japan and India, respectively. Oh, and BTW, neither paper was NIH funded.

I try to make my data accessible, and I am pretty accessible and easy to find myself. If people want my papers or datasets, all they have to do is email me if it's not on a website or in a database. Unless the data is clearly proprietary, and I have a few of those datasets in which the work was part of a consulting gig (and I am shackled to keep quiet), I will normally respond quickly and be helpful and positive. Please tell me if I am mistaken.

Why am I writing this?

There are also inaccessible scientists and data, and of course both. There are journals, seemingly still in the 20th century, that cannot handle supplemental datasets that are SDF or other structure files. I was reminded of this when I tried to access a dataset in J Med Chem: check out this file jm301302s_si_002 (246 kB) at this paper. What is it? It should be structures and data, I think, but I have no clue what will open it from simply clicking it. Maybe someone can explain? I assume J Med Chem actually tests the files it uploads.
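For a file like this that arrives with no extension, one rough way to work out what might open it is to look at its first few bytes. The Python below is only a minimal sketch of that idea, not a definitive tool: the function names guess_format and sniff are hypothetical, the list of signatures is deliberately tiny, and the SDF check is a crude assumption that the text markers "M  END" or "$$$$" show up in the first few kilobytes.

```python
def guess_format(head: bytes) -> str:
    """Guess a file format from its leading bytes (hypothetical helper)."""
    # OLE2 compound file: the container used by legacy Excel/Word/PowerPoint files.
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "OLE2 container (legacy .xls/.doc/.ppt)"
    # ZIP local file header: also the container for .xlsx and .docx.
    if head.startswith(b"PK\x03\x04"):
        return "ZIP container (.xlsx, .docx, or plain .zip)"
    if head.startswith(b"%PDF-"):
        return "PDF"
    # Molfiles end their connection table with "M  END"; SDF records end with "$$$$".
    # Assumption: at least one marker appears within the bytes we were given.
    if b"M  END" in head or b"$$$$" in head:
        return "MDL molfile/SDF text"
    return "unknown"


def sniff(path: str, nbytes: int = 4096) -> str:
    """Read the first nbytes of a file and guess its format."""
    with open(path, "rb") as fh:
        return guess_format(fh.read(nbytes))
```

Calling sniff on the downloaded file (for example sniff("jm301302s_si_002"), assuming it was saved under that name) would at least suggest whether to try a spreadsheet program, an unzip tool, or a chemistry viewer. An "unknown" result only means none of these few signatures matched.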

Repeatedly I am blown away by how little of the data actually funded by our taxes is accessible to others, whether scientists, parent advocates or whoever wants it. It's one thing (and a long story) to get the data and structures right; it is another to publish on a massive dataset (potentially amenable to modeling) and make only a fraction of it publicly available.

Take a relatively recent example from the NIH, screening compounds against the Pregnane X Receptor using qHTS. I looked on PubChem and found that only the data for rat PXR is available, leaving the human qHTS data for hPXR, hPXR LBD and CYP3A4 induction in limbo. Where is 75% of the data? I emailed the corresponding author, Menghang Xia, over a month ago but so far no response.

Dear Dr. Xia,
I recently read your interesting DMD paper from 2011 on human and rat PXR and wondered what the identifier for the human data sets was in PubChem please. I am only able to find the rat data (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=651751). Is the human data available in PubChem or is it deposited somewhere else?

Maybe they are no longer at the NIH, maybe my email got caught in the spam folder, or maybe I am being ignored. If anyone can bring this to their attention, please do. I would like to get that data my taxes (and yours) paid for. You might also say they are under no obligation to share the data. If that is the case, scientists should not be NIH funded, regardless of whether they are intramural or extramural.

So perhaps we should insist that all data from NIH funded work should be actually accessible in practice. Saying that it's in PubChem or a supplemental file is no good if the data cannot be opened or only 25% of the published data is there.

Then there are the NCATS repurposing compounds and the (still) missing structures on the website, mentioned today over at In the Pipeline. Just makes you wonder what (not) to expect next... funding, oops... gone...

If we are to build on the shoulders of our colleagues, the data has to be not only good quality but accessible. You would think this would be easy in the 21st century.





6 pings

  1. Wendy Warr says:

    J. Chem. Inf. Model. carries SDfiles as SI and I am sure that J. Med. Chem. does. Either the author has submitted the wrong file format or someone in production has goofed. I suggest you contact the editor and find out what happened.

  2. sean says:

    Many thanks Wendy,
    I have just emailed both editors. I know it seems trivial, but a file without a file extension... hmm.

  3. sean says:

    I had an email from the Journals Production office of the American Chemical Society. The file is meant to be an .xl file and they hope to resolve the problem soon.

  4. sean says:

    I wonder how many other files there are like this that are just not accessible and no one has brought it to the journal's attention. This one has a Publication Date (Web): December 14, 2012.

    1. Wendy Warr says:

      I hope that this does not happen often, but our (ACS) help desk is very responsive and will fix such errors very quickly.

  5. sean says:

    The J Med Chem link is now corrected. Thank you J Med Chem for the fast response. No response from NIH yet…
