What warrants an erratum and why the old publishing model must change

On Friday morning my day started with an email, which I have marked up and added links to:

Dear Dr. Ekins:

It has come to our attention that an error was identified in your recent Perspective entitled “The parallel worlds of public and commercial bioactive chemistry data” published in the March 12, 2015 issue of the Journal of Medicinal Chemistry (please see attached).  We would like to request that you submit an Additions and Corrections to the Journal (instructions attached).


An editor at J Med Chem


I just uploaded the letter from CAS to Figshare.

During Friday I issued the requested erratum and then retracted it later in the day when I realized there was in fact no error. I emailed and left messages for the editor and admin, so Monday should be fun.

What changed my mind was that two independent scientists, both longtime SciFinder users and authors on the paper, came to the same conclusion: in mid-2014 there was a problem with this patent in SciFinder (Chris Southan has now blogged more on it).

It's all a storm in a teacup, as I thought we were pretty balanced in the article. Interestingly, when the paper went through extensive review and major revisions, no reviewer seemed to pick up on the same problem for CAS.

I think this highlights the difficulty with the old fashioned publishing model.

1. Authors submit paper to Journal on work they did months/ years ago

2. Months later they get reviews back

3. Weeks later they respond to reviews

4. Months later they get re-reviews back

5. Weeks later they re-respond to the reviewers

6. Weeks later it is accepted

7. Days later it goes ASAP

8. Weeks later proofs are corrected and online

9. Months later paper published

10. By the time an article publishes it could reference databases and other sources long out of date and changed.

The publishing model ACS and other journals / societies use is way out of date and no longer relevant. Why should it take > 6 months to go from submission to publication of a Perspective? This is not even a research article, where timeliness is even more critical.

And yet we still submit to ACS journals... As scientists we definitely need more options.







Databases and collaboration require standards for human stem cell research

Last year my then colleague at CDD, Nadia Litterman, and I put together an article on databases and collaboration around the area of human stem cell research. This built on Nadia’s extensive experience of working with stem cells. My additions were mostly around software and collaboration. It was a fun paper to write as I learnt plenty! The Review has now been published in Drug Discovery Today (I am on the editorial board) and you can get a free copy for the next month or so here.

This took a while to see the light of day as we bounced around at a few Nature and Cell journals. See reviews below!

I think the article may also be useful for funding bodies looking to propose more collaboration and data sharing. There may also be interest in this from rare disease research groups. Bottom line: more collaboration and sharing would help, and with the software currently available this could happen. But who will catalyze it? The reviews we had from Nature were pretty much nitpicking and turf protection, as you can see.

Reviewer #1 (Remarks to the Author):

This Commentary is about a stem cell database and repository. The basic thesis that such a worldwide database/repository would be beneficial is clear and obvious. It has been stated previously in several venues – but it could be positive to have it stated again in a high impact journal. But I think that the commentary has a number of problems.

The commentary stresses the need for worldwide acceptance and inclusivity. I therefore found the relatively partisan and skewed presentation of the topic to be peculiar. For example, I do not think that it even mentions the world’s first clinical trial with iPSCs run by Riken in macular degeneration. Yet the authors manage to mention the trial being done by Advanced Cell Therapeutics (in which one of the authors has stock ownership) in macular degeneration.

A European Bank for induced pluripotent Stem Cells (EBiSC) was recently established and I do not think that it was even mentioned in the commentary. The EBISC self-description is:
The EBISC iPS cell bank will act as a central storage and distribution facility for human iPS cells, to be used by researchers across academia and industry in the study of disease and the development of new treatments for them. Conceptualized and coordinated by Pfizer Ltd in Cambridge, UK and managed by Roslin Cells Ltd in Edinburgh, the EBiSC bank aims to become the European “go to” resource for high quality research grade human iPS cells.
How can there be a world-wide bank that does not include this and/or Europe?

The table of “selected” companies doing stem cell research adds little or nothing and it immediately violates the concept of inclusivity. There are more than 60 stem cell companies. Highlighting just a few would alienate the others and be counterproductive.

There were other statements that were out of sync with the view of a worldwide effort. For example, they say: “While other countries like Japan are investing heavily in stem cell research, it will be important that the USA is not left behind”. This commentary is not about the USA – it is about the rest of the world also.

Perhaps the most disappointing thing about the article is that it did not give any suggested pathway for accomplishing the task. They say that doing it “would require significant sponsorship, world-wide support, active participation, and continual maintenance”. At the very least, they should put forth some suggested ways of proceeding to get to the goal. As written it doesn’t help the field progress or give any clues on how progress might be made.

The authors’ biases show in many ways. For example, they say: “In 2006, the research groups of Drs. James Thompson and Shinya Yamanaka discovered a method for generating induce pluripotent stem (iPS) cells.” Yamanaka made the discovery (and got the Nobel for it). Thompson merely confirmed it (and did not get the Nobel). Listing Thompson at all is questionable, and listing him first is unacceptable.

Overall I agree completely with the basic point that is being made, but the presentation of the concept seems biased and not particularly helpful. This commentary is not up to the quality of the Nature Genetics.

Reviewer #2 (Remarks to the Author):

The commentary, “Databases and Collaboration Require Standards for Human Stem Cell Research”, by Drs Litterman and Ekins is a very reasonable call-to-action for the stem-cell research community. In this commentary, the authors point out the need for a centralized database to serve as a repository for the enormous amount of data being generated by this community of researchers. They specifically urge the development of “Minimum Information About Stem Cell Experiments” (MIASCE) that will require consensus in the field about the critical pertinent information required for stem cell lines and differentiated cells. The ultimate aim would be to project this information into a secure, cloud-based database for information sharing. The authors point to a similar centralized effort that has apparently been successful in TB research.

I think very few would doubt the need for this kind of information. Indeed, the authors point out numerous attempts (albeit decentralized) at achieving this, and consensus panels (eg at ISSCR) recognizing the need.

The problem with this commentary is it doesn’t move far beyond urging the community that this kind of resource is needed and I think it’s already clear to the community that this is true. I think an effective commentary in Nature Genetics would provide more of a concrete path forward, and some tangible ideas on how to overcome the very substantial barriers to the enterprise proposed by the authors. The barriers are many and they are formidable, including:

1) Scientific – stem cell science is in its infancy; the community is still grappling with how to reprogram somatic cells and how to differentiate pluripotent cells into different somatic cell types. There are issues of epigenetic memory, ignorance over whether different reprogramming methods produce the same cells, and a plethora of non-standardized differentiation protocols. Differentiated cell cultures are heterogeneous and single cell profiling is just beginning to be undertaken in a few labs with access to the cutting-edge technologies required. It is becoming clear that most of the noise in cultures relates to heterogeneous genetic background and cell populations, rather than a “disease signature”. This is true in even isogenic cells differing at only a single genetic locus. Before we create a database, we need useful information to be stored there and I think some clear indication of what that information might currently be, given the nascent status of the field, would be a welcome addition to the commentary.

2) Ethical – Any advocacy for the sharing of personalized biological data needs to address the ethical issues attendant upon this information. How will we deal with different IRB protocols and their variable conditions on information sharing, the potential risks of de-identification of material in “the cloud”, the types of consent required to use MHC-matched lines for regenerative purposes in other individuals, etc.?

3) Cultural – The authors allude to the need for collaboration. Unfortunately, there will be too many in the scientific community who will be hesitant to share their data to protect their intellectual property, publications etc. This is an intensely competitive field and this issue needs to be confronted overtly. Incentives for collaboration should be spelled out, and concrete precedents in other fields need to be stated. The authors point out a recent effort in TB – can they point out specific positive outcomes from this initiative, for disease research and patients, but also for individual researchers?

While I am absolutely in favor of the ideas projected in this commentary and recognize the timely nature of the message, I do not think this commentary in its current form – particularly if it is a stand-alone piece – goes far enough to push the stem-cell field forward.

Reviewer #3 (Remarks to the Author):

This commentary entitled “Databases and Collaboration Require Standards for Human Stem Cell Research” by Nadia Litterman and Sean Ekins emphasizes the need for a collaborative effort from industry and academia to build a comprehensive, mineable, and freely available database or repository to enable sharing of data with regards to stem cell research. The idea of the authors is to build a database with common nomenclature and minimal standards that will accelerate discovery and drug development. The authors illustrate an example of tuberculosis research where this database model has been tested.
The authors describe the current field of embryonic/iPS stem cell research and indicate the major potential of the field for human health, but they also illustrate in depth the challenges and roadblocks such as variability of differentiation, efficiency, and phenotypic outcomes. They mainly attribute this to a lack of standards, nomenclature, traceability, interpretation of data and poor repetition. They conclude that these challenges hinder industry to advance stem cell research into therapies.
The authors reviewed the literature and online sources for already existing databases that contain some of the information needed to move the field forward and how they can be useful for integration into a large database, but the authors indicate that most of the existing databases as data silos and not accessible for the community.
The authors indicate that this effort for the functional, lasting database needs sponsorship, world-wide support, active participation, and continuous maintenance to become successful resource for the stem cell community and to fulfill translational goals towards therapies.

It is a well written commentary pointing out a challenge of collaborative efforts to advance a scientific field to benefit patients more readily.
The authors give an excellent background where the field is and what the current bottlenecks are that need to be removed. All of this should be documented in a comprehensive database.
Suggested improvements: One aspect to complete the picture of stem cell research in academia and pharmaceutical industry are the “stem cell research enabling biotech companies”. These biotech companies have an extreme power to shape the field with their products/kits and setting their own standards. Today’s research does not rely on making one’s own buffers and buying individual chemicals for experiments, but ordering kits and premade formulations for experiments. The standards and optimization in this case are set by biotech companies. Not sure how this can be addressed and how to find an objective way to build this into a comprehensive database. Many of the commercially available kits work, but they contain proprietary formulations or supplements.
For completeness, the authors should include the California Institute of Regenerative Medicine and the New York Stem Cell Foundation (maybe others) as authorities to ensure standardization and to provide financial support.
The company Iperian was mentioned in the manuscript and their shift in focus from iPS cell model company towards an antibody program. For completeness, Cellular Dynamics International is a successful model of a company that is providing iPS-derived differentiated cells to customers under standardized optimized conditions.

Conclusion: Overall, it is an important topic that will find interest in the stem cell research community. The authors point towards the right entities but do not propose a concerted plan how the collaboration and transparency between academia and industry needs to be approached which seems to be the bigger challenge than building an open-access data warehouse.

Clearly the final published version is modified from that submitted to Nature Genetics. We were actually encouraged to submit by the Editor, but unfortunately the reviewers had other ideas. Thank you DDT peer reviewers for liking it.




Reviewing the robot scientist

A couple of days ago I was contacted by a freelance science writer (Andy Extance) to review an embargoed article. I had a short deadline and read the paper as if I was reviewing it. I then sent an email, and the next day Andy distilled my thoughts to a quote. I was not paid to do this.

The article by Andy is now in Scientific American. What I still cannot understand is why they are highlighting what is a pretty weak study.

Here is my full email, from which I was quoted. Consider this an open review of the paper, because if the journal had sent the paper to me I would have submitted something close to this. I am pretty concerned about the kind of science that gets highlighted by a press looking for good sound bites, which is likely being fed these stories by the funding bodies and universities as a way to gain more support. The vicious cycle perpetuates. I am in favor of new technologies, but it has to be good, sound science.

Dear Andy,

I read the paper. I was cringing all the way through it. Awful. Why on earth is Sci Am interested in this? Is it because they published on Adam in Science in 2009? If so, what have they done since then?
My apologies but the funding bodies must have been smoking crack to award these folks money. It saddens me to think that this kind of stuff passes for science in the UK, are my fellow countrymen so enamoured by anyone that talks about active learning, quotes Lewis Carroll and Peter Medawar and mentions semantic data models? Just surprised Turing was not quoted for good measure.

The paper (if I dare call it that) is all smoke and mirrors and will get dismissed by anyone who actually reads it.

They talk about drug screening and assays as if they are experts – comments like “brute force and unintelligent” will win them zero friends. The parts on QSAR are just so basic as to be laughable. Take a look at the refs cited in the paper, they are generally pretty weak examples and in many cases long out of date. Economic/ econometrics is thrown in as though folks doing screening really care about the costs?

To the science: pretty much every neglected disease has moved to whole cell phenotypic screens and away from target-based screens, which makes you wonder seriously why they focused on a target, other than ease.
The reasons they focused on validating Eve on neglected diseases are just odd. If they wanted to do something that would impact big pharma, why not go after a big disease that pharma really cares about and show how much faster you find hits and at a fraction of the cost?
To find a compound that is active against malaria is great, but this compound was already known to be active. It's not even clear that this came out of their wonderful screening – modeling – AI – economically sound approach.

To put this into perspective GSK released about 14,000 malaria hits a few years ago, all whole cell data and approx 1000 had IC50 values lower than TNP-470. Novartis released data on 5700 cpds and over 700 of these have IC50 data lower than TNP-470 – admittedly this data is in plasmodium falciparum, but the point is there are hundreds if not 1000’s of examples of more active cpds.

I am not at all convinced that Eve can do QSAR – where are the correlations and statistics on the models it built? They should have been asked to prove the models were actually sound. Most QSAR papers will use additional data as a test set to validate the model; very rarely does this get fed back into the model as the authors suggest, which to me is another indicator of lack of knowledge or understanding of what they are doing. The authors do not show any enrichment data, receiver operating characteristic curves or even confusion matrices.

Where are the metrics on the quality of their assays? Even the title of the paper is misleading – I would not call TNP-470 an approved drug, and they only showed activity of one cpd against one disease.

The obvious experiment was not tried, go head to head with the standard approach of using screening data with a real comp chemist who would build the machine learning or QSAR models, predict new compounds and then get them tested. Without this I am unconvinced of the utility of this approach. Similarly there was no comparison of different QSAR or machine learning approaches. Why were classification models not tried as most HTS data is single point and perfect for binary models, they could have avoided hit confirmation stages and just flown through the screening-modeling cycles without needing to reconfirm?

Most people when sharing molecules and structures make their data available as sdf files which other modelers etc can use, apparently not these folks. These are computer scientists trying to do drug discovery and it shows.

The only ray of hope I saw in the whole paper was the use of acoustic dispensing, so at least they may have fewer issues with cpds leaching from tips or hydrophobic cpds sticking to tips.

Dare I even address the pie-in-the-sky conclusions? Eve should go back to the Garden of Eden and leave drug discovery to scientists who know what they are doing.

I would have rejected the paper.
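
For readers less familiar with the validation metrics I asked for in the email (confusion matrix, ROC curve, enrichment), here is a minimal sketch in Python of how they might be computed for a binary active / inactive model on a held-out test set. The function names, the toy labels and scores, the 0.5 threshold and the 20% cutoff are all hypothetical and chosen purely for illustration; none of it comes from the paper I reviewed or from any specific QSAR package.

```python
from typing import List, Tuple


def confusion_matrix(labels: List[int], scores: List[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels (1 = active) at a score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_active = s >= threshold
        if predicted_active and y == 1:
            tp += 1
        elif predicted_active and y == 0:
            fp += 1
        elif not predicted_active and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise ranking; tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def enrichment_factor(labels: List[int], scores: List[float],
                      top_fraction: float = 0.1) -> float:
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    overall_rate = sum(labels) / len(labels)
    if overall_rate == 0:
        raise ValueError("no actives in the test set")
    n_top = max(1, int(round(len(labels) * top_fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(y for _, y in ranked[:n_top])
    return (hits_in_top / n_top) / overall_rate


# Hypothetical test set: 1 = active, 0 = inactive, with made-up model scores.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05]

print("tp, fp, tn, fn:", confusion_matrix(toy_labels, toy_scores))
print("ROC AUC:", roc_auc(toy_labels, toy_scores))
print("Enrichment in top 20%:", enrichment_factor(toy_labels, toy_scores, 0.2))
```

The point is not the particular numbers but that these statistics are cheap to report, and a reader cannot judge whether a model is sound without them.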


In Memoriam: Dr. Michael Rosenberg

I only found out today that Dr. Michael J. Rosenberg had recently died tragically. I was very fortunate to have interacted with Michael on several occasions. First he wrote a book chapter in 2006 for my first edited book, and then he wrote a book of his own on adaptive clinical trials that was part of a series at Wiley published in 2010. Michael was a pleasure to interact with and very energetic. He had a major impact on making clinical trials more efficient and developing technologies for them, which will be felt for years to come. He was also a successful businessman, founding the CRO here in RTP, Health Decisions. My condolences go out to his family and colleagues.





A year in publications – 2014

A year in collaborative publications, the ups and downs and a few random comments as well (with a big thanks to all involved):

1. Ekins S and Freundlich JS and Coffee M, A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus, F1000research, 3: 277, 2014.

This came initially from a Twitter exchange of papers containing FDA drugs. Pretty speculative. Initially was part of a much bigger paper (which is a story in itself). Several other ideas came at around the same time and hopefully they will see the light of day.

2. Ekins S, Collecting rare diseases, F1000research, 3, 260, 2014.

I was asked by F1000Research to put a collection together. This highlights some of the difficulties patients have in getting their ideas and work published.

3. Litterman NK, Rhee M, Swinney DC and Ekins S, Collaboration for rare disease drug discovery research, F1000research, 3:261, 2014.

This is the result of a good collaboration between 4 people from diverse backgrounds. I connected to one co-author via Twitter.

4. Dong Z, Ekins S and Polli JE, A substrate pharmacophore for the human sodium taurocholate co-transporting polypeptide, 478(1):88-95, 2014.

This manuscript came together pretty quickly in 2014, I think it’s the first such paper on NTCP substrates.

5. Lipinski CA, Litterman N, Southan C, Williams AJ, Clark AM and Ekins S. Parallel worlds of public and commercial bioactive chemistry data, J Med Chem, In Press 2014.

This project started from a discussion and was recently covered in detail here.

6. Litterman N, Lipinski CA, Bunin BA and Ekins S, Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes, J Chem Inf Model, 54:2996-3004, 2014.

This project started from a discussion and was recently covered in detail here and here.

7. Litterman N, and Ekins S, Databases and collaboration require standards for human stem cell research, Drug Disc Today, In press 2014.

This was initially an idea from discussion with the editor of Nature Genetics. It was rejected by that Journal. We also tried several other journals. I think it’s a great proposal / idea and could be achieved very readily. The challenge is how to get groups on board.

8. Ekins S, Freundlich JS and Reynolds RC, Are bigger datasets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis, J Chem Inf Model, 54:2157-65, 2014.

Possibly the logical extension of the TB machine learning papers. Combining all datasets from the SRI/NIAID work.

9. Ekins S, Hacking into the granuloma: could antibody antibiotic conjugates be developed for TB? Tuberculosis, 94(6):715-6, 2014.

This came from a discussion over dinner when I was asked for a crazy idea. I then pulled together the basis of the commentary. It’s a pretty simple idea, building on what’s been done for cancer but, as far as I can tell, never tried for TB. Next step is to actually do it.

10. Ekins S, Clark AM, Swamidass SJ, Litterman N and Williams AJ, Bigger data, collaborative tools and the future of predictive drug discovery, J Comp-Aided Mol Design, 54:2157-65, 2014.

An invited review for the journal, took a good amount of effort to put this together, pulling different ideas into a cohesive document. I like the end result.

11. Ekins S, Nuermberger EL and Freundlich JS, Minding the gaps in Tuberculosis research, Drug Disc Today, 19:1279-82, 2014.

This brief commentary takes the JCIM paper below and expands it. We tried Science Translational Medicine (rejected after review) and Trends in Microbiology (triaged at proposal stage).

12. Sames L, Moore A, Arnold RJG and Ekins S, Recommendations to enable drug development for inherited neuropathies: Charcot-Marie-Tooth and Giant Axonal Neuropathy, F1000Research, 3:83, 2014.

This paper came out of the work we put into writing a RDCRN grant proposal in 2013 which we are still mining for additional grant proposals. A great collaboration with Parent/ patient advocates. This also marked our first submission to F1000Research.

13. Clark AM, Sarker M and Ekins S, New target predictions and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0, J Cheminform 6: 38, 2014.

This paper really highlights the incredible work of Alex Clark. How we took the update for the mobile app and added models, made descriptors open source and more.

14. Ekins S and Perlstein EO, Ten simple rules of live tweeting at scientific conferences, PLOS Comp Biol, 10(8):e1003789, 2014.

This little editorial was the surprise of the year for me and I have discussed its formation previously. An idea we had walking from a conference on our way to dinner. It took a while for this paper to get published.

15. Ekins S, Pottorf R, Reynolds RC, Williams AJ, Clark AM, Freundlich JS, Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis, J Chem Inf Model, 54:1070-82, 2014.

All the work for this paper was performed in 2013. We tried J Med Chem first before JCIM.

16. Dong Z, Ekins S and Polli JE, Quantitative NTCP pharmacophore and lack of association between DILI and NTCP inhibition, Eur J Pharm Sci, 66:1-9, 2014.

A paper that was written based on work from 2013. We had to try a few journals before this one made it out there.

17. Krasowski MD and Ekins S, Using cheminformatics to predict cross reactivity of “designer drugs” to their currently available immunoassays. J Cheminform 6:22, 2014.

A paper written early this year from work Matt Krasowski and I did in 2013, with more investigation of bath salts and their similarity to immunoassay targets.

18. Krasowski MD, Drees D, Morris CS, Maakestad J, Blau JL and Ekins S, Cross-reactivity of Steroid Hormone Immunoassays: Clinical Significance and Two-Dimensional Molecular Similarity Prediction, BMC Clin Pathol, 14:33, 2014.

A paper written in 2013 from work done in 2012 with Matt Krasowski, looking at steroids and immunoassay cross-reactivity.

19. Godbole AA, Ahmed W, Bhat RS, Bradley EK, Ekins S and Nagaraja V, Inhibition of Mycobacterium tuberculosis topoisomerase I by m-AMSA, a eukaryotic type II topoisomerase poison. Biochem Biophys Res Comm, 446:916-20, 2014.

Written from 2012-2013, a collaboration with a group in India as part of the MM4TB project. The first of 2 papers using docking for this target.

20. Ekins S and Williams AJ, Curing TB with open science, Tuberculosis, 94:183-5, 2014.

Written with Tony in 2013, from a discussion we had one day over coffee: what if there was more open science for TB?

21. Kandel BA, Ekins S, Leuner K, Thasler WE, Harteneck C and Zanger UM, No activation of human PXR by hyperforin-related phloroglucinols, JPET, 348:393-400, 2014.

Written in 2013, a collaboration with a German group, I generated all the PXR model predictions. One of the few examples of a “negative data” paper being published that I have been involved with!

22. Ekins S, Casey A.C, Roberts D, Parish T. and Bunin BA, Bayesian Models for Screening and TB Mobile for Target Inference with Mycobacterium tuberculosis, Tuberculosis, 94:162-9, 2014.

Written in 2013, as the third external evaluation of TB Bayesian models published to date.

23. Ekins S, Freundlich JS, Hobrath JV, White EL, Reynolds RC, Combining computational methods for hit to lead optimization in Mycobacterium tuberculosis drug discovery, Pharm Res, 31: 414-35, 2014.

Written in 2013, using data from the SRI ARRA grant which made a very useful test set for the various TB machine learning models.

24. Ponder EL, Freundlich JS, Sarker M, Ekins S, Computational models for neglected diseases: gaps and opportunities, Pharm Res, 31: 271-277, 2014.

This was written in 2013 primarily using data collected for a grant proposal. It’s a very brief summary of where computers have been used for these diseases too.

25. Ekins S, Progress in computational toxicology, J Pharmacol Toxicol Methods, 69:115-140 2014.

This was written in 2013 initially as a book chapter. The editor wanted to change it dramatically and I did not, so I opted to turn it into a review.


R&D jobs in pharma are snow leopards – scientists must embrace social media now!

I was inspired by my friend Robert Moore to write this post. He had written back in October on how to find members of the C-suites at businesses, which are positions treasured by marketeers. He compared CEOs to snow leopards, a very rare species that can be found if you are smart and know where to look. Robert described how to find them by the content they shared from business publications. I have kept this burning in the back of my mind because it's a beautiful image, until a few circumstances made me think of parallels elsewhere.

On Wednesday Dec 3rd, GSK announced they would cut 900 R&D jobs in RTP here in North Carolina. This is but another example in the long line of big pharma layoffs over the past decade. But it's not just big pharma that is laying off scientists; it is the likes of Purdue, and this is also happening in Israel with Teva and France with Pierre Fabre etc. It also makes you wonder what Merck will do once they digest Cubist. If we needed more evidence of big pharma’s failure to innovate itself, then this would be it. If you are a company that relies on researchers buying your wares then this is a wake-up call too. Finding customers in pharma may be very similar to finding that snow leopard and it's going to get harder. Where will those customers end up in future, and how will we find them again if we do not track them?

Well, it is looking increasingly like the R&D for future drugs will come predominantly from small companies or academia. More ex-big pharma scientists will be in these organizations or they will start their own company, perhaps working initially as consultants. That is where we should be looking for the drugs for the next decades to come. We will see this shift as scientists update their LinkedIn profiles, update their Facebook pages and maybe even tweet if they are lucky enough to find a new job. I think this also points to the importance of scientists marketing themselves using social media. Those days when scientists could just rely on patents, publications or their ability on the speaker circuit to market their abilities are perhaps consigned to the past.

Networking by social media is likely a huge asset as hiring companies look (Google you) before they interview. If you are like me, you may feel like a social media dabbler. I exploit LinkedIn, Twitter, this blog, and a whole array of other tools like Slideshare, Figshare and Kudos to raise awareness of the science, projects and articles I collaborate on and the skills on offer. I wonder, is it enough? I am barely scraping the surface of what is out there and honestly it is a challenge to find time to keep up. I am not the only one both taking this approach and likely feeling the pain. So the challenge for companies that want to sell to me will be knowing what to look for as people like me spread themselves thinly across social sites in the hope of finding someone that will hire them one day or pass their details along.

What do I want to buy? I can tell you that if I had someone that could take care of my own ‘personal marketing’ that would be fantastic. Someone that could update my Kudos pages, tweet for me, and even write these posts! I can imagine a future full of these social media assistants. Software exists on the other end to find people for marketing purposes but my guess is it's not being used nearly as much as it could be. You could say the same for trying to find patients for clinical trials. It's likely that recruitment by social media will be the norm. Will recruitment for R&D jobs by social media also follow suit? I have this image of warehouses full of people mining Twitter and other social media hubs, finding targets, be they customers, patients or people they want to connect others to.

Some of the ways you as a scientist can raise your profile and do it in a way that’s not equated to spam are as follows:

1. You could tweet at conferences - This could be useful to others and people will follow you for doing this.

2. You could capture your papers in tools like Kudos and explain them in simple terms, combine other content that might increase their audience.

3. You could be ahead of the curve and write a blog post on something that is timely, a scientific observation or just what you are working on – this could be as a guest for someone else’s blog, and just put what you do into simple language. You could even put something informative on your LinkedIn profile.

We are embarking on a new era: the scientist who is connected, no longer bound by the walls of the lab but connected to the world. Collaboration will be even more important, and software that facilitates these collaborations will be essential. Mobility will be important, as will the tools that scientists use.

GSK can only hope that those last employees leave the building next year and clean the whiteboards after them this time. I would also encourage them over the next few months to embrace social media so they can be found by those companies or other organizations that are hiring. As a scientist your profile and social media persona matters, you do not want to be the snow leopard.


Chemical probes and parallel database worlds – who wants to know? More publishing fun



This post is long. On one level it is a highly detailed description of the challenges involved in getting scientific work published; on another it gets to the heart of discoverability of data, data analysis and just the slog of publishing something that you hope is going to interest others in your direct field. You need to persevere and have an incredibly thick skin.

Yesterday I presented our recent work “Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s evaluation of the NIH chemical probes” at the In Silico Drug Discovery conference held at RTP. This work described a couple of recent collaborative publications, one of which was described in an earlier post as a very expensive dataset that included as many of the NIH Probes as we could gather.

Actually the whole project kicked off earlier in the year when I was visiting the CDD office in CA. Chris Lipinski, a long-time board member, was describing the challenges he was facing trying to find the “NIH Probes” and the incredibly detailed due diligence he was undertaking. Chris was doing this huge amount of work and if I remember correctly I just threw it out there that we should be modeling his score. This was another one of those moments where saying it and doing it are both completely possible, but doing it entailed a lot of work. I had no idea who or what would benefit from doing it, but it would be pretty interesting to see if a machine learning method could be used to help a medicinal chemist with the due diligence process, or at least narrow down the interesting compounds. Along the way of course you learn unexpected things and these have value. I had no idea at the outset what a Pandora’s box would be opened.

With Nadia Litterman and Chris we went through multiple iterations of model testing and inevitably we threw in a few other approaches to score the probes such as ligand efficiency, QED, PAINS and BadApple. Barry Bunin also helped us to interpret the descriptors we were finding in the Bayesian models. As you can see the scope of what we embarked on expanded greatly (and if you read the paper it will be even clearer). Chris spent countless hours scoring over 300 compounds. As we went through the write up process after a first pretty complete version, I realized we had more than just a modeling paper; there was also this complex perspective on using public and commercial chemical databases. Through past collaborations with Christopher Southan, Antony Williams and Alex Clark I thought they would be able to chime in too. In the end we had pretty diverse thoughts on the topic of public and commercial chemistry databases.

The NIH probe modeling paper was submitted to ACS Chemical Biology initially. We thought this was a good choice as this journal publishes many manuscripts that describe new chemical probes and our research may help in improving the quality of these molecules. We had the following reviews for the modeling paper from ACS Chemical Biology – needless to say it was rejected. The reviewers’ comments are perhaps useful insights and may indicate why so many shoddy probes get published in this and other journals.

Reviewer(s)’ Comments to Author:

Reviewer: 1

Comments to the Author
This publication details the creation of various computational models that supposedly distinguish between desirable and undesirable small molecules based on the opinion of one experienced medicinal chemist, “C.A.L.” – presumably Chris Lipinski.  Although Lipinski’s rule of 5 filters have been widely discussed, and Lipinski’s opinions are generally highly regarded, the authors also point out a key publication of Lajiness et al., reference # 8, in which it is noted that a group of 13 chemists were not consistent in what they rejected as being undesirable.  The logic is inescapable.  If 13 chemists are not consistent in their viewpoints, then why should one chemist’s viewpoint be any better than any of the others?  And, since Lipinski’s filters have already been widely discussed in the literature and are readily available in several cheminformatics packages, what is the new, useful, and robust science here that is going to aid screening?  What is the new value in having some kind of new computational filtering scheme that supposedly reproduces Lipinski’s viewpoint.  Unless it can be clearly shown that this “mechanized” viewpoint does a much better job at selecting highly useful chemical matter without high false negative and false positive rates relative to say, 12 other reasonably experienced medicinal chemists, I see little value in this work and I do not recommend publication.  The publication does not currently demonstrate such an advantage.

Reviewer: 2

Comments to the Author
This submission makes appropriate use of Bayesian statistics to analyze a set of publically available chemical probes. The methodology is clearly described and could have general applicability to assess future probe molecules.

I would have liked to see a more critical assessment of the process that has lead to around 20% of all new probes being rated as undesirable. The authors suggest that the concepts of rule-of-five compliance and ligand efficiency appear to have become accepted by the chemical biology community, while other factors such as the use of TFA for probe purification and sub-structural features have not become accepted. My own experience would implicate lack of awareness of these negative factors in groups involved in probe synthesis, since they often lack access to the “in house medicinal chemistry expert” suggested by the authors.  In addition, the substructure features are often encoded in a way that they are not accessible to the target community.

The authors also hint that the quality of vendor libraries might be behind the issue. A reminder (reference) that the final probe is likely to resemble the original hit might help.

I would also like to see a proposal for making the Bayesian models available to a wider community. As a recent CDD user, I note that they outline a CDD Vision, which might be a route to encouraging usage of the current models.

Reviewer: 3

The current work attempts to create a model that will faithfully match the opinion of an experienced medicinal chemist (Dr. Christopher Lipinski) in distinguishing desirable from undesirable compounds. The best model (Bayesian Model 4) is moderately successful (ROC = 0.735 under 5-fold cross-validation).

An important unanswered question is whether the best model performs as well as published filters such as PAINS and the Lilly, Pfizer, Abbott, and Glaxo rules. PAINS and the Lilly rules are available on public websites. The Pfizer, Abbott, and Glaxo queries are available in their respective publications (see refs 30-32 in Rohrig, Eur J Med Chem 84:284, 2014). Most of the “bad features” in Figure S8 look like they should match PAINS filters, but it isn’t possible to tell for sure without having the structures of the undesirable compounds (see the next paragraph).

Although I respect Dr. Lipinski, taking his assessments as “truth” in building a model is a stretch. Without seeing the structures of the desirables and undesirables, I have a hard time knowing what this study is trying to model. The Methods section indicates that the data set is available on the Collaborative Drug Discovery site, but I wasn’t able to find it there, although I did find quite a few other items that would be useful to chemists involved in screening and lead generation.

Why use just one medicinal chemist? There are a lot of experienced medicinal chemists who are retired or out of work, so it seems to me it wouldn’t be hard to assemble a panel of chemists to rate the compounds. Given the amount of money that NIH has spent on their screening initiative, maybe they would be interested in sponsoring such an exercise? Do the N57 and N170 datasets add value? The N307 set gave the best model, and if you want to do a chronological train/test split the N191 set would serve that purpose. [By the way, a chronological train/test split is a more rigorous test than a random split, so I am glad to see it used here.]

References 29, 39, and 48 seem to refer to websites, but no URL is given. If you are using EndNote, there is a format for referencing websites.

In the legend to Table 1, it mentions that mean pKa was 8.12 for undesirable and 9.71 for desirable compounds. Since these pKa values are greater than 7.4, wouldn’t these compounds be uncharged at physiological pH? I’m wondering why they are classified as acids.
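
As an aside, the reviewer’s pKa question can be checked with the Henderson-Hasselbalch relationship. The snippet below is only a minimal sketch, assuming simple monoprotic behaviour and a physiological pH of 7.4; the function names are illustrative and nothing here comes from the paper or its supporting information.

```python
def fraction_ionized_acid(pka: float, ph: float = 7.4) -> float:
    """Fraction of a monoprotic acid in its deprotonated (charged) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_ionized_base(pka: float, ph: float = 7.4) -> float:
    """Fraction of a monoprotic base in its protonated (charged) form.

    Here pka is the pKa of the conjugate acid.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# The two mean pKa values quoted by the reviewer from the Table 1 legend.
for pka in (8.12, 9.71):
    acid = fraction_ionized_acid(pka)
    base = fraction_ionized_base(pka)
    print(f"pKa {pka}: as an acid {acid:.3f} ionized, as a base {base:.3f} ionized")
```

Under these assumptions an acid with a pKa of 8.12 is only about 16% ionized at pH 7.4, and one with a pKa of 9.71 is under 1% ionized, which is the reviewer’s point. The same pKa values read as conjugate-acid pKa values of bases would give mostly charged species, so the classification matters.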


Then we submitted essentially the same manuscript with minor edits to the Journal of Chemical Information and Modeling. The reviews and our responses are shown below.

Reviewer(s)’ Comments to Author:

Reviewer: 1

Recommendation: Publish after major revisions noted.

Comments: The manuscript by Litterman and coworkers describes the application of state-of-the-art cheminformatics tools to model and predict the assessments of chemical entities by a human expert. From my perspective this is a relevant study for two main reasons: first, it is investigated to which extent it will might possible to standardize the assessment of the quality of any chemical entity. And secondly, the paper addresses a very important question related to knowledge management: is it possible to capture the wisdom of an experienced scientist by an algorithm that can be applied without getting direct input, for instance when the scientist has retired.

RESPONSE: Thank you

However, there are some fundamental points which I recommend to be adressed before the manuscript can be accepted for publication in JCIM. (1) It is suggested that the models which were trained from the expert’s assessment of the NIH probes can be used to identify desirable compounds (last paragraph). Here it should be clearly emphasized that the models are able to classify compounds according to the expert’s a priori definition of desirability. It remains to be seen whether these probes are valuable tool compounds or not. Some of them might turn out to be more valuable than they would be assessed today (see also Oprea at al., 2009, ref 1).
RESPONSE:  – The paragraph was changed to – A comparison versus other molecule quality metrics or filters such as QED, PAINS, BadApple and ligand efficiency indicates that a Bayesian model based on a single medicinal chemist’s decisions for a small set of probes not surprisingly can make decisions that are preferable in classifying desirable compounds (based on the expert’s a priori definition of desirability).

(2) Neither the QED nor the ligand efficieny index has been developed to predict the medchem desirability score as it is described in this article. QED for instance was derived from an analysis of several molecular properties of orally absorbed drugs. It is therefore not suprising that e.g. the QED score shows a poorer performance than the Bayesian models when predicting the desirability scores of the validation set compounds. In the way the comparison with QED and LE is described the only valid conlcusion that can be drawn is that QED and LE on one hand and the medchem desiability score don’t agree. One can’t conclude that the methods perform comparable or that one outperforms the other.
RESPONSE:  -We agree and perhaps would add that drug likeness methods do not represent a measure of medicinal chemist desirability. We state in the introduction “In addition we have compared the results of this effort with PAINS 22, QED 24, BadApple 28 and ligand efficiency 25, 29.”

In the methods we have reworded it to “The desirability of the NIH chemical probes was also compared with the quantitative estimate of drug-likeness (QED) 24 which was calculated using open source software from SilicosIt  (Schilde, Belgium).”

In the results we have reworded, “We also compared the ability of other tools for predicting the medicinal chemist’s desirability scores for the same set of 15 compounds. We found neither the QED, BadApple, or ligand efficiency metrics to be as predictive with ROC AUC of 0.58, 0.36, and 0.29 respectively. Therefore these drug likeness methods do not agree with the medicinal chemist’s desirability scores.”

In the discussion we edited to, “A comparison versus other molecule quality metrics or filters such as QED, BadApple and ligand efficiency indicates that a Bayesian model based on a single medicinal chemist’s decisions for a small set of probes not surprisingly can make decisions that are preferable in classifying desirable compounds (based on the expert’s a priori definition of desirability).”
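
For readers less familiar with the metric quoted in that response: ROC AUC can be read as the probability that a randomly chosen desirable compound is scored above a randomly chosen undesirable one. Random ranking gives 0.5, and values below 0.5 mean a score tends to rank compounds in the opposite direction to the expert. Here is a minimal sketch using made-up scores (these are not the 15 test compounds), where `roc_auc` is an illustrative helper and not the code used in the paper.

```python
def roc_auc(scores, labels):
    """ROC AUC computed by comparing every positive with every negative.

    A pair counts 1 if the positive scores higher, 0.5 if tied, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = desirable, 0 = undesirable.
labels = [1, 1, 1, 0, 0]
agrees = [0.9, 0.8, 0.4, 0.5, 0.1]      # mostly ranks desirable compounds higher
disagrees = [1.0 - s for s in agrees]   # the same ranking turned upside down

print(roc_auc(agrees, labels), roc_auc(disagrees, labels))
```

Flipping a score flips its AUC around 0.5, which is why a value such as 0.29 is better described as disagreeing with the expert than as simply being uninformative.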

(3) Taking 1 and 2 into account, the title is misleading: the expert’s assessment can only be validated by later experiences with the probes (i.e., were they found to be frequent hitters  etc). The models described in the article can only be validated by comparing predicted expert’s assessments with the actual assessments for an independent set of molecules.

RESPONSE: We would suggest that the title is correct because we built models that predicted molecules not in the training set for which the expert’s assessment was predicted, and this assessment in turn included literature on biology, alerts etc. By predicting accurately the scores of the probes not in the training set, we have validated the model. The scored NIH probes that were not in the 4 iterative models in each phase are described (see Table 5 for statistics for external testing for each model). We otherwise agree with the reviewer that our computational model does not address the utility of the expert medicinal chemist’s judgment, which will be borne out through future experimentation.

(4) It would be very important to judge the relative impact of “objective” criteria such as the number of literature references associated to a particular compound and “subjective” criteria like the expert’s judgement of chemical reactivity to the final desirablity assessment. A bar chart (how many compounds were labeled as undesirable b/o reactivity, how many b/o literature references etc) would help.
RESPONSE: We agree that this is an important point. We have added a new figure (Figure 1), a pie chart, to display how many compounds were labeled as undesirable due to each criterion. Approximately half of compounds are judged undesirable due to chemical reactivity.

(5) How is publication bias taken into account ? For instance it is conceivable that probe has been tested in many assays after it has been released, but was always found to be negative. If these results are not published (for any reason), the probe would be classified as undesirable. Would that alone disqualify the probe ? It might also occur that a publication of a positive result gets significantly delayed – again, the probe would be labeled as “undesirable”. Were any measures applied to account for this publication bias ?

RESPONSE: The authors acknowledge these problems when considering publication status, and this is reflected in our discussion of “soft” skills related to medicinal chemistry due diligence. For example, new compounds, those published in the last 2-3 years, were not considered undesirable due to lack of literature follow up. We have added this to our discussion. Despite the severe limitations of our system, which we acknowledge as inherent to medicinal chemistry due diligence, our models were able to accurately predict desirable and undesirable scores.

(5) Constitution of training and validation sets for the individual model versions: it is stated that “after each model generation additional compounds were identified” (p 10). From which source where these compounds identified, why were they not identified before ? How were the smaller training sets selected (Bayesian model 1 – 57 molecules; model 2 – 170 molecules) ?

RESPONSE: – As described in the Experimental section “With just a few exceptions NIH probe compounds were identified from the NIH’s PubChem web based book 30 summarizing five years of probe discovery efforts. Probes are identified by ML number and by PubChem CID number. NIH probe compounds were compiled using the NIH PubChem Compound Identifier (CID) as the defining field for associating chemical structure. For chiral compounds, two dimensional depictions were searched in CAS SciFinder™ (CAS, Columbus OH) and associated references were used to define the intended structure.”

Each of the datasets was generated as Dr. Lipinski found the structures for additional probes. This process was complex and is the subject of a mini perspective submitted elsewhere because of the difficulties encountered which are of broader interest.

(6) As stated on p 18, the due diligence relies on soft skills and incorporates subjective determinations. These determinations might change over time, since the expert acquires additional knowledge. How can this dynamic aspect be incorporated in developing models for expert assessments ? The paper would benefit from suggestions or even proof-of-concept studies to adress this question.

RESPONSE: This is a great point; while we feel it is beyond the scope of this project, it is worth pursuing elsewhere in more detail. We have documented for the first time the ability to model one medicinal chemist’s assessment of a set of probes, which is a snapshot in time. The number of probes will increase and the amount of data on them will change over time. The medicinal chemist’s assessment will likely also change. Our rationale was to select a chemist with great experience (40+ yrs) who has seen it all – the assessment in this case is likely more stable. We are just modeling this chemist’s decision making.

(7) It is difficult to judge the relevance of the comparison with BadApple – more details on the underlying scope and methodology or a literature reference are necessary.

RESPONSE:  The BadApple score is determined from publicly available information to determine promiscuous compounds. We have added clarification and references to the website and the  American Chemical Society presentation in the text.

(8) In ref 22 and 23 substructure filters and rules are described to flag potential promiscous compounds. How many of the NIH probes would be flagged by e.g. PAINS ?

RESPONSE: The PAINS filters flagged 34 of the NIH chemical probes – 25% of the undesirable and 6.7% of the desirable. We have included this data in the text and added it to Figure 3.

Additional Questions:

Please rate the quality of the science reported in this paper (10 – High Quality/ 1 – Low Quality): 6

Please rate the overall importance of this paper to the field of chemical information or modeling (10 – High Importance / 1 – Low Importance): 8

Reviewer: 2

Recommendation: Publish after minor revisions noted.

Comments: Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes

The work presented in the manuscript describes an effort to computationally model CAL’s (an expert medicinal chemist’s) evaluation of chemical probes identified through the NIH screening initiative. CAL found about 20% of the hits as undesirable. This exercise is used as an initial example of understanding how medicinal chemistry evaluation of quality lead chemical matter is performed and whether that can be automated through computational methods and/or expert rules teased out or learnt. The manuscript is well written, evaluation of chemical matter and capture of various criteria thorough and the computational modeling methods sound, that I don’t have any suggestions on the manuscript, experimental details, commentary and conclusions.

RESPONSE: Thank you

However, I have a philosophical question on the study that the authors have carried out and perhaps that can addressed through comments back and weaved into the manuscript discussion somewhere. Given that human evaluation of anything is very subjective and biased to begin with (As ref 8 – Lajiness et al. study indicates), what does one gain from one expert evaluation as opposed to a medchem expert panel evaluation. For e.g., a CNS chemist evaluating probes for a CNS target versus an oncology chemist evaluating probes for a end-state cancer indication will have very different perspective on attractive chemical matter or different levels of tolerance threshold during the evaluation. Further even within a single project team, medchem campaigns in the pharmaceutical industry are mostly a team-based environment, where multiple opinions are expressed, captured and debated. There is no quantitative evidence to date, that any one approach is better than the other, however consensus of an expert panel might certainly identify common elements that could be developed as such(?)

RESPONSE: Yes, this is a great point. The earliest work on the probes as described used crowdsourcing with multiple scientists (not just medicinal chemists) to score the probes. We do now state in the final sentences – “This set of NIH chemical probes could also be scored by other in-house medicinal chemistry experts to come up with a customized score that in turn could be used to tailor the algorithm to their own preferences. For example this could be tailored towards CNS or anticancer compounds”. In the case of the study this was not a consideration. We only looked at ‘Were the compounds desirable or not based on the extensive due diligence performed’. One concern with consensus decisions is that it may dilute the expert opinion, when our goal was to capture the decisions of one expert and not the crowd. We had casually termed this ‘the expert in a box’: could we capture all of that insight and knowledge and then distill it down to some binary decision using some fingerprint descriptors? Our answer so far based on this work was yes.

Additional Questions:

Please rate the quality of the science reported in this paper (10 – High Quality/ 1 – Low Quality): 6

Please rate the overall importance of this paper to the field of chemical information or modeling (10 – High Importance / 1 – Low Importance): 5


As for the discussion on public and commercial databases this work was submitted to Nature Chemical Biology as a commentary. The same journal published the only prior analysis on 64 chemical probes in 2009. We thought this would be a perfect location for a discussion of the issues between public and commercial databases. After all Nature is so supportive of data reproducibility.

Dear Dr. Ekins:

Thank you for your submission of a Commentary entitled “The parallel worlds of public or commercial chemistry and biology data”.

Our editorial team has read and discussed your manuscript. Though we agree that the topic of chemical and biological data is relevant to our audience, we unfortunately are not able to consider your Commentary for publication. Because we have such limited space for Commentaries and Reviews in the journal, these formats are typically commissioned by the editors before submission. Since we have a strong pipeline of content at the moment, especially in areas related to the development and validation of chemical probes, we unfortunately cannot take on any more Commentary articles in this particular area.

We are sorry that we cannot proceed with this particular manuscript, and hope that you will rapidly receive a more favorable response from another journal.

Best regards,

Terry L. Sheppard, Ph.D.
Nature Chemical Biology

So we then submitted it to The Journal of Medicinal Chemistry as a miniperspective – we went through 2 rounds of peer review and the manuscript changed immensely based on the reviewer comments.

Reviewers’ Comments to Author:

Reviewer: 1

Comments: This is a thought-provoking article that is appropriate for publication as a Perspective in JMC. I recommend acceptance with minor edits.

RESPONSE: we thank the reviewer for their comment.

It is important that this article be clearly labeled as a Perspective, as there is a significant number of personal opinions and undocumented statements throughout.  Given the recognized professional stature of the authors, I do not doubt the veracity and value of such statements, but they certainly deviate from a JMC norm.  There are also some controversial statements that are valuable to have in writing in such a prominent journal as JMC, and I look forward to alternative interpretations from other authors in future articles.  I consider this normal scientific discourse, and encourage JMC to publish.

RESPONSE: This article is a Mini-Perspective. We have tried not to be too controversial but we feel the timing is appropriate before the situation gets too far out of hand.

Some suggestions: 1. The title is misleading (at least to me).  I recommend the term “biology data” should be re-phrased as “bioassay data”.  I might be splitting semantic hairs, but the vast majority of data encompassed in this article does not deal with efficacy or behavior of animals.  True biological data is much more complicated (dose, time, histology, organ weights, age, sex, etc.) than the data cited here (typically, EC50 or IC50 data).  I defer to the authors on this point.

RESPONSE: Thank you, we have changed to “The parallel worlds of public and commercial bioactive chemistry data”

2. Page 4, line 22. A comma is needed after “suspects’)”.

RESPONSE: Thank you, this has been added.

3. Page 11, line 47.  I found myself asking “What is the value of prophetic compounds?”  The authors write that the “value is at least clear”, but as I read this line, the value became unclear (to me).  I recommend that the authors explicitly indicate that value, particularly as it is relevant to the Prior Art question treated in this paragraph.  I suspect the value is to “illustrate the invention,” but I defer to a legal expert for better verbiage.  If we are going to expend computational time in searching and interpreting these prophetic compounds, then surely there must be a value beyond the initial illustration of the invention.

RESPONSE: We have greatly expanded on these topics in the text – there has already been some discussion of this. We also added a glossary.

4. Page 21, reference 26. The authors must add the Patent Application Number.  I believe this is US 20090163545, but I defer to the authors.  Also, if this application has led to a granted patent, that citation should be included as well.

RESPONSE: we have updated the number in the references and the text.

5. Figure 1.  While artistic, this picture is confusing to me.  Please re-draw and remove the meaningless sine wave that traverses the picture.  Please re-position the text descriptors beneath each compound uniformly, in traditional JMC style.  The picture concept, e.g. illustration of the various kinds of compounds, is useful.

RESPONSE: We have redrawn as requested.

6. Figure 2. This is an interesting figure and I feel it adds visually to stress the theme of the paper.  However, please amend the legend to explicitly define the size and absence of a circle.  I presume the size of the circle reflects the relative size of the cluster, and the absence of a circle denotes a singleton, but I am unsure.  The red/blue dots are intriguing, but I am unclear on how “desirability” is quantitated.  Perhaps the authors intend the red/blue dots to be only a rough, maybe even arbitrary or random, visual cue with most compounds scoring intermediate.  Please provide a line in the legend that explains how the red/blue was scored.

RESPONSE: We have updated the legend. The desirability scoring is the subject of a separate manuscript in review at JCIM. This Figure 2 is not published elsewhere. – Figure 2. The chemical structures for 322 NIH MLP probes have been clustered into 44 groups, using ECFP_6 fingerprints 49 and using a Tanimoto similarity threshold of >0.11 for cluster membership. For each of the clusters and singletons, a representative molecule is shown (selected by picking the structure within the cluster with the highest average similarity to other structures in the same cluster). The clusters are decorated with semicircles which are colored blue for compounds which were considered high confidence based on our medicinal chemistry due diligence analysis (Manuscript in review), and red for those which are not. Circle area is proportional to cluster size, and singletons are represented as a dot.
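
To make the legend a little more concrete, here is a rough sketch of the two ideas it relies on: Tanimoto similarity between fingerprints, and picking as the representative the cluster member with the highest average similarity to the others. This is not the code used for Figure 2. The fingerprints here are hypothetical sets of “on” bits standing in for ECFP_6 (which in practice would come from a cheminformatics toolkit), and the compound names are made up.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity: shared bits divided by total distinct bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def representative(cluster: dict) -> str:
    """Return the member with the highest mean similarity to the other members."""
    names = list(cluster)
    if len(names) == 1:
        return names[0]  # a singleton represents itself

    def mean_similarity(name: str) -> float:
        others = [n for n in names if n != name]
        return sum(tanimoto(cluster[name], cluster[o]) for o in others) / len(others)

    return max(names, key=mean_similarity)


# Hypothetical three-member cluster; integers are fingerprint bit positions.
cluster = {
    "cpd_A": frozenset({1, 2, 3, 4}),
    "cpd_B": frozenset({2, 3, 4, 5}),
    "cpd_C": frozenset({4, 5, 6, 7}),
}
print(representative(cluster))
```

In this toy cluster cpd_B is picked because it overlaps with both of the other members, which is the sense in which the representative molecule sits in the middle of its cluster.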

Reviewer: 2

Comments: The ‘perspective’ by Lipinski et al. is in part difficult to follow and it remains largely unclear what the authors aim to bring across. One essentially looks at a collection of scattered thoughts about databases, search tools, molecular probes, or patents etc. Various (in part technical, in part general) comments about SciFinder and the CAS registry are a recurrent theme culminating in the conclusion that SciFinder is probably not capturing all compounds that are currently available… The only other major conclusion the authors appear to come up with is their wish for ‘more openness in terms of availability of chemistry and biological data …’ (but there is little hope, as stated in the very last sentence of this manuscript …). This draft lacks a clear structure, a consistent line of thought, and meaningful take home messages that go beyond commonplace statements and might be of interest to a medicinal chemistry audience. This reviewer is also not certain that some of the more specific statements made are valid (to the extent that one can follow them), for example, those concerning ‘data dumps’ into public databases or the ‘tautomer collapse’.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns and provide more structure and narrative flow. We have made it more cohesive and come up with additional recommendations to improve the database situation. We have removed the term data dump and expanded other terms. We have added take home messages and conclusions as suggested.

Be that as it may, there already is a considerable body of literature out there concerning public compound databases, database content, and structural/activity data, very little of which has been considered here. Which are the major databases? Is there continuous development? What are major differences between public compound repositories? Are there efforts underway to synchronize database development? What about the current state of data curation? What about data integrity? Is there quality control of public and commercial databases? Is there evidence for uniqueness and potential advantages of commercial compound collections? What efforts are currently underway to integrate biological and chemical data? Why are there so many inconsistencies in compound databases and discrepancies between them? How to establish meaningful compound and data selection criteria? How do growing compound databases influence medicinal chemistry programs (if at all)? Is their evidence for the use of growing amounts of compounds data in the practice of medicinal chemistry? How do chemical database requirements change in the big data era? Such questions would be highly relevant for a database perspective.

RESPONSE: We have addressed several of these questions in the perspective. Many of these were topics we have raised earlier and now reference those papers. We have also created Table 1 (unpublished) to add more detail on databases.

As presented, some parts of this draft might be of interest for a blog, but the manuscript is not even approaching perspective standards of J. Med. Chem.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns. Based on the feedback of the other reviewers they had less concern or issue with the standard. We believe it is now greatly improved. J Med Chem is the appropriate outlet to raise awareness of this issue which will be of interest to medicinal chemists globally. We think this goes beyond the audiences of our respective blogs.

Reviewer: 3 – Review attached.    This paper addresses an important and timely topic, but it is disorganized and in places reads as an informal recounting of annoyances the authors have encountered in their development and use of various chemical databases. It could use a bit of rethinking and rewriting; some more specific comments and suggestions are provided below for the authors’ consideration.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns and provide more organization and structure in general. We believe it is now greatly improved.

It is not clear to this reader what is the main concern the authors wish to address with this article. It starts by taking the reader through some of the detailed problems of identifying and learning about a set of about 300 interesting compounds curated by the NIH; but it is never clear why these compounds are of leading interest here. Are they being used just as examples, or are they particularly important? Later, the paper puts considerable emphasis on the difficulty of completing IP due diligence for new compounds, due to the heterogeneity of chemical databases, and it began to appear that this was the main concern. The paper would benefit from a more specific statement of its concerns at the outset. Although the paper frequently refers generically to proprietary and public chemical databases, my impression is that the only proprietary database that is specifically mentioned is SciFinder/CAS. Are there any other proprietary databases (e.g., Wombat or the internal databases of pharma companies) to which the authors’ comments apply? If not, then the article would be clearer if it specified at the outset that the key proprietary database at issue is CAS.

RESPONSE: We have now made it clear that the probes are used as an example, and we describe the problems we encountered when trying to find them and score them for desirability (described in a manuscript in review at JCIM). We have also created Table 1 (unpublished) to add more detail on databases. We believe the issue is not just with CAS and have now expanded this to cover other databases in Table 1.

Many readers will not be familiar with the specific requirements for successful due diligence search, so these should be spelled out in the paper. Without this, many readers will not understand how the current chemical informatics infrastructure falls short for this application. Along similar lines, the authors should define “probe” compounds, “good probes”, “bad probes”, “prophetic compounds”, “text abstracted compounds” and other terms of art that are likely to be unfamiliar to many readers.

RESPONSE: We have provided references that address these questions, we have also added more explanation and a glossary.

A small quirk of the presentation is that the authors list multiple “personal communications” from themselves to themselves. This appears to be an effort to allocate credit to specific authors, but it’s not a practice I’ve seen before, and it strikes a jarring note. Perhaps some style expert at ACS Journals can clarify whether this is a suitable practice.

RESPONSE: We have removed these author communications and abbreviations as proposed.

There are a number of places where the authors make assertions that are vague and unsupported by data or citations. For example, on page 4, it isn’t clear how the analysis of 300 probes revealed the complexity of the data, and Figure 2 does not help with this. (It looks like just a diagram of compound clusters.) Similarly, at the bottom of page 9, concerns are raised about the accuracy of chemical structures in catalogs, but the support is weak, as the reader only gets the personal impressions of A.J.W. and C.A.L. If C.A.L. has estimated that 50% of “commercially available” compounds have never been made, it should not be difficult to add a sentence or two explaining the analysis and the data. Similarly, if A.J.W’s “personal experience” of processing millions of compounds has taught him that many compounds from vendors have “significant quality issues”, then it would be appropriate to provide summary statistics and examples of the types of errors. Similarly, it would be appropriate to replace “many compounds” by something more quantitative; “many” could mean 10% to A.J.W., but 50% to a reader.

RESPONSE: We have clarified the number of probes. We have provided references to our other papers dealing with database quality issues which are quantitative in this regard. We have removed the author abbreviations. We have removed any ambiguity in the numbers presented.

On page 4, in what sense have “multiple scientists scored a set of” compounds? What is meant by “score”, here? In what sense are the 64 probes “initial” and does it matter?

RESPONSE: We have expanded this and would recommend the reader read the actual paper for more detail because different scientists scored differently. Score represents each scientist’s evaluation of the desirability/acceptability of the probe.

On page 5, we read that there is no common understanding of what a high-quality probe is, but then a definition is provided; this seems inconsistent. What is a “parent probe book”? The challenges encountered by C.A.L. in getting data on the NIH probes seem overly anecdotal, and it isn’t clear whether the reader is supposed to be learning something about problems at NIH from this, whether this experience is supposed to reflect upon all public chemical databases, etc. What conclusion should the reader draw from the fact that C.A.L. eventually found a relevant spreadsheet “buried deep in an NIH website”? It’s also a little confusing that, after this lengthy account of problems collecting all the probe information, the paper then praises the NIH probe book as a model to emulate. Finally, at the top of page 6, the authors speculate about the chemical vs biological orientations of the labs which provided the probe data, but this seems irrelevant to any points the paper is making.

RESPONSE: We have removed this conflicting text – we believe the issues identified in this procedure are important. Access to probe molecules and data is complicated, non-obvious if not painful. The public funded efforts should make the data more accessible, this review just hints at the difficulties.

The section heading “Identifier and Structure Searches” tells the reader little about what the section will contain; and then the section in fact wanders from one topic to another. It starts with comments about ligand similarity and target similarity, discusses whether or not medicinal chemists are too conservative, delves into the vagaries of SciFinder’s chemical search capabilities, and finally devotes most of a very long paragraph to discussion of a single patent which references thousands of compounds. It isn’t clear why the reader is being told about this patent; is it problematic enough on its own to be worth extended commentary, or is it regarded as a small but worrying harbinger? Finally, the text recounts that “C.A.L. had initially worried that a reference to this patent application was somehow an indicator for a flawed or promiscuous compound. We now believe … this single patent application is an example of how complete data disclosure can lead to …. potentially harmful consequences.” It’s not clear that the report of initial worries help the reader to understand what is going on with this patent; and I didn’t fully understand the harmful consequences of this patent from the text provided.

RESPONSE: We have added an introduction to this section to lead in to the discussion. Again, we have greatly edited this section to make it clearer.

Page 10: what is a “type c compound”? Who are “the hosts of ChemSpider”? Is the story about CAS and ChemSpider important for the messages of the paper?

RESPONSE: We deleted “type c compound” for clarity. The RSC owns ChemSpider. We think the story with CAS is relevant because it covers how data can pass between databases and possibly transfer problematic compounds.

Page 10: At the bottom of the page, a concern raised about the lack of metrics for specifying activity against a “biological target” is vague. Presumably the concern is greatest for phenotypic screens; one wonders whether the authors also regard Kd values as inadequately standardized. This may be the case, but more detail is needed to help the reader understand what point the authors mean to get across.

RESPONSE: We have edited this and added metrics for bioactivity – our main point is integrating data in databases and inadequate annotation and the requirement for ontologies to improve this.

Page 11 says that efforts are underway to standardize bioassay descriptions, based on “personal communication” from two of the authors. Are we to understand that these authors are actually doing the work, or are they personally communicating (to themselves) that someone else is doing it?

RESPONSE: We have now added a recently published paper and removed the references to communications between authors.

Page 11, what does it mean for compounds to be “abstracted in databases”? Is this something different from just being listed in databases?

RESPONSE: This was changed.

Page 12: what are “tabular inorganics”? Can the authors at least estimate how much smaller the SciFinder collection would be if tautomer variants were merged? What is “an STN® application programming interface”? Is it different from some other type of application programing interface?

RESPONSE: We added a definition for tabular inorganics in the glossary. The STN API is described in a press release now added to the references. We do not know how much smaller the Scifinder collection would be if tautomers were merged.

Page 12: The last sentence says that proprietary and public databases will diverge until proprietary databases “determine how to extract quality data from the public platforms”. Couldn’t the proprietary databases take the public data now, and thus presumably eliminate any divergence? On the other hand, if they only extract some “quality” subset of the public data, then the divergence will persist, but this raises different issues, regarding the definition and identification of “quality” data.

RESPONSE: We have removed much of this discussion. CAS was taking the Public data like ChemSpider as described, but that ceased. It looks like CAS and likely other commercial databases cannot keep pace.

Page 13: the sentence beginning “There is however enough trouble…” reads as a non sequitur from the prior sentence, which says nothing about “perpetuating two or more parallel worlds”

RESPONSE: This statement was removed.

Finally, the article’s pessimistic concluding sentence undermines the value of the paper as a whole: if the improvement is so unlikely, why take readers’ time to tell them about the problems? Perhaps the article could end on a more positive note by exhorting the community (or just CAS?) to devise creative new business models which will enable greater integration of public and private chemical databases while retaining the strengths of both models.

RESPONSE: We have heeded this suggestion and proposed the use of InChI alongside SMILES (CAS does not use this), which would allow comparison with other databases. We also proposed encouraging more analysis as well as meetings between the major parties to discuss what can be done to resolve the ongoing situation. We have also used the suggestion of encouraging some creativity on the business side.
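
To make the InChI suggestion a little more concrete, here is a minimal Python sketch of how a shared structure key would let two databases be compared at all. Everything in it is an assumption for illustration: the function name, the database names, and the placeholder keys, which stand in for standard InChIKeys that each database would already have computed. It is not how CAS or any public database actually works.

```python
def overlap_report(db_a, db_b):
    """Compare two databases keyed on a shared structure identifier.

    Each database is a dict mapping an InChIKey-style string to a local
    record ID. All names and records here are illustrative only.
    """
    keys_a, keys_b = set(db_a), set(db_b)
    shared = keys_a & keys_b
    union = keys_a | keys_b
    return {
        "shared": sorted(shared),            # structures present in both
        "only_a": sorted(keys_a - keys_b),   # structures unique to the first
        "only_b": sorted(keys_b - keys_a),   # structures unique to the second
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical records with placeholder keys, not real registry entries.
public_db = {"KEY-AAA": "pub-1", "KEY-BBB": "pub-2", "KEY-CCC": "pub-3"}
commercial_db = {"KEY-BBB": "com-7", "KEY-CCC": "com-8", "KEY-DDD": "com-9"}

report = overlap_report(public_db, commercial_db)
print(report["shared"], report["only_a"], report["only_b"], report["jaccard"])
```

Without a common key, the lookup in the middle of this sketch has nothing to match on, which is the practical obstacle to any quantitative comparison between public and commercial collections.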

The second round of reviews:


Responses to Reviewers’ Comments to Author: Reviewer: 2 Comments: The authors have revised their manuscript and improved its readability. In addition, a number of irrelevant references have been eliminated. The discussion continues to be dominated by database technicalities (the majority of citations include cheminformatics journals or technical resources) with limited relevance for medicinal chemistry. The main body of the manuscript is akin to a collection of experience values trying to retrieve compound or patent information from various database sources. Unfortunately, the revised manuscript still lacks case studies and/or conclusions that would render it relevant for publication in J. Med. Chem. As presented, main points include the “lack of integration between isolated public and private data repositories”, the “multi-stop-datashop” theme, the quest for a “shift towards more collaboration or openness in terms of availability of chemistry and biological data”, and the “major hurdles that exist to prevent this from happening”. With all due respect, but this is all commonplace. The revised manuscript is now at least readable and conforms to formal publication requirements (although the quality of the few display items is rather poor and the reference list still includes numerous inconsistencies). Given the strong focus on technical aspects associated with database use, one might best advise the authors to submit their revised manuscript to an informatics-type journal where it would probably find an audience. The best choice might be J. Cheminf. that is rather technically oriented (and from which several of the cited journal references originate).


Response:  “The main body of the manuscript is akin to a collection of experience values”. Respectfully, we would like to make it clear that this is the point of our article. Here for example is a medicinal chemist trying to find the probes and decide based on data whether they actually should be probes in the first place. We are describing his experience and that of others in finding information on molecules. This is highly relevant to medicinal chemistry. We are not making molecules in this paper but the starting point for medicinal chemistry is HTS screening hits and these probes could (and some would argue) represent such molecules. The NIH spent over $500M dollars to produce these 300 or so ‘hits’ therefore the process we have undertaken serves to show the challenges and solutions to finding information on chemicals that may influence future chemistry decisions. We do not accept the suggestion that our article has “limited relevance to medicinal chemistry”. We are not aware of anyone using the whole set of NIH probes as the backdrop to such a discussion. Our article is much more than the sum of the “main points” presented by the reviewer as “all commonplace”. For example some of the issues around prior-art searching by virtual compounds could impact the composition of matter patentability of a new medicinal chemistry lead. The authors have experience in medicinal chemistry, cell biology, bioinformatics, analytical chemistry, cheminformatics and drug discovery, and I would say that we have approached it from a balanced perspective drawing from all of these perspectives, and not solely cheminformatics. It is not best suited to keep this article in a cheminformatics journal as it needs a wider audience of medicinal chemists if we are to promote some realization of the situation and effect change.

Reviewer: 1 Comments: All of my issues (reviewer #1) were addressed in the re-submitted manuscript.  The added glossary is very helpful.  This is a much improved article with the changes in the text.  Thank you.


Response: Thank you

Reviewer: 3 – Review attached. The revised version is dramatically improved but requires further editing, for clarity, specificity, and grammar. Detailed recommendations follow.


Response: Thank you – These are predominantly minor edits, which we have dealt with appropriately.

Page,Line Comments

2,12 “bioactivity” “bioactivity data”

Response: Thank you


2,29 “so called” “so-called”

Response: Thank you


2,34 delete “importance to the”

Response: Thank you


3,42 define “multi-stop datashops” or else don’t use it

Response : added ‘the afore mentioned’…


3,42 what does “constitutive” mean here? consider deleting it

Response – replaced with essential


3,44-47 is there some reason the divergence between public and private DBs is of greater concern than the divergence between different public DBs? If not, then adjust text accordingly. If so, then explain why.

Response – explanation added


3,52 “potentially others”. I suggest mentioning one or two potential others. .

Response, It was useful to point this out. Since CAS is the largest by far “potential others” has been removed


3,54-55 what does it mean that “CAS likely document their efforts to ensure high quality

curation…”? My impression is that it’s not any documentation of efforts, but the

efforts themselves which matter, anyhow.

Response, agreed documentation removed


4,8 “warranty”: these database do not warranty the data at all, so this word use seems,

well, unwarranted.

Response – agreed so sentence shortened


4,12 define “submitter-independence” or say this some other way.

Response : data quality issues arise that are independent of the submitter 15


4,15 “Logically, however…” The “however” seems out of place, as the subsequent text

does not contrast with what came before.

Response: Deleted “however” – preceding sentences describe data quality


4,15 define or reword “extrinsically comparative database quality metrics”.

Response: Deleted “extrinsically”


4,36 add comma after “million”

Response: Thank you


4,45 It’s not clear that citation 18 supports the text referencing it

Response : these are correct references


4,48 add comma after “databases”

Response: Thank you


4,50 after “GDB”, replace comma by semicolon; add comma after “scale”; delete

“small”, as “boutique” already implies smallness

Response: deleted boutique


5,8 I think “simple molecular formula” would be clearer as “empirical formula”

Response: Thank you


5,8-9 Delete “other” and “with the identical formula”

Response: changed to same atomic composition


5,16 insert comma after “showed that”

Response: Thank you


5,18-19 Either delete “and suggested they were complementary” or rewrite it so that it

adds something to the meaning. As written it seems obvious that, if these DBs have

different data, they are complementary.

Response: This has been shortened.


5,30 replace “if” by “whether”

Response: Thank you


5,42 Figure 1: it’s not clear what we learn from this. More importantly, the caption

seems wrong, as it says this it shows “The ‘usual suspects’ lineup”, where the main

text defines usual suspects as compound with liabilities or reactivity. Clearly, Figure

1 is not the list of all “usual suspects”. In fact, most of the compounds do not

appear to be suspects at all. Also, in the captions, it is not clear what “desirable”

means. Desirable in what sense?

Response : moved location of “usual suspects”, Desirable was used according to the definition = worth having or wishing for (concise Oxford dictionary).


5,52 eleven scientists does not sound like “crowd sourcing”; the number seems too few

for a crowd.

Response : The Oprea et al. paper, “A crowdsourcing evaluation of the NIH chemical probes”, published in Nature Chemical Biology in 2009, used the term “crowdsourcing” to describe the study. We agree with this usage and for clarity will continue to use the term to refer to this study.


5,54 “acceptable” in what sense? this needs to be defined.

Response : Oprea et al. define ‘acceptable’; we are not judging their criteria here or using their data.


6,20 what is meant by “feasibility of pursuing a lead”? Presumably, it is feasible from the

chemical standpoint. If this is an issue of IP, then how does it differ from “freedom

to operate”? If it is the same, then delete it.

Response deleted.


6,34 “leads””lead”

Response: Thank you.


6,53 it is not clear what Figure 2 adds to this article. Also, I am concerned that clustering

compounds based on a Tanimoto similarity measure of 0.11 (see figure legend) 11

is probably not meaningful. At least for Daylight-style fingerprints, 0.11 would

normally be considered not very similar at all.

Also in the figure legend, we have “Each of the clusters and singletons: for each

cluster….”. Something is wrong with the punctuation. And we read that blue

indicates “high confidence”; but high confidence of what, exactly?


Response: We have clarified that the threshold was chosen empirically to show a representative selection of probes. We have updated the legend and added the reference which describes how molecules were scored (recently published).
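
For readers unfamiliar with why a Tanimoto value of 0.11 raised the reviewer's eyebrows, here is a small Python sketch of the coefficient itself. It assumes fingerprints are represented simply as sets of "on" bit positions; the bit sets below are made up for illustration and are not the fingerprints or the software used for Figure 2.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Made-up fingerprints: each is a set of on-bit indices.
close_pair = ({1, 2, 3, 4, 5}, {1, 2, 3, 4, 6})    # four of six bits shared
distant_pair = ({1, 2, 3, 4, 5}, {5, 6, 7, 8, 9})  # one of nine bits shared

print(round(tanimoto(*close_pair), 2))
print(round(tanimoto(*distant_pair), 2))
```

In the second pair only one bit in nine is shared, which gives roughly the 0.11 level discussed above; that is why such a threshold groups molecules that have very little in common.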


7,6 add comma after “databases”

Response: Thank you


7,13 add “that” before “solutions”

Response: Thank you


7,30 “very high binding affinity”—this is too vague for a scientific audience. Please

provide some quantitative cutoff, even if a bit rough.

Response this is the definition from ref 26


7,46 delete “use”

Response: Thank you


7,51 “Substance number… requires added effort to find the salient chemistry details”—

what does this mean? What are “salient chemical details”? Is it something other

than the chemical structure? My own impression is that an SID takes one smoothly

to a compound in PubChem, so I’m not sure what the issue is here.


Response: A substance identifier (SID) identifies a depositor-supplied molecule and is assigned by PubChem to each unique external registry ID provided by a PubChem data depositor. The molecule structure may be unknown, for example a natural product identified only by name or a compound identified only by an identifier. The depositor record could be a mixture of unknown composition. The molecule in an SID may be racemic, a mixture of stereoisomers, a regioisomer of unknown composition, a free acid or free base or a salt form. The data depositor may not be a chemistry expert or may be confused by chemical structure. By way of contrast, the CID is the permanent identifier for a unique chemical structure, but the unique structure can still be a mixture of enantiomers or stereoisomers. To properly perform a medicinal chemistry search one must know in structural terms what is being looked for. Therefore the CID is the definitive identifier. Sometimes the relationship between SID and CID is clear; sometimes it does not exist.

Further information can be found on this at:

For detail on SID

Note: Although identifiers are unique within a PubChem database, the same integer can be used as an identifier in two or more different databases. For example, “2244” is a valid identifier in both the PubChem Substance and PubChem Compound database, where: SID: 2244 is the PubChem Substance database record for cytidylate, and CID: 2244 is the PubChem Compound database record for aspirin.
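
The note above can be sketched in a few lines of Python. This is only an illustration of the idea that the integer alone is ambiguous and must travel with its namespace; the class name and fields are hypothetical and are not part of any PubChem API. The two labels are taken from the note itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PubChemId:
    """An identifier that always carries its namespace (hypothetical helper)."""
    namespace: str  # "SID" for substance records, "CID" for compound records
    number: int

    def __post_init__(self):
        if self.namespace not in ("SID", "CID"):
            raise ValueError(f"unknown namespace: {self.namespace!r}")

    def __str__(self):
        return f"{self.namespace}:{self.number}"


sid = PubChemId("SID", 2244)
cid = PubChemId("CID", 2244)

# Same integer, different records: the two keys do not collide.
labels = {sid: "cytidylate (substance record)", cid: "aspirin (compound record)"}
for key, label in labels.items():
    print(key, "->", label)
```

Storing a bare 2244 in a spreadsheet loses exactly the information that distinguishes the two records, which is one way probe lists end up pointing at the wrong entry.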


8,3 “would not retrieve a CID” – or an SID? In any case, it would be appropriate to

identify one or two sample ML Probes which have this problem, rather than just

making the assertion.


Response: ML213 is an example of the problem posed by SID and CID. The SID for this probe correctly and uniquely identifies the molecular substance, whatever it was, that was tested in the assay for this probe. The corresponding CID for ML213 depicts a deceptively simple single structure of a norbornane carboxylic acid amide that is both chiral and whose chiral center is capable of epimerization. Thus this compound can potentially exist as four stereoisomers. From the CID number and structural depiction, the chemist does not know what was actually tested. One must read the chemistry experimental for ML213 in the NIH probe book to find that a mixture of all four stereoisomers was actually made, that the stability curve of the mixture changes over time, and that the change is tentatively attributed to solubility changes. The net conclusion is that a chemist does not know anything about the activity or inactivity of the four stereoisomers. Chemical Abstracts lists ML213 with the same structural depiction as shown in the CID and thus does not help in resolving the structural and biological ambiguity. The typical biologist would completely miss all the structural complexity and ambiguity. For a chemist to ferret all this out may or may not be possible for ML213, but if possible it requires substantial effort.


8,39 “a CAS registry number” “CAS registry numbers”

Response: Thank you


8,55—9,37 This whole paragraph is apparently devoted to explaining why it is worth finding out if compounds similar to the one of interest have other biological activities. The rationale given is something along the lines that similar structures can have a wide

range of biological activities. I’m not sure all of these words are needed however;

surely it is worth knowing if the compound one is trying to patent has potential

side-effects. ”

Response: We think it is worth “a whole paragraph” on explaining this because people tend to forget that many chemical motifs are reused and why these are important.


9,48 add comma after “law”

Response: Thank you


9,55 “the well known issues…”: which well-known issues? And, do the authors mean

these are well-known for SciFinder in particular, or more broadly?

Response: “More broadly” has been added.


10,20 add comma after “Registry number”

Response: Thank you


10,45 replace last comma by a period

Response: Thank you


10,49-54 Why should the reader care that 132,781 compounds were specified in the HTS but not referenced as “use” compounds?

Response : These numbers simply explain the explicit and referenced content of this patent to the reader. 20% of the probes contain a sole patent reference to this patent whose biology is completely different than in any of the MP documents and this comes from only about 5000 out of 132,000 compounds in the HTS being abstracted. This unique case illustrates the havoc possible from extreme (inappropriate) data disclosure and abstraction.


10,54 Insert comma after “Thus”

Response: Thank you


11,3 Please explain how “due diligence searching can be confounded” by a patent like

this. What is the problem that it generates? I don’t think most readers will know

Response: we explain this in the following sentences.


11,18 Similarly, what are the “potentially harmful consequences” for IP due diligence.

Response: we explain this in the remainder of the article.


11,34 after “disclosure”, there should be a colon

Response: Thank you


11,37 add semi-colon after “SciFinder”

Response: Thank you


11,39 add comma after “screening data”

Response: Thank you


11,42 The text says Table 2 lists NIH probe molecules, but the second and third entries in the table do not appear to be NIH probe molecules. Clarify or correct.

Response this is now clarified and additional probes added.


11,49 add comma after “difficult”

Response: Thank you


12,29 add comma after “compounds”

Response: Thank you


13,20 add comma after “challenge”

Response: Thank you


13,51 add comma after “protocols”

Response: Thank you


14,3 replace “the question are” by “whether”

Response: Thank you


14,8 add comma after “dramatically”

Response: Thank you


14,11 add comma after “databases”

Response: Thank you


14,25 Delete “By definition….all of them.” It seems trivial and obvious. Or else change it

to say something nonobvious.

Response: changed to ‘By definition, no quantitative assessment across databases is possible without access to all of them, and to our knowledge this has not been undertaken to date.’


14,32 delete “aggregate”, unless it adds something .

Response: Thank you


14,42 add comma after “ChemSpider”

Response: Thank you


14,48 “require quantification”—why is quantification required? what is the expected

benefit? .


Response: quantitative statistics are essential for objective comparisons so structure matching is now specified in the text


15,27 Delete “To conclude, from our observations”

Response : this has been changed


15,31 delete “isolated”

Response: Thank you


15,50 either delete or clarify “at the extremes”. I’m not sure what it adds. In fact, if the

cases considered in this article are “extremes”, one might argue that the concerns

raised throughout are not that important, since presumably most users will not

have extreme experiences with the databases.

Response: extremes deleted


15,53 add comma after “Probes”

Response: Thank you


16,15 delete “also”, as we already have “in addition”

Response: Thank you


16,27 Again, “multi-stop datashop” is ill-defined.

Response: this was defined earlier


16,34 Delete “(OSDD)”, since this abbreviation is not used subsequently.

Response: Thank you


16,34 add “, and” after “similar?’”

Response: Thank you


16,45 delete “(commercial or academic)”; doesn’t seem to add anything

Response: Thank you


16,48 add comma after “same answer”

Response: Thank you


16,48-49 change “operate (i.e….. structure terms).” to “operate; i.e., ….structure terms.”

Response: Thank you


17,47 add comma after “same compounds”

Response: Thank you


Well, if you made it this far you perhaps realize that the time Chris spent actually finding the NIH probes and doing the due diligence, plus our modeling efforts, was virtually matched by the time and effort spent responding to multiple rounds of peer review. The Minireview is now available, so see if it was worth all the effort. [As of Dec 9th it has been recommended in F1000Prime.]

So what can I say in conclusion? As with previous challenges getting contentious issues published, it takes perseverance, and the reviewer comments and journal responses were a mixed bag. I hope it alerts other groups to the set of probes, which are now available in the CDD Vault and elsewhere. In addition, Alex Clark has put them into his approved drugs app as a separate dataset, and it is available for free for today only. The challenge of public and commercial chemical databases will likely continue, but the impact for due diligence is huge: you can no longer rely on SciFinder as a source of chemistry information. Chemistry data and databases are exploding and moving fast. Journals and scientists need to wake up to what is going on too. The groups developing chemical probes need an experienced medicinal chemist to help them, and journals that publish papers on chemical probes need strict peer review and due diligence on a probe’s quality. A model may be a way to flag this in the absence of the actual chemist.

On the general publishing side, I frequently get comments about publishing in the same journals. My response is that when I try to break out of the mould and reach a different audience, I get a lukewarm or downright chilly response. Having never published in ACS Chemical Biology or Nature Chemical Biology, I tried there first. I did not have an ‘in’ I could rely on, no buddies that can review my papers favorably. Even when we have been encouraged by an Editor to write something for a journal, such as another recent review paper with Nadia on stem cells, initially targeted at Nature Genetics, that does not guarantee it will see the light of day in that journal. After submission to other journals like Cell Stem Cell, it was finally published in Drug Discovery Today. I can say again that publishing in F1000Research is a breeze by comparison with the above traditional big publisher journals. I appreciate the open peer review process and transparency, as can be seen in another recent paper.

I hope by putting this post together people realize what it takes to get papers out. I owe a huge debt of gratitude to Chris Lipinski for inspiring this work and for doing so much to raise this issue and Nadia for driving the analysis of the probes, and our co-authors for their support and contributions to writing, re-writing and re-writing again!

Update Dec 15 2014: the J Med Chem Minireview gets a mention on In the Pipeline.





Giving and taking my own advice on starting a company

I have a tendency to agree to other people's good ideas to do things, and then it catches up with me. You know that feeling of massive overcommitment.

Well, I started to say NO a lot more frequently. I say no to all manner of requests: to answer questionnaires, review for journals from publishers I have never heard of, or present at different kinds of unrelated conferences in China (although I really would like to go one day). I still say yes to anyone asking me questions, whether it's students wanting career advice, friends needing references, strangers wanting reprints, etc. Rare disease parent advocates are at the top of my YES list. I have plenty of time for them because they have immense challenges to raise funds and convince other scientists to do the research that one day may help their child. Their approach has been a revelation to me, to the extent that I have had to put aside other academic pursuits. I have let 2 book proposals languish, along with several papers I have always wanted to work on. Well, I hope I have time in the future to get back to these projects. My focus, if I have any, is rare and neglected diseases: the latter because they are killing millions needlessly, and the former because there are so many of them with huge gaps in our knowledge. Both severely lack funding.

A few months ago Jim Radke at Rare Disease Reports asked if I would like to blog for them on occasion. Now I have a lot of time for Jim because he is very generous at highlighting all the different groups involved in rare diseases, and the website has a wealth of info. He is also a super nice fellow who told me I should do more of what I do on rare diseases. So I said yes, signed the contract and got on with my work. I also pledged to give any royalties to the rare disease foundations I work with. In the back of my mind for the past few months, all I could think about was writing on something I really am not an expert on but which might help others, by presenting the famous ‘Ekins naive perspective’ on a topic.

That blog topic was released today. In a burst of writing after breakfast (not usually peak creativity time for me) I put together a draft on giving advice and then taking that advice on starting a rare disease company. I am not a classic entrepreneur: no MBA, no business training, never started anything in my life, and I do not drool after others’ business advice. Talk is cheap. 3 years ago I dispensed a little advice to Jill Wood; she listened, and she went off and started a company. It took 2.5 years to fund, and we have a very long way to go before we will likely have anything to show for it. Although with rare disease parents and devoted scientists like those involved, that could change very quickly. I dispensed advice, but really I was probably telling myself to do the same. I had a huge moment of inertia to overcome: a family, 2 children and several part-time consulting jobs that paid for the fun projects. Helping rare disease groups has become a bigger part of my life. You meet the rare disease children, the parents, the families, the scientists, and that has a big impact too. Whatever I can offer is pretty insignificant, but ideas and experience at funding work through grants are totally translatable. A positive outlook when all the odds are against you helps in some small way. We will never start a major pharma, but then we do not have to. Small is great. One scientist, one parent or patient is enough. The straw that broke the camel's back was meeting another rare disease parent a few weeks ago. She told me of the treatment that might help her daughter that was languishing in a lab. I resolved then that I had to use the experiences gained from starting one rare disease company to start another for those unable to do what Jill did, because they have a full-time job looking after a child with a debilitating disease.

At coffee with a good friend a few days later I mentioned my need to start this New Company and my severe inability to actually do it. I went off and wrote a one-pager because that is what I generally do, write something. My friend then put me in touch with an accountant and the ball was rolling. A couple of weeks later I had no reason why I could not write today's blog. I had given advice 3 years ago and it took me that long to take it myself. Better late than never. My incorporation papers came through today.



App exposure

It is the little things in life that make it all worthwhile, perhaps. I have noticed a definite uptick in interest in the science apps we developed over the last few years. For example, I was recently contacted by a science writer who wanted to write about the Open Drug Discovery Teams app. This is all very flattering, so we will see how that translates to an article. Simultaneously I noticed that Derek Lowe over at the In the Pipeline blog mentioned Android mobile apps, and TB Mobile got a mention. Of course these free apps have been around on iOS for several years now. The Android version of TB Mobile is version 1, while TB Mobile 2 on iOS has built-in machine learning algorithms and many more features.

There is a definite lag time between building something novel for science like a mobile app, getting some visibility for it and even publishing papers on it. It takes longer still to get more general interest, and by then you have perhaps moved on to something new. A post-mobile-app world, anyone? Yes, the fact that so much technology can be crammed into these chemistry apps by the likes of Alex Clark is truly incredible. The fact that chemists are only now realizing the potential of mobile apps is astounding. I am a late adopter of technology, but chemists and pharma in general seem even later adopters. Imagine if these apps had been around 5 years ago. Of course it does make you wonder how many apps will survive and live on in one form or another as software and mobile device tastes change. It would be a shame to lose some of the innovation just when it's getting real traction and credibility. A discussion of the legacy of the data and algorithms created in these apps needs to be going on, to prevent someone else in the future repeating what has already been created.


Rare disease collection

Over the last few months I have been putting some rare disease related ideas out into publications, and now they are starting to see the light of day.

First there is an editorial that highlights some of the challenges in finding out about the 7000 or so rare diseases. The editorial also has some artwork from a couple of children with rare diseases, in the hope this can raise a bit more awareness. This is followed by an opinion piece with collaborators Nadia Litterman, Michele Rhee and David Swinney, in which the idea of a centralized rare disease institute is raised, along with topics such as how to connect with others interested in rare diseases and how to foster collaborations.

Hopefully these articles will be followed in the rare disease collection at F1000Research by submissions from other scientists and rare disease advocates.
