Chemical probes and parallel database worlds – who wants to know? More publishing fun



This post is long and a highly detailed description of the challenges involved in getting scientific work published on one level, on another it gets to the heart of discoverability of data, data analysis and just the slog of publishing something that you hope is going to interest others in your direct field. You need to persevere and have an incredibly thick skin.

Yesterday I presented our recent work “Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s evaluation of the NIH chemical probes” at the In Silico Drug Discovery conference held at RTP. This work described a couple of recent collaborative publications, one of which was described in an earlier post as a very expensive dataset that included as many of the NIH Probes as we could gather.

Actually the whole project kicked off earlier in the year when I was visiting the CDD office in CA. Chris Lipinski, a long time board member was describing the challenges he was facing trying to find the “NIH Probes” and the incredibly detailed due diligence he was undertaking. Chris was doing this huge amount of work and if I remember correctly I just threw it out there that we should be modeling his score. This was another once of those moments where saying and doing it are completely possible but entailed a lot of work. I had no idea who or what would benefit from doing it, but it would be pretty interesting to see if a machine learning method could be used to help a medicinal chemist with the due diligence process, at least slim down the interesting compounds. Along the way of course you learn unexpected things and these have value. I had no idea during the initial idea what a Pandora’s box would be opened.

With Nadia Litterman and Chris we went through multiple iterations of model testing and inevitably we threw in a few other approaches to score the probes such as ligand efficiency, QED, PAINS and BadApple. Barry Bunin also helped us to interpret the descriptors we were finding in the Bayesian models. As you can see the scope of what we embarked on expanded greatly (and if you read the paper it will be even clearer). Chris spent countless hour scoring over 300 compounds. As we went through the write up process after a first pretty complete version, I realized we had more than just a modeling paper, there was also this complex perspective on using public and commercial chemical databases. Through past collaborations with Christopher Southan, Antony Williams and Alex Clark I thought they would be able to chime in too. In the end we had pretty diverse thoughts on the topic of public and commercial chemistry databases.

The NIH probe modeling paper was submitted to ACS Chemical Biology initially. We thought this was a good choice as this journal publishes many manuscripts that describe new chemical probes and our research may help in improving the quality of these molecules. We had the following reviews for the modeling paper from ACS Chemical Biology – needless to say it was rejected. The reviewers comments are perhaps useful insights and may indicate why so many shoddy probes get published in this an other journals.

Reviewer(s)’ Comments to Author:

Reviewer: 1

Comments to the Author
This publication details the creation of various computational models that supposedly distinguish between desirable and undesirable small molecules based on the opinion of one experienced medicinal chemist, “C.A.L.” – presumably Chris Lipinski.  Although Lipinski’s rule of 5 filters have been widely discussed, and Lipinski’s opinions are generally highly regarded, the authors also point out a key publication of Lajiness et al., reference # 8, in which it is noted that a group of 13 chemists were not consistent in what they rejected as being undesirable.  The logic is inescapable.  If 13 chemists are not consistent in their viewpoints, then why should one chemist’s viewpoint be any better than any of the others?  And, since Lipinski’s filters have already been widely discussed in the literature and are readily available in several cheminformatics packages, what is the new, useful, and robust science here that is going to aid screening?  What is the new value in having some kind of new computational filtering scheme that supposedly reproduces Lipinski’s viewpoint.  Unless it can be clearly shown that this “mechanized” viewpoint does a much better job at selecting highly useful chemical matter without high false negative and false positive rates relative to say, 12 other reasonably experienced medicinal chemists, I see little value in this work and I do not recommend publication.  The publication does not currently demonstrate such an advantage.

Reviewer: 2

Comments to the Author
This submission makes appropriate use of Bayesian statistics to analyze a set of publically available chemical probes. The methodology is clearly described and could have general applicability to assess future probe molecules.

I would have liked to see a more critical assessment of the process that has lead to around 20% of all new probes being rated as undesirable. The authors suggest that the concepts of rule-of-five compliance and ligand efficiency appear to have become accepted by the chemical biology community, while other factors such as the use of TFA for probe purification and sub-structural features have not become accepted. My own experience would implicate lack of awareness of these negative factors in groups involved in probe synthesis, since they often lack access to the “in house medicinal chemistry expert” suggested by the authors.  In addition, the substructure features are often encoded in a way that they are not accessible to the target community.

The authors also hint that the quality of vendor libraries might be behind the issue. A reminder (reference) that the final probe is likely to resemble the original hit might help.

I would also like to see a proposal for making the Bayesian models available to a wider community. As a recent CDD user, I note that they outline a CDD Vision, which might be a route to encouraging usage of the current models.

reviewer 3

The current work attempts to create a model that will faithfully match the opinion of an experienced medicinal chemist (Dr. Christopher Lipinski) in distinguishing desirable from undesirable compounds. The best model (Bayesian Model 4) is moderately successful (ROC = 0.735 under 5-fold cross-validation).

An important unanswered question is whether the best model performs as well as published filters such as PAINS and the Lilly, Pfizer, Abbott, and Glaxo rules. PAINS and the Lilly rules are available on public websites (http://mobyle.rpbs.univ-paris-diderot.fr/cgi-bin/portal.py#forms::FAF-Drugs2 and http://tripod.nih.gov). The Pfizer, Abbott, and Glaxo queries are available in their respective publications (see refs 30-32 in Rohrig, Eur J Med Chem 84:284, 2014). Most of the “bad features” in Figure S8 look like they should match PAINS filters, but it isn’t possible to tell for sure without having the structures of the undesirable compounds (see the next paragraph).

Although I respect Dr. Lipinski, taking his assessments as “truth” in building a model is a stretch. Without seeing the structures of the desirables and undesirables, I have a hard time knowing what this study is trying to model. The Methods section indicates that the data set is available on the Collaborative Drug Discovery site, but I wasn’t able to find it there, although I did find quite a few other items that would be useful to chemists involved in screening and lead generation.

Why use just one medicinal chemist? There are a lot of experienced medicinal chemists who are retired or out of work, so it seems to me it wouldn’t be hard to assemble a panel of chemists to rate the compounds. Given the amount of money that NIH has spent on their screening initiative, maybe they would be interested in sponsoring such an exercise? Do the N57 and N170 datasets add value? The N307 set gave the best model, and if you want to do a chronological train/test split the N191 set would serve that purpose. [By the way, a chronological train/test split is a more rigorous test than a random split, so I am glad to see it used here.]

References 29, 39, and 48 seem to refer to websites, but no URL is given. If you are using EndNote, there is a format for referencing websites.

In the legend to Table 1, it mentions that mean pKa was 8.12 for undesirable and 9.71 for desirable compounds. Since these pKa values are greater than 7.4, wouldn’t these compounds be uncharged at physiological pH? I’m wondering why they are classified as acids.


Then we submitted essentially the same manuscript with minor edits to the Journal of Chemical Information and Modeling.  the reviews and our responses are shown below.

Reviewer(s)’ Comments to Author: Reviewer: 1 Recommendation: Publish after major revisions noted. Comments: The manuscript by Litterman and coworkers describes the application of state-of-the-art cheminformatics tools to model and predict the assessments of chemical entities by a human expert. From my perspective this is a relevant study for two main reasons: first, it is investigated to which extent it will might possible to standardize the assessment of the quality of any chemical entity. And secondly, the paper addresses a very important question related to knowledge management: is it possible to capture the wisdom of an experienced scientist by an algorithm that can be applied without getting direct input, for instance when the scientist has retired.

RESPONSE: Thank you

However, there are some fundamental points which I recommend to be adressed before the manuscript can be accepted for publication in JCIM. (1) It is suggested that the models which were trained from the expert’s assessment of the NIH probes can be used to identify desirable compounds (last paragraph). Here it should be clearly emphasized that the models are able to classify compounds according to the expert’s a priori definition of desirability. It remains to be seen whether these probes are valuable tool compounds or not. Some of them might turn out to be more valuable than they would be assessed today (see also Oprea at al., 2009, ref 1).
RESPONSE:  – The paragraph was changed to – A comparison versus other molecule quality metrics or filters such as QED, PAINS, BadApple and ligand efficiency indicates that a Bayesian model based on a single medicinal chemist’s decisions for a small set of probes not surprisingly can make decisions that are preferable in classifying desirable compounds (based on the expert’s a priori definition of desirability).

(2) Neither the QED nor the ligand efficieny index has been developed to predict the medchem desirability score as it is described in this article. QED for instance was derived from an analysis of several molecular properties of orally absorbed drugs. It is therefore not suprising that e.g. the QED score shows a poorer performance than the Bayesian models when predicting the desirability scores of the validation set compounds. In the way the comparison with QED and LE is described the only valid conlcusion that can be drawn is that QED and LE on one hand and the medchem desiability score don’t agree. One can’t conclude that the methods perform comparable or that one outperforms the other.
RESPONSE:  -We agree and perhaps would add that drug likeness methods do not represent a measure of medicinal chemist desirability. We state in the introduction “In addition we have compared the results of this effort with PAINS 22, QED 24, BadApple 28 and ligand efficiency 25, 29.”

In the methods we have reworded it to “The desirability of the NIH chemical probes was also compared with the quantitative estimate of drug-likeness (QED) 24 which was calculated using open source software from SilicosIt  (Schilde, Belgium).”

In the results we have reworded, “We also compared the ability of other tools for predicting the medicinal chemist’s desirability scores for the same set of 15 compounds. We found neither the QED, BadApple, or ligand efficiency metrics to be as predictive with ROC AUC of 0.58, 0.36, and 0.29 respectively. Therefore these drug likeness methods do not agree with the medicinal chemist’s desirability scores.”

In the discussion we edited to, “A comparison versus other molecule quality metrics or filters such as QED, BadApple and ligand efficiency indicates that a Bayesian model based on a single medicinal chemist’s decisions for a small set of probes not surprisingly can make decisions that are preferable in classifying desirable compounds (based on the experts a priori definition of desirability).”

(3) Taking 1 and 2 into account, the title is misleading: the expert’s assessment can only be validated by later experiences with the probes (i.e., were they found to be frequent hitters  etc). The models described in the article can only be validated by comparing predicted expert’s assessments with the actual assessments for an independent set of molecules.

RESPONSE:  -We would suggest that the title is correct because we built models that predicted molecules not in the training set for which the experts assessment was predicted and this assessment in turn included literature on biology, alerts etc. By predicting accurately the scores of the probes not in the training set, we have validated the model. The scored NIH probes that were not in the 4 iterative models in each phase are described (see Table 5 for statistics for external testing for each model). We otherwise agree with the reviewer that our computational model does not address the utility of the expert medicinal chemist’s judgment, which will be born out through future experimentation.

(4) It would be very important to judge the relative impact of “objective” criteria such as the number of literature references associated to a particular compound and “subjective” criteria like the expert’s judgement of chemical reactivity to the final desirablity assessment. A bar chart (how many compounds were labeled as undesirable b/o reactivity, how many b/o literature references etc) would help.
 RESPONSE: We agree that this is an important point. We have added a new figure (Figure 1) a pie chart to display how many compounds were labeled as undesirable due to each criteria. Approximately half of compounds are judged undesirable due to chemical reactivity.

(5) How is publication bias taken into account ? For instance it is conceivable that probe has been tested in many assays after it has been released, but was always found to be negative. If these results are not published (for any reason), the probe would be classified as undesirable. Would that alone disqualify the probe ? It might also occur that a publication of a positive result gets significantly delayed – again, the probe would be labeled as “undesirable”. Were any measures applied to account for this publication bias ?

RESPONSE:  The authors acknowledge these problems when considering publication status, and is reflected in our discussion of “soft” skills related to medicinal chemistry due diligence. For example, new compounds, those published in the last 2-3 years, were not considered undesirable due to lack of literature follow up.  We have added this to our discussion. Despite the severe limitations of our system, which we acknowledge as inherent to medicinal chemistry due diligence, our models were able to accurately predict desirable and undesirable scores.

(5) Constitution of training and validation sets for the individual model versions: it is stated that “after each model generation additional compounds were identified” (p 10). From which source where these compounds identified, why were they not identified before ? How were the smaller training sets selected (Bayesian model 1 – 57 molecules; model 2 – 170 molecules) ?

RESPONSE:  – As described in the Experimental section “With just a few exceptions NIH probe compounds were identified from the NIH’s Pubchem web based book 30 summarizing five years of probe discovery efforts. Probes are identified by ML number and by PubChem CID number. NIH probe compounds were compiled using the NIH PubChem Compound Identifier (CID) as the defining field for associating chemical structure. For chiral compounds, two dimensional depictions were searched in CAS SciFinderTM (CAS, Columbus OH) and associated references were used to define the intended structure. “

Each of the datasets were generated as Dr. Lipinski found the structures for additional probes. This process was complex and is the subject of a mini perspective submitted elsewhere because of the difficulties encountered which are of broader interest.

(6) As stated on p 18, the due diligence relies on soft skills and incorporates subjective determinations. These determinations might change over time, since the expert acquires additional knowledge. How can this dynamic aspect be incorporated in developing models for expert assessments ? The paper would benefit from suggestions or even proof-of-concept studies to adress this question.

RESPONSE:  -This is a great point while we feel it is beyond the scope of this project, it is worth pursuing elsewhere in more detail. We have documented for the first time the ability to model one medicinal chemist’s assessment of a set of probes, which is a snapshot in time. The number of probes will increase and the amount of data on them will change over time. The medicinal chemists assessment will likely also change. Our rationale was select a chemist that has great experience (40+ yrs ) that has seen it all – the assessment in this case is likely more stable. We are just modeling this chemists decision making.

(7) It is difficult to judge the relevance of the comparison with BadApple – more details on the underlying scope and methodology or a literature reference are necessary.

RESPONSE:  The BadApple score is determined from publicly available information to determine promiscuous compounds. We have added clarification and references to the website and the  American Chemical Society presentation in the text.

(8) In ref 22 and 23 substructure filters and rules are described to flag potential promiscous compounds. How many of the NIH probes would be flagged by e.g. PAINS ?

RESPONSE: The PAINS filters flagged 34 of the NIH chemical probes – 25% of the undesirable and 6.7% of the desirable. We have included this data in the text and added it to Figure 3.

Additional Questions: Please rate the quality of the science reported in this paper (10 – High Quality/ 1 – Low Quality): 6 Please rate the overall importance of this paper to the field of chemical information or modeling (10 – High Importance / 1 – Low Importance): 8 Reviewer: 2 Recommendation: Publish after minor revisions noted. Comments: Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes The work presented in the manuscript describes an effort to computationally model CAL’s (an expert medicinal chemist’s) evaluation of chemical probes identified through the NIH screening initiative. CAL found about 20% of the hits as undesirable. This exercise is used as an initial example of understanding how medicinal chemistry evaluation of quality lead chemical matter is performed and whether that can be automated through computational methods and/or expert rules teased out or learnt. The manuscript is well written, evaluation of chemical matter and capture of various criteria thorough and the computational modeling methods sound, that I don’t have any suggestions on the manuscript, experimental details, commentary and conclusions.

RESPONSE: Thank you

However, I have a philosophical question on the study that the authors have carried out and perhaps that can addressed through comments back and weaved into the manuscript discussion somewhere. Given that human evaluation of anything is very subjective and biased to begin with (As ref 8 – Lajiness et al. study indicates), what does one gain from one expert evaluation as opposed to a medchem expert panel evaluation. For e.g., a CNS chemist evaluating probes for a CNS target versus an oncology chemist evaluating probes for a end-state cancer indication will have very different perspective on attractive chemical matter or different levels of tolerance threshold during the evaluation. Further even within a single project team, medchem campaigns in the pharmaceutical industry are mostly a team-based environment, where multiple opinions are expressed, captured and debated. There is no quantitative evidence to date, that any one approach is better than the other, however consensus of an expert panel might certainly identify common elements that could be developed as such(?)

RESPONSE: Yes this is a great point. The earliest work on the probes as described used crowdsourcing with multiple scientists (not just medicinal chemists) to score the probes. We do now state in the final sentences – “This set of NIH chemical probes could also be scored by other in-house medicinal chemistry experts to come up with a customized score that in turn could be used to tailor the algorithm to their own preferences.  For example this could be tailored towards CNS or anticancer compounds”.   In the case of the study this was not a consideration. We only looked at ‘Were the compounds desirable or not based on the extensive due diligence performed’. One concern with consensus decisions is that it may dilute the expert opinion, when our goal was to capture the decisions of one expert and not the crowd. We had termed this ’ the expert in a box’ casually, could we capture all of that insight and knowledge and then distill it down to some binary decision using some fingerprint descriptors? Our answer so far based on this work was yes. Additional Questions: Please rate the quality of the science reported in this paper (10 – High Quality/ 1 – Low Quality): 6 Please rate the overall importance of this paper to the field of chemical information or modeling (10 – High Importance / 1 – Low Importance): 5


As for the discussion on public and commercial databases this work was submitted to Nature Chemical Biology as a commentary. The same journal published the only prior analysis on 64 chemical probes in 2009. We thought this would be a perfect location for a discussion of the issues between public and commercial databases. After all Nature is so supportive of data reproducibility.

Dear Dr. Ekins:

Thank you for your submission of a Commentary entitled “The parallel worlds of public or commercial chemistry and biology data”.

Our editorial team has read and discussed your manuscript. Though we agree that the topic of chemical and biological data is relevant to our audience, we unfortunately are not able to consider your Commentary for publication. Because we have such limited space for Commentaries and Reviews in the journal, these formats are typically commissioned by the editors before submission. Since we have a strong pipeline of content at the moment, especially in areas related to the development and validation of chemical probes, we unfortunately cannot take on any more Commentary articles in this particular area.

We are sorry that we cannot proceed with this particular manuscript, and hope that you will rapidly receive a more favorable response from another journal.

Best regards,

Terry L. Sheppard, Ph.D.
Nature Chemical Biology

So we then submitted it to The Journal of Medicinal Chemistry as a miniperspective – we went through 2 rounds of peer review and the manuscript changed immensely based on the reviewer comments.

Reviewers’ Comments to Author: Reviewer: 1 Comments: This is a thought-provoking article that is appropriate for publication as a Perspective in JMC. I recommend acceptance with minor edits.

RESPONSE: we thank the reviewer for their comment.

It is important that this article be clearly labeled as a Perspective, as there is a significant number of personal opinions and undocumented statements throughout.  Given the recognized professional stature of the authors, I do not doubt the veracity and value of such statements, but they certainly deviate from a JMC norm.  There are also some controversial statements that are valuable to have in writing in such a prominent journal as JMC, and I look forward to alternative interpretations from other authors in future articles.  I consider this normal scientific discourse, and encourage JMC to publish.

RESPONSE: This article is a Mini-Perspective. We have tried not to be too controversial but we feel the timing is appropriate before the situation gets too far out of hand.

Some suggestions: 1. The title is misleading (at least to me).  I recommend the term “biology data” should be re-phrased as “bioassay data”.  I might be splitting semantic hairs, but the vast majority of data encompassed in this article does not deal with efficacy or behavior of animals.  True biological data is much more complicated (dose, time, histology, organ weights, age, sex, etc.) than the data cited here (typically, EC50 or IC50 data).  I defer to the authors on this point.

RESPONSE: Thank you, we have changed to “The parallel worlds of public and commercial bioactive chemistry data”

2. Page 4, line 22. A comma is needed after “suspects’)”.

RESPONSE: Thank you, this has been added. 3.

Page 11, line 47.  I found myself asking “What is the value of prophetic compounds?”  The authors write that the “value is at least clear”, but as I read this line, the value became unclear (to me).  I recommend that the authors explicitly indicate that value, particularly as it is relevant to the Prior Art question treated in this paragraph.  I suspect the value is to “illustrate the invention,” but I defer to a legal expert for better verbiage.  If we are going to expend computational time in searching and interpreting these prophetic compounds, then surely there must be a value beyond the initial illustration of the invention.

RESPONSE: We have greatly expanded on these topics in the text – there has already been some discussion of this. We also added a glossary.

4. Page 21, reference 26. The authors must add the Patent Application Number.  I believe this is US 20090163545, but I defer to the authors.  Also, if this application has led to a granted patent, that citation should be included as well.

RESPONSE: we have updated the number in the references and the text.

5. Figure 1.  While artistic, this picture is confusing to me.  Please re-draw and remove the meaningless sine wave that traverses the picture.  Please re-position the text descriptors beneath each compound uniformly, in traditional JMC style.  The picture concept, e.g. illustration of the various kinds of compounds, is useful.

RESPONSE: We have redrawn as requested.

6. Figure 2. This is an interesting figure and I feel it adds visually to stress the theme of the paper.  However, please amend the legend to explicitly define the size and absence of a circle.  I presume the size of the circle reflects the relative size of the cluster, and the absence of a circle denotes a singleton, but I am unsure.  The red/blue dots are intriguing, but I am unclear on how “desirability” is quantitated.  Perhaps the authors intend the red/blue dots to be only a rough, maybe even arbitrary or random, visual cue with most compounds scoring intermediate.  Please provide a line in the legend that explains how the red/blue was scored.

RESPONSE: We have updated the legend. The desirability scoring is the subject of a separate manuscript in review at JCIM. This Figure 2 is not published elsewhere.– Figure 2. The chemical structures for 322 NIH MLP probes (http://molsync.com/demo/probes.php) have been clustered into 44 groups, using ECFP_6 fingerprints 49 and using a Tanimoto similarity threshold of >0.11 for cluster membership. Each of the clusters and singletons: for each cluster, a representative molecule is shown (selected by picking the structure within the cluster with the highest average similarity to other structures in the same cluster). The clusters are decorated with semicircles which are colored blue for compounds which were considered high confidence based on our medicinal chemistry due diligence analysis (Manuscript in review), and red for those which are not. Circle area is proportional to cluster size, and singletons are represented as a dot.

Reviewer: 2 Comments: The ‘perspective’ by Lipinski et al. is in part difficult to follow and it remains largely unclear what the authors aim to bring across. One essentially looks at a collection of scattered thoughts about databases, search tools, molecular probes, or patents etc. Various (in part technical, in part general) comments about SciFinder and the CAS registry are a recurrent theme culminating in the conclusion that SciFinder is probably not capturing all compounds that are currently available… The only other major conclusion the authors appear to come up with is their wish for ‘more openness in terms of availability of chemistry and biological data …’ (but there  is little hope, as stated in the very last sentence of this manuscript …). This draft lacks a clear structure, a consistent line of thought, and meaningful take home messages that go beyond commonplace statements and might be of interest to a medicinal chemistry audience. This reviewer is also not certain that some of the more specific statements made are valid (to the extent that one can follow them), for example, those concerning ‘data dumps’ into public databases or the ‘tautomer collapse’.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns and provide more structure and narrative flow. We have made it more cohesive and come up with additional recommendations to improve the database situation. We have removed the term data dump and expanded other terms. We have added take home messages and conclusions as suggested.

Be that as it may, there already is a considerable body of literature out there concerning public compound databases, database content, and structural/activity data, very little of which has been considered here. Which are the major databases? Is there continuous development? What are major differences between public compound repositories? Are there efforts underway to synchronize database development? What about the current state of data curation? What about data integrity? Is there quality control of public and commercial databases? Is there evidence for uniqueness and potential advantages of commercial compound collections? What efforts are currently underway to integrate biological and chemical data? Why are there so many inconsistencies in compound databases and discrepancies between them? How to establish meaningful compound and data selection criteria? How do growing compound databases influence medicinal chemistry programs (if at all)? Is their evidence for the use of growing amounts of compounds data in the practice of medicinal chemistry? How do chemical database requirements change in the big data era? Such questions would be highly relevant for a database perspective.

RESPONSE: We have addressed several of these questions in the perspective. Many of these were topics we have raised earlier and now reference those papers. We have also created Table 1 (unpublished) to add more detail on databases.

As presented, some parts of this draft might be of interest for a blog, but the manuscript is not even approaching perspective standards of J. Med. Chem.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns. Based on the feedback of the other reviewers they had less concern or issue with the standard. We believe it is now greatly improved. J Med Chem is the appropriate outlet to raise awareness of this issue which will be of interest to medicinal chemists globally. We think this goes beyond the audiences of our respective blogs.

Reviewer: 3 – Review attached.    This paper addresses an important and timely topic, but it is disorganized and in places reads as an informal recounting of annoyances the authors have encountered in their development and use of various chemical databases. It could use a bit of rethinking and rewriting; some more specific comments and suggestions are provided below for the authors’ consideration.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns and provide more organization and structure in general. We believe it is now greatly improved.

It is not clear to this reader what is the main concern the authors wish to address with this article. It starts by taking the reader through some of the detailed problems of identifying and learning about a set of about 300 interesting compounds curated by the NIH; but it is never clear why these compounds are of leading interest here. Are they being used just as examples, or are they particularly important? Later, the paper puts considerable emphasis on the difficulty of completing IP due diligence for new compounds, due to the heterogeneity of chemical databases, and it began to appear that this was the main concern. The paper would benefit from a more specific statement of its concerns at the outset. Although the paper frequently refers generically to proprietary and public chemical databases, my impression is that the only proprietary database that is specifically mentioned is SciFinder/CAS. Are there any other proprietary databases (e.g., Wombat or the internal databases of pharma companies) to which the authors’ comments apply? If not, then the article would be clearer if it specified at the outset that the key proprietary database at issue is CAS.

RESPONSE: We now made it clear that the probes are used as an example and the problems we encountered when trying to find them and also score them for desirability (described in a manuscript in review at JCIM). We have also created Table 1 (unpublished) to add more detail on databases. We believe the issue is not just with CAS and have now expanded this to cover other databases in Table 1.

Many readers will not be familiar with the specific requirements for successful due diligence search, so these should be spelled out in the paper. Without this, many readers will not understand how the current chemical informatics infrastructure falls short for this application. Along similar lines, the authors should define “probe” compounds, “good probes”, “bad probes”, “prophetic compounds”, “text abstracted compounds” and other terms of art that are likely to be unfamiliar to many readers.

RESPONSE: We have provided references that address these questions; we have also added more explanation and a glossary.

A small quirk of the presentation is that the authors list multiple “personal communications” from themselves to themselves. This appears to be an effort to allocate credit to specific authors, but it’s not a practice I’ve seen before, and it strikes a jarring note. Perhaps some style expert at ACS Journals can clarify whether this is a suitable practice.

RESPONSE: We have removed these author communications and abbreviations as proposed.

There are a number of places where the authors make assertions that are vague and unsupported by data or citations. For example, on page 4, it isn’t clear how the analysis of 300 probes revealed the complexity of the data, and Figure 2 does not help with this. (It looks like just a diagram of compound clusters.) Similarly, at the bottom of page 9, concerns are raised about the accuracy of chemical structures in catalogs, but the support is weak, as the reader only gets the personal impressions of A.J.W. and C.A.L. If C.A.L. has estimated that 50% of “commercially available” compounds have never been made, it should not be difficult to add a sentence or two explaining the analysis and the data behind it. Similarly, if A.J.W.’s “personal experience” of processing millions of compounds has taught him that many compounds from vendors have “significant quality issues”, then it would be appropriate to provide summary statistics and examples of the types of errors. Similarly, it would be appropriate to replace “many compounds” by something more quantitative; “many” could mean 10% to A.J.W., but 50% to a reader.

RESPONSE: We have clarified the number of probes. We have provided references to our other papers dealing with database quality issues which are quantitative in this regard. We have removed the author abbreviations. We have removed any ambiguity in the numbers presented.

On page 4, in what sense have “multiple scientists scored a set of” compounds? What is meant by “score”, here? In what sense are the 64 probes “initial” and does it matter?

RESPONSE: We have expanded this and would recommend the reader read the actual paper for more detail, because different scientists scored differently. The score represents each scientist’s evaluation of the desirability/acceptability of the probe.

On page 5, we read that there is no common understanding of what a high-quality probe is, but then a definition is provided; this seems inconsistent. What is a “parent probe book”? The challenges encountered by C.A.L. in getting data on the NIH probes seem overly anecdotal, and it isn’t clear whether the reader is supposed to be learning something about problems at NIH from this, whether this experience is supposed to reflect upon all public chemical databases, etc. What conclusion should the reader draw from the fact that C.A.L. eventually found a relevant spreadsheet “buried deep in an NIH website”? It’s also a little confusing that, after this lengthy account of problems collecting all the probe information, the paper then praises the NIH probe book as a model to emulate. Finally, at the top of page 6, the authors speculate about the chemical vs biological orientations of the labs which provided the probe data, but this seems irrelevant to any points the paper is making.

RESPONSE: We have removed this conflicting text – we believe the issues identified in this process are important. Access to probe molecules and data is complicated and non-obvious, if not painful. Publicly funded efforts should make the data more accessible; this review just hints at the difficulties.

The section heading “Identifier and Structure Searches” tells the reader little about what the section will contain; and then the section in fact wanders from one topic to another. It starts with comments about ligand similarity and target similarity, discusses whether or not medicinal chemists are too conservative, delves into the vagaries of SciFinder’s chemical search capabilities, and finally devotes most of a very long paragraph to discussion of a single patent which references thousands of compounds. It isn’t clear why the reader is being told about this patent; is it problematic enough on its own to be worth extended commentary, or is it regarded as a small but worrying harbinger? Finally, the text recounts that “C.A.L. had initially worried that a reference to this patent application was somehow an indicator for a flawed or promiscuous compound. We now believe … this single patent application is an example of how complete data disclosure can lead to …. potentially harmful consequences.” It’s not clear that the report of initial worries helps the reader to understand what is going on with this patent; and I didn’t fully understand the harmful consequences of this patent from the text provided.

RESPONSE: We have added an introduction to this section to provide a lead-in to the discussion. Again, we have greatly edited this section to make it clearer.

Page 10: what is a “type c compound”? Who are “the hosts of ChemSpider”? Is the story about CAS and ChemSpider important for the messages of the paper?

RESPONSE: We deleted “type c compound” for clarity – the RSC owns ChemSpider. We think the story with CAS is relevant because it covers how data can pass between databases and possibly transfer problematic compounds.

Page 10: At the bottom of the page, a concern raised about the lack of metrics for specifying activity against a “biological target” is vague. Presumably the concern is greatest for phenotypic screens; one wonders whether the authors also regard Kd values as inadequately standardized. This may be the case, but more detail is needed to help the reader understand what point the authors mean to get across.

RESPONSE: We have edited this and added metrics for bioactivity – our main point concerns integrating data across databases, the inadequate annotation, and the need for ontologies to improve this.

Page 11 says that efforts are underway to standardize bioassay descriptions, based on “personal communication” from two of the authors. Are we to understand that these authors are actually doing the work, or are they personally communicating (to themselves) that someone else is doing it?

RESPONSE: We now added a recently published paper and removed the references to communications between authors.

Page 11, what does it mean for compounds to be “abstracted in databases”? Is this something different from just being listed in databases?

RESPONSE: This was changed.

Page 12: what are “tabular inorganics”? Can the authors at least estimate how much smaller the SciFinder collection would be if tautomer variants were merged? What is “an STN® application programming interface”? Is it different from some other type of application programming interface?

RESPONSE: We added a definition for tabular inorganics in the glossary. The STN API is described in a press release http://www.cas.org/news/media-releases/scifinder-offers-api-capabilities now added to the references. We do not know how much smaller the SciFinder collection would be if tautomers were merged.

Page 12: The last sentence says that proprietary and public databases will diverge until proprietary databases “determine how to extract quality data from the public platforms”. Couldn’t the proprietary databases take the public data now, and thus presumably eliminate any divergence? On the other hand, if they only extract some “quality” subset of the public data, then the divergence will persist, but this raises different issues, regarding the definition and identification of “quality” data.

RESPONSE: We have removed much of this discussion. CAS was taking the public data from sources like ChemSpider as described, but that ceased. It looks like CAS, and likely other commercial databases, cannot keep pace.

Page 13: the sentence beginning “There is however enough trouble…” reads as a non sequitur from the prior sentence, which says nothing about “perpetuating two or more parallel worlds”.

RESPONSE: This statement was removed.

Finally, the article’s pessimistic concluding sentence undermines the value of the paper as a whole: if the improvement is so unlikely, why take readers’ time to tell them about the problems? Perhaps the article could end on a more positive note by exhorting the community (or just CAS?) to devise creative new business models which will enable greater integration of public and private chemical databases while retaining the strengths of both models.

RESPONSE: We have heeded this suggestion and proposed the use of InChI alongside SMILES – (CAS does not use this) that would allow comparison with other databases. We also proposed encouraging more analysis as well as a meetings between the major parties to discuss what can be done to resolve the on going situation. We have also used the suggestion of encouraging some creativity on the business side.
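To make the InChI-alongside-SMILES suggestion concrete, here is a minimal sketch (benzene as a toy example; the InChI strings should be read as precomputed by a structure-normalization tool, not derived here, and the dictionary layout is purely illustrative, not any real database schema) of why exact-string matching on SMILES fails across databases while a shared canonical identifier matches:

```python
# Two hypothetical database entries for the same molecule (benzene). Each
# source stores a different, equally valid SMILES string, so naive string
# comparison reports a mismatch; the canonical InChI string agrees.
db_a = {"benzene": {"smiles": "c1ccccc1",
                    "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"}}
db_b = {"benzene": {"smiles": "C1=CC=CC=C1",
                    "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"}}

same_by_smiles = db_a["benzene"]["smiles"] == db_b["benzene"]["smiles"]
same_by_inchi = db_a["benzene"]["inchi"] == db_b["benzene"]["inchi"]
print(same_by_smiles, same_by_inchi)  # False True
```

In practice the canonicalization would be done by standard structure-handling software; the point is only that a shared canonical key makes cross-database comparison a string join rather than a chemistry problem.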

The second round of reviews:


Responses to Reviewers’ Comments to Author: Reviewer: 2 Comments: The authors have revised their manuscript and improved its readability. In addition, a number of irrelevant references have been eliminated. The discussion continues to be dominated by database technicalities (the majority of citations are from cheminformatics journals or technical resources) with limited relevance for medicinal chemistry. The main body of the manuscript is akin to a collection of experience values trying to retrieve compound or patent information from various database sources. Unfortunately, the revised manuscript still lacks case studies and/or conclusions that would render it relevant for publication in J. Med. Chem. As presented, main points include the “lack of integration between isolated public and private data repositories”, the “multi-stop-datashop” theme, the quest for a “shift towards more collaboration or openness in terms of availability of chemistry and biological data”, and the “major hurdles that exist to prevent this from happening”. With all due respect, this is all commonplace. The revised manuscript is now at least readable and conforms to formal publication requirements (although the quality of the few display items is rather poor and the reference list still includes numerous inconsistencies). Given the strong focus on technical aspects associated with database use, one might best advise the authors to submit their revised manuscript to an informatics-type journal where it would probably find an audience. The best choice might be J. Cheminf. that is rather technically oriented (and from which several of the cited journal references originate).


Response: “The main body of the manuscript is akin to a collection of experience values”. Respectfully, we would like to make it clear that this is the point of our article. Here, for example, is a medicinal chemist trying to find the probes and decide, based on data, whether they actually should be probes in the first place. We are describing his experience, and that of others, in finding information on molecules. This is highly relevant to medicinal chemistry. We are not making molecules in this paper, but the starting point for medicinal chemistry is HTS hits, and these probes could (some would argue do) represent such molecules. The NIH spent over $500M to produce these 300 or so ‘hits’, therefore the process we have undertaken serves to show the challenges and solutions to finding information on chemicals that may influence future chemistry decisions. We do not accept the suggestion that our article has “limited relevance to medicinal chemistry”. We are not aware of anyone using the whole set of NIH probes as the backdrop to such a discussion. Our article is much more than the sum of the “main points” presented by the reviewer as “all commonplace”. For example, some of the issues around prior-art searching against virtual compounds could impact the composition-of-matter patentability of a new medicinal chemistry lead. The authors have experience in medicinal chemistry, cell biology, bioinformatics, analytical chemistry, cheminformatics and drug discovery, and we would say that we have approached the topic from a balanced perspective drawing on all of these, not solely cheminformatics. It is not best suited to a cheminformatics journal, as it needs a wider audience of medicinal chemists if we are to promote some realization of the situation and effect change.

Reviewer: 1 Comments: All of my issues (reviewer #1) were addressed in the re-submitted manuscript.  The added glossary is very helpful.  This is a much improved article with the changes in the text.  Thank you.


Response: Thank you

Reviewer: 3 – Review attached. The revised version is dramatically improved but requires further editing, for clarity, specificity, and grammar. Detailed recommendations follow.


Response: Thank you – These are predominantly minor edits, which we have dealt with appropriately.

Page,Line Comments

2,12 “bioactivity” → “bioactivity data”

Response: Thank you


2,29 “so called” → “so-called”

Response: Thank you


2,34 delete “importance to the”

Response: Thank you


3,42 define “multi-stop datashops” or else don’t use it

Response: added ‘the aforementioned’…


3,42 what does “constitutive” mean here? consider deleting it

Response: replaced with ‘essential’


3,44-47 is there some reason the divergence between public and private DBs is of greater concern than the divergence between different public DBs? If not, then adjust text accordingly. If so, then explain why.

Response – explanation added


3,52 “potentially others”. I suggest mentioning one or two potential others.

Response: It was useful to point this out. Since CAS is by far the largest, “potential others” has been removed.


3,54-55 what does it mean that “CAS likely document their efforts to ensure high quality curation…”? My impression is that it’s not any documentation of efforts, but the efforts themselves which matter, anyhow.

Response: agreed, “documentation” removed


4,8 “warranty”: these databases do not warranty the data at all, so this word use seems, well, unwarranted.

Response: agreed, so the sentence was shortened


4,12 define “submitter-independence” or say this some other way.

Response: data quality issues arise that are independent of the submitter (ref 15)


4,15 “Logically, however…” The “however” seems out of place, as the subsequent text does not contrast with what came before.

Response: Deleted “however” – preceding sentences describe data quality


4,15 define or reword “extrinsically comparative database quality metrics”.

Response: Deleted “extrinsically”


4,36 add comma after “million”

Response: Thank you


4,45 It’s not clear that citation 18 supports the text referencing it

Response: these are the correct references


4,48 add comma after “databases”

Response: Thank you


4,50 after “GDB”, replace comma by semicolon; add comma after “scale”; delete “small”, as “boutique” already implies smallness

Response: deleted ‘boutique’


5,8 I think “simple molecular formula” would be clearer as “empirical formula”

Response: Thank you


5,8-9 Delete “other” and “with the identical formula”

Response: changed to ‘same atomic composition’


5,16 insert comma after “showed that”

Response: Thank you


5,18-19 Either delete “and suggested they were complementary” or rewrite it so that it adds something to the meaning. As written it seems obvious that, if these DBs have different data, they are complementary.

Response: This has been shortened.


5,30 replace “if” by “whether”

Response: Thank you


5,42 Figure 1: it’s not clear what we learn from this. More importantly, the caption seems wrong, as it says it shows “The ‘usual suspects’ lineup”, where the main text defines usual suspects as compounds with liabilities or reactivity. Clearly, Figure 1 is not the list of all “usual suspects”. In fact, most of the compounds do not appear to be suspects at all. Also, in the caption, it is not clear what “desirable” means. Desirable in what sense?

Response: moved the location of “usual suspects”. “Desirable” was used according to the dictionary definition: worth having or wishing for (Concise Oxford Dictionary).


5,52 eleven scientists does not sound like “crowd sourcing”; the number seems too few for a crowd.

Response: The Oprea et al. paper, “A crowdsourcing evaluation of the NIH chemical probes”, published in Nature Chemical Biology in 2009, used the term “crowdsourcing” to describe the study. We agree with this usage and for clarity will continue to use the term to refer to this study.


5,54 “acceptable” in what sense? this needs to be defined.

Response: Oprea et al. define ‘acceptable’; we are not judging their criteria here or using their data.


6,20 what is meant by “feasibility of pursuing a lead”? Presumably, it is feasible from the chemical standpoint. If this is an issue of IP, then how does it differ from “freedom to operate”? If it is the same, then delete it.

Response: deleted.


6,34 “leads””lead”

Response: Thank you.


6,53 it is not clear what Figure 2 adds to this article. Also, I am concerned that clustering compounds based on a Tanimoto similarity measure of 0.11 (see figure legend) is probably not meaningful. At least for Daylight-style fingerprints, 0.11 would normally be considered not very similar at all.

Also in the figure legend, we have “Each of the clusters and singletons: for each cluster….”. Something is wrong with the punctuation. And we read that blue indicates “high confidence”; but high confidence of what, exactly?

Response: We have clarified that the threshold was chosen empirically to show a representative selection of probes. We have updated the legend and added the reference which describes how molecules were scored (recently published).
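For readers unfamiliar with the metric the reviewer questions, the Tanimoto coefficient is simply the Jaccard index over fingerprint bits; a small sketch with made-up toy fingerprints (the bit positions below are invented for illustration, not real chemical fingerprints) shows how little overlap a value near 0.11 implies:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two fingerprint bit sets."""
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical fingerprints as sets of "on" bit positions: 8 bits vs 12 bits,
# with only 2 bits in common, giving 2 shared out of 18 distinct bits.
mol_a = {1, 4, 9, 12, 17, 23, 31, 40}
mol_b = {1, 4, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59}
print(round(tanimoto(mol_a, mol_b), 2))  # → 0.11
```

So at 0.11 the two fingerprints share roughly one bit in nine, which is why the reviewer considers it "not very similar at all" for Daylight-style fingerprints.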


7,6 add comma after “databases”

Response: Thank you


7,13 add “that” before “solutions”

Response: Thank you


7,30 “very high binding affinity”—this is too vague for a scientific audience. Please provide some quantitative cutoff, even if a bit rough.

Response: this is the definition from ref 26.


7,46 delete “use”

Response: Thank you


7,51 “Substance number… requires added effort to find the salient chemistry details”—what does this mean? What are “salient chemical details”? Is it something other than the chemical structure? My own impression is that an SID takes one smoothly to a compound in PubChem, so I’m not sure what the issue is here.


Response: A Substance ID (SID) identifies a depositor-supplied molecule; PubChem assigns an SID to each unique external registry ID provided by a data depositor. The molecular structure may be unknown, for example a natural product identified only by name, or a compound identified only by an identifier. The depositor record could be a mixture of unknown composition. The molecule behind an SID may be racemic, a mixture of stereoisomers, a regioisomer of unknown composition, a free acid, a free base, or a salt form. The data depositor may not be a chemistry expert, or may be confused by the chemical structure. By contrast, a CID is the permanent identifier for a unique chemical structure, although that unique structure can still be a mixture of enantiomers or stereoisomers. To properly perform a medicinal chemistry search one must know in structural terms what is being looked for; therefore the CID is the definitive identifier. Sometimes the relationship between SID and CID is clear; sometimes it does not exist.

Further information on SIDs can be found at: https://pubchem.ncbi.nlm.nih.gov/docs/subcmpd_summary_page_help.html#MoleculeSID


Note: Although identifiers are unique within a PubChem database, the same integer can be used as an identifier in two or more different databases. For example, “2244” is a valid identifier in both the PubChem Substance and PubChem Compound database, where: SID: 2244 is the PubChem Substance database record for cytidylate, and CID: 2244 is the PubChem Compound database record for aspirin.
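The namespacing issue in the note above can be sketched in a few lines. The record labels are taken from the cytidylate/aspirin example; the dictionary and `lookup` function are purely illustrative, not any real PubChem API:

```python
# Sketch of why a bare integer is ambiguous in PubChem: the same number can be
# a valid identifier in both the Substance (SID) and Compound (CID) databases,
# so an identifier is only meaningful together with its namespace.
records = {
    ("SID", 2244): "cytidylate (depositor-supplied substance record)",
    ("CID", 2244): "aspirin (standardized unique-structure record)",
}

def lookup(namespace, ident):
    """Resolve an identifier only in combination with its namespace."""
    return records[(namespace, ident)]

print(lookup("SID", 2244))  # a completely different record than...
print(lookup("CID", 2244))  # ...the compound carrying the same number
```

This is why quoting "2244" alone, without saying SID or CID, is not enough to identify a molecule when doing due diligence across databases.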


8,3 “would not retrieve a CID” – or an SID? In any case, it would be appropriate to identify one or two sample ML Probes which have this problem, rather than just making the assertion.


Response: ML213 is an example of the problem posed by SID and CID. The SID for this probe correctly and uniquely identifies the molecular substance, whatever it was, that was tested in the assay for this probe. The corresponding CID for ML213 depicts a deceptively simple single structure of a norbornane carboxylic acid amide that is chiral and whose chiral center is capable of epimerization. Thus this compound can potentially exist as four stereoisomers. From the CID number and structural depiction, the chemist does not know what was actually tested. One must read the chemistry experimental section for ML213 in the NIH probe book to find that a mixture of all four stereoisomers was actually made, that the stability curve of the mixture changed over time, and that the change is tentatively attributed to solubility changes. The net conclusion is that a chemist knows nothing about the activity or inactivity of the individual stereoisomers. Chemical Abstracts lists ML213 with the same structural depiction as shown in the CID and thus does not help in resolving the structural and biological ambiguity. The typical biologist would completely miss all this structural complexity and ambiguity. Ferreting all this out may or may not be possible for a chemist, but where possible it requires substantial effort.


8,39 “a CAS registry number” → “CAS registry numbers”

Response: Thank you


8,55—9,37 This whole paragraph is apparently devoted to explaining why it is worth finding out if compounds similar to the one of interest have other biological activities. The rationale given is something along the lines that similar structures can have a wide range of biological activities. I’m not sure all of these words are needed, however; surely it is worth knowing if the compound one is trying to patent has potential side-effects.

Response: We think it is worth “a whole paragraph” explaining this, because people tend to forget that many chemical motifs are reused and why this is important.


9,48 add comma after “law”

Response: Thank you


9,55 “the well known issues…”: which well-known issues? And, do the authors mean these are well-known for SciFinder in particular, or more broadly?

Response: “More broadly” has been added.


10,20 add comma after “Registry number”

Response: Thank you


10,45 replace last comma by a period

Response: Thank you


10,49-54 Why should the reader care that 132,781 compounds were specified in the HTS but not referenced as “use” compounds?

Response: These numbers simply explain the explicit and referenced content of this patent to the reader. 20% of the probes contain a sole patent reference to this patent, whose biology is completely different from that in any of the MP documents, and this comes from only about 5,000 out of 132,000 compounds in the HTS being abstracted. This unique case illustrates the havoc possible from extreme (inappropriate) data disclosure and abstraction.


10,54 Insert comma after “Thus”

Response: Thank you


11,3 Please explain how “due diligence searching can be confounded” by a patent like this. What is the problem that it generates? I don’t think most readers will know.
Response: we explain this in the following sentences.


11,18 Similarly, what are the “potentially harmful consequences” for IP due diligence?

Response: we explain this in the remainder of the article.


11,34 after “disclosure”, there should be a colon

Response: Thank you


11,37 add semi-colon after “SciFinder”

Response: Thank you


11,39 add comma after “screening data”

Response: Thank you


11,42 The text says Table 2 lists NIH probe molecules, but the second and third entries in the table do not appear to be NIH probe molecules. Clarify or correct.

Response: this is now clarified and additional probes added.


11,49 add comma after “difficult”

Response: Thank you


12,29 add comma after “compounds”

Response: Thank you


13,20 add comma after “challenge”

Response: Thank you


13,51 add comma after “protocols”

Response: Thank you


14,3 replace “the question are” by “whether”

Response: Thank you


14,8 add comma after “dramatically”

Response: Thank you


14,11 add comma after “databases”

Response: Thank you


14,25 Delete “By definition….all of them.” It seems trivial and obvious. Or else change it to say something nonobvious.

Response: changed to ‘By definition, no quantitative assessment across databases is possible without access to all of them, and to our knowledge this has not been undertaken to date.’


14,32 delete “aggregate”, unless it adds something.

Response: Thank you


14,42 add comma after “ChemSpider”

Response: Thank you


14,48 “require quantification”—why is quantification required? what is the expected benefit?


Response: quantitative statistics are essential for objective comparisons, so structure matching is now specified in the text


15,27 Delete “To conclude, from our observations”

Response: this has been changed


15,31 delete “isolated”

Response: Thank you


15,50 either delete or clarify “at the extremes”. I’m not sure what it adds. In fact, if the cases considered in this article are “extremes”, one might argue that the concerns raised throughout are not that important, since presumably most users will not have extreme experiences with the databases.

Response: extremes deleted


15,53 add comma after “Probes”

Response: Thank you


16,15 delete “also”, as we already have “in addition”

Response: Thank you


16,27 Again, “multi-stop datashop” is ill-defined.

Response: this was defined earlier


16,34 Delete “(OSDD)”, since this abbreviation is not used subsequently.

Response: Thank you


16,34 add “, and” after “similar?’”

Response: Thank you


16,45 delete “(commercial or academic)”; doesn’t seem to add anything

Response: Thank you


16,48 add comma after “same answer”

Response: Thank you


16,48-49 change “operate (i.e….. structure terms).” to “operate; i.e., ….structure terms.”

Response: Thank you


17,47 add comma after “same compounds”

Response: Thank you


Well, if you made it this far you perhaps realize that the time Chris spent actually finding the NIH probes and doing the due diligence, together with our modeling efforts, was virtually matched by the time and effort spent responding to multiple rounds of peer review. The Minireview is now available, so see if it was worth all the effort. [As of Dec 9th it has been recommended in F1000Prime.]

So what can I say in conclusion? Well, as with previous challenges getting contentious issues published, it takes perseverance, and the reviewer comments and journal responses were a mixed bag. I hope it alerts other groups to the set of probes, which are now available in the CDD Vault and elsewhere. In addition, Alex Clark has put them into his approved drugs app as a separate dataset, and it is available for free for today only. The challenge of public and commercial chemical databases will likely continue, but the impact for due diligence is huge: you can no longer rely on SciFinder as a source of chemistry information. Chemistry data and databases are exploding and moving fast. Journals and scientists need to wake up to what is going on too. The groups developing chemical probes need an experienced medicinal chemist to help them, and journals that publish papers on chemical probes need strict peer review and due diligence of a probe’s quality. A model may be a way to flag this in the absence of the actual chemist.

On the general publishing side, I frequently get comments about publishing in the same journals. Well, my response is that when I try to break out of the mould and reach a different audience I get a lukewarm, or downright chilly, response. Having never published in ACS Chemical Biology or Nature Chemical Biology, I tried there first. I did not have an ‘in’ I could rely on, no buddies that could review my papers favorably. Even when we have been encouraged by an editor to write something for a journal, such as another recent review paper with Nadia on stem cells, initially targeted at Nature Genetics, that does not guarantee it will see the light of day in that journal. After submission to other journals like Cell Stem Cell, it was finally published in Drug Discovery Today. I can say again that publishing in F1000Research is a breeze by comparison with the traditional big-publisher journals above; I appreciate the open peer review process and transparency, as can be seen in another recent paper.

I hope by putting this post together people realize what it takes to get papers out. I owe a huge debt of gratitude to Chris Lipinski for inspiring this work and for doing so much to raise this issue, to Nadia for driving the analysis of the probes, and to our co-authors for their support and contributions to writing, re-writing, and re-writing again!

Update Dec 15 2014: the J Med Chem Minireview gets a mention on In the Pipeline.



