Dec 19

In Memoriam: Dr. Michael Rosenberg

I only found out today that Dr. Michael J. Rosenberg had recently died tragically. I was very fortunate to have interacted with Michael on several occasions. First, he wrote a book chapter in 2006 for my first edited book, and then he wrote a book of his own on adaptive clinical trials, published in 2010 as part of a series at Wiley. Michael was a pleasure to interact with and very energetic. He had a major impact on making clinical trials more efficient and on developing technologies for them, an impact that will be felt for years to come. He was also a successful businessman, founding the CRO Health Decisions here in RTP. My condolences go out to his family and colleagues.

Dec 15

A year in publications – 2014

A year in collaborative publications, the ups and downs and a few random comments as well (with a big thanks to all involved):

1. Ekins S, Freundlich JS and Coffee M, A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus, F1000Research, 3:277, 2014.

This came initially from a Twitter exchange about papers on FDA-approved drugs. Pretty speculative. It was initially part of a much bigger paper (which is a story in itself). Several other ideas came along at around the same time and hopefully they will see the light of day.

2. Ekins S, Collecting rare diseases, F1000Research, 3:260, 2014.

I was asked by F1000Research to put a collection together. This highlights some of the difficulties patients have in getting their ideas and work published.

3. Litterman NK, Rhee M, Swinney DC and Ekins S, Collaboration for rare disease drug discovery research, F1000Research, 3:261, 2014.

This is the result of a good collaboration among authors from four diverse backgrounds; I connected with one co-author via Twitter.

4. Dong Z, Ekins S and Polli JE, A substrate pharmacophore for the human sodium taurocholate co-transporting polypeptide, 478(1):88-95, 2014.

This manuscript came together pretty quickly in 2014; I think it's the first such paper on NTCP substrates.

5. Lipinski CA, Litterman N, Southan C, Williams AJ, Clark AM and Ekins S. Parallel worlds of public and commercial bioactive chemistry data, J Med Chem, In Press 2014.

This project started from a discussion and was recently covered in detail here.

6. Litterman N, Lipinski CA, Bunin BA and Ekins S, Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes, J Chem Inf Model, 54:2996-3004, 2014.

This project started from a discussion and was recently covered in detail here and here.

7. Litterman N, and Ekins S, Databases and collaboration require standards for human stem cell research, Drug Disc Today, In press 2014.

This was initially an idea from a discussion with the editor of Nature Genetics. It was rejected by that journal, and we also tried several other journals. I think it's a great proposal/idea that could be achieved very readily. The challenge is how to get groups on board.

8. Ekins S, Freundlich JS and Reynolds RC, Are bigger datasets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis, J Chem Inf Model, 54:2157-65, 2014.

Possibly the logical extension of the TB machine learning papers, combining all the datasets from the SRI/NIAID work.

9. Ekins S, Hacking into the granuloma: could antibody antibiotic conjugates be developed for TB? Tuberculosis, 94(6):715-6, 2014.

This came from a discussion over dinner when I was asked for a crazy idea. I then pulled together the basis of the commentary. It's a pretty simple idea, building on what's been done for cancer but, as far as I can tell, never tried for TB. The next step is to actually do it.

10. Ekins S, Clark AM, Swamidass SJ, Litterman N and Williams AJ, Bigger data, collaborative tools and the future of predictive drug discovery, J Comp-Aided Mol Design, 28:997-1008, 2014.

An invited review for the journal, took a good amount of effort to put this together, pulling different ideas into a cohesive document. I like the end result.

11. Ekins S, Nuermberger EL and Freundlich JS, Minding the gaps in Tuberculosis research, Drug Disc Today, 19:1279-82, 2014.

This brief commentary takes the JCIM paper below and expands on it. We tried Science Translational Medicine (rejected after review) and Trends in Microbiology (triaged at the proposal stage) before it found a home at Drug Discovery Today.

12. Sames L, Moore A, Arnold RJG and Ekins S, Recommendations to enable drug development for inherited neuropathies: Charcot-Marie-Tooth and Giant Axonal Neuropathy, F1000Research, 3:83, 2014.

This paper came out of the work we put into writing an RDCRN grant proposal in 2013, which we are still mining for additional grant proposals. A great collaboration with parent/patient advocates. This also marked our first submission to F1000Research.

13. Clark AM, Sarker M and Ekins S, New target predictions and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0, J Cheminform 6: 38, 2014.

This paper really highlights the incredible work of Alex Clark. It describes how we updated the mobile app, added models, made the descriptors open source, and more.

14. Ekins S and Perlstein EO, Ten simple rules of live tweeting at scientific conferences, PLOS Comp Biol, 10(8):e1003789, 2014.

This little editorial was the surprise of the year for me and I have discussed its formation previously. It was an idea we had while walking from a conference to dinner. It took a while for this paper to get published.

15. Ekins S, Pottorf R, Reynolds RC, Williams AJ, Clark AM, Freundlich JS, Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis, J Chem Inf Model, 54:1070-82, 2014.

All the work for this paper was performed in 2013. We tried J Med Chem first before JCIM.

16. Dong Z, Ekins S and Polli JE, Quantitative NTCP pharmacophore and lack of association between DILI and NTCP inhibition, Eur J Pharm Sci, 66:1-9, 2014.

A paper that was written based on work from 2013. We had to try a few journals before this one made it out there.

17. Krasowski MD and Ekins S, Using cheminformatics to predict cross reactivity of “designer drugs” to their currently available immunoassays. J Cheminform 6:22, 2014.

A paper written early this year from work Matt Krasowski and I did in 2013, further investigating bath salts and their cross-reactivity with immunoassays.

18. Krasowski MD, Drees D, Morris CS, Maakestad J, Blau JL and Ekins S, Cross-reactivity of Steroid Hormone Immunoassays: Clinical Significance and Two-Dimensional Molecular Similarity Prediction, BMC Clin Pathol, 14:33, 2014.

A paper written in 2013 from work done in 2012 with Matt Krasowski, looking at steroid and immunoassay cross-reactivity.

19. Godbole AA, Ahmed W, Bhat RS, Bradley EK, Ekins S and Nagaraja V, Inhibition of Mycobacterium tuberculosis topoisomerase I by m-AMSA, a eukaryotic type II topoisomerase poison. Biochem Biophys Res Comm, 446:916-20, 2014.

Written from 2012-2013, a collaboration with a group in India as part of the MM4TB project. The first of 2 papers using docking for this target.

20. Ekins S and Williams AJ, Curing TB with open science, Tuberculosis, 94:183-5, 2014.

Written with Tony in 2013, from a discussion we had one day over coffee: what if there was more open science for TB?

21. Kandel BA, Ekins S, Leuner K, Thasler WE, Harteneck C and Zanger UM, No activation of human PXR by hyperforin-related phloroglucinols, JPET, 348:393-400, 2014.

Written in 2013, a collaboration with a German group, I generated all the PXR model predictions. One of the few examples of a “negative data” paper being published that I have been involved with!

22. Ekins S, Casey AC, Roberts D, Parish T and Bunin BA, Bayesian Models for Screening and TB Mobile for Target Inference with Mycobacterium tuberculosis, Tuberculosis, 94:162-9, 2014.

Written in 2013, as the third external evaluation of TB Bayesian models published to date.

23. Ekins S, Freundlich JS, Hobrath JV, White EL, Reynolds RC, Combining computational methods for hit to lead optimization in Mycobacterium tuberculosis drug discovery, Pharm Res, 31: 414-35, 2014.

Written in 2013, using data from the SRI ARRA grant which made a very useful test set for the various TB machine learning models.

24. Ponder EL, Freundlich JS, Sarker M, Ekins S, Computational models for neglected diseases: gaps and opportunities, Pharm Res, 31: 271-277, 2014.

This was written in 2013 primarily using data collected for a grant proposal. It’s a very brief summary of where computers have been used for these diseases too.

25. Ekins S, Progress in computational toxicology, J Pharmacol Toxicol Methods, 69:115-140, 2014.

This was written in 2013, initially as a book chapter; the editor wanted to change it dramatically and I did not, so I opted to turn it into a review.

Dec 11

R&D jobs in pharma are snow leopards – scientists must embrace social media now!

I was inspired by my friend Robert Moore to write this post. He had written back in October on how to find members of the C-suites at businesses, positions treasured by marketers. He compared CEOs to snow leopards, a very rare species that can be found if you are smart and know where to look. Robert described how to find them by the content they shared from business publications. I have kept this image burning in the back of my mind because it's a beautiful one, and a few recent circumstances have made me think of parallels elsewhere.

On Wednesday, Dec 3rd, GSK announced they would cut 900 R&D jobs in RTP here in North Carolina. This is but another example in the long line of big pharma layoffs over the past decade. But it's not just big pharma that is laying off scientists; it is also the likes of Purdue, and it is happening in Israel with Teva and in France with Pierre Fabre. It also makes you wonder what Merck will do once they digest Cubist. If we needed more evidence of big pharma's failure to innovate, then this would be it. If you are a company that relies on researchers buying your wares then this is a wake-up call too. Finding customers in pharma may be very similar to finding that snow leopard, and it is going to get harder. Where will those customers end up in the future, and how will we find them again if we do not track them?

Well, it is looking increasingly like the R&D for future drugs will come predominantly from small companies or academia. More ex-big pharma scientists will be in these organizations or they will start their own companies, perhaps working initially as consultants. That is where we should be looking for the drugs of the next decades to come. We will see this shift as scientists update their LinkedIn profiles, update their Facebook pages and maybe even tweet if they are lucky enough to find a new job. I think this also points to the importance of scientists marketing themselves using social media. The days when scientists could just rely on patents, publications or their ability on the speaker circuit to market their abilities are perhaps consigned to the past.

Networking by social media is likely a huge asset, as hiring companies will look you up (Google you) before they interview. If you are like me, you may feel like a social media dabbler. I use LinkedIn, Twitter, this blog, and a whole array of other tools like SlideShare, Figshare and Kudos to raise awareness of the science, projects and articles I collaborate on, and the skills on offer. I wonder: is it enough? I am barely scraping the surface of what is out there, and honestly it is a challenge to find the time to keep up. I am not the only one taking this approach and likely feeling the pain. So the challenge for companies that want to sell to me will be knowing what to look for, as people like me spread themselves thinly across social sites in the hope of finding someone who will hire them one day or pass their details along.

What do I want to buy? I can tell you that if I had someone who could take care of my own 'personal marketing' that would be fantastic. Someone who could update my Kudos pages, tweet for me, and even write these posts! I can imagine a future full of these social media assistants. Software exists on the other end to find people for marketing purposes, but my guess is it's not being used nearly as much as it could be. You could say the same for trying to find patients for clinical trials; it's likely that recruitment by social media will become the norm there. Will recruitment for R&D jobs by social media also follow suit? I have this image of warehouses full of people mining Twitter and other social media hubs, finding targets, be they customers, patients or people they want to connect to others.

Some of the ways you as a scientist can raise your profile, and do it in a way that is not equated with spam, are as follows:

1. You could tweet at conferences – this could be useful to others, and people will follow you for doing it.

2. You could capture your papers in tools like Kudos, explain them in simple terms, and combine them with other content that might increase their audience.

3. You could be ahead of the curve and write a blog post on something that is timely, a scientific observation or just what you are working on – this could be as a guest on someone else's blog – and just put what you do into simple language. You could even put something informative on your LinkedIn profile.

We are embarking on a new era: the connected scientist, no longer bound by the walls of the lab but connected to the world. Collaboration will be even more important, and software that facilitates these collaborations will be essential. Mobility will matter too, as will the tools scientists use.

GSK can only hope that those last employees leave the building next year and clean the whiteboards after them this time. I would also encourage those scientists over the next few months to embrace social media so they can be found by the companies or other organizations that are hiring. As a scientist, your profile and social media persona matter; you do not want to be the snow leopard.

Dec 04

Chemical probes and parallel database worlds – who wants to know? More publishing fun

This post is long and highly detailed. On one level it describes the challenges involved in getting scientific work published; on another it gets to the heart of the discoverability of data, data analysis, and just the slog of publishing something that you hope is going to interest others in your field. You need to persevere and have an incredibly thick skin.

Yesterday I presented our recent work “Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s evaluation of the NIH chemical probes” at the In Silico Drug Discovery conference held at RTP. This work described a couple of recent collaborative publications, one of which was described in an earlier post as a very expensive dataset that included as many of the NIH Probes as we could gather.

Actually the whole project kicked off earlier in the year when I was visiting the CDD office in CA. Chris Lipinski, a long-time board member, was describing the challenges he was facing trying to find the “NIH Probes” and the incredibly detailed due diligence he was undertaking. Chris was doing this huge amount of work and, if I remember correctly, I just threw it out there that we should be modeling his score. This was another one of those moments where saying it is easy but doing it entails a lot of work. I had no idea who or what would benefit from doing it, but it would be pretty interesting to see if a machine learning method could be used to help a medicinal chemist with the due diligence process, or at least narrow down the interesting compounds. Along the way, of course, you learn unexpected things and these have value. At the time of the initial idea I had no notion of the Pandora’s box that would be opened.

With Nadia Litterman and Chris, we went through multiple iterations of model testing and inevitably we threw in a few other approaches to score the probes, such as ligand efficiency, QED, PAINS and BadApple. Barry Bunin also helped us to interpret the descriptors we were finding in the Bayesian models. As you can see, the scope of what we embarked on expanded greatly (and if you read the paper it will be even clearer). Chris spent countless hours scoring over 300 compounds. As we went through the write-up process, after a first pretty complete version I realized we had more than just a modeling paper; there was also a complex perspective on using public and commercial chemical databases. Through past collaborations with Christopher Southan, Antony Williams and Alex Clark, I thought they would be able to chime in too. In the end we had pretty diverse thoughts on the topic of public and commercial chemistry databases.
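For readers curious what this kind of exercise looks like in practice, here is a minimal sketch, not the published workflow, of scoring probe-like molecules and learning a chemist's desirable/undesirable calls from fingerprints. It assumes RDKit and scikit-learn are installed; the input file name and column names (probes_scored.csv, smiles, desirable) are hypothetical placeholders.

```python
# Hypothetical sketch only: model a medicinal chemist's desirable/undesirable
# calls from molecular fingerprints and print simple "quality" metrics.
# Assumes RDKit and scikit-learn; the CSV name and columns are made up.
import csv
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, QED
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

def morgan_bits(mol, radius=3, n_bits=2048):
    """ECFP6-like Morgan fingerprint as a 0/1 numpy vector."""
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(list(fp), dtype=np.int8)

# PAINS substructure alerts bundled with RDKit (a stand-in for the alerts
# and filters discussed in the paper).
pains_params = FilterCatalogParams()
pains_params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
pains = FilterCatalog(pains_params)

X, y = [], []
with open("probes_scored.csv") as fh:              # hypothetical input file
    for row in csv.DictReader(fh):                 # columns: smiles, desirable
        mol = Chem.MolFromSmiles(row["smiles"])
        if mol is None:
            continue
        X.append(morgan_bits(mol))
        y.append(int(row["desirable"]))            # 1 = desirable, 0 = not
        print(row["smiles"],
              "QED=%.2f" % QED.qed(mol),
              "PAINS alert" if pains.HasMatch(mol) else "no PAINS alert")

# Naive Bayes over binary fingerprints, loosely analogous in spirit to the
# Bayesian classifiers used in the study, evaluated by 5-fold CV ROC AUC.
aucs = cross_val_score(BernoulliNB(), np.array(X), np.array(y),
                       cv=5, scoring="roc_auc")
print("mean 5-fold ROC AUC: %.2f" % aucs.mean())
```

The same skeleton could take scores from a different expert or a panel of chemists, or a different descriptor set; only the input file and the featurizer would change.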

The NIH probe modeling paper was submitted to ACS Chemical Biology initially. We thought this was a good choice as this journal publishes many manuscripts that describe new chemical probes and our research may help in improving the quality of these molecules. We had the following reviews for the modeling paper from ACS Chemical Biology – needless to say it was rejected. The reviewers' comments are perhaps useful insights and may indicate why so many shoddy probes get published in this and other journals.

Reviewer(s)’ Comments to Author:

Reviewer: 1

Comments to the Author
This publication details the creation of various computational models that supposedly distinguish between desirable and undesirable small molecules based on the opinion of one experienced medicinal chemist, “C.A.L.” – presumably Chris Lipinski.  Although Lipinski’s rule of 5 filters have been widely discussed, and Lipinski’s opinions are generally highly regarded, the authors also point out a key publication of Lajiness et al., reference # 8, in which it is noted that a group of 13 chemists were not consistent in what they rejected as being undesirable.  The logic is inescapable.  If 13 chemists are not consistent in their viewpoints, then why should one chemist’s viewpoint be any better than any of the others?  And, since Lipinski’s filters have already been widely discussed in the literature and are readily available in several cheminformatics packages, what is the new, useful, and robust science here that is going to aid screening?  What is the new value in having some kind of new computational filtering scheme that supposedly reproduces Lipinski’s viewpoint.  Unless it can be clearly shown that this “mechanized” viewpoint does a much better job at selecting highly useful chemical matter without high false negative and false positive rates relative to say, 12 other reasonably experienced medicinal chemists, I see little value in this work and I do not recommend publication.  The publication does not currently demonstrate such an advantage.

Reviewer: 2

Comments to the Author
This submission makes appropriate use of Bayesian statistics to analyze a set of publically available chemical probes. The methodology is clearly described and could have general applicability to assess future probe molecules.

I would have liked to see a more critical assessment of the process that has lead to around 20% of all new probes being rated as undesirable. The authors suggest that the concepts of rule-of-five compliance and ligand efficiency appear to have become accepted by the chemical biology community, while other factors such as the use of TFA for probe purification and sub-structural features have not become accepted. My own experience would implicate lack of awareness of these negative factors in groups involved in probe synthesis, since they often lack access to the “in house medicinal chemistry expert” suggested by the authors.  In addition, the substructure features are often encoded in a way that they are not accessible to the target community.

The authors also hint that the quality of vendor libraries might be behind the issue. A reminder (reference) that the final probe is likely to resemble the original hit might help.

I would also like to see a proposal for making the Bayesian models available to a wider community. As a recent CDD user, I note that they outline a CDD Vision, which might be a route to encouraging usage of the current models.

Reviewer: 3

The current work attempts to create a model that will faithfully match the opinion of an experienced medicinal chemist (Dr. Christopher Lipinski) in distinguishing desirable from undesirable compounds. The best model (Bayesian Model 4) is moderately successful (ROC = 0.735 under 5-fold cross-validation).

An important unanswered question is whether the best model performs as well as published filters such as PAINS and the Lilly, Pfizer, Abbott, and Glaxo rules. PAINS and the Lilly rules are available on public websites (http://mobyle.rpbs.univ-paris-diderot.fr/cgi-bin/portal.py#forms::FAF-Drugs2 and http://tripod.nih.gov). The Pfizer, Abbott, and Glaxo queries are available in their respective publications (see refs 30-32 in Rohrig, Eur J Med Chem 84:284, 2014). Most of the “bad features” in Figure S8 look like they should match PAINS filters, but it isn’t possible to tell for sure without having the structures of the undesirable compounds (see the next paragraph).

Although I respect Dr. Lipinski, taking his assessments as “truth” in building a model is a stretch. Without seeing the structures of the desirables and undesirables, I have a hard time knowing what this study is trying to model. The Methods section indicates that the data set is available on the Collaborative Drug Discovery site, but I wasn’t able to find it there, although I did find quite a few other items that would be useful to chemists involved in screening and lead generation.

Why use just one medicinal chemist? There are a lot of experienced medicinal chemists who are retired or out of work, so it seems to me it wouldn’t be hard to assemble a panel of chemists to rate the compounds. Given the amount of money that NIH has spent on their screening initiative, maybe they would be interested in sponsoring such an exercise? Do the N57 and N170 datasets add value? The N307 set gave the best model, and if you want to do a chronological train/test split the N191 set would serve that purpose. [By the way, a chronological train/test split is a more rigorous test than a random split, so I am glad to see it used here.]

References 29, 39, and 48 seem to refer to websites, but no URL is given. If you are using EndNote, there is a format for referencing websites.

In the legend to Table 1, it mentions that mean pKa was 8.12 for undesirable and 9.71 for desirable compounds. Since these pKa values are greater than 7.4, wouldn’t these compounds be uncharged at physiological pH? I’m wondering why they are classified as acids.

Then we submitted essentially the same manuscript, with minor edits, to the Journal of Chemical Information and Modeling. The reviews and our responses are shown below.

Reviewer(s)’ Comments to Author:

Reviewer: 1

Recommendation: Publish after major revisions noted.

Comments: The manuscript by Litterman and coworkers describes the application of state-of-the-art cheminformatics tools to model and predict the assessments of chemical entities by a human expert. From my perspective this is a relevant study for two main reasons: first, it is investigated to which extent it will might possible to standardize the assessment of the quality of any chemical entity. And secondly, the paper addresses a very important question related to knowledge management: is it possible to capture the wisdom of an experienced scientist by an algorithm that can be applied without getting direct input, for instance when the scientist has retired.

RESPONSE: Thank you

However, there are some fundamental points which I recommend to be adressed before the manuscript can be accepted for publication in JCIM. (1) It is suggested that the models which were trained from the expert’s assessment of the NIH probes can be used to identify desirable compounds (last paragraph). Here it should be clearly emphasized that the models are able to classify compounds according to the expert’s a priori definition of desirability. It remains to be seen whether these probes are valuable tool compounds or not. Some of them might turn out to be more valuable than they would be assessed today (see also Oprea at al., 2009, ref 1).
RESPONSE:  – The paragraph was changed to – A comparison versus other molecule quality metrics or filters such as QED, PAINS, BadApple and ligand efficiency indicates that a Bayesian model based on a single medicinal chemist’s decisions for a small set of probes not surprisingly can make decisions that are preferable in classifying desirable compounds (based on the expert’s a priori definition of desirability).

(2) Neither the QED nor the ligand efficieny index has been developed to predict the medchem desirability score as it is described in this article. QED for instance was derived from an analysis of several molecular properties of orally absorbed drugs. It is therefore not suprising that e.g. the QED score shows a poorer performance than the Bayesian models when predicting the desirability scores of the validation set compounds. In the way the comparison with QED and LE is described the only valid conlcusion that can be drawn is that QED and LE on one hand and the medchem desiability score don’t agree. One can’t conclude that the methods perform comparable or that one outperforms the other.
RESPONSE:  -We agree and perhaps would add that drug likeness methods do not represent a measure of medicinal chemist desirability. We state in the introduction “In addition we have compared the results of this effort with PAINS 22, QED 24, BadApple 28 and ligand efficiency 25, 29.”

In the methods we have reworded it to “The desirability of the NIH chemical probes was also compared with the quantitative estimate of drug-likeness (QED) 24 which was calculated using open source software from SilicosIt  (Schilde, Belgium).”

In the results we have reworded, “We also compared the ability of other tools for predicting the medicinal chemist’s desirability scores for the same set of 15 compounds. We found neither the QED, BadApple, or ligand efficiency metrics to be as predictive with ROC AUC of 0.58, 0.36, and 0.29 respectively. Therefore these drug likeness methods do not agree with the medicinal chemist’s desirability scores.”

In the discussion we edited to, “A comparison versus other molecule quality metrics or filters such as QED, BadApple and ligand efficiency indicates that a Bayesian model based on a single medicinal chemist’s decisions for a small set of probes not surprisingly can make decisions that are preferable in classifying desirable compounds (based on the experts a priori definition of desirability).”

(3) Taking 1 and 2 into account, the title is misleading: the expert’s assessment can only be validated by later experiences with the probes (i.e., were they found to be frequent hitters  etc). The models described in the article can only be validated by comparing predicted expert’s assessments with the actual assessments for an independent set of molecules.

RESPONSE:  -We would suggest that the title is correct because we built models that predicted molecules not in the training set for which the experts assessment was predicted and this assessment in turn included literature on biology, alerts etc. By predicting accurately the scores of the probes not in the training set, we have validated the model. The scored NIH probes that were not in the 4 iterative models in each phase are described (see Table 5 for statistics for external testing for each model). We otherwise agree with the reviewer that our computational model does not address the utility of the expert medicinal chemist’s judgment, which will be born out through future experimentation.

(4) It would be very important to judge the relative impact of “objective” criteria such as the number of literature references associated to a particular compound and “subjective” criteria like the expert’s judgement of chemical reactivity to the final desirablity assessment. A bar chart (how many compounds were labeled as undesirable b/o reactivity, how many b/o literature references etc) would help.
 RESPONSE: We agree that this is an important point. We have added a new figure (Figure 1) a pie chart to display how many compounds were labeled as undesirable due to each criteria. Approximately half of compounds are judged undesirable due to chemical reactivity.

(5) How is publication bias taken into account ? For instance it is conceivable that probe has been tested in many assays after it has been released, but was always found to be negative. If these results are not published (for any reason), the probe would be classified as undesirable. Would that alone disqualify the probe ? It might also occur that a publication of a positive result gets significantly delayed – again, the probe would be labeled as “undesirable”. Were any measures applied to account for this publication bias ?

RESPONSE:  The authors acknowledge these problems when considering publication status, and is reflected in our discussion of “soft” skills related to medicinal chemistry due diligence. For example, new compounds, those published in the last 2-3 years, were not considered undesirable due to lack of literature follow up.  We have added this to our discussion. Despite the severe limitations of our system, which we acknowledge as inherent to medicinal chemistry due diligence, our models were able to accurately predict desirable and undesirable scores.

(5) Constitution of training and validation sets for the individual model versions: it is stated that “after each model generation additional compounds were identified” (p 10). From which source where these compounds identified, why were they not identified before ? How were the smaller training sets selected (Bayesian model 1 – 57 molecules; model 2 – 170 molecules) ?

RESPONSE:  – As described in the Experimental section “With just a few exceptions NIH probe compounds were identified from the NIH’s Pubchem web based book 30 summarizing five years of probe discovery efforts. Probes are identified by ML number and by PubChem CID number. NIH probe compounds were compiled using the NIH PubChem Compound Identifier (CID) as the defining field for associating chemical structure. For chiral compounds, two dimensional depictions were searched in CAS SciFinderTM (CAS, Columbus OH) and associated references were used to define the intended structure. “

Each of the datasets were generated as Dr. Lipinski found the structures for additional probes. This process was complex and is the subject of a mini perspective submitted elsewhere because of the difficulties encountered which are of broader interest.

(6) As stated on p 18, the due diligence relies on soft skills and incorporates subjective determinations. These determinations might change over time, since the expert acquires additional knowledge. How can this dynamic aspect be incorporated in developing models for expert assessments ? The paper would benefit from suggestions or even proof-of-concept studies to adress this question.

RESPONSE:  -This is a great point while we feel it is beyond the scope of this project, it is worth pursuing elsewhere in more detail. We have documented for the first time the ability to model one medicinal chemist’s assessment of a set of probes, which is a snapshot in time. The number of probes will increase and the amount of data on them will change over time. The medicinal chemists assessment will likely also change. Our rationale was select a chemist that has great experience (40+ yrs ) that has seen it all – the assessment in this case is likely more stable. We are just modeling this chemists decision making.

(7) It is difficult to judge the relevance of the comparison with BadApple – more details on the underlying scope and methodology or a literature reference are necessary.

RESPONSE:  The BadApple score is determined from publicly available information to determine promiscuous compounds. We have added clarification and references to the website and the  American Chemical Society presentation in the text.

(8) In ref 22 and 23 substructure filters and rules are described to flag potential promiscous compounds. How many of the NIH probes would be flagged by e.g. PAINS ?

RESPONSE: The PAINS filters flagged 34 of the NIH chemical probes – 25% of the undesirable and 6.7% of the desirable. We have included this data in the text and added it to Figure 3.

Additional Questions: Please rate the quality of the science reported in this paper (10 – High Quality / 1 – Low Quality): 6. Please rate the overall importance of this paper to the field of chemical information or modeling (10 – High Importance / 1 – Low Importance): 8.

Reviewer: 2

Recommendation: Publish after minor revisions noted.

Comments: Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes. The work presented in the manuscript describes an effort to computationally model CAL’s (an expert medicinal chemist’s) evaluation of chemical probes identified through the NIH screening initiative. CAL found about 20% of the hits as undesirable. This exercise is used as an initial example of understanding how medicinal chemistry evaluation of quality lead chemical matter is performed and whether that can be automated through computational methods and/or expert rules teased out or learnt. The manuscript is well written, evaluation of chemical matter and capture of various criteria thorough and the computational modeling methods sound, that I don’t have any suggestions on the manuscript, experimental details, commentary and conclusions.

RESPONSE: Thank you

However, I have a philosophical question on the study that the authors have carried out and perhaps that can addressed through comments back and weaved into the manuscript discussion somewhere. Given that human evaluation of anything is very subjective and biased to begin with (As ref 8 – Lajiness et al. study indicates), what does one gain from one expert evaluation as opposed to a medchem expert panel evaluation. For e.g., a CNS chemist evaluating probes for a CNS target versus an oncology chemist evaluating probes for a end-state cancer indication will have very different perspective on attractive chemical matter or different levels of tolerance threshold during the evaluation. Further even within a single project team, medchem campaigns in the pharmaceutical industry are mostly a team-based environment, where multiple opinions are expressed, captured and debated. There is no quantitative evidence to date, that any one approach is better than the other, however consensus of an expert panel might certainly identify common elements that could be developed as such(?)

RESPONSE: Yes this is a great point. The earliest work on the probes as described used crowdsourcing with multiple scientists (not just medicinal chemists) to score the probes. We do now state in the final sentences – “This set of NIH chemical probes could also be scored by other in-house medicinal chemistry experts to come up with a customized score that in turn could be used to tailor the algorithm to their own preferences. For example this could be tailored towards CNS or anticancer compounds”. In the case of the study this was not a consideration. We only looked at ‘Were the compounds desirable or not based on the extensive due diligence performed’. One concern with consensus decisions is that it may dilute the expert opinion, when our goal was to capture the decisions of one expert and not the crowd. We had termed this ‘the expert in a box’ casually: could we capture all of that insight and knowledge and then distill it down to some binary decision using some fingerprint descriptors? Our answer so far based on this work was yes.

Additional Questions: Please rate the quality of the science reported in this paper (10 – High Quality / 1 – Low Quality): 6. Please rate the overall importance of this paper to the field of chemical information or modeling (10 – High Importance / 1 – Low Importance): 5.

———

As for the discussion of public and commercial databases, this work was submitted to Nature Chemical Biology as a commentary. The same journal had published the only prior analysis, of 64 chemical probes, in 2009. We thought this would be a perfect venue for a discussion of the issues between public and commercial databases. After all, Nature is so supportive of data reproducibility.

Dear Dr. Ekins:

Thank you for your submission of a Commentary entitled “The parallel worlds of public or commercial chemistry and biology data”.

Our editorial team has read and discussed your manuscript. Though we agree that the topic of chemical and biological data is relevant to our audience, we unfortunately are not able to consider your Commentary for publication. Because we have such limited space for Commentaries and Reviews in the journal, these formats are typically commissioned by the editors before submission. Since we have a strong pipeline of content at the moment, especially in areas related to the development and validation of chemical probes, we unfortunately cannot take on any more Commentary articles in this particular area.

We are sorry that we cannot proceed with this particular manuscript, and hope that you will rapidly receive a more favorable response from another journal.

Best regards,

Terry L. Sheppard, Ph.D.
Editor
Nature Chemical Biology

So we then submitted it to the Journal of Medicinal Chemistry as a mini-perspective. We went through two rounds of peer review, and the manuscript changed immensely based on the reviewer comments.

Reviewers’ Comments to Author:

Reviewer: 1

Comments: This is a thought-provoking article that is appropriate for publication as a Perspective in JMC. I recommend acceptance with minor edits.

RESPONSE: we thank the reviewer for their comment.

It is important that this article be clearly labeled as a Perspective, as there is a significant number of personal opinions and undocumented statements throughout.  Given the recognized professional stature of the authors, I do not doubt the veracity and value of such statements, but they certainly deviate from a JMC norm.  There are also some controversial statements that are valuable to have in writing in such a prominent journal as JMC, and I look forward to alternative interpretations from other authors in future articles.  I consider this normal scientific discourse, and encourage JMC to publish.

RESPONSE: This article is a Mini-Perspective. We have tried not to be too controversial but we feel the timing is appropriate before the situation gets too far out of hand.

Some suggestions: 1. The title is misleading (at least to me).  I recommend the term “biology data” should be re-phrased as “bioassay data”.  I might be splitting semantic hairs, but the vast majority of data encompassed in this article does not deal with efficacy or behavior of animals.  True biological data is much more complicated (dose, time, histology, organ weights, age, sex, etc.) than the data cited here (typically, EC50 or IC50 data).  I defer to the authors on this point.

RESPONSE: Thank you, we have changed to “The parallel worlds of public and commercial bioactive chemistry data”

2. Page 4, line 22. A comma is needed after “suspects’)”.

RESPONSE: Thank you, this has been added.

3. Page 11, line 47. I found myself asking “What is the value of prophetic compounds?” The authors write that the “value is at least clear”, but as I read this line, the value became unclear (to me). I recommend that the authors explicitly indicate that value, particularly as it is relevant to the Prior Art question treated in this paragraph. I suspect the value is to “illustrate the invention,” but I defer to a legal expert for better verbiage. If we are going to expend computational time in searching and interpreting these prophetic compounds, then surely there must be a value beyond the initial illustration of the invention.

RESPONSE: We have greatly expanded on these topics in the text – there has already been some discussion of this. We also added a glossary.

4. Page 21, reference 26. The authors must add the Patent Application Number.  I believe this is US 20090163545, but I defer to the authors.  Also, if this application has led to a granted patent, that citation should be included as well.

RESPONSE: we have updated the number in the references and the text.

5. Figure 1.  While artistic, this picture is confusing to me.  Please re-draw and remove the meaningless sine wave that traverses the picture.  Please re-position the text descriptors beneath each compound uniformly, in traditional JMC style.  The picture concept, e.g. illustration of the various kinds of compounds, is useful.

RESPONSE: We have redrawn as requested.

6. Figure 2. This is an interesting figure and I feel it adds visually to stress the theme of the paper.  However, please amend the legend to explicitly define the size and absence of a circle.  I presume the size of the circle reflects the relative size of the cluster, and the absence of a circle denotes a singleton, but I am unsure.  The red/blue dots are intriguing, but I am unclear on how “desirability” is quantitated.  Perhaps the authors intend the red/blue dots to be only a rough, maybe even arbitrary or random, visual cue with most compounds scoring intermediate.  Please provide a line in the legend that explains how the red/blue was scored.

RESPONSE: We have updated the legend. The desirability scoring is the subject of a separate manuscript in review at JCIM. This Figure 2 is not published elsewhere.- Figure 2. The chemical structures for 322 NIH MLP probes (http://molsync.com/demo/probes.php) have been clustered into 44 groups, using ECFP_6 fingerprints 49 and using a Tanimoto similarity threshold of >0.11 for cluster membership. Each of the clusters and singletons: for each cluster, a representative molecule is shown (selected by picking the structure within the cluster with the highest average similarity to other structures in the same cluster). The clusters are decorated with semicircles which are colored blue for compounds which were considered high confidence based on our medicinal chemistry due diligence analysis (Manuscript in review), and red for those which are not. Circle area is proportional to cluster size, and singletons are represented as a dot.

Reviewer: 2

Comments: The ‘perspective’ by Lipinski et al. is in part difficult to follow and it remains largely unclear what the authors aim to bring across. One essentially looks at a collection of scattered thoughts about databases, search tools, molecular probes, or patents etc. Various (in part technical, in part general) comments about SciFinder and the CAS registry are a recurrent theme culminating in the conclusion that SciFinder is probably not capturing all compounds that are currently available… The only other major conclusion the authors appear to come up with is their wish for ‘more openness in terms of availability of chemistry and biological data …’ (but there is little hope, as stated in the very last sentence of this manuscript …). This draft lacks a clear structure, a consistent line of thought, and meaningful take home messages that go beyond commonplace statements and might be of interest to a medicinal chemistry audience. This reviewer is also not certain that some of the more specific statements made are valid (to the extent that one can follow them), for example, those concerning ‘data dumps’ into public databases or the ‘tautomer collapse’.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns and provide more structure and narrative flow. We have made it more cohesive and come up with additional recommendations to improve the database situation. We have removed the term data dump and expanded other terms. We have added take home messages and conclusions as suggested.

Be that as it may, there already is a considerable body of literature out there concerning public compound databases, database content, and structural/activity data, very little of which has been considered here. Which are the major databases? Is there continuous development? What are major differences between public compound repositories? Are there efforts underway to synchronize database development? What about the current state of data curation? What about data integrity? Is there quality control of public and commercial databases? Is there evidence for uniqueness and potential advantages of commercial compound collections? What efforts are currently underway to integrate biological and chemical data? Why are there so many inconsistencies in compound databases and discrepancies between them? How to establish meaningful compound and data selection criteria? How do growing compound databases influence medicinal chemistry programs (if at all)? Is their evidence for the use of growing amounts of compounds data in the practice of medicinal chemistry? How do chemical database requirements change in the big data era? Such questions would be highly relevant for a database perspective.

RESPONSE: We have addressed several of these questions in the perspective. Many of these were topics we have raised earlier and now reference those papers. We have also created Table 1 (unpublished) to add more detail on databases.

As presented, some parts of this draft might be of interest for a blog, but the manuscript is not even approaching perspective standards of J. Med. Chem.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns. Based on the feedback of the other reviewers they had less concern or issue with the standard. We believe it is now greatly improved. J Med Chem is the appropriate outlet to raise awareness of this issue which will be of interest to medicinal chemists globally. We think this goes beyond the audiences of our respective blogs.

Reviewer: 3 – Review attached.

This paper addresses an important and timely topic, but it is disorganized and in places reads as an informal recounting of annoyances the authors have encountered in their development and use of various chemical databases. It could use a bit of rethinking and rewriting; some more specific comments and suggestions are provided below for the authors’ consideration.

RESPONSE: We have extensively rewritten the mini-perspective to address the reviewer concerns and provide more organization and structure in general. We believe it is now greatly improved.

It is not clear to this reader what is the main concern the authors wish to address with this article. It starts by taking the reader through some of the detailed problems of identifying and learning about a set of about 300 interesting compounds curated by the NIH; but it is never clear why these compounds are of leading interest here. Are they being used just as examples, or are they particularly important? Later, the paper puts considerable emphasis on the difficulty of completing IP due diligence for new compounds, due to the heterogeneity of chemical databases, and it began to appear that this was the main concern. The paper would benefit from a more specific statement of its concerns at the outset. Although the paper frequently refers generically to proprietary and public chemical databases, my impression is that the only proprietary database that is specifically mentioned is SciFinder/CAS. Are there any other proprietary databases (e.g., Wombat or the internal databases of pharma companies) to which the authors’ comments apply? If not, then the article would be clearer if it specified at the outset that the key proprietary database at issue is CAS.

RESPONSE: We now made it clear that the probes are used as an example and the problems we encountered when trying to find them and also score them for desirability (described in a manuscript in review at JCIM). We have also created Table 1 (unpublished) to add more detail on databases. We believe the issue is not just with CAS and have now expanded this to cover other databases in Table 1.

Many readers will not be familiar with the specific requirements for successful due diligence search, so these should be spelled out in the paper. Without this, many readers will not understand how the current chemical informatics infrastructure falls short for this application. Along similar lines, the authors should define “probe” compounds, “good probes”, “bad probes”, “prophetic compounds”, “text abstracted compounds” and other terms of art that are likely to be unfamiliar to many readers.

RESPONSE: We have provided references that address these questions, we have also added more explanation and a glossary.

A small quirk of the presentation is that the authors list multiple “personal communications” from themselves to themselves. This appears to be an effort to allocate credit to specific authors, but it’s not a practice I’ve seen before, and it strikes a jarring note. Perhaps some style expert at ACS Journals can clarify whether this is a suitable practice.

RESPONSE: We have removed these author communications and abbreviations as proposed.

There are a number of places where the authors make assertions that are vague and unsupported by data or citations. For example, on page 4, it isn’t clear how the analysis of 300 probes revealed the complexity of the data, and Figure 2 does not help with this this. (It looks like just a diagram of compound clusters.) Similarly, at the bottom of page 9, concerns are raised about the accuracy of chemical structures in catalogs, but the support is weak, as the reader only gets the personal impressions of A.J.W. and C.A.L. If C.A.L. has estimated that 50% of “commercially available” compounds have never been made, it should not be difficult to add a sentence or two explaining how the analysis and the data. Similarly, if A.J.W’s “personal experience” of processing millions of compounds has taught him the many compounds from vendors have “significant quality issues”, then it would be appropriate to provide summary statistics and examples of the types of errors. Similarly, it would be appropriate to replace “many compounds” by something more quantitative; “many” could mean 10% to A.J.W., but 50% to a reader.

RESPONSE: We have clarified the number of probes. We have provided references to our other papers dealing with database quality issues which are quantitative in this regard. We have removed the author abbreviations. We have removed any ambiguity in the numbers presented.

On page 4, in what sense have “multiple scientists scored a set of” compounds? What is meant by “score”, here? In what sense are the 64 probes “initial” and does it matter?

RESPONSE: We have expanded this and would recommend the reader read the actual paper for more detail because different scientists scored differently. Score represents each scientists evaluation of the desirability/ acceptability of the probe.

On page 5, we read that there is no common understanding of what a high-quality probe is, but then a definition is provided; this seems inconsistent. What is a “parent probe book”? The challenges encountered by C.A.L. in getting data on the NIH probes seem overly anecdotal, and it isn’t clear whether the reader is supposed to be learning something about problems at NIH from this, whether this experience is supposed to reflect upon all public chemical databases, etc. What conclusion should the reader draw from the fact that C.A.L. eventually found a relevant spreadsheet “buried deep in an NIH website”? It’s also a little confusing that, after this lengthy account of problems collecting all the probe information, the paper then praises the NIH probe book as a model to emulate. Finally, at the top of page 6, the authors speculate about the chemical vs biological orientations of the labs which provided the probe data, but this seems irrelevant to any points the paper is making.

RESPONSE: We have removed this conflicting text – we believe the issues identified in this procedure are important. Access to probe molecules and data is complicated, non-obvious if not painful. The public funded efforts should make the data more accessible, this review just hints at the difficulties.

The section heading “Identifier and Structure Searches” tells the reader little about what the section will contain; and then the section in fact wanders from one topic to another. It starts with comments about ligand similarity and target similarity, discusses whether or not medicinal chemists are too conservative, delves into the vagaries of SciFinder’s chemical search capabilities, and finally devotes most of a very long paragraph to discussion of a single patent which references thousands of compounds. It isn’t clear why the reader is being told about this patent; is it problematic enough on its own to be worth extended commentary, or is it regarded as a small but worrying harbinger? Finally, the text recounts that “C.A.L. had initially worried that a reference to this patent application was somehow an indicator for a flawed or promiscuous compound. We now believe … this single patent application is an example of how complete data disclosure can lead to …. potentially harmful consequences.” It’s not clear that the report of initial worries help the reader to understand what is going on with this patent; and I didn’t fully understand the harmful consequences of this patent from the text provided.

RESPONSE: we have added an introduction to this section to facilitate lead in to the discussion. Again we have greatly edited this section to make it clearer.

Page 10: what is a “type c compound”? Who are “the hosts of ChemSpider”? Is the story about CAS and ChemSpider important for the messages of the paper?

RESPONSE: We deleted type c compound for clarity – The RSC own Chemspider. We think the story with CAS is relevant because it covers how data can pass between databases and possibly transfer problematic compounds.

Page 10: At the bottom of the page, a concern raised about the lack of metrics for specifying activity against a “biological target” is vague. Presumably the concern is greatest for phenotypic screens; one wonders whether the authors also regard Kd values as inadequately standardized. This may be the case, but more detail is needed to help the reader understand what point the authors mean to get across.

RESPONSE: We have edited this and added metrics for bioactivity – our main point is integrating data in databases and inadequate annotation and the requirement for ontologies to improve this.

Page 11 says that efforts are underway to standardize bioassay descriptions, based on “personal communication” from two of the authors. Are we to understand that these authors are actually doing the work, or are they personally communicating (to themselves) that someone else is doing it?

RESPONSE: We now added a recently published paper and removed the references to communications between authors.

Page 11, what does it mean for compounds to be “abstracted in databases”? Is this something different from just being listed in databases?

RESPONSE: This was changed.

Page 12: what are “tabular inorganics”? Can the authors at least estimate how much smaller the SciFinder collection would be if tautomer variants were merged? What is “an STN® application programming interface”? Is it different from some other type of application programing interface?

RESPONSE: We added a definition for tabular inorganics in the glossary. The STN API is described in a press release http://www.cas.org/news/media-releases/scifinder-offers-api-capabilities now added to the references. We do not know how much smaller the Scifinder collection would be if tautomers were merged.

Page 12: The last sentence says that proprietary and public databases will diverge until proprietary databases “determine how to extract quality data from the public platforms”. Couldn’t the proprietary databases take the public data now, and thus presumably eliminate any divergence? On the other hand, if they only extract some “quality” subset of the public data, then the divergence will persist, but this raises different issues, regarding the definition and identification of “quality” data.

RESPONSE: We have removed much of this discussion. CAS was taking the Public data like ChemSpider as described, but that ceased. It looks like CAS and likely other commercial databases cannot keep pace.

Page 13: the sentence beginning “There is however enough trouble…” reads as a non sequitur from the prior sentence, which says nothing about “perpetuating two or more parallel worlds”

RESPONSE: This statement was removed.

Finally, the article’s pessimistic concluding sentence undermines the value of the paper as a whole: if the improvement is so unlikely, why take readers’ time to tell them about the problems? Perhaps the article could end on a more positive note by exhorting the community (or just CAS?) to devise creative new business models which will enable greater integration of public and private chemical databases while retaining the strengths of both models.

RESPONSE: We have heeded this suggestion and proposed the use of InChI alongside SMILES (CAS does not use InChI), which would allow comparison with other databases. We also proposed encouraging more analysis, as well as meetings between the major parties to discuss what can be done to resolve the ongoing situation. We have also used the suggestion of encouraging some creativity on the business side.

The second round of reviews:

 

Responses to Reviewers’ Comments to Author: Reviewer: 2 Comments: The authors have revised their manuscript and improved its readability. In addition, a number of irrelevant references have been eliminated. The discussion continues to be dominated by database technicalities (the majority of citations include cheminformatics journals or technical resources) with limited relevance for medicinal chemistry. The main body of the manuscript is akin to a collection of experience values trying to retrieve compound or patent information from various database sources. Unfortunately, the revised manuscript still lacks case studies and/or conclusions that would render it relevant for publication in J. Med. Chem. As presented, main points include the “lack of integration between isolated public and private data repositories”, the “multi-stop-datashop” theme, the quest for a “shift towards more collaboration or openness in terms of availability of chemistry and biological data”, and the “major hurdles that exist to prevent this from happening”. With all due respect, but this is all commonplace. The revised manuscript is now at least readable and conforms to formal publication requirements (although the quality of the few display items is rather poor and the reference list still includes numerous inconsistencies). Given the strong focus on technical aspects associated with database use, one might best advise the authors to submit their revised manuscript to an informatics-type journal where it would probably find an audience. The best choice might be J. Cheminf. that is rather technically oriented (and from which several of the cited journal references originate).

 

Response: “The main body of the manuscript is akin to a collection of experience values”. Respectfully, we would like to make it clear that this is the point of our article. Here, for example, is a medicinal chemist trying to find the probes and decide, based on data, whether they actually should be probes in the first place. We are describing his experience, and that of others, in finding information on molecules. This is highly relevant to medicinal chemistry. We are not making molecules in this paper, but the starting point for medicinal chemistry is HTS screening hits, and these probes could (and some would argue do) represent such molecules. The NIH spent over $500M to produce these 300 or so ‘hits’; therefore the process we have undertaken serves to show the challenges and solutions to finding information on chemicals that may influence future chemistry decisions. We do not accept the suggestion that our article has “limited relevance to medicinal chemistry”. We are not aware of anyone using the whole set of NIH probes as the backdrop to such a discussion. Our article is much more than the sum of the “main points” presented by the reviewer as “all commonplace”. For example, some of the issues around prior-art searching by virtual compounds could impact the composition-of-matter patentability of a new medicinal chemistry lead. The authors have experience in medicinal chemistry, cell biology, bioinformatics, analytical chemistry, cheminformatics and drug discovery, and we have approached the topic from a balanced viewpoint drawing on all of these perspectives, not solely cheminformatics. It is not best to keep this article in a cheminformatics journal, as it needs the wider audience of medicinal chemists if we are to promote some realization of the situation and effect change.

Reviewer: 1 Comments: All of my issues (reviewer #1) were addressed in the re-submitted manuscript.  The added glossary is very helpful.  This is a much improved article with the changes in the text.  Thank you.

 

Response: Thank you

Reviewer: 3 – Review attached. The revised version is dramatically improved but requires further editing, for clarity, specificity, and grammar. Detailed recommendations follow.

 

Response: Thank you – These are predominantly minor edits, which we have dealt with appropriately.

Page,Line Comments

2,12 “bioactivity” → “bioactivity data”

Response: Thank you

 

2,29 “so called” → “so-called”

Response: Thank you

 

2,34 delete “importance to the”

Response: Thank you

 

3,42 define “multi-stop datashops” or else don’t use it

Response: added ‘the aforementioned’…

 

3,42 what does “constitutive” mean here? consider deleting it

Response – replaced with essential

 

3,44-47 is there some reason the divergence between public and private DBs is of greater concern than the divergence between different public DBs? If not, then adjust text accordingly. If so, then explain why.

Response – explanation added

 

3,52 “potentially others”. I suggest mentioning one or two potential others.

Response: It was useful to point this out. Since CAS is the largest by far, “potential others” has been removed.

 

3,54-55 what does it mean that “CAS likely document their efforts to ensure high quality curation…”? My impression is that it’s not any documentation of efforts, but the efforts themselves which matter, anyhow.

Response: agreed, “documentation” removed

 

4,8 “warranty”: these databases do not warranty the data at all, so this word use seems, well, unwarranted.

Response – agreed so sentence shortened

 

4,12 define “submitter-independence” or say this some other way.

Response: changed to “data quality issues arise that are independent of the submitter” (ref 15).

 

4,15 “Logically, however…” The “however” seems out of place, as the subsequent text does not contrast with what came before.

Response: Deleted “however” – preceding sentences describe data quality

 

4,15 define or reword “extrinsically comparative database quality metrics”.

Response: Deleted “extrinsically”

 

4,36 add comma after “million”

Response: Thank you

 

4,45 It’s not clear that citation 18 supports the text referencing it

Response : these are correct references

 

4,48 add comma after “databases”

Response: Thank you

 

4,50 after “GDB”, replace comma by semicolon; add comma after “scale”; delete “small”, as “boutique” already implies smallness

Response: deleted boutique

 

5,8 I think “simple molecular formula” would be clearer as “empirical formula”

Response: Thank you

 

5,8-9 Delete “other” and “with the identical formula”

Response: changed to same atomic composition

 

5,16 insert comma after “showed that”

Response: Thank you

 

5,18-19 Either delete “and suggested they were complementary” or rewrite it so that it adds something to the meaning. As written it seems obvious that, if these DBs have different data, they are complementary.

Response: This has been shortened.

 

5,30 replace “if” by “whether”

Response: Thank you

 

5,42 Figure 1: it’s not clear what we learn from this. More importantly, the caption seems wrong, as it says it shows “The ‘usual suspects’ lineup”, where the main text defines usual suspects as compounds with liabilities or reactivity. Clearly, Figure 1 is not the list of all “usual suspects”. In fact, most of the compounds do not appear to be suspects at all. Also, in the caption, it is not clear what “desirable” means. Desirable in what sense?

Response: moved the location of “usual suspects”. “Desirable” was used according to the dictionary definition, i.e. worth having or wishing for (Concise Oxford Dictionary).

 

5,52 eleven scientists does not sound like “crowd sourcing”; the number seems too few for a crowd.

Response: The Oprea et al. paper, “A crowdsourcing evaluation of the NIH chemical probes”, published in Nature Chemical Biology in 2009, used the term “crowdsourcing” to describe the study. We agree with this usage and for clarity will continue to use the term to refer to this study.

 

5,54 “acceptable” in what sense? this needs to be defined.

Response: Oprea et al. define ‘acceptable’; we are not judging their criteria here or using their data.

 

6,20 what is meant by “feasibility of pursuing a lead”? Presumably, it is feasible from the chemical standpoint. If this is an issue of IP, then how does it differ from “freedom to operate”? If it is the same, then delete it.

Response: deleted.

 

6,34 “leads” → “lead”

Response: Thank you.

 

6,53 it is not clear what Figure 2 adds to this article. Also, I am concerned that clustering compounds based on a Tanimoto similarity measure of 0.11 (see figure legend) is probably not meaningful. At least for Daylight-style fingerprints, 0.11 would normally be considered not very similar at all. Also in the figure legend, we have “Each of the clusters and singletons: for each cluster….”. Something is wrong with the punctuation. And we read that blue indicates “high confidence”; but high confidence of what, exactly?

 

Response: We have clarified that the threshold was chosen empirically to show a representative selection of probes. We have updated the legend and added the reference which describes how molecules were scored (recently published).
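(As a rough aside for anyone wondering what a 0.11 Tanimoto value looks like in practice, here is a minimal illustrative sketch assuming RDKit and Morgan/ECFP-type fingerprints, which are not necessarily the fingerprints used for the figure; two obviously related molecules typically score far above 0.11.)

    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    # Aspirin vs. salicylic acid: clearly related structures, used only for illustration.
    m1 = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
    m2 = Chem.MolFromSmiles("OC(=O)c1ccccc1O")
    fp1 = AllChem.GetMorganFingerprintAsBitVect(m1, 2, nBits=2048)
    fp2 = AllChem.GetMorganFingerprintAsBitVect(m2, 2, nBits=2048)
    # This pair scores well above a 0.11 clustering threshold.
    print(DataStructs.TanimotoSimilarity(fp1, fp2))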

 

7,6 add comma after “databases”

Response: Thank you

 

7,13 add “that” before “solutions”

Response: Thank you

 

7,30 “very high binding affinity”—this is too vague for a scientific audience. Please provide some quantitative cutoff, even if a bit rough.

Response: this is the definition from ref 26.

 

7,46 delete “use”

Response: Thank you

 

7,51 “Substance number… requires added effort to find the salient chemistry details”—what does this mean? What are “salient chemical details”? Is it something other than the chemical structure? My own impression is that an SID takes one smoothly to a compound in PubChem, so I’m not sure what the issue is here.

 

Response: An SID identifies a depositor-supplied molecule; it is assigned by PubChem to each unique external registry ID provided by a PubChem data depositor. The molecular structure may be unknown, for example a natural product identified only by name or a compound identified only by an identifier. The depositor record could be a mixture of unknown composition. The molecule in an SID may be racemic, a mixture of stereoisomers, a regioisomer of unknown composition, a free acid, a free base or a salt form. The data depositor may not be a chemistry expert, or may be confused by chemical structure. By contrast, a CID is the permanent identifier for a unique chemical structure, but that unique structure can still be a mixture of enantiomers or stereoisomers. To properly perform a medicinal chemistry search one must know in structural terms what is being looked for; therefore the CID is the definitive identifier. Sometimes the relationship between SID and CID is clear, and sometimes it does not exist.

Further information can be found at: https://pubchem.ncbi.nlm.nih.gov/docs/subcmpd_summary_page_help.html#MoleculeSID

For detail on SIDs see: https://pubchem.ncbi.nlm.nih.gov/docs/subcmpd_summary_page_help.html#DataProcessingSID

Note: Although identifiers are unique within a PubChem database, the same integer can be used as an identifier in two or more different databases. For example, “2244” is a valid identifier in both the PubChem Substance and PubChem Compound databases, where SID 2244 is the PubChem Substance database record for cytidylate, and CID 2244 is the PubChem Compound database record for aspirin.
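To make the SID/CID distinction concrete, here is a minimal sketch of how one might ask PubChem which standardized CID(s), if any, a given SID maps to. It assumes the publicly documented PUG REST service and the requests library; the exact JSON layout should be checked against the current PubChem documentation.

    import requests

    def sid_to_cids(sid):
        # Ask PubChem PUG REST which standardized CID(s) a depositor SID maps to.
        url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/{sid}/cids/JSON"
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        # Parse defensively; the response layout may differ from this assumption.
        info = resp.json().get("InformationList", {}).get("Information", [{}])[0]
        return info.get("CID", [])  # may be empty if no standardized structure exists

    # SID 2244 (cytidylate) and CID 2244 (aspirin) share the same integer but are unrelated records.
    print(sid_to_cids(2244))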

 

8,3 “would not retrieve a CID” – or an SID? In any case, it would be appropriate to identify one or two sample ML Probes which have this problem, rather than just making the assertion.

 

Response: ML213 is an example of the problem posed by SIDs and CIDs. The SID for this probe correctly and uniquely identifies the molecular substance, whatever it was, that was tested in the assay for this probe. The corresponding CID for ML213 depicts a deceptively simple single structure of a norbornane carboxylic acid amide that is chiral and whose chiral center is capable of epimerization. Thus this compound can potentially exist as four stereoisomers. From the CID number and structural depiction, the chemist does not know what was actually tested. One must read the chemistry experimental section for ML213 in the NIH probe book to find that a mixture of all four stereoisomers was actually made, that the stability curve of the mixture changes over time, and that the change is tentatively attributed to solubility changes. The net conclusion is that a chemist does not know anything about the activity or inactivity of the four stereoisomers. Chemical Abstracts lists ML213 with the same structural depiction as shown in the CID and thus does not help in resolving the structural and biological ambiguity. The typical biologist would completely miss all of the structural complexity and ambiguity. Ferreting all this out may or may not be possible for a chemist for ML213, but if possible it requires substantial effort.
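Incidentally, this kind of hidden ambiguity is easy to expose computationally. The sketch below is purely illustrative: it enumerates the stereoisomers behind a simple hypothetical molecule drawn without stereochemistry (it is not the actual ML213 structure), which is exactly the exercise a chemist faces when looking at a flat CID depiction.

    from rdkit import Chem
    from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

    # Hypothetical example: 3-aminobutan-2-ol drawn without stereochemistry,
    # standing in for any flat depiction (such as a CID) that hides stereocentres.
    flat = Chem.MolFromSmiles("CC(O)C(N)C")
    isomers = [Chem.MolToSmiles(m) for m in EnumerateStereoisomers(flat)]
    for smi in isomers:
        print(smi)
    print(f"{len(isomers)} possible stereoisomers behind one flat depiction")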

 

8,39 “a CAS registry number” → “CAS registry numbers”

Response: Thank you

 

8,55—9,37 This whole paragraph is apparently devoted to explaining why it is worth finding out if compounds similar to the one of interest have other biological activities. The rationale given is something along the lines that similar structures can have a wide range of biological activities. I’m not sure all of these words are needed however; surely it is worth knowing if the compound one is trying to patent has potential side-effects.

Response: We think it is worth “a whole paragraph” explaining this because people tend to forget that many chemical motifs are reused and why they are important.

 

9,48 add comma after “law”

Response: Thank you

 

9,55 “the well known issues…”: which well-known issues? And, do the authors mean these are well-known for SciFinder in particular, or more broadly?

Response: “More broadly” has been added.

 

10,20 add comma after “Registry number”

Response: Thank you

 

10,45 replace last comma by a period

Response: Thank you

 

10,49-54 Why should the reader care that 132,781 compounds were specified in the HTS but not referenced as “use” compounds?

Response: These numbers simply explain to the reader the explicit and referenced content of this patent. 20% of the probes contain a sole patent reference to this patent, whose biology is completely different from that in any of the MP documents, and this comes from only about 5,000 out of 132,000 compounds in the HTS being abstracted. This unique case illustrates the havoc possible from extreme (inappropriate) data disclosure and abstraction.

 

10,54 Insert comma after “Thus”

Response: Thank you

 

11,3 Please explain how “due diligence searching can be confounded” by a patent like this. What is the problem that it generates? I don’t think most readers will know.

Response: we explain this in the following sentences.

 

11,18 Similarly, what are the “potentially harmful consequences” for IP due diligence?

Response: we explain this in the remainder of the article.

 

11,34 after “disclosure”, there should be a colon

Response: Thank you

 

11,37 add semi-colon after “SciFinder”

Response: Thank you

 

11,39 add comma after “screening data”

Response: Thank you

 

11,42 The text says Table 2 lists NIH probe molecules, but the second and third entries in the table do not appear to be NIH probe molecules. Clarify or correct.

Response: this is now clarified and additional probes added.

 

11,49 add comma after “difficult”

Response: Thank you

 

12,29 add comma after “compounds”

Response: Thank you

 

13,20 add comma after “challenge”

Response: Thank you

 

13,51 add comma after “protocols”

Response: Thank you

 

14,3 replace “the question are” by “whether”

Response: Thank you

 

14,8 add comma after “dramatically”

Response: Thank you

 

14,11 add comma after “databases”

Response: Thank you

 

14,25 Delete “By definition….all of them.” It seems trivial and obvious. Or else change it to say something nonobvious.

Response: changed to ‘By definition, no quantitative assessment across databases is possible without access to all of them, and to our knowledge this has not been undertaken to date.’

 

14,32 delete “aggregate”, unless it adds something.

Response: Thank you

 

14,42 add comma after “ChemSpider”

Response: Thank you

 

14,48 “require quantification”—why is quantification required? What is the expected benefit?

 

Response: quantitative statistics are essential for objective comparisons, so structure matching is now specified in the text.

 

15,27 Delete “To conclude, from our observations”

Response : this has been changed

 

15,31 delete “isolated”

Response: Thank you

 

15,50 either delete or clarify “at the extremes”. I’m not sure what it adds. In fact, if the cases considered in this article are “extremes”, one might argue that the concerns raised throughout are not that important, since presumably most users will not have extreme experiences with the databases.

Response: extremes deleted

 

15,53 add comma after “Probes”

Response: Thank you

 

16,15 delete “also”, as we already have “in addition”

Response: Thank you

 

16,27 Again, “multi-stop datashop” is ill-defined.

Response: this was defined earlier

 

16,34 Delete “(OSDD)”, since this abbreviation is not used subsequently.

Response: Thank you

 

16,34 add “, and” after “similar?’”

Response: Thank you

 

16,45 delete “(commercial or academic)”; doesn’t seem to add anything

Response: Thank you

 

16,48 add comma after “same answer”

Response: Thank you

 

16,48-49 change “operate (i.e….. structure terms).” to “operate; i.e., ….structure terms.”

Response: Thank you

 

17,47 add comma after “same compounds”

Response: Thank you

—————

Well, if you made it this far you perhaps realize that the time Chris spent actually finding the NIH probes and doing the due diligence, together with our modeling efforts, was virtually matched by the time and effort spent responding to multiple rounds of peer review. The Minireview is now available, so see if it was worth all the effort. [As of Dec 9th it has been recommended in F1000Prime.]

So what can I say in conclusion? Well, as with previous challenges getting contentious issues published, it takes perseverance, and the reviewer comments and journal responses were a mixed bag. I hope it alerts other groups to the set of probes, which are now available in the CDD Vault and elsewhere. In addition, Alex Clark has put them into his approved drugs app as a separate dataset, and it is available for free for today only. The challenge of public and commercial chemical databases will likely continue, but the impact for due diligence is huge: you can no longer rely on SciFinder as a source of chemistry information. Chemistry data and databases are exploding and moving fast. Journals and scientists need to wake up to what is going on too. The groups developing chemical probes need an experienced medicinal chemist to help them, and journals that publish papers on chemical probes need strict peer review and due diligence of a probe's quality. A model may be a way to flag this in the absence of the actual chemist.

On the general publishing side, I frequently get comments about publishing in the same journals. Well, my response is that when I try to break out of the mould and reach a different audience I get a lukewarm, or downright chilly, response. Having never published in ACS Chemical Biology or Nature Chemical Biology, I tried there first. I did not have an 'in' I could rely on, no buddies that could review my papers favorably. Even when we have been encouraged by an editor to write something for a journal, such as another recent review paper with Nadia on stem cells initially targeted at Nature Genetics, that does not guarantee it will see the light of day in that journal. After submission to other journals like Cell Stem Cell, it was finally published in Drug Discovery Today. I can say again that publishing in F1000Research is a breeze compared to going for the above traditional big-publisher journals; I appreciate the open peer review process and transparency, as can be seen in another recent paper.

I hope by putting this post together people realize what it takes to get papers out. I owe a huge debt of gratitude to Chris Lipinski for inspiring this work and for doing so much to raise this issue, to Nadia for driving the analysis of the probes, and to our co-authors for their support and contributions to writing, re-writing and re-writing again!

Update Dec 15 2014… the J Med Chem Minireview gets a mention on In the Pipeline.

 

 

 

Nov
12

Giving and taking my own advice on starting a company

I have a tendency to agree to other people's good ideas to do things, and then it catches up with me. You know that feeling of massive overcommitment.

Well, I started to say NO a lot more frequently. I say no to all manner of requests to answer questionnaires, review for journals from publishers I have never heard of, and present at different kinds of unrelated conferences in China (although I really would like to go one day). I still say yes to anyone asking me questions, whether it's students wanting career advice, friends needing references, strangers wanting reprints, etc. Rare disease parent advocates are at the top of my YES list. I have plenty of time for them because they have immense challenges to raise funds and convince other scientists to do the research that one day may help their child. Their approach has been a revelation to me, to the extent that I have had to put aside other academic pursuits. I have let two book proposals languish, along with several papers I have always wanted to work on. Well, I hope I have time in the future to get back to these projects. My focus, if I have any, is rare and neglected diseases: the latter because they are killing millions needlessly, the former because there are so many of them with huge gaps in our knowledge, and both severely lack funding.

A few months ago Jim Radke at Rare Disease Reports asked if I would like to blog for them on occasion. Now, I have a lot of time for Jim because he is very generous at highlighting all the different groups involved in rare diseases, and the website has a wealth of info. He is also a super nice fellow who told me I should do more of what I do on rare diseases. So I said yes, signed the contract and got on with my work. I also pledged to give any royalties to the rare disease foundations I work with. In the back of my mind for the past few months, all I could think about was writing on something I really am not an expert on but which might help others, by presenting the famous 'Ekins naive perspective' on a topic.

That blog topic was released today, a burst of writing after breakfast (not usually peak creativity time for me), and I put together a draft on giving advice and then taking that advice on starting a rare disease company. I am not a classic entrepreneur: no MBA, no business training, never started anything in my life, and I do not drool after others' business advice. Talk is cheap. Three years ago I dispensed a little advice to Jill Wood; she listened, and she went off and started a company. It took 2.5 years to fund, and we have a very long way to go before we will likely have anything to show for it. Although with rare disease parents and devoted scientists like those involved, that could change very quickly. I dispensed advice, but really I was probably telling myself to do the same. I had a huge amount of inertia to overcome: a family, two children and several part-time consulting jobs that paid for the fun projects. Helping rare disease groups has become a bigger part of my life. You meet the rare disease children, the parents, the families, the scientists, and that has a big impact too. Whatever I can offer is pretty insignificant, but ideas and experience at funding work through grants are totally translatable. A positive outlook when all the odds are against you helps in some small way. We will never start a major pharma, but then we do not have to. Small is great. One scientist, one parent or patient is enough. The straw that broke the camel's back was meeting another rare disease parent a few weeks ago; she told me of a treatment languishing in a lab that might help her daughter. I resolved then that I had to use the experiences gained from starting one rare disease company to start another, for those unable to do what Jill did because they have a full-time job looking after a child with a debilitating disease.

At coffee with a good friend a few days later I mentioned my need to start this new company and my severe inability to actually do it. I went off and wrote a one-pager, because that is what I generally do: write something. My friend then put me in touch with an accountant and the ball was rolling. A couple of weeks on, I had no reason why I could not write today's blog; I had given advice three years ago and it took me that long to take it myself. Better late than never. My incorporation papers came through today.

 

Nov
04

App exposure

It is the little things in life that make it all worthwhile – perhaps. I have noticed a definite uptick in interest in the science apps we developed over the last few years. For example, I was recently contacted by a science writer who wanted to write about the Open Drug Discovery Teams app. This is all very flattering, so we will see how that translates to an article. Simultaneously, I noticed that Derek Lowe over at the In The Pipeline blog mentioned Android mobile apps, and TB Mobile got a mention. Of course these free apps have been around on iOS for several years now. The Android version of TB Mobile is version 1, while TB Mobile 2 on iOS has built-in machine learning algorithms and many more features.

There is a definite lag time between building something novel for science, like a mobile app, getting some visibility for it, publishing papers on it, and finally getting more general interest, and, well, by then you have perhaps moved on to something new. A post-mobile-app world, anyone? Yes, the fact that so much technology can be crammed into these chemistry apps by the likes of Alex Clark is truly incredible. The fact that chemists are only now realizing the potential of mobile apps is astounding. I am a late adopter of technology, but chemists and pharma in general seem even later adopters. Imagine if these apps had been around 5 years ago. Of course it does make you wonder how many apps will survive and live on in one form or another as software and mobile device tastes change. It would be a shame to lose some of the innovation just when it's getting real traction and credibility. A discussion of the legacy of the data and algorithms created in these apps needs to be going on, to prevent someone else in the future repeating what has already been created.

Oct
31

Rare disease collection

Over the last few months I have been putting some rare disease related ideas out into publications, and now they are starting to see the light of day.

First there is an editorial that highlights some of the challenges in finding out about the 7000 or so rare diseases. The editorial also has some artwork from a couple of children with rare diseases, in the hope that this can raise a bit more awareness. This is followed by an opinion piece with collaborators Nadia Litterman, Michele Rhee and David Swinney in which the idea of a centralized rare disease institute is raised, along with topics such as how to connect with others interested in rare diseases and how to foster collaborations.

Hopefully these articles will be followed in the rare disease collection at F1000Research by submissions from other scientists and rare disease advocates.

Oct
23

Underwhelming big pharma response to Ebola – I urge more collaboration

A few weeks ago I asked where the big pharma companies were, as they have been surprisingly quiet during the Ebola epidemic. Today I see a press release announcing that J&J will have their vaccine next year and GSK later this year. Possibly other companies will be involved at the WHO's urging. What took so long? This has been all over the press for months, and these companies are just now waking up from their stupor or pulling themselves away from playing Angry Birds or the "who can we buy out next" game! And why oh why is there no mention of trying small molecule drugs? Where are all the other big pharmas: Merck, Pfizer, Sanofi, Novartis, etc.?

Let's not forget that in the Second World War many drug companies 'collaborated' to manufacture penicillin – what was the commercial gain then, if any? What is the difference between then and now? The big pharma companies have largely lost their direction, perhaps even their moral compass in some senses. Where are the visionary leaders? I can bet that if Paul Janssen were still around he would have been all over Ebola.

There are patients dying in Africa and elsewhere: get out of those meetings and do something. Send your drugs to USAMRIID, get them tested against Ebola, and then, if they are useful, crank up the manufacturing and ship them out. Now there is a strategy. Stop talking about doing something with the WHO and actually DO SOMETHING.

Oct
20

In response to the NIH director

I just found a post by Dr. Anthony Fauci and Dr. Francis Collins on Ebola. So I have added my 2 cents – let's see if it gets accepted and draws a response from some moderator (tax dollars well spent – who would want that job, responding to folks like me). There are plenty of smart people in the world, and I am not seeing anyone using the great human assets that are out there to identify treatments for Ebola.

Dear Dr’s Fauci and Collins,

A full two weeks before this post I searched the NIH's own PubMed (anyone can do this) and found a study from last year that DTRA funded to screen FDA-approved drugs (http://www.ncbi.nlm.nih.gov/pubmed/23577127). Several drugs (amodiaquine, chloroquine, etc.) were found active, with promising data in mouse models. A doctor in Haiti then alerted me to another paper with additional compounds (http://www.ncbi.nlm.nih.gov/pubmed/23785035). It appears there is no shortage of FDA-approved drugs that have activity in vitro and in vivo in mouse, etc.; there is even a common pharmacophore which I put in the public domain (http://www.collabchem.com/2014/10/02/a-pharmacophore-for-compounds-active-against-ebola/).

It would not take much for any of these drugs to be explored further. I am amazed that all the discussion is about a vaccine / biologics when there have been considerable efforts to fund screens of small molecule drugs, and they have been largely ignored. I am also saddened by the lack of a big pharma response (http://www.collabchem.com/2014/10/08/where-is-the-big-pharma-knight-riding-in-to-slay-ebola/).

We have known about the disease for 40 years and yet we did not have a plan for when it went beyond one village in Africa? That is very surprising to me as a scientist.
There are many questions that someone should answer, like: why was the funding for small molecule screening and exploration of the hits/leads stopped? Why was there no exhaustive effort to screen every FDA-approved drug? Why are the existing drugs already on the shelf in Africa not being used? Why is nobody looking at those who are not getting Ebola? Is it because they are already taking a medicine that is protecting them?

Food for thought.

 

Oct
16

How the experiment may impact the data

Here’s one to file under “I am still trying to get my head around it”.

Back in April at the CDD community meeting, Christopher (Chris) Lipinski presented some slides looking at kinase selectivity and its relationship with ligand efficiency. There seemed to be a general trend that more selective compounds had better ligand efficiency. My colleagues at CDD have been digging deeper into this and will present a webinar on Wed, Oct 22 from 2-3 pm ET at which Chris and Matt Soellner will debate Entropic and Enthalpic Propensities Inherent in SBDD and HTS.

Now, my involvement has been pretty much limited to thinking of some interesting datasets to compare. Obviously my personal bias is towards the neglected diseases and anything that is in the public databases. For one, I was interested to see how the >1000 whole-cell Mycobacterium tuberculosis (Mtb) hits coming out of high-throughput screens compared with ligands from structure-based drug design (SBDD) studies, for which there are examples in the PDB. A measure of enthalpy for the SBDD hits suggested it was higher than for the HTS hits. Because several datasets have been released on antimalarial HTS hits, we can do the same comparison with SBDD hits.

Without trying to give too much away, I would say some of the slides I have previewed were very interesting. Of course most will want to hear about the kinase data, but let's think about what other questions we could ask. SBDD by its nature is trying to optimize the fit of compounds into a target; simplistically, it is trying to get good interactions. Phenotypic HTS is not bothered about that: the key determinants of activity are getting the molecule into the cell and then shutting down some target(s). So hydrophobicity is predominantly driving whole-cell activity for Mtb, as we see many of the hits have a higher calculated logP (using whatever method you decide), although other properties may also be key.
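To make that concrete, here is a minimal sketch of the sort of comparison I have in mind, assuming RDKit; the SMILES below are arbitrary placeholders rather than the actual Mtb HTS or SBDD hit lists.

    from rdkit import Chem
    from rdkit.Chem import Descriptors

    # Placeholder SMILES standing in for whole-cell HTS hits and SBDD-derived ligands.
    hts_hits = ["Clc1ccc(cc1)C(=O)Nc1ccccc1Cl", "CCCCCCCCc1ccc(O)cc1"]
    sbdd_hits = ["OC(=O)c1ccccc1O", "NC(=O)c1ccc(O)cc1"]

    def mean_clogp(smiles_list):
        # Crippen cLogP as implemented in RDKit; any other logP method could be substituted.
        values = [Descriptors.MolLogP(Chem.MolFromSmiles(s)) for s in smiles_list]
        return sum(values) / len(values)

    print("HTS-like set  mean cLogP:", round(mean_clogp(hts_hits), 2))
    print("SBDD-like set mean cLogP:", round(mean_clogp(sbdd_hits), 2))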

So fundamentally, depending on what kind of experimental approach we use to find Mtb-active compounds, we are biasing towards compounds with different physicochemical properties. We have come full circle. Target-based approaches to antibacterial drug discovery have been a failure because, one, they found few hits and, two, the few hits did not have whole-cell activity. It seems obvious now, but target-based drug discovery is really finding a needle in a haystack, trying to get very specific interactions, while whole-cell approaches 'just' need to get the compound in and perhaps have OK-ish affinity for one or more targets. Maybe the latter represents more of a complete-system effect (more targets to interact with versus a single target).

So what does this say about our efforts using computational approaches to find compounds active against Mtb? Will they also have some of the same issues inherent in HTS and SBDD? For example, docking molecules into a crystal structure as part of SBDD is going to drive towards very specific interactions, and if the method and scoring functions are poor then the hit rate will be very low. Machine learning methods are going to learn just from the mass of data you give them. So if you feed in whole-cell data, all you are going to do is basically replicate the physicochemical properties that allow you to get compounds into Mtb and hit a whole array of potential targets. Is there some middle ground here, a hybrid approach?

Perhaps running compounds through whole-cell assays and just feeding those hits into SBDD as starting points? Then following up by feeding the resulting SBDD designs/hits back into whole-cell assays to ensure that there is a balance between specificity and the ability to get into the cell. Perhaps this iterative approach would be more efficient computationally as a pipeline where the known whole-cell hits are fed into docking against as many Mtb structures in the PDB as possible, and those that have good scores would serve as starting points for design.
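A minimal sketch of that loop might look like the following. The functions dock_against_mtb_structures(), design_analogues() and whole_cell_assay() are hypothetical placeholders (stubbed here so the sketch runs) for a docking workflow against Mtb structures from the PDB, an SBDD design step, and an experimental whole-cell assay; this is an outline of the idea, not a working pipeline, and the score cutoff and number of rounds would need tuning.

    def dock_against_mtb_structures(mols, structures):
        # Hypothetical stub: a real version would run docking and return (molecule, score) pairs.
        return [(m, -9.0) for m in mols]

    def design_analogues(mols):
        # Hypothetical stub for an SBDD design step around each well-scoring active.
        return list(mols)

    def whole_cell_assay(mol):
        # Hypothetical stub for an experimental whole-cell Mtb assay readout (active / inactive).
        return True

    def hybrid_pipeline(whole_cell_hits, pdb_structures, dock_score_cutoff=-8.0, rounds=3):
        # Start from experimentally confirmed whole-cell actives (permeability already demonstrated).
        candidates = list(whole_cell_hits)
        for _ in range(rounds):
            scored = dock_against_mtb_structures(candidates, pdb_structures)
            well_scored = [mol for mol, score in scored if score <= dock_score_cutoff]
            # Use the well-scoring actives as design starting points, then keep only
            # the designs that retain whole-cell activity.
            designs = design_analogues(well_scored)
            candidates = [mol for mol in designs if whole_cell_assay(mol)]
        return candidates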

Another question you could perhaps ask is whether the compounds that we want to avoid in HTS (like PAINS) are different in some way. Would they stand out from real HTS hits and real SBDD hits? Is a PAIN found by docking more useful than a PAIN found by HTS? Do they have different enthalpy scores?

Well, I am sure the webinar will have others asking questions too. It's certainly got me thinking.

 

 
