Viva Collaboration! A new paper in a PLOS journal – and epic responses to reviewers

I am excited to say our new paper on Chagas disease has just been published in PLOS Neglected Tropical Diseases. This was a relatively painless process compared to previous PLOS experiences: we submitted March 31, had reviews back by May 8, and the paper was finally accepted June 5 after some back and forth. We did have some pretty extensive editor and reviewer comments to address, and in my bid for openness I now list the reviewer comments and our responses below. I am grateful to the editors and reviewers for their input. Of course I am also very grateful to our collaborators at SRI, UCSD and my co-corresponding author Jair!

To say this was an adventure would be an understatement:

This work was funded by NIAID as a phase I STTR to CDD, Inc., and upon funding we quickly had to find another collaborator, as ours moved to South America and that's a no-no for such grants. We lucked out in several respects: the McKerrow group were kind enough to do the in vitro testing, and when we found we had some interesting results we were able to get some in vivo data. There were a few dead ends along the way as we transitioned from our initial idea for the project to using machine learning models based on the public data. There was some remarkable luck in finding pyronaridine, as it appeared to have been overlooked in an earlier screen by the Broad. We took a chance, retested it, and then did something no one else had done: we put it in the mouse model for Chagas disease. I hope this is just the beginning for this molecule in this disease, as it's already used in a combination medicine for malaria. It took several years of hard work by the team to get to this point. It's been exhausting and exhilarating in equal measure. What happens next? That's the BIG QUESTION. Viva Collaboration!


Responses to reviews

The manuscript has been reviewed by three experts in the field with their comments and recommendations below.  I agree that this manuscript has significant merit, but requires “major revision”.  There was concern that the reported enrichment might be overstated if the chemical libraries are biased for bioactives (see reviewer 1).  The introduction contains many statements that oversimplify the true situation (see reviewer 3) and there are many other comments about style and clarity of the writing.  As pointed out by reviewer 1, the finding of several nitrofurans as hits does not support the notion that the method identifies non-obvious compounds.  The question about appropriate controls in the in vivo experiment needs to be carefully addressed.  The specific comments of the three reviewers are as follows:

Response: We have addressed the comments described below. We have proposed to use our approach with many other NTDs in the future, and at that point we will certainly evaluate other sets of compounds for T. cruzi activity. We were focused on drugs and natural products (components/starting points of many drugs). In our opinion the current study represents a good validation of the method without screening all 7200 compounds to obtain complete statistics, which was outside the scope and budget of the project. We do not think we have oversimplified the situation in the introduction, but we hope our edits have improved it. We found several non-nitrofurans as hits. It is our opinion that we had appropriate controls for the in vivo experiments (known positive controls and vehicle).

Reviewer #1:

General comments: The authors present a new Bayesian machine learning model aimed at enriching the selection of anti-trypanosoma compounds from large chemical libraries which do not need to have biological annotation. This methodological approach may be very powerful in order to reduce sampling compound numbers down whenever there is limitation in the assay throughput. It is potentially applicable to other diseases. The validation of the approach is based on the comparison of the hit rate obtained from the final selection versus historical experience in random screening. Nevertheless it is not experimentally proven that the chemical libraries used as compound source are not biased for bioactives. If they were, the enrichment factor offered by the Bayesian model would be contaminated. It is recommended that the authors provide more compelling evidence in this regard.

Response: We did not filter any molecules out of any of the libraries. Many of the libraries were made up of drugs, so these could be classed as bioactives against a whole array of different targets (antibiotics, antifungals, antidepressants, etc.). Without screening every molecule in vitro, it is unclear how many actives/inactives there are against T. cruzi in the 7200 compounds used for virtual screening. The whole point of the virtual screening approach is that we test just a fraction. We have tried to address in the discussion any compounds overlapping with the training set, which was our bigger concern.

As for the Target Prediction methodology, it would also be desirable that it were experimentally proven. However, it is understood that this experimentation may fall out of the scope of the current paper, so the manuscript could be published without it.

Response: We include the target prediction methodologies in the absence of experimental verification because this is work that will be performed in the future. Suggesting potential targets is potentially useful to provide some idea of mechanism and allows the scientific community to empirically verify the hypothetical target.

Comments for authors' consideration: 1.    Page 4, Author's summary, and page 11, last paragraph: the enrichment factor of the Bayesian selection vs. a random selection is not higher than 4-8 fold (as you cite further down in the paper). Can we guarantee that the compound sets “in silico screened” are not biased for bioactives (e.g. malaria, NP (cytotoxic))? If the compound sets used as source for the analysis were biased for antibiotic activity, an enrichment over random selection would also be observed. Why was a much larger compound database with random and broad chemical diversity coverage not used? Why not test all compounds, or at least a randomly selected subset, in order to ascertain false negatives and compare success rates of random vs. Bayesian tool-driven selections? Or retrospectively, for instance, compounds in reference 26.

Response: It is unclear where the reviewer obtains the 4-8 fold enrichment value. As described above, none of the libraries we searched were filtered to remove any compounds such as antimalarials etc. We did not try to remove drugs of any class; it is unclear why we would want to do that if we were trying to find new molecules for T. cruzi. Similarly, we did not filter for cytotoxic compounds. We could certainly screen much larger datasets, but our goal was to see if the model could help identify molecules in the smaller datasets for which compounds were readily available. Our results did not reveal any antibiotics as hits, and it is unclear why we would want to search for antibiotics.

Ref. 26 is the GSK kineto-box, which became available only after we did the initial screening and compound selection. Scoring the GSK kineto-box Chagas hits with the Bayesian model may be useful to see if there are any overlapping compounds. Our goal was to perform prospective rather than retrospective testing.

2.    Page 6, Paragraph 1, line 9. I suggest considering the following more recent review on this matter: Nature Reviews Drug Discovery, published online 18 July 2014; doi:10.1038/nrd4336

Response: We have added this paper, “The discovery of first-in-class drugs: origins and evolution.”

3.   Page 7, Paragraph 2: You may want to comment on the negative outcome of the clinical trials with Posaconazole and Ravuconazole. Since you cite non-CYP51 preclinical and clinical assets, I miss a consideration of fexinidazole and oxaboroles.

Response: We added a few sentences about the CYP51 Phase II results. The fexinidazole work is not published yet (we spoke to Eric Chatelain). We also mentioned the fexinidazole and oxaborole programs led by DNDi and added references.

4.   Page 9, Paragraph 2, line 4 and page 10, paragraph 1: Have you had the chance of including compounds in reference [26] in your analysis? It might be a valuable set to assess false negatives retrospectively.

Response: This paper also came out after we had completed our model and screening and we have not included these compounds. Future work could make use of this although again this would be another retrospective analysis and of limited value.

5.    Page 13, line 1: typo of bracket? “… purchased from eMolecules (La Jolla, CA)…”

Response: Thank you we have corrected the position of the parenthesis.

6.    Page 14, Paragraph 2: what's the statistical cut-off at 3x StDev? Is 50% well above it?

Response: Usually 60% is close to 3x the StDev from DMSO controls. In our screen specifically, the 3x StDev cut-off from DMSO controls was 70% in replicate 1 and 66% in replicate 2. When we use a 50% cut-off we are being less stringent; we know we will select some false positives for dose-response, but we reduce the odds of missing a true active (a false negative).
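To make the cut-off arithmetic concrete, here is a minimal sketch of hit-calling against a threshold derived from DMSO control wells, with an optional explicit cut-off such as the lenient 50% described above. This is illustrative only (the function names and data structures are mine, not our actual screening pipeline):

```python
# Illustrative sketch, not the actual screening pipeline: calling hits from
# normalized activity values using a cutoff of n standard deviations above
# the mean of the DMSO (negative) control wells.
from statistics import mean, stdev

def hit_threshold(dmso_controls, n_sd=3):
    """Activity cutoff = mean of DMSO controls + n_sd standard deviations."""
    return mean(dmso_controls) + n_sd * stdev(dmso_controls)

def call_hits(compound_activity, dmso_controls, cutoff=None):
    """Return IDs of compounds whose normalized activity meets the cutoff.

    An explicit cutoff (e.g. a deliberately lenient 50%) overrides the
    statistical one, mirroring the trade-off described above: more false
    positives carried into dose-response, fewer missed actives.
    """
    threshold = cutoff if cutoff is not None else hit_threshold(dmso_controls)
    return [cid for cid, act in compound_activity.items() if act >= threshold]
```

For control wells [10, 12, 11, 12, 10] the 3x StDev threshold works out to 14, while passing `cutoff=50` simply applies the fixed percentage instead.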

7.    Page 16, Paragraph 3 (Results, Bayesian Models) and Figures S1 and S3. I note the authors do not remark on the nitro-aromatic/heterocyclic moiety as a “good feature” (i.e. G13 in Figure S1?). Actually, 3 out of the 11 compounds that turned out to be active in vitro (Table S2) contain this motif. Moreover, these 3 compounds are nitrofurans which clearly resemble the chemical structure of Nifurtimox, a classical toxic anti-chagasic compound in the clinic. I think the authors should discuss this very predictable finding. It is expected that the Bayesian model also leads to selection of novel compounds which are not such close analogues of existing gold standards. Otherwise, its value-added is questionable.

Response: We described this feature as “aromatic fragments containing basic nitrogen”. We certainly found non-obvious compounds like tetrandrine and pyronaridine.

8.    Page 17, Paragraph 2: I suggest indicating the total number of molecules scored through the Bayesian model. Approx 7,500?

Response: We added “Approximately 7200 molecules were screened using the Bayesian model.”

9.    Page 17, Paragraph 2: Aren't there 5, and not only 4, compounds with EC50 < 1 uM (including SC-0011754 in Table S2)? Noteworthy, 2 out of these top 5 are nitrofurans.

Response: We have corrected this.

10.    Page 17, Paragraph 3: May the authors elaborate on the observations of adverse/toxic effects of Furazolidone and the nitrofural prodrug? And in comparison with Nifurtimox, which is not included in the experiment?

Response: With respect, we were not interested in the side effects of these very well-known and well characterized drugs for which we provided references.

11.    Table S2: SC-0011801 is the last compound with EC50 figures in the table. I assume all the compounds below it are inactive (>10 uM), but why are the cells in the table empty? I am pleased to see the inclusion of EC90 and Hill slope in the table.

Response: We did not test these inactive compounds in dose-response experiments, therefore the values are not available.

12.    Table S2: I suggest you define the legend of “Infection Ratio” column. Are the responses from the primary screening at 10 uM in duplicate?

Response: Infection Ratio is the number of infected cells divided by the total number of cells. Yes, the primary screening was done in duplicate, hence the two values for infection ratio (at 10 uM).

13. Table S2: SC-0011754 is Nitrofural, isn't it? If so, please add its name to the table.

 Response: Thank you – we have added it.

14. Page 18, Paragraph 2: Without experimental testing, the target prediction based on the constructed Pathway Genome Data Base cannot be validated. It is true you are building a sound hypothesis, but the goodness of the prediction is not assessed in the paper. Since it is not experimentally proven that TR is the target for pyronaridine, I would not stress in excess the value added by this approach.

Response: Thank you for your suggestions. The database creation is described in this paper, and the database is now available to other researchers for the first time. We clearly stated that our goal is to ultimately use the database for target identification. We did use the metabolites from the database for similarity analysis to propose targets (along with other approaches) for future testing. This represents considerable future work, but it is useful to propose such targets.

15. Page 20, Paragraph 1: If you discount the 3 nitrofurans as a foreseen selection, the success rate is 8/97, i.e. 8%. That means a 4-8 fold enrichment vs. random selection. Might the authors discuss whether this is higher than expected and the value added? In order to fairly evaluate the goodness and value of the predictive model, I miss an assessment of false negatives, and a comparison with a random selection of compounds from the same set. One caveat: some of the compounds do not look tractable for an oral and safe drug, e.g. SC-0011752, SC-0011796. Do you have cytotoxicity results for all 11 hits? Which of the 11 hits were present in the training set?

Response: Our previous work with Mycobacterium tuberculosis has shown with prospective testing that, depending on the training and test sets, we would see enrichments similar to those described by the reviewer. We have also reported hit rates from 20-70%. Generally, HTS provides hit rates < 1%. We did not try to perform random selection and did not test every compound in the libraries that were scored; clearly it would have been prohibitively expensive to do this. Cytotoxicity data for all the hits is available in Table S2.
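The enrichment arithmetic under discussion is simple enough to state explicitly. A small illustrative sketch (the numbers are the ones quoted above; the background rates are the reviewer's assumed scenarios, not new results):

```python
# Illustrative sketch of the enrichment arithmetic discussed above:
# enrichment factor = observed hit rate / background (random-selection) hit rate.
def enrichment_factor(hits, tested, background_rate):
    """Fold-enrichment of a screen's hit rate over random selection."""
    return (hits / tested) / background_rate

# 11 actives out of 97 tested against a typical <1% HTS background:
ef_all = enrichment_factor(11, 97, 0.01)    # roughly 11-fold
# The reviewer's stricter scenario: discount the 3 nitrofurans (8/97)
# against an assumed 1-2% background, giving the quoted 4-8 fold range:
ef_low = enrichment_factor(8, 97, 0.02)     # roughly 4-fold
ef_high = enrichment_factor(8, 97, 0.01)    # roughly 8-fold
```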

16.    Page 20, Paragraph 2: You may want to mention that Pyronaridine has been in clinical use in China. And the positive opinion by the EMA. Current clinical trials for combinations.

Response: We have now added the following to address this: ‘Pyronaridine is in clinical use as an antimalarial [90,91], is a P-glycoprotein inhibitor [92] and was given a positive opinion by the European Medicines Agency for use of this molecule in a combination therapy [93].’

Reviewer #2:

A very interesting and innovative approach to bring chemo and bioinformatics together to address research gaps in the Chagas field, and attempt to connect phenotypic and mechanistically driven approaches to drug discovery, with the goal of populating and diversifying the late stage discovery pipeline. The public access output of the research is an excellent resource. The identification of pyronaridine as a Tc in vivo active is an interesting discovery and PoC for the methodology.

Response: Thank you

The authors do not discuss any potential improvements that could be made to the model building/machine learning based on the molecules that were identified by the process.

Response: This is an interesting question. Our focus was not on how to improve the model/machine learning. We applied the same approach that was found to work reasonably well with TB and other diseases in our hands. A whole paper could probably be written on what other approaches could be tried, which would be well outside the current scope. We could change the algorithm and descriptors, and it would likely have a small effect. We could filter the training set further based on some criteria, etc.

We have added: ‘There are many steps we could take to update our computational models such as incorporating the current data and using other machine learning algorithms.’

Specifically: (p12) the same set of molecular descriptors were used to evaluate natural product libraries and “drug-like” libraries (would the model be applicable to fragment libraries?)

Response: It is possible we could filter other vendor libraries and fragments; we have not tried this, and it is unclear if or how this would be an improvement on what we have done.

(p16) the good molecular features identified in the DR response data alone model (Fig S1) are not well represented in the actual hit list

Response: These are just some of the good features in the training set; it is possible these features did not appear that often in the test sets of compounds.

(p17) functional groups known to be problematic in drug development are well represented in the hits, e.g. nitro groups, furans, Michael acceptors, metal chelators (hydroxamic acid), gramine (reactive metabolite formation), polycarboxylic acids, high MW compounds e.g. steroids (what was the MW filter?), bisguanidines, amidines and polyoxygenated compounds.

Although most of these compounds did not repeat in the hit confirmation assay, can the authors comment on the filtering and selection process to remove nuisance compounds (frequent hitters in HTS sets, now described as PAINS (Baell et al.)) or reactive functional groups from the learning libraries.

Response: We did not filter the sets of drugs and other natural products using the PAINS filters (if we were using large vendor libraries this would be a valid filter). It is well known that T. cruzi-active molecules contain these groups, and they have been known for a long time. We performed dose-response analysis only on the active compounds; 11/17 had EC50 < 10 uM.

Reviewer #3:

This is potentially an interesting study that uses various computational approaches to facilitate drug discovery for Chagas disease and applies these approaches in proof-of-principle experiments. However, the actual content suffers from many shortcomings that need to be addressed before the story can be published. The main concerns of this reviewer include: 1. There is no entry for T. cruzi in the BioCyc database even though the authors claim they have created such a database. 2. Some experiments were executed without proper controls. 3. Manuscript text is not written well and contains many unsubstantiated claims, numerous typos, missing information and it is difficult to follow at times. It needs significant work and I flagged the most obvious examples in the specific comment section below, but that list is not exhaustive at all.

Response 1. We sincerely apologize that T. cruzi was unavailable in BioCyc when the reviewer attempted to access it. There was a server powerdown and it did not automatically come back up, and we are working to ensure it does not go down again. The database can be accessed at http://node2.csl.sri.com:1555/

Response 2: We have corrected the omission of the vehicle control from the methods; it was previously described only in the figure.

Response 3. We have made significant changes to address all these concerns.

Specific comments. Methodology/Principal Findings p.3 – “97 compounds…”- Please change to “Ninety-seven compounds…”.

Response: We have changed this.

p.3 – “We progressed five compounds to an in vivo mouse efficacy model…” Efficacy model of what disease? This is the first time in the manuscript any mouse model is mentioned.

Response: We have changed this.

Author Summary p.4 – “We have used data from a phenotypic screen to build Bayesian models to predict activity against T. cruzi in vitro.”  Predict activity of what?

Response: We have changed this to specify anti-parasitic activity.

p.4 – “We identified the antimalarial pyronaridine has having in vivo efficacy and providing us with a new starting point for further investigation and optimization.” Please change this sentence to a grammatically correct form.

Response: We have changed this.

Introduction p.6 – “In the 1980’s the pharmaceutical industry took advantage of advances in molecular biology/genetic engineering and began replacing cumbersome phenotypic and whole cell HTS with target-based screening assays.” The reason for replacing phenotypic assays with target-based ones was not because of the former being cumbersome. They are not – please delete this word. The sentence also implies that phenotypic and whole cell HTS assays were not one and the same thing. I am not aware of any phenotypic HTS run in the 1980s or earlier that would not be a whole cell assay. Please use ‘phenotypic’, ‘whole cell’, or ‘phenotypic, whole cell’.

Response: We have changed this.

p.6 – ‘While target-based screens using simple recombinant protein enzyme assays offer many advantages in terms of cost and scalability,…’ Please change ‘enzyme’ to ‘enzymatic’.

Response: We have changed this.

Target-based assays do not provide advantages over cell-based assays in terms of cost or scalability, and these are definitely not the reasons why they are being used. This sentence needs to be changed.

Response: We respectfully disagree with the reviewer. In our experience target-based assays are cheaper and can be performed at a much higher throughput than a cell-based assay.

p.6 – “they rely on the assumption that the selected target is in fact the best and most druggable target for a given disease.”  Again, this is not the case. The assumption is that the target is good enough to yield a drug that can meet the drug target product profile for a given disease.

Response: We have changed this.

p.6 – “…especially for neglected infectious diseases where drug targets are poorly understood or target-based approaches have been unsuccessful in the past [1] (or a complete failure [2]).” Please remove the sentence in the parentheses and reference [2]. Common bacterial infections described in this article (such those caused by Staphylococcus aureus) do not belong among neglected infectious diseases. Additionally, cell-based screens described in the reference [2] fared equally bad in terms of finding promising hits as the target-based screens described in the same paper, so the reference does not support the point authors are trying to make.

Response: We changed it to “Nonetheless, in the last decade, there has been a shift back towards using phenotypic screens as a starting point for drug discovery, especially for infectious diseases where drug targets are poorly understood or target-based approaches have been unsuccessful in the past [1]. In fact, analysis of the origin of first-in-class small molecules found that phenotypic screens identified more novel inhibitors than any other approach between 1999 and 2008 [3].”

p.6 – “One such disease area, where target-based drug discovery has largely failed, is in the field of neglected tropical diseases (NTDs).” This is again incorrect. There simply are almost no validated targets for these diseases and target-based drug discovery was barely attempted. If authors have knowledge of failed target-based programs, they should include relevant references after this statement.

Response: We respectfully differ in our opinion. We provide plenty of examples of target-based drug discovery, such as CYP51 and cruzain as mentioned in the text, glycoprotein and glycolipid synthesis (mucins, trans-sialidase), DNA topoisomerase and other enzymes involved in DNA replication, phosphodiesterase C, etc.

6, 7 – “The trend towards using phenotypic screens over target-based screens is particularly strong for NTDs and related bacterial and fungal pathogens.” There are no fungal diseases on the NTD list as far as I know. In what sense are these fungal pathogens related to the NTDs?

Response: We have changed it to ‘The trend towards using phenotypic screens over target-based screens is particularly strong for NTDs as well as bacterial and fungal pathogens.’

p. 7 – “The current clinical and preclinical pipeline for T. cruzi is extremely sparse and lacks drug target diversity (currently focused on 3 targets, CYP51, cruzain and genes associated with DNA damage) [12-14].” Reference 14 leads to a website that offers Internet Information Services (IIS) for Windows® Server. Please provide references that support claims made in the text (perhaps this one? – ‘http://www.bvgh.org/Current-Programs/Neglected-Disease-Product-Pipelines/Global-Health-Primer.aspx’).

Response: We apologize for this broken link – we have changed this.

7 – “The remaining three products in clinical development (Phase I and II) target a single enzyme, CYP51, which has been the focus of Chagas disease research to date [17-22].” This is incorrect; please change to ‘the focus of Chagas disease drug development’.  

Response: We have changed this.

7 – “The only additional novel drug target with a single compound in preclinical development is cruzain, a T. cruzi cysteine protease and there is considerable literature surrounding this class of inhibitor.” Please change to ‘this inhibitor’ or ‘this class of inhibitors’.

Response: We have changed this.

8 – “There have been some target-based high throughput screens for CYP51 [22] and cruzain [24] as well as virtual screening for cruzain [23].” I assume that the authors mean that there were some screens to discover INHIBITORS of CYP51 and cruzain. If that is the case, please modify the sentence.

Response: We have changed this.

p. 9 – “In addition we have created a BioCyc database for T. cruzi, which complements other sources of related metabolic pathway data (including KEGG T. cruzi pathways [48]).” Currently, there is no Trypanosoma cruzi organism database in BioCyc.

Response: We sincerely apologize that T. cruzi was unavailable in BioCyc when the reviewer attempted to access it. There was a server power-down and it did not automatically come back up, and we are working to ensure it does not go down again. The database can be accessed at http://node2.csl.sri.com:1555/

Methods p. 10 – “CDD database and Chagas datasets.” The authors need to specify the name of the dataset that was created as part of this publication. Is it “Trypanosome: Chagas Disease Literature Compounds”?

Response: The Broad dataset was named TRYPANOSOME: Broad Primary HTS to identify inhibitors of T. cruzi Replication

In the process of this work we also curated the dataset Trypanosome: Chagas Disease Literature Compounds

The molecules in Table S2 are currently in a private vault and will be shared at a later date.

p. 10 – “Data annotation and Pathway Genome Data Base construction”.  A large part of this text belongs in the Results section. Figure 1 is not very informative as it completely lacks any description other than Pathway Genome Data Base for T. cruzi. It is unclear to me what the various symbols shown in the Figure mean.

Response: We have added a results section for the PGDB and a description for Figure 1.

“A PGDB was constructed for T. cruzi using the complete genome sequence of the Dm28c strain (Figure 1).  The underlying genome sequence consisted of 5,287 contigs assembled into 1,378 scaffolds of 30,716,540 base pairs.  Pathologic found 11,349 distinct gene products, at least 880 of which were found to be enzymes and at least 16 of which are transporters. Pathologic was able to infer 1030 enzymatic reactions and 122 pathways from these assignments as well as the existence of 806 metabolic compounds. This set was filtered to 358 molecules after removal of compounds with R- groups and small nuisance molecules. This dataset was then used to infer potential targets by comparing the Tanimoto similarity with a phenotypic screening hit [39].”
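The last step of that quoted passage, comparing database metabolites to a screening hit by Tanimoto similarity, can be sketched in a few lines. This is an illustrative toy version, not the published workflow: fingerprints are shown as plain Python sets of "on" bit indices, which in practice would come from a cheminformatics toolkit such as RDKit.

```python
# Illustrative sketch: proposing candidate metabolite/target associations by
# Tanimoto similarity between binary fingerprint feature sets.
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient = |intersection| / |union| of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def rank_by_similarity(query_fp, metabolite_fps, top_n=5):
    """Rank database metabolites by similarity to a phenotypic screening hit."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in metabolite_fps.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

Metabolites ranking near the top for a hit like pyronaridine would then be hypotheses for empirical target testing, which is exactly the follow-up work described in the response above.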

12 – “These models were used to score the following drug libraries; Selleck Chemicals (Houston, TX) natural product library (139 molecules)…” The Selleck natural product library contains 131 natural products, not 139. Please check which of these 2 numbers is correct.
Response: The version of this library downloaded several years ago contains 139 molecules.

12 – “…and Traditional Chinese Medicine components (373 molecules).” Please, include reference for this library or, if a reference is unavailable, a brief description how it was put together.
Response: This library represents some common single-component TCMs and was kindly provided by Dr. Ni Ai, Zhejiang University, China.

14 – “Hit selection and secondary screening (dose-response assay)” This section includes a repetition of the T. cruzi infection assay description from the preceding section (“Primary In vitro screening”) and contains some additional experimental details. If both sections refer to the same assay as it seems, the authors should avoid the repetition and should include only the more detailed protocol version in the manuscript.
Response: The methods section has been modified following the reviewer’s advice.

15 – “Mice were housed at a maximum of 5 per cage and kept in a specific-pathogen free (SPF) room…” Please change to “specific pathogen-free…”.
Response: We have changed this.

15 – “To infect the mice, trypomastigotes of T. cruzi Brazil luc strain were harvested from culture supernatant and injected intraperitonealy, 10^5 trypomastigotes per mouse.” Please describe how trypomastigotes were harvested and the volume of parasite suspension used for the infection. The level of detail is not sufficient for the experiment to be repeated by an independent group.

Response: We have changed this. Additional information was included.

Correct spelling is ‘intraperitoneally’.


Response: We have changed this.

p. 15 – “Starting on day 3 the infected mice were treated with test compounds at 50 mg/kg administered in 20% Kolliphor, i.p., b.i.d., for four consecutive days.” This experiment is missing the negative control in which infected mice are treated with the vehicle by IP injection. As both parasites and experimental compounds are injected into intraperitoneal cavity, the observed anti-parasitic effect might be at least partly due to the direct effect of the vehicle on the parasites proliferating in the IP cavity. The authors need to repeat this experiment with all the necessary controls.


Response: The controls were administered using the same route as the tested compounds: IP. The text was corrected.

Please include the volume used for injection of compounds. ‘Intraperitoneal’ is one word and should be abbreviated as ‘IP’ instead of  ‘i.p.’


Response: The volume was included.

p. 15 – “At day 7 post-infection, the luminescent signal from infected mice was read upon injection of D-luciferin.” Please describe in more detail how this was done or include a reference. The level of detail is not sufficient for the experiment to be repeated by an independent group.


Response: We have changed this.

15 – “The absolute numbers of measured photons/s/cm2 were averaged between all five mice in each group and compared directly with compound-treated mice and the control groups.” This sentence does not make much sense. How did authors compare the luminescence signals with compound-treated mice and the control groups?


Response: We have changed this.

p. 15 – “The efficacy percentage was calculated based on relative luminescence signal reduction compared to the controls.” I assume they used the vehicle-treated group for luciferase signal normalization. This needs to be specified in more detail how it was done.


Response: We have changed this.
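For context, the calculation presumably takes this general form (a hedged sketch; the exact normalization is precisely what the reviewer asked to be spelled out, so the function below only assumes that each treated group's averaged luminescence is compared to the vehicle-treated control group):

```python
# Illustrative sketch (assumed form, not the exact published protocol):
# efficacy % = percent reduction of the group-averaged luminescence signal
# (photons/s/cm2) in a compound-treated group relative to the vehicle group.
from statistics import mean

def efficacy_percent(treated_signals, vehicle_signals):
    """100 * (1 - mean treated signal / mean vehicle signal)."""
    return 100.0 * (1.0 - mean(treated_signals) / mean(vehicle_signals))
```

So a treated group averaging one tenth of the vehicle group's luminescence would score roughly 90% efficacy.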

p. 16 – “Two tailed paired Student t test was used to assess statistical significance between luminescence values from vehicle-treated and compound-treated groups at day 7 postinfection.” The authors need to specify what hypothesis was tested – presumably that two such datasets are different?

Response: We have rephrased this.

p. 16 – “The UCSD Institutional Animal Care and Use Committee reviewed and ‘approved’ this study with protocol number S14187.” Why is the word “‘approved'” used with quotation marks?
Response: We had written – ‘All animal protocols were approved and carried out in accordance with the guidelines established by the Institutional Animal Care and Use Committee from UCSD (Protocol S14187).’

Results p. 16 – “Using either dose response data alone or the combination or dose response and cytotoxicity (dual activity) resulted in statistically comparable models.” This sentence is difficult to understand. What combination do authors have in mind?


Response: We changed it to ‘Using either dose response data alone or the combination of dose response and cytotoxicity (dual activity) resulted in statistically comparable models.’

16 – “Both had leave one out ROC values greater than 0.8 (Table 1).” I do not understand this sentence. Do the authors mean ‘leave-one-out ROC values’? Also, what is ROC? This parameter is not described in the Methods section. Does this refer to the ROC AUC parameter?


Response: ROC is now spelled out – it can be used interchangeably with ROC AUC. This is a standard measure of how well the model predicts compounds left out (whether leave-one-out, 5-fold cross validation, leave-out-50% x 100 cross validation, etc.).
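For readers unfamiliar with the metric, here is a minimal sketch of a leave-one-out ROC AUC calculation – using random toy binary “fingerprints” and scikit-learn’s Bernoulli naive Bayes purely as stand-ins, not the actual FCFP_6/Bayesian pipeline from the paper:

```python
# Illustrative only: random binary "fingerprints" and a toy label,
# not the real Chagas dataset or descriptors.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 32))   # 60 molecules, 32 binary features
y = (X[:, 0] | X[:, 1]).astype(int)     # toy "active" label

# Score each molecule with a model trained on all the others, then
# summarize the ranking quality with a single ROC AUC value.
scores = cross_val_predict(BernoulliNB(), X, y, cv=LeaveOneOut(),
                           method="predict_proba")[:, 1]
print(round(roc_auc_score(y, scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values above 0.8 indicate good discrimination of the left-out compounds.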

p. 16 – “The use of FCFP_6 fingerprints enabled the good features important for activity to be visualized in the dose response data alone model (Figure S1)…” What do authors mean by the good features? This is not postulated anywhere in the article. Do they mean features that positively correlate with compound potency or selectivity?


Response: We have changed the text to: ‘The use of FCFP_6 fingerprints enabled the features important for activity (termed good features) to be visualized in the dose response data alone model (Figure S1) which included tertiary amines, piperidines and aromatic fragments containing basic nitrogen functionality while those features that were negatively related to activity included cyclic hydrazines prone to tautomerization as well as a number of electron-poor chlorinated aromatic systems (Figure S2).’

p. 17 – “…providing very clear trends and perhaps a rationale for why the enrichments are so dramatic for these systems.” It is not explained anywhere what do the authors mean by ‘the enrichments’.


Response: We have deleted this statement.

p. 17 – “Ninety seven molecules were tested and 11 were found to have EC50 values less than 10 uM (Table S2). Four of these molecules (verapamil, pyronaridine, furazolidone and tetrandrine) had in vitro EC50 values less than 1 uM (Table 2).” The authors need to list how many independent experiments they ran for determining the EC50 values (n). Based on the Methods section it appears that they ran only one experiment in duplicate. If that is the case, they need to repeat experiments so that n=3 at least.


Response: Primary screening was done in duplicate.

p. 17,18 – “In vivo testing” The experiment needs to be repeated with proper controls. Please see my comment in the Methods sections.


Response: As described in the methods we used the correct vehicle control.

p. 18 – “The molecules with the highest Tanimoto similarity in CDD were T. cruzi GAPDH inhibitors (Figure S6).” First of all, it is not clear from the Figure S6 legend what this figure shows. What molecule was used for this similarity search? Secondly, the shown molecules have only 43% similarity to pyronaridine (assuming this compound was used in the search) and look very different. The authors do not make any effort to introduce a validated metric that would assess how reliable this type of prediction could be. Unless they present some experimental evidence, this section does not have any value and should be deleted. Such a conclusion is further confirmed by quite extensive list of proposed targets that follows in the text and includes polyamine biosynthesis, trypanothione disulfide reductase, and topoisomerase VI.


Response: Figure S6 shows the similarity search, using pyronaridine, of a public curated dataset in CDD of Chagas-related compounds from the literature. This is a different analysis to the similarity search of the metabolites in the PGDB, which resulted in one molecule with 67% Tanimoto similarity. Tanimoto similarity is widely used for similarity searching. This section of the manuscript uses all the molecule sets available, plus the metabolites from the PGDB, to try to infer potential targets. Similarity searching in ChEMBL and other public sources is a sensible approach to leverage the information that already exists for compounds and their known targets.

Discussion p. 20 – “Historically, for a diversity-based library undergoing HTS, it is expected a range of 1 to 2% of hits based on observed activity (usually >50% antiparasitic activity at 10 μM and no signs of cytotoxicity at this concentration) will be observed [30].” Screening in the attached reference was done at 3.75 uM compound concentration and resulted in identification of ~4,000 hits from a 300k small molecule library. As the screening described in the current article was done at 10 uM compound concentration, one would expect the hit rate to be significantly higher than 1-2%. To assess whether a 10% hit rate is higher than the hit rate in an unbiased screen, the authors need to include in Table S2 CC50 values for the listed compounds.


Response: Cytotoxicity data was generated only for compounds with dose response data.


The Case for Why the EC Needs to Fund Small Molecule TB Drug Discovery

My previous post may have been too refined… so let me revert to a drier scientific style. In the last few weeks Iain Old, Giovanna Riccardi, myself and several others from the MM4TB project have been lobbying MEPs and basically anyone who will listen, to try to raise awareness of the lack of funding for TB drug discovery in Horizon 2020. We have summarized the dire situation in the following, which we have shared by email.
Tuberculosis (TB) is a truly ‘global health crisis’ and a persistent threat in high income countries [1], affecting more than two billion people around the world [2, 3]. On an annual basis, nearly 9 million people are infected, with 1.5 million of them dying. In 2013 the UK, a country with a population of over 60 million, had over 7,800 tuberculosis cases. Although effective TB drugs have existed for over 50 years, multidrug resistant (MDR) strains have spread across the globe (including Europe [4, 5]), which hinders treatment and increases costs [2, 6-11]. Each year, at least half a million new MDR TB cases occur. Although incidence, prevalence and mortality rates are falling in the African, Eastern Mediterranean and European regions, this is not fast enough to meet the 2015 global targets of the Millennium Development Goals. At least $8bn is required to deal with this epidemic, and that excludes the estimated $2bn that is needed for drug and diagnostic discovery. There are currently 15 vaccines in clinical trials [3]. TB vaccines are widely used in Europe and elsewhere and seem to offer some protection from the disease, but have proved a dismal failure in developing countries such as India and in Africa, where the disease is endemic. In contrast, there are only 10 drugs for TB in clinical development [3], which is inadequate to address resistance to existing drugs. Trials of drug combinations have so far proved inferior to the current 6 month standard of care [3]. It is therefore widely agreed that more early stage drug discovery is needed to feed the clinical pipeline for this disease [1, 3]. While extensive work on new vaccine approaches is being undertaken, some of it supported by the EU, it seems likely that it will take two generations to reach widespread success, if indeed this can ever be achieved.
Thus, there will be a continuing need for drugs to treat TB and particularly ones that are effective against currently virulent strains and mutations of these, particularly those that are already multi-drug resistant.
In the past the European Commission has funded TB research in FP7, with over €100 million (€16.3 million in major vaccine projects (NEWTBVAC), €20.2 million in drugs (MM4TB and ORCHID), €6.3 million on diagnostics and €19 million on epidemiology (TB PAN-NET)) [13]. We are concerned by the lack of funding from the European Commission for TB drug discovery in Horizon 2020. For example, in the EC’s own press release for World TB Day 2015 they made no mention of tuberculosis drug development in Horizon 2020 [12], and the only two projects funded (€26.2 million) concern vaccines (EMI-TB and TBVAC2020). While there is some work on TB being done in IMI, the lack of big pharma interest in tuberculosis drug development means that IMI is unsuited for such projects. If we are to retain the talented international research teams working to find drugs for TB, which could help fight the rapidly advancing drug resistance, we must fund them in Horizon 2020 rather than putting all EC money on vaccines and hoping it will pay off one day.
1.         Lonnroth, K., et al., Towards tuberculosis elimination: an action framework for low-incidence countries. Eur Respir J, 2015. 45(4): p. 928-52.
2.         WHO, Global Tuberculosis Report. 2013, World Health Organization: Geneva.
3.         WHO. Global Tuberculosis Report 2014. 2014; Available from: http://apps.who.int/iris/bitstream/10665/137094/1/9789241564809_eng.pdf?ua=1.
4.         Jakab, Z., et al., Consolidated Action Plan to Prevent and Combat Multidrug- and Extensively Drug-resistant Tuberculosis in the WHO European Region 2011-2015: Cost-effectiveness analysis. Tuberculosis (Edinb), 2015.
5.         Zignol, M., et al., Drug-resistant tuberculosis in the WHO European Region: an analysis of surveillance data. Drug Resist Updat, 2013. 16(6): p. 108-15.
6.         Velayati, A.A., P. Farnia, and M.R. Masjedi, The totally drug resistant tuberculosis (TDR-TB). Int J Clin Exp Med, 2013. 6(4): p. 307-309.
7.         Abubakar, I., et al., Drug-resistant tuberculosis: time for visionary political leadership. The Lancet Infectious Diseases, 2013. 13(6): p. 529-539.
8.         Dheda, K. and G.B. Migliori, The global rise of extensively drug-resistant tuberculosis: is the time to bring back sanatoria now overdue? The Lancet, 2012. 379(9817): p. 773-775.
9.         Gothi, D. and J.M. Joshi, Resistant TB: Newer Drugs and Community Approach. Recent Pat Antiinfect Drug Discov, 2011. 6(1): p. 27-37.
10.       Udwadia, Z.F., et al., Totally Drug-Resistant Tuberculosis in India. Clinical Infectious Diseases, 2012. 54(4): p. 579-581.
11.       Velayati, A.A., et al., Emergence of new forms of totally drug-resistant tuberculosis bacilli: Super extensively drug-resistant tuberculosis or totally drug-resistant strains in Iran. CHEST Journal, 2009. 136(2): p. 420-425.
13.       Commission, E. World Tuberculosis Day 2015: EU Research to Fight Tuberculosis. 2015; Available from: http://ec.europa.eu/research/index.cfm?pg=world-tuberculosis-day-2015.


The war against Tuberculosis needs new drugs

This is not a call to arms for a war fought with bombs and bullets, but one which is equally catastrophic in terms of human loss. We are fighting armies of microscopic bacteria, called tuberculosis (TB), that know no country boundaries, infecting 9 million and killing 1.5 million people per year. Humankind has only been able to defend itself since the discovery of TB drugs nearly 70 years ago, yet the effectiveness of this armament is rapidly dwindling. TB has regrouped, with multidrug-resistant and extensively drug-resistant forms spreading across the globe. This threatens to cripple our health systems with costly and lengthy treatments. The most effective weapons have been drugs; an effective vaccine has eluded us, yet we still invest in the 15 current clinical trials. In stark contrast, there are only 10 drugs for treating TB in clinical development, many of which are new indications for existing drugs or combinations. Knowing the very high failure rate of costly clinical trials, perhaps just one of these might prove effective, but at what cost? While our knowledge of how TB functions increases, there still needs to be more ‘early stage’ drug discovery to feed the clinical pipeline. In the past the European Commission has funded TB research in the FP7 program*, with over €16.3 million spent on vaccine projects versus €20.2 million on drug discovery projects in academia and industry. This is a drop in the ocean compared with the billions of dollars that are needed annually. In 2016 the focus will be on vaccines, although success will likely take generations, with no funding for drug discovery in Horizon 2020 at all. The result will be the loss of our infantry, the many drug researchers who battle on the front lines in their laboratories. If Europe hopes to win the war, it must reverse this retreat away from drugs in favor of vaccines, and we must fund these scientists.
This will take courage on the part of the politicians and program officers, as drugs are not trendy new technologies that grab headlines, but a reliable workhorse which has held TB at bay. We should not surrender the only opportunity to defeat our nemesis.

*I am involved with the FP7 funded MM4TB project.


Find out more about Sanfilippo Syndrome and Charcot-Marie-Tooth

A couple of diseases I work on are described in two recent newsletters.

First up is Sanfilippo Syndrome, which is described in the Lysosomal Disease Network’s newsletter Indications. The article describes Jonah’s Just Begun and Phoenix Nest and was written by Jill Wood. I did a bit of proofreading of this article so I hope it’s OK – I take responsibility for typos.

Second is the latest CMT Update which has an article in it on the release of a new mouse model for CMT2.

Happy reading, and a big well done to all involved in these rare diseases – they show what can be done by some very talented parents / patients!


Turning points and games with a purpose

Today I was talking to one of my old mentors from my postdoc days (in big pharma), for the first time in a few years. I realized that working with him, and exposure to computational chemistry software, was a real turning point in my career in 1996. Several more turning points later… and I am using computers every day in my research and writing, and that got me to where I am, as described in my last post. As these things do, that discussion led me to think about what the next turning point might be for me, and perhaps for science.

Last night I was catching up on some long overdue reading. Most people have heard of FoldIt, the protein folding / protein design game from the University of Washington. But what about other games with a purpose like Open-Phylo and Dizeez? I bet few have heard of these computer games. Has anyone reading this played them? What did you get out of them? How long did you spend on them, and did they maintain your interest? As someone who has not played any of these games, nor for that matter Angry Birds, I would welcome any insights before I try to find the time to give them a go. I am intrigued that others can spend time on computers and increasingly mobile devices to play games, while for a couple of decades I have used computers and software as tools for work and research and little else. What if I could use games to get my work done?

So could my next turning point be using software games as tools for drug discovery? How do we bring the software we use out of the expert domain and put it into the hands of the crowd for public good? I do not have any answers but I thought I would start a little list of such games with a purpose.

Here are the first 3 for the biological sciences:

FoldIt (see this, this, this, this)




A triple life in science

Ever since embarking on my pretty unusual career path by leaving big pharma in 2001, I have been faced with several forks in the road – hard decisions on which way to go. To join a software company or not? To work with a university spin-out? Some were good decisions, others less so. Since 2008 I have generally opted for the path of least resistance, just added these opportunities to the growing list and run with it. So when people ask me who I work for or what my day job is, I have to take a few minutes to explain that I work for a lot of different companies and organizations and wear a few different hats. I am pretty sure that when I visit a company for the first time and explain this, it can be hard to take in if they have perhaps only ever worked for one or two companies. I can summarize my work life as a triple life. I have been giving this some thought recently as I was invited to talk to undergraduate students at the upcoming ACS meeting in Boston in August. Doubtless I will prepare some slides at some point.

Currently the largest slice of my time goes to CDD, where I work on neglected disease grants as PI (NIH, MM4TB) and write papers on the projects funded. After this I spend a good percentage of my time working on rare diseases: for Phoenix Nest, Inc., working on our enzyme replacement STTR for Sanfilippo syndrome type D with LABioMed, and also acting as CSO at the Hereditary Neuropathy Foundation, working on Charcot-Marie-Tooth research. I also volunteer my assistance to Hannah’s Hope For GAN. In addition, in my start-up Collaborations Pharmaceuticals, Inc., I work with collaborators to perform preliminary experiments to get data we can use in future grants and patents. A common theme here is writing STTR or other grants, as well as papers to raise awareness of the research undertaken. The final component is Collaborations In Chemistry, through which I do additional consulting for academia, biotech and consumer product companies on ADME/Tox, neglected diseases and pretty much anything that comes along. This is also an outlet for any other interesting computational collaborative project, such as last year’s foray into Ebola virus research and tweeting at conferences.

This diversity of projects is welcome because it makes things more interesting, as I like to continually try something new in science, although it sometimes makes it hard to cull projects. I am fortunate to be able to collaborate with so many terrific groups of people who can tolerate this alternative career path. Yes, it is challenging to keep it all straight and manage time, but I would say working for oneself is something others should try if they are in a situation to do it. I am perhaps at another of those crossroads, at which point I should probably hire an assistant to help, so if you know anyone who would be interested please let me know. Who needs a double – give me a triple life in science!


Open Source Bayesian Models (X2)

For the last 5-6 years I have been kind of obsessed (in a good way) with how we could get computational machine learning models for drug discovery to a point where they could be shared. The reasoning behind this is that we publish papers, but the models described in them never really get used by anyone else. It’s been a bit of a journey that, as of yesterday, resulted in Alex Clark and I having 2 papers accepted at JCIM, here and here. I thought I would provide a bit more detail on why I think this is important.

It all started back in November 2009 when I had a meeting with Chris Waller, Eric Gifford, Rishi Gupta (all Pfizer employees), Barry Bunin and Moses Hohman (CDD) at Pfizer. The hope was to get access to data from big pharma as models in CDD Public or CDD Vault. What actually came out was something different but still useful. The light bulb went on at the table: why not compare commercial descriptors and algorithms with open source descriptors and algorithms for different ADME datasets? A year later this work came out as a paper in Drug Metabolism and Disposition. Of course this also makes you think how the reliance on expensive tools may be lessened.

Following this we put an SBIR together that helped to fund the development of the FCFP6 and ECFP6 descriptors (by Alex Clark) that are now on GitHub. These descriptors allowed Alex to build Bayesian models in TB Mobile 2.0 for target prediction. The most recent work published in JCIM builds on this to describe “the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps.”

There is still a lot of work to be done to get CDD Models to where I want them to be, and to validate the models, but I hope that by making the software and models accessible we have helped others to run with them too. The second part is independent of the CDD efforts and was to show what could be achieved with these open source technologies: “we performed a large scale validation study in order to ensure that the technique generalizes to a broad variety of drug discovery datasets. To achieve this we have used the ChEMBL (version 20) database and split it into more than 2000 separate datasets, each of which consists of compounds and measurements with the same target and activity measurement.”
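The partitioning idea quoted above is simple to sketch in a few lines of Python – the rows here are made-up examples, not real ChEMBL records: activity data are grouped by (target, measurement type) so that each group becomes one model-ready dataset.

```python
from collections import defaultdict

# Illustrative (compound, target, activity type, value) rows - not real data.
records = [
    ("CHEMBL25",  "CHEMBL204", "IC50", 40.0),
    ("CHEMBL521", "CHEMBL204", "IC50", 800.0),
    ("CHEMBL25",  "CHEMBL240", "Ki",   15.0),
]

# One dataset per (target, activity measurement) pair.
datasets = defaultdict(list)
for compound, target, kind, value in records:
    datasets[(target, kind)].append((compound, value))

print(len(datasets))  # three records yield two separate datasets
```

Applied to the full ChEMBL 20 extract, this kind of grouping is what yields the more than 2000 separate datasets described in the paper.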

We then made these models accessible on a website which can be used by anyone and uploaded into the mobile apps Alex developed.

We are immensely grateful to the 3 reviewers and the editor (Alex Tropsha) of these manuscripts because they had double the workload. As I have done in the past, I include the reviewer comments and our rebuttals to illustrate where the reviews made us modify the original submissions. Both papers were made open access – we have not had the proofs yet at the time of writing, so there may be some typos needing correction.

It has been hugely rewarding working with Alex on this project, and the immediate benefit I see from the 2000 ChEMBL models is that anyone could take these and use them to do drug discovery / virtual screening on so many different targets. It’s pretty overwhelming to imagine having so many models, and while it’s not “big data” for some, for us as modelers this is about as big as it gets. The community does need to realize it can get even bigger, as this represents just a fraction of the ChEMBL datasets, which are a moving target.


cover art idea


paper 1


Manuscript ID: ci-2015-00143z
Title: “Open Source Bayesian Models: I. Application to ADME/Tox and Drug Discovery Datasets”
Author(s): Clark, Alex; Dole, Krishna; Coulon-Spektor, Anna; McNutt, Andrew; Grass, George; Freundlich, Joel; Reynolds, Robert; Ekins, Sean

Reviewer: 1

Well written article about a nice, free and open piece of work about a Bayesian model-building software module used to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity and other physicochemical properties. The thorough description including code examples makes the method easily accessible for readers. Releasing the software as part of a widely cited open source tool kit makes it easy to access and test. I hope the authors pay the open access fees for this article.

Response: Thank you. We plan on making both parts open access if accepted.

Reviewer: 2

The authors’ two-part publication on the development and application of their open-source tools for building Bayesian models is well-written and addresses an important need in the field: Easy development of predictive models in the CADD field with free and public tools, and easy sharing of such models within the research community. I therefore recommend publication after minor modifications.

Response: Thank you for your comments.

I have some reservation about the large number of citations of previous CDD work in either manuscript, which smacks a bit of company advertisement. However, these cited works seem relevant for the topic presented here, so I’ll give the authors the benefit of the doubt.

Response: We agree the selected citations are relevant to the manuscript. There are a handful that we would class as CDD papers e.g. describing TB Mobile and CDD Models. The majority of the references by Ekins et al. relate to work done outside of CDD that is relevant including both academic and industrial collaborations using machine learning.

While Bayesian classifiers are certainly useful (and have been widely applied), there are other modern machine-learning techniques such as kNN, random forests, and all the way up to the hot topic of Deep Learning, especially if one desires quantitative vs. just classification predictions. I am sure the reader would be interested in hearing the authors’ view on, if not possible plans for, implementation of such models in an open-source approach as described here.

Response: We agree there are many approaches, as we mention briefly, however if we were to go into detail our manuscript would be a review. We have now added the note “A more exhaustive review of the different machine learning approaches is outside the scope of this work.” We have chosen to focus exclusively on the Bayesian approach for the reasons provided, and have submitted these manuscripts because we have explicit new contributions to describe. We have previously compared Bayesian and other approaches for classification with different datasets and seen little difference between algorithms based on ROC assessments. While these other machine learning methods are of interest to anyone in the field, we respectfully decline to comment on them further, as we do not have a significant amount to add to the subject at this time.

As far as I can tell, the authors mention applicability domain (AD) only en passant in ms. I and not at all in ms. II. One cannot do (and publish) modern (Q)SAR without AD analysis. In ms. I, what is the “applicability number” mentioned on p.20? What are the “further measures” (p.30) of AD they plan to implement? In ms. II, the analysis of “balanced” vs. “diabolical” partitioning is cute and instructive (though neither really novel nor unexpected in its outcome) but most importantly, lacks AD analysis: One would assume that most of the predictions in the “diabolical” cases were out of AD. The authors need to do and present AD data.

Response: “Applicability Domain” usually refers to QSAR with continuous descriptors, not to Bayesian methods with binary fingerprints. Our goal in the manuscript is to enable extra-pharma drug discovery projects to exploit in silico machine learning methods that have until now been confined in practice to pharma and to a few academic groups. To do this we use previously published datasets (described and validated by ourselves and others elsewhere) to show the open algorithm / descriptors produce similar results for the ROC values. Our goal was not to compare applicability for the models. We have updated the description of the CDD Models implementation to clarify our simplistic approaches for domain transferability measures applied here “After the model has been created, each molecule in the user’s selected ‘project’ receives a relative score, applicability number (fraction of structural features shared with the training set), and maximum similarity number (maximum Tanimoto/Jaccard similarity to any of the “good” molecules).”
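To make the two measures quoted in this response concrete, here is a hedged sketch (not the actual CDD Vault code) in which each fingerprint is treated as a set of hashed structural feature IDs:

```python
def applicability_number(query, training_set):
    """Fraction of the query's structural features seen in training."""
    seen = set().union(*training_set)
    return len(query & seen) / len(query)

def max_similarity(query, good_molecules):
    """Maximum Tanimoto/Jaccard similarity to any 'good' molecule."""
    return max(len(query & fp) / len(query | fp) for fp in good_molecules)

# Toy fingerprints: each molecule is a set of hashed feature IDs.
training = [{1, 2, 3, 4}, {2, 3, 5}, {1, 6}]
actives = [{1, 2, 3, 4}, {2, 3, 5}]
query = {2, 3, 7}

print(applicability_number(query, training))  # 2 of the 3 features were seen
print(max_similarity(query, actives))
```

A low applicability number flags a molecule whose features the model has mostly never seen, while a low maximum similarity flags one far from any known active – two cheap warnings about trusting a prediction.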
In both papers, the authors talk about combining of assay result sets for the same target. In this context, they then do what most authors do to “ensure logical consistency” (ms.1, p.19, .l.39) by removing duplicates via averaging or exclusion of the compounds if the measurements are incompatible (ms.II, p.23-24). I have my issues with this default approach: What if these cases of incompatible results are exactly a warning sign that the entire two assays are mutually incompatible? Please report the extreme cases, i.e. the target:assay instance that had the highest percentage of incompatible results, both in terms of the fraction of all compounds, and the fraction of the overlap subset (compounds with multiple measurements reported). The point here is that if a significant number of compounds in the overlap set have divergent results, then maybe the combined collection should not be used for this mix-and-match approach altogether; and having only one measurement (with obviously no possibility for incompatible results) is actually not good but bad. This issue is obviously much more severe for quantitative models. But I am convinced that even classifiers are negatively affected by this. See for example the papers by Kalliokoski and Kramer et al. in the 2012-2013 time frame, analyzing these issues for ChEMBL data sets.

Response: We are entirely in agreement with the concerns expressed. We admit to being a little brief in describing how we reject incompatible results, though our description in paper 2 captures the essence of how we went about data preparation (e.g. the examples we give as “<3 and >4, or <6 and =7” for two incompatible groups). In the greater scheme of things, we are working toward a data collation system that is a little smarter, and can use provenance information to make more informed decisions about how to deal with clashes (e.g. one source more likely to be incorrect, or a “voting” winner takes all in the case of more than 2 options). For the moment, however, we have simply assumed that everything in ChEMBL is equally valid, and used a very simple conflict resolution system, and described it in minimal detail. We assert that this is reasonable for this project, since it defers to the ChEMBL curators, who have a rigorous process in place. It is important to point out, however, that the extraction process that we used to obtain model sources from ChEMBL has been carried out for the purpose of creating a large number of test cases containing highly realistic data, with the objectives being to (1) demonstrate that a significant amount of data is readily available, and (2) to build and validate additional algorithms for working with this abundance of models, in a way that is scalable in terms of human time. The ability to obtain thousands of models from public sources is quite novel in cheminformatics, and has only become viable in recent years due to improvements in the quality of public data, and open source algorithms. For purposes of using this data for a major prospective drug discovery campaign, we would recommend more attention to detail, which we are currently pursuing.
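A minimal sketch of the simple conflict-resolution rule described in this response (the cutoff value and function names are illustrative, not the actual extraction code): each qualified measurement votes active/inactive against a threshold, and compounds whose measurements disagree are discarded.

```python
THRESHOLD = 5.0  # e.g. a pActivity cutoff; the value is an assumption

def vote(op, value, cutoff=THRESHOLD):
    """Map one qualified measurement to True/False/None (undecidable)."""
    if op in ("=", ">") and value >= cutoff:
        return True
    if op in ("=", "<") and value < cutoff:
        return False
    return None  # e.g. ">3" against a cutoff of 5 tells us nothing

def resolve(measurements):
    """Keep a compound only if all its decisive measurements agree."""
    votes = {vote(op, v) for op, v in measurements} - {None}
    return votes.pop() if len(votes) == 1 else None  # None -> discard

print(resolve([("<", 3.0), (">", 6.0)]))  # incompatible pair -> None
print(resolve([(">", 6.0), ("=", 7.0)]))  # consistent -> True
```

Anything smarter – provenance-weighted voting, per-source trust – would slot in at the `resolve` step, which is exactly where the response says future work is headed.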

Paper 1 (ci-2015-00143z) Open Source Bayesian Models: I. Application to ADME/Tox and Drug Discovery Datasets:

p.4, l.27: “[…] have essentially put the experimental assays out of business.” – Do the authors have a reference for this or is this just hearsay or private discussions?

Response: We have had numerous discussions with ex-employees (whom we cannot cite) at big pharma and the wealth of papers from Pfizer over the last 5-10 years (which we cited in the sentence) clearly show the strength of models developed.

p.5, l.20: “The current development of technologies for open models and descriptors
build on established methodologies.” – Is “build” a verb or a noun here? If the former, it should be “builds” since, to be grammatically correct, it has to refer to “development.”

Response: We have used ‘builds’.

p.5, l.46: An additional freely available web tool for the prediction of toxicities, physicochemical properties, and biological activities that the authors could cite is the Chemical Activity Predictor at http://cactus.nci.nih.gov/chemical/apps/cap.

Response: Thank you for bringing this to our attention. We have added “In addition, there are web tools for the prediction of bioactivities and physicochemical properties like the Chemical Activity Predictor (GUSAR) {Zakharov, 2014 #7222}.”

p.31-32, sections Author Contributions, Conflicts of Interest, and Acknowledgments: Punctuation and name abbreviation issues (SE vs. S.E. etc.).

Response: We are grateful to the reviewer for taking the time to identify these errors, and have fixed each of them.

Reviewer: 3

The authors describe an implementation of Naïve Bayes within CDK and E/FCFP* descriptors. They show some examples with development and sharing the models using their development. The authors indicate that their development enhances CDK tools by allowing users to easily develop, publish and apply and share models. This is an interesting extension of CDK, which, however, on my opinion require a more focused article. Indeed, in this study the authors try to combine software development and benchmarking studies. With respect to the first study I suggest the authors to write it as an Application Note (see guidelines on the journal web page) while the second part of the study should be done as a proper benchmarking study (see also below) to prove that NB has a significant value to the readers of the journal.

Response: We thank the reviewer for their comments. We believe our work is worthy of a manuscript rather than an application note as it describes software development and application in paper 1. Paper 2 uses the software developed in paper 1 for a novel application, namely the challenge of building 1000’s of models from a very big dataset as well as automatically assigning classes from continuous datasets. Neither of the other two reviewers suggested publishing paper 1 as an application note.

In many places, the authors term Bayesian models instead of Naïve Bayesian (NB) model. NB is crude approximation of Bayesian modeling (e.g., there is an assumption that all descriptors are independent). Some short introduction to the theory of NM approach and its comparison with full Bayesian models, which provide optimal separation of classes, should be made. Several objective benchmarking studies, see e.g. http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf, have indicated that NB has did not have a good reputation in comparison to different modern methods. Moreover, since 2006 many new algorithms have appeared. Therefore, the application of this method in computer science literature is rather limited. To this extent the conclusion of the authors that NB performs similarly to other used approaches are unexpected. I believe that it is a result of a specific selection of the studies used in this comparison.

Response: We believe that we have defined this terminology well enough to be able to use the term “Bayesian” as shorthand notation. The method we describe is actually the Laplacian-corrected naive Bayesian, a name which is unwieldy. We introduce the difference in some detail, and explain why we have followed previous cheminformaticians in favouring this variant: it is highly amenable to the use of thousands of structure-derived fingerprints, but it has significant drawbacks, one of them being that the result is not a probability, unlike the standard naive Bayesian approach. We have devoted a significant amount of discussion to this, and do not believe that any more is required. Our previous papers cited in this manuscript describe numerous examples of comparing Bayesian versus SVM versus trees; in all cases we have seen little difference when comparing the ROC for test sets (leave-out groups or external) using the exact same molecular descriptors.
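For readers less familiar with the variant being debated here, a minimal sketch of a Laplacian-corrected naive Bayesian model over fingerprint bits might look as follows. This is a toy illustration in Python on hypothetical data, not the actual CDK implementation described in the paper:

```python
import math
from collections import defaultdict

def train_laplacian_bayesian(fingerprints, labels):
    """Train a toy Laplacian-corrected naive Bayesian model.

    fingerprints: list of sets of integer fingerprint bits
    labels: list of 0/1 activity classes
    Returns a dict mapping each bit to its log weight.
    """
    n_total = len(labels)
    p_base = sum(labels) / n_total       # baseline fraction of actives
    k = 1.0 / p_base                     # Laplacian correction constant
    total, active = defaultdict(int), defaultdict(int)
    for fp, y in zip(fingerprints, labels):
        for bit in fp:
            total[bit] += 1
            active[bit] += y
    # A bit's weight is the log of its corrected active rate over the
    # baseline rate; rarely seen bits are shrunk toward zero weight.
    return {bit: math.log((active[bit] + 1.0) / ((total[bit] + k) * p_base))
            for bit in total}

def score(weights, fp):
    # The raw score is a sum of log weights over the bits present.
    # It is useful for ranking compounds, but it is not a probability.
    return sum(weights.get(bit, 0.0) for bit in fp)
```

Bits enriched in actives get positive weights and bits enriched in inactives get negative weights, while the Laplacian correction keeps a bit seen only once or twice from dominating the score.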

We included 3 references to describe the Laplacian-corrected naive Bayesian:
Klon, A. E.; Lowrie, J. F.; Diller, D. J. Improved naive Bayesian modeling of numerical data for absorption, distribution, metabolism and excretion (ADME) property prediction. Journal of Chemical Information and Modeling 2006, 46, 1945-56.
Rogers, D.; Brown, R. D.; Hahn, M. Using extended-connectivity fingerprints with Laplacian-modified Bayesian analysis in high-throughput screening follow-up. Journal of Biomolecular Screening 2005, 10, 682-6.
Chen, B.; Sheridan, R. P.; Hornak, V.; Voigt, J. H. Comparison of random forest and Pipeline Pilot Naive Bayes in prospective QSAR predictions. Journal of Chemical Information and Modeling 2012, 52, 792-803.
Indeed, while the authors provide comparison of some models to previous results, they do it almost exclusively using models from their own publications. Moreover, some of these publications were review articles (e.g., ref 108), which thus may have a limited value in terms of achieved accuracies.

Response: Not all of our previously cited papers were reviews; we use ref 108 (a review) to simplify referencing the earlier papers described. In this manuscript we describe numerous examples of comparing Bayesian versus SVM versus trees, and in all cases we have seen little difference using the exact same molecular descriptors. Our aim in making the comparison is to show that the ROC values (n-fold validation) in the current study are similar to those obtained in our earlier studies.
This is on my opinion is not sufficient. For several datasets, e.g. AMES mutagenicity, there are multiple benchmarking studies. A proper comparison of the performance of the proposed methodology to these results (using similar test protocols as provided within these studies) is required to prove the claim of the authors that models developed with NB are of similar quality as compared to other methods.

Response: We respectfully disagree that our validation is insufficient. In a previous publication we described the implementation of ECFP6/FCFP6 fingerprints for use in the CDK toolkit, which we performed with the intention of matching the efficacy of the original implementation designed and published by SciTegic. While details were withheld by SciTegic, we have previously established that our implementation has equivalent performance. Since we are using the exact same algorithm for deriving Bayesian models, it is hardly a stretch to expect that the Bayesian models we created would also perform similarly well, and we have presented a number of examples to indicate this is the case. We have used the same method in many other cases which are not (yet) published and found this to hold as well. Readers are also able to confirm this for themselves using the open source implementation. In short, we believe our claims are valid and come with plenty of supporting evidence. The goal of paper 1 was to show that our method, with an open algorithm and descriptors, could reproduce similar statistics for datasets which we have used previously. We believe it is acceptable to use our past datasets for this; many are published by others, and prior papers provide this information. Our goal was not to focus on benchmarking of any one dataset – see comments above.
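Since the rebuttal leans heavily on ROC comparisons, it may help to recall that the ROC AUC can be computed directly from ranked scores as the probability that a randomly chosen active outscores a randomly chosen inactive (the Mann-Whitney identity). A short self-contained sketch, for illustration rather than any particular implementation used in the papers:

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) identity: the
    probability that a random active outranks a random inactive,
    counting ties as half. O(n^2) sketch for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means the model ranks every active above every inactive, while 0.5 is no better than random, which is what makes ROC a convenient single number for comparing classifiers across methods and descriptor sets.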

If this is not the case, I do not really see the advantages of development and sharing the NB models. The scientists will be doing this using the best available approaches notwithstanding whether they use public or commercial software. Indeed, the economic gain by applying most predictive algorithms can provide much better cost savings compared to the use of the models with lower prediction ability. This economic gain can be much higher than the software costs. Moreover, the problem with model sharing in many cases is not limited by the availability of open software or open descriptors. It is more related to problems with IP issues and data security.

Response: We have previously shown there is no significant difference between costly commercial software and open source descriptors and algorithms, using very large datasets at Pfizer and extensive external testing (Gupta et al., 2010). We find the reviewer’s comments quite perplexing. If the reviewer means to say that there is zero value in creating an open source, freely sharable implementation that can be easily used by any scientist on virtually any platform, when there are currently only expensive, proprietary products that are unavailable to all but a few… then it is hard to know where to start with rebutting this. Needless to say, we know from our own experience that this is simply not the case. Bayesian methods based on circular fingerprints are extremely useful (as we and many others have attested in the literature), and putting them in the hands of everyone who could possibly benefit from them has value that is self-evident, to say the least. We are also interested in the possibility of making other methods available in the same way. Making it easy to share the resulting models is a major theme of our work of late, and we are pursuing this goal as far as our resources allow. We have discussed some of the caveats of potentially revealing information about the molecules used to create the models, in order to provide users with the ability to make an informed decision about IP protection. The benefits of sharing are numerous – part of the challenge is that scientists create models and publish them in a way that does not make them accessible to others; we want to try to change that, and time will tell how much impact our efforts will have. Our motivation is not economic gain but scientific gain. Models can be shared securely in CDD, or they can be made completely or partially open. Each use depends on the needs of the user for the particular project.

Last, but not lest, all data used in this study should be supplied together with the article (as zipped files with chemical structures, names of molecules and activity data; the original data sources can be also included.). The indicated links do not provide an access to all datasets (i.e., registration is required for some sets). This will be required to allow the readers to re-use them in other studies.

Response: As we have described in the data and materials availability statement: ‘All computational models are available from the authors upon request. All molecules for malaria, tuberculosis and cholera datasets from Table 1 are available in CDD Public (https://app.collaborativedrug.com/register) and the models from Table 2 are available from http://molsync.com/bayesian1.’
The CDD Public data from Table 1 is readily accessible after registering. Most of the datasets in Table 1 have already been published and made available by others (see citations). We include just one proprietary dataset (Caco-2). If scientists need access to the other datasets they can request them from us. We are not aware of any requirement of the journal to make all data open; clearly, drug companies that publish data in JCIM do not do this frequently.

Since the content from paper 2 represents a rather large fraction of ChEMBL, it would be antisocial to include it as either a single file on the ACS server or as thousands of smaller files. For this reason we prefer to host it ourselves (and make it accessible to the community) in a way that is more convenient to the reader.


Please make sure your COI statement appears in the manuscript:
“S.E. is a consultant for Collaborative Drug Discovery Inc. A.M.C. is the founder of Molecular Materials Informatics, Inc.”

Response: Yes, we included this.


Paper 2
Manuscript ID: ci-2015-00144w
Title: “Open Source Bayesian Models: II. Mining a “Big Dataset” to Create and Validate Models with ChEMBL”
Author(s): Clark, Alex; Ekins, Sean

Reviewer: 1

This paper is a companion to the software paper submitted in parallel. It describes the use the ChEMBL to test their two-state Bayesian classification described in the parallel paper.
The reasoning behind the study as well as the methodology is properly described and accessible, as is the extraction of the underlying data sets.
All models produced in this study are openly available, as is the software which has been integrated into the open source chemistry development kit (CDK).

Response: Thank you.

Reviewer: 2

Paper 2 (ci-2015-00144w) Open Source Bayesian Models: II. Mining a “Big Dataset” to
Create and Validate Models with ChEMBL:

p.20, l.38: “independent not order dependent” – strangely phrased.

Response: Corrected.

p.23, l.23: “The first limit clause restricts to any of the assay identifiers for the block, which varies from one to thousands.” – Unclear phrasing and/or mangled grammar: Restricts what? And what varies from one to thousands?

Response: Corrected.

p.26, l.1ff: What software and method was used for this analysis and the plots?

Response: The analysis was done with the software described in these two manuscripts. Plots were created using custom software (which we do not describe, since it is not novel and was created only to support the manuscript).

p.33, l.15: “described by Keiser et al., 80.” – Screwed-up punctuation. Or a sentence part missing?

Response: We have changed this as follows: ‘Similarity ensemble analysis (SEA) was described by Keiser et al.,80 which used 246 targets and 65,241 molecules; the Tanimoto similarity was compared for each pair of molecules. This approach was used to identify unexpected new targets for several known drugs.’

p.33, l.34: “These had correct […]” – better: “These models…”

Response: Corrected.

p.33, l.37: “build models for adverse drug reactions these in turn” – comma (or semicolon or even full stop) missing before “these”.

Response: Corrected.

p.33, l.46: “Natives Bayesian” – I am pretty sure the authors meant “Naïve Bayesian.”

Response: Corrected.

p.33, l.48: “It was however shown that combining HTS a fingerprints […]” – Mangled sentence.

Response: Corrected.

p.34, l.25: “over 1800 molecules tested against over 800 molecules” – this makes no sense.

Response: Corrected. It should be 800 endpoints/assays.

p.36, l.8: “In this case, secure collaborative software would be used to transfer and run the model.” – Too much advertisement for CDD.

Response: We are stating a fact: if IP is to be maintained, model transfer would have to happen in a secure environment. We do not mention CDD explicitly.

p.38, sections Author Contributions, Conflicts of Interest, and Acknowledgments: Punctuation and name abbreviation issues (SE vs. S.E. etc.).

Response: Corrected.
Reviewer: 3

The authors have extracted and analyzed datasets extracted from ChemBL database using naïve bayes classifier. They tried to develop a threshold schema to separate quantitative data on classes of active and inactive compounds and made developed models and associated data available for download by the external users.

The mapping of naïve bayes scores to probability estimation is well known in the computer science literature, which has been addressed more than a decade ago, see e.g. http://www.research.ibm.com/people/z/zadrozny/kdd2002-Transf.pdf. I do not see a reason to develop “yet another” algorithm without providing a correct benchmarking and comparison of it with the previous studies.

Response: The reviewer has not taken into account the fact that we are describing the Laplacian-corrected variant of the naive Bayesian method. The references given refer to the conventional form, which is more popular outside of cheminformatics (which usually does not have to deal with thousands of fingerprints), and as such describe the process of normalizing values that are already formally probabilities in the 0..1 range. The method that we have adopted generates values with arbitrary scale and range, which limits the extent to which they can be interpreted. The raw values are suitable for ordering, but little else. We are not aware of previously disclosed methods for transforming these continuous values into a “probability-like” range, and we deem this to be of some value to cheminformatics. We have also described these issues in considerable detail in the text, and do not believe that any further discussion is necessary.

The authors should not substitute term “Bayesian models” with “Naive Bayes models”. NB is based on very strong assumptions about the statistical properties of descriptors and does not provide optimal models as full Bayesian modeling.

Response: As previously noted, we have used the term “Bayesian” as shorthand for “Laplacian-corrected naive Bayesian”, after having introduced the term. In the interests of literary quality, we have kept to this convention.

The article does not have a result section. It starts with description of data preparation, which belongs to the Data section. Actually, there is no need to specify sql queries used to extract data. Such technical information can be better published as supplementary materials or just skipped.

Response: We respectfully disagree. We have formatted the manuscript in a way that we believe serves the casual reader as well as anyone studying it in detail, and have provided all of the content categories that are expected of a research paper. While migrating the SQL queries to supplementary information would not be a dealbreaker for the overall value of the manuscript, we believe it is useful to communicate to readers what work is required in order to transmute the data source into something that is immediately useful. Some readers may be under the impression that it is much easier or much harder than it really is, and anyone who is familiar with data processing methods would find it valuable. For this reason, we have retained this section of the manuscript as is.

The IC50 values used by the authors were collected from different articles, which were based on different experimental conditions. The authors should provide some arguments and discussions how the use of different experimental conditions affects the results and why such different data can be merged together.

Response: In some cases the ChEMBL data are also from a single lab, so it depends on the dataset. We agree one would expect some interlab variability when the data come from more than one laboratory. In this work, which is focused on method development, we have “passed the buck” to the ChEMBL team. We have explained in detail (hence the SQL queries) how we have chosen to assimilate values with the same target/assay types. To the extent that they are incompatible, this is a decision that was made by the curators of ChEMBL. For scientists using this data for prospective studies, it is up to them to decide whether it was reasonable for us to assume that the ChEMBL curation is good enough. We do not argue that this core assumption is appropriate for all drug discovery scenarios, but we do demonstrate our claim that by doing so, it is possible to produce a large number of models with an entirely automated method, and that this is of interest to the greater community. Whether the generally high ROC values are indicative of high compatibility of data from different labs is not something we claim to have proven. However, the fact that there is sufficient high quality open data – and now methods – for creating well-performing models for many hundreds of biorelevant targets with adequate model sizes is in itself very interesting, and in our opinion, well worth sharing with the community.

I do not understand the arguments about the need to develop sophisticated algorithms to select a threshold for classification models. The regression models are much better suitable to work with quantitative models. If only a classification is required, the selection of a threshold depends on the intended use of the results (e.g., models developed for screening of new compounds with 10mM and 1µM should be based on the appropriate thresholds for the activity data). Because of these two arguments, I could not really follow the logic and need to design some new criteria for separation of active and inactive compounds. Again, this part belongs to the methodological part of the article.

Response: We may have erred on the side of assuming that this concept is familiar to all cheminformaticians involved in drug discovery, though we believe that is reasonable given the readership of the journal. In order to build a model based on 2-state classifications, it is necessary to have data that is classified as one of two states. Since bioassay data is typically given as continuous values, often in concentration units, the easiest way to do this is to define a threshold. The choice of threshold varies considerably depending on the circumstances, e.g. for some targets only strong binders are interesting, while in other cases the available data may not include many (or any) strong binders, and so a lower threshold is appropriate. The best choice is not necessarily obvious. When a handful of models is being considered, contextual scientific knowledge is usually available, with manual trial-and-error as a fallback, but for thousands of models this represents a major scaling issue. It is our belief that most of the readers to whom this article ought to appeal will be familiar with this concept, and that the explanation we have provided in the manuscript is sufficient.
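To illustrate why automating this choice matters at scale, a crude threshold chooser might simply pick the cutoff whose resulting active fraction is closest to some target fraction. This is a hypothetical sketch, not the algorithm from the manuscript:

```python
def choose_threshold(activities, target_active_fraction=0.3):
    """Pick an activity cutoff (e.g. IC50 in nM; lower = more potent)
    so that the fraction of compounds classed as active is as close
    as possible to the target fraction. Hypothetical sketch only."""
    values = sorted(activities)
    n = len(values)
    best_cut, best_gap = values[0], float("inf")
    for i, v in enumerate(values):
        frac = (i + 1) / n      # fraction active if cutoff is <= v
        gap = abs(frac - target_active_fraction)
        if gap < best_gap:
            best_cut, best_gap = v, gap
    return best_cut
```

Run once per target, something like this removes the manual per-model judgment call, at the cost of ignoring contextual knowledge about what potency actually matters for any given target.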

Thus, actually I did not find what are the results of this study and what is their value? Who will be the potential users of the developed models and how can be these results used? Unfortunately, the article does not have a clear answer to this question.

Response: We have spent some time describing the possibilities that arise from having thousands of models for bio-relevant targets based on high quality open data. We believe that this should be largely self-evident to anyone working in the drug discovery industry: having a model for almost every drug target conveniently on hand, and freely available, is transformative and quite different from the status quo. As we have described in the discussion, while others have built Bayesian models for multiple targets, none has considered the scale of what we have demonstrated with the ChEMBL data – namely over 2000 classification models. While we make no claims to the effect that these models are completely ready to be used directly for prospective drug discovery campaigns, this is a major step in the direction of creating large collections of models, and should be of very broad interest and applicable to other data collections. From experience and collaborations, we have already identified academic and commercial organizations that would benefit from the models, and we fully intend to follow up any interesting results with disclosure in the literature.


My experiences reviewing at PLOS

While I am a supporter and proponent of open access publishing, I have in the past been fairly critical of some of these journals, in particular PLOS. Even though I still publish in their journals, I have held off from PLOS ONE in favor of F1000Research, who I feel have a better publishing and reviewing model, which suits me just fine. Today, as far as I can tell, something new happened. They (PLOS ONE) sent me, and probably thousands of others, a thank-you for reviewing! I am flattered… but hold on…

Sean Ekins
PLOS ONE Reviewer (2014)

May 2015

Dear Sean,

On behalf of PLOS and the PLOS ONE editorial team, I would like to thank you for participating in the peer review process this past year at PLOS ONE. We very much appreciate your valuable input in 2014. We know there are many claims on your time and expertise but with your help, we have continued to publish an influential, lively and highly accessed Open Access journal. Simply put, we could not do it without you and the thousands of other volunteers for PLOS ONE and the other PLOS journals who graciously contributed time reviewing manuscripts.

A public “Thank You” to our 2014 reviewers – including you – was published in February 2015.

(2015) PLOS ONE 2014 Reviewer Thank You. PLoS ONE 10(2): e0121093. doi:10.1371/journal.pone.0121093

Your name is listed in the Supporting Information file associated with the article. I hope that you will be able to use this letter, along with the article citation, to claim the credit and recognition you deserve within your institution for supporting PLOS ONE and Open Access publishing.

If you would ever like to provide feedback on our processes, we would very much welcome that. Please send your feedback to us at plosone@plos.org.

With Gratitude,

Damian Pattinson
Editorial Director

P.S. If you’d like to receive news and information from PLOS, opt-in here.

I appreciate the little email, but at the same time it made me wonder if all of this is starting to get a little out of hand. Maybe it’s just me. I find that perhaps I am getting a bit more cynical, but do we really need to get credit for reviewing papers? Is it not our role as scientists to review papers? Honestly, as an editorial board member at other journals I get to see plenty of papers that get routed around to reviewers, and while I do not review as many papers as I used to, I feel I am doing my bit for scholarly publishing. I also know there are some mechanisms for paying reviewers, and while I do not want to stifle these alternative business models, they are not yet the norm. So quite honestly I do not feel the need to claim credit for reviewing someone’s paper. Yes, I like the idea of openly listing the names of reviewers of each paper (in the interests of transparency), and PLOS ONE are not there yet. I also like the idea of sharing reviewers’ reviews, as I have been trying to do here, and PLOS ONE do not do that either. But I for one will not be adding another line to my CV that states I reviewed for PLOS ONE. What next, a citation for reading a paper in PLOS ONE? Where do we draw the line? My request to them is to do the things that help with transparency and build confidence in reviewing, rather than what appears trivial to me. I would prefer a discount on publishing with them if I review papers, rather than some citation. That would be a very nice “Thank You”. I look forward to that kind of email in the future, but it may be a long wait.


Contrasts in Pharmacology 2.0

Last week I was in Turin to give a talk at the Contrasts in Pharmacology 2.0 meeting, organized by the Fondazione Internazionale Menarini.

My talk title, “Bigger data to increase drug discovery”, was provided to me.

I am grateful to the organizers for inviting me and was honored to be there alongside speakers from the WHO, GSK, Roche, Brigham and Women’s Hospital, the University of York and others. It was a great opportunity to learn about other areas in pharmacology and meet some new researchers. The talks were all recorded and are available for online viewing.


Papers and posters for ACS Boston 2015

It is that time of the year when we get those automated emails telling us our abstracts have been scheduled for the next ACS meeting in August. In addition to these, I will be participating in a careers panel for undergraduates, which will be interesting – hopefully inspiring the next generation that there is lots to do in drug discovery, cheminformatics, toxicology and rare diseases. I will also have a poster in the small chemical businesses section, which should be a new venture for me as a budding entrepreneur. So there is plenty of work ahead in prepping for these. Finally, there are posters that collaborators are submitting, which I hope to add in due course.

Here is the list of papers to be presented at the 250th ACS National Meeting, which will be held in Boston, Massachusetts, August 16-20, 2015.

PAPER ID: 2248982
PAPER TITLE: “Mining big datasets to create and validate machine learning models”

DIVISION: [CINF] Division of Chemical Information

PAPER ID: 2248973
PAPER TITLE: “Making it open: Putting cheminformatics to use against the Ebola virus”

DIVISION: [CINF] Division of Chemical Information

PAPER ID: 2246123
PAPER TITLE: “Applying cheminformatics and bioinformatics approaches to neglected tropical disease big data”

DIVISION: [CINF] Division of Chemical Information

PAPER ID: 2249028
PAPER TITLE: “Development and sharing of ADME/Tox and Drug Discovery machine learning models”

DIVISION: [COMP] Division of Computers in Chemistry

PAPER ID: 2248999
PAPER TITLE: “Mobile Apps for Transporter Drug-Drug Interaction Prediction – A Tool of the Future, now”

DIVISION: [TOXI] Division of Chemical Toxicology

PAPER ID: 2248989
PAPER TITLE: “Starting small companies focused on rare diseases”

DIVISION: [SCHB] Division of Small Chemical Businesses
