Predicting Targets for TB Phenotypic Screening Hits and Openness of Data and Models

So far the only readily accessible, free approach to predicting TB targets is TB Mobile, developed for iOS and Android with Alex Clark. While relatively crude, the app does allow lookup and similarity searching of 745 compounds with known targets, so it may help propose possible targets for a new compound. Obviously there are caveats based on the data and the limited number of targets annotated.
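To make the idea of proposing targets by similarity searching concrete, here is a minimal sketch. It is not the TB Mobile code: the reference compounds, feature sets and target names are made-up placeholders, and plain sets of feature labels stand in for real structural fingerprints. It simply ranks annotated compounds by Tanimoto similarity to a query and reports their targets.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy reference set: compound name -> (feature set, annotated target).
# Real use would need structural fingerprints from a cheminformatics toolkit.
REFERENCE: Dict[str, Tuple[FrozenSet[str], str]] = {
    "ref_A": (frozenset({"f1", "f2", "f3", "f4"}), "target_X"),
    "ref_B": (frozenset({"f1", "f2", "f5"}), "target_X"),
    "ref_C": (frozenset({"f6", "f7", "f8"}), "target_Y"),
}


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propose_targets(
    query: FrozenSet[str],
    reference: Dict[str, Tuple[FrozenSet[str], str]],
    top_n: int = 3,
    min_similarity: float = 0.3,
) -> List[Tuple[str, str, float]]:
    """Return (compound, target, similarity) for the most similar annotated compounds."""
    scored = [
        (tanimoto(query, feats), name, target)
        for name, (feats, target) in reference.items()
    ]
    # Keep only neighbours above the (arbitrary) similarity cutoff.
    hits = [h for h in scored if h[0] >= min_similarity]
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(name, target, round(sim, 3)) for sim, name, target in hits[:top_n]]


if __name__ == "__main__":
    new_compound = frozenset({"f1", "f2", "f3"})
    for name, target, sim in propose_targets(new_compound, REFERENCE):
        print(f"{name}: {target} (similarity {sim})")
```

The cutoff of 0.3 and the top-3 limit are arbitrary choices for illustration. The caveat above applies here too: a suggestion is only as good as the annotated compounds it is compared against.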

Nearly a year ago (Jan 23rd) both Chris Southan and I described how various approaches could be applied to the recently released set of GSK TB Leads to predict possible targets in Mtb. I even put together a poster showing how the first version of TB Mobile could be used to make predictions for 11 of the compounds. Later in the year (March), with Alex and Malabika Sarker, we published a paper on TB Mobile. This described the development of the app and some testing with literature compounds that, in the majority of cases, were not in the app itself. We have also described in a paper (November) how the dataset of compounds from TB Mobile can be used with the GSK compounds in a clustering approach to suggest targets.

Why is this important? Well, phenotypic high throughput screening for TB has increased dramatically, to the point where literally millions of compounds have been screened by GSK and Novartis and hundreds of thousands have been screened by many academic labs. The assays may provide hits, but they give no indication of targets, and if you want to optimize the compounds the target information is needed. Identifying targets experimentally would be a slow and expensive process if undertaken on all 177 GSK leads.

So I just read, belatedly, a paper (which I recommend) published in PLOS Computational Biology (October 2013) by authors at a number of labs in the US and Europe. They use three approaches: chemogenomics with the ChEMBL database (Bayesian models generated with the commercial PipelinePilot software), structural space analysis (using a Random Forest score of molecule similarities versus compounds in the Protein Data Bank) and finally historical assay space (using proprietary GSK bioassay data for 120 assays vs 63 human targets). The authors used the larger GSK set of 776 compounds tested versus BCG instead of just the 177 leads (it's unclear why they did this). They predicted 139 target-compound links but provided no experimental validation for any of the predictions.
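For readers unfamiliar with the Bayesian part of such a chemogenomics approach, the general idea can be sketched with open tools alone. The code below is a generic Bernoulli naive Bayes classifier with add-one smoothing, written in plain Python. It is an illustration only, not the paper's models and not the commercial implementation; the feature labels, target names and training examples are invented.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, int], Dict[str, Dict[str, int]], Set[str]]


def train(examples: List[Tuple[Set[str], str]]) -> Model:
    """Count how often each feature occurs among compounds annotated to each target."""
    class_counts: Dict[str, int] = defaultdict(int)
    feature_counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    vocabulary: Set[str] = set()
    for features, target in examples:
        class_counts[target] += 1
        for f in features:
            feature_counts[target][f] += 1
        vocabulary |= set(features)
    return class_counts, feature_counts, vocabulary


def rank_targets(model: Model, query: Set[str]) -> List[str]:
    """Rank targets by log posterior score for the query compound's features."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    scores: Dict[str, float] = {}
    for target, n in class_counts.items():
        log_p = math.log(n / total)  # prior from class frequency
        for f in vocabulary:
            # Add-one smoothing keeps every probability strictly between 0 and 1.
            p = (feature_counts[target].get(f, 0) + 1) / (n + 2)
            log_p += math.log(p if f in query else 1 - p)
        scores[target] = log_p
    return sorted(scores, key=lambda t: scores[t], reverse=True)


if __name__ == "__main__":
    # Invented training data: (feature set, annotated target).
    training = [
        ({"a", "b"}, "T1"),
        ({"a", "c"}, "T1"),
        ({"d", "e"}, "T2"),
        ({"d"}, "T2"),
    ]
    model = train(training)
    print(rank_targets(model, {"a", "b"}))
```

With real data the feature sets would come from molecular descriptors and the labels from annotated target data such as that in ChEMBL. The point is only that nothing about this kind of model requires commercial software.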

Interestingly, on p14 of the paper under the “Exploring historical assay data” section, there is a sentence describing target analysis and anti-malarial actives and inactives. I am not quite sure how this fits in because it is not discussed elsewhere.

While they have provided a considerable amount of information which could be useful to others, this approach is not something any researcher can do themselves. For example, while the chemogenomics approach uses ChEMBL target data, which is accessible, the models use PipelinePilot. I know, I know, again we hit that wall between using easy-to-use commercial tools and developing models that are open and accessible to all. But perhaps more difficult is access to the proprietary GSK data used for the historical assay space predictions, because that is certainly not OPEN and certainly not something other researchers can take advantage of.

For me this paper, while technically brilliant, illustrates several of the biggest issues that perplex me in science. Many of us (including me) continue to develop models with commercial software which is just not accessible to others (even though I am happy to provide the underlying compounds and the models themselves, you will still need the same software). Also, we have commented before on the many computational modeling papers for TB that have no experimental validation in them. This is another example to add to the list. My final niggle is that the paper was not even the first effort to predict the targets for these compounds, and it made no citation of the earlier efforts described above. How does PLOS reconcile this in publishing this work?

So how can we fix this? Well, there need to be more efforts on building and validating computational models using the open software and descriptors that are available. We are currently working on the next version of TB Mobile, so I hope in due course we can provide more confidence in the target predictions based on the published data we have for TB.

Still, someone has to take the next BIG step of testing the target predictions for the GSK compounds. Perhaps a few different groups could propose using their different target prediction approaches with just the 177 leads (which are of most interest, apparently). Then perhaps GSK would agree to test some of the compounds and identify targets. Having published extensively on computational models for TB bioactivity and also provided prospective evaluation and validation, I see that this is what will ultimately be needed if we are to give other scientists confidence in the target prediction methods.

In the meantime I imagine these posts and our published efforts to try to make free tools available will continue to not be cited by those pushing inaccessible methods?




