Missing citations and papers relating to the Malaria screening public datasets

Just catching up on some reading, I came across the nice paper on the Malaria Box in PLOS ONE, which describes the molecular property profiling and filtering of three sets of compounds screened by groups at GSK, Novartis and St Jude. Readers of this paper may be interested in work it did not cite that also discusses the properties and filtering of the three malaria screening sets released in 2010.

For example, a Google search for “GSK and malaria and filtering” returns the first two papers below:

Drug Discov Today. 2010 Oct;15(19-20):812-5. doi: 10.1016/j.drudis.2010.08.010. Epub 2010 Aug 21.
When pharmaceutical companies publish large datasets: an abundance of riches or fool’s gold?
Ekins S, Williams AJ.


The recent announcement that GlaxoSmithKline have released a huge tranche of whole-cell malaria screening data to the public domain, accompanied by a corresponding publication, raises some issues for consideration before this exemplar instance becomes a trend. We have examined the data from a high level, by studying the molecular properties, and consider the various alerts presently in use by major pharma companies. We not only acknowledge the potential value of such data but also raise the issue of the actual value of such datasets released into the public domain. We also suggest approaches that could enhance the value of such datasets to the community and theoretically offer an immediate benefit to the search for leads for other neglected diseases.

24 citations on Google Scholar

Meta-analysis of molecular property patterns and filtering of public datasets of antimalarial “hits” and drugs
Sean Ekins and Antony J. Williams

Med. Chem. Commun., 2010, 1, 325–330

DOI: 10.1039/C0MD00129E
Received 30 Jul 2010, Accepted 03 Sep 2010
First published online 30 Sep 2010

Neglected infectious diseases such as tuberculosis (TB) and malaria kill millions of people annually and the oral drugs used are subject to resistance requiring the urgent development of new therapeutics. Several groups, including pharmaceutical companies, have made large sets of antimalarial screening hit compounds and the associated bioassay data available for the community to learn from and potentially optimize. We have examined both intrinsic and predicted molecular properties across these datasets and compared them with large libraries of compounds screened against Mycobacterium tuberculosis in order to identify any obvious patterns, trends or relationships. One set of antimalarial hits provided by GlaxoSmithKline appears less optimal for lead optimization compared with two other sets of screening hits we examined. Active compounds against both diseases were identified to have larger molecular weight (~350–400) and logP values of ~4.0, values that are, in general, distinct from the less active compounds. The antimalarial hits were also filtered with computational rules to identify potentially undesirable substructures. We were surprised that approximately 75–85% of these compounds failed one of the sets of filters that we applied during this work. The level of filter failure was much higher than for FDA approved drugs or a subset of antimalarial drugs. Both antimalarial and antituberculosis drug discovery should likely use simple available approaches to ensure that the hits derived from large scale screening are worth optimizing and do not clearly represent reactive compounds with a higher probability of toxicity in vivo.

12 citations on Google Scholar

In addition, we recently identified compounds from the GSK malaria screening set with activity against TB.

Chem Biol. 2013 Mar 21;20(3):370-8. doi: 10.1016/j.chembiol.2013.01.011.
Bayesian models leveraging bioactivity and cytotoxicity information for drug discovery.
Ekins S, Reynolds RC, Kim H, Koo MS, Ekonomidis M, Talaue M, Paget SD, Woolhiser LK, Lenaerts AJ, Bunin BA, Connell N, Freundlich JS.


Identification of unique leads represents a significant challenge in drug discovery. This hurdle is magnified in neglected diseases such as tuberculosis. We have leveraged public high-throughput screening (HTS) data to experimentally validate a virtual screening approach employing Bayesian models built with bioactivity information (single-event model) as well as bioactivity and cytotoxicity information (dual-event model). We virtually screened a commercial library and experimentally confirmed actives with hit rates exceeding typical HTS results by one to two orders of magnitude. This initial dual-event Bayesian model identified compounds with antitubercular whole-cell activity and low mammalian cell cytotoxicity from a published set of antimalarials. The most potent hit exhibits the in vitro activity and in vitro/in vivo safety profile of a drug lead. These Bayesian models offer significant economies in time and cost to drug discovery.

1 citation on Google Scholar

Finally, the GSK compounds had already been virtually screened against TB models in 2010 (see the Drug Discov Today paper above), and the TB screening data recently published by GSK (http://www.ncbi.nlm.nih.gov/pubmed/23307663) were used to show prediction success here.

All of this work appeared in the public domain prior to the current paper’s acceptance.

Rather than just posting all this as a comment on the PLOS website, I figured it would be of general interest to my readers. I cannot understand why the Med. Chem. Commun. paper is not in PubMed, but either way our work on these datasets is in the public domain and could have been found simply by searching the web.

If anyone would like preprints of the papers above please let me know.
