Collaborations to get the NCATS Library of Industry provided reagents

It seems a while since I blogged on the absolutely bizarre posting of 58 molecules as the ‘library of industry provided reagents’ to be used as a starting point for repurposing, without posting structures. Since my last blog I have become aware of at least 3 specific groups trying to collate the molecules: Tudor Oprea and collaborators at UNM and elsewhere, Chris Lipinski, and Chris Southan. At the recent ACS meeting in Philly, Chris Lipinski presented his results, and I thank him for sharing the data and molecules (he included the Oprea results).

Chris Lipinski was able to find 36 small molecules and 2 biologics using CAS SciFinder, Thomson Reuters Integrity and various web postings.

Tudor Oprea et al. were able to find 41 small molecules and 2 biologics using the US Patents database (IBM), Google and publications.

Chris Southan described his approach on his blog; he found 30 compounds and put them in PubChem.

We have looked at the molecules Chris Lipinski found and could not find a significant difference in a few molecular properties that would differentiate those that were discontinued from those still in clinical trials.

What has not been done so far is to look at the overlap across all 3 groups above. How can we bring all these efforts together? Are there other efforts out there doing the same? For example, has NCATS tried to do this?

But the question still resonates: WHY?

Why do these 3 groups ‘have to’ collate the molecules?

Why could the NCATS initiative not have posted the molecules on the website, or linked to them in PubChem, in the first place? (It would have taken no effort for each company to provide a structure.)

Why did they get groups to propose repurposing the molecules without disclosing their structures?

Why is computational analysis not at the forefront of the repurposing efforts, before experimental resources are spent?

Why, oh why, did no one think of this? Or did they?

It's not as if people have not proposed how to do this kind of thing before.



  1. Antony Williams says:

    I just don’t get why funding was put into the project (and likely a big chunk of money) but no effort was put into preparing the SDF file for the community ahead of time. No. Instead, anyone interested in computational work has to do all the work to aggregate the data, and that will happen many times over. ONE group, the host, should have done the work and asked the rest of us to annotate/curate if we found issues.

  2. sean says:

    Many thanks Tony.
    Based on who put this dataset together, it surprises me that there was not even a note as to why structures were not put up. Was it because pharma would not let them? Was it because they were afraid we would all rush to generate predictions with models and they would then be inundated with grant proposals? Who knows.

    Again, the crowd is doing the work of a government organization, and not getting paid for it.

  3. sean says:

    My latest post on the identifications is


    Wouldn’t it be nice if organizations with advanced development candidates published med chem papers, according to MIABE guidelines, in which they a) put the IUPAC string in direct proximity to the code number in the PubMed abstract (especially for subscription journals), b) cited their patent numbers in the article, c) submitted the structures to PubChem and ChemSpider (thereby ensuring they were correct off the bat), and d) at least checked that MeSH had picked up the link correctly, and, last but not least, ensured that ChEMBL curated the paper and captured the assay data that, in turn, went into PubChem BioAssay.

    We can dream, I suppose…

  4. sean says:

    The last post was from Chris Southan; he was having some problems posting to WordPress, so I posted for him.

