Well, it has been a week since the first blog post on issues with the NCGC Pharmaceutical Collection and my follow-up suggestion that the situation needs improving. I looked at the website today, and so far there has been no acknowledgement of the structural problems that have been described. This of course does not mean that no one is working on it over at the NIH. The most recent change log from their website is shown below.
In the meantime, several blog posts and a LinkedIn discussion suggest that the highly curated set of 2500 compounds may be problematic too (not just the bigger set of HTS-amenable compounds): “28 out of the first 100 records had incorrect or absent stereochemistry”.
After seeing how different states publicize the general hygiene of public establishments (e.g. posting a quality score out of 100 for all to see), I was thinking that perhaps we need the same for public databases of molecular structures. For example, if 100 randomly sampled compounds from a database contain 28 errors (and I use the term loosely here to cover both subtle and major errors), then that database would have a score of 72.
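As a rough sketch of that arithmetic in Python: the function name `hygiene_score`, the `has_error` callback and the toy records are all hypothetical, made up purely for illustration. Deciding whether a given structure actually counts as an error is the hard part and is left to whoever does the curation.

```python
import random


def hygiene_score(records, has_error, sample_size=100, seed=None):
    """Return a score out of 100: the percentage of randomly sampled
    records that are NOT flagged as errors.

    `has_error` is a hypothetical callback that returns True when a
    record is judged to be wrong (by a curator or an automated check).
    """
    rng = random.Random(seed)
    # Sample without replacement; use every record if there are fewer
    # than `sample_size` of them.
    sample = rng.sample(records, min(sample_size, len(records)))
    if not sample:
        raise ValueError("no records to sample")
    errors = sum(1 for record in sample if has_error(record))
    return round(100 * (len(sample) - errors) / len(sample))


# Toy data mirroring the example: 100 records, 28 of them flagged.
records = [{"id": i, "error": i < 28} for i in range(100)]
score = hygiene_score(records, lambda record: record["error"])
print(score)  # 72
```

On a real database the score would vary from one random sample to the next, so the sample size (and ideally the sampling date) would need to be posted alongside the number.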
This number could then be posted on the database login page for all to see. Of course, it could also be out of 10, or a color flag of some description. Perhaps others can come up with more sophisticated scoring mechanisms.
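To illustrate the alternative presentations, here is a minimal sketch that turns a score out of 100 into a score out of 10 and a color flag. The function name `display_forms` and the cut-offs (90 and 70) are my own assumptions, picked only to show the idea; any real scheme would need agreed thresholds.

```python
def display_forms(score_100):
    """Convert a 0-100 score into (score out of 10, color flag).

    The thresholds below are hypothetical and purely illustrative.
    """
    if not 0 <= score_100 <= 100:
        raise ValueError("score must be between 0 and 100")
    out_of_10 = round(score_100 / 10, 1)
    if score_100 >= 90:
        flag = "green"
    elif score_100 >= 70:
        flag = "amber"
    else:
        flag = "red"
    return out_of_10, flag


print(display_forms(72))  # (7.2, 'amber')
```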