Fraud detection, Kullback–Leibler divergence, movies, ship design and other good stuff at the Discovery Summit

Well, hopefully that title grabbed your attention and you did not mistake it for an episode of The Big Bang Theory. I have been attending the Discovery Summit 2012 at SAS in Cary, NC for the last day and a half, all because I occasionally use one of their products, JMP, for statistical analysis and have done so for over a decade. I decided to submit a poster on some of the work Antony Williams recently highlighted in a talk at the ACS on liquid dispensing effects on IC50 (see slides 37-38). This project was initiated by Joe Olechno at Labcyte after some of the things we put out on data quality. Anyway, that’s a story for another day. The point is, I used JMP as part of the project too and that got me into this great meeting.

As I proved to myself last year, going to a meeting that I would not normally attend is refreshing and can propel you in new directions. For example, the green chemistry conference last year led to the green solvents mobile app development with Alex Clark.

So yesterday afternoon (after a 6h telecon in the morning) I got to the meeting in time to present the poster and attend some talks. One was given by Dr Stan Young from NISS. We co-authored a paper a few years back, so it was great to see his talk on PCA vs non-negative matrix factorization, in which he also mentioned the Kullback–Leibler divergence. Stan and Christophe Lambert also authored a chapter in the first book I edited. His talk gave me some ideas of things I want to try in future.
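For anyone who has not met the Kullback–Leibler divergence before, here is a minimal sketch of the textbook discrete form, sum of p * log(p / q). This is my own illustration and not anything taken from Stan's slides. The function name `kl_divergence` and the two toy distributions are made up for the example, and the sketch assumes q is non-zero wherever p is non-zero.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats.

    Assumes q > 0 wherever p > 0; otherwise the result is infinite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalise so both inputs are proper probability distributions.
    p = p / p.sum()
    q = q / q.sum()
    # Terms with p == 0 contribute nothing, so skip them.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy distributions (hypothetical values, purely for illustration).
p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl_divergence(p, q)   # D(p || q)
reverse = kl_divergence(q, p)   # D(q || p)
print(forward, reverse)
```

The two printed values differ, which is the point to remember: the divergence is not symmetric, so it is not a true distance.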

Posters are great because you never know what questions or ideas people will come to you with. One attendee suggested that liquid handling may be responsible for negative data when looking at infectious diseases and mutant strains (I have probably mangled the meaning of this). So basically, not seeing a change in the kinetics of a mutant may be due to the liquid handling too, or at least the technology used for it may have a role in influencing the data. Interesting idea worth pursuing.

Today I attended a captivating presentation by Antonia de Medinaceli on fraud detection. I actually thought it was the best talk I have seen; the topic was presented in a way that was very accessible and entertaining. So many of the topics mentioned, like rules, predictive modeling, anomaly detection and network analysis, would not have been out of place in a drug discovery talk. Even the Mahalanobis distance got a mention, and that is something I was aware of from multiple optimization of ADME/tox data. It got me thinking: although fraud detection is about humans, are people and molecules really that different?
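To make the Mahalanobis distance a little more concrete, here is a rough sketch of how it can flag an unusual record, whether that record is a transaction or a molecule. It is my own illustration under stated assumptions, not the method from Antonia's talk. The data are randomly generated, the two columns are hypothetical properties, and the helper name `mahalanobis_distances` is made up for the example.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row of X from the sample mean, scaled by the sample covariance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse keeps this usable if the covariance is near-singular.
    cov_inv = np.linalg.pinv(cov)
    centered = X - mean
    # Quadratic form (x - mean)^T * cov_inv * (x - mean), one value per row.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(d2, 0.0))

# Made-up data: 200 ordinary records plus one that is odd in its second property.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[100.0, 5.0], scale=[10.0, 1.0], size=(200, 2))
odd = np.array([[100.0, 15.0]])
data = np.vstack([normal, odd])

distances = mahalanobis_distances(data)
most_unusual = int(np.argmax(distances))
print(most_unusual, distances[most_unusual])
```

Because the distance accounts for the spread and correlation of each property, a record can stand out even when no single raw value looks extreme on its own scale.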

Lunch was another one of those random meeting opportunities and I managed to find a table with Nelson Lee Afanador, a Sr Project Statistician from Merck, in my old state of Pennsylvania. We had a great discussion on many topics, but one was the public perception of how drugs are developed. He mentioned a recent movie, Contagion (and I will not spoil the plot because I have not seen it yet), which is basically one of those movies where a few scientists have to find a cure for a disease before it wipes out mankind, or something similar. Nelson asked why it is that in the movies the scientists always, ALWAYS, find a cure in a few days and save the day. This never happens in reality. It takes over a decade and massive teams of people to get a therapeutic drug or vaccine and do the clinical trials. So thank you, Hollywood, for making the global population think that drug discovery is so fast!! In many cases we just cannot find cures for diseases, due to many factors.

Another talk I attended was given by Janel Nixon, who I thought had a great intro using another movie, Moneyball, to explain how experts sometimes focus on the wrong metrics when something else altogether is important. Now this is a movie I have seen on a plane and I would recommend it (and Hollywood does a better job on baseball movies than disease movies IMHO). Janel also described her work with the US Navy on a collaborative project to design a boat. What I really liked was that she got all the different parties to agree that their common high-level goal was “protect the nation”. She then described how she could use statistics, in the form of response surface models (which she pushed out as Excel files, because the government only seems to use Microsoft products!), to justify budget, and came to the conclusion that, out of all the variables, a helicopter and gun were important for boat design. She was honest enough to say that her initial data suggested bad or infeasible designs, but she then used this data (predictions) to get more real (or realistic) designs. This talk was so fascinating because, although they have not built the boat yet from the statistical inputs, apparently GE and other companies are using such statistical approaches with engineers to design products from scratch.

My perspective is obviously flavored by my pharmaceutical interests and experience, and all I can think is: how could the approach Janel explained be used in drug discovery? For example, get all the researchers on a project to agree to a high-level goal, “a cure for disease Z”, then work from there to gather all the inputs that feed into drug discovery and use models for all parameters to describe what the therapeutic should look like. If the approach can work on the macro level to design boats, can it also work on the small scale of a molecule? Could we imagine the next drug companies led by folks like Janel or Antonia, with statistics and data mining/modeling driving every decision through the company?

Another nice aspect of the meeting is the mobile app, which is useful for tracking the talks and schedule and lists all the attendees. It is very well thought out, as you would expect from one of the largest privately held software companies.

I have half a day of the meeting left, but I have really enjoyed my first Discovery Summit. So thank you SAS!





