Wednesday, April 11, 2007

Talk: Managing Uncertainty Using Probabilistic Databases

With the frequency of talks I am attending (the Nyquist-Shannon sampling theorem should give you an idea of how fast I need to be to keep up with the information inflow :-)) and a plate full of other work, I usually don't get much time to randomize ;-) those talks. Enough beating around the bush... let's get started with the $subject.

Talk #1 for the day: by Nilesh Dalvi (PhD candidate, University of Washington), who is being considered for a faculty position at Purdue. It's that time of the year at Purdue when many faculty candidate talks are organized, and I usually don't miss any talk that touches my area of interest. One incentive to go is the free food... I am just kidding :-) Getting even a slot to be considered as a faculty candidate at Purdue is quite tough, so these guys are really, really well prepared. Although databases/data mining is not my main area of research, I do find it interesting. His talk centered on a data mining approach to measuring uncertainty, more or less objectively, when retrieving information from probabilistic databases. Well, he presented what he ate, drank and slept on for the last 5 years. You can get direct access to most of the papers he has authored/co-authored from his home page. His talk rekindled my liking for the probability and statistics techniques I learnt back in high school (well, we call it GCE Advanced Level) and later in my undergrad studies at the University of Moratuwa. Off the top of my head, some of the highlights of his talk:

-how do we rank query results from a probabilistic database?
-how do we efficiently evaluate queries on probabilistic databases? (they have an implementation that adds some extra syntax to existing SQL)
-how do we reason about privacy in data sharing when many views (snapshots, if you will) of a database are published, irrespective of how large our sample is?
-how do we come up with optimization techniques for queries which are NP-complete?
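To make the first two questions a bit more concrete for myself, here is a minimal sketch in Python. It assumes the simplest "tuple-independent" model, where every row carries its own probability of being true and rows are independent of each other. The table, the names and the numbers are made up for illustration, and this is not the system or the SQL extension from the talk. For a query like "which actors appear in the table?", an actor is an answer if at least one of their rows is true, so under independence the answer's probability is 1 minus the product of (1 - p) over that actor's rows; ranking then just means sorting answers by that probability.

```python
from collections import defaultdict
from math import prod

# Hypothetical tuple-independent table: (actor, movie, probability that the row is true).
# Rows are assumed to be independent events.
ACTED_IN = [
    ("Alice", "M1", 0.9),
    ("Alice", "M2", 0.5),
    ("Bob", "M1", 0.7),
    ("Carol", "M3", 0.4),
    ("Carol", "M4", 0.4),
]


def answer_probabilities(rows):
    """Probability of each answer to 'SELECT DISTINCT actor FROM acted_in'.

    An actor is an answer if at least one of their rows is true, so under
    independence P(answer) = 1 - prod(1 - p_i) over that actor's rows.
    """
    groups = defaultdict(list)
    for actor, _movie, p in rows:
        groups[actor].append(p)
    return {actor: 1 - prod(1 - p for p in ps) for actor, ps in groups.items()}


def rank_answers(rows):
    """Return (answer, probability) pairs, most likely answer first."""
    probs = answer_probabilities(rows)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for actor, p in rank_answers(ACTED_IN):
        print(f"{actor}: {p:.2f}")
```

Running it prints Alice (0.95), then Bob (0.70), then Carol (0.64): Carol has two rows, but both are weak, so she still ranks below Bob's single fairly likely row. Queries involving joins cannot, in general, be evaluated with such a simple per-group product formula, which is where the harder evaluation and optimization questions above come in.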
