Bio
Currently, I work in the information integration group at the Information Sciences Institute at the University of Southern California. My dissertation research involved devising data integration solutions for Linked Open Data (LOD), a distributed body of cross-domain, structured datasets that forms the backbone of the Semantic Web ecosystem. According to the most recent studies, LOD currently contains tens of millions of entities and over 1 million documents. Building an automated data integration system for LOD is difficult because it requires the automatic population of an Entity Name System (ENS). An ENS is a thesaurus for entities and an essential component of a full data integration system. Populating an ENS, in turn, requires solving the Entity Resolution (ER) problem. ER is an Artificial Intelligence problem that has been studied for at least fifty years but is still not satisfactorily solved. In the LOD universe, solving ER becomes a Big Data problem, since LOD exhibits all four Big Data facets: velocity, variety, volume and veracity. For my dissertation, I built and evaluated an ER system for LOD, and used the system to populate an ENS in an existing data integration architecture. My dissertation is being published as a book by Springer.