|
Models in Information Retrieval, including Cross Language Retrieval
Information retrieval algorithms have emerged as the key to effective search of large collections of unstructured text such as those found on the Internet. Vector space algorithms are used by Lycos and AltaVista, while Inktomi uses a probabilistic document retrieval algorithm.
The three major theoretical models in information retrieval are Boolean/logic, vector space, and probabilistic. This tutorial will explain the distinguishing characteristics and problems of each model and how each has evolved along different lines. Modern variants of the basic models are also explained. A major application area of IR is cross-language retrieval, which combines linguistic techniques with traditional monolingual retrieval techniques. This is a burgeoning research area and deserves special attention, in particular the techniques of machine translation, bilingual dictionaries, and corpus-based learning. I will also discuss the special challenges of Asian language retrieval (Japanese, Chinese, Indian subcontinent languages). Attendees will obtain a basic understanding of the major theoretical models upon which modern text retrieval software is based. The tutorial should provide each participant with a starting point for further self-education.
Who should attend: This course is designed to provide a fast-paced yet rigorous introduction to the basic models of Information Retrieval for computer scientists in academic and industrial research and development whose background lies outside the Information Retrieval area.
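As a rough illustration of the vector space model named in the description above, the following minimal sketch ranks documents against a query by cosine similarity over TF-IDF weights. It is a generic textbook formulation, not the algorithm used by any of the search engines mentioned; the function names (`tokenize`, `weight`, `build_index`, `cosine`, `rank`), the whitespace tokenizer, and the particular IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems also
    # apply stemming, stop-word removal, and so on.
    return text.lower().split()


def weight(tokens, idf):
    # TF-IDF vector as a sparse dict; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def build_index(docs):
    # Compute inverse document frequencies and one TF-IDF vector per document.
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # one common IDF variant
    return idf, [weight(toks, idf) for toks in tokenized]


def cosine(a, b):
    # Cosine of the angle between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Return (score, document index) pairs, best match first.
    idf, vectors = build_index(docs)
    q = weight(tokenize(query), idf)
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = [
        "probabilistic retrieval with logistic regression",
        "vector space retrieval model",
        "spatial databases and GIS",
    ]
    for score, i in rank("vector space", docs):
        print(f"{score:.3f}  {docs[i]}")
```

A Boolean model would instead return the set of documents satisfying a logical expression over terms, with no ranking, and a probabilistic model would order documents by an estimated probability of relevance.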
About the instructor: Fredric Gey's research specializes in probabilistic document retrieval using logistic regression techniques. He is Principal Investigator of NSF grant IRI 9630765, "Probabilistic Retrieval of Full-Text Document Collections Using Logistic Regression." He is Co-principal Investigator for the ARPA research contract "Search Support for Unfamiliar Metadata Vocabularies," July 1997-June 2000. He directs the UC Berkeley entries to the TREC conferences and is the designated General Chairman for SIGIR99, to be held at the University of California, Berkeley during the summer of 1999. He holds a PhD in Information Science from UC Berkeley.
| |
|
Spatio-Temporal Information Systems: A Conceptual Perspective
Despite the well-established benefits of conceptual modeling for application design, current spatio-temporal models do not cope satisfactorily with designers' requirements. In this tutorial we first identify the goals of a spatio-temporal conceptual model and then describe the MADS model along its structural, spatial, and temporal dimensions. Because the modeling concepts are orthogonal, the proposed model achieves both simplicity (concepts are independent of each other) and expressive power (concepts may be freely combined). The model has been implemented and can be translated to the operational models of existing products. The tutorial briefly describes the architecture we defined to provide users with a set of conceptual interfaces for defining and accessing spatio-temporal information systems. Finally, the tutorial reports on the results of an experiment that allowed us to assess the qualities of the model.
KEYWORDS: information systems, conceptual modeling, data models, spatial databases, temporal databases, database design, GIS, geographic information systems, CASE tools, practical experiments.
Outline of the Tutorial
| |
|
Recent Advances in Data Mining Algorithms on Large Databases
A large number of corporations have invested heavily in information technology to manage their businesses more effectively, and vast amounts of critical business data have been stored in database systems. The volume of this data is expected to grow considerably in the near future. Yet many organizations have been unable to extract valuable insights from the data to guide their marketing strategy, investment, and management policies. One reason is that most of the information is only implicit in the large volume of data. Fortunately, new and sophisticated techniques being developed in the area of data mining can help companies use their data more effectively and extract insightful information from it.
This tutorial describes the fundamental algorithms for data mining, many of which have been proposed in recent years. These techniques include association rules, correlation, causal relationships, clustering, outlier detection, similar time sequences, similar images, sequential patterns, and classification. Because we will cover the technical material in some depth, the audience will also get good exposure to the results in the area and to future research directions.
Who should attend: Professionals who would like an introduction to state-of-the-art data mining techniques and products for large databases.
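To give a flavor of the first technique listed above, association rules, the following minimal sketch counts item pairs in a set of transactions and reports rules of the form "lhs implies rhs" that meet minimum support and confidence thresholds. It is a brute-force enumeration of pairs only, not one of the scalable algorithms the tutorial covers; the function name `pair_rules`, the thresholds, and the sample baskets are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    # Count how many transactions contain each item and each item pair.
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # ignore duplicates within one transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Test the rule in both directions: a implies b, and b implies a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical market-basket data.
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for lhs, rhs, support, confidence in pair_rules(baskets, 0.5, 0.9):
        print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Enumerating every pair in this way does not scale to large databases or longer itemsets, which is the problem that dedicated association rule algorithms address by pruning candidate itemsets.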
Tutorial Outline
Rajeev Rastogi is active in the field of databases and has served as a program committee member for several conferences in the area. His writings have appeared in a number of ACM and IEEE publications and in other professional conferences and journals. His research interests include database systems, storage systems, and knowledge discovery. His most recent research has focused on high-performance transaction systems, continuous-media storage servers, tertiary storage systems, data mining, and multidatabase transaction management.
Kyuseok Shim is currently leading the Serendip Data Mining project at Bell Laboratories. Before that, he worked on Rakesh Agrawal's Quest Data Mining project at IBM Almaden Research Center. He also spent two summers as an intern at Hewlett Packard Laboratories. He received the B.S. degree in Electrical Engineering from Seoul National University, and the M.S. and Ph.D. degrees in Computer Science from the University of Maryland, College Park. His work in the database area focuses on data mining, data warehousing, query processing and query optimization, and constraint-based database systems. He has published several research papers in prestigious database conferences and journals, and has served as a program committee member for database and knowledge discovery conferences.
|