CMSC 828K 
Data Structures and Algorithms for Information Retrieval 
Instructors: Tapas Kanungo and David Mount 
Spring 2000 
Presentation Schedule and Slides
  
 Course Description 
We will discuss various data structures and algorithms that are
used in information retrieval systems. The emphasis will be on
understanding the details of the algorithms, the data structures
used, the space-time complexities. We will also attempt to propose 
alternate algorithms that are more efficient than the ones used
in the literature. 
Student Participation 
Students are responsible for carrying out the following activities.
-  Select one or a few articles or chapters from the books.  Discuss
   your choices with either or both of the course coordinators.  Ideally
   you should make your articles available for copying at least one day
   prior to the previous class period, so that copies can be distributed
   to everyone.
 -  Give a one hour presentation discussing the details of algorithm.
   The first 15 minutes should be spent on the background and
   introduction, and the rest of the time should be spent on technical
   details and methods.
 -  Lead a discussion on the paper presented: What was the key problem?
   Are there other ways of addressing this problem?  Are there faster
   methods?  What are the strengths and weaknesses of the paper?  What
   other improvements or tradeoffs are there?
 -  Submit a summary report of roughly four pages with a bibliography
   citing related research.  The report should summarize the paper and
   the discussion.  The document should be in Latex and the bibliography
   should be in Bibtex.  We plan on combining these summaries at the
   end of the semester to form a survey to the area and a bibliography.
   Samples will be given to help in preparing these.
 
 Potential Topics 
The following list is intended to serve as a starting point for ideas
for topics.  Feel free to suggest others that you think may be
interesting and appropriate.  Inverted files, lexical analysis and stop
lists, stemming algorithms; string searching algorithms; clustering
algorithms; kd trees, nearest neighbor searching; singular value
decomposition/latent semantic indexing;  probabilistic searching;
hidden Markov models (HMMs) and HMM-based searching; searching scanned
documents, photographs, video and speech; searching in the compressed domain.
 Research Resources 
-   Modern Information Retrieval 
 
 Ricardo Baeza-Yates and Berthier Ribeiro-Neto 
 
 Addison Wesley, 1999.
 -   Information Retrieval: Data Structures and Algorithms 
 
        William B. Frakes and Ricardo Baeza-Yates (eds.) 
 
         Prentice Hall, 1992. 
         
         Code for Chapters 7, 8, 9, 12, and 13 
 -   Managing Gigabytes: Compressing and Indexing Documents and Images 
      Ian H. Witten, Alistair Moffat and Timothy C. Bell 
 
        Van Nostrand Reinhold, 1994.
 -  Readings in Information Retrieval 
 
        Karen Sparck Jones and Peter Wilett (eds.) 
 
        Morgan Kaufmann, 1997.
 -  Statistical Pattern Recognition 
        Keinosuke Fukunaga 
 
        Academic Press, 1990
 -  Statistical Pattern Recognition: A Review 
     A. K. Jain, R. Duin and J. Mao 
     IEEE Trans. on PAMI, January 2000
      Tech Report version  
 -  The ACM SIGIR conference proceedings website: 
http://www.acm.org/pubs/contents/proceedings/series/sigir/ 
 
Grading
Grades will be based on the presentation and class participation.
This is a 2-credit seminar. However, if a student is interested in
doing a project he/she can earn 3 credits.   Please contact one of
the coordinators to discuss possible projects.
Contact Information
Meeting Time and Place
| Day       | Time    | Room      | 
| Tuesday   |  3:30- 5:30  |   AVW 4406  | 
Presentation Schedule and Slides
  
 Copy all the slides (tar+gz)