Information Search, Retrieval, Extraction and Mining
Machine Learning/Pattern Recognition
Optical Character Recognition/Digital Libraries
Computer Vision
Systems
Service Science

Information search, retrieval, extraction, and mining

Web information retrieval and document relevance ranking; language and encoding detection; named-entity extraction; WebFountain; TREC Web track: placed 4th internationally in homepage search (2001)
Biomedical informatics: information extraction using machine learning, text categorization, and ontology mapping on MEDLINE and patent documents; placed 2nd in two tracks of the NIST TREC international competition on genomics (2005)
Cross-language information retrieval: worked on the DARPA TIDES project
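As a generic illustration of document relevance ranking, here is a minimal TF-IDF scorer. It is not the ranking used in WebFountain or in the TREC runs, which this page does not describe; the function name `tfidf_rank`, whitespace tokenization, and the plain log(N/df) IDF are simplifying assumptions.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Return document indices ordered by summed TF-IDF weight of the query terms.

    Hypothetical helper for illustration: lowercase whitespace tokens,
    length-normalized term frequency, and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term in tf:  # only terms present in this document contribute
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(toks)) * idf
        scores.append((score, idx))
    # Highest score first; ties keep the original document order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores]


docs = ["genomics text mining", "web search ranking", "web homepage search"]
print(tfidf_rank("homepage search", docs))
```

Rare terms such as "homepage" get a larger IDF than common ones such as "search", so the third document ranks first.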
Statistical machine learning, statistical pattern recognition, model selection
K-means clustering algorithms
MDL-based image segmentation
Hidden Markov models, logistic regression, conditional random fields
ROC curves, monotone regression
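For readers unfamiliar with K-means, the following is a minimal sketch of the standard Lloyd iteration (alternate nearest-centroid assignment and centroid re-estimation). It is a textbook baseline, not the specific K-means variants studied in this work; the function name `kmeans`, random initialization from the input points, and the rule of keeping a centroid when its cluster empties are assumptions.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on a list of equal-length tuples; returns (centroids, labels).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(pts, 2))
```

Each iteration cannot increase the within-cluster sum of squared distances, which is why the loop stops once assignments no longer change.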
Optical character recognition (OCR), digital libraries
Printing/scanning document degradation models: estimation, validation, and generation of synthetic noisy document images
Automatic generation of groundtruth for scanned documents
Layout analysis: language models for layout analysis; evaluation of segmentation algorithms
Multilingual optical character recognition: performance evaluation of multilingual OCR systems, character recognition using morphological decision trees
OCR-based Arabic named-entity recognition: impact of OCR errors
Language identification (part of OmniPage OCR product)
Visual tools for groundtruth creation
Fulltext access for microfilm newspaper archives
ARPA CD-ROM project: created a CD-ROM of scanned documents and corresponding ground truth for evaluating the performance of document understanding systems.
Systems
Storage systems: Erasure codes for storage systems; evaluation of erasure codes
Autonomic computing/databases: Dynamic optimization of database systems using feedback from storage systems
Web services: automatic database layout to meet QoS agreements
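To make "erasure codes for storage systems" concrete, here is a minimal sketch of the simplest erasure code: a single XOR parity block per stripe, as in RAID-5, which tolerates the loss of any one block. It is not necessarily one of the codes evaluated in this work, and the helper names `xor_blocks`, `encode`, and `recover` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the data blocks followed by parity = d0 ^ d1 ^ ... ^ dn."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild one missing block (data or parity) by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = encode([b"abcd", b"efgh", b"ijkl"])
print(recover(stripe, 1))
```

Recovery works because XOR is its own inverse: XOR-ing the parity with every surviving data block cancels them and leaves the lost one. Codes that tolerate multiple failures, such as Reed-Solomon, generalize this idea.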
Computer vision
Object Recognition: appearance-based parts and relations, statistical validation of software
Image Databases: minimum description length image segmentation techniques for the IBM QBIC (Query By Image Content) Database
Performance Evaluation: quantitative performance evaluation of detection algorithms, psychophysical methods
Mathematical Morphology: Minkowski shape (binary and grayscale) decomposition, statistical morphology, document image restoration
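The motivation for Minkowski shape decomposition can be seen in a small binary example. Dilation is a Minkowski sum and is associative, so dilating by a structuring element B = B1 ⊕ B2 equals dilating by B1 and then by B2, which is cheaper when B1 and B2 are small. The sketch below represents binary images as sets of pixel coordinates; the helper name `minkowski_sum` and the 3x3 example are illustrative assumptions, not the decomposition methods developed in this work.

```python
def minkowski_sum(a, b):
    """Binary dilation as a Minkowski sum of 2-D point sets: {p + q : p in A, q in B}."""
    return {(pa[0] + pb[0], pa[1] + pb[1]) for pa in a for pb in b}


# A 3x3 square structuring element and its decomposition into two 3-pixel lines.
square3 = {(x, y) for x in range(-1, 2) for y in range(-1, 2)}
hline = {(-1, 0), (0, 0), (1, 0)}
vline = {(0, -1), (0, 0), (0, 1)}

image = {(0, 0), (5, 5), (5, 6)}  # a toy binary image given as foreground pixels
direct = minkowski_sum(image, square3)                    # 9 offsets per pixel
two_pass = minkowski_sum(minkowski_sum(image, hline), vline)  # 3 + 3 offsets
print(direct == two_pass)
```

For a 3x3 square the saving is small (6 offsets per pixel instead of 9), but for an n x n square the two-pass form needs 2n offsets instead of n².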
Services Science
IBM is in the midst of a major change. Because the services division generates more than 50 percent of IBM's revenue (approximately $50 billion), the research division has created a new Services Research group tasked with modeling and understanding complex service organizations. The idea is to use these quantitative models to simulate what-if scenarios, create policies, and so on. We hope these models will help reduce costs and thus increase profits. I am currently part of this brand-new Services Research group.

For more, see IBM's SSME site.
