Information Search, Retrieval, Extraction and Mining
Machine Learning/Pattern Recognition
Optical Character Recognition/Digital Libraries
Computer Vision
Systems
Service Science
- Web information retrieval and document relevance ranking; language and encoding detection; named entity extraction; WebFountain; TREC Web
track -- placed 4th internationally in homepage search (2001)
- Biomedical informatics: information extraction using machine learning, text categorization, ontology mapping from MEDLINE and patent documents; placed 2nd in two tracks at the NIST/TREC international competition
on genomics (2005)
- Cross-language information retrieval: worked on the DARPA TIDES information project
- K-means clustering algorithms
- MDL-based image segmentation
- Hidden Markov models, logistic regression, conditional random fields
- ROCs, monotone regression
- Printing/scanning document degradation models: estimation, validation, and generation of
synthetic noisy document images
- Automatic generation of groundtruth for scanned documents
- Layout analysis: language models for layout analysis; evaluation of segmentation algorithms
- Multilingual optical character recognition: performance evaluation of multilingual OCR systems,
character recognition using morphological decision trees
- OCR-based Arabic named-entity recognition -- impact of OCR errors
- Language identification (part of OmniPage OCR product)
- Visual tools for groundtruth creation
- Fulltext access for microfilm newspaper archives
- ARPA CD-ROM Project: Created a CD-ROM with scanned documents and corresponding ground-truth for performance evaluation of document understanding systems.
- Storage systems: Erasure codes for storage systems; evaluation of erasure codes
- Autonomic computing/databases: Dynamic optimization of database systems using feedback
from storage systems
- Web services: automatic database layout to meet QoS agreement
- Object Recognition: appearance-based parts and relations, statistical validation of software
- Image Databases: minimum description length image segmentation techniques for the IBM QBIC (Query By Image Content) Database
- Performance Evaluation: quantitative performance evaluation of detection algorithms, psychophysical methods
- Mathematical Morphology: Minkowski shape (binary and grayscale) decomposition, statistical morphology, document image restoration
IBM is in the midst of a big change. Since more than 50 percent of its revenue (approx. $50 billion)
is generated by the services division, the research division has created a new Services Research
group tasked with modeling and understanding complex service organizations. The idea
is to use these quantitative models to simulate what-if scenarios, create policies, etc. The hope
is that these models will help us reduce costs and thus increase profits. I am currently part of this
brand new Services Research group.
For more, see IBM's SSME site.