Collections and Tools
-
Disambiguation
- Author name disambiguation collections
-
SyGAR
-
Wiki Quality
- Collections from Wikipedia, Muppets, and Star Wars wikis, used to predict the quality of wiki articles. More information about the features used can be found at: http://dl.acm.org/citation.cfm?id=2063507
-
TD 2003
- Similarity matrix based on LETOR 3.0
-
TD 2004
- Similarity matrix based on LETOR 3.0
-
Temporal Contexts - Datasets
- Bag-of-words representation of documents from the ACM-DL and MEDLINE datasets. Dataset format: doc_id;year;class;{term_id;term_frequency;}+ Each line corresponds to one document. The first field is the unique document identifier, the second is its year of creation, and the third is its class. The remaining fields are (term identifier;term frequency) pairs. All fields are separated by ';'.
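The line format described above could be read with a few lines of Python. This is a minimal sketch, not code shipped with the datasets: the names `Document` and `parse_line` are hypothetical, the sample line is made up for illustration, and it assumes that the year and the term frequencies are integers and that a line may end with a trailing ';'.

```python
from typing import Dict, NamedTuple


class Document(NamedTuple):
    """One parsed line (hypothetical container, not part of the dataset)."""
    doc_id: str
    year: int
    label: str              # the "class" field; "class" is a Python keyword
    terms: Dict[str, int]   # term_id -> term_frequency


def parse_line(line: str) -> Document:
    """Parse 'doc_id;year;class;{term_id;term_frequency;}+' into a Document."""
    # Remove the newline and the trailing ';' that follows the last pair.
    fields = line.strip().rstrip(";").split(";")

    # Three header fields plus at least one complete (term_id, frequency) pair.
    if len(fields) < 5 or (len(fields) - 3) % 2 != 0:
        raise ValueError(f"malformed line: {line!r}")

    doc_id, year, label = fields[:3]
    pairs = fields[3:]

    # Assumption: frequencies are integer counts.
    terms = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs), 2)}
    return Document(doc_id, int(year), label, terms)


# Made-up sample line, used only to illustrate the format.
sample = "17;2003;4;12;3;98;1;\n"
doc = parse_line(sample)
print(doc.doc_id, doc.year, doc.label, doc.terms)
```

Identifiers are kept as strings so that nothing is lost if the files use non-numeric ids; convert them to `int` if the data allows it.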
-
INDi
- Incremental unsupervised Name Disambiguation
-
SchemaMatching
- Repository of files related to schema matching using genetic programming
-
Ranking Q&A Forums
- Data for the Q&A Forums


