Collections and Tools

Folder Disambiguation
Author name disambiguation collections
Folder SyGAR
Folder Wiki Quality
Collections from the Wikipedia, Muppets, and Star Wars wikis, used to predict the quality of wiki articles. More information about the features used can be found at: and
Folder TD 2003
Similarity matrix based on LETOR 3.0
Folder TD 2004
Similarity matrix based on LETOR 3.0
Folder Temporal Contexts - Datasets
Bag-of-words representation of documents from the ACM-DL and MEDLINE datasets. Dataset format: doc_id;year;class;{term_id;term_frequency;}+ Each line corresponds to one document. The first field is the unique document identifier, the second is the document's year of creation, and the third is its class. The remaining fields are (term identifier;term frequency) pairs. All fields are separated by ';'.
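The record format above can be read with a few lines of Python. The following is a minimal sketch, not code shipped with the datasets: the names Document and parse_line are hypothetical, the sample record is invented for illustration, and it assumes that years and term frequencies are integers and that each line ends with the ';' that closes the last pair. Identifiers and classes are kept as strings because the description does not say whether they are numeric.

```python
from typing import Dict, NamedTuple


class Document(NamedTuple):
    doc_id: str
    year: int
    label: str             # the "class" field; renamed because `class` is a Python keyword
    terms: Dict[str, int]  # term_id -> term_frequency


def parse_line(line: str) -> Document:
    """Parse one 'doc_id;year;class;{term_id;term_frequency;}+' record."""
    # Drop surrounding whitespace and the trailing ';' that closes the last pair.
    fields = line.strip().rstrip(";").split(";")
    # Need three header fields plus at least one complete (term_id, frequency) pair.
    if len(fields) < 5 or (len(fields) - 3) % 2 != 0:
        raise ValueError(f"malformed record: {line!r}")
    doc_id, year, label = fields[:3]
    pairs = fields[3:]
    terms: Dict[str, int] = {}
    for term_id, freq in zip(pairs[0::2], pairs[1::2]):
        # Assumption: if a term id repeats within a line, its frequencies are summed.
        terms[term_id] = terms.get(term_id, 0) + int(freq)
    return Document(doc_id, int(year), label, terms)


# Invented sample record: document 42, year 2001, class 3, two terms.
doc = parse_line("42;2001;3;7;2;19;1;")
print(doc.year, doc.terms)
```

Applying parse_line to each line of a dataset file yields one Document per line; a record with a dangling term identifier (no frequency) raises ValueError rather than being silently truncated.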
Folder INDi
Incremental unsupervised Name Disambiguation
Folder SchemaMatching
Repository of files related to schema matching using genetic programming
Folder Ranking Q&A Forums
Data for the Q&A Forums. More information can be found at and
Folder Web 2.0 Quality Assessment
Folder Comparative Performance Evaluation of Relational and NoSQL Databases for Spatial and Mobile Applications [Query Template Guide]
Folder Lattes Expertise Retrieval
Lattes Expertise Retrieval (LExR) test collection for research on academic expertise retrieval.
Folder A Two-Stage Machine Learning Approach for Temporally-Robust Text Classification
ReproZip package for replicating the experiments reported in the paper "A Two-Stage Machine Learning Approach for Temporally-Robust Text Classification".