Creating controlled vocabularies for smart search at WUR

Top, J.L.; Öztürk, B.; Hoekstra, J.J.; Vlek, R.J.


Searching text or documents in large unstructured and semi-structured data sources is not trivial. A search engine is supposed to make search more efficient and effective. It helps the user build a query that can be applied automatically to extract the information that matches the user’s intention. Controlled vocabularies and ontologies help improve the search and make it domain-aware. In this document, we explain the notion of a controlled vocabulary, its construction methods and its use in smart search engines. Controlled vocabularies and ontologies can be constructed manually with several existing tools, but these require specific technical skills. Therefore, we refer to the ROC+ tool, developed within WFBR, which helps domain researchers build a controlled vocabulary faster and more easily. Another application, the TALK tool, was developed to start a discussion on a specific term in multidisciplinary teams. It proposes automatically generated associated terms, which can then be exported in a machine-processable format as input for ROC+. We also briefly mention the use of NLP technology in text mining, where domain-related concepts can be automatically extracted from PDF documents. Finally, some examples of controlled vocabularies developed within WFBR are listed for further reference.