List of organizers:
- Svetoslav Marinov email@example.com (Findwise)
- Paula Petcu firstname.lastname@example.org (Findwise)
- Henrik Strindberg email@example.com (Findwise)
Query logs have mostly been the focus of academic workshops and conferences. At the same time, with the upsurge in high-quality search and user-interaction solutions in the enterprise world, companies have ready access to (very often) large amounts of log data. This data, however, whose core is often natural language queries, remains largely unexplored and neglected. The aim of this workshop is to show best practices and standards, new trends, potentials and techniques for analyzing, drawing conclusions from, utilizing and benefiting from the data. The workshop is meant as a forum where people working in the field of Natural Language Processing and Machine Learning can share their ideas with representatives from the Enterprise world on the topic of query-log analysis.
The workshop will be introduced by Findwise. We will give a short description of how query logs are currently being used within Enterprise Search. The workshop will consist of 5 invited oral presentations. Each presentation will be allotted 15 minutes plus 5 minutes for questions. The workshop will conclude with a 10-20 minute brainstorming/discussion session in which speakers, organizers and audience can discuss the future trends of query-log analysis.
- Findwise: Search Analytics at Findwise
- Ann-Marie Eklund & Dimitrios Kokkinakis (Gothenburg University)
Drug interests revealed by a public health portal
Online health information seeking has become an important part of people's everyday lives. However, studies have shown that many people have problems forming effective queries. In order to develop better support and tools for assisting people in health-related query formation, we have to gain a deeper understanding of their information seeking behaviour in relation to key issues, such as medication and drugs. The present study attempts to understand the semantics of the users' information needs with respect to medication-related information. Search log queries from the Swedish 1177.se health portal were automatically annotated and categorized according to relevant background knowledge sources, namely the Swedish national formulary (FASS), the National Repository for Medicinal Products (NPL) and the Substance hierarchy of the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT). Understanding the semantics of information needs can enable optimization and tailoring of (official) health-related information presented to the online consumer, provide better terminology support and thematic coding of the queries and, in the long run, better models of consumers' information needs. This could perhaps be achieved by integrating and presenting alternatives to query expansion techniques, this time using publicly available data about, e.g., drugs, such as Linking Open Drug Data (LODD), which creates Linked Data representations of such information.
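To make the annotation step concrete, here is a minimal sketch of dictionary-based query categorization. It is not the authors' pipeline: the `LEXICON` entries, the category names and both function names are illustrative assumptions, and a real study would draw its terms from resources such as FASS, NPL or SNOMED CT.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping a term to a category. These entries are
# illustrative only; real entries would come from FASS, NPL or SNOMED CT.
LEXICON = {
    "alvedon": "brand name",
    "paracetamol": "substance",
    "ibuprofen": "substance",
}


def annotate(query, lexicon=LEXICON):
    """Return (token, category) pairs for every query token found in the lexicon."""
    tokens = query.lower().split()
    return [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]


def category_counts(queries, lexicon=LEXICON):
    """Count how often each category occurs across a list of queries."""
    counts = Counter()
    for query in queries:
        counts.update(category for _, category in annotate(query, lexicon))
    return counts
```

Exact token lookup is the simplest possible matcher; it misses misspellings, inflected forms and multi-word terms, all of which are common in real health queries.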
- Theodora Tsikrika (University of Applied Sciences Western Switzerland)
Image search logs analysis and exploitation
This talk describes the analysis and exploitation of image search logs collected in an enterprise setting. The first part of the talk focusses on search log analysis methods. The majority of current search log analysis methods describe natural language queries and search interactions in purely statistical terms without considering the semantics of the available information. A novel semantic search log analysis method will be presented; this method enriches current approaches by exploiting the knowledge in a linked data cloud. Particular focus will be placed on the analysis of users’ behavioural patterns regarding query formulation and modification and on the implications of our findings.
The second part of the talk focusses on the exploitation of search log data for image annotation and retrieval applications. Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. We propose the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive manual annotation effort. Our experimental results indicate that the contribution of search log based training data is positive despite their inherent noise. Both studies were performed using the search logs of the commercial picture portal of a European news agency.
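The idea of deriving concept training data from clickthrough logs can be sketched as follows. This is an assumption-laden illustration, not the method presented in the talk: the log format (pairs of query and clicked image id), the function name `training_examples` and the `min_clicks` threshold are all hypothetical.

```python
from collections import defaultdict


def training_examples(click_log, concept, min_clicks=2):
    """Select noisy positive examples for `concept` from a click log.

    `click_log` is an iterable of (query, image_id) pairs (assumed format).
    An image counts as a positive example if it was clicked at least
    `min_clicks` times for queries containing `concept` as a token.
    """
    clicks = defaultdict(int)
    for query, image_id in click_log:
        if concept in query.lower().split():
            clicks[image_id] += 1
    return sorted(img for img, n in clicks.items() if n >= min_clicks)
```

Raising `min_clicks` trades training set size for label quality, which is one simple way to limit the noise inherent in click data.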
- Fredrik Johansson, Tobias Färdig, Vinay Jethava, Devdatt Dubhashi (Chalmers University of Technology)
Intent-aware Temporal Query Modeling
This presentation will introduce a data-driven approach for capturing the temporal variations in user search behaviour by modeling the dynamic query relationships using query-log data. The dependences between different queries (in terms of the query words and latent user intent) are represented using hypergraphs, which allow us to explore more complex relationships than graph-based approaches. The inferred interactions are used for query keyword suggestion, a key task in web information retrieval. Preliminary experiments using query logs collected from the internal search engine of a large health care organization yield promising results.
- Marina Santini
Search In Focus: Exploratory Study on Query Logs and Actionable Intelligence
Query logs are an important source of information for surmising users' intents. Although Karlgren (2010) points out that “There are several reasons to be cautious in drawing too far-reaching conclusions: we cannot say for sure what the users were after; [...]”, some linguistic problems could be sorted out by applying more advanced text/content analytics, such as register/sublanguage identification and terminology classification (see Friberg Heppin, 2011). In this presentation, I will argue that query logs can be considered a digital textual genre like emails, blogs, chats, tweets and so forth. All these genres contain unstructured information that, still today, is difficult to leverage satisfactorily. The hypothesis that I would like to put forward in this workshop is that query logs might be easier to exploit for extracting useful information and actionable intelligence than other digital genres.
- Frédérique Segond (Objet Direct)
Making sense of query logs to improve understanding of customers and users: the GALATEAS example
The GALATEAS EU project offers digital content providers an innovative approach to understanding users' behavior by analyzing language-based information from transaction logs, and facilitates the development of improved navigation and search technologies for multilingual content access. While most search analysis services seek to make sense of computer-generated records such as access date, time a user spends on a web site, etc., they are not well suited to dealing with human-generated data. These analyses provide valuable insight into the complexity and success of search interaction but offer limited interpretation of the observed searching behavior, as they do not consider the semantics of the users' queries.
In this talk we will concentrate on LangLog, the GALATEAS web service that, based on linguistic and statistical features, analyzes web server and search engine logs with particular focus on extracting meaning from search queries - the human-generated content of the logs - and provides a synthesis of users' behavior and needs. We will present the "nuts and bolts" of using advanced language technologies to process short queries, such as normalization, automatic classification, semantic clustering and semantic disambiguation.
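As an illustration of the first of these steps, query normalization, here is a minimal sketch using only the Python standard library. It does not reflect how LangLog is implemented; the function name `normalize_query` and the specific choices (lowercasing, accent stripping, punctuation removal) are assumptions made for the example.

```python
import re
import unicodedata


def normalize_query(query):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # Decompose characters so that accents become separate combining marks.
    text = unicodedata.normalize("NFKD", query.lower())
    # Drop the combining marks, keeping the base characters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())
```

Accent stripping is a language-dependent choice: it helps group variants such as "café" and "cafe", but it conflates distinct letters in languages like Swedish, so a multilingual service would likely make this step configurable per language.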
- 09:00 - 09:20 Findwise: Introduction & The Findwise Approach
- 09:20 - 09:40 Theodora Tsikrika: Image search logs analysis and exploitation
- 09:40 - 10:10 Frédérique Segond: Making sense of query logs to improve understanding of customers and users: the GALATEAS example
- 10:15 - 10:30 Coffee break
- 10:30 - 10:50 Marina Santini: Search In Focus: Exploratory Study on Query Logs and Actionable Intelligence
- 10:50 - 11:10 Fredrik Johansson & Vinay Jethava: Intent-aware Temporal Query Modeling
- 11:10 - 11:30 Ann-Marie Eklund & Dimitrios Kokkinakis: Drug interests revealed by a public health portal
- 11:30 - 11:45 Discussion
When & Where
Findwise is a growing and award-winning IT consultancy with 100 employees at offices in Sweden, Denmark, Finland, Norway and Poland. Founded in 2005 by a team of experts from the enterprise search industry, Findwise creates search-driven Findability solutions for intranets, web, e-commerce and applications. Findwise is a vendor-independent expert with knowledge and experience from the leading search technology platforms: Autonomy IDOL, Microsoft (SharePoint and FAST Search products), Google GSA, IBM ICA/OmniFind, LucidWorks Enterprise, Splunk, Apptus and Open Source (Apache Lucene/Solr, Hadoop and Elasticsearch).