Towards a tag-based user model: how can user model benefit from tags?
Francesca Carmagnola, Federica Cena, Omar Cortassa, Cristina Gena, Ilaria Torre
Dipartimento di Informatica, Università di Torino, Italy
Abstract. Social tagging is a kind of social annotation by which users label resources, typically web objects, by means of keywords with the goal of sharing, discovering and retrieving them. In this paper we investigate the possibility of exploiting the user's tagging activity in order to infer knowledge about the user. Up to now the relation between tagging and user modeling does not seem to have been investigated in depth. Given the widespread diffusion of web tools for collaborative tagging, it is interesting to understand how user modeling can benefit from this feedback.
1 Introduction and state of the art
With the beginning of the new millennium, the Web has undergone a major transformation, which led to the explosion of so-called “social software” and to the definition of a new paradigm of the Web, Web 2.0. This new paradigm offers users several ways to participate in the creation of web content: it makes it easy and stimulating to tag (label resources by means of keywords), insert new content, share objects, provide comments and so on. These activities are typically defined as “social” or “collaborative” annotations.
In the last two years several projects exploiting social navigation have been developed in the field of adaptive systems. Ahn et al. use social annotation to improve information visualization by presenting visual indicators that provide information about user and group annotations of resources; Bateman et al. propose a framework for integrating social tagging into a natural language ontology. Finally, some works make tags themselves the object of adaptation, e.g. Xu et al. Up to now nobody seems to have exploited tag annotation in order to enrich and extend the user model. van Setten et al. provide some ideas about how information systems can adapt themselves using annotations to support users in finding the information they need. Moreover, they indicate that this profile can then be used for recommendation, using techniques such as collaborative filtering or case-based reasoning. This is a possible use of tags, but it would be even more interesting to semantically analyze tags and reason on them in order to infer new knowledge about a specific user.
The aim of our work is indeed to understand how tags can be used for user modeling, and specifically how tags can help to increase and improve the knowledge an adaptive system has about its users. This work starts from a recommender system, iCITY, a web-based multi-device application that provides suggestions on cultural events in the city of Turin and allows users to tag the events. Events are classified on the basis of a domain ontology, suggestions are based on the user model and the user's location, and the user interface is adapted to the device being used.
The paper is structured as follows. In Sec. 2 we analyze reasoning on the action of tagging and on the content of tags. We also present a test we carried out to support our analysis. Finally, Sec. 3 concludes the paper and presents some open issues.
2 Reasoning on tags
Tags can be useful in increasing and optimizing the knowledge of an adaptive system about a user. What we want to investigate in the first part of this section is the relevance of tagging (meant as the action performed by a user when adding tags), showing how and why this action could represent important feedback for user profiling. Thus, we start by analyzing the user model of iCITY and, in particular, the user dimensions that could be inferred from the action of tagging: i) the user’s interactivity level, namely the measure of how much the user interacts with the system. It is related, on the one hand, to the willingness of the user to interact with the application and, on the other hand, to the actual possibility for the user to interact with it. The action of tagging seems to be a relevant indicator of the user's interactivity level, since it requires more effort than other user actions; ii) the user’s organization level, which identifies the user's attitude towards organizing and categorizing things. In all the tagging services available on the web, the main motivation for users to tag is to satisfy the need to organize resources in a personal way in order to better visualize, store and retrieve them later; iii) the user’s interest in a content item: if a user spends time selecting or inserting tags on a specific item, she is probably interested in the item.
Now, we want to investigate the possibility of reasoning on the semantics of the specific tags inserted by the user in order to enrich the user model by refining the value of existing user features and inferring new user features. To accomplish this goal, the following three main tasks seem to be necessary.
1) Categorization of tags. In order to explore how iCITY users tag events and, consequently, how this knowledge can be exploited for user modeling, we carried out an initial evaluation. We selected a list of events from the RSS channel that feeds iCITY, to simulate the tagging activity on the web site.
We chose 15 events belonging to different categories (art, theatre, cinema, music, books), and then we divided the items into three homogeneous groups to be presented to three different groups of users. We selected 39 users from among students (23 subjects), researchers working in our departments (10 subjects), and relatives and friends (6 subjects). We organized the experimental task as follows: we showed each user a printed list containing 5 events and their descriptions, and we asked them to tag the events. They could freely write their own tags (up to 5 tags per event) or choose them from the words contained in the event description (the reason for this second option is that iCITY also suggests tags automatically extracted from the event description). We collected 217 tags and we analyzed them in an inductive way, following the principles of Grounded Theory. The two main categories that emerged from our analysis are the following: proposed tags (tags derived from the event description): 76% of tags; and free tags (not derived from the description): 24% of tags. We then analyzed the tags also taking into account other properties related to the tagged event. Thus other categories emerged: specific tags (tags that add some specification about the event): 61.19%; generic tags (tags that classify the event in a more general dimension): 22.37%; contextual tags (tags about the context of the event: location, time, etc.): 13.24%; synonym tags (tags that are synonyms of terms in the event description): 2.74%; invented tags (e.g. unhyphenated compound words like “PicassoExhibition”): 2.17%.
Considering the gap between our test and the real online service that iCITY provides, the next step of our analysis was to integrate the classification obtained from our test with the categories that could not be detected with it. Thus, first of all we included the categories Subjective tags (tags that express the user's opinions and emotions) and Organizational tags (tags that identify personal stuff).
Then, we took into account the types of tags suggested by iCITY, which suggests tags on the basis of i) the most popular tags in the community; ii) the most used tags previously inserted by the user, and iii) the tags recommended on the basis of the user model features combined with the event description. As a consequence, our classification is extended with the following three categories: Most popular tags, Most used tags and Recommended tags. These categories will be treated as subclasses of the general class Proposed tags.
2) How to automatically analyse tags. At this point of our analysis, the main problem to face is how to transform all the above categories into machine-processable information in order to reason on them. According to the above classification, some tags can be analyzed by exploiting the iCITY events ontology, and other categories of tags can be detected on the basis of the user's behaviour, but a better solution might be to analyze tags by means of a natural language ontology, such as WordNet. In the following we provide, for each category of our classification, some ideas on how to analyze it:
- proposed tags/free tags: these are the easiest categories to detect, since they are based on the user's selections and the proposed selections are controlled by the system. Thus, it is possible to check whether the tags come from the system’s inference (recommended tags), from the most used tags of the user (most used tags), or from other users (most popular tags), whether they are inserted for the first time by the user (free tags), and in this specific case also whether they do not belong to the WordNet dictionary (invented tags);
- generic/specific tags: for each event, tags are recognized as “generic” if they are mapped onto the upper categories of the iCITY ontology, and as “specific” if they are mapped onto instances or lower concepts of WordNet related to the categories of the ontology;
- synonym tags: inserted tags are compared with the WordNet vocabulary in order to identify synonyms of the words used in the description of the specific event;
- contextual tags: by means of the WordNet vocabulary, iCITY tries to discover whether the tag is related to the context of the event. This is possible only for tags with a well-defined format (e.g. time) or tags that are instances of concepts previously identified as contextual in WordNet (e.g. location-based concepts);
- subjective tags: these tags express the user's opinions and emotions and, again, they can be identified by means of WordNet;
- organizational tags: these tags can be used to organize events, and thus it is difficult to recognize them by using WordNet. Tags can be assumed to be organizational if the same user uses them with a high frequency.
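As an illustration only, the detection heuristics listed above could be sketched as follows. This is a minimal sketch and not the iCITY implementation: the small word sets stand in for the iCITY ontology and for WordNet, and the function name classify_tag, its parameters and the frequency threshold org_threshold are all assumptions introduced for the example.

```python
import re

# Toy stand-ins (assumptions): a real system would query the iCITY
# ontology and WordNet instead of these hand-written sets.
UPPER_CATEGORIES = {"art", "theatre", "cinema", "music", "books"}
LOWER_CONCEPTS = {"jazz", "painting", "thriller"}
SYNONYMS = {"exhibition": {"show", "exposition"}, "film": {"movie"}}
SUBJECTIVE_WORDS = {"boring", "great", "lovely"}
CONTEXT_WORDS = {"turin", "museum", "evening"}
KNOWN_WORDS = (UPPER_CATEGORIES | LOWER_CONCEPTS | set(SYNONYMS)
               | set().union(*SYNONYMS.values())
               | SUBJECTIVE_WORDS | CONTEXT_WORDS)
TIME_FORMAT = re.compile(r"\d{1,2}[:.]\d{2}")  # e.g. "21:00"


def classify_tag(tag, description_words, recommended, most_used,
                 most_popular, user_tag_counts, org_threshold=3):
    """Return the set of category labels that apply to one tag."""
    t = tag.strip().lower()
    labels = set()
    is_time = TIME_FORMAT.fullmatch(t) is not None

    # Proposed vs. free: the system knows which tags it suggested.
    if t in recommended:
        labels.add("recommended")
    if t in most_used:
        labels.add("most_used")
    if t in most_popular:
        labels.add("most_popular")
    if labels or t in description_words:
        labels.add("proposed")
    else:
        labels.add("free")
        # Free tags unknown to the lexicon are treated as invented.
        if not is_time and t not in KNOWN_WORDS:
            labels.add("invented")

    # Generic vs. specific: upper ontology categories vs. lower concepts.
    if t in UPPER_CATEGORIES:
        labels.add("generic")
    elif t in LOWER_CONCEPTS:
        labels.add("specific")

    # Synonym of a word that appears in the event description.
    if any(t in SYNONYMS.get(word, ()) for word in description_words):
        labels.add("synonym")

    # Contextual: well-defined format (time) or a known context concept.
    if is_time or t in CONTEXT_WORDS:
        labels.add("contextual")

    if t in SUBJECTIVE_WORDS:
        labels.add("subjective")

    # Organizational: the same user reuses the tag with high frequency.
    if user_tag_counts.get(t, 0) >= org_threshold:
        labels.add("organizational")
    return labels


description = {"picasso", "exhibition", "turin", "art"}
for example in ["Art", "PicassoExhibition", "show", "21:00"]:
    print(example, sorted(classify_tag(example, description, {"art"},
                                       {"todo"}, {"picasso"}, {"todo": 5})))
```

A tag may receive several labels at once, which matches the classification above, where for instance a proposed tag can also be generic or contextual.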
Finally, we also consider the meaning of the tag: WordNet can return the category to which the tag belongs, and this could be useful in order to discover whether the tag pertains to the same category as the event. E.g., a user could tag a movie like “Ray”, about Ray Charles’s life, with the tag “jazz”, which is a lower concept of WordNet category three. A final remark on this section regards a major problem we have not taken into account so far: the possible polysemy of tags, which can make the use of WordNet difficult. For a discussion of this issue see Levialdi et al.
3) Matching between tags and user model dimensions. Starting from this classification of tags, we analyzed how each tag category can be relevant for user modeling dimensions. If the user selects one of the proposed tags, we can infer a medium level of participation in the tagging activity; we can also assume a low level of knowledge of the content and a medium level of organization (she may not be very interested in categorizing the events well). All these inferences are weak, since the user's behaviour could equally be due to laziness or to the fact that she simply found the right tag among those suggested by the system. Looking more specifically at the type of proposed tag, if the user selects the most popular tags we can weakly infer that she trusts the other people in the community and that she conforms to the general opinion (conformism), while if she always uses the same tags after some interactions, we could infer a propensity for regular habits (orderliness). Finally, if the user selects tags recommended by the system, we could infer a high level of trust in the system. On the contrary, if the user uses a lot of free tags, we can make other assumptions. Her knowledge of the topic is probably medium-high, because inserting free tags requires specific knowledge of the area.
It could also mean high creativity, great participation in the tagging activity (because using personal words requires more effort than simply selecting from suggested tags) and a high level of organization. The last three values are even higher when the free tag is an invented one. If the user uses specific words, this could indicate great knowledge of the topic; on the contrary, if she uses generic words, this does not necessarily imply low knowledge. In fact, if the generic words are appropriate, it could mean a high level of knowledge that allows the use of highly abstract concepts. The use of synonyms could again imply good knowledge of the topic and a high level of creativity, while contextual tags could mean that the user has high practical knowledge, probably derived from direct participation in the event, and thus a high interest in it. The meaning of the tag could reveal some cross-categorization, which could in turn indicate high knowledge of the event. Finally, organizational tags express a strong attitude towards organization and creativity, and subjective tags reveal a tendency to personalize the interaction.
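The matching described in this section can be read as a table of weak rules from tag categories to user model dimensions. The following sketch shows one possible encoding, under several assumptions that are not part of our analysis: the numeric scale in LEVELS, the choice of "high" for dimensions where the text gives no explicit level (conformism, orderliness, personalization), the averaging of evidence, and all identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumed numeric scale for the qualitative levels used in the text.
LEVELS = {"low": 1, "medium": 2, "medium-high": 2.5, "high": 3, "very high": 4}

# Weak inference rules: tag category -> {user model dimension: level}.
# Generic tags have no rule, since they do not imply low knowledge.
RULES = {
    "proposed": {"participation": "medium", "knowledge": "low",
                 "organization": "medium"},
    "most_popular": {"conformism": "high"},      # level is an assumption
    "most_used": {"orderliness": "high"},        # level is an assumption
    "recommended": {"trust_in_system": "high"},
    "free": {"knowledge": "medium-high", "creativity": "high",
             "participation": "high", "organization": "high"},
    "invented": {"creativity": "very high", "participation": "very high",
                 "organization": "very high"},
    "specific": {"knowledge": "high"},
    "synonym": {"knowledge": "high", "creativity": "high"},
    "contextual": {"practical_knowledge": "high", "interest": "high"},
    "organizational": {"organization": "high", "creativity": "high"},
    "subjective": {"personalization": "high"},   # level is an assumption
}


def infer_dimensions(tag_labels):
    """Average the evidence per dimension over all tags of one user.

    tag_labels is an iterable of label sets, one set per inserted tag.
    """
    evidence = defaultdict(list)
    for labels in tag_labels:
        for label in labels:
            for dimension, level in RULES.get(label, {}).items():
                evidence[dimension].append(LEVELS[level])
    return {dim: round(mean(values), 2) for dim, values in evidence.items()}


profile = infer_dimensions([{"proposed", "recommended"}, {"free", "invented"}])
print(profile)
```

Averaging is only one way to combine weak evidence; since all these inferences are weak, a real system would probably also track how many observations support each value.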
3 Conclusion and Future work
In this paper we have analyzed the possible contribution that the analysis of tagging activity can bring to user modeling. The next step is to verify these hypotheses with an in-depth evaluation. At the same time we are investigating the possibility of exploiting the lists of tags publicly available in the accounts (expressed through URLs) of the web communities the user belongs to, since most of them make the list publicly available in some XML-based syntax. By importing such tags (and mapping them onto the domain ontology) it would be possible to enrich and extend the user model and consequently improve and refine recommendations.
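To give an idea of what importing such tag lists could involve, here is a minimal sketch. The XML layout (a tags element containing tag elements with an optional count attribute), the TAG_TO_CATEGORY mapping and both function names are hypothetical; each community exposes its own syntax, and a real mapping would go through the domain ontology rather than a dictionary.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical mapping from external tags to domain ontology categories.
TAG_TO_CATEGORY = {"jazz": "music", "opera": "music",
                   "picasso": "art", "thriller": "cinema"}


def import_tags(xml_text):
    """Parse a public tag list and return usage counts per tag."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for node in root.iter("tag"):
        name = (node.text or "").strip().lower()
        if name:
            # A missing count attribute is read as a single use.
            counts[name] += int(node.get("count", "1"))
    return counts


def map_to_ontology(tag_counts):
    """Aggregate tag counts per ontology category; list unmapped tags."""
    interests = Counter()
    unmapped = []
    for tag, count in tag_counts.items():
        category = TAG_TO_CATEGORY.get(tag)
        if category is None:
            unmapped.append(tag)
        else:
            interests[category] += count
    return interests, sorted(unmapped)


sample = ('<tags><tag count="3">Jazz</tag><tag>opera</tag>'
          '<tag count="2">cooking</tag></tags>')
interests, unmapped = map_to_ontology(import_tags(sample))
print(dict(interests), unmapped)
```

Tags that cannot be mapped are returned separately rather than discarded, since they may still be useful once the ontology is extended.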
1. Ahn J., Farzan R., Brusilovsky P., A Two-Level Adaptive Visualization for Information Access to Open-Corpus Educational Resources, Workshop on Social Navigation and Community-Based Adaptation Technologies (AH 06), June 20th, 2006, Dublin, Ireland.
2. Allport G. W., Pattern and growth in personality. Holt, Rinehart and Winston, NY (1965).
3. Bateman S., Brooks C., McCalla G., Collaborative Tagging Approaches for Ontological Metadata in Adaptive E-Learning Systems, Workshop on Applications of Semantic Web Technologies for E-Learning (AH 06), June 20th, 2006, Dublin, Ireland.
4. Carmagnola F., Cena F., Console L., Cortassa O., Ferri M., Gena C., Goy A., Parena M., Torre I., Toso A., Vernero F., Vellar A., iCITY – an adaptive social mobile guide for cultural events, Workshop on Mobile Guide, October 18th, 2006, Italy.
5. Dix A., Levialdi S., Malizia A., Semantic Halo for Collaboration Tagging Systems, Workshop on Social Navigation and Community-Based Adaptation Technologies (AH 06), June 20th, 2006, Dublin, Ireland.
6. Strauss A. L., Corbin J. M., Basics of qualitative research: techniques and procedures for developing grounded theory. SAGE, Thousand Oaks, 1998.
7. van Setten M., Brussee R., van Vliet H., Gazendam L., van Houten Y., Veenstra M., On the Importance of "Who Tagged What", Workshop on Social Navigation and Community-Based Adaptation Technologies (AH 06), June 20th, 2006, Dublin, Ireland.
8. Xu Z., Fu Y., Mao J., Su D., Towards the Semantic Web: Collaborative Tag Suggestions, Workshop on Collaborative Web Tagging (WWW06), May 22nd, 2006, Edinburgh, Scotland.