Think locally, search globally: context-based information retrieval
Alexandra Dumitrescu, Simone Santini
Escuela Politécnica Superior, Universidad Autónoma de Madrid, Spain
Abstract. This paper proposes the use of local context as a way to approach semantic information retrieval. In our model, rather than trying to formalize the contents of the documents among which the search is done (e.g. by formally annotating them), we try to automatically build a representation of the context in which the search is done. We consider that search is always done as part of an activity, and that the search context is determined by the activity that is carried out at a particular moment. Many activities that a person is engaged in are carried out with the help of a computer, and leave a digital trace in the files that are created in connection with them. We take these files, and the structural relations between the directories in which they are stored, as a starting point for the representation of context. In a suitable word space, we train a self-organizing map, thus creating a latent semantic manifold that, modified by the query terms introduced by a user, is used to create a contextual query.
1 Introduction
At last year's ICSC, one of us (Santini) presented a paper that criticized the wide adoption, in the semantic web, of a Tarskian point of view of semantics [15]. The gist of the argument was that even the philosophy of language that started with Tarskian semantics [16] (viz. analytical philosophy and, in general, the anglo-saxon philosophy of language of the mid-twentieth century) has come to reject it, at least since Quine and Davidson [11], coming to a point of view that it would be only mildly daring to call proto-hermeneutic, a point of view, at least, that recognizes the importance of social factors and of the interpretative context in the formation of meaning. It would be a gross misunderstanding to claim that analytical philosophers are now sailing by Heideggerian wind but (if we may continue the metaphor) they are at least not in irons. We argued that the semantic web was philosophically out-of-date in that it
doesn't acknowledge the dialectic, historical nature of the signification process. The paper concluded with some general indications on how a proper formalization of the local context of a search might provide some sort of hermeneutical key to unlock the door to the fabled digital semantic bonanza. Last year's paper was, as was painfully noticed by those who attended the presentation, an unapologetically philosophical ombudsman of annotation semantics. This paper presents some early technical consequences that can be drawn from last year's position paper.

We present a framework for what we call context-based retrieval [4]. Consistently with our presuppositions, our interest is not so much in modeling the content of documents, that is, in modeling the database in which we are searching, but in modeling the context in which we carry out the search. Our technique is based on the idea that the activities that one is performing at the time of the search are an important determinant of its context, and that certain activities that are carried out using a computer leave what we call a digital trace, made of the contents of the documents that are created and edited while performing these activities. The contents of one's computer (in particular, the contents of one's working directory at the time when the search is initiated) are therefore a good indicator of the context in which the search is carried out. We will develop a suitable representation of the contents of one's computer and use it as a query modification mechanism, so that the resulting query will be a function of the user's input and of the context determined by the user's activity.

Context is somehow downplayed in the most common of the current approaches to semantics: that of the semantic web [1].
The semantic web approach is based, grosso modo, on two assumptions regarding the process of signification: i) the meaning of a document can be represented by a suitable logical theory, and ii) the meaning of a document is in the document, that is, it can be construed as an attribute of a document, and it can be specified by attaching a suitable set of statements (taken from the selected logical theory) to the document.
It should be noted that the two assumptions are logically independent. We can take or leave each one of them individually, and it is only the second assumption that has led to a scant appreciation for context in the semantic web community. If meaning is an attribute of a document, its dependency on the context in which the document is interpreted is severely limited. The most common way to take the reading context into account is, in the ontological scheme, to attach a number of possible interpretations to the document, and to select the most suitable one depending on the needs of the reader [7]. Similarly limited are collaborative labeling systems (yclept "folksonomies" [8]). Limited forms of user context, usually restricted to the consideration of the documents retrieved immediately before the current query, have also been considered [6, 2]. The first assumption, namely that meaning should be represented as a collection of statements from a logical theory, does not prevent search based on context, but it does constrain the form that context representation can take. If meaning is to be represented by logical means, and context is a determinant of meaning, then context too should be represented by logical means.

The contextual search technique presented in this paper is based on the rejection of both these assumptions, but the two rejections have different methodological and epistemological weight. Our rejection of the second point is foundational. Meaning is not a datum, but a by-product of a process of interpretation, a process that is always situated and inseparable from the context in which it takes place [5, 14]. Our rejection of the first point is temporary, and a matter of praxis. As should become clear in the rest of the paper, our technique requires that the context representation be derived automatically from the contents of one's computer, and that it evolve with the evolution of one's activity. All these operations are easier to implement if we adopt a geometric, "soft" representation of context, while they would have been much harder had we adopted a logical representation. Therefore, the decision not to use logic is a practical compromise of the current technique.

2 Context and context games

Let us start with a fairly general theoretical model. We have said that the context in which a document is interpreted is essential to determine its meaning, that is, that the context changes the meaning of a text. We can also put things topsy-turvy: the function of the semantics of a text is to change the context of the reader [12]. A document that doesn't change the context in which you act is, by definition, meaningless. We can express this situation with the following expression:

    µ(t) : C₁ → C₂    (1)

where C₁ and C₂ are the contexts of the reader before and after interpreting the text, t is the text, and µ(t) is its meaning. The properties of the "space of contexts" depend crucially on the properties of the representation of the context that we have chosen, and it is therefore difficult to say something more about meaning if we don't impose some additional condition. A reasonable one seems to be that we be capable of measuring the degree by which two contexts differ by means of an operation ∆(C₁, C₂) ≥ 0 such that, for each context C, it is ∆(C, C) = 0. We don't require, for the time being, that ∆ be a distance. Now the meaning of a document d in a context C can be defined as the difference that d causes to C:

    µ_C(d) = ∆(µ(d)(C), C)    (2)

We model activities as context games played by an actor. Here, the word game should not be taken in the sense of game theory, but in a more informal sense akin to the Wittgensteinian notion of Sprachspiel: an activity that determines certain moves that are legal in it. In our general model, we have a double representation of the context at a given stage of the game, that is, a context is a pair D ⊢ S. Here, D is a syntagma, a syntactic representation of the context of the work area (viz. the primary and accessory context), while S is a seme: a latent semantic representation of the same context. At this stage, we do not specify the nature of these representations, although in the next section we will present a specific embodiment of them. There are two classes of moves, which we call endos and exos. Endos are purely local: they affect the syntagma directly, while exos use a local change in the syntagma to create a "target" seme, but the actual context change is caused by data that come from outside the game (viz. not given by the actor). This simple classification, at the current state of affairs, has no pretension of exhaustiveness, and it simply reflects the praxis of operating with a context. The seme S is not affected directly by the actor or by the results of data searches: it only changes with the changes of the syntagma D. In other words, the evolution equation of the seme S is of the form S′ = ψ(D, S). In an endo, the syntagma D is changed directly by the actor into a modified syntagma D′ = f(x, D), where x is a user input. This change can occur in a variety of ways; for instance, the actor may introduce a new document, edit a document, remove a document, etc. The new seme S′ is computed based on the modified syntagma and on the old seme. A functional diagram of this sequence of operations
can be written as:

    D′ = f(x, D),   S′ = ψ(D′, S)    (3)
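The endo move just described can be made concrete in a small sketch. The names Context, psi and endo_add below are hypothetical illustrations (the paper does not prescribe an implementation), and the seme update is reduced to a trivial placeholder:

```python
# Toy sketch of an endo move on a context (D, S): the actor changes the
# syntagma directly (D' = f(x, D)) and the seme follows (S' = psi(D', S)).
# Context, psi and endo_add are illustrative names, not the paper's API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    D: frozenset      # syntagma: here, simply the set of document names
    S: tuple          # seme: a latent representation derived from D

def psi(D, S):
    """Placeholder seme update S' = psi(D, S): a sorted snapshot of D."""
    return tuple(sorted(D))

def endo_add(ctx, new_doc):
    """Endo move: the actor introduces a new document into the work area."""
    D_new = ctx.D | {new_doc}
    return Context(D=D_new, S=psi(D_new, ctx.S))

ctx = Context(D=frozenset({"paper.tex"}), S=())
ctx = endo_add(ctx, "figure.png")
print(ctx.S)  # ('figure.png', 'paper.tex')
```

The point of the sketch is only the shape of the update: the actor touches D, never S; S is recomputed from the new D.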
Exos entail access to an external data source. Since the results of this access are not completely under the control of the actor, the final context won't be either. The actor can manipulate the syntagma and, through this manipulation, specify a target context, which is a sort of desideratum of the search but, as a context of the game, a fictitious one. Its only use is to determine the context variation ∆S that, according to our general model, represents the desired semantics of the searched documents. The system retrieves documents, and those approved by the actor¹ are inserted into the syntagma. Conceptually, we can represent an exo as follows. The direct user intervention q results in an intermediate syntagma D̃ = f(q, D) and seme S̃ = ψ(D̃, S). The difference ∆ = ∆(S̃, S) is the intended semantics of the retrieval, and is transformed into a suitable query θ(∆). The results of the query, τ, are used to derive a new syntagma D′ = f(τ, D) and seme S′ = ψ(D′, S). Schematically:

    D̃ = f(q, D),   S̃ = ψ(D̃, S),   ∆ = ∆(S̃, S),   θ(∆) ⇝ τ,   D′ = f(τ, D),   S′ = ψ(D′, S)    (4)

Note that, in this scheme, the final context D′ ⊢ S′ is not a function of the modified context D̃ ⊢ S̃, but of the original one D ⊢ S. As in the previous case, the final context is the result of a direct change in the syntagma. The difference between the two is that, now, only the query is created locally: the text that changes the context comes from outside the system (as indicated by the wavy arrow θ ⇝ τ).

Note also that in contextual games operations can't be undone simply by applying the inverse transformation from D′ to D since, in the meantime, the seme has changed:

    D → D′ → D″ = D  (via f, then f⁻¹),   but S″ = ψ(D″, S′) ≠ S    (5)

¹ This approval is somewhat of a break in our model; in an ideal situation, all the documents retrieved by the system would enter the context. Life (computing life, that is) being what it is, we must pragmatically admit the possibility that even the best of systems will retrieve some spectacularly irrelevant documents that would lead the context astray.

3 Implementing Context

The practical problems posed by the general orientation presented here include how to capture ongoing activities, and how to represent and formalize them in such a way that they can be used as a basis for data access. This mightn't always be possible: the general context of everyday life escapes any attempt at formalization (fortunately). However, for many people, an increasing number of activities is carried out with the help of digital devices of various natures, devices that contain data produced in the course of these activities. These data form a sort of digital trace of the activity, and can be used to represent its context.

Suppose that somebody (the actor) is preparing a presentation for a conference to which she had submitted a paper and that, during this process, she needs to look for a reference to illustrate a point in the presentation. In order to prepare the presentation, she might have created a document in a directory (let us say the directory presentation), where she has possibly copied some documents that she thought might be useful. This directory (the primary context) is likely to be placed in a hierarchy like that of figure 1. Its sibling directories and their descendants (the accessory context) will contain documents somehow related to the topic at hand although, probably, not as directly as those that can be found in the work directory.

[Figure 1 shows a directory tree rooted at ICSC, with subdirectories admin (tickets, hotel, registration), paper (text and bibliographic material, with .pdf files), and pres. (image.jpg, text.tex); pres. is the primary context, and its siblings with their descendants form the accessory context.]

Figure 1. The structure of directories and context for the preparation of a presentation.

In order to use this information, we have to specify two things: how to represent the context, and how the various actions that the activity allows will modify it; in particular, in this paper, how the query moves of the game modify it. Our context representation is based on a self-organizing map that we lay out in a suitable space of words to constitute a latent semantic manifold, that is, a non-linear low-dimensional subspace of the word space that captures important semantic regularities among words. (Our latent semantic manifold is to be contrasted with linear models such as latent semantic subspaces [3].) The technique is based on the self-organizing map WEBSOM [9] but, while WEBSOM and other latent semantic techniques have so far been used mainly for the representation of data bases, we shall use them as a context representation technique and as a query creation instrument.

We shall divide the construction of the representation in two parts. First, we build a syntagma that captures the syntactic regularities that exist in the documents of the context. This process results in a context representation in the form of a point cloud in a vector space whose axes represent words. In this space we then train a self-organizing map to constitute the latent semantic manifold (the technical incarnation of the seme) that will be our final context representation and our query tool. The function ψ that relates the seme to the syntagma consists, in this case, of the learning algorithm for self-organizing maps.

In each document directory of the user's computer we build a complete context representation, all the way to the latent semantic manifold, that depends on the documents contained in the context of that directory. Here, we will consider a working directory D (our primary context) and a collection of directories K = {D₁, …, D_k} that constitute its accessory context. The point cloud representation of the context of D is called the index of D, and is assembled based on information derived separately for each directory, information that depends only on the documents of the directory it represents. We call this the generator of the directory. Let gen(D) be the generator of directory D. We build a generator for D and for each directory in its context, gen(D₁), …, gen(D_k); then we merge them using a suitable indexing function f to obtain the index of the directory D, that is,

    ind(D) = f(gen(D); gen(D₁), …, gen(D_k)).    (6)

We begin by collecting all documents in the directory in a single, large document. To this document we apply standard algorithms for stopword removal and stemming. The result is a series of stems of significant words (see figure 2), from which we consider groups of n consecutive stems (word groups). In figure 2 we have illustrated the case n = 2, which is the one that we shall consider in this paper. Correspondingly, we will talk about word pairs rather than groups². Let the words (terms) found in the directory be [t₁, …, t_W], (ij) be the pair formed by the word t_i followed by the word t_j, P the set of all pairs found in the documents, and N_ij the number of times that the pair (ij) appears. Then in the generator of the directory D, the pair (ij) is given the weight

    ω^D_ij = N_ij / Σ_{(hk)∈P} N_hk    (7)

The set of all these weighted pairs constitutes the representation of the directory, that is, its generator. Note that the weighting scheme that we are using is fairly simple, and it doesn't take into account the document frequency of a word, as is the case, for instance, in the tf/idf weighting scheme [13]. The reason is that, in our case, we do not have a collection of documents against which we compare the frequencies that we find in a document: all we have is a single context, with the frequencies relative to it.

In order to create the index, we put together the generators of the directory D and of all the directories of its accessory context K through a linear combination of the weights of homologous word pairs. Let 0 ≤ γ ≤ 1 be a constant. Then for each pair (ij) we define the index weight as

    w^D_ij = γ ω^D_ij + ((1 − γ)/|K|) Σ_{F∈K} ω^F_ij    (8)

The pairs with these weights constitute the point cloud representation of the context of the directory D, that is, its index. The points are represented in a vector space whose axes are the words t₁, …, t_W. In this space, the pair (ij), with index weight w^D_ij, is represented by the point

    p_ij = (0, …, 0, w^D_ij, 0, …, 0, w^D_ij, 0, …, 0)    (9)

whose non-zero components are in positions i and j. Each point p_ij lies in the two-dimensional subspace determined by the axes t_i and t_j (see figure 3). At the end of this step, the context of the directory D is represented by a set of points I_D in this space, a point for each word pair found in the directory D or in one of the directories of its context K.
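Under simplifying assumptions, the generator, index, and point-cloud construction of equations (6)-(9) can be sketched as follows. The stopword list and the identity "stemmer" are placeholders (a real implementation would use proper stemming and stopword resources), and the function names are illustrative:

```python
# Toy sketch of equations (6)-(9): generator weights, index weights, and the
# point cloud. Assumes a non-empty accessory context; stemming is identity.

from collections import Counter

STOPWORDS = {"and", "the", "to", "our", "ago", "these"}

def stems(text):
    """Stand-in for stopword removal + stemming (identity stemmer)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def generator(text):
    """Equation (7): pair counts N_ij normalized by the total pair count."""
    s = stems(text)
    pairs = Counter(zip(s, s[1:]))
    total = sum(pairs.values())
    return {p: n / total for p, n in pairs.items()}

def index(gen_D, accessory_gens, gamma=0.5):
    """Equation (8): blend the primary generator with the accessory average."""
    all_pairs = set(gen_D).union(*accessory_gens)
    k = len(accessory_gens)
    return {p: gamma * gen_D.get(p, 0.0)
               + (1.0 - gamma) / k * sum(g.get(p, 0.0) for g in accessory_gens)
            for p in all_pairs}

def point_cloud(idx, terms):
    """Equation (9): pair (i, j) becomes a point with weight w on axes i, j."""
    pos = {t: a for a, t in enumerate(terms)}
    cloud = []
    for (ti, tj), w in idx.items():
        p = [0.0] * len(terms)
        p[pos[ti]] = p[pos[tj]] = w
        cloud.append(p)
    return cloud

g = generator("fourscore seven year forefather come shore")
print(g[("year", "forefather")])  # 0.2 (five pairs, one occurrence each)
```

With the six-stem example of figure 2 there are five pairs, each occurring once, so every generator weight is 1/5; blending a directory with itself via `index` leaves the weights unchanged, as equation (8) requires.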
fourscore and seven years ago our forefathers came to these shores
    ↓ (stop word removal, stemming)
fourscore seven year forefather come shore
    ↓ (pairs formation)
(fourscore,seven) (seven,year) (year,forefather) (forefather,come) (come,shore)

Figure 2. Initial steps for the construction of the generator.

[Figure 3 shows the points corresponding to the pairs (seven,year) and (year,forefather), each placed, with its weight, in the plane spanned by the axes of its two words.]

Figure 3. Position of the points of a point cloud in the word space.

The point cloud thus built is used as the training data for a self-organizing map deployed in the term space. The map is a grid of elements called neurons, each one of which is a point in the word space and is identified by two integer indices; that is, a neuron is given as

    [µν] = (u^{µν}_1, …, u^{µν}_T),   1 ≤ µ ≤ N, 1 ≤ ν ≤ M    (10)

The map is discrete and two-dimensional, with the 4-neighborhood topology. That is, given the neuron [µν], its neighbors are the neurons [(µ−1)ν], [(µ+1)ν], [µ(ν−1)], and [µ(ν+1)]. We can visualize the map as a grid laid out in the word space, with rods joining neighboring neurons. Given two neurons, [µν] and [ζξ], we can measure their distance either by considering them as points in the word space (e.g. with a Euclidean metric):

    d([ζξ], [µν]) = [ Σ_{i=1}^{T} (u^{ζξ}_i − u^{µν}_i)² ]^{1/2}    (11)

or as points in the grid, using the graph distance between them: δ([ζξ], [µν]) = |ζ − µ| + |ξ − ν|. Note that the first distance can be computed between any point in the word space and a neuron, not just between two neurons.

On this map we define a neighborhood function h(t, n), which depends on two parameters t, n ∈ ℕ; n is the graph distance between a given neuron (the neuron whose neighborhood we are determining) and another neuron, t is a time parameter that increases as learning proceeds. The function h(t, n) represents the "degree of neighborhood-ness" of two neurons at a distance n at time t; for it, we postulate the following properties:

i) ∀t.(t ≥ 0 ⇒ h(t, 0) = 1);
ii) ∀t, n.(t ≥ 0 ∧ n ≥ 0 ⇒ 0 ≤ h(t, n) ≤ 1);
iii) ∀t, n.(t ≥ 0 ∧ n ≥ 0 ⇒ h(t, n) ≥ h(t + 1, n) ∧ h(t, n) ≥ h(t, n + 1)).

The degree to which neuron [ζξ] belongs to the neighborhood of neuron [µν] at time t is given by h(t, δ([ζξ], [µν])). Condition iii) localizes the neighborhood around [µν] and causes it to "shrink" in time. In addition to the neighborhood, we define a learning parameter α(t), t ∈ ℕ, such that 0 ≤ α(t) ≤ 1 and ∀t.(t ≥ 0 ⇒ α(t) ≥ α(t + 1)).

In order to create the latent semantic manifold for a directory D, all the points in the index I_D are presented to the map, and the training algorithm is applied. We call the presentation of a point p ∈ I_D an event of learning, and the presentation of all the points of I_D an epoch. Learning consists of a number of epochs, counted by a counter t. The neurons of the map are at first spread randomly in the word space; then, for each event consisting of the presentation of the point p, the following learning steps take place:

i) the neuron that is closest to p according to the word space distance is found:

    [∗] = arg min_{[µν]} d(p, [µν]);    (12)

ii) the neuron [∗] and all its neighbors are shifted towards p. The amount of this shift depends on the learning parameter α and on the distance from [∗] on the map:

    ∀[µν]:  [µν] ← [µν] + α(t) h(t, δ([∗], [µν])) · (p − [µν])    (13)

After learning, each neuron crystallizes in a final position in the word space. The neuron [µν], placed at point (u^{µν}_1, …, u^{µν}_T), induces a weighting on the words of the context given by

    t₁ : u^{µν}_1,  …,  t_T : u^{µν}_T    (14)

² In the information retrieval literature, these groups are called word contexts [18]. Since the expression context is one of the essential concepts in this paper, and since we are using it in quite a different connotation, we prefer to depart from the standard terminology rather than risking a significant confusion.
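A compact version of this training loop, equations (12) and (13), might look as follows. The grid size and the schedules chosen here for α(t) and h(t, n) are illustrative assumptions that merely satisfy properties i)-iii); they are not the paper's settings:

```python
# Sketch of the SOM training loop of equations (12)-(13): for each point,
# find the winner by word-space distance, then pull the winner and its
# (shrinking) grid neighborhood toward the point.

import numpy as np

def train_som(points, N=4, M=4, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    T = points.shape[1]
    neurons = rng.random((N, M, T))              # random initial spread
    for t in range(epochs):
        alpha = 0.5 * (1.0 - t / epochs)         # non-increasing alpha(t)
        radius = max(1, (N + M) // 2 - t // 5)   # shrinking neighborhood
        for p in points:                         # one epoch = all points
            # step i): winner [*] by word-space distance (eq. 12)
            d2 = ((neurons - p) ** 2).sum(axis=2)
            mu, nu = np.unravel_index(d2.argmin(), d2.shape)
            # step ii): shift [*] and its neighborhood toward p (eq. 13)
            for z in range(N):
                for x in range(M):
                    n = abs(z - mu) + abs(x - nu)         # graph distance
                    h = max(0.0, 1.0 - n / (radius + 1))  # h(t, 0) = 1
                    neurons[z, x] += alpha * h * (p - neurons[z, x])
    return neurons

som = train_som(np.random.default_rng(1).random((30, 5)))
print(som.shape)  # (4, 4, 5)
```

Since each update is a convex step from a neuron toward a data point, neurons initialized inside the data range stay there; this is the "crystallization" mentioned above.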
This observation will be instrumental in deriving our simplified query scheme, as will be shown in the next section.

3.1 The query

The training procedure described in the previous section results in a context representation for each working directory. The same representation is used as a basis for the contextualization of the queries done in a working directory D. We must remember, once again, that in our model of semantics a query is part of a word game that takes place in the working directory D and that, no matter what the terms used by the user are, the contextual query will depend on the context of the directory D.

We begin with the terms entered by the user, which we call the inquiry, to distinguish it from the contextual query that we will create. The inquiry may be composed of a set of keywords, a sentence, even a whole paragraph. We process it by removing the stop words it contains and by stemming. The result is a series of stems (keywords) Y = {t_{k₁}, …, t_{k_q}}. For the sake of generality, we assume that the user has associated weights {u_{k₁}, …, u_{k_q}} to these terms. The inquiry can thus be represented as a point in the word space as q = Σ_{r=1}^{q} u_{k_r} e_{k_r}, where e_i = (0, …, 0, 1, 0, …, 0) is the unit vector whose only non-zero component is in position i.

The inquiry modifies the context by subjecting it to a sort of partial learning. Let [∗] be the neuron in the map closest to the inquiry point q. The map is updated, through a learning iteration, in such a way that the neuron [∗] gets closer to the point q by a factor φ, with 0 < φ ≤ 1. That is, given that the distance between q and [∗] is d(q, [∗]) before the context modification, it will be (1 − φ)d(q, [∗]) afterwards. All the neurons in the neighborhood of [∗] will be updated by changing their position as

    [µν]′ = [µν] + φ h(t, δ([∗], [µν])) · (q − [µν])    (15)

This is the target context of the query. According to the semantic model that we are using, the target semantics of our query is given by the difference between the target context and the original one:

    [µν]~ = [µν]′ − [µν] = φ h(t, δ([∗], [µν])) · (q − [µν])    (16)

The values [µν]~ in a neighborhood of [∗] constitute our complete query expression. In this paper we are considering client-only approaches to contextual queries, so we have to transform the map [µν]~ into a query that today's servers can handle. To this end, we resort to the observation made in the previous section that each neuron, positioned in the word space, induces a weighting of the words in the context. Let us consider a (small) neighborhood N of [∗], the neuron closest to the inquiry. For each [µν] ∈ N we have a weighting of each term as in (14). The raw weight of the term t_k is the sum of the weights that it receives from all the neurons in N:

    ẑ_k = Σ_{[µν]∈N} u^{[µν]}_k    (17)

We consider a set Z consisting of the terms with the highest weights. This set can be chosen based on two policies, specified during configuration: either one selects a fixed number of words with the highest weights, or one selects all the words whose weight is above a fixed threshold. Be that as it may, once we have the set Z we can normalize the weights with respect to the maximum, to obtain the final weighting scheme:

    z_k = ẑ_k / z_M,   where z_M = max{ẑ_k | k ∈ Z}    (18)

The query is then the set of weighted words

    {(t_k, z_k) | k ∈ Z}    (19)

4 Testing

Testing the full extent of the context approach using the full query form (16) is quite problematic at this time, for lack of a proper contextual server and its data base infrastructure. In order to obtain some preliminary indications, we used the limited weighting capabilities offered by the google(TM) search engine. The contextual query was translated into a collection of weighted terms as in (19), and weighting was roughly approximated through positioning and repetition of terms in the search engine query.

We considered, for the results reported here, two contexts: the first is formed by the directory structure in the computer of one of us and uses, as working directory, a directory with several columns written by that author, whilst the second is a collection of documents on neurophysiology. For each context we randomly selected 32 terms, and each one of them was used in turn as a query term, either directly on the search engine (control group: 32 queries) or by creating context queries (test group: 32 queries). For each query, the results were collected, and the precision of the first n ranked results, with n = 1, …, 8, was determined. We used a fairly lax criterion to determine whether an image was a "hit" or not. In this we complied with the broad and generalist nature of the two contexts: we considered as "hits" all documents that, in the test subject's judgment, could conceivably be used as a reference for a presentation or a paper
[Two plots: precision (0.00-1.00) of the first n results, with context and without context, for the computing context and for the neurophysiology context.]
in the topic of the context (computing or neurophysiology). In information retrieval it is customary to measure precision and recall at the same time, often plotting one against the other, while we only measure precision. There are two reasons for this choice. On the one hand, in an open data base of millions of images like google's, it is not quite clear how one might go about measuring recall: given any topic that is not overly specialized, it is virtually impossible to determine how many images are relevant for the given query. On the other hand, we argue that the significance of recall for internet applications is problematic at best. For many topics, the number of relevant images, while unknown, can be estimated in the thousands. Therefore, the recall of any manageable set of results is virtually zero. In other words, on the internet one doesn't quite want a set of results with significant recall, because this set would be unmanageably large.

The average precision, calculated over 32 queries, of the first n results is plotted against n (for n = 1, …, 8), together with the corresponding variance, in figures 4 and 5 for the computing and the neurophysiology context, respectively. It is evident even without a detailed analysis that the difference is large and statistically significant. Qualitatively, we observed that the difference depends on the particular query that is being made. Very technical words, whose semantic span is very limited to begin with, benefit little from the context, and fetch basically the same results with or without it. A query word such as "algorithm" in the computing context or "neurotransmitter" in the neurophysiology context, for instance, is quite unlikely to fetch images not related to computing or neurophysiology, regardless of the presence of context. On the opposite side, queries with ambiguous terms, such as "sort" (data sort in computing, an approximation of qualities in the common language) or "relationship" (relationship between neural events, interpersonal relationships), gave, quite unsurprisingly, the most dramatic improvements when context was used.

Figure 4. Precision of the results for the computing context.

Figure 5. Precision of the results for the neurophysiology context.

5 Words of parting

This paper has presented the outline of a model of meaning not based on annotations, one in which the reader's context plays a preponderant role. We have presented a simple framework in which we are currently experimenting with this model, a framework that in the future will be extended in different directions: on the one hand, the integration in it of more formal representations, at least for those parts of the context that can be formalized; on the other hand, the development of suitable data base techniques to make this kind of query efficient. Our purpose will be, on the one hand, to build a context-based data access client (configured as a plug-in to some word processing or presentation program, if possible) to make context-based retrieval on general web sites and repositories and, on the other hand, to build a context-based access server. The latter will be akin to the servers built for search engines such as yahoo(TM) or google(TM) but, while these servers do not cooperate with the user's computer (apart from the elementary communication necessary to retrieve the query and return the results), the server that we consider here will be integrated with the user's computer, from which it will derive the current context, and with which it will cooperate to support interaction. We shall also work to integrate logical representations
in our model. To this end, we are analyzing two classes of techniques. On the one hand, we are looking at the relations between geometry and fuzzy logic [17] and, on the other, at morphisms between logic systems and their continuity properties [10].
Acknowledgment

This work was supported in part by the Consejería de Educación, Comunidad Autónoma de Madrid, under the grant CCG08-UAM/TIC/4303, Búsqueda basada en contexto como alternativa semántica al modelo ontológico. Simone Santini was in part supported by the Ramón y Cajal initiative of the Ministerio de Educación y Ciencia. Alexandra Dumitrescu was in part supported by the European Social Fund, Universidad Autónoma de Madrid.

References

[1] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, 2001.

[2] J. Budzik and K. J. Hammond. User interactions with everyday applications as context for just-in-time information access. In Proceedings of the 5th international conference on intelligent user interfaces, pages 44-51. New York:ACM Press, 2000.

[3] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.

[4] Alexandra Dumitrescu. The context as a determinant of semantics: use of the local context of a personal computer to conduct searches on the web. Master thesis, Escuela Politécnica Superior, Universidad Autónoma de Madrid, November 2008.

[5] Terry Eagleton. Literary theory, an introduction. Minneapolis:University of Minnesota Press, 1996.

[6] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: the concept revisited. ACM Transactions on information systems, 20(1):116-31, 2002.

[7] E. J. Glover, S. Lawrence, W. P. Birmingham, and C. L. Giles. Architecture of a metasearch engine that supports user information needs. In Proceedings of the eighth international conference on information and knowledge management, pages 210-6. New York:ACM Press, 1999.

[8] Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. Information retrieval in folksonomies: search and ranking. In The Semantic Web: Research and Applications, volume 4011 of Lecture notes in computer science, pages 411-26. Heidelberg:Springer-Verlag, 2006.

[9] S. Kaski. Computationally efficient approximation of a probabilistic model for document representation in the WEBSOM full-text analysis method. Neural Processing Letters, 5(2), 1997.

[10] Markus Krötzsch. Morphisms in logic, topology, and formal concept analysis. Master thesis, International Center for Computational Logic, Dresden University of Technology, February 2005.

[11] W. V. O. Quine. From a logical point of view. Cambridge, MA:Harvard University Press, 1953.

[12] José Carlo Rodríguez. Jugadas, partidas y juegos de lenguaje: el significado como modificación del contexto. Asunción:Centro de documentos y estudios, 2003.

[13] S. Santini. Exploratory image databases: content-based retrieval. San Diego:Academic Press, 2001.

[14] Simone Santini. Ontology: use and abuse. In Proceedings of AMR 2007: international workshop on adaptive multimedia retrieval. Heidelberg:Springer-Verlag, 2007.

[15] Simone Santini. An oddly-positioned position paper on context and ontology. In Proceedings of the 2nd IEEE international conference on semantic computing. Santa Clara:IEEE, 2008.

[16] Alfred Tarski. The semantic conception of truth and the foundations of semantics. Philosophy and Phenomenological Research, 4, 1944.

[17] Guo-Jun Wang, Xiao-Jing Hui, and Song Jian-She. The R₀-type fuzzy logic metric space and an algorithm for solving fuzzy modus ponens. Computers and Mathematics with Applications, 55:1974-87, 2008.

[18] S. K. M. Wong, Wojciech Ziarko, and Patrick C. N. Wong. Generalized vector spaces model in information retrieval. In Proceedings of the 8th annual international ACM SIGIR conference on research and development in information retrieval, pages 18-25. New York:ACM Press, 1985.