HILT
Hypotheses
What follows is a series of e-mails detailing hypotheses, ideas and guidance
on issues related to research within the scope of the HILT Project.
The idea behind the writing of such hypotheses is that stakeholders
and project group members will then help to identify data that may
allow us to refute some hypotheses and/or lend support to others.
The hypotheses are based on discussion which arose with both the
HILT Project Management Group and the HILT Steering Group:
For the most recent HILT hypotheses please consult:
9.
Hypotheses Version 2
From: Dennis Nicholson <d.m.nicholson@strath.ac.uk>
The e-mails below are listed in chronological order so as to retain
the strand of discussion:
1.
HILT: Request for hypotheses
From:
Susannah Wake <susannah.wake@strath.ac.uk>
2.
RE: HILT: Request for hypotheses
From:
Chris Rusbridge <c.rusbridge@compserv.gla.ac.uk>
3.
Re: [Fwd: RE: HILT: Request for hypotheses]
From:
"Dr P.B. Watry" <P.B.Watry@liverpool.ac.uk>
4.
HILT hypotheses
From:
"Craven, Louise" <louise.craven@pro.gov.uk>
5.
Hypothesis
From: Alan Gilchrist <cura@fastnet.co.uk>
6.
Summary of HILT Hypothesis (1st Draft)
From: Dennis Nicholson <d.m.nicholson@strath.ac.uk>
7.
Re: Summary of HILT Hypothesis (1st Draft)
From: Rachel Heery <lisrmh@ukoln.ac.uk>
Followed with more perspectives from UKOLN by Rosemary Russell.
8.
Re: Summary of HILT Hypothesis (1st Draft)
From: Dennis Nicholson <d.m.nicholson@strath.ac.uk>
9.
Hypotheses Version 2
From: Dennis Nicholson <d.m.nicholson@strath.ac.uk>
10.
FW: Summary of HILT Hypothesis (1st Draft)
From: Stella G Dextre Clarke <SDClarke@LukeHouse.demon.co.uk>
Forwarded
by Dennis Nicholson 25/01/01
11.
Re: Hypotheses Version 2
From: "MacEwan, Andrew" [Andrew.MacEwan@BL.UK]
Subject:
HILT: Request for hypotheses
Date: Wed, 20 Dec 2000 10:55:17 +0000
From: Susannah Wake <susannah.wake@strath.ac.uk>
To: sg hilt <lis-hilt-sg@jiscmail.ac.uk>
CC: mgt hilt <lis-hilt-mgt@jiscmail.ac.uk>
Dear All,
As requested I am sending you an e-mail to remind you about one
of the actions we decided to carry out at the Steering Group meeting
on the 14th December. The action was for each member of the group
to devise a hypothesis for a possible solution to the problem of
cross-searching by subject across communities. Alternatively any
ideas on what wouldn't work will also be accepted. These do not
need to be in-depth, just have a good brainstorm.
Included
below, for reference, are a couple of hypotheses we have developed.
If you like you could also tell us why these would or would not
work. It would be beneficial to all if we could conduct this through
the mailing list.
Hope that you all have a very festive Christmas and New Year,
Susannah.
p.s. Thanks to those who have sent us cards.
Hypothesis 1:
A possible solution is that each community co-ordinates its own efforts
towards creating standards and a limited set of classification and
subject schemes, and thesauri. Then, when a search site is set up
to search across these communities, there will be a more manageable
number of schemes, which will make mapping between them easier.
In essence each sector must ensure controlled vocabulary is centrally
amended and updated.
Hypothesis 2:
Use a universal faceted classification scheme such as UDC as
a language-independent core or intermediary concept language. Build
a universal macro-thesaurus on it (there may already be one such
as the UNESCO thesaurus; alternatively use DDC and LC subject headings,
which are already mapped to each other). Have each subject community
use their own most established thesaurus or thesauri to create microthesauri
off the macro thesaurus. This process can continue down to more
detailed levels of thesauri for more specific subject areas. Different
sectors with different needs can either come in at the macro level
(for general museums and archives) or work in with the micro level
if they cover specific subject areas.
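As a rough illustration only (not part of the original proposal), the "intermediary concept language" idea in Hypothesis 2 might be sketched as below. Each community thesaurus maps its own terms onto notation from a shared core scheme, and cross-searching goes via that notation. The scheme names, terms and core codes (`C1`, `C2`) are invented placeholders, not real UDC or DDC numbers.

```python
from collections import defaultdict

# Hypothetical per-community mappings: local term -> core notation.
# All names and codes here are made up for the example.
MAPPINGS = {
    "museums_thesaurus": {"steam engines": "C1", "locomotives": "C1", "pottery": "C2"},
    "archives_thesaurus": {"railway engines": "C1", "ceramics": "C2"},
}


def build_core_index(mappings):
    """Invert the per-scheme mappings: core notation -> {scheme: [terms]}."""
    index = defaultdict(lambda: defaultdict(list))
    for scheme, terms in mappings.items():
        for term, notation in terms.items():
            index[notation][scheme].append(term)
    return index


def translate(term, source, target, mappings, index):
    """Return the target-scheme terms that share a core notation with `term`."""
    notation = mappings[source].get(term)
    if notation is None:
        return []  # term is not mapped to the core scheme at all
    return sorted(index[notation].get(target, []))


if __name__ == "__main__":
    idx = build_core_index(MAPPINGS)
    print(translate("pottery", "museums_thesaurus", "archives_thesaurus", MAPPINGS, idx))
```

The point of the pivot is that each scheme needs only one mapping (to the core) rather than one mapping per other scheme; whether such mappings can be made accurately is exactly what the hypothesis leaves open.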
Subject:
RE: HILT: Request for hypotheses
Date:
Wed, 20 Dec 2000 16:44:01 -0000
From: Chris Rusbridge <c.rusbridge@compserv.gla.ac.uk>
To: 'Susannah Wake' <susannah.wake@strath.ac.uk>
CC: "'LIS-HILT-SG@JISCMAIL.AC.UK'" <lis-hilt-sg@jiscmail.ac.uk>
I have
some rather more pessimistic Rusbridge Hypotheses. Don't read too
much into the pessimism; the optimistic hypotheses have already
been put forward...
R1:
the general mapping of thesauri is impossible in any accurate way,
since it represents a particular case of the machine translation
problem, which notoriously fouls up on ambiguous terminology (remember
the joke about 'the vodka is good but the meat is rotten').
R1a:
Machine translation is a fruitful field to explore in HILT!
R2:
even if thesaurus mapping is possible, it will be of little help
to the average searcher, given what is known about searching habits.
In particular, it is difficult to persuade searchers to map their
specific term to a more general one, or to make use of 'advanced'
search facilities.
R2a:
use of thesauri to broaden terms will only increase the deluge of
useless information emerging from queries, and it will be even harder
to find the answer amongst the dross.
R3:
even if searching would be helped, those who prepare the metadata
cannot be relied upon either to adhere to the thesauri nominally
in use, or to bring their data up to date (search any largish library
catalogue by author to see the number of name variants you find).
R4:
browse structures represent some form of classification (often informal
and irregular); browse structures fail to the extent that there
is a mismatch between the classification scheme and the browser's
mental models. Being able to re-build browse structures according
to a different classification scheme that more accurately represents
the mental model would help browsers navigate more easily.
R5:
there exists a high level thesaurus, but there is imperfect mapping
between it and domain specific thesauri. This mismatch will be detrimental
rather than helpful to the searcher in many cases (although the
mapping may be helpful in others).
R6:
this is a case where the flexibility of the human brain is paramount.
A good librarian will be of more assistance than an unfamiliar thesaurus!
R7:
re-phrasing the issue in terms of ontologies will help (I could
have put this in the negative as well).
R8:
the answers can only be found through user testing with searchers,
browsers and metadata managers.
--
Chris Rusbridge
Director of Information Services, University
of Glasgow
GLASGOW G12 8QQ
phone 0141 330 2516 fax 0141 330 5620
email: C.Rusbridge@compserv.gla.ac.uk
Subject:
Re: [Fwd: RE: HILT: Request for hypotheses]
Date: Fri, 22 Dec 2000 09:43:11 +0000 (GMT)
From:
"Dr P.B. Watry" <P.B.Watry@liverpool.ac.uk>
To: Susannah Wake <susannah.wake@strath.ac.uk>
CC: LIS-HILT-SG@JISCMAIL.AC.UK, c.rusbridge@compserv.gla.ac.uk
Hello
I think
that Chris is largely on the money with these comments, particularly
about use of high level thesauri. There appear to be three major
difficulties.
1.
The very pressing need to train data contributors how correctly
to use thesauri in order to construct controlled access points. Many
archivists involved with the HE Archives Hub, for example, made
up unofficial extensions to the UNESCO thesaurus without any form
of versioning control. Also, many didn't appear to understand that
it is best to maintain thesauri independently of the services they
support.
2.
The related question of getting end-users to use thesauri effectively
to get to the information they need. Here, there needs to be a variety
of search strategies which are transparent, including 2-stage hypertext
browsing, support for relevance ranked searches, and capability
to browse subject access headings. All of which are now implemented
for the HE Archives Hub (which supports both LCSH and UNESCO data).
The most recent research suggests that "clustering" related subject
headings may be the most effective way forward.
3.
The final question of "mapping" one thesaurus onto another. We are
carrying on experiments in this direction. But I have to say that
LCSH for all its faults seems to be the most appropriate thesaurus
for English language data sets. To me it seems the best solution
for many data contributors is to create "official" extensions to
LCSH which suit their needs and register them at the Library of
Congress, where they can be properly versioned, distributed, and
supported.
For
the HE Archives Hub we constructed a Cheshire Z39.50 resource for
LCSH and UNESCO. See http://gondolin.hist.liv.ac.uk/~cheshire/lcsh
and http://gondolin.hist.liv.ac.uk/~cheshire/unesco.
You may find it interesting to have a look at the initial search
page of the HE Archives Hub with the new subject browsing capabilities.
http://www.archiveshub.ac.uk
I will
be away for my annual leave for the first three weeks of January.
Best
wishes
Paul
Subject:
HILT hypotheses
Date: Wed, 3 Jan 2001 15:53:21 -0000
From: "Craven, Louise" <louise.craven@pro.gov.uk>
To: "'susannah.wake@STRATH.AC.UK'" <susannah.wake@strath.ac.uk>
Happy
New Year!
I'm
starting from a position which accepts that Hypothesis 1 and Hypothesis
2 can be achieved, then identifies a problem which exists in the
use of thesauri I'm familiar with, and then applies the problem
to both hypotheses.
Apologies
for being so detailed and micro rather than macro!
Problem:
how will a HILT cope with words whose meanings have changed over
time?
Where
there is a clear change in meaning, and a clearish date, an artificial
cut off date could be used as a qualifier (as some cataloguers of
medieval docs do with the use of toponyms, epithets and patronymics,
to mark the development of surnames), or the context could be given
in parentheses after the term (as with homonyms.)
but
where
there is no clean change and where the older meaning may be retained
in a current compound, this is not so easy to do
e.g. intelligence
currently, according to the OED, defined as 'the faculty of understanding' and
found in UNESCO as a next narrower term under the whole microthesaurus
Psychology
but in 15th-18th century usage, intelligence means 'news or information'.
This usage is, however, currently maintained in 'military intelligence'
(OED: 'The obtaining of information; the agency for obtaining
secret information 1697, (Revived in modern wars)'), as also in
'marketing intelligence'.
How
are these sequential and variant meanings to be provided for?
Within
the UNESCO thesaurus, military intelligence can be provided for
by adding the term and by adding see alsos with upward posting.
(The differences in meaning here though are not related in the strict
associative sense, or in the ISO 2788 sense found as 8.4.3 Terms
belonging to different categories)
4.10 Psychology
    intelligence
        see also information sciences
        see also military intelligence
5.05 Information sciences
    Information
        see also psychology
        see also military intelligence
6.45 Civil, Military and mining engineering
    Military Engineering
        NT 1 Military intelligence
            see also information sciences
            see also psychology
LCSH
has
Intelligence
Intelligence Quotients
and
Intelligence Service
Military intelligence
similar
to UNESCO's 4.10 and 6.45, but again a term/relationship which accommodates
the historical meaning of intelligence as news/information needs
to be added.
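Purely as an illustrative sketch (not something proposed in the message itself), the two devices mentioned above, a parenthetical context qualifier and an artificial cut-off date, could be held against each sense of a term roughly as follows. The class name, field names and the 1400-1799 date range are assumptions chosen for the example; treating the modern sense as open-ended is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    """One meaning of a term, with an optional (artificial) date range."""
    term: str
    qualifier: str                 # context shown in parentheses, as with homonyms
    start: Optional[int] = None    # first year of usage; None means open-ended
    end: Optional[int] = None      # last year of usage; None means open-ended
    see_also: List[str] = field(default_factory=list)

    def covers(self, year: int) -> bool:
        """True if this sense is treated as current in the given year."""
        after_start = self.start is None or year >= self.start
        before_end = self.end is None or year <= self.end
        return after_start and before_end


# Hypothetical entries; the cut-off years are placeholders, not OED data.
SENSES = [
    Sense("intelligence", "faculty of understanding",
          see_also=["military intelligence"]),
    Sense("intelligence", "news or information", start=1400, end=1799,
          see_also=["military intelligence"]),
]


def senses_for(term: str, year: int, senses=SENSES) -> List[str]:
    """List the qualifiers of every sense of `term` that covers `year`."""
    return [s.qualifier for s in senses if s.term == term and s.covers(year)]


if __name__ == "__main__":
    print(senses_for("intelligence", 1650))
    print(senses_for("intelligence", 1950))
```

A hard cut-off like this does not handle the awkward case raised above, where the older meaning survives inside a current compound such as 'military intelligence'; that still has to be carried by the see-also links.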
If
you apply the problem now to Hypothesis 1 and 2:
Hypothesis
(1) Clearly in this instance these terms could be mapped, but the
searcher looking for 'intelligence' (we know not of what meaning)
would be faced with a set of relatively complicated relationships
re one term in two thesauri. If this is magnified in terms of similar
examples and numerous thesauri, are we making a structure which
is unwieldy and unhelpful to the user? These are after all relatively
high level terms.
Would
a browse facility at the outset of the HILT reduce relationship
links and possible ambiguity which searches would throw up?
Hypothesis
(2) Can a language independent core or an intermediary concept language
deal with these kinds of changes in meaning?
Best
wishes
Louise
Craven
Louise
would like to add that in using these examples she was not referring
to mapping as a whole but to instances "i.e. the changing of
meaning." She believes that mapping is possible but points
out that "there are many problems which need addressing/ for
which we need to find solutions."
Subject:
Hypothesis
Date: Wed, 10 Jan 2001 18:21:02 -0000
From: Alan Gilchrist <cura@fastnet.co.uk>
To: LIS-HILT-MGT@JISCMAIL.AC.UK
A hypothesis
for HILT
Preamble
HILT
is, essentially, an exercise dealing with probability and granularity.
Unless huge funds are made available, the approach must be pragmatic,
and possibly iterative in the implementation, moving towards an
acceptable level of performance rather than striving for perfection.
There
are three levels of granularity:
1. An outline description of the stakeholder's collection.
2. The classification/indexing language used by the stakeholder.
3. The set of classification/indexing decisions made by the stakeholder.
One,
above, might provide a useful first filter. With respect to 2. and
3. above, there are probably only four options available for achieving
correlation between the stakeholder collections:
1.
A complete merger/reconciliation of the language schemes (hugely
impracticable)
2. A partial reconciliation of the language schemes (recommended
below)
3. The creation and use of a switching mechanism or "intermediate
lexicon" through which indexing decisions would be translated (hugely
impracticable, and unlikely to be effective given the heterogeneous
nature of the stakeholder collections)
4. The use of "intelligent" software to crawl the indexing decisions
made by each stakeholder (pre-supposing that they are all in electronic
format). Such an agent would probably be in the family of automatic
categorisers/text miners and be rule-based (problem of uncertainty
of efficacy and probably a prohibitive cost, unless a friendly software
house could be persuaded to finance an experiment).
Hypothesis
1.
Establish subject-based profiles of each collection, possibly in
the form of a matrix - generic subjects vs. form. (The recent questionnaire
might provide some or all of this data).
2. Select a common, "neutral" classification scheme (DDC, Broad
System of Ordering (BSO), UNESCO Thesaurus) as a generic benchmark.
3. Ask each stakeholder to indicate on this benchmark those subjects
which are contained in his/her collection to some reasonably significant
extent; and to some previously agreed hierarchical levels (which
will vary between the major subjects)
4. Consider the possibility of extending this mark-up to include
numbers of items in each collection against each of the marked-up
subjects (derivable from electronic records as posting frequency?).
5. Support local searching of selected collections.
6. Establish a feedback mechanism to examine failures. (For example,
all or selected failures could be passed to each stakeholder to
be searched independently in each collection).
7. Adjust and maintain the benchmark scheme and the mark-ups accordingly.
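By way of a minimal sketch of steps 3 to 5 (an illustration under stated assumptions, not a worked-out design): if each stakeholder's mark-up is held as posting counts against benchmark classes, selecting the most likely collections for a subject becomes a simple ranking. The collection names, class codes and counts below are invented placeholders.

```python
# Hypothetical mark-up: collection -> {benchmark class: number of items}.
# Codes and counts are made up; they stand in for whatever benchmark
# scheme and posting frequencies are eventually agreed.
PROFILES = {
    "Collection A": {"300": 1200, "330": 400},
    "Collection B": {"330": 2500, "900": 50},
    "Collection C": {"900": 800},
}


def rank_collections(benchmark_class, profiles, min_items=1):
    """Rank collections by posting count for one benchmark class.

    Collections with fewer than `min_items` postings are dropped, which
    stands in for the "reasonably significant extent" test in step 3.
    Ties are broken alphabetically by collection name.
    """
    hits = [(name, counts.get(benchmark_class, 0))
            for name, counts in profiles.items()]
    hits = [(name, count) for name, count in hits if count >= min_items]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for name, count in rank_collections("330", PROFILES):
        print(name, count)
```

The searcher would then go on to search the top-ranked collections locally, using each collection's own indexing language; failed searches could be logged against the benchmark class to feed steps 6 and 7.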
Summary
This
hypothesis, or possible line of attack suggests that it could be
practicable to work at the level of the indexing languages used
(and how they are used), rather than at the far more detailed and complex
level of indexing decisions made by each stakeholder. In effect,
the HILT would be a weighted index to the indexing languages used;
and the searcher, having selected the most likely collections would
then be obliged to conduct specific searches of each collection
selected, using the particular indexing languages and protocols
provided. [Question: Does this approach seem to be too modest
in comparison with the initial aspirations of the HILT Project?]
However,
there may still be reconciliation problems at the indexing language
level, caused by local pragmatics, or the interpretation (even of
quasi-standard schemes) in order to meet perceived local needs.
(The daftest example I can think of occurred at Aslib some years
ago. Because Aslib was the FID National Member, it used the UDC
in its Library. The concept "Cottage industries" was classified
as "Industry:Cottages"). There may well exist more legitimate examples.
It is not inconceivable to imagine a report on "Money-lending in
rural India" being classed under India, Economics, Sociology by
three different collections with different core interests.
Subject:
Summary of HILT Hypothesis (1st Draft)
Date: Mon, 15 Jan 2001 14:05:37 -0000
From: Dennis Nicholson <d.m.nicholson@strath.ac.uk>
To: LIS-HILT-MGT@JISCMAIL.AC.UK
Dear
All
Thank
you for your helpful responses to our request for hypotheses. I
have now compiled what I hope is a coherent and more or less logically
arranged list of these reworded in what I trust :) is a helpful
way.
I would
be grateful if you would study the attached document with a view
to telling me:
1.
Whether I have missed something out that you said (I felt that a
number of the hypotheses sent were essentially the same as each
other but that, obviously, is a matter of interpretation)
2.
Whether, having read this wondrous document, you are inspired to
add other hypotheses to the list
3.
Whether you feel I have misrepresented something you submitted
4.
Whether you feel I should reword anything I have written
PLEASE
DO NOT, AT THIS STAGE, SEND DISCUSSIONS OR REFUTATIONS TO THE LIST.
I WANT TO GET THE LIST RIGHT BEFORE WE START THAT
Cheers
Dennis
Attached
Document:
HILT Hypotheses Summary (1st draft)
Pre-question: What is the problem, what is its scope?
What
is required to solve the problem is:
1.
A magic wand - in other words, the problem can't be solved, either
because no solution will be universally acceptable or sufficiently
flexible or any solution will be too difficult to implement for
political or practical or economic reasons (e.g. because machine
translation is required and this is impossible because of the ambiguity
of many terms) or staff in different services and/or communities
won't apply the agreed standard approach correctly or in the same
standard fashion or the users won't use or understand the wonderful
new scheme even if it is put in place and used consistently. Arguably,
this hypothesis is simply the negation of all of the others and
translates to 'All positive hypotheses as to a solution have been
shown to be refutable therefore the problem can't be solved'. There
is, however, at least one good reason to leave it in. An extreme
variation of it is arguably that - despite appearances - there is
no problem. I guess this translates as the users are fine as they
are - we think they need the universal and consistent use of controlled
terminologies for cross searching but in reality they don't, they
either don't need to do it or are happy to muddle through with what
they've got (which may well be more suited to their preferred approach).
I don't believe this, but it is theoretically a position that might
be defended in the absence of evidence to the contrary. This is
arguably the hypothesis 'the problem can't be solved because there
is no problem'. If it is true, or even largely true, it will, at
the very least, have implications for how much trouble, effort and
expense it would be worth going to solve the problem via one of
the hypothetical solutions proposed below. Another variation is
that many users - maybe even the majority - will find that the use
of thesauri will make finding things more difficult because, since
users can't map their mental terminologies to the thesauri, it will
increase the number of false drops (I'm inclined to award Chris
the prize for the best 'standing the accepted view on its head and
giving it a good shake' hypothesis for this one :) ). If it is true
then it may be that, for example, 'the greatest good' is served
by not having a HILT, even if some users do need it [Question: Does
having a choice solve the problem? Note that the answer is not necessarily
'yes']
2.
A single universal scheme that everyone, albeit reluctantly in some
cases, will sign up to and apply (hot favourites here being LCSH
and Unesco - not necessarily in that order, I hasten to add). Included
here is the creation of "official" extensions to (for example) LCSH
to meet 'local' needs so that they can be properly versioned, distributed,
and supported.
3.
A mapping between two or more key schemes (LCSH and Unesco?) - especially
if there is an automated terminologies mapping service allowing
a gradual build up and maintenance of a complex series of mappings
[Question: Does Louise's example refute this by showing that successful
mapping is impossible?]
4.
A community based approach in which each community aims to ensure
adherence to standards and a limited set of classification and subject
schemes, and thesauri is required to ensure interoperability within
the community and to make inter-community interoperability at least
a more manageable problem by limiting the extent of variations within
communities [Question 1: are communities self-defining - i.e. a
community is a community if it says it is and institutions are members
if they say they are?] [Note: this is a re-casting of the hypothesis
previously known as 1]
5.
The combined use of one or more universal subject schemes mapped
not to each other but to a classification scheme such as DDC or
UDC which is then used as a 'language independent core' and the
basis for cross-searching between services using different universal
schemes. This would probably be implemented in the context of hypothesis
4 [Note: this is a re-casting of part of the hypothesis previously
known as 2 - it is also, roughly speaking, what OCLC propose (I
think), although their view would be that DDC is the best choice
for various reasons - especially since it supports communication
across linguistic boundaries. It is also, roughly speaking, the
approach suggested by Alan and others and an approach that has been
considered within CAIRNS
and which we may be able to test here at Strathclyde - see my earlier
e-mail]
6.
As for 5, except create a new universal scheme based on DDC or UDC.
In this scenario, it would be possible for communities to also create
more specific micro-thesauri for more in-depth searching of their
collections. In this event, different sectors with different needs
can either come in at the macro level (for general museums and archives)
or work in with the micro level if they cover specific subject areas.
[Note: this is a re-casting of the other part of the hypothesis
previously known as 2]
7.
Machine translation [Note: we need to define precisely what we mean
by this - for example, do we include neural networks and AI agents
that can learn or be taught about mappings?]
8.
Not only one or other of 2-7 above and 9-14 below but also some
mechanism for 'mapping' the HILT terminologies to those in the minds
of the users is needed. [Question: Might it be enough to allow users
to browse the terms used and do their own mapping?]
9.
Not only one or other of 2-8 above and 10-14 below but also some
means of ensuring that staff apply whatever solution is adopted
consistently and fully understand what it is they are aiming to
do - training, hierarchical checking mechanisms, metadata creation
aids etc. - is needed.
10.
Not only one or other of 2-9 above and 11-14 but also a mechanism
to allow the user to re-build the classification/browsing structure
automatically to suit their own mental terminology is needed. [Question:
How would this be done?]
11.
Not only one or other of 2-10 above and 12-14 below but also a coherent
mapping match between the solution and existing domain specific
thesauri is needed [Note that this is not the same as 6 above].
12.
Not only one or other or a mix of 2-11 above and 14 below, but also
a good librarian or other intermediary to translate between user
and thesauri is needed.
13.
Not only one or other of 2-12 above but also good user training
and a suitable variety of flexible search facilities are needed
(e.g. 2-stage hypertext browsing, relevance ranked searches, browsable
subject headings, clustering related subject headings, linking combinations
of key words, combining clusters).
14.
Not only 2-13 above but also a multi-lingual capability is needed
15.
A good librarian or intermediary without all these other trappings.
16.
A closer analysis of the nature of the problem - for example, would
re-phrasing the issue in terms of ontologies help illuminate the
route to a solution? [There is an attempt to define what an ontology
is in this context at http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
If that doesn't help, then try http://www.cs.vassar.edu/faculty/welty/papers/subjects/subject.html
and look at Figure 2.1 and its legend. I'm no expert, though, so
if someone out there can point to a better web page please feel
free]
17.
Empirical data obtained by conducting tests involving searchers,
browsers and metadata managers.
DMN HILT 15.01.01
Subject:
Re: Summary of HILT Hypothesis (1st Draft)
Date: Tue, 16 Jan 2001 11:24:38 +0000
From: Rachel Heery <lisrmh@ukoln.ac.uk>
To:
LIS-HILT-SG@JISCMAIL.AC.UK
On Mon, 15 Jan 2001, Dennis Nicholson wrote:
> 2. Whether, having read this wondrous document, you are inspired to
> add other hypotheses to the list
I hesitate
to add anything as there is already a long list... but perhaps there
could be more mention of existing commercial products as a possible
solution? Perhaps the following could be used as expansions of hypotheses
3 and/or 7?
---------------------------------------------------------------
Hypothesis: there are existing commercial products which would offer
cost effective means to improve navigation of 'large digital spaces'
by the end-user. There are a variety of products (e.g. WordMap
[1], Autonomy[2])
which take different approaches. They are characterised by using
taxonomies, whether built from a sample document dataset or from
combining existing taxonomies, in order to enhance the user's natural
language search.
A complementary
hypothesis might be that it is prohibitively expensive to commit
to a solution that involves hand-crafted metadata creation (especially
if one considers trying to update existing metadata) and that the
solution must be available as 'middleware' separate from the metadata
repositories themselves. (This might be considered making sense
from a funding viewpoint in that one could subscribe to the 'navigation
enhancement' as a separate facility independent of particular services.)
----------------------------------------------------------------
The above hypotheses assume the taxonomy assists the user in gathering
search terms which are then used to search a number of diverse services.
I think the success of this approach in the commercial world might
depend on searching against full text, but maybe it would work on
(rich) structured metadata too....
I do
think it would be helpful to develop 'statements of the problem
we are trying to solve' in parallel with the hypotheses... are we
trying to enable subject access at the 'item level' or at the 'collection
level'? are we trying to accommodate the user's detailed search
terms or are we making available collection strengths?
btw
I noticed mention of Alan Gilchrist's report on the TFPL
web site. A fitting extract from the exec summary:
"At the heart of the taxonomy debate is the need to achieve
a balance between the talent of the taxonomy designer, the cost
of the system to implement the taxonomy and the familiarity of the
users both with the system and the structure of the information
itself."
http://www.tfpl.com/areas_of_expertise/__knowledge_management/taxonomies/taxonomies.html
Title:
Taxonomies for business: Access and connectivity in a wired world
Authors: Alan Gilchrist, Peter Kibby, Barry Mahon and Sandra Ward
Publisher: TFPL Ltd., 17-18 Britton Street, London,EC1M 5TL, UK
Date: November 2000 ISBN: 1 870889 – 98 – 3 Price: £80 /$120 plus
p&p
1.
www.wordmap.com
2. www.autonomy.com
also
see
http://www.interwoven.com/products/metafinder/description.html
Rachel
Rosemary
Russell <lisrr@ukoln.ac.uk>
from UKOLN noted important
issues for HILT to address. These include:
* As
has been said, HILT has ambitious aims, and a fairly short timescale
* It is difficult to see that a single common scheme might be found
to meet the requirements of such a broad group of communities
* Will people be motivated enough, to be prepared to envisage compromising
and/or to undertake additional work eg mapping from a local scheme
to a universal one? May take some persuasion.
* Collection owners/service providers who use specialist, detailed
subject schemes may perceive a 'dumbing down' of their local 'advanced'
search options when accessed by a distributed cross-searching service
which uses a high-level thesaurus. (This objection has often arisen
in discussions about Z39.50 searching, where service providers can
be reluctant to offer 'lowest common denominator' searching, as
opposed to their local specialised search interface, which may offer
many more options. Research (Dig Lib?) has shown that research users
are keen to have *both* - cross-searching and individual specialist
database searching.)
* There is potentially a very large number of schemes for HILT project
staff to analyse -- see already the A-Z list of thesauri at: http://hilt.cdlr.strath.ac.uk/Sources/thesauri.htm
* Related Renardus research
issues to track: One of the things Renardus wants to do is add subject
browse access functionality at the Renardus broker level. The subject
scheme chosen for this is the top levels of the Dewey Decimal Classification
(DDC). Use of DDC has been negotiated with OCLC Forest Press. However,
most of the gateways that the Renardus broker will interrogate do
not use DDC. A small group of Renardus people are meeting next week
to talk about how to produce recommendations for classification
mappings from the schemes used by gateways to DDC. Things to be
considered include: a. how detailed the top-level view of DDC should
be (this may differ between parts of the schedules); b. whether
to discard the 'facet-type' features of DDC; c. how to do the mappings
themselves - one-to-one mappings will not necessarily be useful.
It isn't clear whether creating a useful (and scalable) browse system
in this way is possible, but it is worth investigating. (MD)
Rosemary
Subject:
Re: Summary of HILT Hypothesis (1st Draft)
Date: Tue, 16 Jan 2001 14:05:36 -0000
From: Dennis Nicholson <d.m.nicholson@strath.ac.uk>
To: LIS-HILT-SG@JISCMAIL.AC.UK
Thanks,
Rachel. I will incorporate into next draft.
Would
you agree that it might fit into an amended and extended version
of 7 (Machine Translation)?
Dennis
Subject:
Hypotheses Version 2
Date: Mon, 22 Jan 2001 13:29:10 -0000
From: Dennis Nicholson <d.m.nicholson@strath.ac.uk>
To: LIS-HILT-MGT@JISCMAIL.AC.UK
Dear
All
I attach
a second version of the HILT hypotheses document for comment. I've
made changes to hypotheses 3 and 7 in line with comments from the
HILT e-mail groups and expanded the introduction covering what problem
we are addressing and also reworked the wording of the hypotheses
in line with this.
I'd
appreciate it if you could comment on the new version. Specifically:
1. Do you agree with the wording/detail in the introduction?
2. Are we still missing some hypotheses?
3. Do the hypotheses as reworked look OK - do they still mean what
you intended or understood them to mean earlier?
4. Anything else that occurs to you.
Please
don't comment on the truth or falsehood of any of the hypotheses
as yet. Once I get responses on this version, we will look at how
the results of the stakeholder and literature surveys impinge on
the hypotheses and also invite comment from the three lists on whether
participants have access to data that will refute a particular hypothesis.
Thanks
- please try to respond by the end of the week
Cheers
Dennis
Attached
Document:
HILT Hypotheses Summary
[2nd draft - changes made to 3 and 7 in line with comments on V1
plus introduction expanded and hypotheses reworked in line with
this]
Introduction: What is HILT aiming to do? HILT is charged
with examining the practices and circumstances of UK networked information
and resource services in respect of the subject description of their
materials with the aim of determining:
1. What action, if any, is required with regard to harmonising or
integrating differing practices in this area in order to optimise
accurate and appropriate collection, item, and sub-item level retrieval
by subject in cross-searches of these services to whatever extent
the staff of the services believe is required by the users they
serve - in short, to determine whether there really is a problem,
or a problem of any significant magnitude, and, if so, what its
scope is. Implicit in this is a question of whether or not the situation
is the same for all services and their users or whether the answer
to these questions differs in respect of different sub-groups of
the total group of services.
2. Whether any problem identified can be solved in an affordable,
cost-effective, practical, politically workable way that is both
sustainable and compatible with international approaches to such
issues and, if so, how it may best be solved.
Hypotheses
In line with this, the initial list of HILT hypotheses is:
0. There is no problem, or no significant problem. I guess this
translates as the users are fine as they are - we think they need
the universal and consistent use of controlled terminologies for
cross searching but in reality they don't, they either don't need
to do it or can happily muddle through with what they've got (which
may well be more suited to their preferred approach). I don't believe
this, but it is theoretically a position that might be defended
in the absence of evidence to the contrary. If this hypothesis is
true, or even largely true, it will, at the very least, have implications
for how much trouble, effort and expense it would be worth going
to in order to solve the problem via one of the hypothetical solutions proposed
below.
1. There is a problem, and it can be solved in an affordable, cost-effective,
practical, politically workable way that is both sustainable and
compatible with international approaches to such issues by implementing
one or other, or a mix, of the ideas detailed in the list below (see
2 onwards). This hypothesis will be refuted if all of the remaining
hypotheses are refuted (including 0 above). Refutation will mean
that there is a problem but no way of solving it within overall
requirements - either because no solution will be universally acceptable
or sufficiently flexible or any solution will be too difficult to
implement for political or practical or economic reasons (e.g. because
machine translation is required and this is impossible because of
the ambiguity of many terms) or staff in different services and/or
communities won't apply the agreed standard approach correctly or
in the same standard fashion or the users won't use or understand
the wonderful new scheme even if it is put in place and used consistently.
Another variation suggested is that many users - maybe even the
majority - will find that the use of thesauri will make finding
things more difficult because, since users can't map their mental
terminologies to the thesauri, it will increase the number of false
drops, the argument being, I guess, that 'the greatest good' is
then served by not having a HILT, even if some users do need it.
[Question: Does having a choice solve the problem? Note that the
answer is not necessarily 'yes'.]
2. The problem can be solved by using a single universal scheme
that everyone, albeit reluctantly in some cases, will sign up to
and apply (hot favourites here being LCSH and Unesco - not necessarily
in that order, I hasten to add). Included here is the creation of
"official" extensions to (for example) LCSH to meet 'local' needs
so that they can be properly versioned, distributed, and supported.
3. The problem can be solved by using a mapping between two or more
key schemes (LCSH and Unesco?) - especially if there is an automated
terminologies mapping service allowing a gradual build up and maintenance
of a complex series of mappings.
4. The problem can be solved by using a community based approach
in which each community aims to ensure adherence to standards and
to a limited set of classification schemes, subject schemes, and thesauri,
so as to ensure interoperability within the community and
to make inter-community interoperability at least a more manageable
problem by limiting the extent of variations within communities.
[Question: are communities self-defining - i.e. a community is a
community if it says it is and institutions are members if they
say they are?]
5. The problem can be solved by using one or more universal subject
schemes mapped not to each other but to a classification scheme
such as DDC or UDC which is then used as a 'language independent
core' and the basis for cross-searching between services using different
universal schemes. This would probably be implemented in the context
of hypothesis 4 [Note: this is, roughly speaking, what OCLC propose
(I think), although their view would be that DDC is the best choice
for various reasons - especially since it supports communication
across linguistic boundaries. It is also, roughly speaking, the
approach suggested by Alan and others and an approach that has been
considered within CAIRNS and which we may be able to test here
at Strathclyde - see my earlier e-mail.]
6. The problem can be solved by using a similar approach to 5 but
one in which a new universal scheme based on DDC or UDC is created so
that communities can also create more specific micro-thesauri for
more in-depth searching of their collections based on the scheme.
In this event, different sectors with different needs can either
come in at the macro level (for general museums and archives) or
work in with the micro level if they cover specific subject areas.
7. The problem can be solved by using a machine-assisted solution,
perhaps involving middleware like wordmap (www.wordmap.com)
or autonomy (www.autonomy.com)
and based on taxonomies (see http://www.interwoven.com/products/metafinder/description.html
and http://www.tfpl.com/areas_of_expertise/__knowledge_management/taxonomies/taxonomies.html)
or (possibly) neural networks or AI agents that can learn or be
taught about mappings. One reason for this being the case might
be that a solution based on metadata crafting may be prohibitively
expensive.
8. The problem can be solved by using a mix of one or other of the
various solutions proposed elsewhere in this list plus some mechanism
for 'mapping' the HILT terminologies to those in the minds of the
users. [Question: Might it be enough to allow users to
browse the terms used and do their own mapping?]
9. The problem can be solved by using a mix of one or other of the
various solutions proposed elsewhere in this list plus some means
of ensuring that staff apply whatever solution is adopted consistently
and fully understand what it is they are aiming to do (training,
hierarchical checking mechanisms, metadata creation aids etc.).
10. The problem can be solved by using a mix of one or other of the
various solutions proposed elsewhere in this list plus a mechanism
to allow the user to re-build the classification/browsing structure
automatically to suit their own mental terminology. [Question:
How would this be done?]
11. The problem can be solved by using a mix of one or other of the
various solutions proposed elsewhere in this list plus a coherent
mapping match between the solution and existing domain specific
thesauri. [Note that this is not the same as 6 above.]
12. The problem can be solved by using a mix of one or other of the
various solutions proposed elsewhere in this list plus a good librarian
or other intermediary to translate between user and thesauri.
13. The problem can be solved by using a mix of one or other of
the various solutions proposed elsewhere in this list plus good
user training and a suitable variety of flexible search facilities
[e.g. 2-stage hypertext browsing, relevance ranked searches, browsable
subject headings, clustering related subject headings, linking combinations
of key words, combining clusters].
14. The problem can be solved by using a mix of one or other of the
various solutions proposed elsewhere in this list plus a multi-lingual
capability.
15. The problem can be solved by using a good librarian or intermediary
without all these other trappings.
16. A solution can (only) be found through a closer analysis of
the nature of the problem. For example, would re-phrasing the issue
in terms of ontologies help illuminate the route to a solution?
[There is an attempt to define what an ontology is in this context
at http://www-ksl.stanford.edu/kst/what-is-an-ontology.html.
If that doesn't help, then try http://www.cs.vassar.edu/faculty/welty/papers/subjects/subject.html
and look at Figure 2.1 and its legend. I'm no expert, though, so
if someone out there can point to a better web page please feel
free.]
17. A solution can (only) be found by obtaining empirical data from
tests involving searchers, browsers and metadata managers.
DMN HILT 22.01.01
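As a rough illustration of hypothesis 5 only, the sketch below shows how a classification number might act as a 'language independent core' between two subject schemes. It is a minimal sketch under stated assumptions: the dictionaries, the term-to-number pairs and the function name switch_term are all hypothetical and are not taken from HILT, OCLC or CAIRNS data.

```python
# Hypothetical term-to-class-number mappings for two subject schemes.
# The pairs below are illustrative assumptions, not real mapping data.
SCHEME_A_TO_DDC = {
    "Cookery": "641.5",
    "Astronomy": "520",
}
SCHEME_B_TO_DDC = {
    "Cooking": "641.5",
    "Astronomy": "520",
    "Stars": "523.8",
}


def switch_term(term, source_to_ddc, target_to_ddc):
    """Translate a term between schemes via the shared class number.

    Returns the target-scheme terms that map to the same class number,
    or an empty list if the source term has no mapping.
    """
    ddc = source_to_ddc.get(term)
    if ddc is None:
        return []
    return sorted(t for t, number in target_to_ddc.items() if number == ddc)


# A search for "Cookery" in a scheme A service is re-expressed for scheme B.
print(switch_term("Cookery", SCHEME_A_TO_DDC, SCHEME_B_TO_DDC))
```

Note that each scheme is mapped once to the core rather than to every other scheme, which is the main attraction of this approach; whether one-to-one matches on class numbers are precise enough is exactly the kind of question the hypothesis leaves open.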
Subject:
FW: Summary of HILT Hypothesis (1st Draft)
Date: 16 January 2001 15:20
From: Stella Clarke [mailto:sdclarke@lukehouse.demon.co.uk]
To: d.m.nicholson@strath.ac.uk
Dennis,
This comment comes not from a "stakeholder" but from an interested observer.
Thus I have not seen the correspondence on the HILT management list,
just your own summary. And I am not sure what the overall scope
of the project is.
One option not on your list is to develop a "search thesaurus" rather
than a controlled vocabulary. In other words, it would seek to give
you inspiration as to keywords to try, rather than guiding you to
preferred usage. There are several models of search thesaurus and
no consensus. In one model, the origin of each keyword is noted
(e.g. from INSPEC thesaurus, and/or ERIC, etc) so that you know
which database(s) to apply it to. Even better
if it had the numbers of postings! Then it would be a resource to
help you choose databases as well as improve your search statement.
Some imaginative things could be done with concept clustering too.
The search thesaurus could be viewed as a variation on your first
hypothesis, in which you give up trying to cajole users into understanding
what an IR thesaurus is, and fall back on a sort of souped-up inspirational
tool in the Roget mould. Sorry if all this is outside the scope
of the project. I'm not pushing it as the best solution either -
just another option.
Stella
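The model of search thesaurus described above, in which each keyword carries its origin and number of postings, might be sketched roughly as follows. This is only an illustration under assumptions: the entries, the posting counts and the function names are hypothetical and do not reflect real INSPEC or ERIC data.

```python
from collections import defaultdict

# Hypothetical entries: (keyword, source thesaurus, number of postings).
# The counts are invented for illustration only.
ENTRIES = [
    ("information retrieval", "INSPEC", 1200),
    ("information retrieval", "ERIC", 300),
    ("indexing", "ERIC", 150),
]


def build_search_thesaurus(entries):
    """Group entries so each keyword lists its sources and their postings."""
    thesaurus = defaultdict(dict)
    for keyword, source, postings in entries:
        thesaurus[keyword][source] = postings
    return thesaurus


def suggest_databases(thesaurus, keyword):
    """Return (source, postings) pairs for a keyword, most postings first."""
    sources = thesaurus.get(keyword, {})
    return sorted(sources.items(), key=lambda item: item[1], reverse=True)


thesaurus = build_search_thesaurus(ENTRIES)
print(suggest_databases(thesaurus, "information retrieval"))
```

Ranking by postings is one simple way such a tool could help with choosing databases as well as keywords; concept clustering would need a richer structure than this.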
Subject:
Re: Hypotheses Version 2
Date: 6 February 2001
From: "MacEwan, Andrew" [Andrew.MacEwan@BL.UK]
Dennis et al,
A comment on the aims. I suggest introducing the word "navigation"
in addition to "retrieval" by subject. Retrieval only covers the
concept of inputting a subject term and matching it with items/collections
indexed with that term. Navigation covers the aspect of placing
a given subject in a context within a thesaurus or classification
scheme - thus allowing users to use the structure to inform and
redefine their searches. I think it is worth including this concept
in the aims because it helps to inform judgement on what the different
hypotheses can achieve for the end user.
Aim 1 could read:
"Whether it would benefit the end user to harmonise or integrate differing
practices in this area in order to optimise accurate and appropriate
navigation and retrieval by subject at collection, item and sub-item
level."
Regards
Andrew MacEwan
The British Library
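The distinction drawn above between retrieval and navigation can be made concrete with a small sketch. It assumes a hypothetical fragment of a subject hierarchy and a hypothetical index; the terms, item identifiers and function names are invented for illustration.

```python
# Hypothetical fragment of a subject hierarchy: term -> broader term.
BROADER = {
    "Stars": "Astronomy",
    "Planets": "Astronomy",
    "Astronomy": "Science",
}

# Hypothetical index: term -> identifiers of items indexed with that term.
INDEX = {
    "Astronomy": ["item-1"],
    "Stars": ["item-2", "item-3"],
}


def retrieve(term):
    """Retrieval: match the input term against items indexed with it."""
    return INDEX.get(term, [])


def navigate(term):
    """Navigation: place the term in context via broader and narrower terms."""
    narrower = sorted(t for t, b in BROADER.items() if b == term)
    return {"broader": BROADER.get(term), "narrower": narrower}


print(retrieve("Astronomy"))
print(navigate("Astronomy"))
```

Here retrieval on "Astronomy" finds only items indexed with exactly that term, while navigation exposes the surrounding structure that a user could use to widen or narrow the search.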