| |
HILT: High-Level Thesaurus Project Proposal
HILT High Level Thesaurus Project Phase II: Collection Level Subject Terminology
Requirements in FE and HE: Pilot Service, Cost-benefit Analysis and User
Evaluation
| Contents |
|
1. Overview
1.1 The Problem
1.2 Background: HILT Phase I to HILT Phase II
1.3 Staying in Step with other Relevant JISC Activities
1.4 An Experienced Team and a Rich Distributed Research Environment
2. Project Aims
3. Project Description, Including General Approach and Methodology
3.1 General Points
Building the TeRM
Building the Research Environment
Machine to Machine (M2M) Considerations
3.2 Key Project Activity Groups (Note: group schedules overlap - see main section 6 below)
4. Participants and Roles
5. Project Management
6. Start and Finish Dates and Schedule
7. Deliverables
8. Standards
9. Costs (not publicly available)
10. Dissemination Strategy
11. Proposed Exit Strategy
12. Project Contact
Appendix A: Interactive Terminologies Route Map (TeRM) Diagram
Appendix B: Key Questions Relating to Terminology Map Modelling Requirements
Appendix C: Relationship to HILT Phase 1, the DNER, and the JISC Information Environment
|
| |
|
| 1
Overview |
|
| 1.1
The Problem |
| Ensuring
that FE and HE users of the planned Information Environment (IE) can
find appropriate learning, research and information resources by subject
is one of the major challenges facing the JISC, the DNER, the RDN,
and the various key information and learning service providers across
the archives, libraries, museums, and electronic services domains.
The various service providers use a range of subject schemes (from
general schemes like LCSH, UNESCO, DDC, and AAT, to specific schemes
like MeSH) to meet the requirement to adequately and consistently
describe their resources for accurate retrieval. If cross-searching
and browsing are to function coherently for users of the IE, these
schemes must be mapped to one another, perhaps using a common 'spine'
such as DDC with international and multi-lingual application and the
potential to facilitate machine to machine (M2M) interworking. More
importantly, perhaps, the terminologies in the minds of different
types of FE and HE users must be 'disambiguated', then translated
into the service-assigned terms the users need to cross-search or
browse the group of services of relevance to their query. The aim
of HILT Phase II is to build and evaluate a pilot service that will
mediate this process as a 'Shared Service' in the IE. |
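The 'spine' idea described above can be sketched in a few lines. This is a minimal illustration only, not the project's design: the scheme names are taken from the text, but the term strings and the spine class identifiers (`S1`, `S2`) are invented placeholders, and `translate` is a hypothetical helper. Each scheme term is mapped once to a spine class, and translation between any two schemes goes through that class rather than through pairwise scheme-to-scheme tables.

```python
from collections import defaultdict

# Hypothetical data: (scheme, term) -> spine class.
# Term strings and spine identifiers are invented for illustration and are
# not taken from the real schemes.
TERM_TO_SPINE = {
    ("LCSH", "Computer-aided design"): "S1",
    ("UNESCO", "Computer assisted design"): "S1",
    ("AAT", "computer-aided design"): "S1",
    ("LCSH", "Museums"): "S2",
    ("UNESCO", "Museums"): "S2",
}


def build_spine_index(term_to_spine):
    """Invert the mapping: spine class -> {scheme: [terms]}."""
    index = defaultdict(dict)
    for (scheme, term), spine_class in term_to_spine.items():
        index[spine_class].setdefault(scheme, []).append(term)
    return index


def translate(term, source, target, term_to_spine=TERM_TO_SPINE):
    """Translate a term from one scheme to another via the shared spine."""
    spine_class = term_to_spine.get((source, term))
    if spine_class is None:
        return []  # term is not mapped to the spine at all
    index = build_spine_index(term_to_spine)
    return list(index[spine_class].get(target, []))


if __name__ == "__main__":
    print(translate("Computer-aided design", "LCSH", "UNESCO"))
```

With a spine, adding a further scheme requires one new mapping (scheme to spine) rather than a separate mapping to every scheme already present.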
|
| 1.2
Background: HILT Phase I to HILT Phase II |
|
HILT
Phase I reported in November 2001. It was funded jointly by JISC
and RSLP and was in line with the 'Study and Specification' stage
of one aspect of the Shared Services programme - that aspect relating
to the provision of thesauri and terminology services. HILT Phase
II moves this process into the 'Pilot Project' stage, focussing
- as recommended by the HILT Phase I evaluator - on terminology
and thesauri requirements at the collection level, but also bearing
in mind the need to extend this in due course to the needs of item
level retrieval. Phase II will last for 12 months, will cost £89,000,
and will utilise the work of HILT Phase I, and the skills and experience
of the team that carried it out, to:
- Build
an initial pilot, then develop it in line with user, service,
and expert evaluator outputs
- Determine
detailed requirements, costs, and benefits to FE and HE users
of a full terminologies service focussed primarily on collection
level needs
|
|
| 1.3
Staying in Step with other Relevant JISC Activities |
| The
project team will take full account of advice and outputs from other
relevant JISC activities in the area, including the feasibility and
scoping work being carried out by MIMAS and UKOLN in respect of Shared
Services for collection and service description, the work of the Collection
Descriptions Focus, and relevant activities in other IE areas such
as security and authentication work, RDN work within the Renardus
Project, the work of the FE support Centres, the work towards the
creation of a learning materials repository, where subject terminologies
will again be an issue, DNER formative evaluation work at CERLIM,
and work on user behaviour. |
|
| 1.4
An Experienced Team and a Rich Distributed Research Environment |
| A full
list of the organisations and individuals involved in the project
is provided in section 4 below. Through its involvement in the CAIRNS
clumps project (which utilised collection strengths to landscape mini-clumps),
the SCONE and SEED projects (which combined to build a cross-sectoral
collections database), and HILT Phase I, the lead site - Strathclyde
University's Centre for Digital Library Research (CDLR) - has extensive
experience in the use of collection level descriptions in a dynamic
distributed environment, and of associated terminology problems. It
also has available a rich distributed information environment in which
to study the operation of the pilot and its interaction with users
and services. This includes the CAIRNS distributed catalogue with
universities, NLS, NGfL, SLAINTE, and Glasgow Digital Library (GDL)
databases, a subject-based collection strengths landscaping mechanism,
the SCONE named collections database, an OAI e-prints server, NOF
and other digitisation project databases, and the potential to mount
other Z39.50 databases. Other participants - particularly UKOLN, mda,
NCA, the RDN and the DNER, and the HILT terminology experts - add further
depth and breadth to the team. In addition, OCLC has agreed to assist
the study by providing access to a machine-readable mapping of LCSH
to DDC and associated access to expertise. The CDLR also works closely
with the ten Glasgow FE colleges within the RSLP GDL project. |
| |
|
| 2
Project Aims |
|
|
HILT
Phase II will set up a pilot terminologies route map or TeRM service,
similar to that proposed in HILT Phase I (see Appendix
A for description), aiming to:
a.
Provide a practical experimental focus within which to investigate
and establish subject terminology service requirements for the JISC
Information Environment, with particular reference to DNER, RDN,
User, Collection Level, International Compatibility, and local,
regional, national and UK-wide access considerations.
b.
Make recommendations as regards a possible future service, taking
into account a range of factors, including the level and nature
of user need, practicality, design requirements, effectiveness,
functionality available in existing commercial software packages
as against original development, and (above all) costs against benefits.
|
| |
|
| 3.
Project Description, Including General Approach and Methodology |
|
| 3.1
General Points |
| Building
the TeRM |
| For
the purposes of this project, the pilot TeRM would be built using
commercially available Wordmap software. This is known (through HILT
Phase I experience) to provide a good initial illustration of the
kind of facilities needed for the pilot and would be provided by the
company at a relatively low 'research' price (around £22,000 in total).
This does not imply a preference for this software or supplier,
nor even for a commercial as opposed to a 'home-grown' or open source
approach. The project would aim to develop a full requirement specification
through evaluative activities conducted by user and service focus
groups and external experts. It would then compare all relevant packages
available, having conducted an in-depth survey of all current commercial
and other solutions. Wordmap would be amongst those able to offer
software that might meet a significant part of the specification,
but would not be favoured. The question of whether or not a community-based
open source approach is preferable to buying a commercial solution
would also be examined. |
|
There
are three reasons for using a specific piece of commercial software
at this stage:
1.
Experience within HILT Phase I suggests that project participants
find it easier to discuss the requirements of such a service given
a real illustrative example on which to focus. It is therefore believed
essential that we mount an illustrative pilot early on in the project
in order to help engage the interest and attention of users and
other stakeholders and give them a practical environment within
which to envisage and consider the problem. Wordmap is being used
because we want to have a real working demonstrator at an early
stage for users and service providers to interact with. Attempting
to draw out the full requirement before implementing an illustrative
pilot would, it is believed, result in a poorly researched requirement
as users and service providers would not have been sufficiently
stimulated by operation in a real context to allow a full specification
to emerge.
2.
It is feasible to negotiate a low price for a pilot of this kind
with one company before the project; it would not be possible to
guarantee doing this in the middle of a project, after we had spent
many months preparing a requirement.
3.
Taking a development-from-scratch approach at this stage would almost
certainly involve employing an experienced programmer for the duration
of the project. This would be in addition to the one full-time member
of staff costed below and would certainly cost more than the cost
of the software. It would also entail additional risks. On the one
hand, such staff are hard to find and often move on for improved
pay and conditions before a project is completed. On the other,
hold-ups or failures in software development schedules could undermine
the schedule in respect of other project activities. The Wordmap
approach is a pragmatic one that will enable us to evaluate the
real uses and issues in a timely way, whilst avoiding the potential
waste and risk involved in development from scratch before a full
requirement has been established.
The
initial illustrative TeRM would be based on the RDN terminologies,
on terminologies available as part of the Wordmap taxonomies set,
which include, in particular, a set of terms used by general internet
users, and on selective subsets of LCSH, DDC, UNESCO, and AAT. OCLC
will provide an LCSH - DDC mapping, and may also be able to provide
a DDC to Conspectus subject headings mapping. The UNESCO thesaurus
is available online and we will look to obtain AAT selections from
manual sources. The aim would be a selective mapping sufficient
for the purposes of the pilot in the first instance - i.e. not a
comprehensive terminologies map. Consideration would also be given
to the points noted in Appendix B below and to the various issues
raised by the HILT Phase I evaluator, Leonard Will (HILT Final Report,
Section 10).
|
| Building
the Research Environment |
| This
would be achieved by adding a range of DNER and other collections,
including RDN collections, Archives collections, Museums collections,
and a local OAI collection, to a copy of the SCONE Collections database
to create a HILT Phase II testbed collections database and CLD-based
landscaping and cross-searching environment using the CAIRNS dynamic
landscaping mechanism and broadcast search facility. The aim would
be to utilise 'native subject schemes' for the collections in the
environment, and to use the pilot TeRM to 'disambiguate' user terms
and resolve differences between schemes. A range of user base-landscapes
would be utilised, roughly associated with subject hubs as regards
subject interests, but representing a variety of user circumstances:
local, regional, national, UK-wide (general) and UK-wide (subject
hub). The aim would be to link the TeRM to the landscaping mechanism
if possible (CAIRNS experience suggests it should be), or to simulate
this aspect if it is not (this would be less elegant, but sufficient
for project research purposes). |
| Machine
to Machine (M2M) Considerations |
| It will
be difficult in such a relatively small, relatively low-cost project
to fully investigate M2M use of the TeRM facility in an operational
sense. It is therefore proposed that the pilot project focus primarily
on use of the TeRM by users, while examining on an ongoing basis, at a
mainly theoretical level, the requirement for it ultimately to meet
service needs as well. This aspect would be
a joint activity between UKOLN, the RDN, the DNER, the CDLR and
others, but would be led and co-ordinated by UKOLN. It would involve
examination of a range of service to service interfaces, including
those utilising Z39.50, HTTP, OAI (HTTP, XML), and RSS. Operational
tests would nevertheless be conducted when possible. Semantic web developments
will be monitored to help ensure inter-compatibility. |
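As a purely illustrative sketch of what a service to service exchange over HTTP and XML might carry, the fragment below serialises a set of term mappings as XML. The element and attribute names (`termMapping`, `query`, `match`, `scheme`) and the function `mapping_response_xml` are assumptions made for this example; the proposal does not define a response format, and the sample terms are invented.

```python
import xml.etree.ElementTree as ET


def mapping_response_xml(query_term, mappings):
    """Serialise (scheme, term) matches for a query term as an XML string.

    The element names are hypothetical; no TeRM response format is
    specified in the proposal.
    """
    root = ET.Element("termMapping")
    ET.SubElement(root, "query").text = query_term
    for scheme, term in mappings:
        match = ET.SubElement(root, "match", {"scheme": scheme})
        match.text = term
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Invented sample data for illustration only.
    print(mapping_response_xml("design", [("LCSH", "Design"), ("AAT", "design")]))
```

A calling service could parse such a response with any XML library and substitute the returned terms into its own Z39.50 or HTTP queries.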
| 3.2
Key Project Activity Groups (Note: group schedules overlap - see main
section 6 below) |
|
1.
Set up: Web-site activities begun; Service, User, Community
stakeholders e-lists set up.
2.
Refine and begin to implement project plan: Begin professional
level evaluation; Fully scope problem; Identify evaluation criteria
for success or failure of the TeRM approach, looking at technological,
terminological, cost, and user requirements; Determine subject areas
to focus on; Begin to identify commercial software suppliers; Involve
evaluator and the second terminology expert fully in all processes.
3.
'Engage' with users, service staff, communities, and the problem
in general; implement illustrative stage 1 pilot: Start dissemination
activities; set focus group dates; Install pilot software; Implement
illustrative (stage 1) pilot to inform and stimulate discussion;
Plan focus groups with a view to drawing out initial service specification;
Examine the process of assigning subject terms at RDN hubs and at
other relevant sites; Begin general and M2M requirements studies;
Begin survey of commercial software functionality; Hold focus groups
(include terminology experts).
4.
Begin to build stage 2 pilot TeRM based on illustrative pilot:
Determine initial service specification via focus groups, professional
level evaluation, general and M2M requirements study, functionality
survey, and project team analysis and discussion; Implement operational
(stage 2) Pilot Service.
5.
Terminology map modelling for TeRM (starts prior to implementation
of stage 1 pilot): Investigate terminological and functionality
requirements of a TeRM through intellectual analysis, the general
and M2M requirements studies, and empirical testing in a practical
environment. Utilise the following process when conducting empirical
testing:
a.
New user authenticates as either HE or FE then chooses from the
range of available entry points: local, regional, national, UK-wide
(general) and UK-wide (subject hub).
b. New user 'disambiguates' her subject requirements using interaction
with TeRM
c. Appropriate terms are sent to the HILT Phase II CLD-based landscaping
mechanism
d. HILT Phase II landscaping mechanism builds appropriate entry
landscapes, showing only those collections appropriate to the entry
point in question. In some cases, these will be 'mock-ups' of the
interfaces of existing services or subsets of those services:
- An
FE college or university OPAC screen with a small change to permit
the further discovery of collections held elsewhere
- An
RDN subject hub (e.g. EEVL) or specific subject page (e.g. engineering
in EEVL)
- RDN
'Central'
e.
In other cases, they will be CAIRNS-type geographical or subject
landscapes:
- The
3 Glasgow Universities
- A
computer-aided design subject strength landscape showing Strathclyde
University and Heriot Watt University as shown by accessing this
URL: http://cairns.lib.strath.ac.uk/cfdocs/CAIRNS/CAIRNS/SubjCollSelect.cfm?uSubID=3128
- A
'mock-up' of a DNER-wide level entry page, dynamically generated
from the SCONE copy database which would have a selection of additional
DNER collections added for the purpose
f.
TeRM terminologies route map is adjusted until the CLD-based landscaping
operates as it should at this relatively simple level
g. A representative set of single term subject queries is then
'disambiguated' and 'mapped' via the TeRM and then run against the
collections in each of these base landscapes, the latter process
utilising CAIRNS cross-searching where the collections have Z39.50
catalogues or service by service methods where cross-searching is
not currently possible (e.g. for the local OAI database).
h. TeRM terminologies route map is adjusted until the single query item-level
searching operates as it should.
i. Each base landscape has an 'expand collections searched' button.
When this is clicked, a new TeRM-mediated landscape is generated
depending on the nature of the subject query being processed, the
user (HE or FE), and the base landscape (e.g. institution-level
would go to metropolitan area level, RDN subject hub to DNER level,
and so on). Again - as in e above - the TeRM terminologies route
map is adjusted until the CLD-based landscaping operates as it should
at this second collection discovery level.
j. The same representative set of single term subject queries utilised
at h above are again 'disambiguated' and 'mapped' via the TeRM and
then run against the collections in each of these new 'second-level'
collections landscapes, the latter process once again utilising
CAIRNS cross-searching where the collections have Z39.50 catalogues
or service by service methods where cross-searching is not currently
possible. The TeRM terminologies route map is again adjusted until
the single query item-level searching operates as it should for
item-level cross-searching at this second level.
Notes
1. There are two key questions underlying the above investigation
of terminology map modelling requirements - see Appendix B for details
2. The above process will be monitored and influenced by the ongoing
professional level evaluation process
6.
Gradually optimise effectiveness and functionality of stage 2
pilot to produce stage 3 pilot and interim specification for a future
service, utilising this same process
7.
Investigate costs of an operational service at various service
levels: Determine interim specification as described above;
determine associated terminology-related set-up and maintenance
costs; estimate feasibility and cost of a full service based on
commercial software or adapted commercial software; estimate and
compare likely cost of HE/FE developed software or any available
open source software.
8.
Prepare, disseminate, and obtain feedback on, Interim Report
9.
Complete, launch, and conduct user trials on, improved stage
3 pilot: finalise design based on interim report feedback and
ongoing professional level evaluation; Launch improved pilot; Plan
user trials and focus groups; Hold user trials and focus groups
10.
Finalise user, terminological, and functional requirements of
a future service and refine estimated set-up and maintenance costs;
further improve pilot if necessary and possible.
11.
Project completion activities: Compile draft final report;
Evaluate project and pilot and encompass results in draft report;
Circulate draft final report and obtain final feedback; Finalise
report; Submit and disseminate Report; Close down project
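The empirical testing process in activity group 5 above (disambiguate the user's term, send the chosen term to the landscaping mechanism, build a base landscape, then widen it with 'expand collections searched') can be sketched as follows. This is a toy model under stated assumptions, not the CAIRNS or Wordmap implementation: the `Collection` class, the `NEXT_LEVEL` widening order, the spine class identifiers and all sample data are invented, and HE/FE authentication is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    name: str
    level: str           # entry point at which the collection first appears
    subjects: frozenset  # hypothetical spine classes the collection covers


# Assumed widening order for the 'expand collections searched' button.
NEXT_LEVEL = {"local": "regional", "regional": "national", "national": "uk-wide"}

# Invented data: possible senses of a user term, and a tiny testbed.
SENSES = {"design": [("graphic design", "S3"), ("computer-aided design", "S1")]}
COLLECTIONS = [
    Collection("College OPAC", "local", frozenset({"S1"})),
    Collection("City universities", "regional", frozenset({"S1", "S3"})),
    Collection("National library", "national", frozenset({"S3"})),
]


def disambiguate(user_term, senses=SENSES):
    """Return the candidate (label, spine class) senses for a user term."""
    return senses.get(user_term.lower(), [])


def build_landscape(spine_class, levels, collections=COLLECTIONS):
    """List collections visible at the given levels that cover the subject."""
    return [c.name for c in collections
            if c.level in levels and spine_class in c.subjects]


def expand(levels):
    """Add the next wider level, or return the levels unchanged at the top."""
    wider = NEXT_LEVEL.get(levels[-1])
    return levels + [wider] if wider else list(levels)


if __name__ == "__main__":
    label, spine_class = disambiguate("design")[1]  # user picks the second sense
    base = ["local"]
    print(build_landscape(spine_class, base))
    print(build_landscape(spine_class, expand(base)))
```

In this model, 'adjusting the TeRM' corresponds to editing the sense and subject data until each landscape lists the collections a tester expects.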
|
| |
|
| 4.
Participants and Roles |
|
|
It
is proposed that HILT Phase II should involve the same mix of participants
as HILT Phase I but also involve representatives from the DNER,
the RDN, and FE; specifically:
- The
Centre for Digital Library Research (CDLR) at Strathclyde University
- lead;
- DNER
representative
- mda
(formerly the Museums Documentation Association);
- National
Council on Archives;
- National
Grid for Learning (NGfL) Scotland;
- Online
Computer Library Center (OCLC);
- RDN
representative
- FE
Representative
- Scottish
Library and Information Council (SLIC)
- Scottish
University for Industry (SufI)
- UK
Office for Library and Information Networking (UKOLN)
- Terminology
experts, Alan Gilchrist and Leonard Will (external evaluator)
|
| |
|
| 5.
Project Management |
|
| Day
to day management will be the responsibility of the project staff.
This Project Team will report to a Project Management Group
(PMG) consisting of the team and a representative from each participant.
There will also be a Project Steering Group (PSG) comprising
representatives from all major stakeholders, including users, and
a Professional Level Evaluation Group (PLEG), responsible for
conducting an ongoing professional level evaluation of the pilot and
associated work. The PLEG will work with, and on behalf of, the external
evaluator, and will be made up of independent professionals from the
various domains, assisted as necessary by project staff. The PMG will
aim to meet every two months, reporting to the PSG, which will meet
every four months. The PLEG will meet as required, reporting to the PSG,
but will mainly conduct its business online. In addition, the project
will liaise with the JISC Development Team and follow reporting requirements
as set out by the JISC. |
| |
|
| 6.
Start and Finish Dates and Schedule |
|
|
| HILT Phase 2 Pilot: activities scheduled across months 1 to 12 |
| Activity: |
| Set up activities |
| Web-site activities |
| Service stakeholders e-list |
| User stakeholders e-list |
| Community stakeholders e-list |
| Professional level evaluation process |
| Scope problem |
| Identify/monitor evaluation criteria |
| Identify and monitor subject areas to test |
| Identify software suppliers |
| Dissemination activities |
| Set up focus group dates |
| Install pilot software |
| Implement stage 1 pilot (illustrative) |
| Plan focus groups |
| Engage with indexing staff |
| Initial Service Specification |
| M2M Requirements Study |
| Detailed Functionality Survey |
| Hold focus groups |
| Implement stage 2 pilot service |
| Terminology map modelling |
| Full service specification |
| Estimate and refine full service costs |
| Interim Report and Feedback |
| Implement and develop stage 3 pilot |
| Plan user trials/focus groups |
| Hold user trials/focus groups |
| Improve stage 3 pilot |
| Compile draft final report |
| Evaluate project and pilot |
| Circulate draft final report |
| Finalise report |
| Submit/disseminate Report |
| Project closedown activities |
|
| Overall
duration: twelve months. Target dates: Mid May 2002 - Mid May 2003 |
| |
|
| 7.
Deliverables |
|
|
1.
Greater understanding of the problem and of the needs of FE and
HE users in respect of subject retrieval in the projected JISC Information
Environment, within JISC and JISC services and - through dissemination
activities - in the community as a whole.
2. An in-depth understanding of terminology mapping requirements
in the DNER and associated UK services, taking local, regional,
national, international, subject-hub, FE and HE, and archives, libraries,
museums, and electronic services considerations into account.
3. A working pilot terminologies demonstrator service for the JISC
IE (with limited functionality and with a full service possibly
requiring a change of software).
4. Requirements, set up and maintenance costs, and costs against
benefits, for a future service, including both user and M2M terminological
and functional requirements.
5. Final Report on the project, together with appropriate recommendations.
|
| |
|
| 8.
Standards |
|
|
The
project will adhere to appropriate standards where these exist and
will be advised in this by other participants, such as UKOLN, the
DNER, domain representatives and terminology experts. The project
is aware of the British standard guide to establishment and development
of monolingual thesauri (BS5723:1987) (ISO2788-1986) and the British
standard guide to establishment and development of multilingual
thesauri (BS6723:1985) (ISO5964-1985) and will consult these and
other appropriate works. In addition, it will aim to build UK requirements
around terminologies recognised and used internationally (e.g. DDC,
LCSH, UNESCO, AAT). In all other matters, the developing DNER/IE
standards and the eLib Standards Guidelines would be consulted.
The situation regarding other possibly relevant initiatives such
as zThes and SRW will be monitored as the project develops.
CAIRNS
uses the Z39.50 standard, and the CLD approach used in the SCONE
collections database is based on the UKOLN instantiation, and on
the original model produced by Michael Heaney when the UKOLN instantiation
required extension to take account of practical requirements within
SCONE/CAIRNS. The Collection Descriptions Focus has reviewed these collection
descriptions and is satisfied with them.
|
| |
| |
| 9.
Costs (not publicly available) |
| |
| |
| 10.
Dissemination Strategy |
|
| Dissemination
of information would be via the HILT Phase II web site, postings to
appropriate e-mail lists, papers and news items submitted to professional
publications and presentations at seminars and conferences. Key progress
reports would be sent to relevant organisations and to all institutions
in the United Kingdom. An active and successful dissemination programme
would be a major aim throughout the project. |
| |
|
| 11.
Proposed Exit Strategy |
|
| The
Project will make recommendations about the possible nature and cost
of a future service. The CDLR will maintain the demonstrator service
for a reasonable period of time beyond the end of the project, the
exact time to be agreed with the JISC. |
| |
|
| 12.
Project Contact |
|
|
Dennis
Nicholson, Director, Centre for Digital Library Research, University
of Strathclyde
|
|