HILT: High-Level Thesaurus Project Proposal

HILT High Level Thesaurus Project Phase II: Collection Level Subject Terminology Requirements in FE and HE: Pilot Service, Cost-benefit Analysis and User Evaluation

Contents

1. Overview
1.1 The Problem
1.2 Background: HILT Phase I to HILT Phase II
1.3 Staying in Step with other Relevant JISC Activities
1.4 An Experienced Team and a Rich Distributed Research Environment
2. Project Aims
3. Project Description, Including General Approach and Methodology
3.1 General Points

  • Building the TeRM
  • Building the Research Environment
  • Machine to Machine (M2M) Considerations
3.2 Key Project Activity Groups (Note: group schedules overlap - see main section 6 below)
4. Participants and Roles
5. Project Management
6. Start and Finish Dates and Schedule
7. Deliverables
8. Standards
9. Costs (not publicly available)
10. Dissemination Strategy
11. Proposed Exit Strategy
12. Project Contact
Appendix A: Interactive Terminologies Route Map (TeRM) Diagram
Appendix B: Key Questions Relating to Terminology Map Modelling Requirements
Appendix C: Relationship to HILT Phase I, the DNER, and the JISC Information Environment


     
    1. Overview
    1.1 The Problem
    Ensuring that FE and HE users of the planned Information Environment (IE) can find appropriate learning, research and information resources by subject is one of the major challenges facing the JISC, the DNER, the RDN, and the key information and learning service providers across the archive, library, museum, and electronic service domains. These service providers use a range of subject schemes to describe their resources adequately and consistently for accurate retrieval. The schemes run from general schemes such as LCSH, UNESCO, DDC, and AAT to specialised schemes such as MeSH. If cross-searching and browsing are to function coherently for users of the IE, these schemes must be mapped to one another. This could perhaps be done via a common 'spine' such as DDC, which has international and multilingual application and the potential to facilitate machine to machine (M2M) interworking. Perhaps more importantly, the terminologies in the minds of different types of FE and HE users must be 'disambiguated'. They must then be translated into the service-assigned terms that users need in order to cross-search or browse the group of services relevant to their query. The aim of HILT Phase II is to build and evaluate a pilot service that will mediate this process as a 'Shared Service' in the IE.
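    The spine idea can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory table with two layers. In the first, each sense of an ambiguous user word points to a spine class. In the second, each spine class points to the term used by each service scheme. The senses, spine labels and scheme terms are illustrative placeholders, not verified LCSH or UNESCO data, and the sketch is not the HILT design. The sketch also counts how many mappings would have to be maintained with and without a spine.

```python
"""Hypothetical sketch of spine-based terminology mapping.

All senses, spine classes and scheme terms are illustrative placeholders,
not extracts from real LCSH, UNESCO or DDC mappings.
"""

# Each sense of an ambiguous user word points to one spine class.
USER_SENSES = {
    "mercury": {"planet": "spine:astronomy", "element": "spine:chemistry"},
}

# Each scheme is mapped once to the spine, not to every other scheme.
SPINE_TO_SCHEME = {
    "spine:astronomy": {"LCSH": "Mercury (Planet)", "UNESCO": "Planets"},
    "spine:chemistry": {"LCSH": "Mercury", "UNESCO": "Chemical elements"},
}


def senses(user_term: str) -> list[str]:
    """Candidate senses the user is asked to choose between (disambiguation)."""
    return sorted(USER_SENSES.get(user_term.lower(), {}))


def translate(user_term: str, sense: str) -> dict[str, str]:
    """Map a disambiguated user term, via the spine, to each scheme's term."""
    spine_class = USER_SENSES[user_term.lower()][sense]
    return SPINE_TO_SCHEME[spine_class]


def pairwise_mappings(n_schemes: int) -> int:
    """Scheme-to-scheme mappings needed if every pair is mapped directly."""
    return n_schemes * (n_schemes - 1) // 2


def spine_mappings(n_schemes: int) -> int:
    """Mappings needed if one of the schemes (e.g. DDC) serves as the spine."""
    return n_schemes - 1


if __name__ == "__main__":
    print(senses("Mercury"))               # ['element', 'planet']
    print(translate("Mercury", "planet"))  # {'LCSH': 'Mercury (Planet)', 'UNESCO': 'Planets'}
    print(pairwise_mappings(6), spine_mappings(6))  # 15 5
```

    With six schemes, direct pairwise mapping needs 15 mappings, whereas a spine that is itself one of the schemes needs 5. The trade-off is that the precision of any translation is limited by how finely the spine classes distinguish concepts.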
    1.2 Background: HILT Phase I to HILT Phase II

    HILT Phase I reported in November 2001. It was funded jointly by JISC and RSLP and was in line with the 'Study and Specification' stage of one aspect of the Shared Services programme - that aspect relating to the provision of thesauri and terminology services. HILT Phase II moves this process into the 'Pilot Project' stage, focussing - as recommended by the HILT Phase I evaluator - on terminology and thesauri requirements at the collection level, but also bearing in mind the need to extend this in due course to the needs of item level retrieval. Phase II will last for 12 months, will cost £89,000, and will utilise the work of HILT Phase I, and the skills and experience of the team that carried it out, to:

    • Build an initial pilot, then develop it in line with user, service, and expert evaluator outputs
    • Determine detailed requirements, costs, and benefits to FE and HE users of a full terminologies service focussed primarily on collection level needs
    1.3 Staying in Step with other Relevant JISC Activities
    The project team will take full account of advice and outputs from other relevant JISC activities in the area. These include the feasibility and scoping work being carried out by MIMAS and UKOLN on Shared Services for collection and service description, and the work of the Collection Descriptions Focus. They also include relevant activities in other IE areas: security and authentication work; RDN work within the Renardus Project; the work of the FE support centres; work towards the creation of a learning materials repository, where subject terminologies will again be an issue; DNER formative evaluation work at CERLIM; and work on user behaviour.
    1.4 An Experienced Team and a Rich Distributed Research Environment
    A full list of the organisations and individuals involved in the project is provided in section 4 below. The lead site is Strathclyde University's Centre for Digital Library Research (CDLR). It has extensive experience in the use of collection level descriptions in a dynamic distributed environment, and of the associated terminology problems. This experience comes from its involvement in the CAIRNS clumps project (which used collection strengths to landscape mini-clumps), the SCONE and SEED projects (which combined to build a cross-sectoral collections database), and HILT Phase I. The CDLR also has available a rich distributed information environment in which to study the operation of the pilot and its interaction with users and services. This environment includes:
    • the CAIRNS distributed catalogue, with university, NLS, NGfL, SLAINTE, and Glasgow Digital Library (GDL) databases;
    • a subject-based collection strengths landscaping mechanism;
    • the SCONE named collections database;
    • an OAI e-prints server;
    • NOF and other digitisation project databases;
    • the potential to mount other Z39.50 databases.
    Other participants add further depth and breadth to the team, particularly UKOLN, mda, NCA, the RDN, the DNER, and the HILT terminology experts. In addition, OCLC has agreed to assist the study by providing access to a machine-readable mapping of LCSH to DDC and to associated expertise. The CDLR also works closely with the ten Glasgow FE colleges within the RSLP GDL project.
     
    2. Project Aims

    HILT Phase II will set up a pilot terminologies route map or TeRM service, similar to that proposed in HILT Phase I (see Appendix A for description), aiming to:

    a. Provide a practical experimental focus within which to investigate and establish subject terminology service requirements for the JISC Information Environment. Particular reference will be made to DNER, RDN, user, collection level, and international compatibility considerations, and to local, regional, national and UK-wide access considerations.

    b. Make recommendations as regards a possible future service, taking into account a range of factors, including the level and nature of user need, practicality, design requirements, effectiveness, functionality available in existing commercial software packages as against original development, and (above all) costs against benefits.

     
    3. Project Description, Including General Approach and Methodology
    3.1 General Points
    Building the TeRM
    For the purposes of this project, the pilot TeRM would be built using commercially available Wordmap software. HILT Phase I experience shows that this software provides a good initial illustration of the kind of facilities needed for the pilot. The company would provide it at a relatively low 'research' price (around £22,000 in total). This does not imply a preference for this software or supplier, nor even for a commercial as opposed to a 'home-grown' or open source approach. The project would aim to develop a full requirement specification through evaluative activities conducted by user and service focus groups and external experts. It would then carry out an in-depth survey of all current commercial and other solutions and compare all relevant packages against that specification. Wordmap would be among the suppliers able to offer software that might meet a significant part of the specification, but it would not be favoured. The project would also examine whether a community-based open source approach is preferable to buying a commercial solution.

    There are three reasons for using a specific piece of commercial software at this stage:

    1. Experience within HILT Phase I suggests that project participants find it easier to discuss the requirements of such a service when they have a real illustrative example on which to focus. It is therefore believed essential to mount an illustrative pilot early in the project. This will engage the interest and attention of users and other stakeholders and give them a practical environment within which to envisage and consider the problem. Wordmap is being used because we want a real working demonstrator at an early stage for users and service providers to interact with. It is believed that attempting to draw out the full requirement before implementing an illustrative pilot would produce a poorly researched requirement. Without operation in a real context, users and service providers would not be stimulated enough for a full specification to emerge.

    2. It is feasible to negotiate a low price for a pilot of this kind with one company before the project starts. It would not be possible to guarantee this in the middle of a project, after many months had been spent preparing a requirement.

    3. Taking a from-scratch development approach at this stage would almost certainly involve employing an experienced programmer for the duration of the project. This would be in addition to the one full-time member of staff costed below and would certainly cost more than the software. It would also entail additional risks. On the one hand, such staff are hard to find and often move on for improved pay and conditions before a project is completed. On the other, hold-ups or failures in software development could undermine the schedule for other project activities. The Wordmap approach is a pragmatic one. It will enable us to evaluate the real uses and issues in a timely way, while avoiding the potential waste and risk of developing from scratch before a full requirement has been established.

    The initial illustrative TeRM would be based on three sources:
    • the RDN terminologies;
    • terminologies available as part of the Wordmap taxonomies set, which include, in particular, a set of terms used by general internet users;
    • selected subsets of LCSH, DDC, UNESCO, and AAT.
    OCLC will provide an LCSH-DDC mapping, and may also be able to provide a DDC to Conspectus subject headings mapping. The UNESCO thesaurus is available online, and we will look to obtain AAT selections from manual sources. The aim in the first instance would be a selective mapping sufficient for the purposes of the pilot, i.e. not a comprehensive terminologies map. Consideration would also be given to the points noted in Appendix B below. It would likewise be given to the various issues raised by the HILT Phase I evaluator, Leonard Will (HILT Final Report, Section 10).
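    The pilot would chain an LCSH-DDC mapping with a possible DDC-Conspectus mapping. Because that mapping is deliberately selective, it is useful to see which terms fall through the chain. The minimal Python sketch below shows this under those assumptions. The headings and the codes D1 to D3 and C1 to C2 are hypothetical placeholders, and no particular format for the OCLC data is assumed.

```python
"""Hypothetical sketch: chaining two selective crosswalks and listing gaps.

Headings and codes are placeholders, not real LCSH, DDC or Conspectus data.
"""

LCSH_TO_DDC = {
    "Heading A": ["D2"],
    "Heading B": ["D1", "D2"],
    "Heading C": ["D3"],  # D3 has no Conspectus mapping in this example
}

DDC_TO_CONSPECTUS = {
    "D1": ["C1"],
    "D2": ["C2"],
}


def compose(first: dict[str, list[str]], second: dict[str, list[str]]):
    """Chain two one-to-many mappings.

    Returns (chained, gaps): chained maps each source term to its
    de-duplicated target terms; gaps maps source terms to the
    intermediate codes that the second mapping does not cover.
    """
    chained: dict[str, list[str]] = {}
    gaps: dict[str, list[str]] = {}
    for term, middles in first.items():
        targets: list[str] = []
        for middle in middles:
            if middle not in second:
                gaps.setdefault(term, []).append(middle)
                continue
            for target in second[middle]:
                if target not in targets:
                    targets.append(target)
        chained[term] = targets
    return chained, gaps


if __name__ == "__main__":
    chained, gaps = compose(LCSH_TO_DDC, DDC_TO_CONSPECTUS)
    print(chained)  # {'Heading A': ['C2'], 'Heading B': ['C1', 'C2'], 'Heading C': []}
    print(gaps)     # {'Heading C': ['D3']}
```

    A gap report of this kind would indicate where the selective pilot mapping needs extending before a given test query can be expected to work.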

    Building the Research Environment
    This would be achieved by adding a range of DNER and other collections to a copy of the SCONE collections database. These would include RDN collections, archives collections, museums collections, and a local OAI collection. The result would be a HILT Phase II testbed collections database. It would also provide a CLD-based landscaping and cross-searching environment using the CAIRNS dynamic landscaping mechanism and broadcast search facility. The aim would be to use the 'native subject schemes' of the collections in the environment, and to use the pilot TeRM to 'disambiguate' user terms and resolve differences between schemes. A range of user base landscapes would be used. These would be broadly associated with subject hubs as regards subject interests, but would represent a variety of user circumstances: local, regional, national, UK-wide (general) and UK-wide (subject hub). The aim would be to link the TeRM to the landscaping mechanism if possible, and CAIRNS experience suggests it should be. If not, this aspect would be simulated, which would be less elegant but sufficient for project research purposes.
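    Each collection keeps its native subject scheme, and the TeRM resolves differences between schemes. That combination can be sketched as a filter over collection-level descriptions. The minimal Python sketch below assumes a hypothetical, much-simplified record with a scope, a native scheme and a set of subject terms. This is not the SCONE or CAIRNS data model, and the collection names are invented. The DDC numbers 620 (Engineering) and 530 (Physics) are used only as illustrative subject codes.

```python
"""Hypothetical sketch of TeRM-mediated collection landscaping.

The record fields are an assumption, not the SCONE/CAIRNS schema.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    name: str
    scope: str                 # e.g. "local", "regional", "national"
    scheme: str                # the collection's native subject scheme
    subjects: frozenset[str]   # subject terms in that native scheme


def landscape(collections, scheme_terms: dict[str, str], scopes: set[str]) -> list[str]:
    """Names of collections in the chosen scopes whose subjects match.

    scheme_terms is what the TeRM would supply: the user's disambiguated
    subject expressed in each native scheme, e.g. {"LCSH": ..., "DDC": ...}.
    """
    return [
        c.name
        for c in collections
        if c.scope in scopes and scheme_terms.get(c.scheme) in c.subjects
    ]


if __name__ == "__main__":
    collections = [
        Collection("College library", "local", "LCSH", frozenset({"Engineering"})),
        Collection("National archive", "national", "DDC", frozenset({"620"})),
        Collection("Physics collection", "local", "DDC", frozenset({"530"})),
    ]
    terms = {"LCSH": "Engineering", "DDC": "620"}
    print(landscape(collections, terms, {"local"}))              # ['College library']
    print(landscape(collections, terms, {"local", "national"}))  # ['College library', 'National archive']
```

    Widening the set of scopes is one simple way to model moving from a local base landscape to a broader one. The same TeRM output is reused, and only the set of eligible collections changes.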
    Machine to Machine (M2M) Considerations
    It will be difficult in such a relatively small, low-cost project to investigate M2M use of the TeRM facility fully in an operational sense. It is therefore proposed that the pilot project focus primarily on use of the TeRM by users. The need for it ultimately to meet service needs as well would be addressed by examining that requirement on an ongoing basis, mainly at a theoretical level. This aspect would be a joint activity between UKOLN, the RDN, the DNER, the CDLR and others, but would be led and co-ordinated by UKOLN. It would involve examination of a range of service to service interfaces, including those using Z39.50, HTTP, OAI (HTTP, XML), and RSS. Operational tests would, however, be conducted when possible. Semantic web developments will be monitored to help ensure compatibility.
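    One of the service-to-service options listed above is HTTP carrying XML. Purely as an illustration of that option, a TeRM mapping might be returned to a calling service as a small XML document. The element and attribute names below are invented for this sketch. An operational service would more likely adopt an established format, such as the zThes profile mentioned under Standards, once the requirement is known. The sketch uses only Python's standard xml.etree.ElementTree module.

```python
"""Hypothetical sketch of an XML payload for an M2M term-mapping request.

Element and attribute names are invented; they are not zThes or SRW.
"""
import xml.etree.ElementTree as ET


def mapping_response(query: str, mappings: dict[str, str]) -> str:
    """Serialise a user query and its per-scheme terms as XML."""
    root = ET.Element("termMapping", {"query": query})
    for scheme, term in sorted(mappings.items()):
        ET.SubElement(root, "term", {"scheme": scheme}).text = term
    return ET.tostring(root, encoding="unicode")


def parse_response(xml_text: str) -> dict[str, str]:
    """Recover the per-scheme terms on the client side."""
    root = ET.fromstring(xml_text)
    return {el.get("scheme"): el.text for el in root.findall("term")}


if __name__ == "__main__":
    xml_text = mapping_response("mercury", {"LCSH": "Mercury (Planet)"})
    print(xml_text)
    # <termMapping query="mercury"><term scheme="LCSH">Mercury (Planet)</term></termMapping>
    print(parse_response(xml_text))  # {'LCSH': 'Mercury (Planet)'}
```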
    3.2 Key Project Activity Groups (Note: group schedules overlap - see main section 6 below)

    1. Set up: Web-site activities begun; Service, User, and Community stakeholder e-lists set up.

    2. Refine and begin to implement project plan: Begin professional level evaluation; Fully scope problem; Identify evaluation criteria for success or failure of the TeRM approach, looking at technological, terminological, cost, and user requirements; Determine subject areas to focus on; Begin to identify commercial software suppliers; Involve evaluator and the second terminology expert fully in all processes.

    3. 'Engage' with users, service staff, communities, and the problem in general; implement illustrative stage 1 pilot: Start dissemination activities; Set focus group dates; Install pilot software; Implement illustrative (stage 1) pilot to inform and stimulate discussion; Plan focus groups with a view to drawing out initial service specification; Examine the process of assigning subject terms at RDN hubs and at other relevant sites; Begin general and M2M requirements studies; Begin survey of commercial software functionality; Hold focus groups (including terminology experts).

    4. Begin to build stage 2 pilot TeRM based on illustrative pilot: Determine initial service specification via focus groups, professional level evaluation, general and M2M requirements study, functionality survey, and project team analysis and discussion; Implement operational (stage 2) Pilot Service.

    5. Terminology map modelling for TeRM (starts prior to implementation of stage 1 pilot): Investigate terminological and functionality requirements of a TeRM through intellectual analysis, the general and M2M requirements studies, and empirical testing in a practical environment. Utilise the following process when conducting empirical testing:

    a. New user authenticates as either HE or FE, then chooses from the range of available entry points: local, regional, national, UK-wide (general) and UK-wide (subject hub).
    b. New user 'disambiguates' her subject requirements using interaction with TeRM
    c. Appropriate terms are sent to the HILT Phase II CLD-based landscaping mechanism
    d. HILT Phase II landscaping mechanism builds appropriate entry landscapes, showing only those collections appropriate to the entry point in question. In some cases, these will be 'mock-ups' of the interfaces of existing services or subsets of those services:

    • An FE college or university OPAC screen with a small change to permit the further discovery of collections held elsewhere
    • An RDN subject hub (e.g. EEVL) or specific subject page (e.g. engineering in EEVL)
    • RDN 'Central'

    e. In other cases, they will be CAIRNS-type geographical or subject landscapes:

    • The 3 Glasgow Universities
    • A computer-aided design subject strength landscape showing Strathclyde University and Heriot-Watt University, as shown at this URL: http://cairns.lib.strath.ac.uk/cfdocs/CAIRNS/CAIRNS/SubjCollSelect.cfm?uSubID=3128
    • A 'mock-up' of a DNER-wide entry page, dynamically generated from the SCONE copy database, which would have a selection of additional DNER collections added for the purpose

    f. TeRM terminologies route map is adjusted until the CLD-based landscaping operates as it should at this relatively simple level
    g. A representative set of single-term subject queries is then 'disambiguated' and 'mapped' via the TeRM and run against the collections in each of these base landscapes. This uses CAIRNS cross-searching where the collections have Z39.50 catalogues, or service-by-service methods where cross-searching is not currently possible (e.g. for the local OAI database).
    h. TeRM terminologies route map is adjusted until the single query item-level searching operates as it should.
    i. Each base landscape has an 'expand collections searched' button. When this is clicked, a new TeRM-mediated landscape is generated. The new landscape depends on the nature of the subject query being processed, the user (HE or FE), and the base landscape. For example, institution level would go to metropolitan area level, RDN subject hub to DNER level, and so on. Again, as in f above, the TeRM terminologies route map is adjusted until the CLD-based landscaping operates as it should at this second collection discovery level.
    j. The same representative set of single-term subject queries used at g above is again 'disambiguated' and 'mapped' via the TeRM. It is then run against the collections in each of these new 'second-level' collections landscapes, once again using CAIRNS cross-searching where the collections have Z39.50 catalogues, or service-by-service methods where cross-searching is not currently possible. The TeRM terminologies route map is again adjusted until single-query item-level cross-searching operates as it should at this second level.

    Notes:
    1. There are two key questions underlying the above investigation of terminology map modelling requirements; see Appendix B for details.
    2. The above process will be monitored and influenced by the ongoing professional level evaluation process.

    6. Gradually optimise effectiveness and functionality of stage 2 pilot to produce stage 3 pilot and interim specification for a future service, utilising this same process

    7. Investigate costs of an operational service at various service levels: Determine interim specification as described above; determine associated terminology-related set-up and maintenance costs; estimate feasibility and cost of a full service based on commercial software or adapted commercial software; estimate and compare likely cost of HE/FE developed software or any available open source software.

    8. Prepare and disseminate the Interim Report, and obtain feedback on it

    9. Complete, launch, and conduct user trials on the improved stage 3 pilot: Finalise design based on interim report feedback and ongoing professional level evaluation; Launch improved pilot; Plan user trials and focus groups; Hold user trials and focus groups

    10. Finalise user, terminological, and functional requirements of a future service and refine estimated set-up and maintenance costs; further improve pilot if necessary and possible.

    11. Project completion activities: Compile draft final report; Evaluate project and pilot and encompass results in draft report; Circulate draft final report and obtain final feedback; Finalise report; Submit and disseminate Report; Close down project
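    Two mechanical parts of the empirical testing in activity group 5 can be sketched in Python. The first is the 'expand collections searched' step (i), using only the two expansions the proposal gives as examples. The second is the comparison that shows whether the route map needs further adjustment (steps f, h and j). The sketch assumes that the team records, for each test query, the set of collections a correctly working landscape should return. That judgement set, the function names and the collection names are all hypothetical. In the pilot, expansion would also depend on the subject query and on whether the user is HE or FE, which this sketch does not model.

```python
"""Hypothetical sketch of landscape expansion and route-map checking."""

# Only the two expansions given as examples in the proposal.
EXPANSION = {
    "institution": "metropolitan area",
    "RDN subject hub": "DNER",
}


def expand(base_level: str) -> str:
    """Second-level landscape reached via 'expand collections searched'."""
    return EXPANSION.get(base_level, base_level)  # assumption: unknown levels stay put


def mismatches(expected: dict[str, set[str]], actual: dict[str, set[str]]) -> dict:
    """Per query, collections missing from, or wrongly included in, a landscape.

    An empty result is taken to mean the route map needs no further
    adjustment for this set of test queries.
    """
    report = {}
    for query, want in expected.items():
        got = actual.get(query, set())
        if want != got:
            report[query] = {
                "missing": sorted(want - got),
                "unexpected": sorted(got - want),
            }
    return report


if __name__ == "__main__":
    print(expand("institution"))  # metropolitan area
    expected = {"bridges": {"Collection A", "Collection B"}}
    actual = {"bridges": {"Collection B", "Collection C"}}
    print(mismatches(expected, actual))
    # {'bridges': {'missing': ['Collection A'], 'unexpected': ['Collection C']}}
```

    Under this approach, the route map would be adjusted and the check re-run until the report is empty, first at the base level and then at the second level.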

     
    4. Participants and Roles

    It is proposed that HILT Phase II should involve the same mix of participants as HILT Phase I but also involve representatives from the DNER, the RDN, and FE; specifically:

    • The Centre for Digital Library Research (CDLR) at Strathclyde University (lead)
    • DNER representative
    • mda (formerly the Museums Documentation Association)
    • National Council on Archives (NCA)
    • National Grid for Learning (NGfL) Scotland
    • Online Computer Library Center (OCLC)
    • RDN representative
    • FE representative
    • Scottish Library and Information Council (SLIC)
    • Scottish University for Industry (SufI)
    • UK Office for Library and Information Networking (UKOLN)
    • Terminology experts: Alan Gilchrist, and Leonard Will (external evaluator)
     
    5. Project Management
    Day-to-day management will be the responsibility of the project staff. This Project Team will report to a Project Management Group (PMG) consisting of the team and a representative from each participant. There will also be a Project Steering Group (PSG), comprising representatives from all major stakeholders, including users. A Professional Level Evaluation Group (PLEG) will be responsible for conducting an ongoing professional level evaluation of the pilot and associated work. The PLEG will work with, and on behalf of, the external evaluator. It will be made up of independent professionals from the various domains, assisted as necessary by project staff. The PMG will aim to meet every two months, reporting to the PSG, which will meet every four months. The PLEG will meet as required, reporting to the PSG, but will mainly conduct its business online. In addition, the project will liaise with the JISC Development Team and follow reporting requirements as set out by the JISC.
     
    6. Start and Finish Dates and Schedule

    HILT Phase II Pilot

    Months: 1 2 3 4 5 6 7 8 9 10 11 12

    Activities:

    Set up activities
    Web-site activities
    Service stakeholders e-list
    User stakeholders e-list
    Community stakeholders e-list
    Professional level evaluation process
    Scope problem
    Identify/monitor evaluation criteria
    Identify and monitor subject areas to test
    Identify software suppliers
    Dissemination activities
    Set up focus group dates
    Install pilot software
    Implement stage 1 pilot (illustrative)
    Plan focus groups
    Engage with indexing staff
    Initial Service Specification
    M2M Requirements Study
    Detailed Functionality Survey
    Hold focus groups
    Implement stage 2 pilot service
    Terminology map modelling
    Full service specification
    Estimate and refine full service costs
    Interim Report and Feedback
    Implement and develop stage 3 pilot
    Plan user trials/focus groups
    Hold user trials/focus groups
    Improve stage 3 pilot
    Compile draft final report
    Evaluate project and pilot
    Circulate draft final report
    Finalise report
    Submit/disseminate Report
    Project closedown activities

    Overall duration: twelve months. Target dates: mid-May 2002 to mid-May 2003
     
    7. Deliverables

    1. Greater understanding of the problem and of the needs of FE and HE users in respect of subject retrieval in the projected JISC Information Environment. This understanding will be achieved within JISC and JISC services and, through dissemination activities, in the community as a whole.
    2. An in-depth understanding of terminology mapping requirements in the DNER and associated UK services, taking local, regional, national, international, subject-hub, FE and HE, and archives, libraries, museums, and electronic services considerations into account.
    3. A working pilot terminologies demonstrator service for the JISC IE (with limited functionality and with a full service possibly requiring a change of software).
    4. Requirements, set up and maintenance costs, and costs against benefits, for a future service, including both user and M2M terminological and functional requirements.
    5. Final Report on the project, together with appropriate recommendations.

     
    8. Standards

    The project will adhere to appropriate standards where these exist. It will be advised in this by other participants, such as UKOLN, the DNER, domain representatives and terminology experts. The project is aware of the British Standard guide to the establishment and development of monolingual thesauri (BS 5723:1987; ISO 2788:1986). It is also aware of the British Standard guide to the establishment and development of multilingual thesauri (BS 6723:1985; ISO 5964:1985). It will consult both of these and other appropriate works. In addition, it will aim to build UK requirements around terminologies recognised and used internationally (e.g. DDC, LCSH, UNESCO, AAT). In all other matters, the developing DNER/IE standards and the eLib Standards Guidelines will be consulted. The situation regarding other possibly relevant initiatives, such as zThes and SRW, will be monitored as the project develops.
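    ISO 2788 (BS 5723) distinguishes three kinds of relationship: equivalence (USE/UF), hierarchical (BT/NT) and associative (RT). Each should be recorded reciprocally. A TeRM build might run a reciprocity check of the kind sketched minimally in Python below. The triple-based data model and the sample terms are assumptions made for illustration only.

```python
"""Hypothetical sketch: checking ISO 2788-style reciprocal relationships."""

# Each relationship and the one that should point back the other way.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(relations: set[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return the (term, relationship, other_term) triples that ought to exist
    as reciprocals of recorded relationships but are absent."""
    return sorted(
        (other, RECIPROCAL[rel], term)
        for term, rel, other in relations
        if (other, RECIPROCAL[rel], term) not in relations
    )


if __name__ == "__main__":
    relations = {
        ("Cars", "USE", "Automobiles"),
        ("Automobiles", "BT", "Vehicles"),
        ("Vehicles", "NT", "Automobiles"),
    }
    print(missing_reciprocals(relations))  # [('Automobiles', 'UF', 'Cars')]
```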

    CAIRNS uses the Z39.50 standard. The CLD approach used in the SCONE collections database is based on the UKOLN instantiation. Where that instantiation required extension to take account of practical requirements within SCONE/CAIRNS, the approach draws on the original model produced by Michael Heaney. The Collection Descriptions Focus has reviewed these collection descriptions and is satisfied with them.

     
     
    9. Costs (not publicly available)
     
     
    10. Dissemination Strategy
    Dissemination of information would be via the HILT Phase II web site, postings to appropriate e-mail lists, papers and news items submitted to professional publications and presentations at seminars and conferences. Key progress reports would be sent to relevant organisations and to all institutions in the United Kingdom. An active and successful dissemination programme would be a major aim throughout the project.
     
    11. Proposed Exit Strategy
    The Project will make recommendations about the possible nature and cost of a future service. The CDLR will maintain the demonstrator service for a reasonable period of time beyond the end of the project, the exact time to be agreed with the JISC.
     
    12. Project Contact

    Dennis Nicholson, Director, Centre for Digital Library Research, University of Strathclyde

     

     

     


    © HILT: High-Level Thesaurus