HILT: High-Level Thesaurus Project Proposal

Contents

  1. Introduction, Background, and Overview
  2. Direct Contribution to the DNR
  3. Relevance to the DNER and the RDN
  4. Expected Impact
  5. Regional, National and International Importance to Researchers
  6. Description of Collection, Strengths, Size
  7. Purpose and Outline Description of the Project
  8. Lead Institution, Partners, Relationship to Core Institutional Objectives
  9. Project Management Proposals, Mechanisms for Self-monitoring, Self-evaluation
  10. External Evaluation
  11. Deliverables
  12. Start and Finish Dates
  13. Milestones; Schedule
  14. Standards
  15. Person Weeks required for the Work
  16. Total Estimated Cost and Contribution from RSLP
  17. Dissemination Strategy
  18. Proposed Exit Strategy
  19. Statement of Policy on Access
  20. Statement on Minimum Standards for Bibliographic Records
  21. Biographies of Key Individuals
  22. Key Institutional Contacts

 

 

  1. Introduction, Background, and Overview
    Introduction

    This full bid can be considered either as a SCONE extension proposal or as relating to Collaborative Collection Management Projects priority E.1.2.5 of the RSLP call (Annexe B), 'Establishment and evaluation of the costs and benefits of different approaches to the co-ordination of collection management and service delivery activities', or to E 1.3, which covers projects in areas other than those specified by RSLP. RSLP and JISC invited the bid after the RSLP Steering Group had accepted the HILT expression of interest and RSLP had discussed the possibility of joint funding with JISC. The full bid has been rewritten following discussions with Ronald Milne and Chris Rusbridge, which suggested (amongst other things) re-focusing the bid, widening the partnership to include participants from the archives community, and removing the demonstrator proposed in the expression of interest.

    The proposal comes from a partnership comprising the Centre for Digital Library Research, UKOLN, OCLC, The MDA, The National Council on Archives, NGfL (Scotland), SUfI and the Scottish Library and Information Council (SLIC). Other relevant organisations will be represented through the Steering Group. Full details of the proposal are presented below (see 1 (Overview), 7 and 11 particularly).

    In general outline, the project aims to study and report on the problem of cross-searching and browsing by subject across a range of communities, services, and service or resource types (libraries, museums, archives, the DNR, clumps, the DNER, the RDN, bibliographic databases, numeric data, and others). What exactly is the nature and structure of the problem? Who needs to solve it? What does current expert opinion say about the issue? What are the requirements of a solution? Can a common subject scheme be found that will meet all or nearly all of these requirements? If not, is a UK high-level thesaurus that is integrated with one or more international schemes the answer? The group would wish to look at these and any other options that may exist once the exact nature of the problem itself has been defined and documented.


    Background

    From the SCONE perspective, the HILT proposal arises originally out of the SCONE deliverable to map the Conspectus subject scheme used in CAIRNS and SCONE to other UK schemes, but goes far beyond this. It was found to be necessary to go beyond the SCONE mapping because other schemes in use (e.g. in the other clumps projects and the Research Assessment Exercise (RAE)) were one-level schemes and did not have the depth required either to map to the Conspectus scheme - a three-level hierarchy - or to provide useful navigation within and beyond CAIRNS. An alternative UK-oriented scheme with similar depth - or a means of mapping to such a scheme - is required, and one aim of the HILT proposal is to address this requirement. Earlier work done by SCONE, CAIRNS and NGfL (Scotland) looking at the Conspectus scheme, the all-subjects schools scheme used by NGfL (Scotland) and various other schemes will help inform the specifics of this particular requirement.

    This work had support from CAIRNS, SCONE, the National Library of Scotland (NLS), NGfL (Scotland), SLAINTE, the other UK clumps projects (RIDING, M25 and Music Libraries Online), and the cross-sectoral group represented by the Scottish Library and Information Council's (SLIC) Advisory Group on Interoperability and Access (SAGIA), which covers not only Higher Education but also Further Education (Glasgow Telecolleges Network), public libraries, NGfL (Scotland), the Scottish Cultural Resources Access Network (SCRAN) and the Scottish University for Industry (SUfI).

    The HILT expression of interest which preceded this full bid was a development of this early work, and was further informed by discussions at the MODELS 11 workshop on terminology held in Bath on January 11th of this year. One outcome of the workshop was agreement that a HILT expression of interest should be submitted to RSLP. MODELS 11 included representatives from a wide range of organisations. In addition to those in the original SCONE group described above, these included the UK Office for Library and Information Networking (UKOLN), the Museum Documentation Association (MDA), the National Preservation Office (NPO), the British Library, English Heritage, the Science Museum, the British Educational Communications and Technology Agency (BECTA), the Natural History Museum, the National Museums of Scotland, the Higher Education Funding Council for England (HEFCE), and the Library and Information Commission (LIC).


    Overview of HILT proposal

    This full bid is informed by the earlier work described above, but has been shaped in the main by the subsequent discussions with RSLP and JISC, and by the comments of the external assessors on the original expression of interest.

    As indicated earlier, the HILT proposal comes from a partnership comprising the Centre for Digital Library Research, OCLC, UKOLN, The MDA, The National Council on Archives, NGfL (Scotland), SUfI and the Scottish Library and Information Council (SLIC). All have a common interest in facilitating user searching and browsing by subject, whether this be within a single service (e.g. SLIC), across a group of similar services (e.g. the RDN), or across a group of services that span sectors, domains, regions, professions, languages, time periods with differing terminologies, countries, or a mixed subset of these (one example covering some of these would be a clumps project such as CAIRNS). The aim of the HILT proposal is to determine how this requirement to offer users subject searching and browsing - or, more commonly, cross-searching and browsing - can best be met, when the various communities, services and initiatives who have the need (HE, FE, public libraries, Museums, The Archives Community, NGfL, UfI, the RDN, the DNER, the Clumps projects, and others) usually have different requirements, take different approaches, and, more often than not, use different subject schemes. More specifically, the HILT project aims to:

    1. Thoroughly research, determine and document the exact nature of the problem in detail, focusing on UK requirements across the various communities, services and initiatives, but setting the study firmly (and necessarily) in the context of international requirements and standards:
      • Surveying and reviewing both the literature and expert opinion
      • Identifying all key communities, services and initiatives with an interest in resolving it
      • Determining what perspective on the problem the various communities have and what they see their users' requirements as being (Note: it was agreed with RSLP and JISC that this was the only practical way of taking user needs into account at this stage)
      • Determining which subject schemes or thesauri are used by the stakeholder groups and also which other schemes exist that might solve the problem
      • Identifying relevant organisational, inter-organisational and 'political' issues
      • Identifying non-terminological and non-technical barriers to the adoption of any given solution (e.g. barriers to uptake by stakeholders or their cataloguers, difficulty and cost of retroconversion of legacy metadata, etc.)
    2. Analyse the data obtained in this exercise, and discuss the results with the various communities, with a view to determining:
      • The exact nature of the subject terminologies problem itself
      • The structural requirements of any solutions (including not only terminological relationship requirements but also other elements such as ease and cost of maintenance)
      • Other requirements (organisational, non-terminological and non-technical barriers etc.)
      • User and machine interface issues
      • Requirements in respect of integrating with subject terminologies and thesauri focused on specific subject areas (and other, similar, 'narrow-focus' in-depth schemes)
    3. Use this information to reach a consensus within the project as to whether:
      • There is an existing universal subject scheme, thesaurus, or other solution that meets the requirements (or nearly so)
      • It would be possible to adapt one or more existing schemes, thesauri, or other solutions to solve the problem
      • It is necessary (and possible) to create a subject scheme, or thesaurus, or other subject organisation and indexing system to solve the problem
      • It is impossible to solve the problem and, if so, why, and what the implications of this are for users
    4. Attempt to reach a similar consensus within the group of stakeholders generally, both at a MODELS series workshop and through other methods
    5. Contribute to and co-operate with an external evaluation of the project
    6. Make a final report and recommendations to RSLP, JISC, the various stakeholders, and the national and international community generally.


     

  2. Direct Contribution to the DNR
    A successful project outcome will help make the DNR more accessible by proposing a means of improving cross-searching and cross-browsing by subject. This could take one of two forms. The first is the identification of a common subject scheme that is internationally recognised and used, and that could be accepted and used by different sectors and domains, different regions of the UK, and major services and initiatives (RDN, Archives Hub, DNER, clumps projects, SCRAN). The second is the specification of a means by which the different schemes required by different communities and services may be mapped to such a common scheme, potentially producing both human and machine-readable outputs.
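
    As a purely illustrative sketch of the mapping option, and not a design the project has adopted, a crosswalk from local schemes to a common scheme might look like the following. The scheme names, local terms and common headings are all invented for the example and are not drawn from LCSH, DDC, the UNESCO thesaurus or any scheme named in this proposal; the function names are likewise hypothetical.

```python
import json

# Hypothetical crosswalk from local subject terms to shared high-level headings.
# Scheme names, terms and headings are all invented for illustration.
CROSSWALK = {
    "library_scheme": {
        "Farming": "Agriculture",
        "Crop science": "Agriculture",
        "Painting": "Visual arts",
    },
    "museum_scheme": {
        "Ploughs": "Agriculture",
        "Portraits": "Visual arts",
    },
}


def local_terms_for(common_heading, crosswalk=CROSSWALK):
    """Return {scheme: sorted local terms} for one common heading."""
    matches = {}
    for scheme, mapping in crosswalk.items():
        terms = sorted(t for t, h in mapping.items() if h == common_heading)
        if terms:
            matches[scheme] = terms
    return matches


def human_readable(common_heading, crosswalk=CROSSWALK):
    """Format the same mapping as plain text for a person to read."""
    lines = [common_heading]
    for scheme, terms in sorted(local_terms_for(common_heading, crosswalk).items()):
        lines.append(f"  {scheme}: {', '.join(terms)}")
    return "\n".join(lines)


def machine_readable(common_heading, crosswalk=CROSSWALK):
    """Serialise the same mapping as JSON for software to consume."""
    return json.dumps(local_terms_for(common_heading, crosswalk), sort_keys=True)
```

    In this sketch a user's search on one common heading is translated into each service's own terms, so that each service can continue to index with its existing scheme. Whether such a mapping is feasible or affordable in practice is one of the questions the project is intended to answer.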

     

  3. Relevance to the DNER and the RDN
    The DNER and the RDN both have a requirement for high-level subject searching and browsing and have not yet made a decision on how to approach the issue. The outcome of HILT will therefore be of interest to them. If a suitable solution were available as a result of HILT they would adopt this rather than fund an additional study of their own.


  4. Expected Impact
    All disciplines are covered by this proposal, so there is no particular impact on any given discipline. A positive outcome would benefit all disciplines equally and would also offer major benefits to researchers, teachers and students whose work is multi-disciplinary.

     

  5. Regional, National and International Importance to Researchers
    Since the aim is to find a solution that will be of benefit in the context of the whole DNR (and therefore the DNER), it is safe to assume that the collections to which it is proposed HILT be applied contain a significant range of materials of regional, national or international value.

     

  6. Description of Collection, Strengths, Size
    The aim is to provide a common subject scheme that will apply across the whole of the distributed national resource (DNR). In so far as this question is relevant to the proposal, therefore, the collection in question is the DNR itself, and its strengths and size are those of the DNR.


  7. Purpose and Outline Description of the Project
    The purpose of the project is to study and report on the problem of cross-searching and browsing by subject across a range of communities, services, and service or resource types (libraries, museums, archives, the DNR, clumps, the DNER, the RDN, bibliographic databases, numeric data, and others). It will research the problem, analyse and document its exact nature in detail, determine whether it can be solved and, if so, how, and attempt to reach a consensus on the issue across the various communities, services and initiatives identified by the project as stakeholders.

    This would be done by:

      • Surveying the literature for publications on the issue and for UK and international perspectives on it; discussing the topic with internationally active groups (OCLC, UNESCO and others). This would look at issues relating to controlled vocabularies generally, and the project report would contain a preamble relating to this. However, the intention is that the project should focus mainly on the issue of subject terminologies
      • Identifying all key UK groups, communities and services with an interest in finding a solution to this problem; creating mechanisms for informing these stakeholders on project progress and obtaining ongoing input from them (web-site, e-list, possibly a newsletter if any participants will have difficulties accessing the web-site and the e-list)
      • Discussing various project elements and approaches with the external evaluator to agree a way forward
      • Discussing the problem with each of the stakeholders with a view to establishing and documenting the details of their perspective on the problem and related issues - the subject schemes they use (e.g. LCSH, UNESCO thesaurus, Cornucopia, DDC), the requirements of their users, the elements of the problem from their viewpoints, their needs as regards integration of any UK solution with international schemes, their views on which, if any, existing universal schemes they see as potentially offering a solution or partial solution to either the question of international integration, or the cross-searching or browsing problem, or both.
      • On the basis of this survey, and a follow up search for relevant publications and projects, widening the review of the literature to form a more complete view of available knowledge and opinion on the issue
      • Making information on the main universal subject schemes identified in this process available to the various communities (archives, museums, libraries etc.) and ascertaining the views of each on the merits and demerits of the various schemes.
      • Conducting a survey of literature, projects, organisations, and individuals to determine current views on best practice in respect of both user and machine oriented interfaces both to thesauri and to subject terminologies generally
      • Organising, evaluating and analysing the data from the above data gathering exercises, identifying a range of possible approaches to the various issues raised, and compiling a first draft of a report on the project, together with draft recommendations on the best approach to the problem. This would look at the structural requirements of any solution proposed, including maintenance issues, community control issues, and issues related to the needed inter-relationships between subject terms in any thesaurus
      • Making the report widely available for comment for a period leading up to a major workshop in the MODELS series, this to include breakout sessions on key issues
      • Producing a new draft of the report based on workshop outcomes and disseminating widely for comment
      • Producing a penultimate draft of the report
      • Bringing in the external evaluator at this stage to evaluate the project and the report
      • Producing a final report and recommendations, incorporating the external evaluation report itself and, if appropriate, any changes it proposes

    All of the above would be monitored by a Steering Group on which all major stakeholders are represented.


  8. Lead Institution, Partners, Relationship to Core Institutional Objectives
    The proposal is led by the Centre for Digital Library Research at Strathclyde University. The other partners are UKOLN, The MDA, The National Council on Archives, OCLC, NGfL (Scotland), SUfI, and the Scottish Library and Information Council (SLIC). Letters of support are provided; two will arrive late and will be forwarded when received. A letter from the external evaluation consultant is also provided.

     

  9. Project Management Proposals, Mechanisms for Self-monitoring, Self-evaluation
    Day to day management will be the responsibility of the project staff and the Project Director. This Project Team will report to a Project Management Group consisting of the team and representatives from each of the participating institutions, perhaps enhanced by a few experts in sectoral terminologies. In addition, there will be a Project Steering Group representing stakeholder groups. To ensure the needs of the whole community are met, the Project Steering Group would have at least one, and sometimes two, representatives from each of the various communities involved (Museums, Archives, HE, FE, public libraries, NGfL, SUfI and so on). Consideration will also be given to creating a larger working group that would echo the high-level Steering Group in terms of membership but would work more directly and more frequently with the Project Steering Group. Regional requirements in Wales and Ireland will also be taken into account, as will possible future requirements to build in multi-lingual capacity. Self-monitoring and self-evaluation mechanisms will be built into this structure and will be agreed in detail with the Project Steering Group. However, there will also be an external evaluation undertaken by a consultant with appropriate experience in the field.

     

  10. External Evaluation
    An external evaluation will be carried out towards the end of the project. This has been itemised separately in the costs section. Leonard Will is proposed as the consultant and it is estimated that a total of 12 days of consultancy will be required.


  11. Deliverables
    The project deliverables for HILT will be:

      • Initial survey and review of the UK and international literature on the issue of high level subject terminologies and their inter-relationships, and of expert UK and international opinion on problems, solutions and strategies related to the topic. [Deliverable 1]
      • A comprehensive list of all major UK stakeholders [Deliverable 2]
      • The development and implementation of a dissemination and feedback strategy for the project, the aim of which would be to inform these stakeholders on project progress and obtain ongoing input from them (web-site, e-list, possibly a newsletter if any participants will have difficulties accessing the web-site and the e-list) [Deliverable 3]
      • A report on each stakeholder’s perspective on the problem and related issues - the subject schemes they use (including those related to more specific subject areas), the elements of the problem from their viewpoints, their needs as regards integration of any UK solution with international schemes, their views on which, if any, existing universal schemes they see as potentially offering a solution or partial solution to either the question of international integration, or the cross-searching or browsing problem, or both. [Deliverable 4]
      • Extended version of deliverable 1 based on additional research informed by the results of deliverable 4 and offering a more complete view of available knowledge and opinion on the issue [Deliverable 5]
      • Report on the views of the various stakeholders on the merits and demerits of the various high level subject schemes detailed in deliverable 5. [Deliverable 6]
      • Report of a survey of literature, projects, organisations, and individuals to determine current views on best practice in respect of both user and machine oriented interfaces both to thesauri and to subject terminologies generally [Deliverable 7]
      • Draft report on the findings of the project to date, organising, evaluating and analysing the data, identifying a range of possible approaches to the various issues raised, and making draft recommendations on the best approach to the problem (or whether it is resolvable at all). [Deliverable 8]
      • Awareness raising in the stakeholder communities through dissemination of the draft report and a request for feedback and comment [Deliverable 9]
      • A major workshop in the MODELS series, this to include breakout sessions on key issues identified in the draft report [Deliverable 10]
      • Wide dissemination of workshop outcomes in the form of a new draft of the report [Deliverable 11]
      • Penultimate draft of the report [Deliverable 12]
      • Report of the external evaluator [Deliverable 13]
      • Final report and recommendations, incorporating the external evaluation report itself and, if appropriate, any changes it proposes [Deliverable 14]


     

  12. Start and Finish Dates
    It is proposed that the project be carried out over twelve months. The start date will depend on when the funds are made available. Possible dates are August 1st 2000 to July 31st 2001.

     

  13. Milestones; Schedule
    The HILT project will last 12 months.

    Month      Activity
    1          Project set-up, staff in place, committees in place
    1-3        Initial survey and review [Deliverable 1]
    1-2        List of UK stakeholders [Deliverable 2]
    14         Dissemination and feedback strategy (includes web-site) [Deliverable 3]
    2-6        Stakeholders' perspectives report [Deliverable 4]
    6-7        Extended survey and review [Deliverable 5]
    5-7        Stakeholders' report on the merits and demerits of the various high level subject schemes [Deliverable 6]
    5-7        Report on best practice: user and machine oriented interfaces [Deliverable 7]
    5-7        Draft report, organising, evaluating, analysing data, and making draft recommendations. [Deliverable 8]
    8          Awareness raising through dissemination of draft report [Deliverable 9]
    9          MODELS Workshop [Deliverable 10]
    9-11       Wide dissemination of workshop outcomes in the form of a new draft of the report [Deliverable 11]
    9-10       Penultimate draft of the report [Deliverable 12]
    2,10-11    External evaluation and report [Deliverable 13]
    11-12      Final report and recommendations, incorporating the external evaluation report itself and, if appropriate, any changes it proposes [Deliverable 14]

     


    HILT Schedule

    [Schedule chart: project set-up and Deliverables 1 to 14, as listed under Milestones, plotted against twelve monthly columns (S, O, N, D, J, F, M, A, M, J, J, A) and a Post Project column. The chart's shading is not reproduced here.]


  14. Standards
    HILT will aim to adhere to appropriate standards wherever possible. The project is aware of the British standard guide to establishment and development of monolingual thesauri (BS5723:1987) (ISO2788-1986) and the British standard guide to establishment and development of multilingual thesauri (BS6723:1985) (ISO5964-1985) and will consult these and other appropriate works. In addition, it will aim to build UK requirements around terminologies recognised and used internationally (e.g. DDC, LCSH, UNESCO). In all other matters, the eLib Standards Guidelines http://www.ukoln.ac.uk/services/elib/papers/other/standards/version2 would be consulted.

     

  15. Person Weeks required for the Work
    It is estimated that the work will occupy a Grade 1A researcher (or equivalent) at CDLR and a .25 Grade 1A researcher at UKOLN, with the CDLR and UKOLN posts each lasting one year. In addition, MDA and NCA will each require £5000 to cover the cost of working with the project, and £5000 will be required to conduct the external evaluation.

     

  16. Total Estimated Cost and Contribution from RSLP
    The total project cost is £71,955 over 1 year; the RSLP contribution would be £67,955.


  17. Dissemination Strategy
    Dissemination of information would be via the HILT web-site (and the CDLR, MDA, NCA and UKOLN web-sites), postings to appropriate e-mail lists, papers and news items submitted to professional publications and presentations at seminars and conferences. Key progress reports would be sent to all relevant organisations and institutions in the United Kingdom.

    The dissemination strategy will be proactive:

      • Bulletins will be sent on project aims, progress, and outcomes to all key e-mail discussion lists relevant to stakeholders and potential stakeholders. Each will have brief details of the news item, together with a specific URL (if appropriate) to further details on the project web-site. Each e-mail will also have the general URL of the project web-site
      • Similar news items and project reports will be disseminated by other means - through news items and papers in journals read by stakeholders and presentations given at seminars and conferences relevant to stakeholders
      • Stakeholders, their discussion lists, journals, newsletters and meetings will be identified early in the project so that the above strategy can be implemented.
      • Towards the end of the project, brief 'glossy' leaflets describing project outcomes will be published and sent to all relevant UK institutions. A web-based copy will also be set up and wider dissemination ensured through providing its URL in e-mail bulletins to appropriate lists.


     

  18. Proposed Exit Strategy
    HILT is essentially a study and an exit strategy is not appropriate (agreed with R. Milne). However, the project will aim to address how a proposed solution to the problem it addresses might be supported in the long term and will aim as far as possible to find a solution that fits in with existing maintained schemes in order to minimise any maintenance and development costs. It is not yet possible to say whether this can be achieved.

     

  19. Statement of Policy on Access
    HILT participants agree to meet the requirements of the RSLP Steering Group on access where this is relevant to them and to the project.

     

  20. Statement on Minimum Standards for Bibliographic Records
    HILT participants agree to meet the requirements of the RSLP Steering Group with regard to minimum standards for bibliographic records where this is relevant to them and to the project.


  21. Biographies of Key Individuals
    Include:

    • Dennis Nicholson (CDLR and CAIRNS/SCONE)
    • Rachel Heery (UKOLN)
    • Matthew Stiff (MDA, Cornucopia)
    • Nick Kingsley (National Council on Archives)
    • Leonard Will (external consultant - see associated letter)
    • Paul Miller (Interoperability Focus) - Steering Group and advice
    • Gordon Dunsire (CAIRNS and SLAINTE)
    • Crawford Revie (Department of Information Science, University of Strathclyde). Over the past 2-3 years Crawford has been involved in a SHEFC-funded project to create a 'MultiBrowser' to navigate data collected from the RAE exercises, and has set up a social science-based thesaural interface with the CSPP to survey statistical data. He is currently supervising a PhD student in the area of thesaurus-based query expansion within the agricultural domain, and has recently presented a workshop on "Thesauri on the Web" to the UN’s Food and Agriculture Organisation at their headquarters in Rome.

  22. Key Institutional Contacts
    Senior institutional officer: Derek Law, University of Strathclyde

    Project Director: Dennis Nicholson, University of Strathclyde

 


© HILT: High-Level Thesaurus