Minutes from HILT II Management Group Meeting
1st July 2003 11am
Seminar Room 3, Andersonian Library, Strathclyde University

Present: Alan Dawson (AD), Alan Gilchrist (AG), Stuart Holm (SH), Emma McCulloch (EM), Dennis Nicholson (DN), Ali Shiri (AS), Leonard Will (LW)

Apologies: Fionnuala Cassidy (FC), Elaine Fulton (EF), Nick Kingsley (NK), Joan Mitchell (JM)


1. Welcome, introductions, apologies for absence

 

2. Minutes of February meeting

DN asked for the second last paragraph on page 2 to be removed from the last minutes.

 

3. Matters arising


EM has looked into the WARMER project. It is an extension of the WARM (Widening Access to Resources in Merseyside) project which aimed to make academic and public library catalogues and archives searchable. EM only uncovered a reference to WARMER rather than any project website.

Action: EM and AS to look into this further for any relevance to HILT.

Fabio Crestani's paper entitled 'Combination of similarity measures for effective spoken document retrieval' was circulated to the group by EM. However, AG had thought that a reference within Crestani's paper to a Rutgers thesis was worth looking into.

Action: EM to investigate this

DN has discussed the INSIGHT methodology with Michael Coen. It is a useful methodology for the cost-benefit analysis of the HILT project, but some problems have been identified. This will be further discussed under item 7 on the agenda.

AG reported that Charles Oppenheim has written three papers for the Journal of Information Science. However, these are not of direct help to HILT.

DN reported that performance problems with the HILT server have now been resolved.

DN noted that LW had suggested the creation of a PERT chart detailing outstanding project work since the last meeting. He directed the group to the last page of the current methodologies document and described how the diagram displays relationships and dependencies between project processes. The next stage is to finalise the pilot, which will feed into the specification for a full server. The cost-benefit analysis then needs to be conducted before putting it all together in the final report. LW felt he would still like to see something more detailed, and it was suggested that AS should draft a to-do list for circulation.

Action: AS to draft to-do list

DN followed up the joint project suggestion between HILT and Wordmap made by AG at the last meeting; nothing came of this.

SH and AG attended a meeting with Matrix Data where mapping work is in place between several schemes including DDC and UNESCO. The mappings were originally used in conjunction with a UDC spine and Oracle to handle TV archives at the BBC.

Action: AG to ask Matrix if they are happy to be contacted; if so, AS will follow this up to enquire about the status of the mappings. Questions about the level of granularity, data format, and access costs should be asked.

 

4. Progress update

  • The workshop was held on 12th June 2003.

  • Different versions of the pilot server are in place (using different NTIs (Navigation Taxonomy Instances)).

  • Changes have been made to the stemming algorithm; this will be further discussed under item 9 on the agenda.

  • Access to JISC collections has been incorporated into the pilot where possible.

  • Different records from the same service have been included to satisfy collection strength related deliverables.

  • The 'UK term set' obtained from a UK University showed few differences from DDC when mapped to it. Many of the differences were typos and spelling variations. It is therefore unsuitable for use as a 'UK term set'. BUBL also showed few differences in this respect. AG suggested looking at an annotated version of DDC which has been compiled with a UK focus.

  • The INSIGHT methodology has been analysed.

  • A new version of the methodologies document has been circulated (more under agenda item 8).

  • The role of the SG in the cost-benefit analysis has been defined (more under agenda item 7).

  • DN has attended various Shared Services meetings; of particular relevance to HILT is the Information Environment (IE) Service Registry for which requirements have been noted.

  • The HILT team are now starting to focus on the final report. AS has been identifying areas which may relate to HILT outcomes eg. ontologies and the semantic web.

5. Workshop

EM reported that the workshop held on 12th June was attended by over 40 users who gave excellent feedback on the pilot. Preliminary results from the workshop were presented to the group. These comprised mean ratings taken from questions using a Likert scale which ran from 1 to 7. AS reported that these questions considered the layout and organisation of the screens (homepage, disambiguation phase, and results), the language used, and the instructions given. They also asked users to judge how easy or difficult it was to select an option at the disambiguation stage and how relevant they considered the results to be to their search. In addition, a selection of user comments was circulated amongst the group. These covered a range of positive and negative points, problems specifically associated with the use of DDC, and suggestions for improvement.

Users' responses now have to be revisited for further analysis and search terms chosen by users must be mapped to the database. LW stressed that this is crucial since it is an area which has not been researched to date.

LW asked if users generally seemed disappointed with retrieval, as some collections (eg. medical ones) are very general and would be returned in response to many queries. The team felt that this wasn't the case, as some of the more general collections such as OMNI can be searched dynamically, bringing back information specific to the users' searches (ie. at item level).

 

6. Pilot Terminologies Server Specification Update

DN presented the group with a document titled 'HILT Model Extension Notes' which details aspects/features to be built into the pilot.

Action: MG members are to read this document and respond to DN via email

 

7. SG Investigation and Discussion of Draft Cost-Benefit Analysis Plan

DN explained that this agenda item was included to inform the management group of documents being presented to the steering group. The plan is to do some preliminary work on the cost-benefit analysis process with the steering group on Thursday 3rd July and then conduct the full cost-benefit analysis at the following steering group meeting in September.

AG suggested that it may be practical to compare benefits against a range of costs rather than trying to pinpoint exact figures which may not be wholly accurate.

It was decided that an additional August meeting would be held by the management group to discuss the documentation and outcome of the steering group's work on costs and benefits. Views from this will then be fed into the final report.

Action: EM to organise a meeting for the second half of August

 

8. Methodologies Document update

DN highlighted changes to various sections of the methodologies document:

  • Section 3: information on cost-benefit analysis and M2M requirements has been added

  • Section 4: details of auto term translation, Cheshire etc have been added as options to be compared within the cost-benefit analysis - the outcome of the SG discussion on 3rd July will determine which options to look at here

  • A chart has been added to the end of the document showing an overview of project processes including dependencies (as discussed under 'matters arising').

DN commented that the methodologies document will become a record of what we've done as well as a source of justification for decisions taken throughout the project.

 

9. Brainstorm on terminologies server specification

AD provided an overview of the pilot server to date. There are 650 UNESCO terms exactly mapped in HILT. No intellectual mappings exist between DDC and UNESCO as yet, but this will be done. MeSH and DDC have been intellectually mapped. AD has looked at existing mappings between LCSH and DDC and has found them to be somewhat 'hit and miss'.

Action: DN/AD to clarify with OCLC what the program is doing

Points noted during the brainstorm:

  • The home page needs to be completely customised; the current automated means of creation limits the display to alphabetical order only.

  • Useful to include search terms beside disambiguation options (this would explain why eg. 'frogs' retrieves material on railways).

  • Should explanatory notes from DDC be used to clarify differences between options at the disambiguation phase? Include a sample record?

  • Browse hierarchy is unclear; show DDC path as indented display?

  • 'More Results' - should an explanation of what's happening here be given to users (wildcards etc)?

  • 'More Results' shows some of the original results again but at different positions in the ranking (ie. having different numbers). Might be better to get the widest retrieval first and rank them? LW suggested keeping the original results at the beginning so that they retained the same numbers following the selection of 'More Results'.

  • Headings including 'other' eg. 'other animal flesh' are fairly meaningless without context ie. what is 'other'?

  • 'More Results' and 'Next' cause confusion when shown on the same page

  • Search box picks up DDC headings and displays them during later stages eg. 'fumes, gases, smoke'. Users may think they can enter search terms in this way.

  • How to order results? Resource type? It was thought that more information was needed on users' tasks to improve retrieval effectiveness. For example, different resources should be offered to someone writing a research paper compared with someone doing a commercial report.

  • AD pointed out problems searching for 'Language' and 'Science'.

Actions: These points should be considered by the HILT team and incorporated into the pilot specification as necessary
A note should be made in the final report stating that HILT is geared towards subject searches
Group members to email DN/EM with any further thoughts on this

 

10. Any other business

There was no other business.

 

11. Date of next meeting (DONM)

This will be arranged via email.

Action: EM to arrange next meeting

 

12. Expenses Forms

Expenses forms were distributed as necessary.

EM
04/07/03

 

