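Planning and Applying Thesauri in Museums (New)
Introduction and Basic Principles
Introduction: What Are Thesauri?
Starting from the challenges facing digital archives and digital libraries…
- Improve retrieval effectiveness in order to handle large volumes of collection information
- Provide a unified search mechanism for collection information held in different media
- Provide users with knowledge-based search mechanisms
- Support individuals and work teams in building and maintaining information systems
- Support information-seeking behavior as part of problem solving, learning, and intellectual work
- Support collaborative work
- Enable scholarly communication to take place as computer-supported multilateral dialogue
Overview of Knowledge Organization Systems
Term Lists
- Authority files
- Glossaries
- Dictionaries
- Gazetteers
Classification and Categories
- Subject headings
- Classification schemes, taxonomies, and categorization schemes
Relationship Lists
- Thesauri
- Semantic networks
- Ontologies
Authority Files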
Authority files are lists of terms that are used to control the variant names for an entity or the domain value for a particular field. Examples include names for countries, individuals, and organizations. Nonpreferred terms may be linked to the preferred versions. This type of KOS generally does not include a deep organization or complex structure. The presentation may be alphabetical or organized by a shallow classification scheme. A limited hierarchy may be applied to allow for simple navigation, particularly when the authority file is being accessed manually or is extremely large. Examples of authority files include the Library of Congress Name Authority File and the Getty Geographic Authority File.
Glossaries
A glossary is a list of terms, usually with definitions. The terms may be from a specific subject field or from a particular work. The terms are defined within a specific environment and rarely include variant meanings. Examples include the Environmental Protection Agency (EPA) Terms of the Environment.
Dictionaries
A dictionary is a listing of words and phrases giving information such as spelling, morphology and part of speech, senses, definitions, usage, origin, and equivalents in other languages (bi- or multilingual dictionary).
Gazetteers
A gazetteer is a list of place names. Traditional gazetteers have been published as books or have appeared as indexes to atlases. Each entry may also be identified by feature type, such as river, city, or school. An example is the U.S. Code of Geographic Names. Geospatially referenced gazetteers provide coordinates for locating the place on the earth's surface.
Subject Headings
This scheme type provides a set of controlled terms to represent the subjects of items in a collection. Subject heading lists can be extensive and cover a broad range of subjects; however, the subject heading list's structure is generally very shallow, with a limited hierarchical structure.
Classification Schemes
Classification Schemes, Taxonomies, and Categorization Schemes.
These terms are often used interchangeably. Although there may be subtle differences from example to example, these types of KOSs all provide ways to separate entities into "buckets" or broad topic levels. Some examples provide a hierarchical arrangement of numeric or alphabetic notation to represent broad topics. These types of KOSs may not follow the rules for hierarchy required in the ANSI/NISO Thesaurus Standard (Z39.19) (NISO 1998), and they lack the explicit relationships presented in a thesaurus. Examples of classification schemes include the Library of Congress Classification Schedules (an open, expandable system), the Dewey Decimal Classification (a closed system of 10 numeric sections with decimal extensions), and the Universal Decimal Classification (based on Dewey but extended to include facets, or particular aspects of a topic).
Subject categories are often used to group thesaurus terms in broad topic sets that lie outside the hierarchical scheme of the thesaurus. Taxonomies are increasingly being used in object-oriented design and knowledge management systems to indicate any grouping of objects based on a particular characteristic. (The science of naming things is called taxonomy).
Thesauri
1. A thesaurus is a structure that manages the complexities of terminology and provides conceptual relationships, ideally through an embedded classification/ontology.
2. A thesaurus may specify descriptors authorized for indexing and searching. These descriptors form a controlled vocabulary (authority list, index language).
3. A monolingual thesaurus has terms from one language, a multilingual thesaurus from two or more languages.
Semantic Networks
With the advent of natural language processing, there have been significant developments in semantic networks. These KOSs structure concepts and terms not as hierarchies but as a network or a web. Concepts are thought of as nodes, and relationships branch out from them. The relationships generally go beyond the standard BT, NT, and RT. They may include specific whole-part, cause-effect, or parent-child relationships. The most noted semantic network is Princeton University's WordNet, which is now used in a variety of search engines.
Ontologies
Ontology is the newest label to be attached to some knowledge organization systems. The knowledge-management community is developing ontologies as specific concept models. They can represent complex relationships among objects, and include the rules and axioms missing from semantic networks. Ontologies that describe knowledge in a specific area are often connected with systems for data mining and knowledge management.
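Relationship between Thesauri and Metadata
Functions of Thesauri: The Digital Archive Environment as an Example
1. Support learning and the assimilation and understanding of information
2. Help researchers and practitioners clarify problems
3. Support information retrieval
- Provide knowledge-based search support for users
- Support the presentation of information
- Provide a tool for indexing
- Facilitate combining multiple databases, or unified searching across them
- Support document processing after retrieval
Basic Principles of Thesauri
1. General principles (types of concepts)
- Concrete entities
˜ Things and their physical parts
˜ Materials
- Abstract entities
˜ Actions and events
˜ Abstract entities, and properties of things, materials or actions
˜ Disciplines or sciences
˜ Units of measurement
- Individual entities
2. Forms of terms
- Nouns or noun phrases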
˜ Adjectival phrases
˜ Prepositional phrases
- Adjectives
- Adverbs
- Verbs
- Abbreviations
3. Homographs and polysemes
4. Choice of terms
- Spelling
- Loan words and translations of loan words
- Transliteration
- Slang terms and jargon
- Common names and trade names
- Popular names and scientific names
- Place names
- Proper names of institutions and persons
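5. Scope notes and definitions
6. Compound terms: basic concepts
- Definition
- Principles
- Considerations
- Characteristics
- Criteria for retaining compound terms
- Criteria for not retaining compound terms
7. Definition of compound terms
- A kind of multiword term
- Two or more words combined to express a single lexical unit
8. Principles for compound terms
- A compound term must be able to express a single concept (a unit of thought) within a hierarchical or tree structure
- Examples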
˜ children and television
˜ adopted children
˜ educational television
9. Considerations for compound terms
- Literary warrant
- Managing the number of terms in the thesaurus
- Print-based versus computer-based systems
˜ Precoordinated
˜ Postcoordinated
- Avoiding false drops in retrieval
˜ library science
˜ library science
˜ science library
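The false-drop problem can be illustrated with a small sketch. It assumes a hypothetical two-document index (`DOCS`) and two illustrative search functions that do not come from any real system: a postcoordinate search that combines single words at query time, and a precoordinate search that matches the compound term as a unit.

```python
# Hypothetical mini-index: document id -> assigned indexing phrases.
DOCS = {
    "d1": ["library science"],
    "d2": ["science library"],
}


def postcoordinate_search(docs, words):
    """Match documents whose indexing contains every word, in any order."""
    wanted = set(words)
    hits = []
    for doc_id, phrases in docs.items():
        tokens = {word for phrase in phrases for word in phrase.split()}
        if wanted <= tokens:
            hits.append(doc_id)
    return sorted(hits)


def precoordinate_search(docs, compound):
    """Match only documents indexed with the exact compound term."""
    return sorted(doc_id for doc_id, phrases in docs.items() if compound in phrases)


post_hits = postcoordinate_search(DOCS, ["library", "science"])
pre_hits = precoordinate_search(DOCS, "library science")
print(post_hits)  # both documents match; d2 is a false drop
print(pre_hits)   # only d1 matches
```

Combining the words at search time cannot tell "library science" from "science library", which is one reason to keep such a compound as a single descriptor.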
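10. Characteristics of compound terms
- Focus (head noun)
˜ Identifies the broader class to which the term belongs
- Difference (modifier)
˜ Narrows the focus to specify a subclass
- Examples
˜ reinforced concrete: focus = concrete, difference = reinforced
˜ stained glass: focus = glass, difference = stained
11. Criteria for retaining compound terms
- Splitting the term would cause ambiguity or loss of meaning
˜ plant food, rose windows
- The components are ambiguous when they stand alone
˜ composite drawings, first aid
- The modifier has lost its original meaning
˜ trade winds
- The modifier points to a different meaning, as in a metaphor
˜ butterfly valves, tree structure
- The modifier does not define a subclass of the focus
˜ rubber ducks
- The compound is part of a proper name
˜ United Nations
12. Criteria for not retaining compound terms
- The focus is a property or a part of an object; when the focus cannot stand apart from that object, the compound term is still used
˜ office management = offices[object] + management[action]
˜ printed textiles (printed[action] + textiles[object])
- The term names an action that modifies an object; when the action and the object depend on each other, the compound term is still used
˜ bird migration = birds[agent] + migration[action]
˜ dancing shoes (dancing[action] + shoes[object])
13. Basic relationship types in a thesaurus
Equivalence Relationships
1. Reference codes
- USE
- UF
2. Two types of terms are covered
- Synonyms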
˜ Terms of different linguistic origin
˜ Popular names and scientific names
˜ Common nouns and trade names
˜ Variant names for emergent concepts
˜ Current or favoured terms vs. outdated or deprecated terms
˜ Variant spellings, including stem variants and irregular plurals
˜ Terms originating from different cultures sharing a common language
˜ Abbreviations and full names
˜ The factored and unfactored form of a compound term
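- Quasi-synonyms
3. USE (points from a non-preferred term to the preferred term)
4. UF (used for; points from the preferred term to a non-preferred term)
- Aves USE birds; birds UF Aves
- Outline USE shape; shape UF outline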
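Because USE and UF are reciprocal, only one direction needs to be stored. The sketch below assumes a plain dictionary of USE references (the `USE` name and both helper functions are illustrative, not taken from any thesaurus software) and derives the UF entries and a lookup from it.

```python
# USE references from the examples above: non-preferred term -> preferred term.
USE = {"Aves": "birds", "outline": "shape"}


def build_uf(use_map):
    """Derive the reciprocal UF (used for) entries from the USE references."""
    uf = {}
    for non_preferred, preferred in use_map.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return {term: sorted(variants) for term, variants in uf.items()}


def resolve(term, use_map):
    """Return the preferred term for a lookup, ignoring letter case."""
    lowered = {key.lower(): value for key, value in use_map.items()}
    return lowered.get(term.lower(), term)


UF = build_uf(USE)
print(UF)                      # preferred term -> list of non-preferred terms
print(resolve("aves", USE))    # entry vocabulary leads to the descriptor
print(resolve("birds", USE))   # a descriptor resolves to itself
```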
5. Types of equivalence relationships (synonyms)
a) terms of different linguistic origin
Examples:
cats / felines
freedom / liberty
sodium / natrium
sweat / perspiration
b) popular terms and scientific names
Examples:
aspirin / acetylsalicylic acid
gulls / Laridae
salt / sodium chloride
c) generic nouns and trade names
Examples:
petroleum jelly / Vaseline
photocopies / Xeroxes
refrigerators / Frigidaires
tissues / Kleenex
d) variant names for emergent concepts
Examples:
hovercraft / air cushion vehicles
e) current or favored terms versus outdated or deprecated terms
Examples:
poliomyelitis / infantile paralysis
developing countries / underdeveloped countries
f) common nouns and slang or jargon terms
Examples:
helicopters / whirlybirds
psychiatrists / shrinks
g) dialectal variants
Examples:
elevators / lifts
subways / undergrounds
6. Types of equivalence relationships (quasi-synonyms)
Examples:
Wetness / dryness
Smoothness / roughness
- Generic Posting
Examples:
waxes
UF plant waxes
plant waxes
USE waxes
furniture
UF beds
UF chairs
UF desks
UF tables
beds USE furniture
chairs USE furniture
desks USE furniture
tables USE furniture
1.3.5 Hierarchical Relationships
1. Reference codes
- BT
- NT
2. Four types are covered
- Generic relationships
- Whole-part relationships
˜ Systems and organs of the body
˜ Geographical locations
˜ Disciplines or fields of discourse
˜ Hierarchical social structures
- Instance relationships
- Polyhierarchical relationships
3. Reference codes
- BT (Broader Term)
- NT (Narrower Term)
4. Example 1
mammals
BT vertebrates
vertebrates
NT mammals
5. Example 2
- anatomy vs. central nervous system
- central nervous system vs. brain
6. Generic relationships
7. Codes
BTG = Broader term (generic)
NTG = Narrower term (generic)
Example:
rats
BTG rodents
rodents
NTG rats
8. Whole-part relationships
- systems and organs of the body
Example:
nervous system
central nervous system
brain
spinal cord
- Geographic locations
Example:
Canada
Ontario
Ottawa
Toronto
- Disciplines or fields of discourse
Example:
science
biology
botany
zoology
- Hierarchical organizational, corporate, social, or political structures
Example:
countries
states/provinces
cities
9. Codes for whole-part relationships
- BTP = Broader term (partitive)
- NTP = Narrower term (partitive)
Example:
central nervous system
BTP nervous system
nervous system
NTP central nervous system
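Hierarchical links are reciprocal, so the broader-term side can be derived from the narrower-term side. The following sketch assumes the nervous-system example is stored as a dictionary of NTP links; the `invert` and `ancestors` helpers are illustrative names, not part of any standard.

```python
# Narrower-term (partitive) links taken from the nervous-system example.
NTP = {
    "nervous system": ["central nervous system"],
    "central nervous system": ["brain", "spinal cord"],
}


def invert(nt_map):
    """Derive the reciprocal BTP entry for every NTP link."""
    bt = {}
    for broader, narrower_terms in nt_map.items():
        for narrower in narrower_terms:
            bt.setdefault(narrower, []).append(broader)
    return bt


def ancestors(term, bt_map):
    """Walk the broader-term links upward, nearest ancestor first."""
    chain = []
    seen = set()
    frontier = list(bt_map.get(term, []))
    while frontier:
        current = frontier.pop(0)
        if current in seen:
            continue
        seen.add(current)
        chain.append(current)
        frontier.extend(bt_map.get(current, []))
    return chain


BTP = invert(NTP)
print(BTP["brain"])             # the direct broader term of "brain"
print(ancestors("brain", BTP))  # every broader term up to the top
```

Walking the chain upward is what lets a search on a broad term such as "nervous system" also retrieve items indexed under "brain".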
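10. Instance relationships
Example:
mountain regions (class)
Alps (instance)
Himalayas (instance)
state capitals (class)
Albany (instance)
Trenton (instance)
11. Codes for instance relationships
- BTI = Broader term (instance)
- NTI = Narrower term (instance)
Example:
Fairy tales
NTI Cinderella
12. Polyhierarchical relationships
13. Node labels for polyhierarchical relationships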
Example:
Cars
by purpose
racing cars
sports cars
1.3.6 Associative Relationships
1. Reference codes
- RT
2. Two types are covered
- Terms in the same category
- Terms in different categories
˜ A discipline or field of study and the objects or phenomena studied
˜ An operation or process and its agent or instrument
˜ An action and the product of the action
˜ An action and its patient
˜ Concepts related to their properties
˜ Concepts related to their origins
˜ Concepts linked by causal dependence
˜ A thing and its counter agent
˜ A concept and its unit of measurement
˜ Syncategorematic phrases and their embedded nouns
3. Reference codes
- RT (related term)
Example:
cells
RT cytology
cytology
RT cells
Example (same category):
boats
BT vehicles
RT ships
ships
BT vehicles
RT boats
4. Associative relationships within the same category (derivational relationships)
Example (alphabetical display):
donkeys
BT equines
RT mules
equines
NT donkeys
NT horses
NT mules
horses
BT equines
RT mules
mules
BT equines
RT donkeys
RT horses
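An alphabetical display like the equines example can be generated from a minimal set of declared links. This is a sketch under stated assumptions: each BT link and each RT pair is declared once, and the helper names (`build_records`, `alphabetical_display`) are hypothetical.

```python
# Each hierarchical link and each RT pair is declared once.
BT = {"donkeys": "equines", "horses": "equines", "mules": "equines"}
RT_PAIRS = [("donkeys", "mules"), ("horses", "mules")]


def build_records(bt, rt_pairs):
    """Expand the declared links into reciprocal BT/NT and symmetric RT entries."""
    records = {}

    def record(term):
        return records.setdefault(term, {"BT": set(), "NT": set(), "RT": set()})

    for narrower, broader in bt.items():
        record(narrower)["BT"].add(broader)
        record(broader)["NT"].add(narrower)
    for first, second in rt_pairs:
        record(first)["RT"].add(second)
        record(second)["RT"].add(first)
    return records


def alphabetical_display(records):
    """List every term in alphabetical order, followed by its coded references."""
    lines = []
    for term in sorted(records):
        lines.append(term)
        for code in ("BT", "NT", "RT"):
            for target in sorted(records[term][code]):
                lines.append(f"{code} {target}")
    return lines


records = build_records(BT, RT_PAIRS)
display = alphabetical_display(records)
print("\n".join(display))
```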
5. Associative relationships across different categories
- Disciplines or fields of study and the objects or phenomena studied, or the discipline's practitioners
Example:
mathematics
RT mathematicians
mathematicians
RT mathematics
neurology
RT nervous system
nervous system
RT neurology
botany
RT plants
plants
RT botany
- Operations or processes and their agents or instruments
Example:
temperature control
RT thermostats
thermostats
RT temperature control
hunters
RT hunting
hunting
RT hunters
- An action and its product
Example:
scientific research
RT scientific inventions
publishing
RT music scores
- An action and its patient
Example:
data analysis
RT data
teaching
RT students
- Concepts related to their properties
Example:
liquids
RT surface tension
women
RT femininity
- Concepts related to their origins
Example:
water
RT water wells
information
RT information sources
- Concepts linked by causal dependence
Example:
injury
RT accidents
infections
RT pathogens
- A thing or action and its counter agent
Example:
pests
RT pesticides
corrosion
RT corrosion inhibitors
- A raw material and its product
Example:
aggregates (for mixing with cement)
RT concrete
hides
RT leather goods
- An action and a property associated with it
Example:
precision measurement
RT accuracy
- A concept and its opposite
Example:
single persons
RT married persons
tolerance
RT prejudice
6. Node labels for polyhierarchical relationships
Example:
Books
RT
Binding
printing
1.3.7 Structure of Thesauri
Concept-term relationships
Two main principles of conceptual structure
- Semantic and facet analysis
- Hierarchy
Application examples
Linking to Full Text
Linking Sequence No. to Bio-sequence Databanks
Linking Individual Industrial Codes to the Full Scheme
Linking to Descriptive Records
Linking Organism Names to Taxonomic Records
Linking Personal Names to Biographical Information
- http://virtua.lib.tku.edu.tw
- http://www.lib.ntu.edu.tw/catalog/webpac/webpac.asp
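Planning and Design Principles
Design Premises
1. Avoid duplicating effort
- Adopt an existing thesaurus
- Use an existing thesaurus as a base, with minor additions, changes, and deletions
- Develop a new thesaurus
2. Decide the structure and display format of the thesaurus
- Flat
- Hierarchical
3. Development approach
- Committee: formed by a group of subject experts
- Empirical: the required terms are derived by analyzing the existing literature
- Hybrid
4. Use of computer tools
- Candidate terms and stop lists (e.g., "to", "the")
- Number of times each thesaurus term is actually used
- Number of times each thesaurus term is actually used in queries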
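The kind of computer assistance listed above can be sketched in a few lines. The stop list, the sample texts, and the `candidate_terms` function are all hypothetical; a real project would work from its own literature and a fuller stop list.

```python
import re
from collections import Counter

# Hypothetical stop list; real lists are considerably longer.
STOP_LIST = {"to", "the", "a", "of", "and", "in"}


def candidate_terms(texts, stop_list=STOP_LIST):
    """Count single-word candidate terms, skipping words on the stop list."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stop_list:
                counts[word] += 1
    return counts


# Invented sample texts standing in for catalogue descriptions.
sample = ["The glaze of the bowl", "A bowl and a vase in the kiln"]
counts = candidate_terms(sample)
print(counts.most_common())  # candidates ordered by frequency
```

The same counting approach applies to indexing records and query logs, which is how the usage figures for existing descriptors can be gathered.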
5. Metadata for index terms (Term Records)
- Descriptor, scope note (SN), synonyms, non-displayable variations, NT/BT, RT, category/call no., history note (HN)
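One possible shape for such a term record is sketched below. The field names are assumptions chosen to mirror the list above, not a schema defined by any standard, and the scope note text is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermRecord:
    """Illustrative term record; field names are assumptions, not a standard."""
    descriptor: str
    scope_note: str = ""                                      # SN
    synonyms: List[str] = field(default_factory=list)         # UF entries
    hidden_variants: List[str] = field(default_factory=list)  # non-displayable variations
    broader: List[str] = field(default_factory=list)          # BT
    narrower: List[str] = field(default_factory=list)         # NT
    related: List[str] = field(default_factory=list)          # RT
    category: str = ""                                        # category / call no.
    history_note: str = ""                                    # HN

    def matches(self, query: str) -> bool:
        """True if the query equals the descriptor or any of its variants."""
        wanted = query.lower()
        names = [self.descriptor, *self.synonyms, *self.hidden_variants]
        return any(wanted == name.lower() for name in names)


record = TermRecord(
    descriptor="birds",
    scope_note="Invented note for illustration only.",
    synonyms=["Aves"],
    hidden_variants=["bird"],
    broader=["vertebrates"],
)
print(record.matches("aves"))  # a synonym leads to the record
print(record.matches("fish"))  # an unrelated word does not
```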
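6. Term verification
7. Level of specificity
- Number of terms and cost
8. Unassigned descriptors
9. Announcement and deposit of published thesauri
Planning Principles
1. Scope
- Disciplines and domains
- Core subjects: in depth; peripheral subjects: in outline
2. Types of material
- Journals: fine-grained; books: coarse-grained
3. Volume of material
- Size and growth rate of the material
- Volume versus cost
4. Users
- Subject experts vs. the general public
5. Types of questions
- General questions: coarse; specific questions: detailed
6. Term combination method
- Precoordination or postcoordination
7. Level of detail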
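Construction Methods
Thesaurus Construction Procedure
1. Define the subject field
2. Select the thesaurus characteristics and display
3. Announcement
4. Deductive vs. inductive method
5. Selection of terms
- Terminological sources in standardized form
- Literature scanning
* Manual selection
* Automatic term selection
- Question scanning
- Users' and experts' experience and knowledge
- The compiler's experience and knowledge
6. Recording of terms
7. Finding structure
- Preliminary organization of the subjects covered by the thesaurus
- Analysis and grouping of terms within broad categories
* Analysis using a systematic display
* Analysis using a graphic display
- Editing the systematic display
8. Producing the alphabetical thesaurus from the systematic display
- Producing a conventional alphabetical thesaurus
- Alphabetical display accompanied by a classified display
9. Final check with experts
10. Introduction to the thesaurus
11. Editing
12. Testing
13. Production for publication
14. Deposit with a clearinghouse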
Management
Thesaurus Management
1. Maintenance mechanisms and routines
2. Modifying the thesaurus
- Amendment of existing terms
- Status of existing terms
* Promotion of a non-preferred term to a preferred term
* Demotion or promotion of a preferred term within the hierarchy
- Deletion or demotion of existing terms
- Addition of new, or deletion of old, relationships
- Addition of new terms
- Amendment of existing structure
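After relationships are added or deleted, a routine check can confirm that every link still has its reciprocal. The sketch below is one way to do this, assuming records are kept as dictionaries of BT, NT, and RT lists; the `check_reciprocity` function and the deliberately incomplete sample data are illustrative only.

```python
def check_reciprocity(records):
    """Return a sorted list of missing reciprocal references."""
    problems = []
    for term, links in records.items():
        for broader in links.get("BT", []):
            if term not in records.get(broader, {}).get("NT", []):
                problems.append(f"{broader} lacks NT {term}")
        for narrower in links.get("NT", []):
            if term not in records.get(narrower, {}).get("BT", []):
                problems.append(f"{narrower} lacks BT {term}")
        for related in links.get("RT", []):
            if term not in records.get(related, {}).get("RT", []):
                problems.append(f"{related} lacks RT {term}")
    return sorted(problems)


# Sample data with one deliberate gap: cytology has no RT back to cells.
records = {
    "vertebrates": {"NT": ["mammals"]},
    "mammals": {"BT": ["vertebrates"]},
    "cells": {"RT": ["cytology"]},
    "cytology": {},
}
problems = check_reciprocity(records)
print(problems)  # lists the one missing reciprocal
```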
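3. Thesaurus management software
4. Physical form of the thesaurus
- Printed
- Electronic thesaurus
* Flat file: has no relationships or structure
* Database structured
* Hypertext
Related Issues
1. Multilingual thesauri
2. Reconciling and integrating thesauri
- Reconciliation
* Mapping from a specialized or single-discipline thesaurus to a common or general-purpose one
* AAT→LCSH
* MeSH→LCSH
- Integration
* Macrothesauri vs. microthesauri
- Merging a specialized or single-discipline thesaurus into a common or general-purpose one
* Becoming a single thesaurus: everything is retained and juxtaposed, while also being broken apart and rebuilt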
Main References
Soergel, D. (2004). Thesauri and ontologies in digital libraries: tutorial. In ECDL 2004.
ISO 2788-1986(E). Documentation: Guidelines for the establishment and development of monolingual thesauri.
NISO (2003). Guidelines for the construction, format, and management of monolingual thesauri. ANSI/NISO Z39.19-2003
Aitchison, J., Gilchrist A. and Bawden, D. (1997). Thesaurus construction and use: a practical manual. London: Aslib.
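Reference Information
Participating R&D unit: Technology R&D Subproject, Metadata Working Group
Provided by: Technology R&D Subproject, Metadata Working Group
Used by: all thematic projects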