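Digital Archives Chinese Character Processing Technology: Language Anchor (語言座標)
Core Technology Development
The goal of Language Anchor is to build, with the conceptual unit (the word) at its center, a base architecture for exchanging archived content and knowledge across languages. This architecture takes bridging language differences as its warp and describing knowledge systems as its weft, with the aim of meeting the needs of the next-generation web, the Semantic Web. The bilingual ontological wordnet built by this project is intended to serve as an essential backbone for processing and exchanging Chinese content knowledge on the next-generation web. The core technologies under development are:
1. A support tool platform for building Chinese and Chinese-English bilingual ontologies, both general-purpose and domain-specific.
2. A Chinese lexical knowledge system and a Chinese-English bilingual wordnet based on research into fine-grained sense distinction, together with construction tools and a system maintenance interface for bilingual domain lexicons.
3. An ontology of the semantic radicals (意符) of Chinese characters.
4. Participation in the development of the Word Sketch Engine system and the loading of Chinese content into it.
Language Anchor project website: http://linganchor.sinica.edu.tw/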
System Descriptions
- Academia Sinica Bilingual Ontological Wordnet (Sinica BOW)
Sinica BOW: http://bow.sinica.edu.tw
- Chinese Wordnet (中文詞彙網路)
Chinese Wordnet: http://cwn.ling.sinica.edu.tw
- Chinese Word Sketch Engine
Chinese Word Sketch Engine: http://corpora.fi.muni.cz/chinese_all_test/
Free Tools
Item | License | URL | Release date (year/month) |
Chinese character structure database (漢字構型資料庫) | Freeware | http://rocling.iis.sinica.edu.tw/CKIP/tool/ | 2002/10 |
Word segmentation tool (Segment) | Freeware | http://rocling.iis.sinica.edu.tw/CKIP/tool/ | 2002/10 |
Domain dictionary tool | Freeware | http://rocling.iis.sinica.edu.tw/CKIP/tool/ | 2004/12 |
Tag tool for Windows | Freeware | http://ckipsvr.iis.sinica.edu.tw | 2002/10 |
Java applet for handling missing characters on web pages | Freeware | http://ckipsvr.iis.sinica.edu.tw | 2002/10 |
List of the 5,000 most frequent disyllabic words | Freeware | http://ckipsvr.iis.sinica.edu.tw | 2004/12 |
List of the 3,000 most frequent trisyllabic words | Freeware | http://ckipsvr.iis.sinica.edu.tw | 2004/12 |
Word Sketch Engine | Freeware | http://corpora.fi.muni.cz/chinese_all_test/ |
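Technology Transfer
Item | Principal creator | Transferee | Revenue | Date (ROC year/month) | Status (approved / under negotiation) |
Chinese-English bilingual ontology | Chu-Ren Huang (黃居仁) | Department of Computer Science, National University of Singapore (Ng Hwee Tou) | $11,679 | 94/7 | Approved |
Chinese-English bilingual ontology | Chu-Ren Huang (黃居仁) | Department of Computer Science, National University of Singapore (Chan Yee Seng) | $11,680 | 94/7 | Approved |
Chinese-English bilingual ontology | Chu-Ren Huang (黃居仁) | Eduard Hovy, University of Southern California (ISI) | $11,973 | 95/1 | Approved |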
Publications
- Journal Articles
1. Huang, Chu-Ren, Chun-Ling Chen, Cui-Xia Weng, Hsiang-Ping Lee, Yong-Xiang Chen and Keh-jiann Chen. (2005). The Sinica Sense Management System: Design and Implementation. Computational Linguistics and Chinese Language Processing, Vol. 10, No. 4.
2. Huang, Chu-Ren, and Ru-Yng Chang. (2004). Categorical ambiguity and information content: A Corpus-based study of Chinese. Hongyin Tao (Ed.) Special Issue: Corpora, Language Use, and Grammar. Journal of Chinese Language and Computing, Vol. 14, No. 2.
3. Huang, Chu-Ren and Kathleen Ahrens. (2003). Individuals, Kind and Events: Classifier Coercion of Nouns. Language Sciences. 25.4.353-373.
4. Ahrens, Kathleen, Shirley Yuan-hsun Chuang, and Chu-Ren Huang. (2003). Sense and Meaning Facets in Verbal Semantics: A MARVS Perspective. Languages and Linguistics. 4.3.469-484.
5. Huang, Chu-Ren, Elanna I. J. Tseng, Dylan B. S. Tsai, Brian Murphy. (2003). Cross-lingual Portability of Semantic relations: Bootstrapping Chinese WordNet with English WordNet Relations. Languages and Linguistics. 4.3.509-532.
6. Weng, Cui-Xia, and Chu-Ren Huang. (2003). The Semantics of Shapes: a study based on Mandarin quan1zi5 (圈子). Journal of Chinese Language and Computing. 13.4.329-337.
- Conference Papers
1. Huang, Chu-Ren (黃居仁). (2005). 漢字知識表達的幾個層面:字、義、與詞義關係 [Knowledge Representation with Hanzi: The Relationship among Characters, Words, and Senses]. 漢字與全球化國際學術研討會 (International Conference on Chinese Characters and Globalization). January 28-30. Taipei.
2. Wu, Yiching, Simon Smith, and Chu-Ren Huang. (2005). How to express ‘express’? – Pedagogical reflections from Internet resources. Presented at 2005 International Conference on TEFL and Applied Linguistics. March 11-13. Ming-Chuan University, Taoyuan, Taiwan.
3. Huang, Chu-Ren (黃居仁), and Jia-Fei Hong (洪嘉馡). (2005). 感官動詞的近義辨析:詞義與概念的關係 [Near Synonyms in Verbs of Senses: From Lexical Meaning to Concepts]. Plenary paper presented at the Sixth Chinese Lexical Semantics Workshop. Xiamen. April 21-24.
4. Hong, Jia-Fei (洪嘉馡), Chu-Ren Huang (黃居仁), and Yiching Wu (巫宜靜). (2005). 異體字與異體詞詞彙語意初探 [Towards a Study on the Lexical Semantics of Character- and Word-Variants]. Presented at the Sixth Chinese Lexical Semantics Workshop. Xiamen. April 21-24.
5. Chou, Ya-Min (周亞民), and Chu-Ren Huang (黃居仁). (2005). 漢字意符知識結構的建立 [Construction of a Knowledge Structure Based on Chinese Radicals]. Presented at the Sixth Chinese Lexical Semantics Workshop. Xiamen. April 21-24.
6. Kilgarriff, Adam, Chu-Ren Huang, Pavel Rychly, Simon Smith, and David Tugwell. (2005). Chinese Word Sketches. ASIALEX 2005: Words in Asian Cultural Context. June 1-3. Singapore.
7. Huang, Chu-Ren (黃居仁). (2005). 語意網時代的語言學研究:機會與挑戰 [Linguistic Research in the Age of Semantic Web: Opportunities and Challenges]. Keynote Speech. 2005 National Conference on Linguistics. July 3-4. Hsinchu: National Chiao Tung University.
8. Kilgarriff, Adam, Chu-Ren Huang, Michael Rundell, Pavel Rychly, Simon Smith, David Tugwell, Elaine Ui Dhonnchadha. (2005). Word Sketches for Irish and Chinese. Presented at Corpus Linguistics 2005. July 14-17. Birmingham, UK.
9. Huang, Chu-Ren. (2005). Ontology and Lexical Semantics. Invited Speech. 8th National Joint Conference on Computational Linguistics. Nanjing. August 27-31.
10. Huang, Chu-Ren, I-Li Su, Jia-fei Hong, and Xiang-bin Li. (2005). Cross-lingual Conversion of Lexical Semantic Relations: Building Parallel Wordnets. Presented at the Fifth Asian Language Resources Workshop. October 14. Jeju, Korea.
11. Huang, Chu-Ren, Adam Kilgarriff, Yiching Wu, Chih-Min Chiu, Simon Smith, Pavel Rychly, Ming-Hong Bai, and Keh-jiann Chen. (2005). Chinese Sketch Engine and the Extraction of Collocations. Presented at the Fourth SigHan Workshop on Chinese Language Processing. October 14-15. Jeju, Korea.
12. Huang, Chu-Ren, Shiang-bin Li, and Jia-fei Hong. (2005). The Robustness of Domain Lexico-Taxonomy: Expanding Domain Lexicon with Cilin. Presented at the Fourth SigHan Workshop on Chinese Language Processing. October 14-15. Jeju, Korea.
13. Chou, Ya-Min, and Chu-Ren Huang. (2005). Hantology: An Ontology based on Conventionalized Conceptualization. Presented at the Fourth OntoLex Workshop. October 15. Jeju, Korea.
14. Chang, Ru-Yng, Chu-Ren Huang, Feng-ju Lo, Sueming Chang. (2005). From General Ontology to Specialized Ontology: A Study based on a Single Author Historical Corpus. Presented at the Fourth OntoLex Workshop. October 15. Jeju, Korea.
15. Chang, Ru-Yng, Chu-Ren Huang, and Chin-Chuan Cheng. (2004). Implementation of an OLAC-based Linguistic Metadata System over a Set of Heterogeneous Language Archives. Proceedings of the 4th workshop on Asian Language Resources. Pp. 79-86. March 25, 2004. Hainan Island.
16. Huang, Chu-Ren, Li, Xiang-Bing, Hong, Jia-Fei. (2004). Domain Lexico-Taxonomy: An Approach Towards Multi-domain Language Processing. Proceedings of the Asian Symposium on Natural Language Processing to Overcome Language Barriers. Pp. 54-60. March 25-26, 2004. Hainan Island.
17. Huang, Chu-Ren, Chang, Ru-Yng, Lee, Shiang-Bin. (2004). Sinica BOW (Bilingual Ontological Wordnet): Integration of Bilingual WordNet and SUMO. Presented at the 4th International Conference on Language Resources and Evaluation (LREC2004). Lisbon. Portugal. 26-28 May, 2004.
18. Huang, Chu-Ren, Feng-ju Lo, Ru-Yng Chang, and Sueming Chang. (2004). Reconstructing the Ontology of the Tang Dynasty: A pilot study of the Shakespearean-garden approach. Presented at the OntoLex 2004 Workshop. Lisbon. May 30, 2004.
19. Zhang, Huarui, Chu-Ren Huang, and Shiwen Yu. (2004). Distributional Consistency: A General Method for Defining A Core Lexicon. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004). 1119-1222. Lisbon. Portugal. 26-28 May, 2004.
20. Huang, Chu-Ren, Chun-ling Chen, Cui-Xia Weng, and Keh-jiann Chen. (2004). The Sinica Sense Management System: Design and Implementation. In Ji Donghong, Lua Kim Teng, and Wang Hui (Eds.) Recent Advancement in Chinese Lexical Semantics. Pp. 15-22. Presented at the Fifth Chinese Lexical Semantics Workshop (CLSW5). June 14-16, Singapore.
21. Chung, Siaw-Fong, Kathleen Ahrens and Chu-Ren Huang. (2004). From Lexical Semantics to Conceptual Metaphors: Mapping Principle Verification with WordNet and SUMO. In Ji Donghong, Lua Kim Teng, and Wang Hui (Eds.) Recent Advancement in Chinese Lexical Semantics. Pp. 99-106. Presented at the Fifth Chinese Lexical Semantics Workshop (CLSW5). June 14-16, Singapore.
22. Ahrens, Kathleen, Chung, Siaw-Fong, and Chu-Ren Huang. (2004). Using WordNet and SUMO to Determine Source Domains of Conceptual Metaphors. In Ji Donghong, Lua Kim Teng, and Wang Hui (Eds.) Recent Advancement in Chinese Lexical Semantics. Pp. 91-98. Presented at the Fifth Chinese Lexical Semantics Workshop (CLSW5). June 14-16, Singapore.
23. Chung, Siaw-Fong, Kathleen Ahrens and Chu-Ren Huang. (2004). RECESSION IS A DISEASE: Defining Source Domains through WordNet and SUMO. In the Proceedings of the 2004 Linguistic Society of Korea (LSK) International Summer Conference, Yonsei University, Seoul, Korea. July 28-30.
24. Huang, Chu-Ren. (2004). Towards a Fully Sense-Tagged Corpus for Mandarin Chinese. Presented at the 4th China-Japan Joint Conference to Promote Cooperation in Natural Language Processing (CJNLP-04). November 10-13, City University of Hong Kong.
25. Huang, Chu-Ren. (2004). Irregular Form-Sense Pairs in Mandarin Chinese: Examples for LMF Case Study. Presented at ISO TC37/SC 2 and SC 4 Joint Meetings. November 16-19, 2004, Pisa.
26. Huang, Chu-Ren. (2004). Text-based Construction and Comparison of Domain Ontology: A Study Based on Classical Poetry. Invited Speech. Presented at PACLIC18. December 8-10. Tokyo: Waseda University.
27. Hong, Jia-Fei, Xiang-Bing Li and Chu-Ren Huang. (2004). Ontology-based Prediction of Compound Relations: A study based on SUMO. Presented at PACLIC18. December 8-10. Tokyo: Waseda University.
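28. Cheng, Chin-chuan, Chu-Ren Huang, Xiang-yu Chen, Yu-chun Huang, Joyce Han, Feng-ju Lo. (2004). Extensive Reading with Guidance. Presented at International Workshop on Language e-Learning 2004. December 10, Tokyo: Waseda University.
29. Huang, Chu-Ren (黃居仁). (2004). 由語言資源到知識本體:「語言典藏」與「語言座標」計畫的規劃與執行 [From Language Resources to Ontology: Planning and Implementation of the Language Archives and Language Anchor Projects]. 兩岸三院資訊技術交流與數位資源共享研討會 [Cross-Strait Three-Academy Workshop on Information Technology Exchange and Digital Resource Sharing]. Taipei: Academia Sinica. June 1-3, 2004.
30. Chang, Ru-Yng (張如瑩), and Chu-Ren Huang (黃居仁). (2004). 中央研究院中英雙語知識本體詞網 (Sinica BOW):結合詞網,知識本體,與領域標記的詞彙知識庫 [Sinica BOW: A Lexical Knowledge Base Integrating Wordnet, Ontology, and Domain Tags]. Presented at the 16th Conference on Computational Linguistics and Speech Processing (ROCLING XVI). September 2-3. Greenbay, Taipei.
31. 柯淑津, 陳振南, and Chu-Ren Huang (黃居仁). (2004). 全語料庫中文詞義標記的初步研究 [First Steps Towards a Fully Sense-Tagged Chinese Corpus]. 漢語詞彙語意研究的現狀與發展趨勢國際學術研討會 [International Conference on the Current State and Development Trends of Chinese Lexical Semantics Research]. November 7-8. Peking University.
32. Hong, Jia-Fei (洪嘉馡), and Chu-Ren Huang (黃居仁). (2004). 「聲」與「音」的近義辨析:詞義與概念的關係 [The Near Synonym Pair Sheng and Yin: A study of the relation between sense and concept]. 漢語詞彙語意研究的現狀與發展趨勢國際學術研討會 [International Conference on the Current State and Development Trends of Chinese Lexical Semantics Research]. November 7-8. Peking University.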
33. Huang, Chu-Ren, Feng-ju Lo, Ru-Yng Chang, Sueming Chang. (2004). Sinica BOW and 300 Tang Poems: An overview of a bilingual ontological wordnet and its application to a small ontology of Tang poetry. Invited talk. Workshop on Possibilities of a Knowledgebase of Tang Civilization: Towards a new comprehensive digital archive of Tang China. Institute for Research in Humanities, Kyoto University. February 20-21.
34. Huang, Chu-Ren. (2003). Word Knowledge, World Knowledge, and Ontology: Towards a linguistic infrastructure for knowledge representation and knowledge engineering. Presented at Language and Knowledge Representation: Mini-workshop on functional approaches to language. National Chiao Tung University. February 26, 2003.
35. Huang, Chu-Ren (黃居仁), Dylan B. S. Tsai (蔡柏生), 朱梅欣, 何婉如, 黃麗婉, and I-Ni Tsai (蔡宜妮). (2003). 詞義與義面:中文詞彙意義的區辨與操作原則 [Sense and Meaning Facet: Criteria and Operational Guidelines for Chinese Sense Distinction]. Presented at the Fourth Chinese Lexical Semantics Workshop. June 23-25. Hong Kong: City University of Hong Kong.
36. Weng, Cui-Xia, Ru-Yng Chang, Elizabeth Zeitoun, Chao-Jun Chen, Derming Juang, Chu-Ren Huang, and Chin-chuan Cheng. (2003). Taiwan's NDAP Language Archives Project: from bronze inscription texts to Austronesian field recording. Presented at E-MELD Conference 2003: Digitizing & Annotating Texts & Field Recordings. East Lansing, July 11-13.
37. Ahrens, Kathleen, Chu-Ren Huang, and Siaw-Fong Chung. (2003). Conceptual Metaphors: Ontology-based representation and corpora driven Mapping Principles. Proceedings of the ACL Workshop on Lexicon and Figurative Language. Pp.35-41. July 11, Sapporo, Japan.
38. Bird, Steven, Chu-Ren Huang, Gary Simons. (2003). Accessing Distributed Resources Information: an OLAC Perspective. Presented at the ENABLER / ELSNET Workshop International Roadmap for Language Resources. August 28-29, Paris.
39. Chung, Siaw Fong, Kathleen Ahrens and Chu-Ren Huang. (2003). ECONOMY IS A PERSON: A Chinese-English Corpora and Ontological-based Comparison Using the Conceptual Mapping Model. In the Proceedings of the 15th ROCLING Conference for the Association for Computational Linguistics and Chinese Language Processing, National Tsing Hua University, Taiwan. Pp. 87-110.
40. Huang, Chu-Ren (黃居仁). (2003). 從詞彙庫到知識本體:為專業知識庫許個「語意網」的未來美景 [From Lexicon to Ontology: Envisioning a Semantic Web Future for Specialized Knowledge Bases]. Invited lecture at 醫藥衛生圖書資源專題講座暨研習會:網路時代參考服務之應用 [Seminar and Workshop on Medical and Health Library Resources: Applications of Reference Services in the Internet Age]. September 25, 2003. Medical Library, National Cheng Kung University.
41. Weng, Cui-Xia and Chu-Ren Huang. (2003). The Semantics of Shapes: A Study based on Mandarin Quan1zi5 [Circle]. Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation. Singapore, 1-3 October, 2003.
42. Tsai, I-Ni, and Chu-Ren Huang. (2003). The Semantics of Onomatopoeic Speech Act Verbs. Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation. Singapore, 1-3 October, 2003.
43. Huang, Chu-Ren (黃居仁), Ru-Yng Chang (張如瑩), and Dylan B. S. Tsai (蔡柏生). (2003). 語意網時代的網路華語教學-兼介中英雙語知識本體與領域檢索介面 [Chinese Language Education and the Developing Semantic Web: An Introduction to Chinese-English Bilingual Ontology Interface]. Presented at the Third International Conference of Internet Chinese Education. Taipei, Oct. 24-26, 2003.
44. Chung, Siaw-Fong, Chu-Ren Huang, and Kathleen Ahrens. (2003). ECONOMY IS A TRANSPORTATION_DEVICE: Contrastive Representation of Source and Domain Knowledge in English and Chinese. Proceedings of the 2003 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLPKE2003), Special Session on Upper Ontology and Natural Language Processing. Oct. 28, 2003.
45. Huang, Chu-Ren. (2003). SINICA BOW: Integrating Bilingual WordNet and SUMO Ontology. Invited Panel talk: Synergy Between Language Resources and Knowledge Resources. The 2003 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLPKE2003), Special Session on Upper Ontology and Natural Language Processing. Beijing. Oct. 28, 2003.
46. Huang, Chu-Ren. (2003). Sense and Sense-Ability. Presented at the 2003 POLA Workshop in honor of Professor William S.-Y. Wang. Nov. 30-Dec. 1, Aspire Park, Taoyuan.
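Reference Information
Participating R&D unit: Institute of Linguistics, Academia Sinica (文獻語料庫, text corpora)
Provided by: Language Anchor project
Used by: Language Anchor project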