《計算機應用研究》|Application Research of Computers

基于類別信息和特征熵的文本特征權重計算

Feature weighting scheme based on category information and term entropy

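Authors: 阿力木江·艾沙, 殷曉雨, 庫爾班·吾布力, 李喆
Affiliation: Xinjiang University (新疆大學), a. Network and Information Technology Center; b. School of Information Science and Engineering, Urumqi 830046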
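Article ID: 1001-3695(2019)11-007-3237-03
DOI: 10.19734/j.issn.1001-3695.2018.05.0294
Abstract: Feature weighting methods based on category information do not express the relationship between features and categories accurately enough: for features with the same category frequency, they cannot compare how well each feature discriminates between categories, so the distribution of a feature within each category must also be considered. This paper combines the inverse category frequency (ICF) of a feature with its within-category entropy in the feature weighting calculation and constructs two supervised feature weighting schemes. Experiments on a Uyghur text classification corpus show that the method clearly improves the spatial distribution of the samples and raises the micro-averaged F1 score of Uyghur text classification.
Keywords: text classification; text feature; term weighting; category frequency
Funding: Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01C068)
Article URL: http://www.ziusle.tw/article/01-2019-11-007.html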
Authors (English): Alimjan Aysa, Yin Xiaoyu, Kurban Ubul, Li Zhe
Affiliation (English): a. Network & Information Technology Center, b. School of Information Science & Engineering, Xinjiang University, Urumqi 830046, China
English abstract: Feature weighting schemes based on category information are not accurate enough in expressing the relationship between features and categories. That is, the discriminative ability of features with the same category frequency cannot be compared, so the distribution of the features within each category should be considered. This paper combines the inverse category frequency (ICF) and the inner-category entropy of the features in the term weight calculation, and constructs two supervised feature weighting schemes. The experimental results on a Uygur text categorization dataset show that this method clearly improves the spatial distribution of the samples and raises the micro-averaged F1 value of Uygur text classification.
English keywords: text classification; text feature; term weighting; category frequency
Received: 2018/5/7
Revised: 2018/6/27
Pages: 3237-3239, 3285
CLC number: TP391.1
Document code: A
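
The abstract's central point, that two features with the same category frequency can still differ in how they are distributed inside a category, can be illustrated with a small sketch. This record does not give the paper's two weighting formulas, so everything below is an assumption made for illustration: the smoothed form log(1 + |C| / cf) for ICF, the entropy normalized by the log of the category's document count, the hypothetical combination tf * ICF * (1 + entropy) in `weight`, and the toy corpus itself.

```python
import math


def icf(term, docs_by_cat):
    """Inverse category frequency, assumed here as log(1 + |C| / cf),
    where cf is the number of categories containing the term."""
    cf = sum(1 for docs in docs_by_cat.values() if any(term in d for d in docs))
    if cf == 0:
        return 0.0
    return math.log(1 + len(docs_by_cat) / cf)


def within_category_entropy(term, docs):
    """Entropy of the term's occurrences over the documents of one category,
    normalized to [0, 1]. A value near 1 means the term is spread evenly
    across the category's documents; 0 means it sits in a single document."""
    counts = [d.count(term) for d in docs]
    total = sum(counts)
    if total == 0 or len(docs) < 2:
        return 0.0
    h = sum((c / total) * math.log(total / c) for c in counts if c > 0)
    return h / math.log(len(docs))


def weight(term, doc, category, docs_by_cat):
    """Hypothetical combination (not the paper's formula):
    tf * ICF * (1 + normalized within-category entropy)."""
    tf = doc.count(term)
    entropy = within_category_entropy(term, docs_by_cat[category])
    return tf * icf(term, docs_by_cat) * (1 + entropy)


# Toy tokenized corpus (invented for illustration only).
corpus = {
    "sports": [["goal", "match", "team"], ["goal", "team", "coach"], ["goal", "league"]],
    "finance": [["market", "stock", "team"], ["stock", "bank"], ["bank", "rate"]],
}

# "goal" and "league" each occur in one category, so their ICF is identical,
# but "goal" appears in every sports document while "league" appears in one.
for term in ("goal", "league"):
    print(term, icf(term, corpus), within_category_entropy(term, corpus["sports"]))
```

In this toy corpus ICF alone cannot separate "goal" from "league", while the entropy term does, which is the gap the abstract describes. Whether entropy should raise or lower the weight, and how it is scaled, is a design choice that the paper's own schemes settle and this sketch does not.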