Outstanding Thesis Awards by Year > 2023 Outstanding Thesis Award Winners > 陳倩琪 (Sin Kei Chan)
Author | 陳倩琪 |
Author (English) | Sin Kei Chan |
Title | 基於機器學習之早期預測學術文獻影響力研究 |
Title (English) | Using Machine Learning for Early Academic Influence Prediction |
Advisor | 張嘉玲 |
Advisor (English) | Chia-ling Chang |
Oral defense committee | 劉譯閎, 林逸農 |
Oral defense date | 2023-06-19 |
Degree | Master's |
Institution | Tamkang University (淡江大學) |
Department | Graduate Institute of Information and Library Science (資訊與圖書館學研究所) |
Year of publication | 2023 |
Academic year of graduation | 111 (ROC calendar) |
Language | Chinese |
Pages | 73 |
Keywords (Chinese) | 學術影響力、早期預測、冷啟動問題、資料不平衡 |
Keywords (English) | Academic Influence, Early Prediction, Cold Start Problem, Data Imbalance |
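Abstract (translated from Chinese) | Academic influence is often used to evaluate academic fields or institutions and so reflects the current state of a field's development. The strengths of this indicator can serve as a reference and its weaknesses can be improved upon, which in turn promotes academic progress, so it has long attracted wide attention. Earlier work on predicting academic influence could not handle newly published papers because of the cold start problem. This study instead focuses on early prediction of a paper's influence: predicting, at the moment of publication, whether the paper will go on to receive a high citation count. Such predictions make it possible to observe the current state and development trends of a field and give researchers, research institutions, journal editors, and reviewers a basis for reference. To that end, this study uses machine learning to build a model for early prediction of paper influence, drawing on three dimensions: paper content, authors, and journals. Its distinguishing feature is the inclusion of journal evaluation metrics from Scopus and JCR, with the aim of describing the makeup of a paper's influence more completely and thereby improving predictive performance. Because disciplines differ, the study takes Library and Information Science as its research subject. In the course of the research we identified and addressed the data imbalance problem in academic influence data. We applied SMOTE and adopted an ensemble stacking architecture: the first stage predicts with Long Short-Term Memory, Multilayer Perceptron, Support Vector Machine, Logistic Regression, Random Forest, XGBoost, and Naive Bayes, and the second stage predicts with Logistic Regression. Experimental results show that the proposed method outperforms other algorithmic models and is able to predict, at the time of publication, whether a paper will receive a high citation count in the future. |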
Abstract (English) | Academic influence is frequently used to assess academic domains or institutions and reflects the current state of development within a field. As a result, it has attracted continuous attention from various sectors. Earlier approaches to predicting academic influence could not make predictions for newly published papers because of the cold start problem. This study focuses on early academic influence prediction, aiming to predict at the time of publication whether a paper will receive a high citation count in the future. This approach allows for the observation of the current state and developmental trends in the academic field, providing a reference for researchers, research institutions, journal editors, and reviewers. To this end, a machine learning model is developed for early prediction of the academic influence of research papers. The model draws on three dimensions: the content of the paper, the authors, and the journals. Its notable feature is the inclusion of Scopus and JCR (Journal Citation Reports) journal evaluation metrics. Considering the differences between disciplines, this study chooses the field of Library and Information Science as its research focus. In the study, we tackled data imbalance using SMOTE and adopted an ensemble stacking architecture. The model's first stage combines predictions from Long Short-Term Memory, Multilayer Perceptron, Support Vector Machine, Logistic Regression, Random Forest, XGBoost, and Naive Bayes. In the second stage, Logistic Regression is used. Experimental results show that the proposed method outperforms other algorithms and that it can predict, at the time of publication, whether a paper will go on to receive a high citation count. |
Link | National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文知識加值系統) |
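
The abstract names SMOTE as the remedy for data imbalance between highly cited papers and the rest. The record does not say which implementation the thesis used, so the following is only a minimal sketch of the core SMOTE idea in pure Python: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy feature vectors are all hypothetical and are not taken from the thesis.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up data: 3 "highly cited" feature vectors against 6 other papers.
minority = [[0.9, 0.8], [0.8, 0.95], [0.85, 0.7]]
majority_count = 6

new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), "minority samples after oversampling")
```

In a stacking pipeline like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, before the first-stage classifiers are fitted, so that synthetic points do not leak into evaluation data.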