Standing on the shoulders of giants
Search results
About 2,221,951 results
Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations (Short Paper)
T. B. Brown, Benjamin F. Mann - DROPS (Schloss Dagstuhl – Leibniz Center for Informatics) - 2023
This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLM-generated embeddings for geometric attributes. The experiments demonstrate that wh…
Cited by: 14,086

ChatGPT for good? On opportunities and challenges of large language models for education
Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert - Learning and Individual Differences - 2023
No abstract is available for this record; see the source link for details.
Cited by: 4,031

Large language models in medicine
Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutiérrez - Nature Medicine - 2023
No abstract is available for this record; see the source link for details.
Cited by: 2,772

Large language models encode clinical knowledge
Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi - Nature - 2023
Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research…
Cited by: 2,555

Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models
Tiffany H. Kung, Morgan Cheatham, Arielle Medenilla, Czarina Sillos - PLOS Digital Health - 2023
We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results sugg…
Cited by: 3,253

A Survey on Evaluation of Large Language Models
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu - ACM Transactions on Intelligent Systems and Technology - 2024
Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level but also at the society level, for a better understanding of their potential risks. Over the past years, significant…
Cited by: 2,015

LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu - arXiv (Cornell University) - 2021
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propo…
Cited by: 2,366

A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang - arXiv (Cornell University) - 2023
Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently…
Cited by: 1,362

Emergent Abilities of Large Language Models
Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel - arXiv (Cornell University) - 2022
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply…
Cited by: 1,013

Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan - arXiv (Cornell University) - 2021
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermo…
Cited by: 1,397
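The first result above describes a pipeline of encoding WKT geometry strings into embeddings and feeding those embeddings to downstream classifiers. A toy sketch of that pipeline, with a hashed character-trigram vector standing in for the LLM embedding (in the paper this would come from GPT-2 or BERT) and a nearest-centroid classifier; all names and data here are illustrative, not the paper's code:

```python
import zlib
import numpy as np

def embed(wkt, dim=64):
    """Stand-in embedding: hashed character trigrams of the WKT string.
    In the paper's pipeline this vector would come from an LLM encoder."""
    v = np.zeros(dim)
    for i in range(len(wkt) - 2):
        v[zlib.crc32(wkt[i:i + 3].encode()) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# Tiny labeled set of WKT geometries: 0 = POINT, 1 = POLYGON.
train = [
    ("POINT (30 10)", 0),
    ("POINT (4 6)", 0),
    ("POLYGON ((30 10, 40 40, 20 40, 30 10))", 1),
    ("POLYGON ((0 0, 1 0, 1 1, 0 0))", 1),
]

# Nearest-centroid classifier over the embeddings.
centroids = {label: np.mean([embed(w) for w, l in train if l == label], axis=0)
             for label in (0, 1)}

def classify(wkt):
    e = embed(wkt)
    return min(centroids, key=lambda label: np.linalg.norm(e - centroids[label]))

print(classify("POINT (7 52)"))
print(classify("POLYGON ((5 5, 9 5, 9 9, 5 5))"))
```

The point of the sketch is only the shape of the pipeline (geometry text -> embedding -> classifier); the paper's contribution is evaluating how much geometric information real LLM embeddings carry through that second arrow.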
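The LoRA entry above proposes freezing the pretrained weights and training only a low-rank update, W + BA, in place of full fine-tuning. A minimal NumPy sketch of that idea for a single linear layer, assuming the paper's initialization (Gaussian A, zero B, so the adapted layer starts identical to the original); dimensions and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4          # layer dims and low-rank bottleneck, r << min(d, k)
W = rng.normal(size=(d, k))  # frozen pretrained weight

# LoRA factors: B starts at zero so the adapted layer initially equals the original.
A = rng.normal(scale=0.01, size=(r, k))
B = np.zeros((d, r))

def adapted_forward(x):
    """Forward pass with the low-rank update applied: y = x (W + BA)^T."""
    return x @ (W + B @ A).T

x = rng.normal(size=(8, k))
# Before any training, B = 0, so the update contributes nothing.
assert np.allclose(adapted_forward(x), x @ W.T)

# Trainable parameters drop from d*k (full fine-tuning) to r*(d + k).
print(d * k, "->", r * (d + k))  # 4096 -> 512
```

The parameter count is why, per the abstract, serving many task-specific adaptations of a 175B-parameter model becomes feasible: only the small A and B per task need storing, while W is shared.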
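The Codex entry above scores models by functional correctness on HumanEval: a generated completion counts as solving a problem only if the completed program passes that problem's unit tests. A stripped-down sketch of that check, with a hypothetical problem record and two candidate completions; the real harness executes untrusted code in a sandbox, which this omits:

```python
# A HumanEval-style problem: a prompt (signature + docstring) plus unit tests.
# The record layout and names here are illustrative, not the released dataset's.
problem = {
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "test": "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n",
}
candidates = [
    "    return a - b\n",   # incorrect model completion
    "    return a + b\n",   # correct model completion
]

def passes(problem, completion):
    """Functional correctness: run the completed program against the tests."""
    program = problem["prompt"] + completion + "\n" + problem["test"]
    try:
        exec(program, {})   # real harnesses sandbox this step
        return True
    except Exception:
        return False

results = [passes(problem, c) for c in candidates]
print(results)  # [False, True]
```

Aggregating this pass/fail signal over all problems (and over several samples per problem) yields the solve rates the abstract reports, e.g. 28.8% for Codex versus 0% for GPT-3.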