Standing on the Shoulders of Giants
Search results (about 194,157 in total)
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) - 2019
Cited by: 9,693
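
Sentence-BERT's core move is to turn BERT into a sentence encoder by pooling its token outputs inside a siamese network, so that semantic similarity becomes a cosine between two fixed-size vectors. A minimal sketch of that pooling-and-compare step, assuming PyTorch and the Hugging Face transformers package with the generic bert-base-uncased checkpoint (the released Sentence-BERT models are additionally fine-tuned with a siamese objective, which is not shown here):

```python
# Minimal sketch: mean-pooled BERT sentence embeddings + cosine similarity.
# Uses the generic bert-base-uncased checkpoint; actual Sentence-BERT models
# are additionally fine-tuned with a siamese/triplet objective.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentences):
    # Tokenize a batch of sentences with padding so they share one tensor.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_states = model(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()      # (B, T, 1)
    # Mean-pool only over real (non-padding) tokens.
    return (token_states * mask).sum(dim=1) / mask.sum(dim=1)

a, b = embed(["A man is playing a guitar.", "Someone plays an instrument."])
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```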

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf - arXiv (Cornell University) - 2019
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good perfor…
Cited by: 4,555
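
The distillation recipe described above supervises the small student model with the large teacher's output distribution as well as the usual training signal. A minimal sketch of such a loss, assuming PyTorch; the tensor names, temperature, and 50/50 weighting are illustrative rather than the paper's exact setup (DistilBERT additionally uses a cosine loss on hidden states, omitted here):

```python
# Minimal sketch of a distillation objective in the spirit of DistilBERT:
# KL divergence between temperature-softened teacher and student
# distributions, combined with the ordinary hard-label cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the true labels
    # (the masked tokens, in the masked-language-modelling setting).
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 30522)          # batch of 8, BERT-sized vocab
teacher_logits = torch.randn(8, 30522)
labels = torch.randint(0, 30522, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```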

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel - arXiv (Cornell University) - 2019
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empiri…
Cited by: 4,059
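
The two parameter-reduction techniques the abstract refers to are, in the paper, factorized embedding parameterization (a small embedding dimension projected up to the hidden size) and cross-layer parameter sharing (one Transformer block reused for every layer). A minimal sketch of both ideas, assuming PyTorch; the sizes below are illustrative, not ALBERT's published configuration:

```python
# Minimal sketch of ALBERT's two parameter-reduction ideas:
# (1) factorized embeddings: V*E + E*H parameters instead of V*H, and
# (2) cross-layer sharing: one encoder block applied L times.
import torch
import torch.nn as nn

V, E, H, L = 30000, 128, 768, 12   # vocab, embedding dim, hidden dim, layers

class TinyAlbertEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.word_emb = nn.Embedding(V, E)        # V*E parameters
        self.emb_proj = nn.Linear(E, H)           # E*H instead of V*H
        # A single block whose weights are reused for all L "layers".
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=H, nhead=12, dim_feedforward=4 * H, batch_first=True
        )

    def forward(self, token_ids):
        x = self.emb_proj(self.word_emb(token_ids))
        for _ in range(L):                         # same parameters each pass
            x = self.shared_block(x)
        return x

model = TinyAlbertEncoder()
out = model(torch.randint(0, V, (2, 16)))
print(out.shape)                                   # (2, 16, 768)
print(sum(p.numel() for p in model.parameters()), "total parameters")
print(V * E + E * H, "factorized embedding params vs", V * H, "unfactorized")
```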

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang - 2020
Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that…
Cited by: 1,534
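
TinyBERT's Transformer distillation goes beyond matching output distributions: the student is also trained to imitate the teacher's intermediate representations. A minimal sketch of one such layer-matching term, assuming PyTorch; the learned projection handles the width mismatch between student and teacher, and all shapes here are illustrative:

```python
# Minimal sketch of intermediate-layer distillation in the spirit of
# TinyBERT: the student's hidden states are pushed toward the teacher's
# with an MSE loss, through a learned projection because the student is
# narrower than the teacher.
import torch
import torch.nn as nn

B, T = 8, 32                       # batch size and sequence length
H_teacher, H_student = 768, 312

proj = nn.Linear(H_student, H_teacher)   # maps student space to teacher space

def hidden_state_loss(student_hidden, teacher_hidden):
    # student_hidden: (B, T, H_student), teacher_hidden: (B, T, H_teacher)
    return nn.functional.mse_loss(proj(student_hidden), teacher_hidden)

student_hidden = torch.randn(B, T, H_student)
teacher_hidden = torch.randn(B, T, H_teacher)
print(hidden_state_loss(student_hidden, teacher_hidden))
```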

BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger - arXiv (Cornell University) - 2019
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore corr…
Cited by: 2,010
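
The abstract describes the mechanism precisely enough to sketch: embed candidate and reference tokens with a contextual encoder, compute all pairwise cosine similarities, and greedily match each token to its best counterpart to obtain precision, recall, and F1. A minimal sketch of that matching step, assuming PyTorch; real BERTScore feeds in BERT embeddings and optionally adds idf weighting and baseline rescaling, which are omitted here:

```python
# Minimal sketch of the greedy-matching step behind BERTScore.
# `cand` and `ref` stand for contextual token embeddings of the candidate
# and reference sentences; random tensors are used as placeholders.
import torch

def bertscore_f1(cand, ref):
    # Cosine similarity between every candidate and reference token.
    cand = torch.nn.functional.normalize(cand, dim=-1)   # (Tc, H)
    ref = torch.nn.functional.normalize(ref, dim=-1)     # (Tr, H)
    sim = cand @ ref.T                                   # (Tc, Tr)
    precision = sim.max(dim=1).values.mean()   # best match per candidate token
    recall = sim.max(dim=0).values.mean()      # best match per reference token
    return 2 * precision * recall / (precision + recall)

cand = torch.randn(7, 768)    # 7 candidate tokens, BERT-sized vectors
ref = torch.randn(9, 768)     # 9 reference tokens
print(bertscore_f1(cand, ref))
```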

How Multilingual is Multilingual BERT?
Telmo Pires, Eva Schlinger, Dan Garrette - 2019
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiment…
Cited by: 1,090

How to Fine-Tune BERT for Text Classification?
Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang - Lecture Notes in Computer Science - 2019
No abstract is available for this record; see the source link for details.
Cited by: 1,310
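
The record above has no abstract, but the generic recipe the title refers to, adding a classification head on top of BERT and fine-tuning the whole model on labelled text, can be sketched with the Hugging Face transformers Trainer API. The checkpoint, toy dataset, and hyperparameters below are placeholders, not the settings studied in the paper:

```python
# Minimal sketch of fine-tuning BERT for text classification with the
# Hugging Face transformers Trainer API, on a two-example toy dataset.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts = ["great movie", "terrible plot"]
labels = [1, 0]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(),
)
trainer.train()   # fine-tunes the whole encoder plus the new classifier head
```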

BERT Rediscovers the Classical NLP Pipeline
Ian Tenney, Dipanjan Das, Ellie Pavlick - 2019
Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing…
Cited by: 1,233

What Does BERT Learn about the Structure of Language?
Ganesh Jawahar, Benoît Sagot, Djamé Seddah - 2019
BERT is a recent language representation model that has surprisingly performed well in diverse language understanding benchmarks. This result indicates the possibility that BERT networks capture structural information about language. In this work, we provide novel support for this claim by performing a series of experiments to unpack the elements of English language structure learned by BERT. Our findings are fourfol…
Cited by: 1,175

BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue - 2020
Adversarial attacks for discrete data (such as texts) have been proved significantly more challenging than continuous data (such as images) since it is difficult to generate adversarial samples with gradient-based methods. Current successful attack methods for texts usually adopt heuristic replacement strategies on the character or word level, which remains challenging to find the optimal solution in the massive spac…
Cited by: 505
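
The "using BERT" in the title refers to using BERT's masked language model as the perturbation generator: a vulnerable word in the input is targeted, the MLM proposes contextually plausible replacements, and the candidates are then scored against the victim classifier. A minimal sketch of the candidate-generation step only, assuming the Hugging Face transformers fill-mask pipeline with bert-base-uncased; the word-importance ranking and the victim-model attack loop are omitted:

```python
# Minimal sketch of the candidate-generation step used by MLM-based
# attacks in the spirit of BERT-ATTACK: mask one word and let BERT
# propose contextually plausible replacements. Scoring the candidates
# against a victim classifier is not shown.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "the movie was surprisingly good"
target_word = "good"
masked = sentence.replace(target_word, fill_mask.tokenizer.mask_token, 1)

# Top-k contextual substitutes for the masked position.
for candidate in fill_mask(masked, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```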