Search Results

About 194,157 results

  1. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    Nils Reimers, Iryna Gurevych - 2019
    Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
    Cited by: 9,693
  2. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

    Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf - arXiv (Cornell University) - 2019
    As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good perfor…
    Cited by: 4,555
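    The DistilBERT abstract above notes that the smaller pre-trained model can be fine-tuned in the same way as the original BERT. A minimal sketch of that idea, assuming the Hugging Face transformers library and the standard "distilbert-base-uncased" / "bert-base-uncased" checkpoints (none of which are named in the abstract), comparing parameter counts and encoding a sentence:

```python
# Minimal sketch: load DistilBERT and BERT-base via Hugging Face transformers
# and compare their sizes. Checkpoint names are assumptions, not from the abstract.
from transformers import AutoModel, AutoTokenizer

distil = AutoModel.from_pretrained("distilbert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

print(f"DistilBERT parameters: {distil.num_parameters():,}")
print(f"BERT-base parameters:  {bert.num_parameters():,}")

# Encoding text works the same way as with the full model, so downstream
# fine-tuning code typically transfers with only the checkpoint name changed.
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
inputs = tok("Distillation keeps most of BERT's accuracy.", return_tensors="pt")
hidden = distil(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
print(hidden.shape)
```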
  3. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

    Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel - arXiv (Cornell University) - 2019
    Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empiri…
    Cited by: 4,059
  4. TinyBERT: Distilling BERT for Natural Language Understanding

    Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang - 2020
    Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that…
    Cited by: 1,534
  5. BERTScore: Evaluating Text Generation with BERT

    Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger - arXiv (Cornell University) - 2019
    We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore corr…
    Cited by: 2,010
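    The BERTScore abstract above describes matching each candidate token to reference tokens by contextual-embedding similarity rather than by exact string match. A rough sketch of that greedy matching, assuming token embeddings have already been extracted from a BERT encoder; the published metric additionally applies IDF weighting and baseline rescaling, and is available as the bert_score package:

```python
import numpy as np

def greedy_bertscore(cand_emb: np.ndarray, ref_emb: np.ndarray):
    """Toy BERTScore-style precision/recall/F1 from contextual token embeddings.

    cand_emb: (n_cand, dim) embeddings of candidate-sentence tokens
    ref_emb:  (n_ref, dim) embeddings of reference-sentence tokens
    IDF weighting and baseline rescaling from the paper are omitted here.
    """
    # L2-normalise so the dot product is cosine similarity.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                    # (n_cand, n_ref) similarity matrix

    precision = sim.max(axis=1).mean()    # best reference match per candidate token
    recall = sim.max(axis=0).mean()       # best candidate match per reference token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Usage with random stand-in embeddings (in practice, take them from BERT).
rng = np.random.default_rng(0)
p, r, f = greedy_bertscore(rng.normal(size=(7, 768)), rng.normal(size=(9, 768)))
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```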
  6. How Multilingual is Multilingual BERT?

    Telmo Pires, Eva Schlinger, Dan Garrette - 2019
    In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiment…
    Cited by: 1,090
  7. How to Fine-Tune BERT for Text Classification?

    Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang - Lecture notes in computer science - 2019
    No abstract is available for this record; see the source link for details.
    Cited by: 1,310
  8. BERT Rediscovers the Classical NLP Pipeline

    Ian Tenney, Dipanjan Das, Ellie Pavlick - 2019
    Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing…
    Cited by: 1,233
  9. What Does BERT Learn about the Structure of Language?

    Ganesh Jawahar, Benoît Sagot, Djamé Seddah - 2019
    BERT is a recent language representation model that has surprisingly performed well in diverse language understanding benchmarks. This result indicates the possibility that BERT networks capture structural information about language. In this work, we provide novel support for this claim by performing a series of experiments to unpack the elements of English language structure learned by BERT. Our findings are fourfol…
    Cited by: 1,175
  10. BERT-ATTACK: Adversarial Attack Against BERT Using BERT

    Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue - 2020
    Adversarial attacks for discrete data (such as texts) have been proved significantly more challenging than continuous data (such as images) since it is difficult to generate adversarial samples with gradient-based methods. Current successful attack methods for texts usually adopt heuristic replacement strategies on the character or word level, which remains challenging to find the optimal solution in the massive spac…
    Cited by: 505