Standing on the Shoulders of Giants
Search results (about 194,157 in total)
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) - 2019
Cited by: 9,693
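
Sentence-BERT's core move is to turn BERT into a sentence encoder by pooling its token outputs inside a siamese network, so that semantic similarity becomes a cosine between two fixed-size vectors. A minimal sketch of that pooling-and-compare step, assuming PyTorch and the Hugging Face transformers package with the generic bert-base-uncased checkpoint (the released Sentence-BERT models are additionally fine-tuned with a siamese objective, which is not shown here):

```python
# Minimal sketch: mean-pooled BERT sentence embeddings + cosine similarity.
# Uses the generic bert-base-uncased checkpoint; actual Sentence-BERT models
# are additionally fine-tuned with a siamese/triplet objective.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentences):
    # Tokenize a batch of sentences with padding so they share one tensor.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_states = model(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()      # (B, T, 1)
    # Mean-pool only over real (non-padding) tokens.
    return (token_states * mask).sum(dim=1) / mask.sum(dim=1)

a, b = embed(["A man is playing a guitar.", "Someone plays an instrument."])
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```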

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf - arXiv (Cornell University) - 2019
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good perfor…
Cited by: 4,555
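
The distillation recipe described above supervises the small student model with the large teacher's output distribution as well as the usual training signal. A minimal sketch of such a loss, assuming PyTorch; the tensor names, temperature, and 50/50 weighting are illustrative rather than the paper's exact setup (DistilBERT additionally uses a cosine loss on hidden states, omitted here):

```python
# Minimal sketch of a distillation objective in the spirit of DistilBERT:
# KL divergence between temperature-softened teacher and student
# distributions, combined with the ordinary hard-label cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the true labels
    # (the masked tokens, in the masked-language-modelling setting).
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 30522)          # batch of 8, BERT-sized vocab
teacher_logits = torch.randn(8, 30522)
labels = torch.randint(0, 30522, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```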

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel - arXiv (Cornell University) - 2019
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empiri…
Cited by: 4,059
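
The two parameter-reduction techniques the abstract refers to are, in the paper, factorized embedding parameterization (a small embedding dimension projected up to the hidden size) and cross-layer parameter sharing (one Transformer block reused for every layer). A minimal sketch of both ideas, assuming PyTorch; the sizes below are illustrative, not ALBERT's published configuration:

```python
# Minimal sketch of ALBERT's two parameter-reduction ideas:
# (1) factorized embeddings: V*E + E*H parameters instead of V*H, and
# (2) cross-layer sharing: one encoder block applied L times.
import torch
import torch.nn as nn

V, E, H, L = 30000, 128, 768, 12   # vocab, embedding dim, hidden dim, layers

class TinyAlbertEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.word_emb = nn.Embedding(V, E)        # V*E parameters
        self.emb_proj = nn.Linear(E, H)           # E*H instead of V*H
        # A single block whose weights are reused for all L "layers".
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=H, nhead=12, dim_feedforward=4 * H, batch_first=True
        )

    def forward(self, token_ids):
        x = self.emb_proj(self.word_emb(token_ids))
        for _ in range(L):                         # same parameters each pass
            x = self.shared_block(x)
        return x

model = TinyAlbertEncoder()
out = model(torch.randint(0, V, (2, 16)))
print(out.shape)                                   # (2, 16, 768)
print(sum(p.numel() for p in model.parameters()), "total parameters")
print(V * E + E * H, "factorized embedding params vs", V * H, "unfactorized")
```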

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang - 2020
Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that…
Cited by: 1,534
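
TinyBERT's Transformer distillation goes beyond matching output distributions: the student is also trained to imitate the teacher's intermediate representations. A minimal sketch of one such layer-matching term, assuming PyTorch; the learned projection handles the width mismatch between student and teacher, and all shapes here are illustrative:

```python
# Minimal sketch of intermediate-layer distillation in the spirit of
# TinyBERT: the student's hidden states are pushed toward the teacher's
# with an MSE loss, through a learned projection because the student is
# narrower than the teacher.
import torch
import torch.nn as nn

B, T = 8, 32                       # batch size and sequence length
H_teacher, H_student = 768, 312

proj = nn.Linear(H_student, H_teacher)   # maps student space to teacher space

def hidden_state_loss(student_hidden, teacher_hidden):
    # student_hidden: (B, T, H_student), teacher_hidden: (B, T, H_teacher)
    return nn.functional.mse_loss(proj(student_hidden), teacher_hidden)

student_hidden = torch.randn(B, T, H_student)
teacher_hidden = torch.randn(B, T, H_teacher)
print(hidden_state_loss(student_hidden, teacher_hidden))
```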

BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger - arXiv (Cornell University) - 2019
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore corr…
Cited by: 2,010
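
The abstract describes the mechanism precisely enough to sketch: embed candidate and reference tokens with a contextual encoder, compute all pairwise cosine similarities, and greedily match each token to its best counterpart to obtain precision, recall, and F1. A minimal sketch of that matching step, assuming PyTorch; real BERTScore feeds in BERT embeddings and optionally adds idf weighting and baseline rescaling, which are omitted here:

```python
# Minimal sketch of the greedy-matching step behind BERTScore.
# `cand` and `ref` stand for contextual token embeddings of the candidate
# and reference sentences; random tensors are used as placeholders.
import torch

def bertscore_f1(cand, ref):
    # Cosine similarity between every candidate and reference token.
    cand = torch.nn.functional.normalize(cand, dim=-1)   # (Tc, H)
    ref = torch.nn.functional.normalize(ref, dim=-1)     # (Tr, H)
    sim = cand @ ref.T                                   # (Tc, Tr)
    precision = sim.max(dim=1).values.mean()   # best match per candidate token
    recall = sim.max(dim=0).values.mean()      # best match per reference token
    return 2 * precision * recall / (precision + recall)

cand = torch.randn(7, 768)    # 7 candidate tokens, BERT-sized vectors
ref = torch.randn(9, 768)     # 9 reference tokens
print(bertscore_f1(cand, ref))
```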

How Multilingual is Multilingual BERT?
Telmo Pires, Eva Schlinger, Dan Garrette - 2019
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiment…
Cited by: 1,090

How to Fine-Tune BERT for Text Classification?
Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang - Lecture Notes in Computer Science - 2019
No abstract is available for this record; see the source link for details.
Cited by: 1,310
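
The record above has no abstract, but the generic recipe the title refers to, adding a classification head on top of BERT and fine-tuning the whole model on labelled text, can be sketched with the Hugging Face transformers Trainer API. The checkpoint, toy dataset, and hyperparameters below are placeholders, not the settings studied in the paper:

```python
# Minimal sketch of fine-tuning BERT for text classification with the
# Hugging Face transformers Trainer API, on a two-example toy dataset.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts = ["great movie", "terrible plot"]
labels = [1, 0]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(),
)
trainer.train()   # fine-tunes the whole encoder plus the new classifier head
```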

BERT Rediscovers the Classical NLP Pipeline
Ian Tenney, Dipanjan Das, Ellie Pavlick - 2019
Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing…
Cited by: 1,233

What Does BERT Learn about the Structure of Language?
Ganesh Jawahar, Benoît Sagot, Djamé Seddah - 2019
BERT is a recent language representation model that has surprisingly performed well in diverse language understanding benchmarks. This result indicates the possibility that BERT networks capture structural information about language. In this work, we provide novel support for this claim by performing a series of experiments to unpack the elements of English language structure learned by BERT. Our findings are fourfol…
Cited by: 1,175

BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue - 2020
Adversarial attacks for discrete data (such as texts) have been proved significantly more challenging than continuous data (such as images) since it is difficult to generate adversarial samples with gradient-based methods. Current successful attack methods for texts usually adopt heuristic replacement strategies on the character or word level, which remains challenging to find the optimal solution in the massive spac…
Cited by: 505
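
The "using BERT" in the title refers to using BERT's masked language model as the perturbation generator: a vulnerable word in the input is targeted, the MLM proposes contextually plausible replacements, and the candidates are then scored against the victim classifier. A minimal sketch of the candidate-generation step only, assuming the Hugging Face transformers fill-mask pipeline with bert-base-uncased; the word-importance ranking and the victim-model attack loop are omitted:

```python
# Minimal sketch of the candidate-generation step used by MLM-based
# attacks in the spirit of BERT-ATTACK: mask one word and let BERT
# propose contextually plausible replacements. Scoring the candidates
# against a victim classifier is not shown.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "the movie was surprisingly good"
target_word = "good"
masked = sentence.replace(target_word, fill_mask.tokenizer.mask_token, 1)

# Top-k contextual substitutes for the masked position.
for candidate in fill_mask(masked, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```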