About the pretraining datasets #1

tszslovewanpu · 2023-09-21T00:33:34Z

Hello, thanks for your work. You mentioned that the pretraining corpus contains 2 million papers collected through the text-mining efforts of the CEDER group. Does this mean you obtained those 2 million papers directly, or did you apply the text-mining pipeline yourselves to get them (collecting the papers and then parsing them)? I hope to get your guidance.
