Experiments
The experiments are conducted on four datasets: three correspond to downstream generation tasks and one to a downstream classification task. Generation tasks are generally harder than classification tasks. This is because a generation task is evaluated by next-token prediction accuracy, so the synthetic data must carry fine-grained textual information from the private data. In contrast, classification tasks only require preserving the co-occurrence patterns between labels and words in the private data.
Three generation tasks are chosen to cover a diverse set of practical scenarios: PubMed (medical paper abstracts), Chatbot Arena (human-machine interactions), and Multi-Session Chat (daily human-human conversations). To assess the quality of the generated synthetic data, we train a small downstream language model on the synthetic data following the AUG-PE setup and compute its next-token prediction accuracy on the real test data.
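For concreteness, the sketch below illustrates one way this downstream evaluation could be implemented: fine-tune a small causal language model on the synthetic corpus, then measure next-token prediction accuracy on the real test set. The model choice (distilgpt2), file names, and hyperparameters are illustrative placeholders, not the exact AUG-PE configuration.

```python
# Sketch: fine-tune a small LM on synthetic data, then evaluate
# next-token prediction accuracy on the real test data.
# Model, file names, and hyperparameters are assumptions for illustration.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("distilgpt2").to(device)

def load_texts(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def encode(texts):
    return tokenizer(texts, truncation=True, max_length=512,
                     padding="max_length", return_tensors="pt")

# Fine-tune on the synthetic corpus (one epoch for illustration).
train_enc = encode(load_texts("synthetic_train.txt"))
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for ids, mask in DataLoader(list(zip(train_enc["input_ids"],
                                     train_enc["attention_mask"])), batch_size=8):
    ids, mask = ids.to(device), mask.to(device)
    labels = ids.clone()
    labels[mask == 0] = -100                     # ignore padding in the loss
    loss = model(input_ids=ids, attention_mask=mask, labels=labels).loss
    loss.backward(); optimizer.step(); optimizer.zero_grad()

# Next-token prediction accuracy on the real test data.
test_enc = encode(load_texts("real_test.txt"))
model.eval(); correct = total = 0
with torch.no_grad():
    for ids, mask in DataLoader(list(zip(test_enc["input_ids"],
                                         test_enc["attention_mask"])), batch_size=8):
        ids, mask = ids.to(device), mask.to(device)
        logits = model(input_ids=ids, attention_mask=mask).logits
        preds = logits[:, :-1].argmax(dim=-1)    # position t predicts token t+1
        labels, valid = ids[:, 1:], mask[:, 1:].bool()
        correct += (preds[valid] == labels[valid]).sum().item()
        total += valid.sum().item()
print(f"next-token accuracy: {correct / total:.4f}")
```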
The classification task is conducted on the OpenReview dataset. To assess the quality of the generated synthetic data, we train a downstream classifier on the synthetic data and compute its classification accuracy on the real test data.
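As a minimal sketch of the classification-side evaluation, the classifier is fit only on (synthetic text, label) pairs and scored on the real test split. The CSV file names and the TF-IDF plus logistic-regression pipeline below are illustrative stand-ins, not the exact downstream classifier used in the experiments.

```python
# Sketch: train a classifier on synthetic (text, label) pairs and report
# accuracy on the real test data. File names and model choice are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

synthetic = pd.read_csv("synthetic_openreview.csv")   # columns: text, label
real_test = pd.read_csv("real_openreview_test.csv")   # columns: text, label

# The downstream classifier only sees synthetic data at training time.
clf = make_pipeline(TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(synthetic["text"], synthetic["label"])

# Accuracy on the real test data reflects how well the synthetic data
# preserved label-word co-occurrence patterns from the private data.
preds = clf.predict(real_test["text"])
print(f"classification accuracy: {accuracy_score(real_test['label'], preds):.4f}")
```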
The chosen datasets were carefully analyzed to alleviate concerns about data contamination. Our analysis showed no overlap between the pre-training data and the downstream datasets.