Experiment
We wanted to understand which models and tasks would benefit most from the curation process. As a baseline for the experiments, we fine-tuned two LLMs of different sizes (Gemini Nano-1 with 1.8B parameters and Nano-2 with 3.25B parameters) on two tasks of different complexity (lower and higher, based on expert alignment) using crowdsourced labels. Each crowdsourced dataset has ~100K annotations and a strong class imbalance, with around 95% benign labels on average.
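For context, a class-imbalance figure like this comes from a simple label-distribution check. The sketch below is illustrative only: the datasets themselves are not public, and `labels` is a hypothetical stand-in for one crowdsourced annotation set.

```python
# Illustrative label-distribution check behind the class-imbalance figures.
# `labels` is a hypothetical list of crowdsourced annotations; the actual
# datasets are not public.
from collections import Counter

def label_balance(labels):
    """Return the fraction of each label in a list of annotations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Example with the reported ~95% benign skew:
labels = ["benign"] * 95 + ["positive"] * 5
print(label_balance(labels))  # {'benign': 0.95, 'positive': 0.05}
```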
Each of these four baseline conditions was compared with the corresponding curated condition, in which each model (Nano-1 and Nano-2) is fine-tuned over multiple rounds using the curation process described above. At each iteration, we selected a set of curated examples and used them for model evaluation and fine-tuning, as described above. All models plateaued before reaching parity with the experts' internal alignment, so we stopped at six iterations (~400 fine-tuning and ~250 evaluation samples) for the lower complexity task and five iterations (~250 fine-tuning and ~150 evaluation samples) for the higher complexity task. (Note that the lower complexity task had a greater variety of examples, which may explain the longer time needed to converge.) Both datasets ended with a final class balance of roughly 40% positive examples.
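To make the stopping logic concrete, here is a minimal sketch of one plausible shape for this loop. The curation, fine-tuning, and evaluation steps are passed in as callables; `select_curated_batch`, `fine_tune`, and `evaluate_alignment` are placeholder names, not the actual pipeline's API, and the plateau test is an assumed heuristic.

```python
from typing import Callable, List

def run_curation(
    model,
    select_curated_batch: Callable,  # returns (fine_tune_batch, eval_batch)
    fine_tune: Callable,             # fine-tunes on the accumulated set
    evaluate_alignment: Callable,    # model-vs-expert Cohen's kappa
    expert_ceiling: float,           # experts' internal pairwise kappa
    min_gain: float = 0.01,          # kappa gain below which we call it a plateau
    max_iters: int = 10,
):
    fine_tune_set: List = []
    eval_set: List = []
    prev_kappa = float("-inf")
    for _ in range(max_iters):
        # Pick the next batch of expert-curated examples.
        ft_batch, ev_batch = select_curated_batch(model)
        fine_tune_set += ft_batch
        eval_set += ev_batch
        # Fine-tune on everything curated so far, then measure alignment.
        model = fine_tune(model, fine_tune_set)
        kappa = evaluate_alignment(model, eval_set)
        # Stop at the expert ceiling or once gains plateau (as in the
        # six- and five-iteration runs above).
        if kappa >= expert_ceiling or kappa - prev_kappa < min_gain:
            break
        prev_kappa = kappa
    return model, fine_tune_set, eval_set
```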
The table below provides an overview of the scale and quality of the data used in each condition. Experts reached an average pairwise Cohen's kappa of .81 (lower complexity task) and .78 (higher complexity task) through the curation process. We consider these values the ceiling for model performance. To assess the quality of the crowdsourced data, we calculated the kappa alignment between the crowdsourced annotations and the experts on the full curated set: .59 (lower complexity) and .41 (higher complexity).
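These agreement numbers can be reproduced with standard tooling. Below is a minimal sketch using scikit-learn's `cohen_kappa_score`; the post does not specify which implementation was actually used.

```python
# Agreement metrics as reported above: average pairwise Cohen's kappa
# among experts, and crowd-vs-expert kappa on the full curated set.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def average_pairwise_kappa(annotations_by_expert):
    """Mean Cohen's kappa over all pairs of experts.

    `annotations_by_expert`: one equal-length label list per expert,
    aligned on the same examples.
    """
    pairs = list(combinations(annotations_by_expert, 2))
    return sum(cohen_kappa_score(a, b) for a, b in pairs) / len(pairs)

# Crowd-vs-expert alignment is a single kappa on the curated set:
# cohen_kappa_score(crowd_labels, expert_labels)
```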