Achieving 10,000x training data reduction with high-fidelity labels

Published: August 10, 2025

Experiment

We wanted to understand which models and tasks would benefit most from the curation process. As a baseline for the experiment, we used crowdsourced labels to fine-tune two LLMs of different sizes (Gemini Nano-1 with 1.8B parameters and Nano-2 with 3.25B parameters) on two tasks of different complexity (characterized by the degree of expert alignment, lower or higher). Each crowdsourced dataset has ~100k annotations and a strong class imbalance, with around 95% benign labels on average.
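As a quick illustration of the imbalance figure above, here is a minimal sketch that tallies the share of each class in a list of annotations; the labels are made up and are not the actual dataset.

```python
from collections import Counter

def class_balance(labels):
    """Return the fraction of each label in an annotation set."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical crowdsourced annotations, skewed toward the benign class
# in the same way as the ~95% benign share reported above.
crowd_labels = ["benign"] * 95 + ["positive"] * 5
print(class_balance(crowd_labels))  # {'benign': 0.95, 'positive': 0.05}
```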

Each of these four baseline conditions was compared with a corresponding curated condition in which each model (Nano-1 and Nano-2) was fine-tuned over multiple rounds using the curation process described above. In each iteration we selected a set of curated examples and used them for model evaluation and fine-tuning as described. All models stopped before reaching parity with the experts' internal alignment, halting at six iterations (~400 fine-tuning and ~250 evaluation samples) for the lower-complexity task and five iterations (~250 fine-tuning and ~150 evaluation samples) for the higher-complexity task. (Note that the lower-complexity task contained more diverse examples, which may explain the longer time required to converge.) Both final datasets had a class balance of roughly 40% positive examples.
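The round-by-round process in this paragraph can be summarized as a loop: select a batch of curated examples, split it into fine-tuning and evaluation sets, fine-tune, measure model-expert agreement, and stop once that agreement approaches the experts' own pairwise agreement (or an iteration cap is hit). Below is a minimal sketch of that loop; every helper in it is a hypothetical stub, not the pipeline used in the original post.

```python
import random

# Illustrative stubs standing in for the real curation pipeline described
# above; none of these helpers are the actual implementation.

def select_curated_examples(iteration, n_fine_tune=60, n_eval=40):
    """Pretend to return expert-curated (example, label) pairs for this round."""
    def make(n):
        return [(f"example_{iteration}_{i}", random.choice(["benign", "positive"]))
                for i in range(n)]
    return make(n_fine_tune), make(n_eval)

def fine_tune(model, examples):
    """Stand-in for fine-tuning the LLM on the curated examples."""
    return model

def evaluate_kappa(model, examples):
    """Stand-in for measuring model-vs-expert Cohen's kappa on the eval set."""
    return random.uniform(0.3, 0.9)

def run_curation_loop(model, expert_kappa_ceiling, max_iterations=10):
    """Repeat curation rounds until model-expert agreement approaches the
    experts' own pairwise agreement (the performance ceiling) or a cap is hit."""
    for iteration in range(max_iterations):
        fine_tune_set, eval_set = select_curated_examples(iteration)
        model = fine_tune(model, fine_tune_set)
        if evaluate_kappa(model, eval_set) >= expert_kappa_ceiling:
            break
    return model

run_curation_loop(model="nano-1", expert_kappa_ceiling=0.78)
```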

The following table provides an overview of the scale and quality of the data used in each condition. Through the curation process, experts reached an average pairwise Cohen's kappa of .78 on the higher-complexity task, and even higher agreement on the lower-complexity task. We consider these values to be the ceiling of the models' performance. To assess the quality of the crowdsourced data, we also computed the kappa alignment between the crowdsourced annotations and the experts over the complete curated set: .59 for the lower-complexity task and .41 for the higher-complexity task.
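Cohen's kappa figures like those quoted above measure inter-annotator agreement corrected for chance. The sketch below shows one common way to compute an average pairwise kappa with scikit-learn; the expert names and labels are made up for illustration and are not the data behind the table.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def average_pairwise_kappa(annotations):
    """Average Cohen's kappa over all pairs of annotators.

    `annotations` maps annotator name -> list of labels over the same items.
    """
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohen_kappa_score(a, b) for a, b in pairs) / len(pairs)

# Made-up labels from three hypothetical experts over the same six items.
experts = {
    "expert_a": ["benign", "benign", "positive", "benign", "positive", "benign"],
    "expert_b": ["benign", "benign", "positive", "positive", "positive", "benign"],
    "expert_c": ["benign", "positive", "positive", "benign", "positive", "benign"],
}
print(round(average_pairwise_kappa(experts), 2))
```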
