Using Lift to Turn Research PDFs into Structured JSON with Controlled, Schema-Guided Field-Level Evaluation

import os  # DOCS, ground_truth and SHOW_FIRST_PAGE are defined in earlier steps


def render_pdf(d, path):
    """Draws a realistic 3-page report. A page break is forced so that the
    headline metrics (abstract) on page 1 are physically separated from the
    results table on page 3."""
    from reportlab.lib.pagesizes import LETTER
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
    from reportlab.lib.units import inch
    from reportlab.lib import colors
    from reportlab.platypus import (SimpleDocTemplate, Paragraph, Spacer,
                                    Table, TableStyle, PageBreak)

    # Paragraph styles derived from ReportLab's sample stylesheet.
    ss = getSampleStyleSheet()
    H1 = ParagraphStyle("H1", parent=ss["Title"], fontSize=16, leading=20, spaceAfter=6)
    AUTH = ParagraphStyle("AUTH", parent=ss["Normal"], fontSize=9.5,
                          textColor=colors.grey, spaceAfter=10)
    H2 = ParagraphStyle("H2", parent=ss["Heading2"], fontSize=12, spaceBefore=8, spaceAfter=4)
    BODY = ParagraphStyle("BODY", parent=ss["Normal"], fontSize=10, leading=14, spaceAfter=6)

    sota_phrase = (f"surpassing the previous best of {d['prior_best']}"
                   if d["beats_sota"] else
                   f"close to but not surpassing the previous best of {d['prior_best']}")
    authors_line = ", ".join(f"{n} ({a})" for (n, a) in d["authors"])

    # Page 1: title, authors, abstract and keywords.
    story = []
    story += [Paragraph(d["title"], H1), Paragraph(authors_line, AUTH),
              Paragraph("Abstract", H2)]
    story += [Paragraph(
        f"We introduce {d['method']}, a model for {d['task']}. On the {d['primary_benchmark']} "
        f"benchmark, {d['method']} achieves {d['test_acc']} {d['metric_name']} on the held-out "
        f"test set, {sota_phrase}. Our {d['params_m']}M-parameter model is evaluated on "
        f"{len(d['datasets'])} datasets ({', '.join(d['datasets'])}). "
        f"Extensive ablations confirm the contribution of each component.", BODY)]
    story += [Paragraph("Keywords", H2),
              Paragraph(f"{d['task']}; representation learning; {d['primary_benchmark']}", BODY),
              PageBreak()]

    # Page 2: method, training configuration (Table 1) and datasets.
    story += [Paragraph("1 Method and Training Details", H2)]
    story += [Paragraph(
        f"{d['method']} is trained end-to-end using the {d['optimizer']} optimizer. "
        f"We tune on the validation split and report final numbers on the test split. "
        f"The full training configuration is summarized in Table 1.", BODY)]
    hp = [["Hyperparameter", "Value"],
          ["Optimizer", d["optimizer"]],
          ["Learning rate", str(d["lr"])],
          ["Batch size", str(d["batch"])],
          ["Epochs", str(d["epochs"])],
          ["Parameters", f"{d['params_m']}M"]]
    t1 = Table(hp, colWidths=[2.4 * inch, 2.0 * inch])
    t1.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.HexColor("#2b3a67")),
        ("TEXTCOLOR", (0, 0), (-1, 0), colors.white),
        ("FONTSIZE", (0, 0), (-1, -1), 9.5),
        ("GRID", (0, 0), (-1, -1), 0.4, colors.grey),
        ("ROWBACKGROUNDS", (0, 1), (-1, -1), [colors.white, colors.HexColor("#eef1f8")]),
        ("LEFTPADDING", (0, 0), (-1, -1), 8),
        ("TOPPADDING", (0, 0), (-1, -1), 4),
        ("BOTTOMPADDING", (0, 0), (-1, -1), 4)]))
    story += [Spacer(1, 4), t1, Spacer(1, 6),
              Paragraph("Table 1. Training configuration.", BODY),
              Paragraph("2 Datasets", H2),
              Paragraph(
                  f"We evaluate on {', '.join(d['datasets'])}. {d['primary_benchmark']} is the "
                  f"primary benchmark; the remaining datasets are used for generalization "
                  f"studies.", BODY),
              PageBreak()]

    # Page 3: results (Table 2), limitations and funding note.
    story += [Paragraph("3 Results", H2)]
    res = [["Method", f"Val. {d['metric_name']}", f"Test {d['metric_name']}"],
           [f"{d['baseline_name']} (baseline)", str(d["baseline_val"]), str(d["baseline_test"])],
           [f"{d['method']} (ours)", str(d["val_acc"]), str(d["test_acc"])]]
    t2 = Table(res, colWidths=[2.6 * inch, 1.7 * inch, 1.7 * inch])
    t2.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.HexColor("#7a2e2e")),
        ("TEXTCOLOR", (0, 0), (-1, 0), colors.white),
        ("FONTSIZE", (0, 0), (-1, -1), 9.5),
        ("GRID", (0, 0), (-1, -1), 0.4, colors.grey),
        ("FONTNAME", (0, 2), (-1, 2), "Helvetica-Bold"),  # bold the "(ours)" row
        ("ROWBACKGROUNDS", (0, 1), (-1, -1), [colors.white, colors.HexColor("#f7eeee")]),
        ("LEFTPADDING", (0, 0), (-1, -1), 8),
        ("TOPPADDING", (0, 0), (-1, -1), 4),
        ("BOTTOMPADDING", (0, 0), (-1, -1), 4)]))
    story += [Spacer(1, 4), t2, Spacer(1, 6),
              Paragraph(f"Table 2. Results on {d['primary_benchmark']}. "
                        f"The row in bold is our method.", BODY),
              Paragraph("4 Limitations", H2)]
    for lim in d["limitations"]:
        story += [Paragraph("• " + lim, BODY)]
    story += [Paragraph("5 Funding and Code Availability", H2),
              Paragraph(d["funding_note"], BODY)]

    SimpleDocTemplate(path, pagesize=LETTER,
                      topMargin=0.8 * inch, bottomMargin=0.8 * inch,
                      leftMargin=0.9 * inch, rightMargin=0.9 * inch).build(story)


print("Step 3/7 · Generating the synthetic report PDFs…")
CORPUS = []
for i, d in enumerate(DOCS):
    # Write to /content when it exists (e.g. on Colab), else the working directory.
    path = f"/content/report_{i}.pdf" if os.path.isdir("/content") else f"report_{i}.pdf"
    render_pdf(d, path)
    CORPUS.append((d, ground_truth(d), path))
    print(f"  ✓ {os.path.basename(path)} - {d['method']}")
print()

if SHOW_FIRST_PAGE:
    try:
        import pypdfium2 as pdfium, matplotlib.pyplot as plt
        pg = pdfium.PdfDocument(CORPUS[0][2])[0]
        img = pg.render(scale=2.0).to_pil()
        plt.figure(figsize=(6.4, 8.3)); plt.imshow(img); plt.axis("off")
        plt.title("Lift input - page 1 of report_0.pdf", fontsize=10); plt.show()
    except Exception as e:
        print("  (page preview skipped:", e, ")\n")
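
Because render_pdf reads many keys from each record, a missing key only surfaces as a KeyError partway through building the PDF. The sketch below is a hypothetical helper, not part of the original tutorial: the names REQUIRED_KEYS and missing_keys are assumptions introduced here, and the key list is simply the set of keys that render_pdf looks up on d.

```python
# Hypothetical pre-flight check (an assumption, not from the original tutorial):
# these are the keys that render_pdf reads from each record.
REQUIRED_KEYS = frozenset({
    "title", "authors", "method", "task", "primary_benchmark", "metric_name",
    "test_acc", "val_acc", "prior_best", "beats_sota", "params_m", "datasets",
    "optimizer", "lr", "batch", "epochs", "baseline_name", "baseline_val",
    "baseline_test", "limitations", "funding_note",
})


def missing_keys(record):
    """Return the sorted list of required keys that are absent from `record`."""
    # Iterating a dict yields its keys, so difference() compares key sets.
    return sorted(REQUIRED_KEYS.difference(record))
```

One way to use it would be to call missing_keys(d) at the top of the generation loop and raise a ValueError naming the absent keys before render_pdf(d, path) runs.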
