Pipeline · four stages
From noisy references to a written related-work section.
Each stage of the pipeline has its own failure modes. Pick a stage to see the raw inputs, the transforms applied, and the resulting record, all shown on real sample records from the dataset.
Phase 01 · Live
Cleaning
Normalize · Validate · Drop
Raw scraped records carry unicode noise, table-of-contents text scraped in place of an abstract, missing text, and related-work sections that contain only citations. This stage flags and removes the affected records.
Phase 02 · Coming next
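A minimal sketch of what normalize, validate, and drop could look like. The field names (`abstract`, `related_work`), the regex heuristics, and the thresholds are assumptions for illustration; the actual schema and rules used by the pipeline are not shown here.

```python
import re
import unicodedata

# Hypothetical heuristics; the real pipeline's rules may differ.
NUMBERED_HEADING = re.compile(r"\b\d+(?:\.\d+)*\.?\s+[A-Z][a-z]+")  # e.g. "2 Related"
CITATION = re.compile(r"\[\d+(?:\s*[,\-–]\s*\d+)*\]")               # e.g. "[2, 3]"
WORD = re.compile(r"[A-Za-z]{2,}")


def normalize(text):
    """Fold compatibility characters, drop control/format characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch) not in ("Cc", "Cf") or ch in "\n\t"
    )
    return re.sub(r"\s+", " ", text).strip()


def flag(record):
    """Return the list of reasons a record should be dropped (empty if it is clean)."""
    abstract = normalize(record.get("abstract") or "")
    related = normalize(record.get("related_work") or "")
    flags = []
    if not abstract or not related:
        flags.append("missing_text")
    # Three or more numbered headings suggests a scraped table of contents.
    if len(NUMBERED_HEADING.findall(abstract)) >= 3:
        flags.append("toc_pseudo_abstract")
    # Almost nothing left after removing citation markers: citation-only text.
    if related and len(WORD.findall(CITATION.sub(" ", related))) < 5:
        flags.append("citation_only_related_work")
    return flags


def clean(records):
    """Keep only unflagged records, with their text fields normalized."""
    kept = []
    for record in records:
        if flag(record):
            continue
        kept.append({
            **record,
            "abstract": normalize(record["abstract"]),
            "related_work": normalize(record["related_work"]),
        })
    return kept
```

Keeping `flag` separate from `clean` means the reasons for each drop can be inspected before any record is removed.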
Extractive
Salient sentence selection
Pick the sentences from each cited abstract that carry the reasoning most relevant to the query paper.
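One simple way to approximate salient sentence selection is to rank each sentence of a cited abstract by word overlap with the query abstract. The sketch below uses Jaccard similarity as a stand-in relevance score; the function names, the stopword list, and the choice of Jaccard are assumptions, not the method this stage will actually use.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to",
             "we", "this", "is", "are", "with", "that"}


def tokens(text):
    """Lowercased content words as a set."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_sentences(text):
    """Naive sentence splitter on ., !, ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_salient(query_abstract, cited_abstract, k=2):
    """Return the k sentences of cited_abstract most similar to the query, in original order."""
    query = tokens(query_abstract)
    scored = []
    for index, sentence in enumerate(split_sentences(cited_abstract)):
        toks = tokens(sentence)
        union = toks | query
        score = len(toks & query) / len(union) if union else 0.0
        scored.append((score, index, sentence))
    # Highest score first; ties broken by earlier position.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [sentence for _, _, sentence in sorted(top, key=lambda t: t[1])]
```

Returning the selected sentences in their original order keeps the extracted evidence readable when it is passed on to a later stage.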
Phase 03 · Coming next
Abstractive
Draft the paragraph
Generate a fluent related-work paragraph from the extracted evidence and the query abstract.
Phase 04 · Coming next
Compare
Model vs. ground truth
Set the generated paragraph side by side with the paper's actual related-work section.
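A plain-text sketch of the side-by-side view, assuming both texts are available as strings. The function name `side_by_side` and the column width are hypothetical; the page itself may render the comparison quite differently.

```python
import textwrap
from itertools import zip_longest


def side_by_side(generated, reference, width=40):
    """Render two texts as wrapped columns: generated on the left, ground truth on the right."""
    left = textwrap.wrap(generated, width)
    right = textwrap.wrap(reference, width)
    rows = [
        f"{'Generated':<{width}} | Ground truth",
        "-" * width + "-+-" + "-" * width,
    ]
    # Pad the shorter column with empty lines so the rows stay aligned.
    for l, r in zip_longest(left, right, fillvalue=""):
        rows.append(f"{l:<{width}} | {r}")
    return "\n".join(rows)
```

Calling `print(side_by_side(model_output, paper_section))` with two strings prints the columns to a terminal; `model_output` and `paper_section` are placeholder names.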