POST /results/thematic-blocks
Returns the results section of a scientific paper, structured into distinct "thematic blocks". Each block corresponds to a thematically distinct portion of the paper's method.
Input: a DOI or a PDF of a scientific paper.
Request
Content-Type: multipart/form-data
Accepted keys: doi and file
JSON Body (if using DOI):
{
"doi": "10.1038/s41592-020-01005-z"
}
Form data (if uploading a PDF):
file: "path_to_file.pdf"
Note: If both a doi and a file key are present in the request, the API uses the file key by default.
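The request rules above can be sketched in Python as follows. This is a minimal sketch, not an official client: the base URL `https://api.example.com` is a placeholder (this page does not state the host), the helper names `choose_input` and `post_thematic_blocks` are hypothetical, and any authentication the API may require is not covered. It assumes the third-party `requests` package is installed.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Tuple

# Placeholder host: an assumption, since the documentation does not give one.
BASE_URL = "https://api.example.com"
ENDPOINT = "/results/thematic-blocks"


def choose_input(doi: Optional[str] = None,
                 pdf_path: Optional[str] = None) -> Tuple[str, str]:
    """Return the (key, value) pair for the single form field to send.

    Mirrors the documented precedence: if both are given, `file` wins.
    """
    if pdf_path:
        return ("file", pdf_path)
    if doi:
        return ("doi", doi)
    raise ValueError("Provide a DOI or a path to a PDF")


def post_thematic_blocks(doi: Optional[str] = None,
                         pdf_path: Optional[str] = None,
                         timeout: float = 120.0) -> Dict[str, Any]:
    """POST a DOI or a PDF as multipart/form-data and return the parsed JSON."""
    # Imported here so choose_input stays usable without the dependency.
    import requests

    key, value = choose_input(doi, pdf_path)
    url = BASE_URL + ENDPOINT
    if key == "file":
        with open(value, "rb") as fh:
            files = {"file": (Path(value).name, fh, "application/pdf")}
            response = requests.post(url, files=files, timeout=timeout)
    else:
        # A (None, value) tuple makes requests emit a plain multipart field
        # without a filename, so the body is still multipart/form-data.
        response = requests.post(url, files={"doi": (None, value)},
                                 timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Calling `post_thematic_blocks(doi="10.1038/s41592-020-01005-z")` would send only the doi field. Passing both arguments sends only the file, which matches the server-side default described in the note.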
Response
{
"doi": "10.3390/ijms22031484",
"paper_input_type": "doi",
"pdf_url": "https://europepmc.org/articles/pmc7867340?pdf=render",
"pmcid": "PMC7867340",
"results_thematic_blocks": [
{
"block_id": "block_5b84e8fcbdebc354daa50a68d",
"objective_summary": "Describes the bioinformatic pipeline used to create a more complete annotation of lncRNAs in VSMCs from various datasets, including validation steps and initial assessment of newly assembled lncRNAs in plaque tissue.",
"findings_main_text": "2. Results\n\n2.1. A Bioinformatic Approach to Provide a More Complete Annotation of lncRNAs Expressed in VSMCs in Basal and Pathological Conditions\n\nSeveral LncRNAs are known to be involved in VSMC phenotypic transitions occurring in vessel wall remodelling [10–12]. However, a full accounting of lncRNAs expressed in these transitions does not yet exist. Accordingly, to gain in-depth representation of the lncRNAs expressed in pathological VSMCs, we applied a transcript discovery pipeline to published VSMC RNAseq datasets selected based on specific criteria (Figure 1a). We focused on high depth, paired-end total RNA sequencing datasets to identify all lncRNAs and their gene structures (including non-polyA tailed and/or lowly expressed lncRNAs) and we selected two in vitro datasets fulfilling these criteria. The first dataset describes primary human saphenous vein SMCs (svSMCs) either quiesced in 0.2% FBS or treated with interleukin-1a (Il-1a) and/or platelet-derived growth factor-BB (PDGF-BB) [10]. The second dataset describes primary aortic (aoSMCs) or coronary artery (caSMCs) VSMCs, plated in 5% FBS media onto soft or stiff culture matrices [21]. These conditions model a convergence of pro-inflammatory and pro-mitogenic signals, or mechanical stretch in the vessel wall, both of which promote proliferation and disruption of contractility. We also selected an in vivo dataset describing VSMCs isolated and sequenced directly from enzymatically digested carotid plaques derived from symptomatic or asymptomatic patients, defined as such based on lumen size and occurrence of cardiovascular events prior to surgery [22]. Together, these three datasets document a broad span of VSMC types and phenotypes contributing to vessel wall remodelling.\n\nThe transcriptome analysis used GENCODE annotation as a reference and included a transcript assembly step to further identify transcripts not previously described in GENCODE (newly assembled transcripts). 
This approach allows the identification of novel isoforms for GENCODE genes, but also allows the identification of novel genes (newly assembled genes). This analysis was carried out independently for the three datasets. These expanded transcriptomes consisted of ~80,000–90,000 transcripts in total with 0.6–1.5% transcribed from newly assembled genes (Table S1). To identify high confidence lncRNAs from these complete transcriptomes, we used “Pipeline for Annotation of LncRNAs” (PLAR) [23], which filters lowly-expressed or artefactual transcripts and assesses coding potential based on three distinct tools. As expected, the bulk of expressed transcripts were annotated as protein coding (Figure S1A). However, a high confidence set of ~2500–3000 lncRNA annotations were predicted within each transcriptome, with 6–7% deriving from newly assembled gene loci (Table S1). Our analysis of robustly expressed genes showed that newly assembled lncRNA genes produced transcripts with comparable lengths to transcripts from GENCODE protein-coding genes (PCGs) or GENCODE lncRNA genes (Figure S1B–E). Of these genes, newly assembled and GENCODE lncRNAs have a lower expression compared to PCGs, as expected. Further, the GENCODE lncRNA genes were also more abundant than the newly assembled lncRNA genes but only by a median difference of ~1 FPKM (Figure S1F–I).\n\nTo show the validity of the transcript discovery pipeline for lncRNA identification, we assessed if the newly assembled lncRNA transcripts identified in the three datasets were observed in other reference databases. Using GFFcompare [24], we cross-referenced expressed GENCODE and newly assembled lncRNA transcript structures to transcripts annotated in FANTOM CAT [15], a particularly extensive reference annotation. 
We observed 72% of GENCODE and 40% of newly assembled lncRNAs matched to a FANTOM transcript containing the exact same chain of introns whilst another 25% of GENCODE and 40% of newly assembled lncRNAs contained at least 1 matching splice junction site (Figure 1b). The validation of the complete or partial gene structures for a large proportion of the newly assembled lncRNAs in other annotation sets (derived from other contexts) provide confidence in the identified transcripts and evidence of the lncRNA expression in different datasets. To find further corroborating evidence of transcription for our lncRNAs, we used FANTOM CAT CAGEseq data which accurately defines transcription start sites (TSSs) in ~1800 distinct human samples through sequencing the site of RNA 5′ capping [15]. We identified 74.3% of newly assembled and 86.9% of GENCODE lncRNA genes across all datasets matched to experimentally validated TSSs in FANTOM CAGEseq data (hereafter referred as CAGE-matched lncRNAs) (Figure 1c). The position of these CAGEseq matches indicates the first exons of newly assembled lncRNAs from our analysed datasets were largely complete at their 5′ ends (incomplete by median of 8% of their initial size) (Figure S2).\n\nTo gain perspective on the in vivo relevance of newly assembled lncRNAs, we assayed their expression in an RNAseq dataset of carotid plaque tissue. VSMCs are major components of atherosclerosis plaque, with many demonstrating phenotypic modulation [25]. To assess all newly assembled lncRNAs simultaneously, we merged the three expanded annotations into a non-redundant transcriptome containing 255 newly assembled lncRNA transcripts from 207 lncRNA genes (Figure 1d). Analysis of the plaque-derived RNAseq with this non-redundant merged transcriptome demonstrated that 50 (24%) of the newly assembled lncRNA genes were detectable in whole plaque tissue. 
This is a substantial detection rate, considering that plaques are heterogenous and contain non-VSMC cells contributing to the RNAseq. In addition, the newly assembled lncRNAs were identified in VSMCs from different vessel types grown in distinct conditions and so might not be expressed in plaques. Interestingly, these 50 lncRNAs come from all four independent transcriptomes (Figure 1e), showing each new annotation provides in vivo relevant transcripts. Notably, 24 of these lncRNAs were identified exclusively using the svSMC dataset showing this annotation particularly improved coverage of plaque-expressed lncRNAs.\n\nTogether, these analyses expand the representation of lncRNAs expressed in basal and pathological VSMCs in vitro and in vivo and provide confidence in the newly assembled gene structures.",
"explicit_references": [
"10–12",
"Figure 1a",
"10",
"21",
"22",
"Table S1",
"23",
"Figure S1A",
"Table S1",
"Figure S1B–E",
"Figure S1F–I",
"24",
"15",
"Figure 1b",
"15",
"Figure 1c",
"Figure S2",
"25",
"Figure 1d",
"Figure 1e"
],
"hypotheses": [
"Accordingly, to gain in-depth representation of the lncRNAs expressed in pathological VSMCs, we applied a transcript discovery pipeline to published VSMC RNAseq datasets selected based on specific criteria (Figure 1a).",
"To identify high confidence lncRNAs from these complete transcriptomes, we used “Pipeline for Annotation of LncRNAs” (PLAR) [23], which filters lowly-expressed or artefactual transcripts and assesses coding potential based on three distinct tools.",
"To show the validity of the transcript discovery pipeline for lncRNA identification, we assessed if the newly assembled lncRNA transcripts identified in the three datasets were observed in other reference databases.",
"To find further corroborating evidence of transcription for our lncRNAs, we used FANTOM CAT CAGEseq data which accurately defines transcription start sites (TSSs) in ~1800 distinct human samples through sequencing the site of RNA 5′ capping [15].",
"To gain perspective on the in vivo relevance of newly assembled lncRNAs, we assayed their expression in an RNAseq dataset of carotid plaque tissue."
],
"doi": "10.3390/ijms22031484",
"block_order_in_paper": 1,
"source_section": "results",
"word_count": 919
}
]
}
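Once the response is parsed, the blocks can be walked in paper order. The sketch below assumes only the field names visible in the example response. The helper `summarize_blocks` is hypothetical, and `sample` is a heavily truncated stand-in for the example above, not real API output. Because `explicit_references` can repeat entries (for instance "Table S1" and "15" each appear twice above), the helper also removes duplicates while keeping first-seen order.

```python
from typing import Any, Dict, List


def summarize_blocks(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one compact summary per thematic block, sorted by paper order."""
    blocks = response.get("results_thematic_blocks", [])
    ordered = sorted(blocks, key=lambda b: b.get("block_order_in_paper", 0))
    summaries = []
    for block in ordered:
        # dict.fromkeys drops duplicates while preserving first-seen order.
        refs = list(dict.fromkeys(block.get("explicit_references", [])))
        summaries.append({
            "block_id": block.get("block_id"),
            "order": block.get("block_order_in_paper"),
            "word_count": block.get("word_count"),
            "unique_references": refs,
            "n_hypotheses": len(block.get("hypotheses", [])),
        })
    return summaries


# Truncated stand-in for the example response (most fields omitted).
sample = {
    "doi": "10.3390/ijms22031484",
    "results_thematic_blocks": [
        {
            "block_id": "block_5b84e8fcbdebc354daa50a68d",
            "explicit_references": ["10–12", "Figure 1a", "Table S1", "Table S1"],
            "hypotheses": [
                "To gain perspective on the in vivo relevance of newly "
                "assembled lncRNAs, we assayed their expression in an "
                "RNAseq dataset of carotid plaque tissue."
            ],
            "block_order_in_paper": 1,
            "word_count": 919,
        }
    ],
}

for summary in summarize_blocks(sample):
    print(summary["order"], summary["block_id"], summary["unique_references"])
```

Reading fields with `.get` keeps the sketch tolerant of blocks that omit a field. Whether the API ever omits fields is not stated here, so treat that as a defensive choice and not as documented behavior.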