POST /results/thematic-modules

Returns the Results section of a scientific paper organized into distinct "thematic modules". Each module links a method to the findings it produced, broken down into method-finding units. It also records the thematic block it was derived from in original_block_id (see POST /results/thematic-blocks).

Input: DOI or a PDF of a scientific paper.
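This page does not say whether the endpoint accepts URL-style DOIs such as https://doi.org/10.3390/ijms22031484, so a client may want to reduce input to a bare DOI before sending it. The following is a minimal client-side sketch, not part of the API. The helper name normalize_doi, the prefix list, and the simplified pattern ("10.", a 4 to 9 digit registrant code, a slash, a non-empty suffix) are assumptions.

```python
import re

# Assumed URL/scheme prefixes a client might strip; not documented by the API.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Simplified DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Return a bare DOI (e.g. '10.3390/ijms22031484') or raise ValueError."""
    doi = raw.strip()
    for prefix in _DOI_PREFIXES:
        # Case-insensitive prefix check, so "DOI:" and "doi:" both match.
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi
```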

Request

Content-Type: multipart/form-data

Accepted keys: doi (a DOI string) or file (a PDF upload). Provide at least one.

Body (if using a DOI), sent as the form field doi:

{
  "doi": "10.1038/s41592-020-01005-z"
}

Body (if uploading a PDF), sent as the form field file, which carries the PDF's contents rather than its path:

file: "path_to_file.pdf"

Note: If both a doi and a file key are present in the request, the API uses the file key by default.
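Because the body is multipart/form-data, the PDF travels as file content in a field named file, and a DOI travels as a plain text field named doi. The sketch below uses only the Python standard library and builds such a request without sending it. BASE_URL and the function name build_thematic_modules_request are hypothetical placeholders, since this page does not give the API host.

```python
import urllib.request
import uuid
from typing import Optional

# Hypothetical host: this page does not state the API's base URL.
BASE_URL = "https://api.example.com"


def build_thematic_modules_request(
    doi: Optional[str] = None,
    pdf_bytes: Optional[bytes] = None,
    filename: str = "paper.pdf",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /results/thematic-modules."""
    if doi is None and pdf_bytes is None:
        raise ValueError("provide a doi or PDF bytes")
    boundary = uuid.uuid4().hex
    parts = []
    if doi is not None:
        # Plain text form field named "doi".
        header = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="doi"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + doi.encode("utf-8") + b"\r\n")
    if pdf_bytes is not None:
        # File field named "file" carrying the raw PDF bytes.
        # Per this page, if both fields are present the API uses "file".
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + pdf_bytes + b"\r\n")
    # The closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url=f"{BASE_URL}/results/thematic-modules",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and decode the response with json.load. Building the body by hand keeps the sketch dependency-free; the requests library produces the same encoding through its files= and data= arguments.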

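Response (example output for the paper with DOI 10.3390/ijms22031484, which differs from the DOI in the request example)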

{
  "doi": "10.3390/ijms22031484",
  "paper_input_type": "doi",
  "pdf_url": "https://europepmc.org/articles/pmc7867340?pdf=render",
  "pmcid": "PMC7867340",
  "results_thematic_modules": [
       {
            "original_block_id": "block_5b84e8fcbdebc354daa50a68d",
            "processed_data": {
                "method_finding_units": [
                    {
                        "unit_id": "mf_unit_1",
                        "method_description": "A transcript discovery pipeline was applied to published VSMC RNAseq datasets selected based on specific criteria (high depth, paired-end total RNA sequencing) to identify lncRNAs and their gene structures. Two in vitro datasets (primary human svSMCs treated with Il-1a/PDGF-BB or quiesced, and primary aoSMCs/caSMCs on soft/stiff matrices) and one in vivo dataset (VSMCs from carotid plaques of symptomatic/asymptomatic patients) were selected.",
                        "method_text_verbatim": "Accordingly, to gain in-depth representation of the lncRNAs expressed in pathological VSMCs, we applied a transcript discovery pipeline to published VSMC RNAseq datasets selected based on specific criteria (Figure 1a). We focused on high depth, paired-end total RNA sequencing datasets to identify all lncRNAs and their gene structures (including non-polyA tailed and/or lowly expressed lncRNAs) and we selected two in vitro datasets fulfilling these criteria. The first dataset describes primary human saphenous vein SMCs (svSMCs) either quiesced in 0.2% FBS or treated with interleukin-1a (Il-1a) and/or platelet-derived growth factor-BB (PDGF-BB) [10]. The second dataset describes primary aortic (aoSMCs) or coronary artery (caSMCs) VSMCs, plated in 5% FBS media onto soft or stiff culture matrices [21]. These conditions model a convergence of pro-inflammatory and pro-mitogenic signals, or mechanical stretch in the vessel wall, both of which promote proliferation and disruption of contractility. We also selected an in vivo dataset describing VSMCs isolated and sequenced directly from enzymatically digested carotid plaques derived from symptomatic or asymptomatic patients, defined as such based on lumen size and occurrence of cardiovascular events prior to surgery [22].",
                        "finding_description": "Existing knowledge indicates lncRNAs are involved in VSMC transitions, but a full accounting is lacking. The selected datasets document a broad span of VSMC types and phenotypes relevant to vessel wall remodelling.",
                        "finding_text_verbatim": "Several LncRNAs are known to be involved in VSMC phenotypic transitions occurring in vessel wall remodelling [10–12]. However, a full accounting of lncRNAs expressed in these transitions does not yet exist. Together, these three datasets document a broad span of VSMC types and phenotypes contributing to vessel wall remodelling.",
                        "linking_narrative": "To address the lack of a full accounting of lncRNAs in VSMC transitions, a transcript discovery pipeline was applied to carefully selected datasets.",
                        "explicit_references": [
                            "Figure 1a"
                        ],
                        "full_text_verbatim": "2. Results\n\n2.1. A Bioinformatic Approach to Provide a More Complete Annotation of lncRNAs Expressed in VSMCs in Basal and Pathological Conditions\n\nSeveral LncRNAs are known to be involved in VSMC phenotypic transitions occurring in vessel wall remodelling [10–12]. However, a full accounting of lncRNAs expressed in these transitions does not yet exist. Accordingly, to gain in-depth representation of the lncRNAs expressed in pathological VSMCs, we applied a transcript discovery pipeline to published VSMC RNAseq datasets selected based on specific criteria (Figure 1a). We focused on high depth, paired-end total RNA sequencing datasets to identify all lncRNAs and their gene structures (including non-polyA tailed and/or lowly expressed lncRNAs) and we selected two in vitro datasets fulfilling these criteria. The first dataset describes primary human saphenous vein SMCs (svSMCs) either quiesced in 0.2% FBS or treated with interleukin-1a (Il-1a) and/or platelet-derived growth factor-BB (PDGF-BB) [10]. The second dataset describes primary aortic (aoSMCs) or coronary artery (caSMCs) VSMCs, plated in 5% FBS media onto soft or stiff culture matrices [21]. These conditions model a convergence of pro-inflammatory and pro-mitogenic signals, or mechanical stretch in the vessel wall, both of which promote proliferation and disruption of contractility. We also selected an in vivo dataset describing VSMCs isolated and sequenced directly from enzymatically digested carotid plaques derived from symptomatic or asymptomatic patients, defined as such based on lumen size and occurrence of cardiovascular events prior to surgery [22]. Together, these three datasets document a broad span of VSMC types and phenotypes contributing to vessel wall remodelling."
                    },
                    {
                        "unit_id": "mf_unit_2",
                        "method_description": "Transcriptome analysis was performed using GENCODE annotation as a reference, including a transcript assembly step to identify novel transcripts (isoforms and genes). This was done independently for the three datasets. High confidence lncRNAs were identified using “Pipeline for Annotation of LncRNAs” (PLAR) [23], which filters transcripts and assesses coding potential. Analysis of robustly expressed genes compared lengths and expression levels of newly assembled lncRNAs, GENCODE protein-coding genes (PCGs), and GENCODE lncRNA genes.",
                        "method_text_verbatim": "The transcriptome analysis used GENCODE annotation as a reference and included a transcript assembly step to further identify transcripts not previously described in GENCODE (newly assembled transcripts). This approach allows the identification of novel isoforms for GENCODE genes, but also allows the identification of novel genes (newly assembled genes). This analysis was carried out independently for the three datasets. To identify high confidence lncRNAs from these complete transcriptomes, we used “Pipeline for Annotation of LncRNAs” (PLAR) [23], which filters lowly-expressed or artefactual transcripts and assesses coding potential based on three distinct tools. Our analysis of robustly expressed genes showed that newly assembled lncRNA transcripts produced transcripts with comparable lengths to transcripts from GENCODE protein-coding genes (PCGs) or GENCODE lncRNA genes. Of these genes, newly assembled and GENCODE lncRNAs have a lower expression compared to PCGs, as expected. Further, the GENCODE lncRNA genes were also more abundant than the newly assembled lncRNA genes but only by a median difference of ~1 FPKM.",
                        "finding_description": "The expanded transcriptomes contained ~80,000–90,000 transcripts, with 0.6–1.5% from newly assembled genes. The bulk of expressed transcripts were protein coding. A high confidence set of ~2500–3000 lncRNA annotations were predicted per transcriptome, with 6–7% from newly assembled loci. Newly assembled lncRNA transcripts had comparable lengths to GENCODE PCGs and lncRNAs. Newly assembled and GENCODE lncRNAs had lower expression than PCGs, and GENCODE lncRNA genes were slightly more abundant than newly assembled lncRNA genes (~1 FPKM median difference).",
                        "finding_text_verbatim": "These expanded transcriptomes consisted of ~80,000–90,000 transcripts in total with 0.6–1.5% transcribed from newly assembled genes (Table S1). As expected, the bulk of expressed transcripts were annotated as protein coding (Figure S1A). However, a high confidence set of ~2500–3000 lncRNA annotations were predicted within each transcriptome, with 6–7% deriving from newly assembled gene loci (Table S1). Our analysis of robustly expressed genes showed that newly assembled lncRNA transcripts produced transcripts with comparable lengths to transcripts from GENCODE protein-coding genes (PCGs) or GENCODE lncRNA genes (Figure S1B–E). Of these genes, newly assembled and GENCODE lncRNAs have a lower expression compared to PCGs, as expected. Further, the GENCODE lncRNA genes were also more abundant than the newly assembled lncRNA genes but only by a median difference of ~1 FPKM (Figure S1F–I).",
                        "linking_narrative": "Following the dataset selection, the core bioinformatic pipeline was applied.",
                        "explicit_references": [
                            "Table S1",
                            "Figure S1A",
                            "Figure S1B–E",
                            "Figure S1F–I"
                        ],
                        "full_text_verbatim": "The transcriptome analysis used GENCODE annotation as a reference and included a transcript assembly step to further identify transcripts not previously described in GENCODE (newly assembled transcripts). This approach allows the identification of novel isoforms for GENCODE genes, but also allows the identification of novel genes (newly assembled genes). This analysis was carried out independently for the three datasets. These expanded transcriptomes consisted of ~80,000–90,000 transcripts in total with 0.6–1.5% transcribed from newly assembled genes (Table S1). To identify high confidence lncRNAs from these complete transcriptomes, we used “Pipeline for Annotation of LncRNAs” (PLAR) [23], which filters lowly-expressed or artefactual transcripts and assesses coding potential based on three distinct tools. As expected, the bulk of expressed transcripts were annotated as protein coding (Figure S1A). However, a high confidence set of ~2500–3000 lncRNA annotations were predicted within each transcriptome, with 6–7% deriving from newly assembled gene loci (Table S1). Our analysis of robustly expressed genes showed that newly assembled lncRNA transcripts produced transcripts with comparable lengths to transcripts from GENCODE protein-coding genes (PCGs) or GENCODE lncRNA genes (Figure S1B–E). Of these genes, newly assembled and GENCODE lncRNAs have a lower expression compared to PCGs, as expected. Further, the GENCODE lncRNA genes were also more abundant than the newly assembled lncRNA genes but only by a median difference of ~1 FPKM (Figure S1F–I)."
                    },
                    {
                        "unit_id": "mf_unit_3",
                        "method_description": "To validate the identified lncRNAs, their transcript structures were cross-referenced against transcripts annotated in the FANTOM CAT database [15] using GFFcompare [24].",
                        "method_text_verbatim": "To show the validity of the transcript discovery pipeline for lncRNA identification, we assessed if the newly assembled lncRNA transcripts identified in the three datasets were observed in other reference databases. Using GFFcompare [24], we cross-referenced expressed GENCODE and newly assembled lncRNA transcript structures to transcripts annotated in FANTOM CAT [15], a particularly extensive reference annotation.",
                        "finding_description": "72% of GENCODE and 40% of newly assembled lncRNAs matched FANTOM transcripts with exact introns, while another 25% of GENCODE and 40% of newly assembled lncRNAs matched with at least one splice junction. This structural validation in other datasets provides confidence in the identified transcripts.",
                        "finding_text_verbatim": "We observed 72% of GENCODE and 40% of newly assembled lncRNAs matched to a FANTOM transcript containing the exact same chain of introns whilst another 25% of GENCODE and 40% of newly assembled lncRNAs contained at least 1 matching splice junction site (Figure 1b). The validation of the complete or partial gene structures for a large proportion of the newly assembled lncRNAs in other annotation sets (derived from other contexts) provide confidence in the identified transcripts and evidence of the lncRNA expression in different datasets.",
                        "linking_narrative": "To show the validity of the pipeline, the identified lncRNAs were compared to other databases.",
                        "explicit_references": [
                            "Figure 1b"
                        ],
                        "full_text_verbatim": "To show the validity of the transcript discovery pipeline for lncRNA identification, we assessed if the newly assembled lncRNA transcripts identified in the three datasets were observed in other reference databases. Using GFFcompare [24], we cross-referenced expressed GENCODE and newly assembled lncRNA transcript structures to transcripts annotated in FANTOM CAT [15], a particularly extensive reference annotation. We observed 72% of GENCODE and 40% of newly assembled lncRNAs matched to a FANTOM transcript containing the exact same chain of introns whilst another 25% of GENCODE and 40% of newly assembled lncRNAs contained at least 1 matching splice junction site (Figure 1b). The validation of the complete or partial gene structures for a large proportion of the newly assembled lncRNAs in other annotation sets (derived from other contexts) provide confidence in the identified transcripts and evidence of the lncRNA expression in different datasets."
                    }
                ]
            }
        }
  ]
}
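Each entry in results_thematic_modules holds its units under processed_data.method_finding_units. The sketch below groups units by original_block_id and lists the figures and tables each unit cites in explicit_references. The helper names units_by_block and reference_index are hypothetical; the sketch assumes the response has already been parsed into a Python dict (for example with json.loads) and follows the field names in the example response.

```python
from collections import defaultdict
from typing import Any, Dict, List


def units_by_block(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Group method-finding units by the thematic block each module came from."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for module in response.get("results_thematic_modules", []):
        block_id = module["original_block_id"]
        units = module.get("processed_data", {}).get("method_finding_units", [])
        grouped[block_id].extend(units)
    return dict(grouped)


def reference_index(response: Dict[str, Any]) -> List[str]:
    """One line per unit: block id, unit id and the figures/tables it cites."""
    lines = []
    for block_id, units in units_by_block(response).items():
        for unit in units:
            refs = ", ".join(unit.get("explicit_references", [])) or "none"
            lines.append(f"{block_id} {unit['unit_id']}: {refs}")
    return lines
```

Applied to the example response, units_by_block returns a single key, block_5b84e8fcbdebc354daa50a68d, mapped to the three units mf_unit_1 to mf_unit_3.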

