High-throughput isoform-wide miRNome sequence reconstruction in the TCGA-LUAD cohort using FAS2rDNA

DOI: https://dx.doi.org/10.17504/protocols.io.rm7vzenqxvx1/v1

Abstract: Large-scale miRNome studies frequently rely on coordinate-based annotations or raw sequencing datasets that are computationally expensive to reprocess and difficult to integrate into sequence-centric analytical workflows. This protocol presents an isoform-wide reconstruction of miRNA sequences from the TCGA-LUAD cohort using FAS2rDNA, enabling direct derivation of strand-aware nucleotide sequences without reanalyzing bulk sequencing data. By reconstructing sequences directly from genomic coordinates, the workflow provides a faster, more scalable, and more reproducible alternative to read-level reprocessing for generating miRNA isoform-resolved FASTA datasets. The reconstructed miRNome sequences are directly applicable to machine-learning-based modeling, isoform-level molecular discovery, and integrative miRNA landscape analysis. Applied to the TCGA-LUAD cohort, the workflow facilitates high-resolution exploration of miRNA isoform diversity, with the broader objective of improving molecular understanding of lung adenocarcinoma and supporting data-driven strategies aimed at reducing cancer-related mortality.
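
The idea of deriving a strand-aware sequence from genomic coordinates can be illustrated with a minimal Python sketch. It is not the FAS2rDNA implementation: the function names, the toy reference sequence, the chromosome label "chrT", and the 1-based inclusive coordinate convention are all assumptions made for illustration.

```python
# Illustrative sketch only; not the FAS2rDNA implementation.
# Assumption: coordinates are 1-based and inclusive on the forward strand.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(reference: dict, chrom: str, start: int, end: int, strand: str) -> str:
    """Slice a region from the reference and orient it to the given strand."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    chrom_seq = reference[chrom]
    if not (1 <= start <= end <= len(chrom_seq)):
        raise ValueError("coordinates fall outside the reference sequence")
    forward = chrom_seq[start - 1:end]  # convert 1-based inclusive to a Python slice
    return forward if strand == "+" else reverse_complement(forward)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA text, wrapping the sequence at `width` columns."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy reference (hypothetical; a real run would load a genome assembly).
toy_reference = {"chrT": "ACGTTGCAAGGCTTAACCGG"}

plus_seq = extract_region(toy_reference, "chrT", 3, 8, "+")
minus_seq = extract_region(toy_reference, "chrT", 3, 8, "-")
print(to_fasta("toy_isoform|chrT:3-8:+", plus_seq), end="")
print(to_fasta("toy_isoform|chrT:3-8:-", minus_seq), end="")
```

The same interval yields different records on the two strands, which is why the strand field has to travel with the coordinates. Coordinate conventions (0-based or 1-based, inclusive or exclusive end) differ between annotation sources and should be checked before applying a sketch like this to real data.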

FAS2rDNA-Colab: A cloud-based workflow for pan-cancer, isoform-wide miRNome reconstitution across TCGA cohorts

DOI: https://dx.doi.org/10.17504/protocols.io.14egn1xr6v5d/v1

Abstract: MicroRNA (miRNA) sequence composition and isoform diversity play important roles in post-transcriptional regulation and contribute to biological variability across cancer types. Large-scale resources such as The Cancer Genome Atlas (TCGA) provide a standardized foundation for exploratory miRNome research; however, TCGA miRNA datasets are typically distributed as expression matrices without direct access to reconstructed, isoform-resolved sequence outputs. This limits the application of sequence-based analyses, including pan-cancer comparisons and machine learning workflows that require explicit nucleotide representations. FAS2rDNA-Colab is a cloud-based workflow that reconstructs FASTA-formatted DNA/cDNA sequences from genomic coordinates and annotations. This protocol extends FAS2rDNA-Colab to the reconstitution of isoform-wide miRNome sequences from TCGA-derived miRNA expression data. By reconstituting FASTA-formatted miRNA sequences across multiple cancer cohorts, the protocol enables pan-cancer and isoform-level comparisons without reliance on predefined probe sets or raw sequencing reprocessing. The resulting reconstructed miRNomes can be used for sequence validation, exploratory comparative analyses, and downstream computational modeling.
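
One simple form of the pan-cancer comparison mentioned above is to ask which reconstructed sequences occur in every cohort. The Python sketch below assumes each cohort's miRNome is available as FASTA text. The cohort labels, record names, and sequences are invented toy values, and the helper functions are hypothetical, not part of FAS2rDNA-Colab.

```python
# Illustrative sketch only; not part of FAS2rDNA-Colab.
# Assumption: each cohort's reconstructed miRNome is available as FASTA text.


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {record_name: uppercase_sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()  # join wrapped sequence lines
    return records


def shared_sequences(cohorts: dict) -> set:
    """Return the sequences present in every cohort, ignoring record names."""
    seq_sets = [set(records.values()) for records in cohorts.values()]
    return set.intersection(*seq_sets) if seq_sets else set()


# Toy inputs (hypothetical cohort labels and sequences).
cohort_a_fasta = ">iso1\nACGTACGT\n>iso2\nTTGGCCAA\n"
cohort_b_fasta = ">isoX\nTTGGCCAA\n>isoY\nGGGGAAAA\n"

cohorts = {
    "COHORT_A": parse_fasta(cohort_a_fasta),
    "COHORT_B": parse_fasta(cohort_b_fasta),
}
shared = shared_sequences(cohorts)
print(sorted(shared))
```

Comparing on sequence content rather than record name is a deliberate choice in this sketch: it finds identical isoform sequences even when cohorts label them differently.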