Post-processing data
To simplify downstream analysis of sequencing data, import_seq can create links to FASTQ files for each sequencing run. Finding all FASTQ files for a specific run is then trivial and doesn’t require any information beyond the run reference. Run import_seq with:
import_seq --bulk 'LABXDB_EXP' \
--path_seq_raw '/data/seq/raw' \
--path_seq_run '/data/seq/by_run' \
--make_import
Output will display the imported runs:
Summary
resa-2h-1 HWI-ST1144:496:H8F1PADXX GATCAG AGR000001 /data/seq/raw/H8F1PADXX
resa-6h-1 HWI-ST1144:496:H8F1PADXX ATGTCA AGR000002 /data/seq/raw/H8F1PADXX
resa-6h-2a HWI-ST1144:496:H8F1PADXX CCGTCC AGR000003 /data/seq/raw/H8F1PADXX
resa-6h-2b HWI-D00306:231:H916YADXX CCGTCC AGR000004 /data/seq/raw/H8F1PADXX
The following links are created:
by_run/
├── AGR000001
│ └── resa-2h-1_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-2h-1_R1.fastq.zst
├── AGR000002
│ └── resa-6h-1_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-6h-1_R1.fastq.zst
├── AGR000003
│ └── resa-6h-2a_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-6h-2a_R1.fastq.zst
└── AGR000004
  └── resa-6h-2b_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-6h-2b_R1.fastq.zst
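With this layout, a run reference alone is enough to locate the reads for that run. The sketch below is one possible way to do the lookup. It is not part of import_seq: the function name list_run_fastq is hypothetical, and it assumes the directory given to --path_seq_run contains one sub-directory per run reference, as in the tree above. It also relies on the -maxdepth option, which GNU and BSD find support but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper, not provided by import_seq.
# Usage: list_run_fastq PATH_SEQ_RUN RUN_REF
# Prints every FASTQ file linked under PATH_SEQ_RUN/RUN_REF, one per line.
list_run_fastq() {
    # -L makes find follow the symbolic links, so "-type f" tests the
    # link targets (the raw FASTQ files) instead of the links themselves.
    # A dangling link (target removed from the raw directory) is skipped.
    find -L "$1/$2" -maxdepth 1 -type f -name '*.fastq*' | sort
}
```

With the paths used in the command above, calling list_run_fastq /data/seq/by_run AGR000001 would be expected to print the path of the link to resa-2h-1_R1.fastq.zst. Matching on '*.fastq*' rather than a fixed suffix keeps the helper independent of the compression format.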