Post-processing data

To simplify downstream analysis of sequencing data, import_seq can create links to the FASTQ files of each sequencing run. Finding all FASTQ files for a specific run is then trivial and requires no information beyond the run reference. Run import_seq with:

import_seq --bulk 'LABXDB_EXP' \
           --path_seq_raw '/data/seq/raw' \
           --path_seq_run '/data/seq/by_run' \
           --make_import

The output lists the imported runs:

Summary
resa-2h-1                     HWI-ST1144:496:H8F1PADXX      GATCAG                        AGR000001      /data/seq/raw/H8F1PADXX
resa-6h-1                     HWI-ST1144:496:H8F1PADXX      ATGTCA                        AGR000002      /data/seq/raw/H8F1PADXX
resa-6h-2a                    HWI-ST1144:496:H8F1PADXX      CCGTCC                        AGR000003      /data/seq/raw/H8F1PADXX
resa-6h-2b                    HWI-D00306:231:H916YADXX      CCGTCC                        AGR000004      /data/seq/raw/H8F1PADXX

The following links are created, one directory per run reference:

by_run/
├── AGR000001
│   └── resa-2h-1_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-2h-1_R1.fastq.zst
├── AGR000002
│   └── resa-6h-1_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-6h-1_R1.fastq.zst
├── AGR000003
│   └── resa-6h-2a_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-6h-2a_R1.fastq.zst
└── AGR000004
    └── resa-6h-2b_R1.fastq.zst -> ../../raw/H8F1PADXX/resa-6h-2b_R1.fastq.zst
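With this layout, collecting the FASTQ files of a run only needs the run reference. The sketch below is one possible way to do it in Python; it is not part of import_seq. The helper name fastq_for_run is hypothetical, and it assumes that file names contain a ".fastq" extension, as in the listing above.

```python
from pathlib import Path


def fastq_for_run(path_seq_run, run_ref):
    """Return the FASTQ files linked under <path_seq_run>/<run_ref>, sorted by name.

    Hypothetical helper: assumes one directory per run reference, as in the
    by_run/ tree shown above.
    """
    run_dir = Path(path_seq_run) / run_ref
    if not run_dir.is_dir():
        raise FileNotFoundError(f"No directory for run {run_ref!r} in {path_seq_run}")
    # Keep entries with a ".fastq" extension, compressed or not
    # (e.g. .fastq, .fastq.zst, .fastq.gz); other files are ignored.
    return sorted(p for p in run_dir.iterdir() if ".fastq" in p.suffixes)
```

Given the links above, fastq_for_run('/data/seq/by_run', 'AGR000001') would return the path of resa-2h-1_R1.fastq.zst inside by_run/AGR000001. Calling resolve() on that path follows the link back to the file under /data/seq/raw.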