Dear MiXCR Developers/Community,
I am writing to seek your expert advice on configuring MiXCR to analyze bulk BCR/TCR sequencing data generated with a custom protocol. Our goal is to accurately assemble full-length VDJ sequences using UMI-based error correction.
1. Experimental Background:
Our library preparation protocol was developed in-house. In principle it is similar to the 10x Genomics Single Cell V(D)J assay (UMI tagging followed by cDNA fragmentation), but it is performed in bulk, without cell barcodes. Sequencing is paired-end 150 bp (PE150).
2. UMI Structure & Challenge:
The UMI is located at the very beginning of Read 1. It has a defined structure with constant regions, not just random bases. The pattern is:
NNNNNACANNNNNACANNNNNTTTCTTATATGGG
(Where N represents random molecular barcode bases).
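In total the structure spans 34 nt: three 5-nt random blocks separated by ACA spacers, followed by the fixed 13-nt sequence TTTCTTATATGGG. Only 15 positions are random, so the structure does not fit the purely random N{26} or N{34} UMI patterns used in many public presets.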
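To make the Read 1 layout concrete, here is a minimal Python sketch that extracts the 15 random bases. It assumes exact matches to the fixed ACA spacers and the TTTCTTATATGGG tail, with no tolerance for sequencing errors in those positions. The names READ1_UMI_PATTERN and extract_umi are hypothetical and are not part of MiXCR.

```python
import re

# Hypothetical parser for the Read 1 structure described above:
# three 5-nt random blocks separated by fixed "ACA" spacers,
# followed by the fixed 13-nt tail TTTCTTATATGGG, anchored at the read start.
# Assumption: the fixed parts must match exactly (no mismatch tolerance).
READ1_UMI_PATTERN = re.compile(
    r"^([ACGTN]{5})ACA([ACGTN]{5})ACA([ACGTN]{5})TTTCTTATATGGG"
)


def extract_umi(read1_seq):
    """Return (umi, offset) or None if the fixed parts do not match exactly.

    umi    -- the 15 random bases concatenated into one string
    offset -- index of the first base after the 34-nt UMI structure
    """
    m = READ1_UMI_PATTERN.match(read1_seq)
    if m is None:
        return None
    umi = "".join(m.groups())
    return umi, m.end()


# Example read: UMI blocks AAAAA / CCCCC / GGGGG, then downstream sequence.
read = "AAAAA" + "ACA" + "CCCCC" + "ACA" + "GGGGG" + "TTTCTTATATGGG" + "GACTACGTTAGC"
print(extract_umi(read))  # ('AAAAACCCCCGGGGG', 34)
```

The returned offset (34) marks the first base after the UMI structure. Real data would likely need mismatch-tolerant matching of the fixed parts, which this sketch deliberately omits.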
3. Core Question:
We wish to leverage these UMIs to group reads and construct high-fidelity consensus sequences for full-length VDJ recovery. Could you please advise us on how best to set up MiXCR for this task?
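To clarify what we mean by grouping reads and building consensus sequences, here is a toy Python sketch that groups reads by UMI and takes a per-position majority vote. It rests on strong assumptions: reads within a group are equal-length and already aligned, and groups with fewer than two reads are discarded. It is only an illustration of the idea, not how MiXCR implements consensus assembly. The names group_by_umi and majority_consensus are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_umi(records):
    """Group (umi, sequence) pairs into a dict: umi -> list of sequences."""
    groups = defaultdict(list)
    for umi, seq in records:
        groups[umi].append(seq)
    return dict(groups)


def majority_consensus(seqs, min_reads=2):
    """Per-position majority vote over equal-length, pre-aligned sequences.

    Returns None for groups with fewer than min_reads reads, since a single
    read cannot be error-corrected. Ties go to the base seen first in the
    column (Counter.most_common ordering), which is an arbitrary choice.
    """
    if len(seqs) < min_reads:
        return None
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("toy consensus assumes equal-length, pre-aligned reads")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


# Example: UMI_A has one read with an error at position 5; UMI_B is a singleton.
records = [
    ("UMI_A", "ACGTAC"),
    ("UMI_A", "ACGTTC"),
    ("UMI_A", "ACGTAC"),
    ("UMI_B", "TTGCAA"),
]
groups = group_by_umi(records)
print({umi: majority_consensus(seqs) for umi, seqs in groups.items()})
# {'UMI_A': 'ACGTAC', 'UMI_B': None}
```

A real pipeline would also need to handle UMI sequencing errors, variable-length fragments from cDNA fragmentation, and alignment before voting, which is why we are asking how MiXCR's own consensus assembly should be configured.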
