ProleTRact



This repository contains ProleTRact, a tandem repeat visualization tool and the companion to TandemTwister. The tool processes Variant Call Format (VCF) files generated by TandemTwister and visualizes tandem repeats in an interactive format. Users can explore motifs, compare alleles to the reference sequence, and inspect the structure of tandem repeats to help interpret genomic variation.

Why ProleTRact?

Tandem repeats (TRs) are complex: alleles can differ by motif composition, length, and interrupted blocks. ProleTRact visualizes TR regions with color-coded motifs, highlights interruptions, and provides navigation across regions and samples, so that potentially pathogenic expansions or atypical structures can be spotted quickly.

Key Features

Installation

Requirements: Python 3.9, 3.10, 3.11, or 3.12 (Python 3.13+ may require building dependencies from source)

Install from PyPI:

pip install proletract
proletract --install-deps  # launches the web application

The launcher starts both the backend API server (port 8502) and frontend web server (port 3000). The application will open in your browser automatically. On headless machines, access the frontend at http://localhost:3000 after starting the application.
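If the browser does not open, or you are working on a headless machine, it can help to confirm that both servers are actually listening. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and not part of ProleTRact, and the port numbers are the ones stated above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports as documented in this README: 8502 (backend API), 3000 (frontend).
    for name, port in (("backend API", 8502), ("frontend", 3000)):
        state = "reachable" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A successful connection only shows that something is listening on the port, not that it is ProleTRact, so treat the result as a first diagnostic step.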

Note: If you encounter build errors (e.g., with Python 3.13+), ensure you're using Python 3.9–3.12, or install system dependencies: liblzma-dev (Ubuntu/Debian) or xz-devel (RHEL/CentOS/Fedora).
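To check whether the interpreter you are about to install into falls inside the documented range, a short script like the one below can be run before `pip install`. It is only an illustrative sketch: the function name `is_supported_python` is hypothetical, and the bounds are copied from the requirements listed above.

```python
import sys

# Supported range as documented in this README: Python 3.9 through 3.12.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None) -> bool:
    """Return True if (major, minor) falls inside the documented range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    if is_supported_python():
        print(f"Python {found} is within the documented 3.9-3.12 range.")
    else:
        print(f"Python {found} is outside 3.9-3.12; dependencies may need to be built from source.")
```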

Quickstart

  1. Launch the app with the command above: proletract
  2. Open http://localhost:3000 in your browser (the URL is shown in the terminal if you're running headless).
  3. Load an individual VCF or cohort folder from the sidebar and start exploring tandem repeats.

Usage

Individual mode 👤

  1. Select individual sample in the sidebar.
  2. Provide the absolute path to a bgzipped and tabix-indexed VCF (.vcf.gz with .tbi):
    • Enter the path in the sidebar input, then click Load VCF.
    • The app will parse records and enable navigation across TR variants.
  3. Use Previous/Next to step through records or jump to a region like chr1:1000-2000.
  4. Inspect motif blocks, interruptions, and per-allele differences.
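Before pasting a path into the sidebar, the input requirements from step 2 (absolute path, `.vcf.gz` file, `.tbi` index next to it) can be checked with a few lines of Python. This is a sketch under stated assumptions: `check_vcf_input` is a hypothetical helper, not a ProleTRact function, and it only tests the gzip magic bytes, so it cannot tell a bgzipped file from a plainly gzipped one.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip or bgzip file


def check_vcf_input(vcf_path: str) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    path = Path(vcf_path)
    problems = []
    if not path.is_absolute():
        problems.append("path is not absolute")
    if not path.name.endswith(".vcf.gz"):
        problems.append("file name does not end in .vcf.gz")
    if not path.is_file():
        problems.append("file does not exist")
        return problems  # nothing further can be checked
    with path.open("rb") as handle:
        if handle.read(2) != GZIP_MAGIC:
            problems.append("file is not gzip-compressed")
    # tabix writes its index next to the data file as <name>.vcf.gz.tbi
    if not Path(str(path) + ".tbi").is_file():
        problems.append("missing .tbi index")
    return problems
```

If the index is missing, the usual way to create one is with htslib: `bgzip sample.vcf` followed by `tabix -p vcf sample.vcf.gz`.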

Cohort mode 👥👥

Reads-based VCF

  1. Select Cohort in the sidebar and choose Reads-based VCF view.
  2. Provide the absolute path to a directory containing TandemTwister VCF files.
  3. Click Load Cohort to scan the directory and enable cohort navigation.
  4. Browse records and compare across samples.
  5. Use Previous/Next to step through records or jump to a region like chr1:1000-2000.
  6. Inspect motif blocks, interruptions, and per-allele differences.
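To preview which files a cohort directory contains before loading it, something like the following can be used. How ProleTRact itself scans the directory is not documented here, so this sketch makes two assumptions: that cohort files end in `.vcf.gz`, and that only the top level of the directory is considered. The helper name `list_cohort_vcfs` is hypothetical.

```python
from pathlib import Path


def list_cohort_vcfs(directory: str) -> list[Path]:
    """Return the .vcf.gz files directly inside `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    # Assumption: cohort VCFs use the .vcf.gz suffix; index files (.tbi) are skipped.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.endswith(".vcf.gz")
    )
```

For example, `[p.name for p in list_cohort_vcfs("/data/cohort")]` would give the file names in a hypothetical `/data/cohort` directory, one per sample.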

Assembly VCF

  1. Select Cohort in the sidebar and choose Assembly VCF view.
  2. Provide the absolute path to a directory containing TandemTwister VCF files.
  3. Click Load Cohort to scan the directory and enable cohort navigation.
  4. Browse records and compare across samples.
  5. Use Previous/Next to step through records or jump to a region like chr1:1000-2000.
  6. Inspect motif blocks, interruptions, and per-allele differences.
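The region strings used for jumping in both individual and cohort modes follow the `chrom:start-end` pattern shown in the steps above. The sketch below illustrates how such a string can be validated and split; it is not ProleTRact's own parser, the names `Region` and `parse_region` are hypothetical, and the coordinate convention (for example 1-based or 0-based) is left open because this README does not state it.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# chrom: any run of characters without ':' or whitespace; start/end: digits only.
_REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(text: str) -> Region:
    """Parse a string such as 'chr1:1000-2000' into a Region tuple."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return Region(chrom, start, end)
```

With this sketch, `parse_region("chr1:1000-2000")` returns `Region(chrom='chr1', start=1000, end=2000)`, while malformed input such as `chr1` or a reversed range raises `ValueError`.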

Input Requirements

Demo / Examples

Example screenshots and short walkthrough GIFs will be added here. For now, you can open example.svg for a preview:

Tandem Repeat Visualization Example

Contributing

Contributions are welcome! Please open an issue to discuss changes.

License

This project is licensed under the BSD 3-Clause Non-Commercial License; see LICENSE for details. Commercial use is prohibited. This software is intended for academic research, educational purposes, and personal/private use only. For commercial licensing inquiries, please contact the author.

Citation

If you use ProleTRact in your work, please cite this repository. A formal citation entry will be added once available.