About - TALE Pair Finder

What are TALEs?

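TALE (Transcription Activator-Like Effector) proteins are DNA-binding proteins originally found in plant-pathogenic bacteria of the genus Xanthomonas. Fused to the FokI nuclease domain, they form TALENs (TALE nucleases). Because FokI cuts only as a dimer, two TALENs binding opposite strands on either side of a short spacer are needed to cut a specific DNA sequence. This makes TALE pairs powerful tools for genome editing.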

What This Tool Does

The TALE Pair Finder identifies suitable pairs of TALE binding sites in your DNA sequence for TALE/TALEN design. It searches for binding sites on opposite DNA strands, separated by a spacer of appropriate length, and validates each candidate pair against several quality criteria.
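The pairing geometry described above can be sketched as follows. This is a minimal illustration, not the tool's actual search code: it assumes both TALEs have the same length, that the input contains only A/C/G/T, and it reports the right-hand site as its reverse complement (i.e. read 5'→3' on the opposite strand). The function names are hypothetical.

```python
from typing import Iterator, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def candidate_pairs(
    seq: str, tale_len: int, spacer_min: int, spacer_max: int
) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (left_start, spacer_len, left_site, right_site) for every layout
    left site | spacer | right site that fits inside seq."""
    seq = seq.upper()
    n = len(seq)
    # Largest left start that still leaves room for the shortest spacer + right site.
    for left_start in range(n - 2 * tale_len - spacer_min + 1):
        left_end = left_start + tale_len
        for spacer in range(spacer_min, spacer_max + 1):
            right_start = left_end + spacer
            right_end = right_start + tale_len
            if right_end > n:
                break  # longer spacers would overrun the sequence as well
            left_site = seq[left_start:left_end]
            # The right TALE binds the opposite strand, so read that region reversed.
            right_site = reverse_complement(seq[right_start:right_end])
            yield left_start, spacer, left_site, right_site
```

For example, `candidate_pairs("AAAACCCCGGGGTTTT", 4, 4, 4)` yields five layouts, the first being `(0, 4, "AAAA", "CCCC")`. Each candidate would then be passed through the quality filters.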

Search Criteria

TALE Length: Typically 15-20 base pairs (configurable 10-30 bp)
Spacer Length: Distance between binding sites (typically 12-30 bp)
Start Base: TALE binding sites must begin with 'T' on the "sense" strand
GC Content: At least 25% for optimal binding specificity
CpG Islands: Sites in CpG islands are avoided, because cytosine methylation can interfere with TALE binding
Consecutive A/T: Avoids 7+ consecutive A or T bases
Strong RVDs: Requires at least 3 strong RVDs (NN, HD)
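A per-site filter built from the thresholds above might look like this sketch. It runs the cheapest check first and returns at the first failure. Two points are assumptions not stated above. First, the "7+ consecutive A/T" rule is read as any mixed run of A and T bases. Second, strong RVDs are counted as G (NN) plus C (HD) bases. CpG-island overlap is left out here because it depends on a sequence-wide precomputation. The function and constant names are hypothetical.

```python
import re

MIN_GC_FRACTION = 0.25  # "at least 25%"
MIN_STRONG_RVDS = 3     # NN (for G) and HD (for C) count as strong
AT_RUN = re.compile(r"[AT]{7,}")  # assumption: any run of 7+ A/T bases, mixed or not


def passes_filters(site: str) -> bool:
    """Return True if a single binding site satisfies the sketched criteria."""
    site = site.upper()
    # Cheapest check first: one character comparison (also rejects empty sites).
    if not site.startswith("T"):
        return False
    gc = site.count("G") + site.count("C")
    if gc / len(site) < MIN_GC_FRACTION:
        return False
    # Every G is encoded as NN and every C as HD, so the G+C count equals the strong-RVD count.
    if gc < MIN_STRONG_RVDS:
        return False
    if AT_RUN.search(site):
        return False
    return True
```

Ordering the checks from cheapest to most expensive means most invalid candidates are discarded after one or two comparisons. Switching the A/T rule to homopolymer runs only would mean replacing the pattern with `A{7,}|T{7,}`.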

RVD Encoding

RVDs (Repeat Variable Diresidues) are the amino acid pairs that determine TALE DNA binding specificity:

NI binds to Adenine (A)
HD binds to Cytosine (C)
NN or NH bind to Guanine (G)
NG binds to Thymine (T)
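The base-to-RVD table above translates directly into a lookup. In this sketch, NN is chosen for G (NH would be the alternative). Every base of the site is encoded, including the leading T. Whether the tool encodes that first T as a repeat is not stated here, so treat it as an assumption. The names are hypothetical.

```python
from typing import List

# One RVD per base, following the table above (NN chosen over NH for guanine).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
STRONG_RVDS = {"NN", "HD"}


def encode_rvds(site: str) -> List[str]:
    """Translate a DNA binding site into its RVD sequence, base by base."""
    return [RVD_FOR_BASE[base] for base in site.upper()]


def count_strong(rvds: List[str]) -> int:
    """Count the RVDs treated as strong binders."""
    return sum(rvd in STRONG_RVDS for rvd in rvds)
```

For example, `encode_rvds("TACG")` returns `["NG", "NI", "HD", "NN"]`, of which two RVDs are strong.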

How to Use

Enter your DNA sequence (100 to 100,000 base pairs)
Adjust parameters as needed (or use defaults)
Click "Find TALE Pairs" to start the search
Monitor progress in real time
View results with interactive filtering and sorting
Export data as CSV or TSV for further analysis

Technical Details

Performance Optimizations

Pre-computation of CpG islands (computed once per sequence instead of re-checked for every candidate site)
O(1) GC content queries using cumulative sum arrays
Efficient sliding window approach
Early filter termination to skip invalid candidates quickly
Bulk database inserts for fast result storage
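The cumulative-sum idea behind O(1) GC queries, and a similar way to check a window against precomputed CpG-island regions, can be sketched as follows. This is an illustration of the general technique, not the tool's code. It assumes island intervals are already available as half-open `(start, end)` pairs from some detector, which is not shown, and all names are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def gc_prefix(seq: str) -> List[int]:
    """prefix[i] = number of G/C bases in seq[:i]; built once in O(n)."""
    return [0] + list(accumulate(1 if base in "GCgc" else 0 for base in seq))


def gc_fraction(prefix: List[int], start: int, end: int) -> float:
    """GC fraction of seq[start:end] in O(1), using two lookups."""
    return (prefix[end] - prefix[start]) / (end - start)


def flagged_prefix(n: int, intervals: List[Tuple[int, int]]) -> List[int]:
    """Prefix counts of positions covered by any (start, end) interval."""
    mask = [0] * n
    for start, end in intervals:
        for i in range(start, end):
            mask[i] = 1
    return [0] + list(accumulate(mask))


def overlaps_flagged(prefix: List[int], start: int, end: int) -> bool:
    """True if seq[start:end] touches any flagged position, in O(1)."""
    return prefix[end] - prefix[start] > 0
```

For `seq = "ATGCGC"`, `gc_fraction(gc_prefix(seq), 0, 4)` is `0.5`. Every candidate window after that costs two list lookups, no matter how long the TALE is.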

Technology Stack

Backend: FastAPI (Python 3.10+)
Database: PostgreSQL with asyncio support
Frontend: Vanilla JavaScript with DataTables
Deployment: Docker & Docker Compose

Data Retention

Search results are stored for 7 days and then automatically deleted. Each search session gets a unique ID that you can use to access results later.
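A 7-day retention window tied to a per-search ID could be modelled like the sketch below. Several details are assumptions, not the tool's documented behaviour: the ID format (a random UUID4 here), timestamps stored in UTC, and a separate cleanup job that deletes expired rows. The function names are hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

RETENTION = timedelta(days=7)


def new_session(now: Optional[datetime] = None) -> Tuple[str, datetime]:
    """Return a new random session ID and the UTC time its results expire."""
    if now is None:
        now = datetime.now(timezone.utc)
    return str(uuid.uuid4()), now + RETENTION


def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the retention window has passed (e.g. for a cleanup job)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at
```

A random, unguessable ID lets results be shared by link without an account. The trade-off is that anyone holding the ID can view them until they expire.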

Limits

Maximum sequence length: 100,000 base pairs
Background processing may take several minutes for large sequences
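Input validation matching the stated length range (100 to 100,000 bp) could be done up front, before any search starts. This sketch assumes that whitespace and line breaks are stripped, lowercase is accepted, and only A/C/G/T are allowed. The tool may treat other characters (such as N or FASTA headers) differently. The function name is hypothetical.

```python
MIN_LEN = 100
MAX_LEN = 100_000
VALID_BASES = frozenset("ACGT")


def clean_sequence(raw: str) -> str:
    """Normalise user input and raise ValueError if it cannot be searched."""
    seq = "".join(raw.split()).upper()  # drop spaces/newlines, normalise case
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError(f"invalid characters: {''.join(sorted(bad))}")
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError(f"sequence length {len(seq)} is outside {MIN_LEN}-{MAX_LEN} bp")
    return seq
```

Rejecting bad input before queuing the background job avoids spending several minutes on a sequence that would fail anyway.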

Support & Feedback

This software is in active development. If you encounter issues or have suggestions, please report them.
