What are TALEs?
TALE (Transcription Activator-Like Effector) proteins are DNA-binding proteins
originally found in plant-pathogenic bacteria. When fused to a nuclease domain and used in pairs (as TALENs), they
can be engineered to target and cut specific DNA sequences, making them powerful tools for genome editing.
What This Tool Does
The TALE Pair Finder identifies suitable pairs of TALE binding sites in your DNA sequence that can
be used for TALE/TALEN design. The tool searches for complementary binding sites on opposite DNA
strands with appropriate spacing and validates them against several quality criteria.
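As a rough illustration of the pairing idea, the sketch below slides a window along the sense strand, takes a left site, skips a spacer, and reads the right site as the reverse complement of the following region. It is not the tool's actual implementation: the function names, the fixed `tale_len` and `spacer_len`, and the choice to apply only the 'T' start rule from the Search Criteria are assumptions made for the example.

```python
# Hypothetical sketch: enumerate candidate TALE pairs on opposite strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pairs(seq: str, tale_len: int = 15, spacer_len: int = 12):
    """Return (left_start, left_site, right_start, right_site) tuples.

    The left site is read from the sense strand; the right site is the
    reverse complement of the region that follows the spacer, so it is
    written 5'->3' on the antisense strand.
    """
    seq = seq.upper()
    total = 2 * tale_len + spacer_len
    pairs = []
    for left_start in range(len(seq) - total + 1):
        left = seq[left_start:left_start + tale_len]
        right_start = left_start + tale_len + spacer_len
        right = reverse_complement(seq[right_start:right_start + tale_len])
        # Only the 'T' start rule is applied here; other filters are omitted.
        if left.startswith("T") and right.startswith("T"):
            pairs.append((left_start, left, right_start, right))
    return pairs
```

A real search would also vary the TALE and spacer lengths within their configured ranges and apply the remaining quality checks to each site.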
Search Criteria
- TALE Length: Typically 15-20 base pairs (configurable 10-30 bp)
- Spacer Length: Distance between binding sites (typically 12-30 bp)
- Start Base: TALEs must begin with 'T' on the sense strand
- GC Content: At least 25% for optimal binding specificity
- CpG Islands: Avoided because cytosine methylation can interfere with TALE binding
- Consecutive A/T: Avoids 7+ consecutive A or T bases
- Strong RVDs: Requires at least 3 strongly binding RVDs (NN or HD)
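The per-site checks above can be sketched as a single filter function. This is a minimal sketch under stated assumptions, not the tool's own code: the name `passes_site_filters` is hypothetical, a run of 7 or more bases drawn from any mix of A and T is treated as a violation, every G is assumed to be encoded as NN so that each G or C counts as one strong RVD, and the CpG island check is left out because it needs the surrounding sequence.

```python
import re

# Assumption: "consecutive A/T" means a run of any mix of A and T.
AT_RUN = re.compile(r"[AT]{7,}")


def passes_site_filters(site: str, min_gc: float = 0.25, min_strong: int = 3) -> bool:
    """Return True if a candidate binding site passes the basic checks."""
    site = site.upper()
    # Start base: the site must begin with 'T'.
    if not site or site[0] != "T":
        return False
    # GC content: at least min_gc of the bases must be G or C.
    gc_count = sum(base in "GC" for base in site)
    if gc_count / len(site) < min_gc:
        return False
    # Consecutive A/T: reject runs of 7 or more.
    if AT_RUN.search(site):
        return False
    # Strong RVDs: each G (NN) or C (HD) contributes one strong RVD.
    return gc_count >= min_strong
```

The cheap checks come first so that most invalid candidates are rejected before the regular expression runs.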
RVD Encoding
RVDs (Repeat Variable Diresidues) are the amino acid pairs that determine TALE DNA binding specificity:
- NI binds to Adenine (A)
- HD binds to Cytosine (C)
- NN or NH bind to Guanine (G)
- NG binds to Thymine (T)
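The mapping above translates directly into a lookup table. The sketch below is illustrative only: `encode_rvds` and its `guanine_rvd` parameter are hypothetical names, and it encodes every base it is given, so whether the leading 'T' is passed in depends on the convention you follow.

```python
# Base-to-RVD lookup taken from the list above; G is handled separately
# because it can be encoded as either NN or NH.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "T": "NG"}


def encode_rvds(site: str, guanine_rvd: str = "NN") -> list[str]:
    """Return the RVD for each base of a binding site, in order."""
    if guanine_rvd not in ("NN", "NH"):
        raise ValueError("guanine_rvd must be 'NN' or 'NH'")
    rvds = []
    for base in site.upper():
        if base == "G":
            rvds.append(guanine_rvd)
        elif base in RVD_FOR_BASE:
            rvds.append(RVD_FOR_BASE[base])
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return rvds
```

For example, `encode_rvds("TGCA")` yields `["NG", "NN", "HD", "NI"]`, and passing `guanine_rvd="NH"` swaps the second entry for `"NH"`.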
How to Use
- Enter your DNA sequence (100 to 100,000 base pairs)
- Adjust parameters as needed (or use defaults)
- Click "Find TALE Pairs" to start the search
- Monitor progress in real-time
- View results with interactive filtering and sorting
- Export data as CSV or TSV for further analysis
Technical Details
Performance Optimizations
- Pre-computation of CpG islands (eliminates millions of redundant checks)
- O(1) GC content queries using cumulative sum arrays
- Efficient sliding window approach
- Early filter termination to skip invalid candidates quickly
- Bulk database inserts for fast result storage
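To show how a cumulative sum array gives constant-time GC content queries, here is a small sketch. The function names are hypothetical and the tool's own data structures may differ; the idea is only that one pass over the sequence builds the array, after which any window's GC fraction is a single subtraction and division.

```python
from itertools import accumulate


def build_gc_prefix(seq: str) -> list[int]:
    """Return prefix sums where prefix[i] is the G/C count in seq[:i]."""
    flags = (1 if base in "GC" else 0 for base in seq.upper())
    return [0] + list(accumulate(flags))


def gc_fraction(prefix: list[int], start: int, end: int) -> float:
    """Return the GC fraction of seq[start:end] in O(1) time."""
    if end <= start:
        raise ValueError("window must be non-empty")
    return (prefix[end] - prefix[start]) / (end - start)
```

With the array built once per input sequence, every candidate window in the sliding search can be checked against the GC threshold without rescanning its bases.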
Technology Stack
- Backend: FastAPI (Python 3.10+)
- Database: PostgreSQL with asyncio support
- Frontend: Vanilla JavaScript with DataTables
- Deployment: Docker & Docker Compose
Data Retention
Search results are stored for 7 days and then automatically deleted.
Each search session gets a unique ID that you can use to access results later.
Limits
- Maximum sequence length: 100,000 base pairs
- Background processing may take several minutes for large sequences
Support & Feedback
This software is in active development. If you encounter issues or have suggestions,
please report them.