DNAvisualization.org turns DNA sequences into gorgeous, interactive two-dimensional visualizations to enable their exploration.

About

Raw DNA sequences, which consist of long strings of letters, are hard for humans to make sense of at a glance. However, there is significant biological meaning contained within them. To reveal this meaning, a number of methods have been proposed to convert raw DNA sequences into two-dimensional visualizations. This website allows you to try several of these methods out on your data with a high-performance parallel-computing backend enabling genome-scale visualization.
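
As a rough illustration of what such a conversion involves, the sketch below walks a sequence one base at a time and adds a fixed unit vector per base, in the spirit of the Yau method cited further down. The vector table and the function name `yau_style_transform` are assumptions made for this example and should be checked against the original paper. This is not the code the website runs.

```python
import math

# Assumed unit vector for each base (verify against Yau et al., 2003).
ASSUMED_VECTORS = {
    "A": (0.5, -math.sqrt(3) / 2),
    "T": (0.5, math.sqrt(3) / 2),
    "G": (math.sqrt(3) / 2, -0.5),
    "C": (math.sqrt(3) / 2, 0.5),
}


def yau_style_transform(sequence):
    """Return (xs, ys): the cumulative 2D path of a sequence, starting at the origin."""
    x, y = 0.0, 0.0
    xs, ys = [x], [y]
    for base in sequence.upper():
        if base not in ASSUMED_VECTORS:
            continue  # this sketch simply skips ambiguous bases such as N
        dx, dy = ASSUMED_VECTORS[base]
        x += dx
        y += dy
        xs.append(x)
        ys.append(y)
    return xs, ys


xs, ys = yau_style_transform("ATGC")
```

Because every base moves the path to the right, the x coordinate grows with position in the sequence and the y coordinate carries the composition signal.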

Instructions

Using this website is easy:

  1. Choose one or more visualization methods. For full information about the visualization methods, take a look here.
  2. Upload a sequence by:
    • Clicking the "browse" button
    • Dragging a file anywhere on the page
    • Loading an example file
    • Pasting a file while anywhere on the page
    • Pasting the contents of a FASTA file into a textbox

That's it.
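
For readers unfamiliar with the FASTA format accepted by the upload options above, here is a minimal parsing sketch. The function name `parse_fasta` is hypothetical and the site's own parser may behave differently, for example in how it handles duplicate headers.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []  # a repeated header replaces the earlier record
        elif header is not None:
            records[header].append(line.upper())
    # Join the per-line fragments of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 example\nACGT\nacgt\n>seq2\nTTAA\n"
sequences = parse_fasta(example)
```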

Optionally, after plotting a sequence, you can:

  • Remove files from the visualization by clicking the icon and selecting files
  • Modify the graph's title and subtitle by clicking the icon
  • Export the content of the graph in .png, .jpeg, .svg, or .pdf format by clicking on the icon
  • Zoom in on a region of interest by clicking and dragging over it, zoom out via the icon, or reset the zoom
  • Change the visualization method if you have chosen more than one visualization method
  • Edit a pasted sequence by clicking on the symbol next to the "Paste FASTA" button

When you plot your data using this website, there are a few things to remember. By default, each DNA sequence you wish to display will be rendered in a different color unless there are more sequences than there are available colors, in which case each file will be shown in its own color. This may be overridden at any time using the "legend mode" selector that becomes visible after plotting. Although the website can display up to thirty sequences of up to 4.5 Mbp each simultaneously, you may find that the resulting graph becomes very cluttered.

On the "visualization method" click-box, you choose how to display your data using the three different display methods currently supported. We are actively investigating adding more methods.

As you look at an output graph with several overlaid sequences, you will see that when the plot lines completely mirror one another, the sequences are essentially identical. When the plot lines begin to diverge, you can see at once where the sequences have begun to differ. You can zoom in for a closer view, then reset the zoom to return to the original (unzoomed) view. To change the topmost sequence on the visualization, hover over the legend entry corresponding to the sequence you want to see. When the lines run parallel on the plot, the sequences are once again identical after a period of divergence.
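
The point where two plot lines separate corresponds to the first position at which the underlying sequences differ. A small sketch of that idea, working directly on the sequences rather than on the plot (the helper name `first_divergence` is hypothetical):

```python
def first_divergence(seq_a, seq_b):
    """Return the index of the first differing base, or None if the sequences are identical."""
    for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b)):
        if base_a != base_b:
            return index
    if len(seq_a) != len(seq_b):
        # One sequence is a prefix of the other; they diverge where the shorter one ends.
        return min(len(seq_a), len(seq_b))
    return None


position = first_divergence("ACGTACGT", "ACGTTCGT")
```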

Try visualizing your data with each of the visualization techniques; each method shows your data from a slightly different perspective, but be aware of each method’s strengths and weaknesses so that you use the right tool for the right job.

When you finish using the site, you can output your data using the .png, .pdf, or .jpg formats in a publication-ready graph. Your data will be retained for 24 hours, but after that time has elapsed, you will need to re-upload your data if you wish to continue using this tool.

Architecture

DNAvisualization.org is built on an entirely serverless architecture. Due to the inherently parallelizable nature of DNA sequence transformation (the transformation of one sequence has no effect on the transformation of another), this design yields a significant performance increase at a lower cost compared to a traditional architecture.
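
The independence of each transformation is what makes fan-out possible. The sketch below imitates that pattern locally with a thread pool standing in for separate serverless invocations. Both `toy_transform` and `transform_all` are hypothetical names, and the transformation itself is a placeholder, not one of the site's methods.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_transform(sequence):
    """Placeholder transformation: running count of purines minus pyrimidines."""
    y = 0
    ys = []
    for base in sequence.upper():
        y += 1 if base in "AG" else -1
        ys.append(y)
    return ys


def transform_all(sequences, max_workers=4):
    """Transform every sequence independently; no task reads another task's result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the dict's keys.
        results = pool.map(toy_transform, sequences.values())
        return dict(zip(sequences, results))


paths = transform_all({"a": "AGCT", "b": "CCTT"})
```

Since no task depends on another, adding workers (or, in the serverless setting, concurrent invocations) does not require any coordination between them.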

This website uses Flask as a lightweight Python web framework and Zappa to automate deployment to Amazon Web Services' (AWS) Lambda serverless computing platform. Additionally, it takes advantage of AWS S3 for serverless data storage and querying. All stored data are automatically deleted after 24 hours.

For a full overview of the architecture, take a look at the sequence diagram below:

sequenceDiagram
    participant C as Client
    participant L as AWS Lambda
    participant S as AWS S3
    note over C: User uploads FASTA
    loop Every sequence in uploaded FASTA files
        C-xL: Submit sequence async
        activate L
        opt Not already saved?
            L->>S: Transform and save parquet.sz
        end
        L->>C: Downsampled JSON
        deactivate L
    end
    note right of S: Data deleted in 24h
    note over C: User zooms in
    loop Every uploaded sequence hash
        C-xL: Async sequence query
        activate L
        L->>S: S3 Select query
        activate S
        S->>L: Selected data as CSV
        deactivate S
        L->>C: Downsampled JSON
        deactivate L
    end
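
The "Downsampled JSON" step in the diagram implies that the backend returns far fewer points than the full transformed sequence contains. The site's actual downsampling algorithm is not described here, so the following is only one plausible approach: keep evenly spaced points within the requested x range. The function names and the `max_points` parameter are assumptions made for this sketch.

```python
import json


def downsample(xs, ys, max_points=1000):
    """Keep evenly spaced points (at most max_points + 1), always including the last."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n <= max_points:
        return list(xs), list(ys)
    step = -(-n // max_points)  # ceiling division
    indices = list(range(0, n, step))
    if indices[-1] != n - 1:
        indices.append(n - 1)  # keep the endpoint so the line reaches the end
    return [xs[i] for i in indices], [ys[i] for i in indices]


def query_region(xs, ys, x_min, x_max, max_points=1000):
    """Select points with x_min <= x <= x_max, downsample them, and return JSON."""
    selected = [(x, y) for x, y in zip(xs, ys) if x_min <= x <= x_max]
    sel_xs = [x for x, _ in selected]
    sel_ys = [y for _, y in selected]
    out_xs, out_ys = downsample(sel_xs, sel_ys, max_points)
    return json.dumps({"x": out_xs, "y": out_ys})


full_xs = list(range(10))
full_ys = [value * 2 for value in full_xs]
overview = downsample(full_xs, full_ys, max_points=4)
zoomed = query_region(full_xs, full_ys, x_min=2, x_max=5, max_points=4)
```

Zooming in narrows the x range, so the same point budget covers a smaller region and the returned line shows more detail.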

Citation

If you use this website in your research, please cite it as:

Additionally, please be sure to cite the visualization method you used. Click below to display the citation information for each method:

Squiggle

If you use the Squiggle visualization method, please cite the following paper (full text):

  • Lee, B. D. (2018). Squiggle: a user-friendly two-dimensional DNA sequence visualization tool. Bioinformatics. doi:10.1093/bioinformatics/bty807.

Yau

If you use the Yau visualization method, please cite the following paper (full text):

  • Yau, S. S., Wang, J., Niknejad, A., Lu, C., Jin, N., & Ho, Y. K. (2003). DNA sequence representation without degeneracy. Nucleic Acids Research, 31(12), 3078-80.

Yau-BP

If you use the Yau-BP visualization method, please cite the following papers:

  • Lee, B. D. (2018). Squiggle: a user-friendly two-dimensional DNA sequence visualization tool. Bioinformatics. doi:10.1093/bioinformatics/bty807.
  • Yau, S. S., Wang, J., Niknejad, A., Lu, C., Jin, N., & Ho, Y. K. (2003). DNA sequence representation without degeneracy. Nucleic Acids Research, 31(12), 3078-80.

Randic

If you use the Randic visualization method, please cite the following paper:

  • Randić, M., Vračko, M., Lerš, N., & Plavšić, D. (2003). Novel 2-D graphical representation of DNA sequences and their numerical characterization. Chemical Physics Letters, 368(1–2), 1–6. doi:10.1016/s0009-2614(02)01784-0.

Qi

If you use the Qi visualization method, please cite the following paper:

  • Qi, Z., & Qi, X. (2007). Novel 2D graphical representation of DNA sequence based on dual nucleotides. Chemical Physics Letters, 440(1–3), 139–144. doi:10.1016/j.cplett.2007.03.107.

Contact Us

Having issues? We're happy to help get them resolved. If you think you've found a bug, the best way to get in touch is to make a bug report on the project's GitHub repository.

Have an idea for another way to turn a DNA sequence into a two-dimensional visualization? Let us know over on the Squiggle repository and we'll be happy to work with you on implementing it. (DNAvisualization.org uses a stripped-down version of the Squiggle library to transform sequences.)

To make a feature request for the DNAvisualization.org website, please click here.