Friday SNPpets

This week’s SNPpets include a new cancer Data Catalog from NCI, Avianbase for bird genomics, a couple of variant-calling tools, and someone trying to genotype the color assessment of “the dress” using 23andMe data. Heh. [But at least no llamas were harmed in the making of this post.]


Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

 

What’s The Answer? (Internet of DNA)

This week’s highlighted discussion tackles the “Internet of DNA”, a story I picked up last week in my SNPpets post that has since bubbled up elsewhere. Biostars folks look at the more technical implications of “A global network of millions of genomes….”


Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s discussion comes as part of an interesting week on the personalized medicine front. A whole bunch of things are coming together: the US getting a Chief Data Scientist who talks about bioinformatics, the NEJM talking about training physicians to deal with medical genomics issues, and the “Internet of DNA” getting out into the popular science media. So have a look at what bioinformatics nerds made of this, and what their thoughts are:

Forum: A global network of millions of genomes could be medicine’s next great advance. | Beacon

Internet of DNA

A global network of millions of genomes could be medicine’s next great advance.

Availability: 1-2 years

Noah is a six-year-old suffering from a disorder without a name. This year, his physicians will begin sending his genetic information across the Internet to see if there’s anyone, anywhere, in the world like him.

http://www.technologyreview.com/featuredstory/535016/internet-of-dna/

Do you think this will happen within 2 years?

Edit:

This is the technical implementation I think they are talking about:

The Beacon project is a project to test the willingness of international sites to share genetic data in the simplest of all technical contexts. It is defined as a simple public web service that any institution can implement as a service. The service is designed merely to accept a query of the form “Do you have any genomes with an ‘A’ at position 100,735 on chromosome 3” (or similar data) and responds with one of “Yes” or “No.” A site offering this service is called a “beacon”.

http://ga4gh.org/#/beacon

So it is just a federated query over multiple large genomics (+ phenotypes) data sets. Full genomes are not centralized or moved, so privacy is less of a concern.

William

And please, contribute your own thoughts over there. We need to be having this discussion. Also, watch for more on this Beacon….
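To make the “Yes”/“No” idea a bit more concrete, here is a minimal Python sketch, assuming each site summarizes its genomes as a set of (chromosome, position, allele) tuples. The `Beacon` class and `federated_query` function are hypothetical names for illustration, not the GA4GH Beacon API. The point it shows is the one William makes: a federated query only ever returns one word per site, while the genomes themselves stay where they are.

```python
from dataclasses import dataclass, field


@dataclass
class Beacon:
    """Toy beacon (hypothetical): answers only "Yes" or "No" about allele presence."""

    name: str
    # (chromosome, position, allele) tuples seen in this site's genomes,
    # with alleles stored upper-case. Only this summary is consulted;
    # no genome data is ever returned to the caller.
    variants: set = field(default_factory=set)

    def query(self, chromosome: str, position: int, allele: str) -> str:
        found = (chromosome, position, allele.upper()) in self.variants
        return "Yes" if found else "No"


def federated_query(beacons, chromosome, position, allele):
    """Send the same question to every beacon and collect one word per site."""
    return {b.name: b.query(chromosome, position, allele) for b in beacons}


if __name__ == "__main__":
    # Made-up example data, echoing the query form in the Beacon description.
    site_a = Beacon("site_a", {("3", 100735, "A")})
    site_b = Beacon("site_b", {("3", 100735, "G"), ("1", 5000, "T")})
    print(federated_query([site_a, site_b], "3", 100735, "A"))
    # {'site_a': 'Yes', 'site_b': 'No'}
```

A real beacon would sit behind a web service and would have to deal with reference genome builds, coordinate conventions, and access policies, none of which this toy attempts.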

Video Tip of the Week: CRISPRdirect for editing tools and off-target information

Great RCSB PDB molecule-of-the-month page on CRISPR


Genome editing strategies are certainly a hot topic of late. We were astonished at the traffic that the animation of the CRISPR/Cas9 process recently drew to the blog. There’s a huge amount of potential for novel types of studies and interventions in human disease, but I’m already seeing applications in agriculture coming along. There’s an edited canola available in Canada already. China has edited wheat for disease resistance. There’s a project underway to remove horns from cattle by merely snipping out a bit of sequence with TALEN/ZFN strategies. They’ve already created cattle with edited myostatin too.

To accompany this work, new software tools have been developed to help design target sequences and evaluate potential off-target sites. Tools exist for designing both TALEN and CRISPR targets, and some sites incorporate both options. For this post I’ll be focusing on just one of the CRISPR tools, but I’ll list a few others as well. Some tools cover a small range of species and some cover larger sets, so part of choosing a tool is asking which genomes it supports. In future Tips we may explore some of the others. There is something of a flood of these tools coming along, and I’ll continue to explore them.
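As a rough illustration of what “designing target sequences” involves, the Python sketch below scans both strands of a DNA string for candidate guides. It assumes the commonly used SpCas9 layout (a 20-nt spacer immediately 5' of an NGG PAM); the function names are hypothetical, and real design tools add scoring and filtering steps (GC content, off-target checks, and so on) that are not shown here.

```python
import re

# Complement table for uppercase DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(seq: str, spacer_len: int = 20):
    """Return (strand, start, spacer, pam) for each spacer + NGG site.

    Assumes an SpCas9-style NGG PAM directly 3' of the spacer. `start` is the
    0-based spacer position on the strand where it was found.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


if __name__ == "__main__":
    # Made-up sequence: a 20-nt spacer followed by a TGG PAM.
    for hit in find_guides("ACGT" * 5 + "TGG"):
        print(hit)
    # ('+', 0, 'ACGTACGTACGTACGTACGT', 'TGG')
```

Searching the reverse complement as well matters because a PAM on the opposite strand is just as usable as one on the strand you happen to be reading.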

This week’s focus is CRISPRdirect. A Japanese group has created this tool for generating guide sequences and evaluating potential off-target activity. This introductory video (with music, and with English annotations to convey the features) will give you an overview of its functions.

It seems to be an easy-to-use interface, with effective organization of the results. They have a nice range of species to examine: not only some of the mammalian genomes, but fish, chicken, worm, plants, and yeast too. There’s a graphical viewing component and an easy export option as well.
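One simple way to flag off-target risk is to count how often the PAM-proximal “seed” end of a guide, followed by NGG, turns up in a reference. The Python sketch below does this for several seed lengths, using exact matches only and checking both strands. The seed lengths (20, 12, 8), the NGG PAM, and the function name are assumptions for illustration; this is a toy in the spirit of seed-based checks, not CRISPRdirect’s actual code.

```python
import re

# Complement table for uppercase DNA; other characters (e.g. N) pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_seed_pam_matches(guide, reference, seed_lengths=(20, 12, 8)):
    """Count exact seed + NGG sites on both strands of `reference`.

    For each k in `seed_lengths`, the seed is the k PAM-proximal (3') bases
    of the 20-nt `guide`. No mismatches or bulges are tolerated, and the
    intended target site itself is included in the counts.
    """
    guide, reference = guide.upper(), reference.upper()
    strands = (reference, reverse_complement(reference))
    counts = {}
    for k in seed_lengths:
        seed = guide[-k:]
        # Zero-width lookahead so overlapping sites are each counted.
        pattern = re.compile("(?=%s[ACGT]GG)" % re.escape(seed))
        counts[k] = sum(len(pattern.findall(s)) for s in strands)
    return counts


if __name__ == "__main__":
    guide = "ACGT" * 5
    # Made-up reference: one full-length site, plus a site matching only the 8-nt seed.
    reference = "ACGT" * 5 + "TGG" + "TTTT" + "ACGTACGT" + "AGG"
    print(count_seed_pam_matches(guide, reference))
    # {20: 1, 12: 1, 8: 2}
```

A count of 1 for the full-length guide means it matches only its intended site; higher counts at the shorter seed lengths hint at places where a partially matching guide might still cut.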

So I’ve come across a few tools in my search, but if you have favorites please feel free to add them below in the comments. I’m going to continue to look into these tools and will be looking to highlight others in the future.

Quick link:

CRISPRdirect: http://crispr.dbcls.jp/

A few links to other tools I’ve been looking at:

E-TALEN: http://www.e-talen.org/E-TALEN/

E-CRISP: http://www.e-crisp.org/E-CRISP/

TAL Effector Nucleotide Targeter 2.0: https://tale-nt.cac.cornell.edu/

Prognos: http://baolab.bme.gatech.edu/Research/BioinformaticTools/prognos.html

ZiFiT Targeter software (TALEN/ZFN/CRISPR support): http://zifit.partners.org/ZiFiT/

COSMID: https://crispr.bme.gatech.edu/

CRISPY (specific for CHO cells): http://staff.biosustain.dtu.dk/laeb/crispy/

Reference:

Naito Y., K. Hino, H. Bono & K. Ui-Tei (2014). CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites, Bioinformatics, DOI: http://dx.doi.org/10.1093/bioinformatics/btu743

TIL: There’s a chief data scientist for the US: DJ Patil.

I know there’s lots of hype and drama over “big data”, some of which is over the top. But there are real needs and real opportunities in all sorts of data we are generating as well. So we now have a chief data scientist in the US. I found the news on the NIH Data Science blog, which has more links and includes this video where DJ Patil explains more about the role and the reasons for it.

Highlights of the video in case you can’t listen right now:

~6 min: he calls out “bioinformatics” as an area of emphasis.

~10 min: he talks specifically about working with Phil Bourne and NIH to bring data science and bioinformatics together.

The White House release about Patil references the Precision Medicine efforts.

Precision medicine. Medical and genomic data provides an incredible opportunity to transition from a “one-size-fits-all” approach to health care towards a truly personalized system, one that takes into account individual differences in people’s genes, environments, and lifestyles in order to optimally prevent and treat disease. We will work through collaborative public and private efforts carried out under the President’s new Precision Medicine Initiative to catalyze a new era of responsible and secure data-based health care.

He asks for your help: they are building out teams, and he wants everyone to check out the site and see whether they can contribute.

US Digital Service: http://whitehouse.gov/USDS

Follow @dpatil on Twitter: https://twitter.com/dpatil

Hat tip to Beth Russell at the NIH Data Science blog called Input | Output: https://nihdatascience.wordpress.com/2015/02/24/dj-patil-is-the-new-chief-data-scientist-of-the-united-states/

Friday SNPpets

This week’s SNPpets include 5 continents of Drosophila genomes, a response to “Cars…made by bioinformaticians”, BGI’s mission to change the world, a great table of sequencing costs per platform, and the Internet of DNA.



What’s the Answer? (RStudio as a game-changer)


This week’s highlighted Biostars item is part of the week’s theme on statistical computing. The post comes from a biologist who remembers what it was like before we had the nice RStudio interface, and he offers some hand-holding to get new users started.

Tutorial: Few words for R beginners

Hi,

As a biologist who started to learn R, I encountered a lot of problems learning the subject. I don’t want to go into them now; I just want to suggest what I think can save you from wasting your time and energy fooling around without getting what you expect.

  1. Install R ! Of course!
  2. Install RStudio; this simplifies your life. Note: RStudio should be installed after R (http://www.rstudio.com/). After this, you always open RStudio, not R. R is the actual program, but RStudio gives it the nice interactive interface.
  3. Watch this webinar on R to get familiar with the basics and see why it’s good to have RStudio. http://bitesizebio.com/webinar/20600/beginners-introduction-to-r-statistical-software/
  4. Coursera offers this very nice course in R. Get the videos from their website and, of course, watch them! (https://www.coursera.org/course/rprog)
  5. While learning from the course, practice with swirl (http://www.swirlstats.com). Swirl was the best R teacher for me. It makes you work with R interactively.
  6. Also https://www.datacamp.com/courses/introduction-to-r or, more generally, https://www.datacamp.com is a very good resource for self-learners!
  7. Stuar51XT is a YouTube channel that has very nice, comprehensive R courses. Just search their videos for “introduction to R programming”: https://www.youtube.com/user/Stuar51XT
  8. Practice and expand bioinformatics-oriented R skills with the “Institute for Integrative Genome Biology” manual. http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual

If I went back to my pre-R era, I would follow the above. I think it’s a good kick-off for those who want to learn R and start getting familiar with R’s environment. I hope it helps you =)

Cheers!

–Parham

But I also loved this response:

I would add, as someone who started using R around 13 years ago: RStudio has been a complete game-changer. It has made the software far more accessible to more people, brought together a great combination of developers, been responsible for many useful, innovative packages and all-in-all, is just A Good Thing. –Neilfws

See, it’s not just me trying to lure you to RStudio. It is A Good Thing. There are some other comments over there too with more tips or chatter. Go have a look.

 

Video Tip of the Week: RStudio as an interface for using R

Although typically we focus on databases and algorithms used in bioinformatics and genomics, there are other tools that support this work that are crucial as well. The statistical software and computing tools associated with R fall into this category. RStudio is increasingly being adopted by folks in genomics, and although we have talked about R in the past, I hadn’t highlighted the RStudio interface before. But RStudio has really lowered the barrier to entry and changed how people use R effectively, so it’s time to include it in our Video Tips of the Week.

In a previous tip we highlighted some training on R that was delivered in a webinar by Heather Merk of Ohio State. So if you need an overall Introduction to R Statistical Software, that’s a good place to start. When you are ready to begin to work with R, though, you should consider trying out RStudio.

This overview video will demonstrate the basics of the interface for RStudio.

RStudio Overview – 1:30 from RStudio, Inc. on Vimeo.

RStudio provides more detail on many of its features, and their Vimeo channel has a few more videos. Another benefit of using RStudio is the growing range of tools that work with it: a popular tip we did was on Slidify, for making slides directly from RStudio.

RStudio is not just for genomics, though; it’s widely used in many fields that engage in statistical analysis. I was surprised not to find many references to it in PubMed yet (there is some guidance and a few explainers in biotech), but I know it’s being widely used. You can see a lot of examples in Google Scholar. These include several enthusiastic uses of RStudio in teaching situations: An Attractive Template of a Reproducible Data Analysis Document for an Awesome Class Project; and Teaching precursors to data science in introductory and second courses in statistics. I did find a software review in an economics publication. And you can get a book to help if that’s how you like to learn.

But if you haven’t had a chance to check out RStudio yet, I’d recommend it.

Quick links:

RStudio: http://www.rstudio.com/

R: http://www.r-project.org/

RSeek, an R-specific search engine: http://www.rseek.org (hat tip to Elana Fertig’s handy intro slide deck)

References:

Gandrud C. (2013). Reproducible Research with R and R Studio, CRC Press.

Racine J.S. (2011). RStudio: A Platform-Independent IDE for R and Sweave, Journal of Applied Econometrics, 27 (1) 167-172. DOI: http://dx.doi.org/10.1002/jae.1278

Fertig E. (2012). Getting Started in R.

Statistics for Biologists

In a curious coincidence (not statistically relevant), this week I planned to highlight some useful statistical software as my Video Tip of the Week and the Answer post. In order to lure you back for the other pieces this week, I bring you a handy collection from Nature that was just announced:

In case the embedded tweet breaks later, the announcement post is Statistics for biologists – A free Nature Collection.

The collection is here: http://www.nature.com/collections/qghhqm

Friday SNPpets

This week’s SNPpets include link rot, gall-forming Hessian flies, a Kickstarter for the story of a rare disease found with personal genome sequencing, and a hilarious/sad video of grad students who can’t seem to meet with their PIs.



 

What’s the Answer? (wet lab software)


This week’s highlighted question is about some wet lab software. Typically we look at genomics analysis tools, but with the high-throughput nature of current biology, it seems to me there’s a good opportunity for more of this type of resource management software too. And the software developer is looking for feedback from other types of researchers.

Tool: StrainControl Laboratory Manager software

Dear all,

I have read some posts regarding lab software, so I thought that maybe ours could be of interest.

All of us who developed the software, StrainControl Laboratory Manager, work in science.

Last year StrainControl was released to the research community.

StrainControl is lab software that allows you to store everything in the lab in one place.
Currently about 700 labs are using StrainControl, and they are all satisfied.

Some key functions:
1) Handle strains, cell-lines, oligos, plasmids, chemicals and inventories.
2) Link plasmid data to strains or cell-lines.
3) Ability to rename any field and text to suit your own needs.
4) Customize which data columns should be visible.
5) User management allowing you to create read, write, and administrator accounts, etc.
6) Read access from cloud drives (Dropbox, etc.) and network support.
7) Create reports (over 20 formats)

What I’m interested in is whether any other research fields (besides basic research labs) can make use of the software, since any field can be renamed to fit a different research field.

We would be very happy if you could give it a try and comment how the software works for you.

More information: http://www.straincontrol.com

Thank you in advance,
Chris Ericsson, PhD

And some of our most popular blog posts are about colony management software, electronic lab notebooks, and other sorts of routine tools, not just analysis tools. So have a look and see if this is useful. Or if you have other tools like this that you find essential to lab work, let me know. I’d love to have a look.
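The StrainControl feature list mentions linking plasmids to strains and renaming any field to fit a different research area. Purely as a hypothetical sketch (the post doesn’t describe StrainControl’s internals, and every class and method name here is made up), one way to support relabeling is to keep fixed internal field keys and store the display labels separately, so a lab can rename columns without touching the data.

```python
from dataclasses import dataclass, field


@dataclass
class Plasmid:
    plasmid_id: str
    name: str


@dataclass
class Strain:
    strain_id: str
    data: dict = field(default_factory=dict)      # internal field key -> value
    plasmids: list = field(default_factory=list)  # linked Plasmid objects


class Inventory:
    """Hypothetical toy lab inventory: internal keys stay fixed, labels are renamable."""

    def __init__(self):
        self.strains = {}
        self.plasmids = {}
        # Display label for each internal key; another research field can relabel these.
        self.labels = {"genotype": "Genotype", "location": "Freezer location"}

    def rename_field(self, key, new_label):
        self.labels[key] = new_label

    def add_plasmid(self, plasmid):
        self.plasmids[plasmid.plasmid_id] = plasmid

    def add_strain(self, strain):
        self.strains[strain.strain_id] = strain

    def link(self, strain_id, plasmid_id):
        """Attach an existing plasmid record to an existing strain."""
        self.strains[strain_id].plasmids.append(self.plasmids[plasmid_id])

    def report(self, strain_id):
        """Return one strain's data keyed by the current display labels."""
        s = self.strains[strain_id]
        # Keys without a custom label fall back to the internal key name.
        row = {self.labels.get(k, k): v for k, v in s.data.items()}
        row["Plasmids"] = [p.name for p in s.plasmids]
        return row


if __name__ == "__main__":
    inv = Inventory()
    inv.add_plasmid(Plasmid("p1", "pExample-GFP"))  # made-up example records
    inv.add_strain(Strain("s1", {"genotype": "wild type", "location": "Box 3"}))
    inv.link("s1", "p1")
    inv.rename_field("location", "Storage rack")
    print(inv.report("s1"))
    # {'Genotype': 'wild type', 'Storage rack': 'Box 3', 'Plasmids': ['pExample-GFP']}
```

Separating keys from labels is just one design choice; it keeps stored records stable while each lab sees the vocabulary of its own field.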