What's Your Problem? Open Thread

Welcome to the “What’s Your Problem?” (WYP) open thread. The purpose of this entry is to allow the community to ask questions on the use of genomics resources. Think of us as a virtual help desk. If you have a question about how to access a certain kind of data, or how to use a database, or what kind of resources there are for your particular research problem, just ask in the comments. OpenHelix staff will keep watch on the comment threads and answer those questions to the best of our knowledge. Additionally, we encourage readers to answer questions in the comments too. If you know the answer to another reader’s question, please chime in! The “WYP” thread will be posted every Thursday and remain at the top of the blog for 24 hours. Questions or problems asked on Thursday will be answered on Thursday to the best of our ability. You can leave questions on other days of the week, but the answer might not come that day.

You can keep up with this thread by remembering to check back, by subscribing to the RSS comments feed to this WYP post or by subscribing to be notified by email of new comments to the post (use checkbox at end of comment form, you can unsubscribe later). If you want to be notified of future WYP posts (every Thursday), you can subscribe to the WYP feed.

17 thoughts on “What's Your Problem? Open Thread”

  1. Megan


    I am having a few problems with Haploview that I don’t see answers to in the user’s manual. I am loading “Linkage Format” data in the .ped and .map file formats. I am performing a case-control analysis for multiple subjects at 21 different SNP locations. If I load the data for each SNP individually, I get completely different Chi Sq. values than if I load all of the SNPs together. Is this possible mathematically, or is there something wrong with my file/the program? Also, when I load multiple SNPs, it tends to stop “seeing” some alleles. It will miss a TT from a case sample for one SNP, and a CT from a control sample for another SNP, etc.

    Any ideas about what I could be doing wrong?


  2. Mary

    Hi Megan–

    We’ll look into this. Are you on a Mac or a PC? We can run some test samples on both and see if there’s any difference; sometimes it matters.

    I’ll try it out with some HapMap data and see what I find.

  3. Megan

    Thank you for your help! I am using a PC and I have the newest version of Microsoft Office.

  4. Max

    Hi, I am not an expert on SNPs, but I have been wondering for some years now, ever since reading about the lactase SNP in the intron of its flanking gene, how many SNPs relevant for a gene are located in neighboring genes. This must be a question with an easy answer, found by looking at a database of SNPs with a known effect.

    dbSNP is probably not the right place to look, as it just lists the SNPs. After your recent blog post, I’ve tried the HugeNavigator, but it also doesn’t look like the right database. F-SNP, which you presented recently, does not show already known SNPs, HGV does not list genes, and OMIM seems to consist only of text. I guess there is another public database with curated, well-known, published SNP-gene associations like the lactase mutation?

    thanks a lot in advance

  5. Mary

    It’s hard to know how far out the distance effects might be working–I too have heard of some suggestions of these but I’m not sure how well curated that data is yet.

    If you want to visually scan around and see what’s nearby I would recommend the UCSC Genome Browser. You could put a gene in the center of your view, and then examine adjacent genes and their SNPs quite easily. You could pull the same data down with the table browser.

    I like VISTA for comparing regions and looking for conserved regions, which might be a nice clue too. You can also see the SNPs and look at regulatory regions like transcription factor binding sites, which could be informative.

    GVS might be a good tool for your needs too. You could start by browsing around by the chromosomal region.

    We have tutorials on each of these here: http://www.openhelix.com/blog/?page_id=57 Those should get you started.

  6. Mary


    I tried to load up some data in my Haploview and I’m seeing what I expect (at least for the loading issues). You have probably already looked at some of these things, but I’ll just mention them. Check the input format: if a tab or space is out of line, it might be causing the problem. I also noticed the filtering: the software will take out some data if it doesn’t meet the thresholds. Is that possibly an issue? The documentation on data checks might offer some possibilities.

    I noticed the documentation within the program doesn’t link to the FAQ properly, but there’s a bit more on the filtering there, along with a suggested workaround:
    http://www.broad.mit.edu/mpg/wiki/index.php/Haploview_-_FAQ

    Trey is going to have to look at the calculation part, he’s done more of that than I have. But he’s at this conference overseas and his internet access is intermittent. But I’ll ask him to have a look.

  7. Max

    Thanks Mary for your suggestions! But I think that I didn’t succeed in explaining my question very well: I am not searching for any SNPs within a genomic region that might have no effect. I am searching for well-known SNPs, with publications on them, and their possible effector genes. That is, a database where a link is made between SNPs, phenotypes and the probable effector genes.

    My concrete example is the LCT SNP -13910 upstream of LCT, within MCM6. It’s described in the OMIM entry of MCM6 and also in the entry of LCT, but the literature actually suggests that MCM6 is not related to lactose digestion. So an ideal database would link the human gene LCT with the dbSNP entries within MCM6 (rs4988235). dbSNP doesn’t do this; it links rs4988235 to MCM6.

    I think that the answer to my question is that this kind of database doesn’t exist yet, and the root of the problem is the structure of OMIM.

    The answer to my question is part of OMIM but difficult to obtain automatically: One would have to parse the text in OMIM descriptions of allelic variations if they link to genes that are nearby on the human chromosome… quite a bit of downloading and parsing…

    Thanks for the help, the “what’s your problem”-thread is good idea.

  8. Mary


    Ah, I see. I think there’s still plenty of opportunity for you to hunt around for these for genes of interest to you, though. Just because they haven’t been shown yet doesn’t mean they aren’t there ;) And some of those associations are proposed but not confirmed, which does seem like a literature curation issue.

    But this is a popular list–a catalog of GWAS studies put out by NHGRI: http://www.openhelix.com/blog/?p=670

    Also check out this thread and Andrew Johnson’s paper in the comments: http://www.openhelix.com/blog/?p=731

    Another place to look might be SNPedia: http://www.snpedia.com/ I searched for LCT there and they had some info on the MCM6 SNPs. But that’s all reliant on curation too.

  9. Megan


    Thank you so much for your help. It turns out it was the “exclude individuals” setting. I didn’t think it was because it seemed to be excluding individual SNPs rather than samples, but setting it to “100%” solved my problem.

    Thanks again!
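A note on why an “exclude individuals” filter can change a per-SNP chi-square: the sketch below assumes the filter drops any individual whose fraction of missing genotypes, across all loaded markers, exceeds a threshold, and it assumes a plain Pearson chi-square on a 2x2 table of allele counts. Haploview’s own calculation may differ in detail. All function names and genotype data here are made up for illustration.

```python
def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 d.f.) for a 2x2 table of allele counts."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so no test is possible
    return n * (a * d - b * c) ** 2 / denom


def keep_individuals(individuals, max_missing):
    """Keep individuals whose missing-genotype fraction is <= max_missing."""
    kept = []
    for is_case, genotypes in individuals:
        missing = sum(g is None for g in genotypes) / len(genotypes)
        if missing <= max_missing:
            kept.append((is_case, genotypes))
    return kept


def allele_counts(individuals, snp_index, allele, want_cases):
    """Count (allele, other alleles) at one SNP among cases or controls."""
    hits = others = 0
    for is_case, genotypes in individuals:
        g = genotypes[snp_index]
        if is_case != want_cases or g is None:
            continue
        hits += g.count(allele)
        others += len(g) - g.count(allele)
    return hits, others


def chi_square_for_snp(individuals, snp_index, allele):
    cases = allele_counts(individuals, snp_index, allele, True)
    controls = allele_counts(individuals, snp_index, allele, False)
    return allelic_chi_square(cases, controls)


# Hypothetical data: (is_case, genotypes at three SNPs); None means missing.
INDIVIDUALS = [
    (True, ["TT", None, None]),
    (True, ["TT", "AA", "GG"]),
    (True, ["CT", "AG", "GG"]),
    (False, ["CC", "AA", "GG"]),
    (False, ["CT", None, None]),
    (False, ["CC", "AG", "GG"]),
]

all_kept = keep_individuals(INDIVIDUALS, max_missing=1.0)  # nobody dropped
filtered = keep_individuals(INDIVIDUALS, max_missing=0.5)  # two dropped
chi_all = chi_square_for_snp(all_kept, 0, "C")
chi_filtered = chi_square_for_snp(filtered, 0, "C")
print(chi_all, chi_filtered)
```

Under these assumptions, two individuals are typed at the first SNP but missing at the other two, so they are dropped from every SNP when all three markers are loaded with a 50% threshold. That changes the allele counts, and therefore the statistic, at a SNP where their genotypes were perfectly good. It would also look as if single genotypes had gone missing, which matches what Megan described.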

  10. Jennifer

    Hi Max,

    I’m sure this is not a perfect solution either, but today I have been playing with MyNCBI (in preparation for a live training that Mary & I will be doing soon) & I am wondering if you can get at your answer by setting a MyNCBI OMIM filter for dbSNP entries.

    - I set an OMIM filter with ‘OMIM records linked to gene-specific displays of SNPs.’ and ‘OMIM records with allelic variants linked to dbSNP.’ options checked.
    - then I ran an OMIM search for the text ‘intron snp upstream’
    - from the results, I clicked the OMIM dbSNP tab
    - I clicked through to one report at random, scanned the report until I saw a dbSNP logo & read the associated line: ‘…sequence variation in the introns of HERC2 affects the expression of OCA2…’ And I saw an entry with the MCM6/LCT connection in the results list too.

    So I got a pretty good hit ratio with a quick MyNCBI filter. I’d guess with just a bit of trial and error you could set up an automated MyNCBI search to run regularly to pull these out. Perhaps with a bit of Perl programming or text mining you could get even closer to what you ultimately want.

    Best of luck to you, and do please keep us posted on how it works out for you!

  11. Max

    Hi Jennifer,

    this is really a cool idea! I’ve never used these filters in myNCBI before, I think it’s really strange that they hide their filters so well instead of simply adding an “advanced search” box.

    What I did gave the same results but was more complicated: I downloaded the omim.txt file, filtered it with awk and perl scripts to search for the words snp and upstream, and looked through the results. Currently I have four examples: the LCT case, pmids 12176321 and 18445777, and the one that you found (HERC2/OCA2). I think I should be able to illustrate my point now that there are a couple of examples with SNPs in neighboring genes.

    Thanks a lot for your help!
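The kind of text filtering Max describes could look roughly like the sketch below, written in Python rather than awk and perl. It assumes the downloaded file is plain text with one entry per block and that entries are separated by a marker line; the `*RECORD*` delimiter used here is an assumption about the file layout and should be checked against the actual download. The function names and the sample text are hypothetical.

```python
import re


def split_records(text, delimiter="*RECORD*"):
    """Split a flat text dump into entries on an assumed delimiter line."""
    return [chunk.strip() for chunk in text.split(delimiter) if chunk.strip()]


def records_mentioning(text, words, delimiter="*RECORD*"):
    """Return entries that contain every word, case-insensitively.

    Words are matched as prefixes, so "snp" also matches "SNPs".
    """
    patterns = [
        re.compile(r"\b" + re.escape(word), re.IGNORECASE) for word in words
    ]
    return [
        record
        for record in split_records(text, delimiter)
        if all(pattern.search(record) for pattern in patterns)
    ]


# Made-up sample text, not real OMIM content.
SAMPLE = """*RECORD*
*FIELD* NO
1
*FIELD* TX
A SNP located upstream of the gene alters its expression.
*RECORD*
*FIELD* NO
2
*FIELD* TX
No variants are described for this entry.
"""

hits = records_mentioning(SAMPLE, ["snp", "upstream"])
print(len(hits))
```

For a real file, the text would be read with `open(path, encoding="utf-8").read()` and passed in the same way. The hits still need to be read by hand, as Max did, because a record that mentions both words does not necessarily describe a SNP acting on a neighboring gene.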

  12. Jennifer

    Hey, I’m glad we were able to help, Max. Thanks for letting us know it worked out!

    And on the filters being buried – NCBI is currently overhauling a lot of their user interfaces (for example PubMed’s new Advanced Search interface http://www.ncbi.nlm.nih.gov/pubmed/advanced is wonderful & quite easy to use), so you may get your wish one day soon…

  13. Mesk


    I have a list of human SNPs (some of which are not in dbSNP) with their build 36 coordinates. I’m interested in finding out which of these correspond to missense mutations, and if so in which genes they are found. Can you suggest the quickest possible way to input a list of b36 coordinates and get an output of the gene name and affected amino acid position for each one?

  14. Mary

    Hi Mesk–

    There are a couple of places you could try to do that. You could try BioMart: http://www.biomart.org/

    But I would recommend the UCSC Table Browser (http://genome.ucsc.edu/) for that. If you aren’t familiar with it, watch our tutorial on that here: http://openhelix.com/ucsc And the last exercise in that tutorial suite (download the exercises pdf) is a step-by-step example of starting with a list of genes and getting to what you want. Of course you would replace that with your SNP locations instead. But the same principles would apply–you should be able to replace genes with positions (use the “define regions” button) in the table browser input, and pull out the other data from the linked tables.

    Another option would be Galaxy. But we don’t have the exercise to demonstrate that yet… http://galaxy.psu.edu/ It can pull the data, you can process it for what you want, and analyze more from there.
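For the Table Browser route, a list of single-base SNP positions first has to be turned into regions. A minimal sketch, assuming the “define regions” box accepts three-column BED lines (chromosome, start, end) and that the input coordinates are 1-based, as build 36 SNP positions usually are. BED intervals are 0-based and half-open, so a SNP at position p becomes the interval from p - 1 to p. The function name and the example coordinates are made up for illustration.

```python
def snp_positions_to_bed(positions):
    """Convert (chromosome, 1-based position) pairs to 3-column BED lines."""
    lines = []
    for chrom, pos in positions:
        if pos < 1:
            raise ValueError(f"position must be 1-based and positive: {pos}")
        # UCSC-style chromosome names carry a "chr" prefix.
        name = chrom if chrom.startswith("chr") else "chr" + chrom
        # BED is 0-based, half-open: base p is the interval [p - 1, p).
        lines.append(f"{name}\t{pos - 1}\t{pos}")
    return "\n".join(lines)


# Hypothetical coordinates, not real SNPs.
example = snp_positions_to_bed([("1", 1000), ("chrX", 1)])
print(example)
```

The output can be pasted into the region box or saved to a file. The assembly selected in the browser has to match the coordinates: NCBI build 36 corresponds to the hg18 assembly at UCSC.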

  15. Max

    Hi, just for completeness and for the occasional Google users passing by: I just found an article about the LCT mutation in a huge pile on my desk, and it seems that the mutation in the MCM6 gene is not the only reason for the phenotype (pmid 12914565), as expected.

Comments are closed.