Eh, enter your own damn data….

I was looking over the EurekAlert announcements and came across one that I have been percolating on for some time now. It is an effort I fully support and encourage. But I worry about a few aspects of it. The alert is entitled “Controlling a sea of information.” The Arabidopsis Information Resource (TAIR) has partnered with the journal Plant Physiology to ensure data from Plant Physiology papers will get into the TAIR database. The longer story is available from the alert and from the associated Editorial. The short story is: there aren’t enough curators to keep up with all the data coming out. This prevents a lot of information from getting into the databases. The TAIR and PlantPhysiol folks have teamed up to create a way for the authors themselves to get this information into TAIR with a simple form.

Now, I would use the form, and I would submit, and I would try to do it to meet standards and with accuracy. But I have had lots of colleagues over the years who would have blown that off. So I hope there is compliance, but….we’ll see. Over and over in bioinformatics there have been efforts at “community annotation” and “community submission” which have had mixed outcomes–both in adoption and quality. There are certainly some advocates and early adopters. Everybody wants great data in the databases when they go and look for it. But for the authors/submitters–it can be a time sink and a nuisance–without any benefit such as a CV item that the tenure committee will consider.

I even had one colleague who had submitted a gene to GenBank years ago and accidentally had a typo in the gene name. He knew he could correct it–but he preferred not to, because he knew that if he used the typo as a search he would always get his submission and not have to look through all the other ones.

Professional curation staff are so important for focus, volume, and quality control. They are serious and have such a great understanding of why the correct symbols, Gene Ontology terms, experimental descriptions, etc., are crucial to store in the databases. But some scientists complain about the funding that goes to database projects, because those projects compete with the bench researchers for scarcer and scarcer resources.

I wish this project great success. I would encourage people to take this seriously if you want good quality information in the databases. Community efforts are a great idea. But professional staff at these resources are still crucial for obtaining the quality we need in the databases. And for some efforts there really needs to be a committed group with training, standards, institutional memory and teamwork. I hope people support and appreciate curators as well.

Perhaps we’ll get a report on the TAIR/PlantPhysiol project at the next Biocurators meeting–we’ll be there, we’ll be looking forward to hearing about it!

Editorial: Plant Physiology 146:1022-1023 (2008)

3 thoughts on “Eh, enter your own damn data….”

  1. Eva Huala

    One slight clarification – the data in the form will still be reviewed by professional curators, but having the data already somewhat structured and distilled within the form should cut down on the curator time needed.

  2. Mary

    Hi Eva–

    Oh, I’m sure it still needs review. But I’m almost worried that will take longer than a professional curator taking the whole paper from the beginning. When I was at Jax we did a clean up on a free text tissue field…oy. And I saw all that free text in the form.

    I was also remembering a time when someone threatened to sue because curators had changed their pet gene name. Proper nomenclature is just not a top priority for a lot of people.

    I really hope it works, and works well. I know the volume of data is huge. Maybe I just haven’t been hanging around with the right people when I wasn’t working at databases :)

  3. Pingback: Wikification of Genbank | The OpenHelix Blog