Oryza sativa (ssp. japonica), one of the most important food crops, was among the first plants to have its genome sequenced, which has greatly facilitated genetic and physiological research in agriculture and plant biology. However, gene annotation in the short-length range has proved inadequate for many plant genomes. This is especially true for small secreted peptides, which are involved in diverse physiological processes such as stress response, flowering and hormone signaling. Studies have shown that the number of small secreted proteins has been underestimated. Because rice (Oryza sativa ssp. japonica) is both an economically important crop and a model plant, addressing this issue in rice is a top priority for us.
We developed OrysPSSP, a comprehensive comparative platform for plant biologists. It provides genome-scale data on small secreted proteins (SSPs) of 25 to 250 amino acids, integrated with a variety of search tools, validation functions and comparative resources. The current official release (v0530) contains a complete set of 101,048 candidates. About two-thirds of them (67,559) are located in un-annotated genomic regions, while the remaining 33,489 fall within known genes. For each candidate, users are provided with its chromosomal location, peptide sequence and domain(s), organelle location, gene annotation and neighboring genes. Validation against different data sets showed that 33,350 candidates are supported by tiling array data, 9,431 by RNA-seq data and 18,353 by mass spectrometry data. When comparing rice with 25 other green plants across the phylogeny, we found that the number of SSPs conserved between rice and another plant generally decreased as the evolutionary distance between them increased.
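To make the per-candidate fields and the multi-source validation concrete, the sketch below models one candidate as a Python dataclass and tallies how many candidates each data type supports. It is a minimal sketch under stated assumptions: the class name `SSPCandidate`, its field names, the toy identifiers, and the representation of validation support as a set of labels are all hypothetical and do not describe OrysPSSP's actual schema. Only the 25 to 250 aa length window, the listed per-candidate attributes, and the three data-type labels (tilingArray, transcriptomics, proteomics) come from this document.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Candidate length window stated in the text, in amino acids (inclusive).
MIN_LEN, MAX_LEN = 25, 250

# The three validation data types, labeled with the datatype names of the
# validation tool. Using these labels as set members is an assumption.
EVIDENCE_TYPES = ("tilingArray", "transcriptomics", "proteomics")


@dataclass
class SSPCandidate:
    """Hypothetical record for one small secreted protein candidate."""
    candidate_id: str
    chromosome: str
    start: int                             # genomic start coordinate
    end: int                               # genomic end coordinate
    peptide: str                           # amino-acid sequence
    domains: list[str] = field(default_factory=list)
    organelle: str = "unknown"             # predicted organelle location
    gene: Optional[str] = None             # overlapping known gene; None if un-annotated region
    neighbors: list[str] = field(default_factory=list)
    evidence: set[str] = field(default_factory=set)

    def in_length_range(self) -> bool:
        """True if the peptide length falls inside the 25-250 aa window."""
        return MIN_LEN <= len(self.peptide) <= MAX_LEN

    def in_known_gene(self) -> bool:
        """True if the candidate lies within an annotated gene."""
        return self.gene is not None


def evidence_summary(candidates: list[SSPCandidate]) -> dict[str, int]:
    """Count candidates supported by each data type; one candidate may count under several."""
    counts = Counter()
    for cand in candidates:
        for label in cand.evidence:
            if label in EVIDENCE_TYPES:
                counts[label] += 1
    return {label: counts[label] for label in EVIDENCE_TYPES}


# Toy usage with invented identifiers (not real OrysPSSP entries).
cands = [
    SSPCandidate("cand_A", "chr1", 1000, 1182, "M" + "A" * 59,
                 evidence={"transcriptomics", "proteomics"}),
    SSPCandidate("cand_B", "chr2", 5000, 5092, "M" + "K" * 29,
                 gene="gene_X", evidence={"tilingArray"}),
]
print(evidence_summary(cands))  # {'tilingArray': 1, 'transcriptomics': 1, 'proteomics': 1}
```

Because a single candidate can carry support from more than one data type, per-type counts such as those reported for the current release need not add up to the number of validated candidates.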
On top of the curated SSP data for rice, we developed a number of tools to help rice scientists and plant biologists obtain (sub)datasets relevant to their fields of study. Users can view the distribution of SSPs on the rice chromosomes and browse the data by chromosome. Alternatively, they can search for SSP genes and retrieve data by applying one or more filter parameters, e.g., gene keyword, domain name, chromosome location and annotation status. A BLAST tool is also provided to find SSPs that map to users' query sequences, which can be of three types: genomic sequence (DNA), mRNA sequence (mRNA) or protein sequence (Protein). In our test releases, users found the validation tool supported by our database to be the most important function. We currently offer three separate data types, tilingArray, transcriptomics and proteomics (all from publicly available data sources), to validate and filter SSP candidates. We also built a comparative genomics tool for a comprehensive analysis of SSP conservation across 26 green plants, integrating genome information from 25 plant species in addition to Oryza sativa ssp. japonica. Comparison across the phylogeny can yield insight into the occurrence and evolution of SSPs in green plants.
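The combined search and validation filtering described above amounts to keeping only the candidates that satisfy every selected criterion. The sketch below illustrates this under stated assumptions: records are plain dictionaries with hypothetical keys and invented values, filters are combined with logical AND, and the `require_all` switch (support from all selected data types versus any of them) is our own illustrative choice. The mapping from query type to a BLAST+ program (`blastx` for DNA or mRNA queries against a protein database, `blastp` for protein queries) and the database name `oryspssp_ssp` are likewise assumptions, not a description of how OrysPSSP is implemented; the function only builds a command line and does not run BLAST.

```python
from __future__ import annotations

# Hypothetical candidate records; keys and values are invented for illustration.
RECORDS = [
    {"id": "cand_A", "chrom": "chr1", "start": 1000, "end": 1182,
     "domains": [], "gene": None, "annotation": "",
     "evidence": {"tilingArray", "transcriptomics"}},
    {"id": "cand_B", "chrom": "chr1", "start": 50000, "end": 50092,
     "domains": ["Cys-rich"], "gene": "gene_X",
     "annotation": "putative defensin-like protein",
     "evidence": {"proteomics"}},
    {"id": "cand_C", "chrom": "chr2", "start": 2000, "end": 2300,
     "domains": [], "gene": None, "annotation": "",
     "evidence": {"transcriptomics", "proteomics"}},
]


def search(records, keyword=None, domain=None, chrom=None, region=None,
           annotated=None, evidence=(), require_all=True):
    """Return records that pass every filter that is set; unset filters are ignored."""
    wanted = set(evidence)
    hits = []
    for r in records:
        # Case-insensitive keyword match against the gene annotation text.
        if keyword is not None and keyword.lower() not in r["annotation"].lower():
            continue
        if domain is not None and domain not in r["domains"]:
            continue
        if chrom is not None and r["chrom"] != chrom:
            continue
        # region = (lo, hi): keep candidates that overlap the interval.
        if region is not None and (r["end"] < region[0] or r["start"] > region[1]):
            continue
        # annotated=True keeps candidates in known genes; False keeps un-annotated ones.
        if annotated is not None and (r["gene"] is not None) != annotated:
            continue
        # Validation: require support from all (or, if require_all is False,
        # at least one) of the selected data types.
        if wanted:
            support = wanted & r["evidence"]
            if (require_all and support != wanted) or not support:
                continue
        hits.append(r)
    return hits


# Assumed mapping from query type to a BLAST+ program for a protein database
# of SSP sequences: translated nucleotide query for DNA and mRNA, protein-protein for Protein.
BLAST_PROGRAM = {"DNA": "blastx", "mRNA": "blastx", "Protein": "blastp"}


def blast_command(query_type, query_fasta, db="oryspssp_ssp", evalue=1e-5):
    """Build (but do not run) a BLAST+ command line; the db name is hypothetical."""
    program = BLAST_PROGRAM[query_type]
    return [program, "-query", query_fasta, "-db", db,
            "-evalue", str(evalue), "-outfmt", "6"]


print([r["id"] for r in search(RECORDS, chrom="chr1", annotated=False,
                               evidence=["transcriptomics"])])  # ['cand_A']
print(blast_command("mRNA", "query.fa"))
```

Leaving a filter unset simply skips that check, so users can apply any one or more parameters, as the search tool allows.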