This page serves as the viewer for the assembled EST sequences and finished cDNA sequences in our pig full-length-enriched cDNA libraries. Search results provide information on similarity with UniGene clusters (human, mouse, cattle, and pig), sequences in RefSeq (human, mouse, dog, cattle, and pig), and human genome sequences, and, for EST assemblies, on cSNPs (SNPs in cDNA sequences). In addition, the result pages give the coverage of the open reading frame (ORF) for the assembled sequences and clones. Data can be searched by clone name, assembly name, properties of cDNA libraries, keywords in BLAST search results, gene symbol, and information on human gene loci.
Items with a check box are enabled during the search only when they are checked.
You can select the dataset to search. Datasets for EST assemblies and/or finished cDNA sequences can be selected.
This search is by assembly name. Names are determined by the following rule:
[Release name][Contig or Singlet]-[Cluster number]
Examples: 20030531C-000305 20030430S-000028
A release name consists of 8 digits describing the date of assembly. Contigs and singlets are distinguished by the one-letter code "C" or "S". Cluster numbers have 6 digits and start at 000001.
You should use this item only when you know the cluster name(s) for your sequence(s) of interest.
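The naming rule above can be checked mechanically before a name is submitted. The following is a minimal sketch in Python, not part of this site; the function name `parse_assembly_name` and the returned field names are hypothetical, and the pattern is inferred only from the rule and the two examples given above.

```python
import re
from typing import NamedTuple

# Pattern inferred from the stated rule: 8-digit release name,
# "C" (contig) or "S" (singlet), a hyphen, and a 6-digit cluster number.
ASSEMBLY_NAME = re.compile(r"(\d{8})([CS])-(\d{6})")


class AssemblyName(NamedTuple):
    release: str   # 8 digits describing the date of assembly
    kind: str      # "contig" or "singlet"
    cluster: int   # cluster number, starting at 1


def parse_assembly_name(name: str) -> AssemblyName:
    """Split an assembly name into its parts, or raise ValueError."""
    match = ASSEMBLY_NAME.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a valid assembly name: {name!r}")
    release, code, cluster = match.groups()
    if int(cluster) < 1:
        # Cluster numbers start at 000001 according to the rule.
        raise ValueError(f"cluster number must start at 000001: {name!r}")
    kind = "contig" if code == "C" else "singlet"
    return AssemblyName(release, kind, int(cluster))


if __name__ == "__main__":
    for example in ("20030531C-000305", "20030430S-000028"):
        print(parse_assembly_name(example))
```

The release name is kept as a string because the text does not state the order of the date fields.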
This search yields the contigs and singlets derived from the reads from a particular library.
This search provides the matched clone itself or the assembly that contains the selected clone. Read names are defined as follows:
[Library name]_[Plate name]_[Well]
Examples: LNG01_0086_A07 OVRM1_0032_G11
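Because the library name is the first field of a read name, a list of read names can be grouped or filtered by library offline. The sketch below is illustrative only: `parse_read_name` and `reads_from_library` are hypothetical helpers, and splitting from the right assumes that the plate name and well never contain an underscore, which holds for the two examples but is not stated by the rule.

```python
from typing import Iterable, List, NamedTuple


class ReadName(NamedTuple):
    library: str
    plate: str
    well: str


def parse_read_name(name: str) -> ReadName:
    """Split '[Library name]_[Plate name]_[Well]' into its three fields."""
    # Split from the right so that a library name containing "_"
    # (a hypothetical case) would stay in one piece.
    parts = name.strip().rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid read name: {name!r}")
    return ReadName(*parts)


def reads_from_library(names: Iterable[str], library: str) -> List[str]:
    """Return the read names that belong to the given library."""
    return [n for n in names if parse_read_name(n).library == library]


if __name__ == "__main__":
    reads = ["LNG01_0086_A07", "OVRM1_0032_G11"]
    print(parse_read_name(reads[0]))
    print(reads_from_library(reads, "OVRM1"))
```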
This search is based on the results of BLAST searches of the clone and assembly sequences in the database, performed by using NCBI RefSeq and UniGene data as queries. Human, mouse, cattle, and pig UniGene clusters and human, mouse, dog, cattle, and pig translated RefSeq sequences are used in the search, and a RefSeq or UniGene database can be selected to limit the results to a particular species. The following items can be used as queries:
- RefSeq Gene ID / UniGene ID
- Accession number of the RefSeq protein / UniGene nucleotide sequence
- Keyword(s) (multiple words can be entered by separating them with spaces)
- Gene symbol of the gene (for RefSeq)
- Chromosome where loci of the corresponding genes with high similarity are localized (for RefSeq)
- BLAST score threshold for the homology search (this parameter affects the results of the keyword or locus name and chromosome searches)
Clones and assemblies in the results can be restricted to those estimated as full-length CDSs according to their counterparts in the UniGene clusters or RefSeq sequences (see below). A search based on the results of BLAST searches using human and mouse RefSeq protein sequences in NCBI is also available.
This option limits the search to assemblies with putative SNP(s) within their cDNA sequences. Because reverse transcriptase has a high error rate, single-base mutations occur frequently and complicate the identification of actual SNPs. Therefore, a mutant allele must occur in at least 2 reads for the site to be considered a putative SNP. These putative SNPs have been partly confirmed in porcine genomic sequences, and their distribution in various breeds has been investigated (see Supplemental figures and tables).
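The two-read rule can be illustrated on a single column of an alignment of reads. This is a sketch of one reading of the rule, not the pipeline actually used to build the database: it assumes that a site counts as a putative SNP when at least two different bases are each seen in at least 2 reads. The function name `putative_snp_alleles` and the one-character-per-read input are hypothetical.

```python
from collections import Counter
from typing import List

MIN_READS_PER_ALLELE = 2  # threshold stated in the text


def putative_snp_alleles(column: str,
                         min_reads: int = MIN_READS_PER_ALLELE) -> List[str]:
    """Return the supported alleles at one alignment column.

    `column` holds one character per read (hypothetical input format).
    Gaps and ambiguous bases are ignored. An empty list means the column
    is not a putative SNP under the assumed reading of the rule.
    """
    counts = Counter(base.upper() for base in column if base.upper() in "ACGT")
    supported = sorted(b for b, n in counts.items() if n >= min_reads)
    # A SNP needs at least two alleles that both pass the read threshold.
    return supported if len(supported) >= 2 else []


if __name__ == "__main__":
    print(putative_snp_alleles("AAAAG"))  # G is seen in one read only
    print(putative_snp_alleles("AAAGG"))  # both A and G are seen twice or more
```

A variant seen in a single read is discarded, which is how the rule guards against isolated reverse-transcription errors.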
For finished cDNA sequences in RefSeq search results, we roughly estimated a cDNA clone as encoding a full-length CDS if the head-to-tail length of all of the matches (BLAST score > 50) in the ORF of the clone was between 67% and 150% of the CDS length of the matched reference gene. For finished cDNA sequences in UniGene searches and for EST assemblies, we estimated a clone or assembly as encoding a full-length CDS if the length upstream of the matches (BLAST score > 50) in the clone or assembly was longer than the length between the start base of the CDS and the matched region of the corresponding gene.
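The two estimates can be written as small predicates. The sketch below is an illustration under stated assumptions rather than the code used for the database: coordinates are taken to be 1-based and inclusive, all lengths are assumed to be in the same units, the 67% and 150% limits are treated as inclusive, and the function names and the `(start, end, score)` hit format are hypothetical.

```python
from typing import Iterable, Tuple

Hit = Tuple[int, int, float]  # (start, end, BLAST score) on the clone; hypothetical format


def match_span(hits: Iterable[Hit], min_score: float = 50) -> int:
    """Head-to-tail length covered by all hits with a score above min_score."""
    kept = [(start, end) for start, end, score in hits if score > min_score]
    if not kept:
        return 0
    return max(end for _, end in kept) - min(start for start, _ in kept) + 1


def full_length_by_span(hits: Iterable[Hit], ref_cds_length: int,
                        low: float = 0.67, high: float = 1.50) -> bool:
    """First rule: the matched span is 67% to 150% of the reference CDS length."""
    span = match_span(hits)
    return low * ref_cds_length <= span <= high * ref_cds_length


def full_length_by_upstream(query_match_start: int, ref_match_start: int,
                            ref_cds_start: int) -> bool:
    """Second rule: sequence upstream of the match in the clone or assembly
    is longer than the distance from the CDS start to the matched region
    in the corresponding gene."""
    upstream_in_query = query_match_start - 1
    cds_start_to_match = ref_match_start - ref_cds_start
    return upstream_in_query > cds_start_to_match


if __name__ == "__main__":
    hits = [(101, 400, 120.0), (450, 900, 85.0), (950, 2000, 30.0)]
    print(match_span(hits))                 # the low-scoring third hit is ignored
    print(full_length_by_span(hits, 1000))
    print(full_length_by_upstream(120, 100, 50))
```

In the first rule the lower limit rejects clones that cover only part of the CDS, and the upper limit rejects matches that extend implausibly far beyond it.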