Technical Report
There are many sections to this document. Read below to find the relevant part.
Files that need to be moved to a portable server.
Make sure CVS and Ant are installed on the new server.
Things to copy for a new server
webvibe/public_html/Client/
*.structure
demo.html, demo.php
blank.html
webvibe-client.jar
/corejava
public_html/help
/FamilyDef
/GenusDef
/SpeciesDef
/FNA_XML
/bflyxml
/cgi-bin
/images
put the webvibe server in
webvibe/~bin/webvibe-server.jar
Edit demo.php and demo.html to point to the correct server and index files.
BIBEClient and BIBEServer
swish-x
mysql -
Thesaurus
Network Services
Apache
Tomcat
Soap
Mozilla (to be used with VNC)
Illinois FNA subset.
Wireless Networking on
iPAQ PocketPC
Handspring PalmOS
Vaio - Windows/me
VNC
Server on TeleBotanistServer
Clients
iPAQ PocketPC
Handspring PalmOS
Vaio - Windows/me
News Client
TeleBotanistServer
iPAQ PocketPC
Handspring PalmOS
Vaio - Windows/me
Spider
In order to index files on the web you need to use a web spider.
The swishspider requires a number of Perl libraries. These are listed below:
http://search.cpan.org/search?dist=Compress-Zlib
http://search.cpan.org/search?dist=libnet
http://search.cpan.org/search?mode=module&query=libnet
Install LWP From http://search.cpan.org/search?dist=LWP
http://search.cpan.org/search?mode=module&query=tagset
Swish-ex indexing
Differences between swish-e and swish-ex
Functional Differences
Programming Differences
How to Compile Swish-ex
(Insert Hong's documentation here)
Using Swish-ex
The executable is available at /home/webvibe/bin/swishe2.0beta3HCrTxtXml/src/swish-ex
Configuration File
Making an index
Using an index
Index file = /home/webvibe/index/FNAIndexWDictWithHongSwish
swish-ex -f /home/webvibe/index/FNAIndexWDictWithHongSwish -w '"sepals dark"'
returns for example
http://www.canis.uiuc.edu/~webvibe/GenusDef/g_DELPHINIUM.html "g_DELPHINIUM.html
where sepals and dark are not adjacent.
swish-ex -f /home/webvibe/index/butterfly.index -w '"marginal spots"'
# SWISH format 2.0
# Search words: " marginal spots "
# Number of hits: 4
159 /home/webvibe/public_html/bflyxml/8Phoebis_sennae_m.xml "8Phoebis_sennae_m.xml" 1941
159 /home/webvibe/public_html/bflyxml/7Phoebis_sennae_f.xml "7Phoebis_sennae_f.xml" 1942
141 /home/webvibe/public_html/bflyxml/21Speyeria_idalia_f.xml "21Speyeria_idalia_f.xml" 2015
121 /home/webvibe/public_html/bflyxml/22Speyeria_idalia_m.xml "22Speyeria_idalia_m.xml" 2116
In xml searches, phrases may be searched as in the following example.
/home/webvibe/users/hong/swish-ex/src/swish-e -T /home/webvibe/public_html/FNA_XML/FNA.dtd -w '"<fna><description>flowers solitary"' -f ~webvibe/index/fnaxml.index
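Each hit in the butterfly search output above gives a rank, a path, a quoted title and a trailing number. A minimal Python sketch of how a client might read such output follows. The function name parse_swish_hits is hypothetical, and treating the trailing number as a file size is an assumption, not something the swish-ex documentation here confirms.

```python
import re

# rank, path or URL, quoted title, trailing integer (assumed to be a size)
HIT = re.compile(r'(\d+)\s+(\S+)\s+"([^"]*)"\s+(\d+)')

def parse_swish_hits(output):
    """Return one dict per hit found in swish-style result text."""
    lines = [ln.strip() for ln in output.splitlines()]
    # Drop "#" header lines and join the rest, so a hit that was
    # wrapped over two lines is still matched as one record.
    body = " ".join(ln for ln in lines if ln and not ln.startswith("#"))
    return [
        {"rank": int(rank), "path": path, "title": title, "size": int(size)}
        for rank, path, title, size in HIT.findall(body)
    ]

if __name__ == "__main__":
    sample = (
        "# SWISH format 2.0\n"
        "# Number of hits: 1\n"
        "159 /home/webvibe/public_html/bflyxml/8Phoebis_sennae_m.xml\n"
        '"8Phoebis_sennae_m.xml" 1941\n'
    )
    for hit in parse_swish_hits(sample):
        print(hit["rank"], hit["title"])
```

Paths that contain spaces would not be matched by this pattern; the sketch assumes paths like the ones shown above.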
Structure of the Swish-ex XML
Data structure of swish-ex in indexing and searching xml documents:
Here is an example DTD for article
<!ELEMENT Article (Title, Author, Abstract)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (FirstName, LastName)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT Abstract (#PCDATA)>
The data structure used for representing this dtd looks like:
Article Hash                 Author Hash
Title     0                  FirstName  10
Author    1  ------------>   LastName   11
Abstract  2
Description of the data structure:
Every element that has sub-elements is represented as a hash, for example, element Article is a hash with three key-value pairs; element Author is a hash with two key-value pairs. The author hash is linked to Author cell in Article hash because it is a sub-element of Article element.
The keys are called tagnames. Each tagname has a number associated with it. The numbers are prefixed with their parent elements’ numbers. For example, FirstName has a number 10, where 1 is Author’s number in Article Hash and 0 is the order it gets in Author Hash.
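The numbering scheme described above can be sketched in Python with nested dictionaries standing in for the hashes. This is only an illustration of the idea, not the swish-ex implementation (which is in C); the function name number_tagnames and the use of None for leaf elements are assumptions made for the example.

```python
def number_tagnames(element, prefix=""):
    """Map each tagname to its number.

    A child's number is its parent's number followed by the child's
    order within the parent's hash.
    """
    numbers = {}
    for order, (tagname, children) in enumerate(element.items()):
        number = f"{prefix}{order}"
        numbers[tagname] = number
        if children:  # an element with sub-elements is itself a hash
            numbers.update(number_tagnames(children, number))
    return numbers

# The Article DTD from the example; None marks an element with no sub-elements.
article = {
    "Title": None,
    "Author": {"FirstName": None, "LastName": None},
    "Abstract": None,
}

if __name__ == "__main__":
    for tagname, number in number_tagnames(article).items():
        print(tagname, number)
```

Plain concatenation becomes ambiguous once a hash has ten or more entries; a separator between levels, as in the 1:3:2:1:0:0 style keys used elsewhere in this document, avoids that.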
Structure of the Glossary
There are several glossaries in this directory including the FNA Glossary
pwd /home/webvibe/FNA/Glossary
Searching MySQL
mysql -Dvibe -uvibeuser -pthe_password_that_the_staff_know
(You thought I would give it to you, didn't you?)
show tables;
last updated by Karen Medina, 1/3/2003
Selecting Definitions for the Glossary
Species Plantarum
Processing of the text file.
The file was delivered by Anthony R. Brach in a word document.
This was saved as a text file and copied to the Unix directory ~webvibe/FNA/Glossary under the name SpeciesPlanarumGlossary.txt.
The first ";" was converted to a tab with vi.
All occurrences of "'" (single quote) were converted to "\'" (backslash quote). This is needed for MySQL.
The program ~webvibe/FNA/Glossary/GlossaryDefinition/InsertSpeciesPlantarum.pl was used to save all of the terms and definitions in a table called species_plantarum_def.
There are 972 definitions in the file.
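The two text conversions described above (first ";" to a tab, single quotes escaped) were done by hand in vi. A small Python sketch of the same transformation is shown here for reference. It assumes each line of the text file has the form "term; definition", which the source does not state explicitly, and the function name prepare_glossary_line is hypothetical.

```python
def prepare_glossary_line(line):
    """Convert 'term; definition' to 'term<TAB> definition' with quotes escaped."""
    # Only the first ";" separates term from definition; later ones are kept.
    line = line.rstrip("\n").replace(";", "\t", 1)
    # Escape single quotes as \' so the text can sit inside an SQL string literal.
    return line.replace("'", "\\'")

if __name__ == "__main__":
    print(prepare_glossary_line("abaxial; away from the plant's axis; lower\n"))
```

If the insert script uses parameterized queries, the database driver handles quoting and the manual escaping step is not needed.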
How to Compile the Java Server
1) Make a directory called "src" where you wish to keep the development code
# mkdir src
2) Change to that directory
# cd src
3) Copy the latest version of the source out of the cvs source code control system. This will create all of the directories with the source code.
# cvs -d /home/webvibe/cvsroot co webvibe
Your output should look like:
----------------------------
soldev:src 504 $ cvs -d /home/webvibe/cvsroot co webvibe
cvs checkout: Updating webvibe
U webvibe/build.xml
cvs checkout: Updating webvibe/conf
U webvibe/conf/databases.xml
U webvibe/conf/marys.xml
cvs checkout: Updating webvibe/docs
cvs checkout: Updating webvibe/docs/newsearchxml
U webvibe/docs/newsearchxml/database field rename.xml
U webvibe/docs/newsearchxml/database.xml
U webvibe/docs/newsearchxml/dataset.xml
U webvibe/docs/newsearchxml/message-query.xml
U webvibe/docs/newsearchxml/message-queryformat.xml
cvs checkout: Updating webvibe/jar
U webvibe/jar/crimson.jar
U webvibe/jar/jaxp.jar
U webvibe/jar/jdbc-mysql.jar
.....
----------------------------
4) Change to the webvibe directory
# cd webvibe
5) Run "ant" to compile the source code and build jar files and store them in webvibe/dist. There will be one for server, client and test. Included jar files are in the webvibe/jar directory.
# ant
Your output should look like:
----------------------------
Buildfile: build.xml
init:
[mkdir] Created dir: /home/webvibe/src/webvibe/build
[mkdir] Created dir: /home/webvibe/src/webvibe/dist
[mkdir] Created dir: /home/webvibe/src/webvibe/docs/api
compile:
[javac] Compiling 89 source files to /home/webvibe/src/webvibe/build
[javac] Note: 8 files use or override a deprecated API. Recompile with "-deprecation" for details.
[javac] 1 warning
dist:
[jar] Building jar: /home/webvibe/src/webvibe/dist/webvibe-server.jar
[jar] Note: creating empty jar archive /home/webvibe/src/webvibe/dist/webvibe-tests.jar
[jar] Building jar: /home/webvibe/src/webvibe/dist/webvibe-client.jar
BUILD SUCCESSFUL
Total time: 43 seconds
----------------------------
6) To edit a file, move to the appropriate directory such as server
# cd /home/webvibe/src/webvibe/src/webvibe/server
edit the file
When you are confident of the changes, they should be added back into cvs with the command
# cvs commit -m "Type a reason for this commit and what you changed"
To recompile go back to the directory containing the build.xml file and run ant again.
# ant
7) To add a new file to the source, create the file, and begin it with a line defining its package. For example, a server file should have the line
package webvibe.server;
Follow the example of other files in the directory where you wish to create the new file.
Add the new file to cvs
# cvs add <filename>
Running the server
start the test servers with "ant run 8016"
To run on another port, copy ~src/webvibe/dist/webvibe-server.jar to where you want it to reside.
From the command line it can be executed with
java -classpath /usr/java/lib/mysql_comp.jar:/home/webvibe/bin/server/webvibe-server.jar webvibe.server.Server1 8016 >> $HOME/logs/cron8016.log%
Query Interface
At the moment, it looks like we will be sticking with Swing. If you look at the interface at http://www.canis.uiuc.edu/~webvibe/Client/demo.php or http://soldev.isrl.uiuc.edu/~webvibe/Client/demo.php and pick the EcoWatch butterfly collection, you will see a DTD structure tree on the right-hand side of the query panel. The nodes of that tree are currently editable: click one and a couple of seconds later you can edit it. So, for example, you can open the Background Color node and set it to blue. Jingbo is adding a new function so that if you right-click (reverse click) some terminal nodes you will get a new window with a set of choices.
For example: you might get a window with a set of the eight primary colors and the user could pick one (or two?). Where you come in is the list of items that is in the window. It might be a text list but in the more general case it should be a set of display elements and associated replies.
A choice table might look like
Choice.dtd
<!-- DTD for Choice DISPLAY LISTS -->
<!ELEMENT Choice (DisplayTuple+)>
<!ATTLIST Choice Type (radio | checkbox) "checkbox">
<!ELEMENT DisplayTuple (Reply, DisplayText, DisplayImage?)>
<!ELEMENT Reply (#PCDATA)>
<!ELEMENT DisplayText (#PCDATA)>
<!ELEMENT DisplayImage (URL)>
<!ELEMENT URL (#PCDATA)>
color.xml
<?xml version="1.0"?>
<!DOCTYPE Choice SYSTEM "http://www.canis.uiuc.edu/~webvibe/choices/choice.dtd">
<XSL processor "http:// www.canis.uiuc.edu/~webvibe/choices/choice.xsl">
<Choice>
<DisplayTuple>
<Reply>Red</Reply>
<DisplayText>Red</DisplayText>
<DisplayImage>
<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/red.gif</URL>
</DisplayImage>
</DisplayTuple>
<DisplayTuple>
<Reply>Orange</Reply>
<DisplayText>Orange</DisplayText>
<DisplayImage>
<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/orange.gif</URL>
</DisplayImage>
</DisplayTuple>
<DisplayTuple>
<Reply>Yellow</Reply>
<DisplayText>Yellow</DisplayText>
<DisplayImage>
<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/yellow.gif</URL>
</DisplayImage>
</DisplayTuple>
<DisplayTuple>
<Reply>Green</Reply>
<DisplayText>Green</DisplayText>
<DisplayImage>
<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/green.gif</URL>
</DisplayImage>
</DisplayTuple>
<DisplayTuple>
<Reply>Blue</Reply>
<DisplayText>Blue</DisplayText>
<DisplayImage>
<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/blue.gif</URL>
</DisplayImage>
</DisplayTuple>
</Choice>
These display tuples actually come from the MySQL database. I am open to suggestions. One option is to put a CGI script in the choices directory and have the "path" sent in. This script would query the database and create these XML files as needed. There is, of course, a space-time tradeoff. It might be better to create the XML files only once for any database.
One question is how the XML files are displayed. One option is to use dynamic XSL to generate JavaScript or something else to display in the window. The line <XSL processor "http:// www.canis.uiuc.edu/~webvibe/choices/choice.xsl">, which is not correct syntax, could make the translation from XML to HTML or PHP or whatever needs to be displayed in the choice window. Rather than translating dynamically, we could also process this ahead of time so that the files to be displayed would already exist. I think this would be transparent to the Java code on the client. It would just have a URL that it sticks into a window that it generates. That code would eventually need to return a result string to the calling Java function. If the type was "radio" this would be only one item (e.g. "blue"). If it was a checkbox there might be several choices made by the user (blue or purple). The Java would translate this as a logical OR in the query string.
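As a rough illustration of the flow described above, the following Python sketch reads a choice file and turns the user's replies into an OR expression. It assumes a well-formed version of the choice file in which a Choice root element wraps the tuples and replies are element text. The function names and the " OR " query syntax are assumptions; the real client is Java and its query format is not specified here.

```python
import xml.etree.ElementTree as ET

def load_choices(xml_text):
    """Return (choice_type, list of display tuples) from a choice document."""
    root = ET.fromstring(xml_text)
    # ElementTree does not apply DTD defaults, so supply "checkbox" by hand.
    choice_type = root.get("Type", "checkbox")
    tuples = [
        {
            "reply": t.findtext("Reply"),
            "text": t.findtext("DisplayText"),
            "image": t.findtext("DisplayImage/URL"),  # None if no image
        }
        for t in root.iter("DisplayTuple")
    ]
    return choice_type, tuples

def replies_to_query(choice_type, replies):
    """Combine the selected replies; several checkbox replies become an OR."""
    if choice_type == "radio" and len(replies) != 1:
        raise ValueError("a radio choice returns exactly one reply")
    return " OR ".join(replies)

SAMPLE = """<Choice Type="checkbox">
<DisplayTuple><Reply>blue</Reply><DisplayText>Blue</DisplayText>
<DisplayImage><URL>http://soldev.isrl.uiuc.edu/webvibe/choices/blue.gif</URL></DisplayImage>
</DisplayTuple>
<DisplayTuple><Reply>purple</Reply><DisplayText>Purple</DisplayText></DisplayTuple>
</Choice>"""

if __name__ == "__main__":
    kind, choices = load_choices(SAMPLE)
    print(replies_to_query(kind, [c["reply"] for c in choices]))
```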
Non-XML option
1) From Client/swing, make a SOAP call with the dtd key as the argument:
an RPC with a return of an array of strings;
2) Servlet returns string array in the form
dtd;word;url1[|url#]eol
word;url[|url#]eol
...
example:
FNA.dtd;ovate;http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate1.gif| http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate2.gif(eol)
obovate; http://soldev.isrl.uiuc.edu/webvibe/cimages/obovate1.gif| http://imageserver.ncbg..edu/webvibe/cimages/obovate2.gif|(eol)
3) Client needs to restructure the strings into a form/window to display images and get user reply.
Window types: radio button, check box, int,....
Each type needs a different window processor.
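Step 3 above needs the returned strings split back into words and image URLs. A minimal Python sketch of that restructuring is given here, assuming the end-of-line marker has already been stripped and that ";" and "|" never occur inside a URL. The function name parse_choice_lines is hypothetical; the real client would do this in Java.

```python
def parse_choice_lines(lines):
    """Parse 'dtd;word;url1|url2' and 'word;url1|url2' strings.

    Returns (dtd, {word: [url, ...]}). Only the first line carries the dtd.
    """
    dtd = None
    choices = {}
    for line in lines:
        fields = [field.strip() for field in line.split(";")]
        if len(fields) == 3:
            dtd, word, urls = fields
        elif len(fields) == 2:
            word, urls = fields
        else:
            raise ValueError(f"unexpected line format: {line!r}")
        # Split the URL list and drop empty entries left by a trailing "|".
        choices[word] = [u.strip() for u in urls.split("|") if u.strip()]
    return dtd, choices

if __name__ == "__main__":
    reply = [
        "FNA.dtd;ovate;http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate1.gif| "
        "http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate2.gif",
        "obovate; http://soldev.isrl.uiuc.edu/webvibe/cimages/obovate1.gif|",
    ]
    print(parse_choice_lines(reply))
```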
There is a security issue for certificate to allow access to multiple image servers.
Server constructs the url from the host+file name. Do we need certificate from the image servers to display the *.gif files?
The keys are of the form
BACKGROUND COLOR|BKGROUNDCOLOR 1 3 2 1 0 0
from bflydtd.structure. The key is
1:3:2:1:0:0
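The relation between a structure line and its key can be sketched as follows. The sketch assumes that every line in a *.structure file ends with whitespace-separated integers and that everything before them is the label; only the single example line above is known from the source, and the function name structure_key is hypothetical.

```python
def structure_key(line):
    """Split a structure line into (label, key), e.g. key '1:3:2:1:0:0'."""
    tokens = line.split()
    numbers = []
    # Peel the integers off the right-hand end; the label may contain spaces.
    while tokens and tokens[-1].isdigit():
        numbers.append(tokens.pop())
    numbers.reverse()
    return " ".join(tokens), ":".join(numbers)

if __name__ == "__main__":
    print(structure_key("BACKGROUND COLOR|BKGROUNDCOLOR 1 3 2 1 0 0"))
```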
OpenArchive type servers provide character states and values.
Nightly harvester collects values for storage in central database using keys+word to have url for images. Server id used to identify url of images.
Part of Speech Tagging
Use the Brill tagger in webvibe/etc/tagger
Files may be converted one by one with the following commands run from the directory /home/webvibe/etc/tagger/Bin_and_Data
First tokenizing:
less Species/s_Abies_amabilis.html | tokenize > Species_tokenized/s_Abies_amabilis.tokenized
Then tagging:
tagger LEXICON Species_tokenized/s_Abies_amabilis.tokenized BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE -i Species_tagged/s_Abies_amabilis.tagged
There is a script, /home/webvibe/etc/tagger/Bin_and_Data/TokenAndTag.sh, that will convert all of the html files in /home/webvibe/etc/tagger/Bin_and_Data/Species into tokenized and tagged files in Species_tokenized and Species_tagged respectively.
-------- TokenAndTag.sh ----------------
#!/bin/sh
# TokenAndTag.sh
# Converts all of the html files in Species/ into tokenized and tagged
# files in Species_tokenized/ and Species_tagged/ respectively.
# Run it from /home/webvibe/etc/tagger/Bin_and_Data.
#
for name in Species/*.html
do
# Strip the directory and the .html suffix to get the base file name
basefilename=`basename "$name" .html`
echo "$name"
# Tokenize the html file
tokenize < "$name" > "Species_tokenized/$basefilename.tokenized"
# Tag the tokenized file
tagger LEXICON "Species_tokenized/$basefilename.tokenized" BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE -i "Species_tagged/$basefilename.tagged"
done
-------- TokenAndTag.sh ----------------
Text Extraction
grep -i leaf/NN" "blade *
grep -i Leaves\</B\>/NN *
For each file in the directory:
# Read through the file one line at a time until end of file
read until you find "leaf" or "leaves"
read up to the word "blade"
pick the first adjective (or adverb + adjective) as the shape word; ignore size adjectives (#-#)
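The extraction steps above could be prototyped along the following lines. This Python sketch works on tagger output in word/TAG form and assumes Penn Treebank style tags (JJ for adjectives, RB for adverbs), which is what the Brill tagger normally emits. Taking the first adjective after the word "blade", and stopping at the next leaf part (base, margin, apex), are interpretations of the steps rather than something the source spells out. All names are hypothetical.

```python
PART_WORDS = {"base", "margin", "margins", "apex", "apices"}

def split_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if sep:  # ignore tokens that carry no tag
            pairs.append((word, tag))
    return pairs

def blade_shape(tagged_text):
    """Return the first shape word after 'blade', or None if there is none."""
    pairs = split_tagged(tagged_text)
    i = 0
    # Read until "leaf" or "leaves" (also matches "Leaf", "Leaves").
    while i < len(pairs) and not pairs[i][0].lower().startswith(("leaf", "leaves")):
        i += 1
    # Read up to the word "blade" (or "blades").
    while i < len(pairs) and not pairs[i][0].lower().startswith("blade"):
        i += 1
    adverb = None
    for word, tag in pairs[i + 1:]:
        if word.lower() in PART_WORDS:
            break                      # the next leaf part has started
        if any(ch.isdigit() for ch in word):
            adverb = None              # size expression such as 1-6, skip it
        elif tag == "RB":
            adverb = word              # may modify the adjective that follows
        elif tag == "JJ":
            return f"{adverb} {word}" if adverb else word
        else:
            adverb = None
    return None

if __name__ == "__main__":
    print(blade_shape("Leaf/NN blade/NN narrowly/RB elliptic/JJ ,/, base/NN cuneate/JJ"))
```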
Leaf parts
Blade
Margin
Base
Apex
Examples of forms of Leaf description:
Some start with "Leaves" in bold, some "Leaf Blade" in bold.
s_Halophila_johnsonii.html:
Leaves apparently attached to rhizome, 5–25 × 1–4 mm; blade linear-lanceolate, base cuneate, margins entire, glabrous, apex round to round-acute.
s_Hesperocnide_tenella.html :
Leaf blades 1-6 × 1-4 cm, base broadly cuneate, rounded, or cordate, apex acute or short-acuminate to obtuse tip.
s_Aspidotis_carlotta.html:
Leaves monomorphic or weakly subdimorphic, 10--30 cm. Blade 4-pinnate, 3--12 cm, nearly as wide as long, thin to thick. Ultimate segments narrowly lanceolate to deltate, 2--6 mm; midrib obscure or evident abaxially. Sori of mature blades ± discrete to usually subcontinuous, 3--7(--9) per segment; indusia semicircular to usually elongate and connecting several adjacent sori, margins with 6--10 irregular and prominent teeth and/or lobes.
s_Polystichum_aleuticum.html:
Leaves monomorphic, erect, 1--1.5 dm; bulblets absent. Petiole 1/6--1/4 length of leaf; scales tan, sparse or falling off early. Blade linear-lanceolate, 1-pinnate, gradually tapered to base. Pinnae ± deltate to ovate, slightly overlapping, in 1 plane, 4--8 mm; base truncate, acroscopic auricle well developed; margins denticulate, not spiny; apex rounded, not dentate; microscales linear, lacking projections, dense on both surfaces. Indusia entire to minutely erose-dentate.
s_Ranunculus_triternatus.html:
Basal leaves persistent, blades rhombic to deltate or reniform in outline, 3-4×-dissected, 1.1-3.4 × 2-3.1 cm, segments linear, base obtuse, margins crenate, apices of segments narrowly rounded.
--------------------------------------------------------------
Instructions for identifying examples of shape.
Start www.biobrowser.org
Select search databases
Search on a word
Leaf Blade
3-parted or -divided
deltate
elliptic
narrowly elliptic
narrowly elongate
broadly elongate
elongate
lance-elliptic
lance-ovate
lanceolate
linear
linear- ?
lobed
oblanceolate
obovate
orbiculate
pinnate
reniform
semicircular
Leaf Base
cuneate
Plant Characteristics
Standardize the height to be 150.
If the image is a line drawing save it in gif format. If it is a photo use jpg.
Save the new image to a file named: descriptionword.gif or descriptionword.jpg
If there is more than one example for the same descriptive term, use numbers after the descriptive term, e.g. ovate1.gif, ovate2.gif.
If the description is multiword, replace spaces with the underscore character, e.g. narrowly_elongate.gif
The following information should be associated with each image:
Image File Name: (e.g. dentate.gif)
Image Description: (e.g. Leaf Margin, Dentate)
Publication: (e.g. Flora of North America)
Original URL Document or page from print materials: (e.g. http://www.canis.uiuc.edu/~webvibe/SpeciesDef/s_Thalictrum_minus.html)
Original URL for the image it was extracted from or Figure number for print material: http://www.canis.uiuc.edu/~webvibe/fna_images/plates/I27101384.html
Keep a record of where you got the images. Include the name of the Species and the name of the original file.
Repeat for more images.
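The file-naming rules above (gif for line drawings, jpg for photos, a number for repeated terms, underscores for spaces) can be captured in a small helper. This is only a sketch; the function name image_filename and its parameters are made up for illustration.

```python
def image_filename(term, is_line_drawing, example_number=None):
    """Build an image file name from a descriptive term.

    term            -- descriptive word or phrase, e.g. "narrowly elongate"
    is_line_drawing -- True for line drawings (gif), False for photos (jpg)
    example_number  -- optional number when a term has several examples
    """
    stem = "_".join(term.split())          # spaces become underscores
    if example_number is not None:
        stem += str(example_number)        # ovate -> ovate1, ovate2, ...
    return stem + (".gif" if is_line_drawing else ".jpg")

if __name__ == "__main__":
    print(image_filename("narrowly elongate", True))
    print(image_filename("ovate", True, 2))
```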
You need to copy these files to the proper directory. If you are using Unix in the ISRL, just save them directly to ~webvibe/FNA/newcharacterimages. This is under your home directory.
If you are on a Windows machine in the ISRL, or on another machine anywhere, you need to use secure ftp.
open sftp
copy files / drag and drop.
If ssh is not installed on your machine you can get it at the url below for free http://uiarchive.uiuc.edu/content/PC/Communications_and_Networking/SSH/
Move the gif files to the soldev.isrl.uiuc.edu server using sftp (or other secure ftp).
Place in the directory ~webvibe/FNA/newcharacterimages. This directory is also called
~webvibe/FNA/nci. You can place the files in either one.
There is a data entry form at http://soldev.isrl.uiuc.edu:8080/glossary/index.jsp
Below is a set of instructions for installing SSH
1) Go to ftp://ftp.ssh.com/pub/ssh/SSHSecureShellClient-3.1.1.exe with your web browser.
Say yes and OK to everything! The license is free for Universities and non-profits (there is just a little something about first born children).
2) Run the ssh program. It should be in your desktop and on your program list.
3) Set a profile to point at the OpenKey page so that you will not need to remember what computer this is on.
Push the "Profile" button
Select "Add Profile"
Set these values:
Host name: soldev.isrl.uiuc.edu
User Name: janeg
Click "OK"
4) Now change the default file protection so that things that you place on the web will be readable.
Click "Edit"
Select Settings
Select "File Transfer / Advanced".
The form should have 644 filled in for the "Default File Permission Mask"
Change it to 664. (644 sort of works but other group members will not be able to edit the file.)
Click "OK"
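For readers unfamiliar with the octal masks in step 4, this short Python sketch shows what the two values mean. It uses only the standard stat module; the helper names are made up for the example.

```python
import stat

def describe_mode(mode):
    """Render a permission mask the way 'ls -l' shows it for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)

def group_can_write(mode):
    """True if members of the file's group are allowed to edit the file."""
    return bool(mode & stat.S_IWGRP)

if __name__ == "__main__":
    for mode in (0o644, 0o664):
        print(oct(mode), describe_mode(mode), "group write:", group_can_write(mode))
```

Both masks leave the file readable by everyone, which is what the web server needs; they differ only in the group write bit.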
--------------------------------------
From now on this is all you need to do.
5) Login
Select the profile you just made.
Enter your password
You now have a terminal window.
6) Start the file transfer client
Click on the yellow folder with blue circles on it.
You now have a window that looks like the Windows Explorer.
7) Click your way to the OpenKey folder. You just logged in as Jane. There is a folder in her directory called "openkey" Click that. You are now where you need to place files.
8) Drag and drop files.
To copy a file into the OpenKey website
open Windows Explorer or anything that lists your files for you on your PC.
Click and hold down the mouse button for the file(s) you want to copy.
Drag them to the SSH window.
Let go of the button.
To copy files from the OpenKey website
Point at what you want to copy, click and drag to where you want to put it.
Butterfly Characteristics
Images of butterfly characteristics should be stored in files named as above, reflecting the characteristic. For example, there is a <butterfly><taxon><charactertaxon><morphology><wingappendages>no or <butterfly><taxon><charactertaxon><morphology><wingappendages>yes in the Butterfly.DTD. Images might be given names like WingAppendageNo.jpg and WingAppendageYes.jpg. These should be stored on the main Biobrowser server in the directory
/home/webvibe/public_html/ButterflyCharacters/images
The development directory is usr/local/tomcat/webapps/glossary
The location of the image management stuff is:
http://soldev.isrl.uiuc.edu:8080/glossary/
You need to log in with your username, which I believe is heidorn. In order to use the webpage you have to be in the database as an administrator. There is a section on this page where you can add or remove administrators. The pages are not as automated as I'd like at the moment, but I've spent most of my time this week working on the OAI project. SOAP will definitely work very nicely for it. I'm actually rather surprised that it hasn't been implemented already. It seems like a very elegant way to approach it.
Metadata about the images should be stored in an XML file fitting the following DTD.
This DTD is based on the NISO Technical Metadata for Digital Still Images: http://www.niso.org/committees/committee_au.html
ButterflyImageCharacteristic.dtd
<!-- DTD for Butterfly Image Characteristics-->
<!ELEMENT ImageElements (Description,ImageLocation,CopyrightHolder,Source,DateCreated,Species?,DerivedFrom?,Contributor)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT ImageLocation (#PCDATA)>
<!ELEMENT CopyrightHolder (#PCDATA)>
<!ELEMENT Source (#PCDATA)>
<!ELEMENT DateCreated (#PCDATA)>
<!ELEMENT Species (#PCDATA)>
<!ELEMENT DerivedFrom (#PCDATA)>
<!ELEMENT Contributor (ContributorID,ContributorName,ContributorDetails)>
<!ELEMENT ContributorID (#PCDATA)>
<!ELEMENT ContributorName (#PCDATA)>
<!ELEMENT ContributorDetails (#PCDATA)>
Example: WingAppendageNo.xml
<?xml version="1.0"?>
<!DOCTYPE ImageElements SYSTEM "http://soldev.isrl.uiuc.edu/~webvibe/ButterflyCharacters/ButterflyImageCharacteristic.dtd">
<ImageElements>
<Description>wing appendage not present</Description>
<ImageLocation>http://soldev.isrl.uiuc.edu/~webvibe/ButterflyCharacters/images/WingAppendageNo.jpg</ImageLocation>
<CopyrightHolder>None</CopyrightHolder>
<Source>Illinois Natural History Museum</Source>
<DateCreated>January 9, 2002</DateCreated>
<Species>Papilio polyxenes</Species>
<DerivedFrom>http://www.canis.uiuc.edu/~webvibe/Butterflies/mSWALTL.jpg</DerivedFrom>
<Contributor>
<ContributorID>pbheidorn</ContributorID>
<ContributorName>P. Bryan Heidorn</ContributorName>
<ContributorDetails>University of Illinois, GSLIS</ContributorDetails>
</Contributor>
</ImageElements>
or
<?xml version="1.0"?>
<!DOCTYPE ImageElements SYSTEM "http://soldev.isrl.uiuc.edu/~webvibe/PlantCharacters/PlantImageCharacteristic.dtd">
<ImageElements>
<Description>Leaf Arrangements - Alternate</Description>
<ImageLocation>http://soldev.isrl.uiuc.edu/~webvibe/PlantCharacters/images/AlternateLeaf.jpg</ImageLocation>
<CopyrightHolder>Illinois Natural History Survey</CopyrightHolder>
<Source>Observing, Photographing, and Collecting Plants. Illinois Natural History Survey Circular 55, 1980</Source>
<DateCreated>January 29, 2002</DateCreated>
<Species></Species>
<DerivedFrom></DerivedFrom>
<Contributor>
<ContributorID>pbheidorn</ContributorID>
<ContributorName>P. Bryan Heidorn</ContributorName>
<ContributorDetails>University of Illinois, GSLIS</ContributorDetails>
</Contributor>
</ImageElements>
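A metadata file of this kind can be checked before it is loaded. The following Python sketch reads one record and verifies that the elements the DTD marks as required are present. It assumes a well-formed file in which the Contributor element is properly closed; it does not validate against the DTD itself, and the function name read_image_metadata is hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements without a "?" in the ImageElements content model of the DTD.
REQUIRED = ["Description", "ImageLocation", "CopyrightHolder",
            "Source", "DateCreated", "Contributor"]

def read_image_metadata(xml_text):
    """Return a flat dict of leaf element names to their text."""
    root = ET.fromstring(xml_text)
    missing = [name for name in REQUIRED if root.find(name) is None]
    if missing:
        raise ValueError("missing required elements: " + ", ".join(missing))
    record = {}
    for element in root.iter():
        if len(element) == 0:  # leaf element: keep its text
            record[element.tag] = (element.text or "").strip()
    return record

SAMPLE = """<ImageElements>
<Description>wing appendage not present</Description>
<ImageLocation>http://soldev.isrl.uiuc.edu/~webvibe/ButterflyCharacters/images/WingAppendageNo.jpg</ImageLocation>
<CopyrightHolder>None</CopyrightHolder>
<Source>Illinois Natural History Museum</Source>
<DateCreated>January 9, 2002</DateCreated>
<Species>Papilio polyxenes</Species>
<Contributor>
<ContributorID>pbheidorn</ContributorID>
<ContributorName>P. Bryan Heidorn</ContributorName>
<ContributorDetails>University of Illinois, GSLIS</ContributorDetails>
</Contributor>
</ImageElements>"""

if __name__ == "__main__":
    print(read_image_metadata(SAMPLE)["Description"])
```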
Margins:
crispate
Bei Yu
Nov. 12, 2001
Period Report for Document Preprocessing
Main Tasks
Current Data Sets
Working Results
Working Directory: /home/webvibe/public_html/beiyu/data/
It includes 7 decompressed file directories.
All the big T files are in T_files directory.
All the small t files are in t_files.
Because currently the small t files can’t be included into the big T files automatically, I made a program named "Tt_merge.pl" to merge big T files and their corresponding small t files together.
/home/webvibe/beiyu/code/
Data Problem:
The previous data files were acquired by the spider program. They are not the same as the current ones, which come from the compressed files. For the current files, the small t files cannot be included into the big T files automatically. Some big T files and small t files don’t have matching counterparts. Tt_merge.pl deals with this problem.
Tt_merge.pl:
Tasks:
Result:
The following small t files don’t have matched big T files.
In Hamamelidae/
t40001762.html
t40010579.html
t40014731.html
t40027085.html
t40034662.html
t42000420.html (this file is linked to T50128103.html)
In Magnoliidae/
t50007404.html
The following big T files don’t have matched small t files.
In Magnoliidae/
T50007476.html
T50007477.html
T50007478.html
T50007480.html
T50007482.html
T50007484.html
T50128103.html (this file is linked to t42000420.html)
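Lists like the two above can be produced by comparing file names. The Python sketch below assumes that a big T file and its small t file normally share the same number (T50007404.html with t50007404.html); that pairing rule is inferred from the lists and is not stated by Tt_merge.pl, which is a Perl program. Pairs linked under different numbers, such as t42000420.html and T50128103.html, would still be reported as unmatched.

```python
import re

def unmatched_files(filenames):
    """Return (small t files without a big T file, big T files without a small t file)."""
    big, small = {}, {}
    for name in filenames:
        match = re.fullmatch(r"([Tt])(\d+)\.html", name)
        if not match:
            continue  # not a T/t data file
        target = big if match.group(1) == "T" else small
        target[match.group(2)] = name
    small_only = sorted(small[n] for n in small.keys() - big.keys())
    big_only = sorted(big[n] for n in big.keys() - small.keys())
    return small_only, big_only

if __name__ == "__main__":
    names = ["T50007476.html", "t50007404.html", "T40000001.html", "t40000001.html"]
    print(unmatched_files(names))
```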
Rerun tparse.pl:
The code can still be run on the current data set, but because of the problems in the data, the result is not quite the same, especially for sub-family parsing.
Previous Parsing Result:
Family | Sub-family | Genus | Sub-genus | Species
76 | 4 | 258 | 4 | 1331
Current Parsing Result:
Family | Sub-family | Genus | Sub-genus | Species
72 | 0 | 246 | 4 | 1196
After creating T40001762, T40010579, T40014731, T40027085, T40034662, T50007404 according to the corresponding t files:
Family | Sub-family | Genus | Sub-genus | Species
72 | 0 | 252 | 5 | 1235
All the new parsed results are stored in
/home/webvibe/public_html/beiyu/new_parse_result/
Task 2: Unify the file format as "specification-image-map" order.
The problem is that for some html files the images are presented as links to separate html files which include the *.gif.
The program "new_embed_inline_image.pl" deals with the problem. It gets the links at the beginning of each html file, reads the content of the target image files, and inserts it at the bottom of the file being processed.
Task 3: Add links between family, genus and species.
When parsing the family, genus and species using tparse.pl, the relation between genus and species has been recorded in the names of the species files.
The program fails to trace the relation between family and genus because they are stored in separate files and can’t be linked for most of the files. But there are some files that mix the family, genus and species descriptions. For these files the relations between family and genus are kept.
I revised tparse.pl to record the relation between these families and genera. The file fg_links.html records this partial set of relations between 13 families and their 46 genera.
I integrated the work into link_fgs.pl so that all the available relations between families, genera and species are hyperlinked in the files. link_fgs.pl processes each species, genus and family file in turn.
This work is based on the old parsed result. The source data are at:
http://soldev.isrl.uiuc.edu/~webvibe/public_html/Family/
http://soldev.isrl.uiuc.edu/~webvibe/public_html/Genus/
http://soldev.isrl.uiuc.edu/~webvibe/public_html/Species/
All the results are stored at /home/webvibe/public_html/beiyu/links/
See : http://soldev.isrl.uiuc.edu/~webvibe/beiyu/links/Family/
http://soldev.isrl.uiuc.edu/~webvibe/beiyu/links/Genus/
http://soldev.isrl.uiuc.edu/~webvibe/beiyu/links/Species/
All the code and working data are stored at /home/webvibe/beiyu. The list is:
Tt_merge.pl
tparse.pl
link_fgs.pl
new_embed_inline.pl
END
Instrumentation for Experimental Evaluation
In the course of the experiment, the user needs to press JavaScript buttons to indicate whether they believe the displayed document is a match to the target or not.
This is accomplished through communication between the main Java-based client application and JavaScript. Java opens a new window with JavaScript in one frame and the document being evaluated in the other. JavaScript communicates back through a javamessage() ?? call telling Java which button was pushed. The main Java application needs to compare the actual target name with the user's selection.
Unfortunately, the user can change the document being displayed by following the hyperlinks in the document. When the user presses the "This is it" button, they may not be looking at the original document. The name of the document currently being displayed needs to be passed back to the main Java application. Hong suggested inserting JavaScript buttons into the source documents. This would need to be done with a preprocessor if we cannot think of a better solution.