Technical Report

There are many sections to this document. Read below to find the relevant part.

 

Files that need to be moved to a portable server.

Make sure CVS and Ant are installed.

Things to copy for a new server

webvibe/public_html/Client/

*.structure

demo.html, demo.php

blank.html

webvibe-client.jar

/corejava

public_html/help

/FamilyDef

/GenusDef

/SpeciesDef

/FNA_XML

/bflyxml

/cgi-bin

/images

put the webvibe server in

webvibe/~bin/webvibe-server.jar

Edit demo.php and demo.html to point to the correct server and index files.

BIBEClient and BIBEServer

swish-x

mysql -

Thesaurus

Network Services

Apache

Tomcat

Soap

Mozilla (to be used with VNC)

Illinois FNA subset.

Wireless Networking on

iPAQ PocketPC

Handspring PalmOS

Vaio - Windows/me

VNC

Server on TeleBotanistServer

Clients

iPAQ PocketPC

Handspring PalmOS

Vaio - Windows/me

News Client

TeleBotanistServer

iPAQ PocketPC

Handspring PalmOS

Vaio - Windows/me

 

 

Spider

In order to index files on the web you need to use a web spider.

The swishspider requires a number of Perl libraries. These are listed below:

http://search.cpan.org/search?dist=Compress-Zlib

Compress-Zlib-1.16.tar.gz

http://search.cpan.org/search?dist=libnet

libnet-1.0901.tar.gz

http://search.cpan.org/search?mode=module&query=libnet

Bundle-libnet-1.00.tar.gz

Install LWP From http://search.cpan.org/search?dist=LWP

HTML-Parser-3.25.tar.gz

HTML-Tagset-3.03.tar.gz

http://search.cpan.org/search?mode=module&query=tagset

 

Swish-ex indexing

Differences between swish-e and swish-ex

Functional Differences

Programming Differences

How to Compile Swish-ex

(Insert Hong's documentation here)

Using Swish-ex

The executable is available at /home/webvibe/bin/swishe2.0beta3HCrTxtXml/src/swish-ex

Configuration File

Making an index

Using an index

Index file = /home/webvibe/index/FNAIndexWDictWithHongSwish

swish-ex -f /home/webvibe/index/FNAIndexWDictWithHongSwish -w '"sepals dark"'

returns for example

http://www.canis.uiuc.edu/~webvibe/GenusDef/g_DELPHINIUM.html "g_DELPHINIUM.html

where sepals and dark are not adjacent.

swish-ex -f /home/webvibe/index/butterfly.index -w '"marginal spots"'

# SWISH format 2.0

# Search words: " marginal spots "

# Number of hits: 4

159 /home/webvibe/public_html/bflyxml/8Phoebis_sennae_m.xml "8Phoebis_sennae_m.xml" 1941

159 /home/webvibe/public_html/bflyxml/7Phoebis_sennae_f.xml "7Phoebis_sennae_f.xml" 1942

141 /home/webvibe/public_html/bflyxml/21Speyeria_idalia_f.xml "21Speyeria_idalia_f.xml" 2015

121 /home/webvibe/public_html/bflyxml/22Speyeria_idalia_m.xml "22Speyeria_idalia_m.xml" 2116

In xml searches, phrases may be searched as in the following example.

/home/webvibe/users/hong/swish-ex/src/swish-e -T /home/webvibe/public_html/FNA_XML/FNA.dtd -w '"<fna><description>flowers solitary"' -f ~webvibe/index/fnaxml.index
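The result listings above can be read by a program as well as by eye. The following is a minimal Python sketch, not part of the swish-ex distribution, that assumes each hit arrives on a single line in the order rank, path, quoted title, size, and that header lines start with "#". The function name parse_swish_output and the dictionary keys are made up for illustration.

```python
import re

# Assumed hit layout: rank, path, quoted title, size, all on one line.
HIT_PATTERN = re.compile(r'^(\d+)\s+(\S+)\s+"([^"]*)"\s+(\d+)$')


def parse_swish_output(text):
    """Split SWISH 2.0 style output into header fields and a list of hits."""
    header = {}
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines such as "# Number of hits: 4" become key/value pairs.
            key, sep, value = line[1:].partition(":")
            if sep:
                header[key.strip()] = value.strip()
            continue
        match = HIT_PATTERN.match(line)
        if match:
            rank, path, title, size = match.groups()
            hits.append(
                {"rank": int(rank), "path": path, "title": title, "size": int(size)}
            )
    return header, hits


SAMPLE = '''# SWISH format 2.0
# Search words: " marginal spots "
# Number of hits: 4
159 /home/webvibe/public_html/bflyxml/8Phoebis_sennae_m.xml "8Phoebis_sennae_m.xml" 1941
121 /home/webvibe/public_html/bflyxml/22Speyeria_idalia_m.xml "22Speyeria_idalia_m.xml" 2116
'''

if __name__ == "__main__":
    header, hits = parse_swish_output(SAMPLE)
    print(header["Number of hits"], [hit["title"] for hit in hits])
```

Lines that match neither pattern are skipped silently, which keeps the sketch tolerant of blank lines but would also hide a format change.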

Structure of the Swish-ex XML

Data structure of swish-ex in indexing and searching xml documents:

Here is an example DTD for article

<!ELEMENT Article (Title, Author, Abstract)>

<!ELEMENT Title (#PCDATA)>

<!ELEMENT Author (FirstName, LastName)>

<!ELEMENT FirstName (#PCDATA)>

<!ELEMENT LastName (#PCDATA)>

<!ELEMENT Abstract (#PCDATA)>

The data structure used for representing this dtd looks like:

Article Hash

Tagname     Number
Title       0
Author      1    (linked to Author Hash)
Abstract    2

Author Hash

Tagname     Number
FirstName   10
LastName    11

Description of the data structure:

Every element that has sub-elements is represented as a hash. For example, the element Article is a hash with three key-value pairs, and the element Author is a hash with two key-value pairs. The Author hash is linked to the Author cell in the Article hash because Author is a sub-element of the Article element.

The keys are called tagnames. Each tagname has a number associated with it. The numbers are prefixed with their parent elements’ numbers. For example, FirstName has a number 10, where 1 is Author’s number in Article Hash and 0 is the order it gets in Author Hash.
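The numbering rule described above can be sketched in a few lines of Python. This is an illustration only, not the swish-ex implementation: the nested-list representation of the DTD and the function name number_tags are assumptions made for the example.

```python
def number_tags(children, prefix=""):
    """Assign each tagname its number: the parent's number followed by its own position.

    children is a list of (tagname, sub_children) pairs in DTD order.
    """
    numbers = {}
    for position, (name, sub_children) in enumerate(children):
        number = f"{prefix}{position}"
        numbers[name] = number
        # Sub-elements inherit this element's number as their prefix.
        numbers.update(number_tags(sub_children, prefix=number))
    return numbers


# The Article DTD from the text, written as nested (tagname, children) pairs.
ARTICLE = [
    ("Title", []),
    ("Author", [("FirstName", []), ("LastName", [])]),
    ("Abstract", []),
]

if __name__ == "__main__":
    for tagname, number in number_tags(ARTICLE).items():
        print(tagname, number)
```

Plain concatenation becomes ambiguous once an element has more than ten children, so a separator between levels would be needed in that case. The sketch also keeps one flat dictionary, so it assumes tagnames are unique across the DTD.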

 

Structure of the Glossary

There are several glossaries in this directory including the FNA Glossary

pwd /home/webvibe/FNA/Glossary

 

Searching MySQL

mysql -Dvibe -uvibeuser -pthe_password_that_the_staff_know
The real password is deliberately not given here; ask the project staff for it.

show tables;

last updated by Karen Medina, 1/3/2003

 

Selecting Definitions for the Glossary

Species Plantarum

Processing of the text file.

The file was delivered by Anthony R. Brach in a word document.

This was saved as a text file and copied to the Unix directory ~webvibe/FNA/Glossary under the name SpeciesPlanarumGlossary.txt.

The first ";" was converted to a tab with vi.

All references to "'" (single quote) were converted to "\'" (backslash quote). This is needed for MySQL.

The program ~webvibe/FNA/Glossary/GlossaryDefinition/InsertSpeciesPlantarum.pl was used to save all of the terms and definitions in a table called species_plantarum_def.

There are 972 definitions in the file.
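The two text conversions described above were done by hand in vi. As a rough illustration, they could be scripted as below. This sketch assumes that each line of the text file holds one "term; definition" entry, so "the first ;" means the first one on each line; the function name and the sample entry are invented for the example.

```python
def prepare_glossary_line(line):
    """Turn one 'term; definition' line into 'term<TAB>definition' with quotes escaped."""
    # Only the first ";" separates term from definition; later ones are kept.
    converted = line.rstrip("\n").replace(";", "\t", 1)
    # Escape single quotes so the text can sit inside a quoted MySQL string.
    return converted.replace("'", "\\'")


if __name__ == "__main__":
    # Made-up sample entry, not taken from the glossary file.
    sample = "acaulescent; without a stem; 'stemless'\n"
    print(repr(prepare_glossary_line(sample)))
```

If the insert program uses placeholders in its SQL statements, the quote escaping step may be unnecessary, since the database driver would handle it.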

 

How to Compile the Java Server

1) Make a directory called "src" where you wish to keep the development code

# mkdir src

2) Change to that directory

# cd src

3) Copy the latest version of the source out of the cvs source code control system. This will create all of the directories with the source code.

# cvs -d /home/webvibe/cvsroot co webvibe

Your output should look like:

----------------------------

soldev:src 504 $ cvs -d /home/webvibe/cvsroot co webvibe

cvs checkout: Updating webvibe

U webvibe/build.xml

cvs checkout: Updating webvibe/conf

U webvibe/conf/databases.xml

U webvibe/conf/marys.xml

cvs checkout: Updating webvibe/docs

cvs checkout: Updating webvibe/docs/newsearchxml

U webvibe/docs/newsearchxml/database field rename.xml

U webvibe/docs/newsearchxml/database.xml

U webvibe/docs/newsearchxml/dataset.xml

U webvibe/docs/newsearchxml/message-query.xml

U webvibe/docs/newsearchxml/message-queryformat.xml

cvs checkout: Updating webvibe/jar

U webvibe/jar/crimson.jar

U webvibe/jar/jaxp.jar

U webvibe/jar/jdbc-mysql.jar

.....

----------------------------

4) Change to the webvibe directory

# cd webvibe

5) Run "ant" to compile the source code and build jar files and store them in webvibe/dist. There will be one for server, client and test. Included jar files are in the webvibe/jar directory.

# ant

Your output should look like:

----------------------------

Buildfile: build.xml

init:

[mkdir] Created dir: /home/webvibe/src/webvibe/build

[mkdir] Created dir: /home/webvibe/src/webvibe/dist

[mkdir] Created dir: /home/webvibe/src/webvibe/docs/api

compile:

[javac] Compiling 89 source files to /home/webvibe/src/webvibe/build

[javac] Note: 8 files use or override a deprecated API. Recompile with "-deprecation" for details.

[javac] 1 warning

dist:

[jar] Building jar: /home/webvibe/src/webvibe/dist/webvibe-server.jar

[jar] Note: creating empty jar archive /home/webvibe/src/webvibe/dist/webvibe-tests.jar

[jar] Building jar: /home/webvibe/src/webvibe/dist/webvibe-client.jar

BUILD SUCCESSFUL

Total time: 43 seconds

----------------------------

6) To edit a file, move to the appropriate directory such as server

# cd /home/webvibe/src/webvibe/src/webvibe/server

edit the file

When you are confident of the changes, add them back into cvs with the commit command:

# cvs commit -m "Type a reason for this commit and what you changed"

To recompile go back to the directory containing the build.xml file and run ant again.

# ant

7) To add a new file to the source, create the file and begin it with a line defining its package. For example, a server file should have the line

package webvibe.server;

Follow the example of other files in the directory where you wish to create the new file.

Add the new file to cvs

# cvs add <filename>

 

Running the server

start the test servers with "ant run 8016"

To run on another port, copy ~src/webvibe/dist/webvibe-server.jar to where you want it to reside.

From the command line it can be executed with

java -classpath /usr/java/lib/mysql_comp.jar:/home/webvibe/bin/server/webvibe-server.jar webvibe.server.Server1 8016 >> $HOME/logs/cron8016.log

 

Query Interface

At the moment, it looks like we will be sticking with Swing. If you look at the interface at http://www.canis.uiuc.edu/~webvibe/Client/demo.php or http://soldev.isrl.uiuc.edu/~webvibe/Client/demo.php and pick the EcoWatch butterfly collection, you will see a DTD structure tree on the right-hand side of the query panel. The nodes of that tree are currently editable: click one, and a couple of seconds later you can edit it. For example, you can open Background Color and set it to blue. Jingbo is adding a new function so that if you right-click (reverse click) some terminal nodes you will get a new window with a set of choices.

For example: you might get a window with a set of the eight primary colors and the user could pick one (or two?). Where you come in is the list of items that is in the window. It might be a text list but in the more general case it should be a set of display elements and associated replies.

A choice table might look like

Choice.dtd

<!-- DTD for Choice DISPLAY LISTS -->

<!ELEMENT Choice (DisplayTuple+)>

<!ATTLIST Choice Type (radio | checkbox) "checkbox">

<!ELEMENT DisplayTuple (Reply, DisplayText, DisplayImage?)>

<!ELEMENT Reply (#PCDATA)>

<!ELEMENT DisplayText (#PCDATA)>

<!ELEMENT DisplayImage (URL)>

<!ELEMENT URL (#PCDATA)>

color.xml

<?xml version="1.0"?>

<!DOCTYPE Choice SYSTEM "http://www.canis.uiuc.edu/~webvibe/choices/choice.dtd">

<XSL processor "http:// www.canis.uiuc.edu/~webvibe/choices/choice.xsl">

<Choice>

<DisplayTuple>

<Reply>Red</Reply>

<DisplayText>Red</DisplayText>

<DisplayImage>

<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/red.gif</URL>

</DisplayImage>

</DisplayTuple>

<DisplayTuple>

<Reply>Orange</Reply>

<DisplayText>Orange</DisplayText>

<DisplayImage>

<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/orange.gif</URL>

</DisplayImage>

</DisplayTuple>

<DisplayTuple>

<Reply>Yellow</Reply>

<DisplayText>Yellow</DisplayText>

<DisplayImage>

<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/yellow.gif</URL>

</DisplayImage>

</DisplayTuple>

<DisplayTuple>

<Reply>Green</Reply>

<DisplayText>Green</DisplayText>

<DisplayImage>

<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/green.gif</URL>

</DisplayImage>

</DisplayTuple>

<DisplayTuple>

<Reply>Blue</Reply>

<DisplayText>Blue</DisplayText>

<DisplayImage>

<URL>http://soldev.isrl.uiuc.edu/webvibe/choices/blue.gif</URL>

</DisplayImage>

</DisplayTuple>

</Choice>

These display tuples are actually from the MySQL database. I am open to suggestions. One option is to put a cgi script in the choices directory and have the "path" sent in. This script would query the database and create these XML files as needed. There of course is a space time tradeoff. It might be better to only create the xml files once for any database.

One question is how the XML files are displayed. One option is to use dynamic XSL to generate JavaScript or something else to display in the window. The line <XSL processor "http:// www.canis.uiuc.edu/~webvibe/choices/choice.xsl">, which is incorrect syntax, stands for whatever makes the translation from XML to HTML or PHP or whatever needs to be displayed in the choice window. Rather than dynamic translation, we could also process this ahead of time so that the files to be displayed would already exist. I think this would be transparent to the Java code on the client. It would just have a URL that it sticks into a window that it generates. That code would eventually need to return a result string to the calling Java function. If the type was "radio" this would be only one item (e.g. "blue"). If it was a checkbox, the user might make several choices (blue or purple). The Java code would translate this as a logical OR in the query string.
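The radio versus checkbox behaviour described above can be illustrated with a small Python sketch. The real client is Java and its query syntax is not documented here, so the " OR " separator and the function name replies_to_query_terms are assumptions for illustration only.

```python
def replies_to_query_terms(replies, choice_type="checkbox"):
    """Combine the replies from a choice window into one query fragment.

    A radio window yields exactly one reply; a checkbox window may yield
    several, which are joined with a logical OR.
    """
    if not replies:
        raise ValueError("at least one reply is required")
    if choice_type == "radio" and len(replies) != 1:
        raise ValueError("a radio choice returns exactly one reply")
    if choice_type not in ("radio", "checkbox"):
        raise ValueError(f"unknown choice type: {choice_type}")
    return " OR ".join(replies)


if __name__ == "__main__":
    print(replies_to_query_terms(["blue"], "radio"))
    print(replies_to_query_terms(["blue", "purple"], "checkbox"))
```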

Non-XML option

1) From the Swing client, make a SOAP RPC call with the DTD key as the argument. The call returns an array of strings.

2) Servlet returns string array in the form

dtd;word;url1[|url#]eol

word;url[|url#]eol

...

example:

FNA.dtd;ovate;http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate1.gif| http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate2.gif(eol)

obovate; http://soldev.isrl.uiuc.edu/webvibe/cimages/obovate1.gif| http://imageserver.ncbg..edu/webvibe/cimages/obovate2.gif|(eol)

3) Client needs to restructure the strings into a form/window to display images and get user reply.

Window types: radio button, check box, int,....

Each type needs a different window processor.
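Step 3 above requires the client to pull the strings apart again. A minimal Python sketch of that restructuring follows, assuming the layout given in step 2: the first line carries the DTD name, every line carries a word and a "|"-separated URL list, and the end-of-line marker has already been stripped. The function name parse_choice_lines is hypothetical.

```python
def parse_choice_lines(lines):
    """Return (dtd_name, {word: [image_url, ...]}) from the servlet's string array."""
    dtd_name = None
    choices = {}
    for index, raw in enumerate(lines):
        fields = [field.strip() for field in raw.strip().split(";")]
        if index == 0:
            # Only the first line is prefixed with the DTD name.
            dtd_name, word, urls = fields
        else:
            word, urls = fields
        # Drop empty entries left by a trailing "|".
        choices[word] = [url.strip() for url in urls.split("|") if url.strip()]
    return dtd_name, choices


if __name__ == "__main__":
    reply = [
        "FNA.dtd;ovate;http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate1.gif| "
        "http://soldev.isrl.uiuc.edu/webvibe/cimages/ovate2.gif",
        "obovate; http://soldev.isrl.uiuc.edu/webvibe/cimages/obovate1.gif|",
    ]
    print(parse_choice_lines(reply))
```

A line with the wrong number of ";" fields raises ValueError in this sketch, which seems safer than guessing at a malformed reply.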

There is a security issue: a certificate may be needed to allow access to multiple image servers.

The server constructs the URL from the host + file name. Do we need a certificate from the image servers to display the *.gif files?

The keys are of the form

BACKGROUND COLOR|BKGROUNDCOLOR 1 3 2 1 0 0

from bflydtd.structure. The key is

1:3:2:1:0:0
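The conversion from a structure line to its key can be sketched as follows. This Python example assumes that the trailing whitespace-separated integers form the key and that the text before them is a display name and a tag name separated by "|"; that reading of the label, and the function name, are assumptions rather than a description of the existing code.

```python
def parse_structure_line(line):
    """Split a *.structure line into (display_name, tag_name, key)."""
    tokens = line.split()
    numbers = []
    # Peel the integers off the end of the line; what remains is the label.
    while tokens and tokens[-1].isdigit():
        numbers.insert(0, tokens.pop())
    label = " ".join(tokens)
    display_name, _, tag_name = label.partition("|")
    return display_name, tag_name, ":".join(numbers)


if __name__ == "__main__":
    print(parse_structure_line("BACKGROUND COLOR|BKGROUNDCOLOR 1 3 2 1 0 0"))
```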

OpenArchive type servers provide character states and values.

A nightly harvester collects the values for storage in a central database, using key + word to look up the URLs of the images. The server id is used to identify the URL of the images.

Part of Speech Tagging

Use the Brill tagger in webvibe/etc/tagger

Files may be converted one by one with the following commands run from the directory /home/webvibe/etc/tagger/Bin_and_Data

First tokenizing:

less Species/s_Abies_amabilis.html | tokenize > Species_tokenized/s_Abies_amabilis.tokenized

Then tagging:

tagger LEXICON Species_tokenized/s_Abies_amabilis.tokenized BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE -i Species_tagged/s_Abies_amabilis.tagged

There is a script, /home/webvibe/etc/tagger/Bin_and_Data/TokenAndTag.sh, that converts all of the HTML files in /home/webvibe/etc/tagger/Bin_and_Data/Species into tokenized and tagged files in Species_tokenized and Species_tagged respectively.

-------- TokenAndTag.sh ----------------

# TokenAndTag.sh

# Converts every HTML file in Species/ into a tokenized file in

# Species_tokenized/ and a tagged file in Species_tagged/.

# Run from /home/webvibe/etc/tagger/Bin_and_Data.

#

for name in Species/*.html

do

    # Strip the directory and the .html suffix to get the output base name.

    basefilename=$(basename "$name" .html)

    echo "$name"

    tokenize < "$name" > "Species_tokenized/$basefilename.tokenized"

    tagger LEXICON "Species_tokenized/$basefilename.tokenized" BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE -i "Species_tagged/$basefilename.tagged"

done

-------- TokenAndTag.sh ----------------

Text Extraction

grep -i 'leaf/NN blade' *

grep -i 'Leaves</B>/NN' *

For each file in the directory:

Read through the file line by line until end of file.

Read until you find "leaf" or "leaves".

Read up to the word "blade".

Pick the first adjective (or adverb + adjective) as the shape word; ignore size adjectives (#-#).
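One possible reading of the steps above is sketched below in Python. It assumes the tagger output is a stream of word/TAG tokens with Penn Treebank style tags (JJ for adjectives, RB for adverbs), that the shape word follows "blade", and that any token containing a digit is a size expression to skip. The function name first_shape_word is invented, and the whole thing is an interpretation, not the project's extraction code.

```python
def first_shape_word(tagged_text):
    """Return the first adjective (optionally adverb + adjective) after leaf ... blade."""
    tokens = [token.rsplit("/", 1) for token in tagged_text.split() if "/" in token]
    seen_leaf = False
    seen_blade = False
    adverb = None
    for word, tag in tokens:
        lower = word.lower()
        if not seen_leaf:
            seen_leaf = lower in ("leaf", "leaves")
            continue
        if not seen_blade:
            seen_blade = lower in ("blade", "blades")
            continue
        if any(ch.isdigit() for ch in word):
            # Size expressions such as 1-6 are ignored.
            adverb = None
            continue
        if tag == "RB":
            adverb = word
            continue
        if tag == "JJ":
            return f"{adverb} {word}" if adverb else word
        adverb = None
    return None


if __name__ == "__main__":
    sample = "Leaf/NN blade/NN narrowly/RB elliptic/JJ ,/, 3--12/CD cm/NN"
    print(first_shape_word(sample))
```

The sketch does not stop at clause boundaries, so when the blade has no shape word it may pick up the adjective of a later part such as the base. Limiting the search to the text before the next comma or semicolon would be one way to tighten it.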

 

 

Leaf parts

Blade

Margin

Base

Apex

Examples of forms of Leaf description:

Some start with "Leaves" in bold, some "Leaf Blade" in bold.

s_Halophila_johnsonii.html:

Leaves apparently attached to rhizome, 5–25 × 1–4 mm; blade linear-lanceolate, base cuneate, margins entire, glabrous, apex round to round-acute.

s_Hesperocnide_tenella.html :

Leaf blades 1-6 × 1-4 cm, base broadly cuneate, rounded, or cordate, apex acute or short-acuminate to obtuse tip.

s_Aspidotis_carlotta.html:

Leaves monomorphic or weakly subdimorphic, 10--30 cm. Blade 4-pinnate, 3--12 cm, nearly as wide as long, thin to thick. Ultimate segments narrowly lanceolate to deltate, 2--6 mm; midrib obscure or evident abaxially. Sori of mature blades ± discrete to usually subcontinuous, 3--7(--9) per segment; indusia semicircular to usually elongate and connecting several adjacent sori, margins with 6--10 irregular and prominent teeth and/or lobes.

s_Polystichum_aleuticum.html:

Leaves monomorphic, erect, 1--1.5 dm; bulblets absent. Petiole 1/6--1/4 length of leaf; scales tan, sparse or falling off early. Blade linear-lanceolate, 1-pinnate, gradually tapered to base. Pinnae ± deltate to ovate, slightly overlapping, in 1 plane, 4--8 mm; base truncate, acroscopic auricle well developed; margins denticulate, not spiny; apex rounded, not dentate; microscales linear, lacking projections, dense on both surfaces. Indusia entire to minutely erose-dentate.

s_Ranunculus_triternatus.html:

Basal leaves persistent, blades rhombic to deltate or reniform in outline, 3-4×-dissected, 1.1-3.4 × 2-3.1 cm, segments linear, base obtuse, margins crenate, apices of segments narrowly rounded.

 

--------------------------------------------------------------

Instructions for identifying examples of shape.

Start www.biobrowser.org

Select search databases

Search on a word

Leaf Blade

 

3-parted or -divided

deltate

elliptic

narrowly elliptic

narrowly elongate

broadly elongate

elongate

lance-elliptic

lance-ovate

lanceolate

linear

linear- ?

lobed

oblanceolate

obovate

orbiculate

pinnate

reniform

semicircular

 

Leaf Base

cuneate

Plant Characteristics

Standardize the height to be 150.

If the image is a line drawing save it in gif format. If it is a photo use jpg.

Save the new image to a file named: descriptionword.gif or descriptionword.jpg

If there is more than one example for the same descriptive term, use numbers after the descriptive term, e.g. ovate1.gif, ovate2.gif.

If the description is multiword, replace spaces with the underscore character, e.g. narrowly_elongate.gif
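The naming rules above are easy to get slightly wrong by hand, so here is a small Python sketch of them. The function name image_file_name and its parameters are hypothetical; it only encodes the rules stated in the text (gif for line drawings, jpg for photos, underscores for spaces, a number suffix when there are several examples).

```python
def image_file_name(term, is_line_drawing, example_number=None):
    """Build the file name for a character image from its descriptive term."""
    # Multiword descriptions: any run of whitespace becomes one underscore.
    base = "_".join(term.strip().split())
    # Several examples of the same term are told apart by a number suffix.
    suffix = "" if example_number is None else str(example_number)
    extension = "gif" if is_line_drawing else "jpg"
    return f"{base}{suffix}.{extension}"


if __name__ == "__main__":
    print(image_file_name("narrowly elongate", is_line_drawing=True))
    print(image_file_name("ovate", is_line_drawing=True, example_number=2))
    print(image_file_name("dentate", is_line_drawing=False))
```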

The following information should be associated with each image:

Image File Name: (e.g. dentate.gif)

Image Description: (e.g. Leaf Margin, Dentate)

Publication: (e.g. Flora of North America)

Original URL Document or page from print materials: (e.g. http://www.canis.uiuc.edu/~webvibe/SpeciesDef/s_Thalictrum_minus.html)

Original URL for the image it was extracted from or Figure number for print material: http://www.canis.uiuc.edu/~webvibe/fna_images/plates/I27101384.html

 

Keep a record of where you got the images. Include the name of the Species and the name of the original file.

Repeat for more images.

You need to copy these files to the proper directory. If you are using Unix in the ISRL, just save them directly to ~webvibe/FNA/newcharacterimages. This is under your home directory.

If you are on a Windows machine in the ISRL, or on another machine anywhere, you need to use secure FTP.

open sftp

copy files / drag and drop.

If ssh is not installed on your machine you can get it at the url below for free http://uiarchive.uiuc.edu/content/PC/Communications_and_Networking/SSH/

Move the gif files to the soldev.isrl.uiuc.edu server using sftp (or other secure ftp).

Place in the directory ~webvibe/FNA/newcharacterimages. This directory is also called

~webvibe/FNA/nci. You can place the files in either one.

There is a data entry form at http://soldev.isrl.uiuc.edu:8080/glossary/index.jsp

 

Below is a set of instructions for installing SSH

1) Go to ftp://ftp.ssh.com/pub/ssh/SSHSecureShellClient-3.1.1.exe with your web browser.

Say yes and OK to everything! The license is free for Universities and non-profits (there is just a little something about first born children).

2) Run the ssh program. It should be in your desktop and on your program list.

3) Set a profile to point at the OpenKey page so that you will not need to remember what computer this is on.

Push the "Profile" button

Select "Add Profile"

Set these values:

Host name: soldev.isrl.uiuc.edu

User Name: janeg

Click "OK"

4) Now change the default file protection so that things that you place on the web will be readable.

Click "Edit"

Select Settings

Select "File Transfer / Advanced".

The form should have 644 filled in for the "Default File Permission Mask".

Change it to 664. (644 sort of works, but other group members will not be able to edit the file.)

Click "OK"
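The difference between the two permission values in step 4 is the group write bit. The short Python sketch below, which uses only the standard stat module, shows what each octal value means; the helper names are made up for the example.

```python
import stat


def describe_mode(mode):
    """Render an octal permission value the way `ls -l` shows a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def group_can_write(mode):
    """True if members of the file's group may modify the file."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    for mode in (0o644, 0o664):
        print(oct(mode), describe_mode(mode), "group write:", group_can_write(mode))
```

With 644 only the owner can change an uploaded file, while 664 also lets other members of the group edit it. Both values leave the file readable by the web server.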

--------------------------------------

From now on this is all you need to do.

5) Login

Select the profile you just made.

Enter your password

You now have a terminal window.

6) Start the file transfer client

Click on the yellow folder with blue circles on it.

You now have a window that looks like the Windows Explorer.

7) Click your way to the OpenKey folder. You just logged in as Jane. There is a folder in her directory called "openkey". Click that. You are now where you need to place files.

8) Drag and drop files.

To copy a file into the OpenKey website

open Windows Explorer or anything that lists your files for you on your PC.

Click and hold down the mouse button for the file(s) you want to copy.

Drag them to the SSH window.

Let go of the button.

To copy files from the OpenKey website

Point at what you want to copy, click and drag to where you want to put it.

 

 

Butterfly Characteristics

Images of butterfly characteristics should be stored in files named as above, reflecting the characteristic. For example, there is a <butterfly><taxon><charactertaxon><morphology><wingappendages>no or <butterfly><taxon><charactertaxon><morphology><wingappendages>yes in the Butterfly.DTD. Images might be given names like WingAppendageNo.jpg and WingAppendageYes.jpg. These should be stored on the main Biobrowser server in the directory

/home/webvibe/public_html/ButterflyCharacters/images

The development directory is usr/local/tomcat/webapps/glossary

The location of the image management stuff is:

http://soldev.isrl.uiuc.edu:8080/glossary/

You need to log in with your username, which I believe is heidorn. In order to use the webpage you have to be in the database as an administrator. There is a section on this page where you can add or remove administrators. The pages are not as automated as I'd like at the moment, but I've spent most of my time this week working on the OAI project. SOAP will definitely work very nicely for it. I'm actually rather surprised that it hasn't been implemented already. It seems like a very elegant way to approach it.

Metadata about the images should be stored in an XML file fitting the following DTD.

This DTD is based on the NISO Technical Metadata for Digital Still Images: http://www.niso.org/committees/committee_au.html

ButterflyImageCharacteristic.dtd

<!-- DTD for Butterfly Image Characteristics-->

<!ELEMENT ImageElements (Description,ImageLocation,CopyrightHolder,Source,DateCreated,Species?,DerivedFrom?,Contributor)>

<!ELEMENT Description (#PCDATA)>

<!ELEMENT ImageLocation (#PCDATA)>

<!ELEMENT CopyrightHolder (#PCDATA)>

<!ELEMENT Source (#PCDATA)>

<!ELEMENT DateCreated (#PCDATA)>

<!ELEMENT Species (#PCDATA)>

<!ELEMENT DerivedFrom (#PCDATA)>

<!ELEMENT Contributor (ContributorID,ContributorName,ContributorDetails)>

<!ELEMENT ContributorID (#PCDATA)>

<!ELEMENT ContributorName (#PCDATA)>

<!ELEMENT ContributorDetails (#PCDATA)>

 

Example: WingAppendageNo.xml

<?xml version="1.0"?>

<!DOCTYPE ImageElements SYSTEM "http://soldev.isrl.uiuc.edu/~webvibe/ButterflyCharacters/ButterflyImageCharacteristic.dtd">

<ImageElements>

<Description>wing appendage not present</Description>

<ImageLocation>http://soldev.isrl.uiuc.edu/~webvibe/ButterflyCharacters/images/WingAppendageNo.jpg</ImageLocation>

<CopyrightHolder>None</CopyrightHolder>

<Source>Illinois Natural History Museum</Source>

<DateCreated>January 9, 2002</DateCreated>

<Species>Papilio polyxenes</Species>

<DerivedFrom>http://www.canis.uiuc.edu/~webvibe/Butterflies/mSWALTL.jpg</DerivedFrom>

<Contributor>

<ContributorID>pbheidorn</ContributorID>

<ContributorName>P. Bryan Heidorn</ContributorName>

<ContributorDetails>University of Illinois, GSLIS</ContributorDetails>

</Contributor>

</ImageElements>

or

<?xml version="1.0"?>

<!DOCTYPE ImageElements SYSTEM "http://soldev.isrl.uiuc.edu/~webvibe/PlantCharacters/PlantImageCharacteristic.dtd">

<ImageElements>

<Description>Leaf Arrangements - Alternate</Description>

<ImageLocation>http://soldev.isrl.uiuc.edu/~webvibe/PlantCharacters/images/AlternateLeaf.jpg</ImageLocation>

<CopyrightHolder>Illinois Natural History Survey</CopyrightHolder>

<Source>Observing, Photographing, and Collecting Plants. Illinois Natural History Survey Circular 55, 1980</Source>

<DateCreated>January 29, 2002</DateCreated>

<Species></Species>

<DerivedFrom> </DerivedFrom>

<Contributor>

<ContributorID>pbheidorn</ContributorID>

<ContributorName>P. Bryan Heidorn</ContributorName>

<ContributorDetails>University of Illinois, GSLIS</ContributorDetails>

</Contributor>

</ImageElements>
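A metadata file of this shape can be read with Python's standard xml.etree.ElementTree module. The sketch below is one possible way to load a record into a dictionary; it assumes the file is well-formed (in particular that the Contributor element is closed), leaves out the DOCTYPE line so that no external DTD is involved, and uses a made-up function name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<ImageElements>
<Description>wing appendage not present</Description>
<ImageLocation>http://soldev.isrl.uiuc.edu/~webvibe/ButterflyCharacters/images/WingAppendageNo.jpg</ImageLocation>
<CopyrightHolder>None</CopyrightHolder>
<Source>Illinois Natural History Museum</Source>
<DateCreated>January 9, 2002</DateCreated>
<Species>Papilio polyxenes</Species>
<Contributor>
<ContributorID>pbheidorn</ContributorID>
<ContributorName>P. Bryan Heidorn</ContributorName>
<ContributorDetails>University of Illinois, GSLIS</ContributorDetails>
</Contributor>
</ImageElements>"""


def read_image_metadata(xml_text):
    """Return the metadata as a dict; nested elements become nested dicts."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        if len(child):
            # Elements with sub-elements, such as Contributor.
            record[child.tag] = {sub.tag: (sub.text or "").strip() for sub in child}
        else:
            record[child.tag] = (child.text or "").strip()
    return record


if __name__ == "__main__":
    metadata = read_image_metadata(SAMPLE)
    print(metadata["Description"], "by", metadata["Contributor"]["ContributorName"])
```

ElementTree checks well-formedness only, so a missing required element would go unnoticed here. Validating against the DTD would need a separate tool.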

Margins:

crispate

Bei Yu

Nov. 12, 2001

Period Report for Document Preprocessing

Main Tasks

  1. Rerun Jun’s code "tparse.pl" on the current data set.
  2. Unify the file format as "specification-image-map" order.
  3. Add links between family, genus and species.

Current Data Sets

  1. Original data location: /home/webvibe/FNA/
  2. Compressed Files List:

Working Results

  1. Data:
  2. Working Directory: /home/webvibe/public_html/beiyu/data/

    It includes 7 decompressed file directories.

    All the big T files are in T_files directory.

    All the small t files are in t_files.

    Because currently the small t files can’t be included into big T files automatically, I made a program named "Tt_merge.pl" to merge big T files and their corresponding small t files together.

  3. Task 1: rerun tparse.pl on the current set.

/home/webvibe/beiyu/code/

Data Problem:

The previous data files were acquired by the spider program. They are not the same as the current ones in the compressed files. For the current files, the small t files cannot be included into the big T files automatically. Some big T files and small t files have no matching counterpart. Tt_merge.pl deals with this problem.

Tt_merge.pl:

Tasks:

      1. check which big T files don’t have matched small t files.
      2. check which small t files don’t have matched big T files.
      3. merge big T files and small t files and store as new T files.
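The first two checks can be illustrated with set operations. The Python sketch below is not Tt_merge.pl; it assumes that a big T file and a small t file belong together when the rest of the file name is identical, and the function name and the sample pair T40000001/t40000001 are invented for the example.

```python
def compare_t_files(file_names):
    """Report which T and t files lack a counterpart, matching on the name after the first letter."""
    big = {name[1:] for name in file_names if name.startswith("T")}
    small = {name[1:] for name in file_names if name.startswith("t")}
    return {
        "big_without_small": sorted("T" + rest for rest in big - small),
        "small_without_big": sorted("t" + rest for rest in small - big),
        "pairs": sorted(("T" + rest, "t" + rest) for rest in big & small),
    }


if __name__ == "__main__":
    names = [
        "T50007476.html",   # listed in the report as having no small t file
        "t50007404.html",   # listed in the report as having no big T file
        "T40000001.html",   # hypothetical matched pair
        "t40000001.html",
    ]
    print(compare_t_files(names))
```

Name matching alone would miss pairs that are linked only through their content, such as a t file that refers to a T file with a different number.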

Result:

The following small t files don’t have matched big T files.

In Hamamelidae/

t40001762.html

t40010579.html

t40014731.html

t40027085.html

t40034662.html

t42000420.html (this file is linked to T50128103.html)

In Magnoliidae/

t50007404.html

The following big T files don’t have matched small t files.

In Magnoliidae/

T50007476.html

T50007477.html

T50007478.html

T50007480.html

T50007482.html

T50007484.html

T50128103.html (this file is linked to t42000420.html)

Rerun tparse.pl:

The code can still be run on current data set, but because of the problems in data, the result is not quite the same, especially for sub-family parsing.

Previous Parsing Result:

Family   Sub-family   Genus   Sub-genus   Species
76       4            258     4           1331

Current Parsing Result:

Family   Sub-family   Genus   Sub-genus   Species
72       0            246     4           1196

After creating T40001762, T40010579, T40014731, T40027085, T40034662, T50007404 according to the corresponding t files:

Family   Sub-family   Genus   Sub-genus   Species
72       0            252     5           1235

All the new parsed results are stored in

/home/webvibe/public_html/beiyu/new_parse_result/

Task 2: Unify the file format as "specification-image-map" order.

The problem is that, for some HTML files, the images are presented as links to separate HTML files that include the *.gif.

The program "new_embed_inline_image.pl" deals with this problem. It gets the links at the beginning of the HTML file, reads the content of the target image files, and inserts it at the bottom of the file being processed.

Task 3: Add links between family, genus and species.

When parsing the family, genus and species using tparse.pl, the relation between genus and species has been recorded in the names of the species files.

The program fails to trace the relation between family and genus because they are stored in separate files and cannot be linked for most of the files. But some files mix the family, genus and species descriptions. For these files the relations between family and genus are kept.

I revised tparse.pl to record the relation between these families and genera. The file fg_links.html records part of the relations, covering 13 families and their 46 genera.

I integrated the work into link_fgs.pl so that all the available relations between families, genera and species are hyperlinked in the files. link_fgs.pl deals with each species, genus and family file in turn.

This work is based on the old parsed result. The source data are at:

http://soldev.isrl.uiuc.edu/~webvibe/public_html/Family/

http://soldev.isrl.uiuc.edu/~webvibe/public_html/Genus/

http://soldev.isrl.uiuc.edu/~webvibe/public_html/Species/

All the results are stored at /home/webvibe/public_html/beiyu/links/

See : http://soldev.isrl.uiuc.edu/~webvibe/beiyu/links/Family/

http://soldev.isrl.uiuc.edu/~webvibe/beiyu/links/Genus/

http://soldev.isrl.uiuc.edu/~webvibe/beiyu/links/Species/

All the code and working data are stored at /home/webvibe/beiyu. The list is:

Tt_merge.pl

tparse.pl

link_fgs.pl

new_embed_inline.pl

 

END

 

Instrumentation for Experimental Evaluation

In the course of the experiment, the user needs to press JavaScript buttons to indicate whether or not they believe the displayed document is a match to the target.

This is accomplished through communication between the main Java-based client application and JavaScript. Java opens a new window with JavaScript in one frame and the document being evaluated in the other. JavaScript communicates back through a javamessage() ?? call telling Java which button was pushed. The main Java application needs to compare the actual target name with the user's selection.

Unfortunately, the user can change the document being displayed by following the hyperlinks in the document. When the user presses the "This is it" button, they may not be looking at the original document. The name of the currently displayed document needs to be passed back to the main Java application. Hong suggested inserting JavaScript buttons into the source documents. This would need to be done with a preprocessor if we cannot think of a better solution.
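The comparison the main application has to make can be sketched as below. This is a Python illustration of the logic only; the real client is Java, and the button identifier "this_is_it", the function names and the idea of comparing file names taken from the URL are all assumptions.

```python
import posixpath
from urllib.parse import urlparse


def document_name(url):
    """Reduce a document URL to its file name, e.g. s_Abies_amabilis.html."""
    return posixpath.basename(urlparse(url).path)


def judge_selection(target_name, displayed_url, button):
    """Score one button press against the document actually on screen."""
    shown = document_name(displayed_url)
    is_target = shown == target_name
    claimed_match = button == "this_is_it"
    # The press is correct when the user's claim agrees with the facts.
    return {"displayed": shown, "correct": claimed_match == is_target}


if __name__ == "__main__":
    base = "http://www.canis.uiuc.edu/~webvibe/SpeciesDef/"
    print(judge_selection("s_Abies_amabilis.html", base + "s_Abies_amabilis.html", "this_is_it"))
    print(judge_selection("s_Abies_amabilis.html", base + "s_Thalictrum_minus.html", "this_is_it"))
```

The second call shows why the displayed document name has to be reported along with the button: if the user has followed a link, scoring against the originally opened document would give the wrong answer.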