September 2002 Report

Meeting of 11-14 September 2002
(being a record of the proceedings and of subsequent thoughts and actions regarding the topics discussed.)

 

Compiled by R.K. Peet

 

Phase 1. PI meeting Wednesday, September 11

Attending: Robert Peet, Marilyn Walker, Dennis Grossman, Michael Jennings, Don Faber-Langendoen.

Need a plan for addressing the long-term business model. This should be the charge of a subcommittee of the ESA Vegetation Panel, though it seems likely the subcommittee will have a highly overlapping membership.

 

Michael Lee will serve as Project Manager

Need for monthly conference calls & quarterly meetings

Set up meeting for last two weeks of January

 

Phase 2. Evaluation, Short-term planning, International outreach, Thursday-Saturday, September 12-14.

Attending: Brad Boyle, John Harris, Mike Lee, Robert Peet, Michael Jennings, Don Faber-Langendoen, Jerry Cooper, Susan Wiser, Stephan Hennekens (13-14), Dave Roberts, David Tart (12-13).

2-1 Anticipated products and activities

 

We reviewed progress to date and attempted to refine our products through the end of the first funding cycle, which ends in December. Our primary employee on the project, John Harris, plans to rotate off very soon (gone half of Sept, all of Oct, and then back for just one month of overlap with our new employee). Our goal is to launch a prototype of VegBank very soon so as to start accumulating user feedback and public comment. We especially need to assess which additional work is critical for that launch, and which additions can wait for a subsequent upgrade.

We also discussed the major products expected for the second funding cycle and tried to project a reasonable production schedule for them.

2-1-1 Major products anticipated:

  • Prototype archive: Fully functional prototype Feb 2003.
  • Full production archive: Sept 2003.
  • VegBranch: for data entry Jan 2003; for data receipt and storage Jun 2003.
  • Enter Plots: 20,000 plots Feb-Mar 2003; 30,000 plots Apr-Jul 2003.
  • Focus groups: FGDC - Jan 2003, MidAtlantic Heritage Mar 2003, perhaps two others.
  • Performance tuning: Fall 2003.
  • Distributed VegBank capability: Fall 2004.
  • Business rules and validation layer: June 2003.
  • VegBank data services: Fall 2003.
  • Proposal engine (for proposing and distributing community classification proposals): Spring 2004.
  • Digital journal for publication of classification revisions: Spring 2004.
  • Workbench for plant concept development: fall 2003.
  • Plant concept database population: 5000 during 2003, 10000 during 2004.
  • Link plots to NatureServe Explorer: Fall 2003.

2-1-2 Web services to be provided by VegBank and used by NatureServe:

  • Service to provide summary information about a plot.
  • Service to inform of new plot annotations.
  • Service to inform of new plant taxa and new status.
  • Service to provide list of typal, assigned, and asserted plots for a type.

2-1-3 Web services to be provided by NatureServe & used by VegBank:

  • Service to inform of new plot annotations.
  • Service to inform of new plant taxa and new NS status assignments.
  • Service to inform of assignment of plots as typal, assigned, or asserted.
  • Service to provide summary information on a community type.
  • Service to inform of changes in community classification.

2-1-4 Meetings:

  • Monthly conference call with PIs, plus staff, Morse, and Faber-Langendoen as needed.
  • Face-to-face at NCEAS or Arlington 3 times per year + ESA: Sept or Oct; Jan or Feb; April or May; and as needed.

 

2-2  VegBranch

Mike Lee provided an overview of progress to date on the VegBranch desktop tool. As with the parent VegBank, we needed to assess progress to date and future needs.

http://vegbank.nceas.ucsb.edu/vegbranch/vegbranch.html

 

In general the group was very impressed with progress on the design and implementation of VegBranch. The only major concern expressed is that it might be too complex for the naive user. Interfaces for VegBranch are somewhat intimidating and need to be less cluttered.


2-3 Review of parallel activities and global outreach

2-3-1 Susan Wiser (NZ Liaison).


PowerPoint presentation.

Nvs.landcareseach.co.nz

Interest in potential adoption of the VegBank architecture.

Data includes, among others: cover, tiers, stems, saplings, seedlings, frequency

 

52000 recce plots

9000 remeasurement plots

6500 20x20 forest plots

2-3-2 Jerry Cooper (NZ Liaison).

PowerPoint presentation.

Landcare Research - nationally significant datasets

Move legacy datasets

Vertical migration

Lateral integration

Single repository not necessary if interoperate and access

Goal is to open up data

2-3-3 David Tart (USFS database activities including ENRIS and TERRA).

2-3-4 Brad Boyle (U. AZ Brian Enquist group; tropical forest stem-map database)

PowerPoint presentation.
Collaborators:

Brian Enquist (see Science 295:1517-1520; Nature 410:655)

Oliver Phillips -- Rainforest data (see PNAS 91:2805-2809)

Jason Bradford, MoBotGard

Thomas Lachner -- Conservation International

 

Planned components or requirements:

Specimens module

Distributed query (e.g. REMIB in Mexico)
http://www.conabio.gob.mx/remib/cgi-bin/remib_nodos.cgi

Accept species name or list of species

Lookup coordinates for specimens

Inventories module

Single vs distributed

Attributes

Flexibility: many formats (e.g. individuals, different sample units, repeated observations), and taxonomic resolution (e.g. determinations above species level, morphospecies)

Fidelity: archive historical data in original form

Comparability: permits interpretation of historical data

Taxonomy module & interpretation service

Searches online databases for names linked

Where no list : need means for user to evaluate

 

Plot resources

Source        Plot size   No. of plots   Vegetation / region
Gentry        0.1 ha      233            Tropical forest
Boyle         0.1 ha      40             Tropical forest
Rainfor       1-3 ha      525            Tropical forest
Stephenson    0.1 ha      500            Sierras
Mobot PSPs    1 ha        50-100         Tropical forest

CTFS at Smithsonian (interested in VegBank)

2-3-5 Stephan Hennekens (Netherlands, TurboVeg, SynBioSys)

2-3-6 Don Faber-Langendoen (NatureServe : Representing Canadian discussions)

2-3-7 Discussion of other groups

What should be our level of interactions with outside groups?

 

Who should we interact with?

TFBIS: Terrestrial & Freshwater Biodiversity Information System

GBIF

AVH Australian Virtual Herbarium

WDCM

Species 2000

TDWG

World data centre for microorganisms

ITIS & NBII

 

2-4 Database design and business rule issues

2-4-1 Intellectual Property and Confidentiality.

Brad Boyle and Jerry Cooper both pointed out that much plot data needs to remain protected for some years if authors are to agree to post their data. There followed a discussion of how to resolve this issue.

  • The current data model contains the fields "confidentiality_status" and "confidentiality_reason" within Plot as mechanisms for handling T&E species and private land issues.
  • We need a new permissions layer. Ideally, this would be similar to UNIX permissions.
  • The intent is to allow contributors of data to screen potential users for some fixed period of time so as to protect their rights and opportunities to publish based on the data. However, the same mechanism could be used for other reasons such as rare species or landowner issues.
  • The simple version is a table of Permissions to see embargoed plots, including fields for plot number, party, & permission. Confidentiality status could be maintained within Plot, but perhaps a separate table is needed in the event that more than one embargo applies to a particular plot.
  • There could be a switch in embargo to allow both observation and plot, but we have not chosen to separate the observations.

 

Here are the two proposed new tables to handle these requirements:

Embargo

Embargo_ID = PK

Plot_ID = FK to Plot

Embargo_reason (need new list)

Author stipulation,

Rare species,

Landownership,

Bad data,

Other

Embargo_start = Date

Embargo_stop = Date

Embargo_owner = FK to Party

Default_Status = Same list as in Permission_Status below

Permission

Permission_ID = PK

Embargo_ID = FK to Embargo

Party_ID = FK to Party given permission

Permission_Status (use Confidentiality_Status list)

Public [Default if no embargo]
1 km radius location resolution
10 km radius [Default if landowner or rare species]
100 km radius
Location embargo
Public embargo on data [Default if reason = author]
Full embargo on data

Permission_Notes
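
As a rough illustration of how the Embargo and Permission tables might work together, the Python sketch below builds them in an in-memory SQLite database and looks up the status a given party would see for a plot. The SQLite column types, the helper name effective_status, the sample IDs and dates, and the rule that the most restrictive status wins when several embargoes overlap are all assumptions made for the example, not decisions recorded at the meeting.

```python
import sqlite3

# Status list as given in the notes; treating this order as
# "least to most restrictive" is an assumption of this sketch.
STATUSES = [
    "Public",
    "1 km radius location resolution",
    "10 km radius",
    "100 km radius",
    "Location embargo",
    "Public embargo on data",
    "Full embargo on data",
]

EMBARGO_SCHEMA = """
CREATE TABLE Embargo (
    Embargo_ID     INTEGER PRIMARY KEY,
    Plot_ID        INTEGER NOT NULL,   -- FK to Plot (table not shown)
    Embargo_reason TEXT NOT NULL,
    Embargo_start  TEXT NOT NULL,      -- ISO dates compare correctly as text
    Embargo_stop   TEXT NOT NULL,
    Embargo_owner  INTEGER NOT NULL,   -- FK to Party (table not shown)
    Default_Status TEXT NOT NULL
);
CREATE TABLE Permission (
    Permission_ID     INTEGER PRIMARY KEY,
    Embargo_ID        INTEGER NOT NULL REFERENCES Embargo(Embargo_ID),
    Party_ID          INTEGER NOT NULL,
    Permission_Status TEXT NOT NULL,
    Permission_Notes  TEXT
);
"""


def effective_status(conn, plot_id, party_id, on_date):
    """Return the status party_id gets for plot_id on on_date (ISO string)."""
    rows = conn.execute(
        """
        SELECT COALESCE(p.Permission_Status, e.Default_Status)
        FROM Embargo e
        LEFT JOIN Permission p
               ON p.Embargo_ID = e.Embargo_ID AND p.Party_ID = ?
        WHERE e.Plot_ID = ? AND e.Embargo_start <= ? AND ? < e.Embargo_stop
        """,
        (party_id, plot_id, on_date, on_date),
    ).fetchall()
    if not rows:
        return "Public"  # default if no embargo is in force
    # Assumption: with several active embargoes, the most restrictive wins.
    return max((row[0] for row in rows), key=STATUSES.index)


conn = sqlite3.connect(":memory:")
conn.executescript(EMBARGO_SCHEMA)
# Hypothetical data: plot 101 embargoed for a rare species by party 7,
# with party 42 granted full access.
conn.execute(
    "INSERT INTO Embargo VALUES "
    "(1, 101, 'Rare species', '2003-01-01', '2008-01-01', 7, '10 km radius')"
)
conn.execute("INSERT INTO Permission VALUES (1, 1, 42, 'Public', 'collaborator')")

print(effective_status(conn, 101, 42, "2004-06-01"))  # party with permission
print(effective_status(conn, 101, 99, "2004-06-01"))  # any other party
```

Keeping Default_Status on the embargo and per-party overrides in Permission means a plot with no Permission rows still resolves to a sensible status.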

 

  • Business rules. The embargo owner should be able to lift the embargo for a party or parties via a web form, as should the management team. Submitted plots should be embargoable for up to x (5?) years, and the embargo should be renewable twice. When a plot search returns one or more embargoed plots, the searcher sees only the accession # + email of the owner.
  • Subsequent to the meeting we observed that the Storage Resource Broker had some rather impressive permissions options built in. It might be a good idea to look these over more carefully.
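
The embargo business rules could be expressed roughly as follows. This is only a sketch: the 5-year limit stands in for the undecided "x (5?)", the renewal rule is read here as "at most two renewals", and the names EmbargoTerm, renew, search_hit, and the dictionary keys are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

MAX_YEARS = 5      # placeholder for the undecided "x (5?)" in the notes
MAX_RENEWALS = 2   # assumption: an embargo may be renewed at most twice


@dataclass
class EmbargoTerm:
    start: date
    stop: date
    renewals: int = 0


def add_years(d, years):
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def renew(term, years=MAX_YEARS):
    """Return a new term extended by `years`, enforcing the renewal limits."""
    if term.renewals >= MAX_RENEWALS:
        raise ValueError("embargo has already been renewed the maximum number of times")
    if not 0 < years <= MAX_YEARS:
        raise ValueError("renewal period must be between 1 and MAX_YEARS years")
    return EmbargoTerm(term.start, add_years(term.stop, years), term.renewals + 1)


def search_hit(plot, embargoed):
    """What a searcher sees: an embargoed plot shows accession # and owner email only."""
    if embargoed:
        return {"accession": plot["accession"], "owner_email": plot["owner_email"]}
    return dict(plot)


# Hypothetical usage
term = EmbargoTerm(date(2003, 1, 1), date(2008, 1, 1))
term = renew(term)
print(term.stop, term.renewals)
```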

2-4-2 Individual tree records.

The Enquist (Brad Boyle speaking) and New Zealand (Jerry Cooper & Susan Wiser speaking) groups want individual tree attributes to be contained within the database. They cite the case of each tree having a voucher (or even multiple vouchers).

Because of the potential existence of multiple vouchers, we agreed to add a table for vouchers.

The essential problem with the current data model is lack of taxonomic determination of individual trees. Three solutions were discussed.

  • (1) Same as currently modeled, but with collection number allowed for each stem record. This fails if the identification associated with a stem changes and it moves to a different Taxon_Observation, because we would then lose the link to old Taxon_Interpretations.
  • (2) Create a table parallel to Taxon_Observation for individuals, called perhaps Individual_Observation. The problem here is that we would need a second Individual_Interpretation table, or we would need to place a switch in Taxon_Interpretation so that a record could point to a record in either of the two tables. Similarly, the Voucher table would need a switch so that a record could apply to either the Taxon_Observation or Individual_Observation.
  • (3) Add an Observation_Type to Taxon_Observation which indicates whether this is an individual stem or a collective record. If a collective observation, then current fields like cover could be populated, and if an individual observation then a record could be placed in a linked stems table. This solution has a performance price in that there will be millions of added entries in the table to handle that many individual stems.

We concluded that solution #3 is the best for use in a conceptual data model, while recognizing that the actual physical model implemented may need to be closer to #2 for performance reasons.

Here is the basic data model (replacing Taxon_Obs, Stem_count and stem_location):

Biological_Observation (= old Taxon_Observation)

Biological_Observation_ID = PK

Observation_ID = FK to the plot

Biological_Type (Collective or Individual)

Cover

Basal_Area

Density

Other collective observation types ??

Notes

Stem (replaces current stemCount and stemLocation tables)

Stem_ID = PK

Biological_Observation_ID = FK

Stem_ID = Recursive FK for repeated observations

(Each set would need a new Observation entry and thus a new Biological_Observation_ID record)

Stem_Diameter

Stem_Diameter_Accuracy

Stem_Height

Stem_Height_Accuracy

Individual_X_Position

Individual_Y_Location

Stem_Code

Voucher

Voucher_ID = PK

Biological_Observation_ID = FK

Party_ID = collector

Collection_Number

Museum

Accession_Number

Collection_Date
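
To make the relationships in this model concrete, here is a minimal Python/SQLite sketch of the three tables with a subset of their fields. The column types, the CHECK constraint, and the sample values are assumptions. The notes list Stem_ID both as the primary key and as the recursive foreign key, so the sketch uses a hypothetical column name, Previous_Stem_ID, for the recursive link.

```python
import sqlite3

STEM_SCHEMA = """
CREATE TABLE Biological_Observation (
    Biological_Observation_ID INTEGER PRIMARY KEY,
    Observation_ID  INTEGER NOT NULL,   -- FK to the plot observation (not shown)
    Biological_Type TEXT NOT NULL
        CHECK (Biological_Type IN ('Collective', 'Individual')),
    Cover REAL, Basal_Area REAL, Density REAL, Notes TEXT
);
CREATE TABLE Stem (
    Stem_ID INTEGER PRIMARY KEY,
    Biological_Observation_ID INTEGER NOT NULL
        REFERENCES Biological_Observation(Biological_Observation_ID),
    Previous_Stem_ID INTEGER REFERENCES Stem(Stem_ID),  -- hypothetical name
    Stem_Diameter REAL,
    Stem_Height REAL
);
CREATE TABLE Voucher (
    Voucher_ID INTEGER PRIMARY KEY,
    Biological_Observation_ID INTEGER NOT NULL
        REFERENCES Biological_Observation(Biological_Observation_ID),
    Party_ID INTEGER,                   -- collector
    Collection_Number TEXT,
    Museum TEXT
);
"""


def vouchers_for(conn, bio_obs_id):
    """Collection numbers of all vouchers attached to one biological observation."""
    rows = conn.execute(
        "SELECT Collection_Number FROM Voucher "
        "WHERE Biological_Observation_ID = ? ORDER BY Voucher_ID",
        (bio_obs_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(STEM_SCHEMA)

# A collective record (cover for a taxon) and an individual tree on the same plot visit.
conn.execute(
    "INSERT INTO Biological_Observation "
    "(Biological_Observation_ID, Observation_ID, Biological_Type, Cover) "
    "VALUES (1, 10, 'Collective', 35.0)"
)
conn.execute(
    "INSERT INTO Biological_Observation "
    "(Biological_Observation_ID, Observation_ID, Biological_Type) "
    "VALUES (2, 10, 'Individual')"
)
conn.execute(
    "INSERT INTO Stem (Stem_ID, Biological_Observation_ID, Stem_Diameter) "
    "VALUES (1, 2, 23.5)"
)
# One tree may carry several vouchers.
conn.executemany(
    "INSERT INTO Voucher (Biological_Observation_ID, Collection_Number) VALUES (2, ?)",
    [("1234",), ("1235",)],
)
# A remeasurement: new Observation (11), new Biological_Observation (3),
# and a new Stem row pointing back at the earlier stem.
conn.execute(
    "INSERT INTO Biological_Observation "
    "(Biological_Observation_ID, Observation_ID, Biological_Type) "
    "VALUES (3, 11, 'Individual')"
)
conn.execute(
    "INSERT INTO Stem (Stem_ID, Biological_Observation_ID, Previous_Stem_ID, Stem_Diameter) "
    "VALUES (2, 3, 1, 24.1)"
)

print(vouchers_for(conn, 2))
```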

2-4-3 References

Need more research. The following candidate formats were recognized:

ITIS

EML

GBIF

TDWG

BIOSIS

IPNI

2-4-4 Other changes required

  • Change tracking and roll-back capability. Need to add start and stop dates to various fields.
  • Move level & parent to status
  • Move table number, page number, line number etc from citation to observation
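
One common way to get change tracking and roll-back from start and stop dates is to close the current value rather than overwrite it. The Python sketch below illustrates that idea only; the names Version, set_value, and value_as_of, and the sample status values, are hypothetical and do not come from the VegBank data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Version:
    value: str
    start: date
    stop: Optional[date] = None   # None means "still current"


def set_value(history: List[Version], value: str, when: date) -> None:
    """Close the current version (if any) at `when` and open a new one."""
    if history and history[-1].stop is None:
        history[-1].stop = when
    history.append(Version(value, when))


def value_as_of(history: List[Version], when: date) -> Optional[str]:
    """Roll-back view: the value that was in force on a given date."""
    for version in history:
        if version.start <= when and (version.stop is None or when < version.stop):
            return version.value
    return None


# Hypothetical usage: a status field that changes once.
status: List[Version] = []
set_value(status, "accepted", date(2002, 9, 14))
set_value(status, "not accepted", date(2003, 6, 1))
print(value_as_of(status, date(2003, 1, 1)))
```

Because nothing is deleted, any earlier state can be reconstructed by querying with an earlier date.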

 

2-4-5 Decompose names into atomic units?

Decided not to, but to keep names as expressed and to keep names with and without authors as name systems. Genus, variety, etc. have parent-child relationships.

2-5 Implementation and interface requirements and recommendations

2-5-1 Views & searches

  • Need a report showing details and summary information on individual projects.
  • Need standard report for web service.
  • Need Project summary

2-5-2 Data quality discussion

  • Need validation layer
  • Need rectification layer - perhaps as web service

2-5-3 Searches

  • Need ability to search for projects before viewing plots, and then be able to select plots within the project
  • Need ability to select plots by working down the classification tree
  • Need advanced query page with pulldown comboboxes for attributes to search by

See Costa Rican NBIO

2-5-4 Style sheet generator

  • Plot reports, exports, etc. John already has a good start on this

2-5-5 Bulk load new taxa (with built-in rectification)

2-6 Equipment issues

2-6-1 Main machine : vegbank.org

  1. Do we need to add a second processor - Bob will ask Matt
  2. We need to add second hard drive - John will do this
  3. Need to start selecting a high performance replacement server to be installed in the spring - On hold
  4. We have finished the migration of the name vegbank.org to UCSB. John needs to convey the ownership to one of RKP-ESA-NCEAS.
  5. Tape backup issues were discussed in June - Nothing done yet.

2-6-2 Test machine added : tekka.vegbank.org

We observed the need for a test/development implementation of VegBank to run separately from the main production machine

  1. Installed the database, Apache, and servlet engine. Ant and Java are also running.
  2. Dell 450 MHz, 128 MB RAM (to be increased), roughly 30 GB storage
  3. Accounts: peet, lee, harris, root

2-7 Other Issues

  • XML not complete
  • We currently lack capability of downloading data from VegBank to VegBranch
  • Taxonomic modules via web services ???
  • Move news off homepage to a second page, and add critical links
  • Need plot exchange standards - potential discussion at IAVS 2003 Naples
  • Perhaps more plot and project metadata via the EML project
  • Need the in = out test for all data transfer systems (NPS, VegBranch, XML, TurboVeg, etc.)
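
An in = out test can be read as a round-trip check: export a record, import the result, and confirm that nothing changed. The Python sketch below shows the idea for a toy XML exchange using the standard library; the element names, the flat string-valued record, and the function names are assumptions and do not reflect the actual VegBank XML format.

```python
import xml.etree.ElementTree as ET


def export_plot(plot):
    """Serialize a flat dict of string fields to an XML string."""
    root = ET.Element("plot")
    for key, value in plot.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def import_plot(xml_text):
    """Parse the XML string back into a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def in_equals_out(plot):
    """True if the record survives an export/import round trip unchanged."""
    return import_plot(export_plot(plot)) == plot


# Hypothetical records
print(in_equals_out({"accession": "A1", "cover": "35.5"}))
# An empty string comes back as None after parsing, so this toy
# exchange is lossy for empty fields and the check reports it.
print(in_equals_out({"accession": "A1", "notes": ""}))
```

The same comparison could be run for each transfer path by swapping in that path's export and import steps.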