Frequently Asked Questions (FAQ)
This FAQ includes the following sections:
Data Provider FAQ
Can I create a custom search page for the items in the data provider?
Yes. The Search page can be customized and easily installed to another location or remote Web server as described in the ODL Search Specification page.
What is the difference between the 'Search' and 'Admin search' pages?
The Search page is intended to be accessed by the general public. It provides search
over the full text found in all files that can be disseminated in oai_dc format.
This includes files that reside natively in oai_dc format as well as files that
can be converted to oai_dc as described in Providing
files in multiple formats. In addition, only files that are enabled in the
Metadata Files Configuration
page are available for search.
The Admin search page resides in the administration portion of the software and its access is intended to be restricted to trusted users as described in the Configuring jOAI page. It provides
search over the full text found in all XML files that are configured in the
data provider. Files are searchable and viewable even when public access to
them has been disabled in the Metadata
Files Configuration page. Admin search also provides options to search by
set, available format, and record attribute (not deleted, deleted, etc.).
If my records reside in a database, can I use jOAI to implement an OAI data provider for my repository?
The jOAI data provider allows XML files from a file system to be exposed as items in an OAI data repository. To expose records that reside in a database, write a routine to export the records to XML files at regular intervals such as once a day or once a week, depending on how often the records change. Then set up the data provider to monitor the directory or directories where the files are exported. See also preparing files for serving.
After the initial set of records has been exported from the database, files should be modified or deleted only when the corresponding database record has been updated or deleted. jOAI will monitor the files and provide them to harvesters according to the OAI-PMH.
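As a rough illustration, an export routine of the kind described above might look like the following Python sketch. It assumes a SQLite database with a hypothetical 'records' table (columns 'id' and 'title') and writes a placeholder <record> layout; neither is part of jOAI, and a real export would write whatever metadata format the repository serves. The point it shows is that a file is rewritten only when its content changes and removed only when its database row is gone.

```python
import sqlite3
from pathlib import Path
from xml.sax.saxutils import escape


def export_records(conn, export_dir):
    """Mirror rows of a hypothetical 'records' table as one XML file each.

    Returns (files_written, files_removed). Unchanged files are left alone
    so that their modification times keep reflecting real changes.
    Assumes the id values are safe to use as file names.
    """
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    current = set()
    written = 0
    for record_id, title in conn.execute("SELECT id, title FROM records"):
        path = export_dir / f"{record_id}.xml"
        current.add(path.name)
        # Placeholder layout, not a real metadata format.
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f"<record><id>{escape(str(record_id))}</id>"
            f"<title>{escape(title)}</title></record>\n"
        )
        if not path.exists() or path.read_text(encoding="utf-8") != xml:
            path.write_text(xml, encoding="utf-8")
            written += 1
    removed = 0
    for path in list(export_dir.glob("*.xml")):
        if path.name not in current:
            path.unlink()  # the database row was deleted
            removed += 1
    return written, removed
```

Such a routine could be run from a scheduler (for example once a day), with 'export_dir' set to a directory that the data provider is configured to monitor.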
Does jOAI support selective or incremental harvesting?
Yes. After performing an initial full harvest of the repository, harvesters may use datestamps to request only those records that have changed or been deleted since the last harvest, which can greatly reduce the number of records transferred over the network over time. The data provider implements deleted records and datestamps in accordance with the OAI-PMH to support selective and incremental harvests.
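For illustration, a selective harvest is an ordinary OAI-PMH ListRecords request with a 'from' datestamp. The Python sketch below only builds the request URL; the function name is hypothetical, and the base URL is the example address used elsewhere in this FAQ rather than a real server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL, optionally limited by datestamp and set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed or deleted since this date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# First harvest: no 'from' argument, so all records are requested.
full = list_records_url("http://localhost:8080/joai/provider")
# Later harvests: pass the date of the previous harvest.
incremental = list_records_url("http://localhost:8080/joai/provider", from_date="2024-01-15")
```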
If I remove a file, can I add it back at a later time?
When a file is removed from a directory that is being monitored by the data
provider, its record will be changed to status deleted, and harvesters will
be notified that the record has been deleted the next time they harvest from
the data provider. At a later time, if a file is added back with the same unique
ID as a deleted record (regardless of the directory), the data provider will
replace it with the new one, and its status will no longer be deleted. Harvesters
will then receive the new record the next time they harvest from the data provider.
What happens if I accidentally create two files with the same ID?
When jOAI imports a new or modified file into its index, a check is performed to see if there is an existing record with the same unique ID. If the ID already exists, an error will be reported under 'Indexing Errors' in the Metadata Files Configuration page, and the file will not be imported into the repository index. To fix the problem, either remove the file from the directory or assign it a unique ID, as described under preparing files for serving.
My records have indexing errors when I put them in the data provider. What's wrong?
XML files that contain text that was copied and pasted from tools such as Microsoft Word often contain invalid characters such as dashes or copyright symbols that are improperly encoded. These 'bad characters' can trigger the XML processors in the software to issue an error. Files must contain well-formed XML and should use UTF-8 encoding. Character references, rather than entity references, should also be used for special characters, as required by the OAI protocol XML response format.
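One way to catch such problems before the files reach the data provider is to check them with any XML parser. The Python sketch below is an illustration only, not part of jOAI, and its function names are hypothetical. It reports whether a file's bytes are well-formed XML and shows how non-ASCII characters can be rewritten as numeric character references.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return None if the bytes parse as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_bytes)
        return None
    except ET.ParseError as err:
        return str(err)


def to_character_references(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# A copyright sign and an en dash become &#169; and &#8211;.
print(to_character_references("\u00a9 2005 \u2013 DLESE"))
```

A byte such as 0x96 (a dash in the Windows-1252 encoding) inside a file declared as UTF-8 is a typical example of text pasted from a word processor, and a conforming parser rejects it.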
How many records can the data provider scale to?
The jOAI data provider is designed for small to medium size data repositories.
The software has been tested successfully with repositories up to 300,000 records.
The number of records the software can support depends on the amount of memory
available to the JVM, the speed of the host machine and the size of the
individual records in the repository.
The baseURL that is shown uses the local machine name, but it should use the domain name for the server instead. How can I change it?
The base URL that is shown on the front page
and Repository Information
page of jOAI and elsewhere reflects the URL that was entered into the web
browser when connecting to the software. For example, if a user accesses jOAI
using the web address http://localhost:8080/joai,
the baseURL will be shown as 'http://localhost:8080/joai/provider'.
If the user connects to the same instance of jOAI using the Internet address
http://myserver.somewhere.edu/joai, the baseURL
will be shown as 'http://myserver.somewhere.edu/joai/provider'.
Where are the harvested records and zip archives saved to?
The harvester saves the records that are harvested into individual files on
the file system, one record per file. Files are saved to either a default directory
(which is named based upon the name of the provider and optionally the set that
is being harvested) or a specific directory that was specified when setting
up the harvest. Each harvest is then packaged into a zip archive.
To determine where files and zip archives were saved to after a harvest has
occurred, go to the Harvester
Setup and Status page, then click on 'View harvest history' for a given
harvest. This brings up the detailed history of harvests and shows the full
directory path to the harvested files and zip archives.
Each time a harvest is performed for a given harvest configuration, files in
the harvest directory may be added, updated or deleted by the harvester depending
on the outcome of the harvest. If configured for zipping, at the conclusion of each harvest that results
in a change to one or more files, a new zip archive is created, and a maximum
of three zip archives are preserved at any given time. Each zip archive contains
the exact time of the harvest in its name. The zip archives for each harvest
may be downloaded directly from the Harvester
Setup and Status page or accessed from the file system.
Can I use jOAI to harvest records into a database?
There are two ways in which the harvester may be used to import records into
a database.
The first method, which uses the jOAI web application, requires two parts.
First, configure the jOAI harvester to save files to a convenient directory
at regular intervals, such as once a day. Then, write a routine to monitor the
file directory and add, update or delete the corresponding records in the database
when changes occur in the files.
Another method is to use the Harvester
API from within native Java code to perform harvests and import metadata
records directly into a database.
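As a rough sketch of the first method, the monitoring routine could compare the files in the harvest directory with what the database already holds. The Python example below assumes a SQLite database and a hypothetical 'harvested' table keyed by file name; both are illustrative choices, and the function is not part of jOAI. It detects changes by file modification time.

```python
import sqlite3
from pathlib import Path


def sync_directory(conn, harvest_dir):
    """Bring a hypothetical 'harvested' table in line with the XML files on disk.

    Returns (added, updated, deleted).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS harvested "
        "(filename TEXT PRIMARY KEY, mtime REAL, xml TEXT)"
    )
    known = dict(conn.execute("SELECT filename, mtime FROM harvested"))
    on_disk = {p.name: p for p in Path(harvest_dir).glob("*.xml")}
    added = updated = deleted = 0
    for name, path in on_disk.items():
        mtime = path.stat().st_mtime
        if known.get(name) == mtime:
            continue  # unchanged since the last sync
        conn.execute(
            "INSERT OR REPLACE INTO harvested VALUES (?, ?, ?)",
            (name, mtime, path.read_text(encoding="utf-8")),
        )
        if name in known:
            updated += 1
        else:
            added += 1
    for name in known.keys() - on_disk.keys():
        # The harvester removed the file, so drop the row as well.
        conn.execute("DELETE FROM harvested WHERE filename = ?", (name,))
        deleted += 1
    conn.commit()
    return added, updated, deleted
```

Running this after each scheduled harvest keeps the table synchronized with the directory; a production routine would more likely parse each record into database columns rather than store the raw XML.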
Does the harvester support selective or incremental harvesting?
Yes. When an automatic harvest is conducted at regular intervals, the harvester
checks if the data provider supports deleted records. If deletions are supported,
a selective harvest is performed by requesting and synchronizing only those records
that have been added, modified or deleted since the previous harvest. If deletions
are not supported, a full harvest is performed by deleting all previously harvested
records and harvesting all records from scratch.
Similarly, when a manual harvest is performed, clicking 'New' performs a selective
harvest while clicking 'All' deletes all previously harvested files and performs
a full harvest from scratch.
The files that are saved by the harvester include characters like '%3A' in their names. Why is that?
When the harvester saves records, it places each record in a single file, which
is named using the OAI identifier associated with the record that was harvested.
Reserved characters such as the colon ':' are encoded using hexadecimal values
in order to ensure the file name is valid on the file system. For example, if
the OAI identifier for a given harvested record is oai:dlese.org:123-ABC, the
file will be named oai%3Adlese.org%3A123-ABC.xml. The hexadecimal characters
can be converted back to the original form as needed.
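For example, the conversion back can be done with any percent-decoding routine. The Python sketch below uses the standard library; the function names are hypothetical. The decoding direction follows directly from the example above, while the encoding direction assumes that every reserved character is percent-encoded the way the colon is, which may not match the harvester exactly for other characters.

```python
from urllib.parse import quote, unquote


def filename_to_identifier(filename):
    """Recover the OAI identifier from a harvested file name."""
    if filename.endswith(".xml"):
        filename = filename[: -len(".xml")]
    return unquote(filename)


def identifier_to_filename(identifier):
    """Percent-encode an identifier into a file name (assumed encoding rules)."""
    return quote(identifier, safe="") + ".xml"


print(filename_to_identifier("oai%3Adlese.org%3A123-ABC.xml"))  # oai:dlese.org:123-ABC
```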
How can I provide records that I harvest?
Currently it is a two-step process to make records that are harvested available
through the data provider. First, harvest the records to a convenient file directory.
Second, configure the data provider to point to the same file directory. As
new records are added, modified or deleted by the harvester, these changes will
be reflected and passed along in the data provider.
Can I search over and view the records I harvest?
The harvester portion of the software does not currently support searching
and viewing harvested records directly. However, by configuring harvested records
in the data provider as mentioned above, the records will become searchable
and viewable in the 'Search' and 'Admin Search' pages.
Is it possible to customize the CSS, HTML or other features of the jOAI user interface?
Can I configure jOAI to store my settings and data in a permanent location for backup or reinstallation purposes?
Yes. jOAI saves its configuration files and stored data inside file directories. By default, these are located inside the WEB-INF directory of the jOAI installation. To store these in a global directory, set the repositoryData and harvesterData configuration parameters to point to a directory of your choice. See the section titled 'Configure software settings' in the Configuring jOAI page for details.
When upgrading or reinstalling jOAI, how do I preserve the settings, indexes and files for the data provider and harvester?
If you have previously configured jOAI to store its configuration files and data in a global directory as described above, you can simply stop Tomcat, upgrade or reinstall the jOAI software (oai.war) and start Tomcat again. Then visit the jOAI admin and search pages to confirm that the settings and indexes have been preserved for the data provider and harvester. In some cases it may be necessary to re-index the files in the data provider for changes to be seen. Before upgrading or reinstalling, be sure to make a backup copy of your settings and data in case you need to revert for any reason.
If you have not configured jOAI to store its files in a global directory, follow these steps:
1. Stop Tomcat
2. Move the current oai installation to a location outside the webapps directory, and keep it as a backup copy.
3. Install the new version of jOAI (put oai.war in webapps, start Tomcat, etc). Tomcat will unpack the new oai.war file.
Then, to restore the previous settings, indexes and files:
4. Stop Tomcat again.
5. In the new webapps/oai/WEB-INF directory, replace the two directories 'repository_settings_and_data' and 'harvester_settings_and_data' with the ones saved from the previous installation.
6. Start Tomcat.
7. Visit the jOAI admin and search pages to confirm that the settings and indexes have been preserved for the data provider and harvester.
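The file handling in steps 2 and 5 can also be scripted. The following Python sketch is illustrative only: the function names and the paths passed to them are hypothetical, it copies just the two settings directories rather than moving the whole installation as step 2 describes, and it must be run while Tomcat is stopped.

```python
import shutil
from pathlib import Path

# Directory names as given in step 5.
DATA_DIRS = ("repository_settings_and_data", "harvester_settings_and_data")


def save_settings(web_inf, backup_dir):
    """Copy the two settings directories out of the old WEB-INF directory."""
    for name in DATA_DIRS:
        shutil.copytree(Path(web_inf) / name, Path(backup_dir) / name)


def restore_settings(backup_dir, web_inf):
    """Replace the freshly unpacked directories with the saved copies."""
    for name in DATA_DIRS:
        target = Path(web_inf) / name
        if target.exists():
            shutil.rmtree(target)  # discard the defaults from the new install
        shutil.copytree(Path(backup_dir) / name, target)
```

A typical sequence would call save_settings before installing the new oai.war and restore_settings after Tomcat has unpacked it and been stopped again.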
Can jOAI be configured to run through an Apache web server (httpd)?
Yes. Running jOAI through an Apache web
server provides additional functionality that is not available through Tomcat
alone. For example, Apache provides robust support for SSL, user authorization
and authentication, access control by IP address, virtual host support, web
logging, URL redirection, and other functionality. By configuring Tomcat to
run through Apache, all of Apache's functionality becomes available. This may
be especially convenient for web administrators who are already familiar with
Apache.
One of two Apache modules may be used to connect Apache with Tomcat: mod_proxy or mod_jk. Choose one or the other:
- mod_proxy - Information for setting up mod_proxy is provided in the Apache Module mod_proxy documentation, with additional configuration information specific to Tomcat provided in the Tomcat Proxy How-To documentation. When using mod_proxy, the proxyName and proxyPort attributes must be added to the non-SSL and SSL HTTP <Connector> elements in Tomcat's server.xml so that URLs in jOAI and other Web applications that rely on ServletRequest.getServerName() and related Java methods resolve properly.
- mod_jk - Information for setting up mod_jk is provided in the Apache Tomcat Connector documentation.
After setting up mod_proxy or mod_jk, a typical configuration scenario would be to use Apache to provide SSL encryption,
user authorization and authentication for all pages that reside in the admin
area of the software (e.g. https://oai.somewhere.edu/oai/admin*), while leaving
all other public jOAI pages open. This scenario provides a relatively secure
way to restrict access to the administrative functions of the software to trusted
users while leaving access to the data provider, search and other public pages
open.
Another scenario might be to restrict access to the software or portions of
the software by the requester's IP address.
See the Apache documentation for
a list of available features and configuration information.
Can jOAI be integrated into an existing Web application?
Yes. It is recommended that jOAI be run as a stand-alone Web application, but it is possible to integrate it into an existing Web application. Either the data provider, the harvester or both can be configured. Here is a general outline of how this may be done:
1. Copy the configuration from web.xml:
- All <servlet> elements: OAIProviderServlet, OAIHarvesterServlet and action
- All <context-param> elements
- All <filter> and <filter-mapping> elements
- All <servlet-mapping> elements
- All <taglib> elements
- Optionally, copy over the <welcome-file-list> and <error-page> elements
2. Copy over files struts-config.xml, users.xml, validation.xml, validator-rules.xml from /WEB-INF to your application.
3. Copy over all JAR files from /WEB-INF/lib (some may not be required)
4. Copy over directories /WEB-INF/classes, /WEB-INF/tlds, /WEB-INF/xsl_files and /WEB-INF/conf. Optionally /WEB-INF/error_pages (if configured in web.xml).
5. Copy over all .jsp, .js and .css files from the root and the /oai_requests, /admin, /docs (optional), /images directories.
- Optionally edit, add and remove JSP and CSS files as needed. The OAI protocol is handled by the pages found in /oai_requests. Administration is handled by the pages found in /admin. The WEB-INF/struts-config.xml file maps URL paths, via the Action, to the JSP pages that handle them.