Applying World Wide Web Technology to the Study of Patients with Rare Diseases
- Piet C. de Groen, MD;
- Jon A. Barry, BA, BS; and
- William J. Schaller, BA
- From Mayo Medical School, Clinic, and Foundation, Rochester, Minnesota. Requests for Reprints: Piet C. de Groen, MD, Division of Gastroenterology, Mayo Clinic and Foundation, 200 First Street SW, Rochester, MN 55905. Current Author Addresses: Dr. de Groen: Division of Gastroenterology, Mayo Clinic and Foundation, 200 First Street SW, Rochester, MN 55905.
Abstract
Randomized, controlled trials of sporadic diseases are rarely conducted. Recent developments in communication technology, particularly the World Wide Web, allow efficient dissemination and exchange of information. However, software for the identification of patients with a rare disease and for subsequent data entry and analysis in a secure Web database is currently not available.
To study cholangiocarcinoma, a rare cancer of the bile ducts, we developed a computerized disease tracing system coupled with a database accessible on the Web. The tracing system scans computerized information systems on a daily basis and forwards demographic information on patients with bile duct abnormalities to an electronic mailbox. If informed consent is given, the patient's demographic and preexisting medical information available in medical database servers is electronically forwarded to a UNIX research database. Information from further patient–physician interactions and procedures is also entered into this database. The database is equipped with a Web user interface that allows data entry from various platforms (PC-compatible, Macintosh, and UNIX workstations) anywhere inside or outside our institution. To ensure patient confidentiality and data security, the database includes all security measures required for electronic medical records. The combination of a Web-based disease tracing system and a database has broad applications, particularly for the integration of clinical research within clinical practice and for the coordination of multicenter trials.
Randomized, controlled trials are the gold standard in medicine. In general, they are performed at major institutions where many patients with the disease being studied are seen. These trials are seldom done for rare diseases because of the years required to identify a sufficient number of patients for clinical investigation. Even in major institutions where patients with rare diseases are treated more frequently than in smaller centers, it is unlikely that one physician will see more than a few patients with the disease of interest. Two exceptions are 1) if a referral system is in place and 2) if a system is set up to actively search for patients with the disease of interest among all patients seen in the institution.
Multicenter trials provide a way to rapidly enroll more patients than could be enrolled at one institution. They also have the advantage of providing a patient population that may more evenly represent various subpopulations of a society, especially with respect to race and culture. However, the disadvantages inherent in multicenter trials include the difficulty of achieving uniform follow-up and complete data collection, particularly from participating institutions where little or no official time is available for clinical research and where the primary focus is clinical practice.
Cholangiocarcinoma, a rare cancer of the bile ducts, occurs in an estimated 2000 to 3000 patients in the United States per year [1, 2]. At the Mayo Clinic, it is diagnosed in 100 to 200 new patients each year (among a total of 350 000 patients), making the Mayo Clinic one of the largest referral centers for this type of tumor. However, until recently, individual physicians at the Mayo Clinic saw no more than a few patients with cholangiocarcinoma per year, and prospective trials were essentially nonexistent.
We developed a disease tracing system that automatically identifies patients with cholangiocarcinoma, and we used novel World Wide Web-based technology to develop a database that allows entry, modification, deletion, and analysis of data from any Web-enabled workstation. Although this system is currently used only at the Mayo Clinic, the technology can be easily applied at other institutions and should facilitate randomized, controlled trials of any disease, regardless of frequency or the patients' location. In addition, the simplicity of the user interface will allow physicians who primarily perform clinical tasks to participate in randomized, controlled, multicenter trials.
Methods
Tracing System
Our overall goal was to develop a fully automated system for gathering data on all patients in whom cholangiocarcinoma was diagnosed at the Mayo Clinic and Foundation outpatient clinics and hospitals. We obtained permission from all of the Mayo Clinic committees and departments that deal with patient information, confidentiality issues, database security, and legal issues. These committees and departments included the institutional review board, the security committee, the clinical practice committee, and the legal department.
Various computerized databases containing patient information were analyzed. These included databases with demographic information, the general patient appointment system, a pathology database, medical and surgical diagnostic records, and billing records. Of particular importance was the availability of up-to-date information on patient diagnoses.
Analysis of these databases showed that the billing records database had the most up-to-date information on diagnoses. Immediately after contact with a patient, the physician is required to fill out a billing record that describes the type of contact (for example, general medicine or subspecialty, or full examination or consultation), the duration of the patient encounter (for example, 45 minutes), and one or more specific diagnoses suspected or known at the time of contact (for example, cholangiocarcinoma). These billing records are then sent to the business office, where the patient's name, clinic number, and diagnoses (according to ICD-9 [International Classification of Diseases, Ninth Revision] codes) are entered at the end of the day into the billing records database (Figure 1). At night, a computerized batch program scans all newly entered diagnoses and, for the cholangiocarcinoma database, retrieves those with ICD-9 codes indicating bile duct disease. These data are forwarded electronically to the “Candidates” section of the cholangiocarcinoma database (Figure 2). The next day, the study coordinator checks this section, contacts the primary physicians of potential candidates, and requests referral of the patient to the hepatobiliary clinic. In the hepatobiliary clinic, permission is obtained from the patient for enrollment into the cholangiocarcinoma database.
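The nightly scan described above can be sketched as a simple filter over the day's billing entries. The following Python fragment is only an illustration, not the batch program used at the Mayo Clinic. The record fields, the function name `find_candidates`, and the ICD-9 code prefixes (155.1, 156.1, and the 576 category) are assumptions chosen for the example, because the article does not list the codes that were scanned.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical ICD-9 prefixes treated as "bile duct disease" in this sketch.
BILE_DUCT_PREFIXES = ("155.1", "156.1", "576")


@dataclass(frozen=True)
class BillingEntry:
    clinic_number: str
    patient_name: str
    icd9_codes: tuple
    entered_on: date


def find_candidates(entries, scan_date):
    """Return entries made on scan_date that carry at least one bile duct code."""
    candidates = []
    for entry in entries:
        if entry.entered_on != scan_date:
            continue  # only newly entered diagnoses are scanned
        # str.startswith accepts a tuple of prefixes
        if any(code.startswith(BILE_DUCT_PREFIXES) for code in entry.icd9_codes):
            candidates.append(entry)
    return candidates
```

The returned list corresponds to what would be forwarded to the “Candidates” section for review by the study coordinator.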
User Interface and Database
After a patient is identified and informed consent for enrollment is obtained, all preexisting computerized information on the patient is automatically entered into the database. Data from subsequent patient–physician interactions are entered by using templates available on the Web through a Web browser. For information retrieval and data input, we developed a system that operates independently of the type of computer used (PC-compatible, Macintosh, or UNIX workstations), has quick response times, and requires little or no training before use. We chose an Internet browser (Netscape 3.X, Netscape Communications Corp., Mountain View, California) as the interface between the user and our database to allow access from any location in the Mayo Clinic intranet or the Internet (outside the Mayo Clinic firewall). A scalable UNIX database was created using Ingres (Computer Associates, Islandia, New York) and allowed creation of new records as needed. All of the levels of data security required for electronic medical records were built into the system, including a firewall; unique user identification; one-way encrypted passwords; documentation of the date, time, user identity, and modification of data with each entry; and ability to monitor and audit user activities.
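Two of the security measures listed above, one-way encrypted passwords and a record of who changed what and when, can be illustrated with a short Python sketch. This is not the mechanism built into the Ingres database described here; the function names, the choice of PBKDF2 with SHA-256, and the audit record fields are assumptions made for the example.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone


def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, digest). Only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=200_000):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)


def audit_record(user_id, field, old_value, new_value):
    """Describe one modification: date, time, user identity, and the change."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "field": field,
        "old_value": old_value,
        "new_value": new_value,
    }
```

Because the hash cannot be reversed, a stolen password table does not directly reveal passwords, and appending one audit record per change is what makes later monitoring of user activity possible.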
The database includes templates for inputting the research information collected from a specific patient–physician encounter (Figure 3). For example, templates were developed for describing pathologic findings, various radiologic and surgical procedures, and chemotherapy and radiation therapy regimens. All of the templates are listed in Table 1. Templates can be added at any time for specific diagnostic or therapeutic research protocols.
All template parameters are empty, unless otherwise defined by data entry. Rules were designed within and between each template; for example, a tumor cannot vary by more than a specific percentage in maximal diameter when the results of computed tomography, magnetic resonance imaging, or ultrasonography done on the same day are compared. Errors in data entry are flagged and stored, allowing retrieval and correction at a later date by the “Master” user of the database.
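A rule of the kind described above, comparing the maximal tumor diameter reported by different imaging methods on the same day, might look like the following Python sketch. The 20% threshold, the function name, and the data layout are hypothetical; the article states only that the allowed variation is “a specific percentage.”

```python
from itertools import combinations


def check_diameter_consistency(measurements_mm, max_relative_diff=0.20):
    """Flag pairs of same-day measurements that disagree by too much.

    measurements_mm maps an imaging method (for example "CT") to the
    maximal tumor diameter in millimeters. Returns the flagged pairs.
    """
    flagged = []
    for (mod_a, d_a), (mod_b, d_b) in combinations(sorted(measurements_mm.items()), 2):
        larger = max(d_a, d_b)
        # Relative difference is taken with respect to the larger measurement.
        if larger > 0 and abs(d_a - d_b) / larger > max_relative_diff:
            flagged.append((mod_a, mod_b))
    return flagged
```

In this sketch a flagged pair would be stored rather than rejected, which matches the described behavior of flagging errors for later correction by the “Master” user.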
The user interface was developed using practical extraction and report language (Perl) [3], hypertext markup language (HTML) (World Wide Web Consortium, Massachusetts Institute of Technology, Cambridge, Massachusetts), JavaScript (Netscape Communications Corp.), and Java (Sun Microsystems, Inc., Palo Alto, California) software. The interface is a Perl application that runs on a Web server and generates HTML and JavaScript code according to a user request or transaction. Java applications are used to create graphs and charts. When the user asks to use an electronic template, the interface program reads the structure of the template from the database and then builds the HTML, JavaScript, and Java page that is displayed on the local workstation's Web browser. The user fills out the template and saves it, and the collected data are stored in the database. Connection to the database is only needed twice: to read the template structure and to save the data. This feature keeps traffic on the local network to a minimum. Data can be retrieved at any time and displayed in the same format used for data entry.
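The idea of building a page from a template definition stored in the database can be sketched as follows. The original interface was written in Perl and also emitted JavaScript and Java; this Python fragment only illustrates the first step, generating an HTML form from a template structure. The dictionary layout, the field types, and the `/save` form target are assumptions made for the example.

```python
from html import escape


def render_template(template):
    """Build an HTML form from a template definition read from the database."""
    rows = []
    for field in template["fields"]:
        name = escape(field["name"], quote=True)
        label = escape(field["label"])
        if field["type"] == "yes_no":
            control = (
                f'<input type="radio" name="{name}" value="yes"> yes '
                f'<input type="radio" name="{name}" value="no"> no'
            )
        else:
            # Text fields may be prefilled, for example with the current date.
            default = escape(str(field.get("default", "")), quote=True)
            control = f'<input type="text" name="{name}" value="{default}">'
        rows.append(f"<p>{label}: {control}</p>")
    title = escape(template["title"])
    body = "\n".join(rows)
    return (
        f'<form method="post" action="/save">\n<h2>{title}</h2>\n{body}\n'
        '<input type="submit" value="SAVE">\n</form>'
    )
```

As in the described system, the database is consulted once to read the template structure; the second contact occurs only when the completed form is submitted.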
To optimize the user-database interaction, several user-friendly features are included (Figure 4). For example, the user can select whether entry fields precede or follow text. Free text entry is reduced to the absolute minimum. Entry fields are prefilled with expected data, such as the current date for new template entries. Data can be carried over if the findings on a second examination are identical or similar to those on a previous date (for example, “change of stents, all findings similar to previous cholangiogram”). Only the template items that require data input are shown on the computer screen. For example, tumor size is relevant only if a tumor mass is seen; therefore, the item “tumor size” only appears when the item “tumor mass” is set to “yes.” The page elongates, the new item is inserted, and the screen scrolls down to this new item (“autoadvance”); all of this occurs in a “flip” of the page. The overall result is that the user remains focused on the center of the screen, enters a minimal number of items, and does not need to answer irrelevant questions.
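The conditional display of items, such as showing “tumor size” only after “tumor mass” is answered “yes,” can be expressed as a small rule attached to each item. The Python sketch below is an illustration under assumed names (`show_if`, `visible_items`); in the described system this logic ran in the browser, and the sketch assumes that each parent item is listed before the items that depend on it.

```python
def visible_items(items, answers):
    """Return the names of the items that should currently be displayed.

    Each item is a dict with a "name" and an optional "show_if" rule of the
    form (parent_name, required_answer).
    """
    shown = []
    for item in items:
        rule = item.get("show_if")
        if rule is None:
            shown.append(item["name"])
            continue
        parent, required = rule
        # A dependent item appears only if its parent is itself visible
        # and has been answered with the required value.
        if parent in shown and answers.get(parent) == required:
            shown.append(item["name"])
    return shown
```

Recomputing this list after every answer gives the effect described above: the page grows only when an answer makes a further question relevant.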
The interface is programmed to display each template as it is defined in the database. The rules that contain the information needed to guide the user through the template while the user enters data are attached to each template. As mentioned, when the user selects a template, an HTML page is generated on the server and displayed on the computer screen by the Web browser. This allows use of the same interface system for all databases within the Mayo Clinic Cancer Center without the need to make changes in individual databases. Generation of a new Web page requires only the addition of a new template to the database, and rules are defined as needed.
The database is programmed to automatically obtain information available in other databases as soon as a new patient is registered and every time a patient's records are activated for data entry, deletion, modification, or retrieval. For instance, laboratory values, such as serum bilirubin and alkaline phosphatase levels, and radiologic reports, such as cholangiograms, are automatically retrieved from the electronic result information system and stored in the database. This information is immediately available for review. Laboratory data can be displayed in a graphic format (concentration over time), which allows rapid interpretation of disease progression or response to treatment.
Discussion
We designed and implemented a Web-based computer system that allows the development of a database of patients with rare diseases. Not unexpectedly, we found that of all electronic, patient-related databases in our institution, the billing records database contains the most up-to-date diagnostic information. This can be explained by the need for timely payment for services and the fact that insurance companies require information about diagnostic categories and types of treatment before payments are made. Because the billing records database is so up to date, it was chosen to identify patients with rare diseases.
We are currently examining whether the initial ICD-9 codes are accurate as well as complete. Our first impression is that the tracing system is remarkably accurate for patients with cholangiocarcinoma, perhaps because the differential diagnosis of obstructive jaundice is limited and because many patients have already been evaluated elsewhere and are referred to our institution with a probable diagnosis of cholangiocarcinoma. Clearly, more experience with other diseases is needed before this disease tracing system can be recommended for general use in identifying patients with other rare diseases. In addition, we intend to improve the accuracy of the tracing system by probing other diagnostic databases for information that confirms the diagnosis of cholangiocarcinoma, such as the electronic pathology database (biopsy results from elsewhere) and the laboratory database (markers of cholestasis).
Because the concept and protocols of automatic database searching for specific disease categories and subsequent enrollment of identified patients in a Web-based database were new, the intramural review process was extensive. We discussed the plans for the database with 14 intramural committees and departments before we obtained permission to implement the system. The approval process was complicated by a Minnesota law (effective 1 January 1997) that forbids confidential access to medical records (but not billing records) for research purposes without previous written approval by the patient [4]. Thus, medical record databases cannot be scanned at this time for specific disease categories. However, every patient is currently asked to give permission for confidential and anonymous use of their medical records for research purposes. Permission, when given, is entered into a database. In the near future, we intend to use this “authorization status” database to select the medical records of patients who have given permission for use of their records. We will then scan only these records for specific disease categories.
To ensure patient confidentiality and security of data, all levels of data security for electronic medical records are incorporated into the database. In addition, the database provides precise allocation of access. For example, access by radiologists, who are responsible for inputting ultrasonography data only, can be limited to selective read-write access of the ultrasonography templates. Such control not only prevents fraudulent access of data but also prevents entry of data by nonsubspecialty users.
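The allocation of access per template can be pictured as a lookup from role and template to the permitted actions. The roles, template names, and the `is_allowed` function in this Python sketch are hypothetical and serve only to illustrate the radiologist example given above.

```python
READ, WRITE = "read", "write"

# Hypothetical permission table: role -> template -> allowed actions.
PERMISSIONS = {
    "radiologist": {"ultrasonography": {READ, WRITE}},
    "study_coordinator": {
        "ultrasonography": {READ},
        "candidates": {READ, WRITE},
    },
}


def is_allowed(role, template, action, permissions=PERMISSIONS):
    """Return True only if the role was explicitly granted the action."""
    return action in permissions.get(role, {}).get(template, set())
```

Anything not listed is denied by default, so a radiologist in this sketch can read and write the ultrasonography template but cannot open any other template.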
Despite the advertised platform independence of Web browsers, we encountered many difficulties in implementing the tracing system and database across UNIX, PC-compatible, and Macintosh platforms. To solve these problems, we designed our system for one Web browser (Netscape 3.X) because of the programming options of this browser and its frequency of use in our institution.
Each template of the database was designed by specialists interested in and responsible for the data to be collected for that template. For instance, the ultrasonography template was designed by radiologists who specialize in ultrasonography of the liver. Each specialty group was asked to create a template that captures the important clinical results of their physician–patient interactions and at the same time includes all of the data the group needed for their own clinical research. We adopted this philosophy to achieve compliance with data entry (that is, complete records), to collect useful data for the subspecialty research fields, and to stimulate the development of clinical trials from within the various subspecialties.
Substantial effort went into the development of a user-friendly interface. The appearance of a template, such as location of entry fields (to the left or right of text), can be tailored to the wishes of a specific user. A setup feature will allow each user to create his or her own template. More important, irrelevant questions can be hidden, which saves time in completing the form. Of key importance is the speed of the computer system; we did not want anyone to wait for computer or file access. On the contrary, the page “flipping” feature occurs so quickly that we considered incorporating a time delay to allow the user to see what was chosen before the page changes.
What are some of the implications of using such technology in clinical research? First, in theory, similar computer-based disease tracing systems, using billing records or other up-to-date databases for diagnoses, can be developed for every disease and can be used in major medical institutions, allowing prospective clinical trials of even the rarest diseases. Second, only a limited number of databases per disease (or one database for extremely rare diseases) is needed because access to a database is independent of the database's physical location. It is conceivable that only a limited number of centers with expertise in Web technology and data analysis may be needed to administer all clinical trials in the United States. Data exchange and meta-analysis can be facilitated by a priori agreements on methods of data collection. Third, with automated patient identification and data handling systems in place, academic and nonacademic institutions are more likely to enter patients into clinical trials, resulting in shorter enrollment periods and reducing the overall duration of clinical trials. Fourth, it may be easier to conduct multicenter trials when all data are collected at a single center; in addition, monitoring of data collection and interim analysis can be done at any time during a trial. Fifth, direct input into a computer and automatic retrieval of information from other databases eliminate the time needed for transfer of data from paper records to computer records and eliminate a potential source of error. Sixth, the database can be programmed to retrieve only the information needed for enrollment and evaluation during trials; irrelevant information can be excluded from the database. Seventh, information provided by the patient, such as medical and surgical history, social history, and family history, can be entered and reviewed by patients themselves using a similar system.
In addition, educational aspects can be included in and extracted from the database for patient education, a feature we are already using. Finally, detailed medical reports or letters to referring physicians can be automatically generated by the database through the use of defined templates.
Our institution will soon implement a process for entering ICD-9 codes into a database during evaluation of the patient. When a code is entered, information about ongoing therapeutic trials and the optimal and most cost-effective diagnostic strategy may be displayed on the computer. For instance, entry of the ICD-9 code for cholangiocarcinoma could trigger a display of an electronic algorithm that suggests an ultrasonogram of the liver, an endoscopic retrograde cholangiogram, liver tests, specific tumor markers, and referral to the hepatobiliary clinic. To keep errors to a minimum, a feedback feature is being developed in our database to graphically display the liver and spleen and connecting arteries, veins, and bile ducts. After data are entered and saved, the computer will generate a picture of the liver and spleen with all abnormalities as entered in place, such as a stent through the bile ducts after stent placement or a missing gallbladder after cholecystectomy. Eventually, we envision a system in which diagnostic information, such as that from radiographs, is analyzed by software algorithms and in which the physician confirms the abnormality found by either pointing at or describing the anatomic site. Currently available electrocardiography programs are an example of such technology. Therapeutic procedures can be entered similarly without the need to enter or select text items.
Although the tracing system and database were developed to identify and study patients with cholangiocarcinoma, the system can clearly be applied to many fields of interest, both medical and nonmedical. Within medicine, the clinical research community may find this system useful for studying any type of disease, independent of its frequency. The Mayo Clinic Cancer Center is already using this database system for multicenter trials and collaborative research projects with pharmaceutical companies.
We are currently developing a software program with a user-friendly Web interface that allows the creation of a disease tracing system, a database, and templates (as well as rules within the templates) by means of a Web-enabled browser. If we succeed, physicians with Web access would not only be able to participate in clinical trials but also would be able to design and implement clinical trials with little or no help.
Addendum: We are currently upgrading the user interface to Netscape 4.X.
Mr. Barry and Mr. Schaller: Cancer Center, Mayo Clinic and Foundation, 200 First Street SW, Rochester, MN 55905.
- Copyright ©2004 by the American College of Physicians



Figure 4. On each page, two frames are shown. The top frame contains the study number (964202); the study name (Cholangiocarcinoma Initiative); the Mayo logo; an entry field for the patient number (Patient:); a “Set Patient” button to choose a patient; a “Reset” button to reset the patient number; a “Logout” button to disconnect the user from the database; a “Lookup” link to show, in the second frame, a table from which patients can be chosen; and a “(View)” link to show, in the second frame, all templates available. The last line in the top section contains patient information (name, birth date, sex, and race, if available) and a “Text Pos” button, which allows the user to set the screen with text on the left (top left, top right, and bottom left examples of Web page) or the right (bottom right example of Web page). The template for an arteriogram is shown. The date is prefilled with the actual date, and all answers are “blank” ([similar]). At the bottom of the template, “SAVE” and “Reset” buttons allow those actions. In the first example, the user has answered “no” to “Abnormal” and “Chemoembolization.” Answering “yes” to “Abnormal” moves (“autoadvances” and “flips”) the answer “yes” to the top of the panel and shows new questions indented at least one level. Answering “Aberrant anatomy” does not result in a page change, but answering “Vascular mass” with “yes” results in a request to define the size of the mass at a second indentation level. The remainder of the questions continue to be shown at their original indentation level. Buttons allow predefined choices; shaded areas allow numeric input; and red, underlined text (color not shown) defines links to other pages within the database.









