PMID- 11884248 TI - Reporting of measures of accuracy in systematic reviews of diagnostic literature AB - Abstract | Background | There are a variety of ways in which accuracy of clinical tests can be summarised in systematic reviews. Variation in reporting of summary measures has only been assessed in a small survey restricted to meta-analyses of screening studies found in a single database. Therefore, we performed this study to assess the measures of accuracy used for reporting results of primary studies as well as their meta-analysis in systematic reviews of test accuracy studies. Methods | Relevant reviews on test accuracy were selected from the Database of Abstracts of Reviews of Effectiveness (1994 --2000), which electronically searches seven bibliographic databases and manually searches key resources. The structured abstracts of these reviews were screened and information on accuracy measures was extracted from the full texts of 90 relevant reviews, 60 of which used meta-analysis. Results | Sensitivity or specificity was used for reporting the results of primary studies in 65/90 (72%) reviews, predictive values in 26/90 (28%), and likelihood ratios in 20/90 (22%). For meta-analysis, pooled sensitivity or specificity was used in 35/60 (58%) reviews, pooled predictive values in 11/60 (18%), pooled likelihood ratios in 13/60 (22%), and pooled diagnostic odds ratio in 5/60 (8%). Summary ROC was used in 44/60 (73%) of the meta-analyses. There were no significant differences in measures of test accuracy among reviews published earlier (1994 --97) and those published later (1998 --2000). Conclusions | There is considerable variation in ways of reporting and summarising results of test accuracy studies in systematic reviews. There is a need for consensus about the best ways of reporting results of test accuracy studies in reviews. 
Keywords: Background : The manner in which accuracy of clinical tests is mathematically summarised in the biomedical literature has important implications for clinicians. Appropriate accuracy measures would be expected to sensibly convey the meaning of the study results with scientifically robust statistics without exaggerating or underestimating the clinical significance of the findings. Lack of use of appropriate measures may lead authors of primary accuracy studies to draw biased conclusions. In systematic reviews of test accuracy literature, there are many ways of synthesising results from several studies, not all of which are considered to be scientifically robust. For example, measures such as sensitivity and specificity commonly used in primary studies are not considered suitable for pooling separately in meta-analysis. Variations in reporting of summary accuracy and use of inappropriate summary statistics may increase the risk of misinterpretation of clinical value of tests. A recent study evaluated a small sample of meta-analytical reviews of screening tests to demonstrate the variety of approaches used to quantitatively summarise accuracy results. This study confined itself to a limited Medline search. It exclusively examined meta-analytical studies so reviews not using quantitative synthesis were excluded. It did not look at accuracy measures used to report results of primary studies separately from those used for meta-analyses. In order to address these issues, we undertook a comprehensive search to survey systematic reviews (with and without meta-analysis) of test accuracy literature to assess the measures used for reporting results of included primary studies as well as their quantitative synthesis. Methods : We manually searched for relevant reviews in the Database of Abstracts of Reviews of Effectiveness (DARE). In order to limit the impact of human error inherent in manual searching, we complemented it with electronic searching. 
DARE was searched electronically with word variants of relevant terms (diagnostic, screening, test, likelihood ratio, sensitivity, specificity, positive and negative predictive value) combined using OR. From 1994 to 2000 DARE has identified 1897 reviews of different types by regular electronic searching of several bibliographic databases, hand searching of key major medical journals, and by scanning grey literature (search strategy and selection criteria can be found at ). The structured abstracts of these reviews were screened independently by the authors to identify systematic reviews of test accuracy. The full texts were obtained of those abstracts judged to be potentially relevant. Reviews addressing test development and diagnostic effectiveness or cost effectiveness were excluded. Any disagreements about review selection were resolved by consensus. Information from each of the selected reviews was extracted for the measures of test accuracy used to report the results of the primary studies included in the review. If a meta-analysis was conducted, information was also extracted for the summary accuracy measures. The various accuracy measures are shown in Table . We sought the following in the primary studies: sensitivity or specificity, predictive values, likelihood ratios and diagnostic odds ratio. For meta-analysis, we sought the summary measures pooling the above results and summary receiver operating characteristics (ROC) plot or values. All extracted data were double-checked. We divided the reviews into two groups arbitrarily according to time of publication; one group covering the period 1994 --97 (50 reviews) and another covering 1998 --2000 (40 reviews). This allowed us to assess whether there were any significant differences in measures being used to report test accuracy results among reviews published earlier and those published later. 
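The accuracy measures sought above can all be derived from a single 2x2 table of test result against reference standard. The following is a minimal sketch using the standard textbook definitions; the class name, function name and the counts are hypothetical illustrations, not data from any reviewed study, and the sketch assumes no cell of the table is zero.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a dichotomous test compared with a reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return the common accuracy measures (assumes no zero cells)."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),  # positive predictive value
        "npv": t.tn / (t.tn + t.fn),  # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = accuracy_measures(TwoByTwo(tp=90, fp=20, fn=10, tn=80))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Predictive values depend on the prevalence in the sample, whereas sensitivity, specificity and likelihood ratios are computed within the diseased and non-diseased columns separately.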
As the approaches to summarising results are not mutually exclusive, we evaluated and reported the most commonly used measures and their most common combinations. We used the chi-squared test to compare proportions. Table 1 | Measures of accuracy of dichotomous test results Results : Of the abstracts available in DARE, 150 were considered to be potentially relevant. Excluding reviews that addressed test development and diagnostic effectiveness or cost, 90 reviews of test accuracy were left for inclusion in our survey. There were 45 reviews of dichotomous test results, 42 reviews of continuous results dichotomised by the original authors, and 3 reviews that contained both result types. Meta-analysis was used in 60/90 (67%) reviews, 50 in 1994 --97 and 40 in 1998 --2000. (See : BMC_IncludedRefList_04032002 for a complete listing of the 90 reviews included in our study). As shown in Table , sensitivity or specificity was used for reporting the results of primary studies in 65/90 (72%) reviews, predictive values in 26/90 (28%), and likelihood ratios in 20/90 (22%). For meta-analysis, independently pooled sensitivity or specificity was used in 35/60 (58%) reviews, pooled predictive values in 11/60 (18%), pooled likelihood ratios in 13/60 (22%), and pooled diagnostic odds ratio in 5/60 (8%). Summary ROC was used in 44/60 (73%) of the meta-analyses. There were no significant differences between reviews published earlier and those published later as shown in Table . Table 2 | Measures of test accuracy reported in review of diagnostic literature (1994 --2000) Discussion : Our study showed that sensitivity and specificity remain in frequent use, both for primary studies and for meta-analyses over the time period surveyed. Sensitivity and specificity are considered inappropriate for meta-analyses, as they do not behave independently when they are pooled from various primary studies to generate separate averages. 
In our survey, separate pooling of sensitivities or specificities was used frequently in meta-analyses where summary ROC would have been more appropriate. Our findings about reporting of summary accuracy measures in meta-analyses are different to those reported previously. We found a higher rate of use of summary ROC, though use of independent summaries of sensitivity, specificity and predictive values was similar. These differences may be due to differences in searching strategies (databases and time frames) and selection criteria. Our search was more recent and comprehensive, using DARE, which has covered seven different databases (Medline, CINAHL, BIOSIS, Allied and Alternative Medicine, ERIC, Current Contents clinical medicine and PsycLIT), and hand-searched 68 peer-reviewed journals and publications from 33 health technology assessment centres around the world since February 1994. Moreover, as we did not restrict our selection to meta-analytical reviews only, we were able to examine reviews summarising accuracy results of primary studies without quantitative synthesis, which constituted 33% (30/90) of our sample. Therefore, compared to the previous publication on this topic, our survey provided a broader and more up-to-date overview of the state of reporting of accuracy measures in test accuracy reviews. Conclusions : The use of inappropriate accuracy measures has the potential to bias judgement about the value of tests. Of the various approaches to reporting accuracy of dichotomous test results, likelihood ratios are considered to be more clinically powerful than sensitivities or specificities. Crucially, it has been empirically shown that authors of primary studies may overstate the value of tests in the absence of likelihood ratios. There is also evidence that readers themselves may misinterpret test accuracy measures following publication. 
It is conceivable that the problem of inconsistent usage of test accuracy measures in published reviews, as found in our survey, may contribute to misinterpretation by clinical readership. The reason for variation in reported accuracy measures may, in part, be attributed to a lack of consensus regarding the best ways to summarise test results. It is worth noting that despite authoritative publications about appropriate summary accuracy measures in the past, (we have only quoted a few references) inconsistent and inappropriate use of summary measures has remained prevalent in the period 1994 --2000. Our paper highlights the need for consensus to support change in this field of research. Competing interest : None declared Backmatter: PMID- 11914161 TI - A randomised controlled trial of a patient based Diabetes recall and Management system: the DREAM trial: A study protocol [ISRCTN32042030] AB - Abstract | Background | Whilst there is broad agreement on what constitutes high quality health care for people with diabetes, there is little consensus on the most efficient way of delivering it. Structured recall systems can improve the quality of care but the systems evaluated to date have been of limited sophistication and the evaluations have been carried out in small numbers of relatively unrepresentative settings. Hartlepool, Easington and Stockton currently operate a computerised diabetes register which has to date produced improvements in the quality of care but performance has now plateaued leaving substantial scope for further improvement. This study will evaluate the effectiveness and efficiency of an area wide 'extended' system incorporating a full structured recall and management system, actively involving patients and including clinical management prompts to primary care clinicians based on locally-adapted evidence based guidelines. 
Methods | The study design is a two-armed cluster randomised controlled trial of 61 practices incorporating evaluations of the effectiveness of the system, its economic impact and its impact on patient wellbeing and functioning. Keywords: Background : Delivering care to people with diabetes | There is broad, international agreement over what constitutes high quality health care for people with diabetes. This will be enshrined in a National Service Framework for people with diabetes, due in summer 2002. However, in the face of poor current performance the most efficient method of delivering care remains unclear. Following a 1994 systematic literature review suggesting structured care improved patient care, an editorial in the British Medical Journal concluded that more evaluative research was needed before widespread adoption of any of the models could be recommended. A subsequent systematic review of routine surveillance of patients with diabetes by Griffin and Kinmonth concluded "Computerised central recall, with prompting for patients and their family doctors, can achieve standards of care as good or better than hospital outpatient care, at least in the short term. The evidence supports provision of regular prompted recall and review of people with diabetes by willing general practitioners and demonstrates that this can be achieved, if suitable organisation is in place". However, the evidence base on which these conclusions are based is limited in several ways. Firstly, there are only five randomised controlled trials (RCTs) involving 1058 patients. All of these studies are 'patient randomised' trials, thus potentially under-estimating the effectiveness of the intervention (see Study Design). They were all evaluating more or less selected patients and general practices and none of them were explicitly evaluating a UK National Health Service (NHS) service area wide intervention. 
Only one of the four UK based studies evaluated patient based outcomes and included an economic assessment and this study only involved patients from three general practices . Thus, the effectiveness of an area wide, patient focussed, structured recall and management system (in terms of process of care, patient outcome and economic impact) remains unknown. The current system | The current computerised diabetes management system runs in Hartlepool, Easington and Stockton, three Primary Care Group (PCG) areas, in the Northern and Yorkshire Region. It was introduced to all 36 general practices in Hartlepool and Easington Districts in mid-1995. Stockton (25 practices) agreed to join the system in 1999 and it was operational there by October 2000. There are three key components to the current system: 1. A central register of patients with diabetes. 2. A structured minimum dataset to be completed and returned to the central register. 3. The provision of both patient specific and aggregated data to both patients and clinicians. The system (developed by Westman Medical Software) allows three methods of collection of data at each contact with a patient with diabetes who is registered on the database. Two methods use a standard form completed by clinicians to collect data concordant with the UK minimum data set . Within secondary care, forms are completed at every new patient or annual review. In primary care, forms are completed by the practice nurse (usually) or general practitioner, either opportunistically or at practice diabetic clinics. In both cases, the completed data forms are sent to the Diabetes Register Facilitator for data entry. Thirdly, the hospital laboratory provides a monthly download of laboratory test details (e.g. HbA1c). A patient can be identified as having diabetes and added to the register by any permutation of one or more of these three routes. 
Feedback of individual patients' data, including review status, is provided to general practices quarterly. This feedback is 'passive' in that it does not explicitly prompt either patients or doctors as to required actions. Audit packages within the software can audit on every variable collected. District wide audit is provided on anonymised aggregated data; individual practice audits (with comparisons to other practices) are provided to participating practices at least annually. Feedback of the data to the patient (for hospital patients only) is by a patient information sheet and to the GP as a standardised letter. A Diabetes Register Facilitator co-ordinates and updates the register. A steering group composed of GH, the Diabetes Register Facilitator and representatives of the PCGs and patients, oversees the register and deals with issues such as confidentiality. Impact of the system to date | Measures of the impact of the system to date relate only to Hartlepool, Easington and Stockton. The main impact on patient registration was in its first 12 to 18 months of operation: during 1995, 747 patients were registered on the system (0.4% prevalence) which had increased to 3867 (1.8% prevalence) by the end of 1996. The increase in registration has stabilised since then, reaching 4324 (2% prevalence) by 1999. During 1999, 70% of registered patients attended a clinic; 52% had their feet examined and 51% had their eyes examined. Seventy three per cent had an HbA1c result recorded and 69% a blood pressure measurement. These figures are similar to those reported by other centres using the same system . The need for an extended system | Recording of clinical measures increased during the first few years of operation of the system but began to plateau more recently (for example, 50% of patients had an HbA1c recorded during 1996, compared to 60% in 1997 and 63% in 1998). This plateauing of performance has been reported by others . 
We believe that this is due to a lack of coordination (patients being lost to follow up) and lack of prompting of clinicians to deliver appropriate clinical interventions. Furthermore, given that most patients with diabetes are primarily seen in primary care the greatest potential impact is from optimising and extending the system in primary care. In order to address these shortcomings the additional key components, over and above those already in the system, will be: 1. Locally adapted evidence based guidelines for the management and follow up of patients with diabetes. 2. Automated prompts to patients and primary care clinicians that a review consultation is necessary. 3. A structured management sheet (including patient specific management suggestions based on (1)). 4. An enhanced monitoring system to follow up reasons for non-attendance from both patients and clinicians and to re-schedule appointments, based on non-return of a completed management sheet. 5. Patient feedback for patients in primary care. There is some limited supportive trial evidence for these developments, although the existing studies involved small sample sizes and may not be generalisable to the NHS. In evaluating the system with these extended features this study will also address the design shortcomings of previous studies of shared care in diabetes. It will be tailored to each practice; PCG defined areas will be studied, rather than an unrepresentative sample of general practices; and the system will be transparent and replicable in other areas. Methods : Design of the study | The study design is a pragmatic two-arm cluster randomised controlled trial. The unit of randomisation will be the general practice. Simple patient randomised trials are rightly considered the most robust method of assessing most health care innovations. 
This design, however, cannot be regarded as the gold standard for evaluating systematic approaches to chronic disease management, an essentially behavioural field of research. If both intervention and control patients were to be cared for within the same practice there is the risk that the management of control patients would be influenced by the practitioners' knowledge of the care of intervention patients. This would result in an underestimation of the effect of the intervention. Therefore, practices rather than patients are the appropriate unit of randomisation and analysis. As the current system has been in place for different lengths of time within the three participating PCGs, we will stratify the randomisation by PCG. Randomisation will be performed by a statistician independent of the research team using computer generated numbers to avoid allocation bias. Study setting and recruitment of practices | The study will be based in the general practices of the three PCGs of Easington, Hartlepool and Stockton. Since the recent merger of Hartlepool and North Tees Acute Trusts all three PCGs are now exclusively served by one secondary care diabetes service (and thus the one diabetes register). GH is the lead clinician for diabetes services in the new Trust. The 61 general practices in the three PCGs constitute the target practices for the study and we will attempt to recruit all practices. The PCG diabetes leads or the PCG clinical governance leads in all three PCGs have provided letters confirming their support for the project. We do not envisage major difficulties with recruitment, given the need to agree local guidelines as part of the process involved in the Trust merger, the likely requirements in the forthcoming National Service Framework for diabetes, and the 100% practice coverage with the current diabetes system. 
We will (through the PCGs) write to all practices, giving information about the project to the senior partner or diabetes lead and practice manager of practices. Practices will be invited to opt out if they do not wish to be included in the study -- this is an approach we have used successfully before. The PCG diabetes lead, clinical governance lead and GH will be co-signatories of this letter. If practices do decline we will collect data on characteristics of non-participating practices to assess the impact on the generalisability of the trial's findings. Finally, if there are significant problems with recruitment, there are other practices which could be approached in a nearby PCG (South Tyneside) which uses the same software for its diabetes register. Details of the intervention | Local guidelines and management prompts | A guideline development group will be established to develop local guidelines for the management of diabetes, based upon available evidence based guidelines (Scottish Intercollegiate Guidelines Network (SIGN, 1996, 1997a, 1997b, 1997c) and Effective Care Bulletins). They will also use the forthcoming national diabetes guidelines as these become available. The group will be multidisciplinary and contain primary and secondary care doctors and nurses, patients and the Diabetes Register Facilitator. The group will define review periods for specified patient groups (e.g. patients with diabetes satisfactorily controlled on diet alone should be reviewed every 12 months), referral criteria for patients moving from primary to secondary care and back and simple decision rules for the management prompts. These would be of two types. The first would prompt for actions to be performed and only require their performance to be documented (e.g. asking for a foot examination to be performed in a patient who does not have a recorded foot examination). 
The second would be more complex and suggest alterations to clinical management on the basis of data in the database (e.g. patients with persistently raised blood pressure should have their anti-hypertensive medication increased). These decision rules will be integrated into the recall and management system. Running the system | The proposed enhancements to the system are designed to require the primary care team to perform no additional work over and above the current configuration. The current database has a patient identifier, a minimum dataset and retrieval systems to support the structured recall of patients. Westman Medical Software has agreed to amend the system as required. A 'circle of information exchange' will be established between the participating general practices and the database. The local guidelines will be used to adapt the current centralised database, along with the practices' preferred method of following up patients (for example, within consultations in routine surgeries or within special clinics). The central database system will identify when patients are due for review (based upon the local guidelines) and will generate a letter to the patient asking them to make an appointment for a review consultation. Patient information or educational materials could be included with the letter. At the same time, the central database will generate a letter to the practice stating that the patient should be making a review appointment in the near future. The letter to the practice will include a management sheet (to be held in the patient's record) to capture an agreed minimum data set to be collected during the consultation. This management sheet will also contain the relevant prompts (as described above). 
When the patient is seen in the practice, the primary care professional (currently this is usually done by the practice nurse) will complete the management sheet and return a copy for entry onto the central register within a designated period of time. This circle of information is broken if the patient does not visit the general practice as planned or the general practice does not return the management sheet to the central register. If this happens, the central register would alert the Diabetes Register Facilitator who will ascertain the reason for failure and take appropriate action (e.g. send a reminder to the patient, prompt the practice to return the management sheet). A range of educational activities will be provided for intervention practices, as part of the usual local structures for contact with practices, with some additions. These will include: distribution of information about the trial in local newsletters; meetings with practice clinical governance leads; evening meetings for practice nurses (with small group discussion of the practical implications for intervention practices); and a telephone meeting with the practice diabetes lead (usually the practice nurse) in each intervention practice. Practices in the control arm will continue to receive the recall system as currently configured. Logistical considerations | From the prevalence of patients with diabetes on the current register, there will be about 7500 patients on the system if 61 practices are recruited. Half of these will be in intervention practices. On current patterns of usage, we anticipate there being the need for 1.5 recalls per annum per patient on the register, resulting in about 6000 recalls per year for the intervention group. Assuming a 40 week working year, the system will need to dispatch, receive and process about 150 forms per group per week. 
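The workload estimate above can be retraced with a few lines of arithmetic. This is only an illustrative sketch using the figures quoted in the text; the variable names are hypothetical. The unrounded product is somewhat below the "about 6000" recalls and "about 150" forms quoted, which suggests the protocol rounds upwards to leave some headroom.

```python
# Figures quoted in the protocol text.
patients_on_register = 7500          # expected if all 61 practices are recruited
intervention_share = 0.5             # half of the patients are in intervention practices
recalls_per_patient_per_year = 1.5   # anticipated from current patterns of usage
working_weeks_per_year = 40

intervention_patients = patients_on_register * intervention_share
recalls_per_year = intervention_patients * recalls_per_patient_per_year
forms_per_week = recalls_per_year / working_weeks_per_year

print(f"Intervention patients: {intervention_patients:.0f}")
print(f"Recalls per year (unrounded): {recalls_per_year:.0f}")
print(f"Forms per week (unrounded): {forms_per_week:.1f}")

# The protocol's rounded planning figure of 6000 recalls gives 150 forms per week.
print(f"Forms per week from 6000 recalls: {6000 / working_weeks_per_year:.0f}")
```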
Identification of patients | Patients for the structured recall and management system are already identified on the Hartlepool and North Tees database. As some practices have children registered on the system, who are under the care of an exclusively secondary care adolescent service, an age limit of 18 years or over will be set for inclusion. Practices will be asked to check lists of their patients on the database regularly throughout the study. The central database will remove patients from the recall system who are known to have died or moved away. Patient consent | Patients have already consented, or are being consented, to their data being held within the current diabetes register. The study will involve no extra 'routine' data being collected, and this data will be anonymised before being sent for analysis; all data held for analysis will be held in accordance with the Data Protection Act. For the patient-based questionnaire study, we will seek additional patient consent to complete one survey. The three relevant Local Research Ethics Committees have approved the trial. Data collection | The main study outcome measures will be rates of performance of process of care and the patient based measures of functional and psychosocial wellbeing. Data will be collected for 15 months after the start of the intervention. Fifteen months was chosen to allow for patients who are reviewed every 12 months but fail to attend on initial invitation. Process of care variables | Process of care variables will be collected via the computerised database. The exact data to be collected will be determined by both the current content of the database and the guidelines but will include such data items as rates of attendance at clinics and annual reviews, conduct of eye and feet examinations, performance of investigations and prescribing. We will also collect data on clinical measures (e.g. HbA1c, and blood pressure levels). 
Outcome of care measures | Outcome of care data will be collected, by postal questionnaire, 15 months after commencement of the study. A portfolio of validated and responsive generic and disease specific instruments will be used to measure functional and psychosocial variables that will be potentially influenced by the intervention. These will include: i) The SF36 health status profile which we will use to generate Mental (MCS) and Physical Component Summary Scales (PCS) . ii) The Newcastle Diabetes Symptoms Questionnaire . iii) The Bradley Treatment Satisfaction Questionnaire . Patient costs questions will be developed by the study health economist. We have successfully used such packages of questionnaires within trials before and have achieved response rates in excess of 70% in similar surveys in this region. . Sample size considerations | On the basis of previous work we have made the following assumptions. The mean number of patients per practice for whom we will be able to collect process data will be 30 and the ICC (a measure of the lack of independence of responses from patients from the same practice) calculated from our local data is 0.14 for measures of process (whether a blood pressure measurement and whether an HbA1c measurement has been recorded in a 12 month period). Standard methods for determining the sample size requirements for a cluster randomised trial indicate that we need 60 practices to detect a difference of 15% (42.5% v 57.5%) with 80% power assuming a significance level of 5%. Assessment of outcome of care will be based on health status scales such as the SF-36. Previous work has shown that this type of intervention is likely to produce an effect size of approximately 0.25 in such measures and that the ICCs for such measures will be approximately 0.07. The most efficient study design (that minimises the number of patients required) is one that makes use of all the available practices. 
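The process-measure calculation above can be approximated with a common textbook approach: compute the sample size for comparing two proportions under individual randomisation, then inflate it by the design effect 1 + (m - 1) x ICC, where m is the cluster size. The sketch below is an assumption about how such a figure could be reached, not necessarily the method the authors used; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def practices_needed(p1: float, p2: float, icc: float, cluster_size: int,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Total clusters (both arms) to detect p1 vs p2, two-sided test.

    Uses the normal-approximation formula for two proportions and
    inflates it by the design effect for clustering.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Patients per arm if patients were individually randomised.
    n_per_arm = ((z_alpha + z_beta) ** 2
                 * (p1 * (1 - p1) + p2 * (1 - p2))
                 / (p1 - p2) ** 2)
    # Inflate for the lack of independence within practices.
    design_effect = 1 + (cluster_size - 1) * icc
    clusters_per_arm = ceil(n_per_arm * design_effect / cluster_size)
    return 2 * clusters_per_arm


total = practices_needed(p1=0.425, p2=0.575, icc=0.14, cluster_size=30)
print(f"Practices required in total: {total}")
```

With the protocol's inputs this approximation gives a total a little under 60 practices, in line with the stated requirement; small differences would be expected from rounding or a different variance formula.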
A sample of 27 patients from each of 61 practices will give us 85% power to detect an effect size of 0.25 assuming a significance level of 5%. With a predicted response rate of approximately 70% (based on our experience in the COGENT study ) after two reminders, our starting sample size will need to be 2379 patients (approximately 39 patients per practice). Principles of data analysis | Analysis will be by intention to treat. Multilevel modelling (using the MlwiN package ) will be used to take into account the clustering of patients within practices . Both binary variables (when a process was undertaken or not) and continuous variables (such as the physical health component of the SF-36) can be analysed using these techniques. For both types of variable, variation between practices will be fitted as a random effect and the difference between intervention and control practices will be fitted as a fixed effect. In the case of binary variables, a logit link function will be used. Economic evaluation | The economic impact of implementing the new structured recall and management system will be evaluated in terms of the marginal costs of adapting and running the system; the costs of developing and disseminating the guidelines; the educational activities for intervention practices; the implications for the use of health care services; and the costs to the patients and their carers. The benefits will be measured as described earlier on in the clinical study. The estimation of health service resource use will relate to diabetes-specific clinical visits, tests, investigations, and procedures. This data will be routinely collected as part of the management system implementation and subsequent costing, using health service pay and price data, will be undertaken using a mixed approach based on micro-costing and gross-costing methods . 
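Returning to the questionnaire sample size stated earlier (27 respondents from each of 61 practices, ICC of about 0.07, effect size 0.25), the quoted power and the inflation for non-response can be checked approximately. This is a sketch under a normal approximation with a design-effect adjustment, assuming equal allocation between arms; it is not claimed to be the authors' calculation, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def power_cluster_means(effect_size: float, clusters_total: int,
                        cluster_size: int, icc: float,
                        alpha: float = 0.05) -> float:
    """Approximate power to detect a standardised mean difference."""
    design_effect = 1 + (cluster_size - 1) * icc
    # Effective number of independent patients per arm (equal allocation).
    n_eff_per_arm = clusters_total * cluster_size / 2 / design_effect
    z = NormalDist()
    noncentrality = effect_size * sqrt(n_eff_per_arm / 2)
    return z.cdf(noncentrality - z.inv_cdf(1 - alpha / 2))


power = power_cluster_means(effect_size=0.25, clusters_total=61,
                            cluster_size=27, icc=0.07)
print(f"Approximate power: {power:.2f}")

# Inflate the target of 27 respondents per practice for a 70% response rate.
mailed_per_practice = ceil(27 / 0.70)
starting_sample = mailed_per_practice * 61
print(f"Questionnaires per practice: {mailed_per_practice}")
print(f"Starting sample size: {starting_sample}")
```

The approximation reproduces a power close to the 85% quoted, and the non-response inflation gives 39 questionnaires per practice and 2379 in total, matching the figures in the text.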
Use of drugs, referrals to secondary care and the impact of the intervention on the change of use of patients' and their carers' time will also be monitored through postal questionnaires at the end of the follow-up period. A sensitivity analysis will be undertaken to test the robustness of the results to the uncertainty not related to sampling variations and to enhance the generalisability of the results. We are aware that the costs of the system might be balanced only in the longer term against the cost savings related to averted complications. However, the assessment of the benefits in terms of final outcomes (e.g. lives saved, or QALYs) and long term costs is beyond the objective of the present study. Competing interests : None declared. Backmatter: PMID- 11914164 TI - Inter-rater agreement in the scoring of abstracts submitted to a primary care research conference AB - Abstract | Background | Checklists for peer review aim to guide referees when assessing the quality of papers, but little evidence exists on the extent to which referees agree when evaluating the same paper. The aim of this study was to investigate agreement on dimensions of a checklist between two referees when evaluating abstracts submitted for a primary care conference. Methods | Anonymised abstracts were scored using a structured assessment comprising seven categories. Between one (poor) and four (excellent) marks were awarded for each category, giving a maximum possible score of 28 marks. Every abstract was assessed independently by two referees and agreement measured using intraclass correlation coefficients. Mean total scores of abstracts accepted and rejected for the meeting were compared using an unpaired t test.
Results | Of 52 abstracts, agreement between reviewers was greater for three components relating to study design (adjusted intraclass correlation coefficients 0.40 to 0.45) compared to four components relating to more subjective elements such as the importance of the study and likelihood of provoking discussion (0.01 to 0.25). Mean score for accepted abstracts was significantly greater than those that were rejected (17.4 versus 14.6, 95% CI for difference 1.3 to 4.1, p = 0.0003). Conclusions | The findings suggest that inclusion of subjective components in a review checklist may result in greater disagreement between reviewers. However in terms of overall quality scores, abstracts accepted for the meeting were rated significantly higher than those that were rejected. Keywords: Background : Interest in the peer review process and research aimed at determining the method of obtaining the best quality reviews has grown in recent years. Checklists have been developed that aim to guide reviewers when assessing the quality of papers, but little evidence exists concerning the extent of agreement between two referees when evaluating the same paper. In addition, little is known about which dimensions of a checklist are likely to result in greater agreement between referees. There were two aims of this study: (1) to examine inter-rater agreement of the quality of abstracts submitted to a primary care research conference (Annual Meeting of the South West Association of University Departments of General Practice, Exeter 2000, UK), and (2) to compare the scores of abstracts accepted and rejected for the meeting. Materials and Methods : Abstracts were anonymised and scored using a structured assessment comprising seven categories: (1) importance of the topic (2) originality (3) overall quality of the study design (4) appropriateness of the design used (5) achievement of aim (6) contribution to academic primary care (7) likelihood of provoking discussion. 
For comparison purposes, we have classified the assessment of categories 1, 2, 6 and 7 as more 'subjective' in nature, and categories 3, 4 and 5 as more 'objective'. Between one (poor) and four (excellent) marks were awarded for each category, giving a maximum possible score of 28 marks. Every abstract was assessed independently by two referees (AM and AG). Agreement between referees was assessed using intraclass correlation coefficients (ICC), a chance corrected measure of agreement. The ICC indicates perfect agreement only if the two assessments are numerically equal and is preferable to the more usual (Pearson) correlation coefficient. The crude ICC is lowered by any systematic differences between referees' scores. In terms of a plot of the two referees' scores, a line with a non-zero intercept will further lower the ICC irrespective of any disagreement, represented by deviation of the slope of the line away from unity and scatter around the line. In a further analysis, this effect was investigated by subtracting the mean difference for each component from the higher of the two referees' scores. The ICCs were then recalculated, giving estimates of agreement corrected for both systematic differences and chance. There are no universally applicable standard values for the ICC that represent adequate agreement, but the following convention is used here to aid interpretation: ICC <0.20 'slight agreement'; 0.21 --0.40 'fair agreement'; 0.41 --0.60 'moderate agreement'; 0.61 --0.80 'substantial agreement'; >0.80 'almost perfect agreement'. Scores from referees from three different institutions were summed to give each abstract an overall score. Abstracts were ranked by this overall score and the top 45 were accepted for oral presentation at the meeting. Of the 52 abstracts refereed by AM and AG, mean total scores of those accepted and rejected for the meeting were compared using an unpaired t test. 
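The crude and adjusted agreement calculations described above can be illustrated with a small sketch. The paper does not state which form of the ICC was used, so the one-way random effects ICC for two raters is assumed here; the function names and the example scores are hypothetical and are not study data.

```python
from statistics import mean


def icc_oneway(scores_a, scores_b):
    """One-way random effects ICC for two raters (an assumed ICC form).

    ICC = (MSB - MSW) / (MSB + MSW) when each abstract has two scores.
    """
    n = len(scores_a)
    grand_mean = mean(list(scores_a) + list(scores_b))
    pair_means = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    # Between-abstract mean square (two scores per abstract).
    msb = 2 * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    # Within-abstract mean square (disagreement between the two referees).
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(scores_a, scores_b, pair_means)) / n
    return (msb - msw) / (msb + msw)


def remove_systematic_difference(scores_a, scores_b):
    """Subtract the mean difference from the higher-scoring referee's scores."""
    diff = mean([b - a for a, b in zip(scores_a, scores_b)])
    if diff >= 0:
        return list(scores_a), [b - diff for b in scores_b]
    return [a + diff for a in scores_a], list(scores_b)


# Hypothetical scores: referee B always scores exactly one mark higher than A.
referee_a = [1, 2, 3, 2, 1, 3]
referee_b = [2, 3, 4, 3, 2, 4]

crude_icc = icc_oneway(referee_a, referee_b)
adjusted_icc = icc_oneway(*remove_systematic_difference(referee_a, referee_b))
print(round(crude_icc, 3), round(adjusted_icc, 3))
```

In this constructed example the two referees rank the abstracts identically, so a Pearson correlation would be 1, yet the crude ICC is about 0.52 because of the constant one-mark offset. Removing the mean difference restores an ICC of 1, which is the effect the adjusted analysis is designed to isolate.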
Results : Chance corrected agreement between the two referees' scores measured using crude ICCs was greater for the three components relating to design and execution of the study (Table 1: items 3 to 5) compared to those relating to more subjective elements of the abstract (Table 1: items 1, 2, 6, 7). After adjustment for systematic differences in referees' scores, ICCs for items 3 to 5 remained highest, demonstrating fair to moderate agreement. Table 1 | Inter rater agreement between two referees for 52 abstracts submitted for a primary care research conference A total of 76 abstracts were submitted for the meeting. Of 52 received by the authors for assessment, 26 were accepted for oral presentation. Abstracts accepted for the meeting had a significantly higher mean score than those that were rejected (95% CI for difference 1.3 to 4.1, p = 0.0003). Table 2 | Summary statistics of abstracts accepted and rejected for oral presentation at a primary care research conference Discussion : This study has shown that when using a structured assessment form, two independent reviewers were more likely to agree on design or methodological components of a checklist than on subjective components of abstracts submitted for an annual research meeting. Abstracts accepted for the meeting had significantly higher total scores, but overlapped considerably with rejected abstracts. This was due to acceptance for the meeting being determined by an overall aggregate of scores awarded by referees from three institutions. While the subject of inter-reviewer agreement on different components of a checklist is relatively under-researched, some previous studies offer support for our finding that agreement is better when reviewers can be more objective in their assessments.
Among a group of reviewers asked to rate a series of review articles, agreement on scientific quality of the papers was very high (60% of ICCs > 0.7) both within and between groups with varying levels of research training and expertise. All 10 dimensions of the checklist that reviewers rated could be regarded as objective. Divergent reviewers have been identified in a study comparing an overall rating score that indicated a recommendation to publish rather than individual dimensions of a review checklist. This study does have limitations. Importantly, we assessed agreement between only two reviewers on a relatively small number of abstracts. This could be addressed by having more abstracts assessed by a greater number of reviewers. However the study was conducted pragmatically within the time and administrative constraints of a small annual scientific meeting rather than submissions to a journal over an extended period. Another limitation is that the reviewer checklist was constructed prior to conceiving the study. If future meetings are to be used to investigate the content of structured reviewer assessments, such checklists should be constructed with specific hypotheses in mind. Characteristics associated with good peer review are age under 40 years and training in epidemiology or statistics, characteristics that applied to both reviewers in the present study. Structured assessment forms that ask the reviewer for their opinion of a paper's interest, originality or likelihood of provoking discussion may be more likely to result in scores that reflect the reviewer's own research interests. This is not necessarily a criticism -- it is perhaps only natural that individuals will differ in their opinions of how interesting they find, and think others will find, a particular paper. 
It is interesting that the two components with the lowest agreement, importance of the topic and originality of the study, both require more knowledge about a specific subject area than either of the other two subjective questions. Journal editors and meeting organisers should be aware that including subjective components in review checklists may result in greater disagreement between reviewers. Conclusions : This study provides some evidence that inclusion of subjective components in a review checklist may result in greater disagreement between reviewers. An interesting area for further research would be to investigate the effects of attaching different weights to subjective and objective components of a checklist, or to exclude subjective components altogether from overall quality scores and simply use them as a guide to acceptance or rejection. Competing interests : None declared. Figure 1 | Difference between referees' scores versus mean score Backmatter: PMID- 11943069 TI - The Caenorhabditis elegans Y87G2A.14 Nudix hydrolase is a peroxisomal coenzyme A diphosphatase AB - Abstract | Background | The number of Nudix hydrolase family members varies widely among different organisms. In order to understand the reasons for the particular spectrum possessed by a given organism, the substrate specificity and function of different family members must be established. Results | The Y87G2A.14 Nudix hydrolase gene product of Caenorhabditis elegans has been expressed as a thioredoxin fusion protein in Escherichia coli and shown to be a CoA diphosphatase with catalytic activity towards CoA and its derivatives. The products of CoA hydrolysis were 3',5'-ADP and 4'-phosphopantetheine with Km and kcat values of 220 µM and 13.8 s-1 respectively. CoA esters yielded 3',5'-ADP and the corresponding acyl-phosphopantetheine.
Activity was optimal at pH 9.5 with 5 mM Mg2+ and fluoride was inhibitory with a Ki of 3 µM. The Y87G2A.14 gene product has a potential C-terminal tripeptide PTS1 peroxisomal targeting signal, SKI. By fusing a Y87G2A.14 cDNA to the C-terminus of yeast-enhanced green fluorescent protein, the enzyme appeared to be targeted to peroxisomes by the SKI signal when transfected into yeast cells. Deletion of SKI abolished specific targeting. Conclusions | The presence of related sequences with potential PTS1 or PTS2 peroxisomal targeting signals in other organisms suggests a conserved peroxisomal function for the CoA diphosphatase members of this group of Nudix hydrolases. Keywords: Background : The Nudix hydrolase family comprises enzymes that hydrolyse predominantly the diphosphate (pyrophosphate) linkage in a variety of nucleoside triphosphates, dinucleoside polyphosphates, nucleotide sugars and nucleotide cofactors having the general structure of a nucleoside diphosphate linked to another moiety, X. They are found in archaea, eubacteria, animals, plants and fungi, and all possess the Nudix box sequence signature motif Gx5Ex5 [UA]xREx2EExGU (where U is an aliphatic hydrophobic amino acid). The proposed functions of this family are to eliminate potentially toxic nucleotide metabolites from the cell and to regulate the concentrations of nucleotide cofactors and signalling molecules for optimal cell growth and survival. The number of genes encoding Nudix hydrolases varies widely, from zero in Mycoplasma genitalium to 22 in Deinococcus radiodurans. This variation presumably reflects the growth or environmental adaptability, stress tolerance and metabolic capacity of the different organisms. The Nudix hydrolases thus offer an ideal system with which to study the evolution of a largely inessential protein family and its contribution to the individual biology of an organism.
Understanding such variation requires a combination of detailed biochemical, genetic and cellular studies to reveal the individual functions of family members within the set in any given system. In the case of multicellular eukaryotes, the nematode Caenorhabditis elegans offers a genetically amenable model system with which to carry out such studies. There are 11 members of the Nudix hydrolase family in C. elegans. So far only two of these have been characterized: a diadenosine tetraphosphate pyrophosphohydrolase (the orthologue of human NUDT2) and an NADH diphosphatase. Sequence comparisons would predict the existence of an ADP-sugar diphosphatase (NUDT5 orthologue), an ADP-ribose diphosphatase (NUDT9 orthologue), a diphosphoinositol polyphosphate pyrophosphohydrolase (NUDT3/4 orthologue), two probable coenzyme A diphosphatases, one of which is highly similar to the mouse Nudt7 CoA diphosphatase, and 4 proteins of unknown function, including one with a strong similarity to the Saccharomyces cerevisiae PSU1/DCP2 protein and another similar to the developmentally-regulated mouse RP2 protein. Recent characterization of the S. cerevisiae NADH and CoA diphosphatases and the mouse Nudt7 CoA diphosphatase has revealed that they are located in peroxisomes. The function of these peroxisomal enzymes may be to regulate the concentration of these essential nucleotide cofactors for peroxisomal metabolism or, by analogy with the E. coli MutT 8-oxo-dGTPase, to eliminate toxic modified cofactor metabolites from the highly oxidizing peroxisomal environment. In order to investigate these possibilities in the C. elegans model system, we have cloned and characterised the putative C. elegans Y87G2A.14 CoA diphosphatase and shown that it displays the expected enzymatic activities and that it appears to be targeted to peroxisomes by a C-terminal PTS1 targeting signal. Results and discussion : Cloning, expression and purification of Y87G2A.14 | The C.
elegans Y87G2A.14 gene encodes a 234 amino acid protein with an expected molecular weight of 26,601 Da. It was amplified by PCR from a C. elegans cDNA library. The PCR fragment was inserted into the pET-32b(+) expression vector and the nucleotide sequence of the insert was determined to be exactly the same as that submitted to GenBank under accession no. CAB54476. The recombinant plasmid pETY87G2A.14 was then used to transform E. coli BL21 (DE3) cells to generate a His-tagged thioredoxin fusion protein with an expected molecular mass of 43,731 Da. When the Trx-Y87G2A.14 fusion protein was expressed at 37 °C, it was confined to inclusion bodies, so the induction temperature was decreased to 25 °C to enhance protein solubility. As the expression level was low at this temperature, the induction time was increased to 8 h. These conditions markedly increased the solubility of Trx-Y87G2A.14, which was then purified from the soluble fraction (Fig. 1, lane 2) to apparent homogeneity on NiCAM™-HC resin (Fig. 1, lane 3). To determine the molecular weight of the Y87G2A.14 protein itself, the Trx-Y87G2A.14 fusion was cleaved with thrombin, which generated Y87G2A.14 with an apparent molecular weight of 27 kDa (expected molecular weight, 29,807 Da) and thioredoxin (15 kDa, Fig. 1, lane 4). Figure 1 | Purification and cleavage of Trx-Y87G2A.14 fusion protein. Samples were analysed by SDS-PAGE (15% polyacrylamide) and stained with Coomassie Brilliant blue R 250.
Lane 1, 2 µg protein standards: bovine serum albumin (66 kDa), ovalbumin (45 kDa), glyceraldehyde 3-phosphate dehydrogenase (36 kDa), carbonic anhydrase (29 kDa), trypsinogen (24 kDa), soybean trypsin inhibitor (20 kDa) and alpha-lactalbumin (14.2 kDa); lane 2, soluble cell extract of BL21 (DE3) cells transformed with recombinant plasmid pETY87G2A.14 and induced with 1 mM IPTG for 8 hours at 25 °C before applying to a column of NiCAM™-HC resin; lane 3, 3 µg purified Trx-Y87G2A.14 fusion protein; lane 4, 3 µg purified Trx-Y87G2A.14 fusion protein after cleavage with thrombin. Substrate specificity and product analysis | Purified Trx-Y87G2A.14 was inactive towards the following nucleotides when assayed at a fixed concentration of 0.5 mM: NADH, NAD+, NDP-sugars, 5'-(d)NTPs, 5'-NDPs, 5'-NMPs and diadenosine polyphosphates. High activity was found with CoA and its derivatives. HPLC analysis of CoA hydrolysis by Trx-Y87G2A.14 showed that the enzyme was a CoA diphosphatase, cleaving the diphosphate linkage in CoA to yield adenosine 3',5'-bisphosphate (3',5'-ADP) and 4'-phosphopantetheine. Figure 2 | Identification of reaction products of CoA hydrolysis. Reaction mixtures containing 0.5 mM CoA were incubated at 37 °C for 20 min with or without 0.1 µg Trx-Y87G2A.14 fusion protein and the products separated by HPLC as described in Materials and methods. Traces are shown for the reaction without enzyme, the reaction with enzyme, and the elution gradient. Positions of authentic standards are indicated. Reaction requirements and kinetic parameters | Trx-Y87G2A.14 displayed optimal activity with 0.5 mM CoA as a substrate at pH 9.5. A divalent metal ion was absolutely required for activity, with optimal activity at 5 mM MgCl2. In common with all other Nudix hydrolases tested, fluoride was a strong inhibitor with a Ki value of approximately 3 µM (results not shown).
Km and kcat values for CoA, CoA esters and oxidized CoA were calculated by non-linear regression from data obtained by HPLC analysis. A graphical example of the data for CoA in the form of a hyperbolic plot and double reciprocal plot shows that the enzyme obeys simple Michaelis-Menten kinetics. The kcat / Km ratios show that the enzyme prefers reduced forms of CoA to oxidized CoA, with CoA itself the best substrate of those tested. Figure 3 | Lineweaver-Burk and Michaelis-Menten (inset) plots for the hydrolysis of CoA. Reaction mixtures containing various concentrations of CoA (0.05-0.7 mM) were incubated at 37 °C for up to 20 min with 0.1 µg Trx-Y87G2A.14 fusion protein. Initial rates of hydrolysis were determined after separation of the products by HPLC as described in Materials and methods. Table 1 | Kinetic parameters for the hydrolysis of CoA and CoA derivatives by Trx-Y87G2A.14 fusion protein Subcellular localization | Y87G2A.14 has the C-terminal tripeptide sequence SKI. This conforms to the pattern typical of PTS1 peroxisomal targeting signals found in many peroxisomal matrix proteins, suggesting that Y87G2A.14 may be targeted to these organelles. However, possession of a potential PTS1 sequence is not always sufficient on its own to result in peroxisomal targeting and other elements of the protein sequence may also be involved. Since targeting of animal peroxisomal proteins expressed in yeast has often been observed, yeast cells were transformed with expression plasmids encoding C-terminal or N-terminal fusions of Y87G2A.14 to yeast-enhanced green fluorescent protein (yEGFP) in order to determine the subcellular location of Y87G2A.14. The cells were then examined by confocal microscopy.
Cells transformed with pY87G2A.14-yEGFP, in which the C-terminus of Y87G2A.14 is fused to the N-terminus of yEGFP, showed a diffuse, cytoplasmic fluorescence with no clear subcellular localization. In contrast, cells transformed with pyEGFP-Y87G2A.14, in which the C-terminal tripeptide SKI is free to act as a targeting signal, showed the clear punctate fluorescence that is indicative of yeast peroxisomes. The identity of SKI as the targeting signal was confirmed by transformation of cells with pyEGFP-Y87G2A.14ΔSKI, in which the C-terminal tripeptide was deleted during construction. This again showed a diffuse, cytoplasmic fluorescence. Together, these results strongly suggest that Y87G2A.14 is targeted to peroxisomes by its C-terminal tripeptide, SKI. Figure 4 | Subcellular localization of Y87G2A.14 by fluorescence confocal microscopy. yEGFP fluorescence of yeast cells transformed with (a) pY87G2A.14-yEGFP; (b) pyEGFP-Y87G2A.14 and (c) pyEGFP-Y87G2A.14ΔSKI. Conclusions : On the basis of its sequence, the C. elegans Y87G2A.14 gene product was predicted to be a peroxisomal coenzyme A diphosphatase. In addition to the Nudix motif, Y87G2A.14 possesses the PROSITE UPF0035 motif, which we have previously suggested confers a specificity for coenzyme A and its derivatives, and a C-terminal tripeptide, SKI, that conforms to the pattern typical of PTS1 peroxisomal targeting signals. The experiments described here confirm these predictions. Fig. 5 shows a multiple sequence alignment of the motif-containing region of Y87G2A.14 with related sequences from other organisms. Those marked with a tick have been experimentally shown to be coenzyme A diphosphatases. In most cases, higher organisms possess two related sequences, e.g. mouse Nudt7 and Nudt8, one of which encodes a peroxisomal enzyme (e.g. Nudt7). However, S.
cerevisiae has only one sequence containing the UPF0035 motif while Arabidopsis thaliana has three, and the second of the two Drosophila melanogaster sequences, RH61317, is currently only represented in GenBank by a single expressed sequence tag, so its status is still questionable. For the peroxisomal enzymes, either a putative C-terminal PTS1 or an N-terminal PTS2 targeting signal is present. Interestingly, in each case, the putative PTS2 signal is contained within or near a predicted mitochondrial targeting or chloroplast transit peptide sequence, suggesting a possible dual location for these proteins. Such a possibility has not yet been experimentally observed; however, mutation of a glutamate five residues to the C-terminal side of the PTS2 of rat peroxisomal 3-ketoacyl-CoA thiolase to a neutral or basic amino acid has been shown to result in partial mitochondrial targeting, suggesting that the negative charge on glutamate may normally block translocation to the mitochondria. Whether or not a system exists in vivo to regulate dual targeting is clearly a topic requiring further investigation. The non-peroxisomal sequences provide no clear indication of possible subcellular location, hence they are likely to be cytoplasmic. Given the existence of mitochondrial, peroxisomal and cytoplasmic pools of CoA and CoA esters, it would not be surprising to find CoA diphosphatase activity in all these locations. However, the precise substrate specificities of the "cytoplasmic" activities remain to be determined. Figure 5 | Partial sequence alignment of Y87G2A.14 and related sequences. The partial sequence of Y87G2A.14 containing the UPF0035 and Nudix motifs (arrowed) was aligned using the Clustal W program with related sequences from other organisms retrieved from a BLAST search. Organisms and database accession numbers are: Caenorhabditis elegans Y38A8.1, Q23236; Homo sapiens NUDT7, XP_058753; H.
sapiens NUDT8, AI743601; Mus musculus Nudt7, Q99P30; M. musculus Nudt8, AK009700; Drosophila melanogaster CG11095, Q9VY79; D. melanogaster RH61317, BI631687; Schizosaccharomyces pombe YDH5, Q92350; S. pombe YDZA, O13717; Arabidopsis thaliana At2g33980, O22951; A. thaliana At1g28960, Q9SHQ7; A. thaliana At5g45940, BAB09322; Saccharomyces cerevisiae PCD1, Q12524; Escherichia coli YeaB, P43337; Deinococcus radiodurans DR1184, Q9RV46. Sequences encoding experimentally confirmed CoA diphosphatases are marked with a tick. Columns on the right indicate whether the full sequence contains a putative peroxisomal targeting signal (PTS1 or PTS2) and/or a putative mitochondrial targeting peptide (mTP) or chloroplast transit peptide (cTP). Regarding the possible function of these enzymes in general, and the C. elegans peroxisomal enzyme in particular, a recent functional genomic analysis by RNA-mediated interference of C. elegans chromosome I, on which the Y87G2A.14 gene is located, revealed no phenotype in relation to growth, survival, fecundity or morphology when the expression of Y87G2A.14 was ablated. This would indicate that, within the limitations of RNAi, the CoA diphosphatase activity of Y87G2A.14 is not essential. However, now that the biochemical properties of this protein have been established, a more detailed and targeted biochemical analysis can be undertaken that should reveal its cellular function and benefit to the organism. Nudix hydrolases are believed to regulate the concentrations of nucleotides for optimal cell performance and also to eliminate potentially toxic nucleotide metabolites from the cell. With regard to regulation, CoA diphosphatase activity is associated with the 400 kDa CoA synthesizing protein complex from S. cerevisiae, in which it forms part of an alternative pathway for CoA biosynthesis that differs from the principal route of 3'-dephospho-CoA and CoA synthesis by this complex.
This CoA/4'-phosphopantetheine cycle involves hydrolysis of CoA to 3',5'-ADP and 4'-phosphopantetheine, which then reacts with ATP to give 3'-dephospho-CoA then CoA. Whether such a pathway operates in peroxisomes and whether the C. elegans Y87G2A.14 protein is involved remain to be established. With regard to the elimination of toxic nucleotide metabolites, the 13-fold higher kcat / Km ratio for oxidized CoA (CoASSCoA) compared to CoA for the S. cerevisiae PCD1 CoA diphosphatase previously suggested to us that this enzyme might preferentially remove non-functional and potentially toxic oxidized CoA and CoA esters from within the oxidizing environment of the peroxisomes. However, neither the mouse Nudt7 nor the C. elegans Y87G2A.14 proteins show this preference. Nevertheless, the potential production of adenine ring-oxidized derivatives of CoA by reactive oxygen species generated in the peroxisomes, analogous to the 2-oxo-dATP and 8-oxo-dATP substrates of the mammalian MTH1 Nudix hydrolase, suggests that such species could be more relevant substrates for peroxisomal CoA diphosphatases in vivo. The amenability of C. elegans to studies of cellular and molecular stress will allow the question of the biological function of these enzymes to be addressed. Materials and methods : S. cerevisiae strain BY4741 (MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0) was from Research Genetics. Calf intestinal alkaline phosphatase, yeast inorganic pyrophosphatase, EcoR1 and BamH1 were from Roche while BspH1 (Pag1) was from Helena Biosciences. Pfu DNA polymerase was from Stratagene. All other chemicals and nucleotides were from BDH or Sigma. The E. coli expression vector pET-32b(+) was from Novagen and the yeast-enhanced green fluorescent protein (yEGFP) fusion vectors pUG35 and pUG36 were a gift from J.H. Hegemann, Institute of Microbiology, University of Düsseldorf, Germany. The C. elegans cDNA library was prepared from adult nematodes by H. M.
Abdelghany, School of Biological Sciences, University of Liverpool, U.K. Cloning of Y87G2A.14 from C. elegans | A cDNA corresponding to the C. elegans Y87G2A.14 gene on chromosome I (GenBank accession no. CAB54476) was amplified from a cDNA library by PCR using as forward and reverse primers 5' GCAAATCATGAAGTGTGTGGTTAGCCGAGCTG 3' and 5' TAAATGAATTCACTAAATTTTGGATTTCGGTTC 3' respectively. These primers provided a BspH1 restriction site at the start of the amplified gene and an EcoR1 site at the end. After amplification with Pfu DNA polymerase, the DNA was recovered by phenol/chloroform extraction and digested with BspH1 and EcoR1. The digest was gel-purified and the restriction fragment ligated into the Nco1 and EcoR1 restriction sites of pET-32b(+), as BspH1 and Nco1 form compatible ends with each other. The resulting construct, pETY87G2A.14, yielded Y87G2A.14 downstream of the 109-amino acid thioredoxin (Trx) fusion and His-tag and S-tag sequences under the control of an IPTG-inducible promoter. The structure of the insert was confirmed by sequencing. The construct was propagated by transformation of E. coli XL1-Blue cells. Expression of Y87G2A.14 in E. coli and protein purification | E. coli strain BL21(DE3) was transformed with pETY87G2A.14. A single colony was picked from an LB agar plate containing 50 µg/ml ampicillin and inoculated into 10 ml LB medium containing 50 µg/ml ampicillin and incubated at 37 °C. When the cells reached an A600 of 0.5, they were transferred to 1 litre of fresh LB medium containing 50 µg/ml ampicillin and grown to an A600 of 0.3 at 37 °C, then transferred to an incubator at 25 °C. Isopropyl-1-thio-beta-D-galactopyranoside (IPTG) was added to 1 mM at an A600 of 0.8, and the cells induced for 8 h. The induced cells (4 g) were harvested, washed and resuspended in 20 ml breakage buffer (50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 1 mM DTT).
The cell suspension was sonicated and the cell lysate was then cleared by centrifugation at 10,000 x g at 4 °C for 15 min. The supernatant was applied to a 15 x 50 mm column of NiCAM™-HC resin (Sigma) equilibrated with 50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 1 mM 2-mercaptoethanol at a flow rate of 0.5 ml/min. After eluting the unbound proteins, a linear gradient of 0-40 mM histidine in the same equilibration buffer was applied at a flow rate of 1 ml/min and 1 ml fractions collected and analysed by SDS-PAGE. Those containing pure Trx-Y87G2A.14 fusion protein were collected, dialysed overnight at 4 °C against 1 litre of 20 mM Tris-HCl, pH 8.0, 50 mM NaCl, 1 mM DTT and then concentrated by ultrafiltration (Amicon) and stored at -20 °C in 50% glycerol. Enzyme assays | Potential substrates were screened by measuring the Pi released after nucleotide hydrolysis in the presence of inorganic pyrophosphatase or alkaline phosphatase. The standard assay (200 µl) for phosphodiester substrates was incubated at 37 °C for 30 min and contained 50 mM 1,3-bis[tris(hydroxymethyl)-methylamino]propane-HCl (BisTrisPropane-HCl), pH 8.0, 5 mM MgCl2, 1 mM DTT, 0.5 mM substrate, 0.1 µg of Trx-Y87G2A.14 fusion protein and 0.5 µg (1 unit) alkaline phosphatase. Assays with phosphomonoester substrates were as above, except 0.5 µg (100 mU) inorganic pyrophosphatase was used instead of alkaline phosphatase. The Pi released in each case was measured colorimetrically. Chromatographic analysis | Kinetic parameters and reaction products generated from hydrolysis of CoA and its derivatives were measured by high performance anion-exchange chromatography.
The reaction mixtures (100 µl) contained 50 mM BisTrisPropane-HCl, pH 9.5, 5 mM MgCl2, 1 mM DTT (in cases of substrates requiring reducing conditions) and substrate in the range of 0.05-0.7 mM (0.1-1 mM in the case of oxidized CoA), and were incubated at 37 °C for up to 20 min (during which time the reaction rates remained linear) with 0.1 µg Trx-Y87G2A.14 fusion protein. A 90 µl sample of each reaction mixture was applied to a 1 ml Resource-Q column (Amersham Pharmacia Biotech) equilibrated with 0.045 M CH3COONH4 (pH 4.6, adjusted with H3PO4), and eluted with a linear gradient from 0 to 0.45 M NaH2PO4 (pH 2.7, adjusted with CH3COOH) for 10 min at a flow rate of 2 ml/min. Elution was monitored at 259 nm and peaks identified with the aid of standards and quantified by area integration. GFP fusion constructs and subcellular localization | Expression plasmids encoding C-terminal and N-terminal fusions of Y87G2A.14 to yeast-enhanced green fluorescent protein (yEGFP) were constructed by amplification of the coding region of Y87G2A.14 from C. elegans cDNA by PCR using the same forward primer 5' CGACGGATCCATGAAGTGTGT 3' and one of the reverse primers 5' TAAATGAATTCACTAAATTTTGGATTTCGGTTC 3', 5' CACTAAGAATTCTATTTCGGTTCAAATTTCCTACTTGC 3', or 5' GCTCGAATGAATTCAATTTTGGATTTCGGTTC 3' to give PCR products "C", "CΔSKI" or "N" respectively. These primers provided a BamH1 restriction site at the start of the amplified gene and EcoR1 sites at the end. PCR products "C" and "CΔSKI" were cloned as C-terminal fusion proteins to yEGFP, while PCR product "N" with a deletion of the Y87G2A.14 termination codon was cloned as an N-terminal fusion to yEGFP. After amplification with Pfu DNA polymerase, the DNA products were recovered by phenol/chloroform extraction and digested with BamH1 and EcoR1.
The digested PCR products "C", and "CDeltaSKI" were gel purified and the restriction fragments ligated between the BamH1 and EcoR1 restriction sites of pUG36 (yEGFP-C-fusion) to give pyEGFP-Y87G2A.14 and pyEGFP-Y87G2A.14DeltaSKI respectively. The digested PCR product "N" was gel purified and the restriction fragment ligated between the BamH1 and EcoR1 restriction sites of pUG35 (yEGFP-N-fusion) to give pY87G2A.14-yEGFP. The structures of the inserts were confirmed by sequencing. The plasmids were propagated by transformation of E. coli XL 1-Blue cells. For microscopy, S. cerevisiae strain BY4741 was transformed with pyEGFP-Y87G2A.14, pyEGFP-Y87G2A.14DeltaSKI or pY87G2A.14-yEGFP and grown on solid SC-Ura medium containing 2% glucose. Cells were viewed by conventional and confocal fluorescent microscopy on a Zeiss LSM510 confocal microscope with a 100 x 1.4 NA objective. Other methods | Protein concentrations were estimated by the Coomassie blue binding dye-based colorimetric method using equal weights of bovine serum albumin, conalbumin, cytochrome c and myoglobin as standards . Backmatter: PMID- 11914163 TI - Outcomes research in the development and evaluation of practice guidelines AB - Abstract | Background | Practice guidelines have been developed in response to the observation that variations exist in clinical medicine that are not related to variations in the clinical presentation and severity of the disease. Despite their widespread use, however, practice guideline evaluation lacks a rigorous scientific methodology to support its development and application. Discussion | Firstly, we review the major epidemiological foundations of practice guideline development. Secondly, we propose a chronic disease epidemiological model in which practice patterns are viewed as the exposure and outcomes of interest such as quality or cost are viewed as the disease. Sources of selection, information, confounding and temporal trend bias are identified and discussed. 
Summary | The proposed methodological framework for outcomes research to evaluate practice guidelines reflects the selection, information and confounding biases inherent in its observational nature, which must be accounted for in both the design and the analysis phases of any outcomes research study. Keywords: Background : The development of practice guidelines | In clinical medicine, variations exist that do not appear to be related to variations in the clinical presentation and severity of disease. In response, practice guidelines have been developed in an attempt to reduce the wide practice variations and, through this process, to increase the appropriateness and quality of medical care and to reduce health care costs. Despite the publication and dissemination of practice guidelines, there has been relatively little evaluation of the application and impact of clinical practice guidelines. Some of the difficulty in the evaluation of these guidelines relates to the methods that were used to develop them. Guidelines have often been developed before adequate data were available to assess the relationship between clinical practice patterns and desired clinical outcomes. Nevertheless, there have been some reviews of practice guideline evaluation. While epidemiological designs are commonly used to evaluate the effectiveness of health care interventions, their use has not been discussed in the context of outcomes research. We propose the use of a methodological framework for outcomes research to evaluate practice guidelines. Methodological issues with the measurement of practice variations | In the debate about reasons to promote the development of practice guidelines, few have questioned whether the variations are real, or alternatively, whether they are simply a function of methodological flaws in the measurement of medical practices themselves, the result of variations in practice patterns across groups of patients with a similar diagnosis, or both. 
Furthermore, few studies have addressed whether practice variations, in fact, lead to outcome variations. Finally, little attention has been paid to the identification and measurement of initial conditions, that is, the potentially confounding factors and effect modifiers of the practice patterns outcomes relationship. Measurement of practice pattern variation | The measurement of medical practice patterns is susceptible to error. Measurement error may affect the validity of medical practice measurement in three major ways. First, it may lead to selection bias, in that subjects are selected to belong to a certain group based on an erroneous diagnosis. Secondly, it may lead to misclassification of exposure (information bias), in that patients treated with a specific practice pattern are classified in the wrong diagnostic group. Thirdly, it may lead to misclassification of outcomes, in that patients with a given outcome are classified in the wrong diagnostic group. Potential problems with the measurement of practice variations relate to the mechanisms that underlie the choice of groups that are compared in studies of practice variations. These mechanisms must be defined clearly to minimize selection bias. In many studies of practice variations, populations are arbitrarily divided according to hospitals, regions, counties, or countries. Little information is available about the factors that lead these groups to go to a particular hospital, live in a particular region, go to a particular doctor, etc. The population base from which each comparison group is derived should, in principle, be quite similar for all groups. Basically, if the groups are drawn from a similar population, unmeasurable and potentially confounding variables are more likely to be equally distributed between groups. In addition, the measurement of practice variations cannot be valid without information on relevant "initial conditions". 
Initial conditions are all confounding factors and effect modifiers, other than the treatment/practice patterns, that may cause or influence the clinical outcomes of interest. These factors may explain practice variations among groups that do not share similar initial conditions. To evaluate practice patterns-outcomes associations, potential confounders must be identified and controlled for in the analysis. Aside from clinical presentation and severity of illness, the initial conditions to be identified and characterized as completely as possible include physician, patient, and practice environment factors. Measurement of such factors is essential to minimize the chance of systematic error arising from confounding bias and effect modification. Figure 1 | Table 1 | Initial conditions to be taken into account when making inferences about practice patterns-outcomes associations Identification and measurement of outcomes of interest | Limitations to the development and evaluation of practice guidelines also include the absence of a clear concept of the targeted outcomes and the paucity of outcomes data to support these guidelines. There appears to be only a weak relationship between the purpose of guidelines and many of the outcomes usually measured in clinical research, that is, the source of evidence for (evidence-based) guideline development. The initial goals of establishing practice guidelines, namely to reduce costs and enhance the quality and appropriateness of treatment, are in fact rarely the basis for guideline development, since few data are available for these outcomes. To some degree, the development of guidelines has been driven by the availability of data on clinical outcomes, such as morbidity and mortality, rather than those outcomes related to the primary goals of the guidelines. The evaluation of practice guidelines | Throughout the development of practice guidelines, the major deficiency has been the lack of an evaluative method. 
Thus, we suggest a methodological framework for outcomes research to be applied to evaluate practice guidelines. Outcomes research evaluates practice patterns as they occur in actual clinical settings. This type of research can describe practice patterns, evaluate their divergence from practice guidelines and determine the effect of practice variations on outcomes. Outcomes research is necessarily observational in nature and, although observational studies have been used to evaluate health care interventions, the proposed methodological framework has yet to be applied to outcomes research. Why should outcomes research be used to evaluate and validate practice guidelines? The primary goal of practice guidelines is the consistent adherence by physicians to practice patterns that achieve the "best" outcomes at the lowest cost. Outcomes research evaluates practice patterns as they occur in actual clinical settings, and is thus the logical method to evaluate practice guidelines. In fact, outcomes research and practice guidelines are connected through concepts that relate to efficacy and effectiveness research. Efficacy studies, which normally complement practice guideline development, are those performed in highly selected groups of patients to investigate whether a particular intervention works under controlled conditions set by the study investigators. In contrast, outcomes research evaluates practice as it occurs in actual clinical settings. Research in these settings is called effectiveness research because the investigators have limited control over the conditions that characterize the practice settings. The difference between efficacy and effectiveness research can be summarized as follows: does it work at all (efficacy) or does it work in the real world (effectiveness)? Thus, there exists a dynamic process in which evidence from both effectiveness and efficacy studies feeds into the development and evaluation of practice guidelines, as depicted in Figure 2. 
Figure 2 | Relationship between outcomes research and practice guidelines Most practice guidelines are derived from efficacy studies rather than effectiveness studies. Therefore, it is not surprising that practice guidelines are not fully applicable in actual clinical practice. We suggest that effectiveness studies be used not only as a method to evaluate practice guidelines but also as a basis for their development. These could include both observational studies and effectiveness trials. Outcomes research better reflects practice in the real world and may make guidelines more likely to be applied. However, to date, little attention has been paid to the epidemiological underpinnings of the methods used to conduct outcomes research. Discussion : We will first propose a methodological framework for outcomes research. Then, we will show how it can be used to evaluate practice guidelines. Finally, we will address the limitations of the proposed methodological framework. Generic epidemiological issues in outcomes research | In the proposed methodological framework, the generic issues related to outcomes research will be discussed in sequential order. In outcomes research, the first step is to identify the study population and the groups (hospitals, providers, regions, etc.) that will be compared. The next step is the measurement of practice patterns and outcomes. After groups are compared on the basis of the treatment they receive and outcomes of interest, associations are sought between practice patterns and the various measures of outcome. This step of the methodological framework raises issues of confounding bias because not all factors that can confound these associations are measured and controlled or even known. The presence or absence of confounding bias can be affected by the other sources of bias namely selection and information biases. Lastly, we discuss the issue of temporal trends. 
In the evaluation of practice guidelines, the measurement of practice patterns may not be contemporaneous with the publication of practice guidelines. This may explain and even lead to the frequently observed discrepancy between the actual practice and what the guidelines state that it should be. Finally, two particularities of outcomes research are discussed: 1) the presence of ecological exposures in individual level studies and 2) the common use of large administrative databases. Specification of the model | Definition of the elements of the proposed epidemiological model for outcomes research | In the proposed model for outcomes research designed to evaluate practice guidelines, the outcome of interest can be a disease. For example, if the practice patterns that are being studied pertain to coronary revascularization, complications such as mortality and reinfarction after acute myocardial infarction may constitute the outcome of interest. Finally, the consequences of different practice patterns on medical resources (cost, quality and appropriateness) may be another possible outcome of interest. Table 2 | Epidemiological model for outcomes research to evaluate practice guidelines In outcomes research studies, practice patterns (which constitute the exposure in the proposed model) range from the use of medication, diagnostic tests and therapeutic procedures to the length of hospital stay, transfer to other facilities and/or scheduled physician visits. The primary goal of outcomes research is the evaluation of the effects of the selected practice patterns on the outcomes of interest. Consequently, any inference made about this association must be evaluated as a function of the potential selection, information (measurement error) and confounding biases. A limitation of outcomes research as it is most often performed is the lack of attention given to the measurement of each of the elements of the epidemiological model shown in Table 3. 
The basis of the proposed methodological framework will be the identification of generic sources of potential bias that relate to each element of the proposed model. Selection bias | Since outcomes research is observational in nature, the choice of the study population and of the compared groups is highly susceptible to selection bias. As applied to outcomes research, selection bias is defined as a distortion in the estimate of the practice patterns-outcomes association due to the way that subjects are selected for inclusion in the study population and in the different groups to be compared. A major consequence of selection bias is the potential confounding of inferences made about practice patterns-outcomes associations. This occurs when some characteristics of the subjects related to practice patterns or clinical outcomes influence the selection or exclusion of individual subjects, groups of subjects or practice environments. The selection process should be such that patients included in the study population come from the same target population. Furthermore, patients or study members should have a similar probability of being selected and included in the actual population. Inclusion and exclusion criteria must be clearly defined in order to characterize the actual population as precisely as possible. Judging the internal validity of a study is more feasible when there is a detailed account of how the individuals were selected to become members of the actual population. Finally, the study population also needs to be carefully characterized so that the inferences derived from its analysis can be evaluated for both internal validity (based on the data analyzed in the study) and external validity (the extent to which results obtained from the data analyzed in a particular study can be generalized to populations outside of the study). 
Any systematic differences between those actually studied and the source (target) population could result in biased estimates of the impact of a practice pattern on a clinical outcome. In many studies of outcomes research, groups exposed to different practice patterns are compared. The identification of such groups of patients is sought to assess the impact of different practice patterns on various outcomes in actual clinical settings and, as previously mentioned, can be used to assess practice guidelines. Because of such study design, it becomes unclear as to what the target population precisely is. Is it the group (the set of patients in a given environment) or is it the individuals receiving the various practice patterns within each group? For example, in a study of regional variations in the treatment of acute myocardial infarction in the U.S., the treatment of patients (practice patterns) was compared across different regions of the U.S. In this study, one wishes to generalize the findings about practice patterns-outcomes associations to all individuals with acute myocardial infarction (individual level). One also wishes to generalize the effect of the exposure, which is in this case practice patterns, to those prevalent in a given region (ecological level). The presence of these two levels, the individual and the ecological levels, introduces an added level of complexity in terms of the assessment of the effect of the exposure on outcome. When comparing practice patterns across regions using individual data, there is a certain degree of correlation brought about by the clustering of practice patterns that needs to be taken into account. Such a correlation is very difficult to quantify. In contrast, when assessing the effect of the exposure at the individual level, there are ecological factors (initial conditions particular to a given region) that need to be taken into account. 
The data originating from studies with mixed design, which are often the design of outcomes research studies, need to be analyzed with special attention to the degree of correlation between the individual covariates and to the presence of ecological exposure variables. Another potential source of selection bias is the choice of the groups to be compared, which depends on the criteria used to divide the groups. Individuals included in the groups to be compared should have the same probability of being included in these groups. Not infrequently in outcomes research, geographic criteria (such as country, regions, hospitals) are used because such criteria allow the identification of clinically comparable groups that receive very different treatments, whose resulting outcomes can then be assessed. However, such a process must be scrutinized for the possibility of selection bias other than the treatments that are being evaluated. Such selection bias would make groups not comparable as to clinical and other factors that could affect outcomes. The presence of a biased selection process could lead to confounding bias when practice patterns-outcomes associations are assessed. Such a situation may occur when the study groups are not comparable with regard to some characteristics of the subjects related to practice patterns or clinical outcomes that influenced the selection or exclusion of individual subjects, groups of subjects or practice environments. For example, in the same study of regional variations in the treatment of acute myocardial infarction, census regions of the U.S. were arbitrarily chosen as a basis for comparison. In this example, patients with similar risk of developing the outcome of interest, which is defined here as a complication after acute myocardial infarction, may not have had the same probability of being included in the different groups to be compared. 
Confounders may then bias the practice patterns-outcomes association if the selection of different risk groups is related to practice patterns. Selection bias can also affect the assessment of outcomes. Potential sources of this bias include loss to follow-up or missing data. Follow-up data are difficult to obtain in outcomes research studies, which often rely on administrative databases for data acquisition. Linkage, either of different databases or of the same database over time, is often performed. A failure to link the databases for a number of individuals presents a problem equivalent to having data missing for these individuals. Information bias | The second step in outcomes research studies is the measurement of practice patterns and of the outcomes of interest. Here, issues of information bias must be considered. Information bias can be defined as a distortion of the potential practice patterns-outcomes association due to misclassification of subjects with regard to practice patterns, outcome measures or both, or due to measurement error. There are two major ways in which practice patterns can be misclassified. They relate to the sensitivity and specificity of the tests that are used for the diagnosis for which practice patterns are being evaluated and for the classification of the outcomes of interest. The measurement of the different practice patterns and their related outcomes largely depends on the identification of a group of patients who have a given diagnosis and require a given treatment. The characteristics that make a diagnosis more amenable to outcomes research are the following: 1) a precise diagnostic definition, 2) a diagnostic test with high sensitivity and specificity, 3) reproducibility among different individuals and locations, 4) ease of coding, 5) a relationship to a procedure, and 6) being common and costly, so that it is likely to be collected in the large administrative databases frequently used in outcomes research. 
Because of such requirements, only a limited number of clinical conditions are amenable to outcomes research. Acute myocardial infarction is an example of a diagnosis that can be made with a high level of certainty because it has a precise diagnostic definition and well-defined diagnostic criteria, which, when taken together, have high sensitivity and specificity for the correct classification of patients. Therefore, it is easy to identify a study population that, in fact, has this disease and to describe their treatment. Thus, in order to minimize the misclassification of relevant practice patterns, the methods used to classify the disease and the outcomes that relate to the practice patterns under investigation must have high sensitivity and specificity . Given the principles underlying the measurement of practice patterns and outcomes, how are the measurements generally made in outcomes research studies? The measurement of the exposure (practice patterns) in outcomes research is valid only if it corresponds to the "true" practice as performed in the clinical setting. Again, practice can only be "true" if the diagnosis is correct. The identification of both patients with the disease of interest and their treatment requires a source of information that has the features of a diagnostic test. In outcomes research, administrative databases are often used as an information source to identify a study population and to obtain data on exposure. The database coding of diagnoses and procedures can be used as a "diagnostic test" to identify the clinical condition for which practice patterns will be described and to classify the practice patterns themselves and the outcomes of interest. Such a "diagnostic test" will have higher sensitivity and specificity values for some diagnoses than for others. 
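The idea of treating database coding as a "diagnostic test" can be made concrete by comparing coded diagnoses with a reference standard such as chart review. The following is a minimal sketch under stated assumptions: the counts are hypothetical and chosen only to contrast a well-coded condition with a poorly coded one, and the function name sensitivity_specificity is illustrative rather than taken from the source.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity and specificity of database coding against a reference standard.

    tp: coded and confirmed by the reference standard
    fp: coded but not confirmed
    fn: not coded although confirmed
    tn: neither coded nor confirmed
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical validation samples of 1000 charts each (not data from the source).
well_coded = sensitivity_specificity(tp=95, fp=2, fn=5, tn=898)
poorly_coded = sensitivity_specificity(tp=40, fp=90, fn=60, tn=810)

print("well-coded condition:   sensitivity=%.2f specificity=%.3f" % well_coded)
print("poorly coded condition: sensitivity=%.2f specificity=%.3f" % poorly_coded)
```

With the second set of counts, most true cases would be missed and many coded cases would be false, so both the study population and its practice patterns would be misclassified.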
For example, administrative database coding will have higher sensitivity and specificity for procedure-related diagnoses (such as hip fracture) because the diagnostic code is related to a major operation and is likely to be recorded for administrative purposes. In contrast, a diagnostic criterion for osteoarthritis can be quite vague and administrative coding is likely to have very low sensitivity and specificity for this diagnosis. The use of databases as a diagnostic test must be validated in all outcomes research studies, especially those using administrative databases. Methods to validate these databases include chart reviews, a priori coding systems or both. These validation methods ensure that coding is as accurate and reproducible as possible, thus allowing the database to be used as a diagnostic test to identify the study population and the practice patterns and the outcomes in outcomes research. However, these validation methods are rarely used. Finally, appropriate measures of outcomes that will serve to evaluate practice guidelines must be identified. This presents a problem because most practice guidelines aim to reduce practice variations, which will, in turn, lead to improved appropriateness and quality of care. However, how appropriateness and quality of care are measured is controversial and will not be discussed here . Nevertheless, defining the outcomes that will be used to evaluate practice guidelines is a crucial step in this process. Quality of life and functional status measures constitute another group of outcome measures that should be included for the evaluation of practice guidelines. These dimensions of outcomes have received more attention from health providers, while consumers have become more concerned about outcomes of care. However, these outcomes also are difficult to measure, because they rely heavily on patient interviews and questionnaires. 
They are likely to vary with patient expectations, culture, and climate and are thus likely to be measured with error and misclassified. A few reliable, valid instruments have been developed to assess health-related quality of life, but such instruments are not easily used to collect this information from large databases. There is a need to develop instruments to measure these types of outcomes, whether they are conversion factors for existing databases (such as using length of stay as a proxy for cost) or new measures that could easily be integrated in administrative databases. Such measures could include estimates of functional class or severity of illness. At present, many outcomes research studies measure mortality and disease-specific morbidity. The validity of the measurement of these outcomes is limited by the type of database that is used. For example, death registries are a notoriously invalid source of information on causes of death. There are many examples of poor correlation between cause of death as established by death registries versus disease registries. Death certificates in New York City during 1992 were assessed to determine the accuracy and frequency of reporting tuberculosis as a cause of death. Of 310 persons who died with active tuberculosis in 1992 (based on a disease-specific registry), only 34% had tuberculosis listed on their death certificate. Thus, in this example, as in many others like it, using death certificates led to an inaccurate measure of disease burden. Confounding bias | In outcomes research terms, confounding bias is present when the effect of the practice variations on the outcomes of interest is distorted because of the effects of extraneous variables (variables that are causally associated with the practice variations and the outcomes of interest). 
This issue is crucial in outcomes research because, while outcomes research shares the purpose of a clinical trial (to evaluate different treatments), it primarily uses observational methods -- investigators conducting outcomes research have limited control over potentially confounding factors (the initial conditions of individual groups of patients). Because outcomes research builds on existing practice variations and analyses the natural ongoing experiment, there is ample opportunity for confounding bias to invalidate any inference made about practice patterns outcome associations . For example, variations in practice patterns could reflect variation not only in the use of a given procedure but also in the severity of disease. Assignment of patients to certain procedures on the basis of the severity of illness makes sense clinically, but in outcomes research, it is a common and important source of confounding if the procedure is either efficacious or particularly harmful in high-risk patients. Many indices have been developed to measure the severity of illness when using existing databases to correct for such confounding, but one can never be sure that this type of confounding has been entirely controlled . This presents an intrinsic limitation of outcomes research. Avoidance of confounding bias is limited by the source of data used to describe practice patterns, particularly when observational data, such as the large Medicare administrative databases, are used to compare outcomes among patients who receive different treatments. The potential for confounding bias arises because many factors other than the treatment under evaluation may affect patient outcomes. These factors include comorbid diseases, severity of illness, and patient, physician and environmental factors. Such factors are likely to influence treatment decisions but are difficult to capture fully in recorded data. 
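Confounding by severity of illness can be illustrated with a small worked example. The sketch below uses entirely hypothetical counts in which a procedure has no effect on mortality within either severity stratum but is given mainly to high-severity patients; the crude comparison then suggests harm. The data layout and function names are assumptions made for illustration only.

```python
# Hypothetical counts: (deaths, patients) by severity stratum and treatment group.
strata = {
    "low severity": {"treated": (5, 100), "untreated": (20, 400)},
    "high severity": {"treated": (80, 400), "untreated": (20, 100)},
}


def risk(deaths, patients):
    """Proportion of patients who died."""
    return deaths / patients


def stratum_risk_differences(strata):
    """Risk difference (treated minus untreated) within each severity stratum."""
    return {
        name: risk(*groups["treated"]) - risk(*groups["untreated"])
        for name, groups in strata.items()
    }


def crude_risk_difference(strata):
    """Risk difference after pooling strata, i.e. ignoring severity."""
    pooled = {}
    for arm in ("treated", "untreated"):
        deaths = sum(groups[arm][0] for groups in strata.values())
        patients = sum(groups[arm][1] for groups in strata.values())
        pooled[arm] = risk(deaths, patients)
    return pooled["treated"] - pooled["untreated"]


stratified = stratum_risk_differences(strata)
crude = crude_risk_difference(strata)

print("stratum-specific risk differences:", stratified)
print("crude risk difference:", round(crude, 3))
```

Stratifying on a measured severity index removes the distortion here only because severity is the sole confounder and is measured without error, which cannot be assumed with real administrative data.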
Researchers cannot adjust for imbalances in prognostic factors that are unmeasured or poorly categorized and administrative data, in particular, may lack the precise and accurate coverage of clinical details needed to permit full and fair adjustments. Further data collection might solve this issue, but it is not always possible to collect additional information. Standard statistical modeling can attempt to adjust for the known differences between the groups, but this might not be sufficient for unmeasured differences. Several alternative methods have been suggested. One method is subgroup analysis to adjust for unmeasured differences between groups of individuals who differ on known risk factors. Another method consists of the use of instrumental variables . Instrumental variables are observable factors that influence treatments but do not directly affect patient outcomes. This approach uses the so-called instrumental variables to mimic a randomization of patients to different likelihoods of receiving alternative treatments. McClellan et al. applied this methodology to assess whether more aggressive use of invasive cardiac procedures improved outcomes in the elderly. In this study, the instrumental variable was the distance of the patient's residence from the nearest hospital with on-site angiography. The authors noted lower mortality among elderly individuals who received more aggressive treatment than among those treated more conservatively. Temporal trend bias | We propose a bias called a "temporal trend bias" that is particular to the use of outcomes research to evaluate practice guidelines. This bias results from the inability to control for secular trends. It reflects the fact that by the time practice guidelines are published and disseminated, new treatments and technology are being incorporated into clinical practice. 
Thus, it is difficult to identify a pure application of a practice guideline whose application is not undermined by recent advances in medicine and technology. For example, we evaluated the effect of a specific set of guidelines on return to work after acute myocardial infarction. The use of these guidelines had been successful in a university setting; this study assessed their use in a community setting. During the 5 years that elapsed between these two studies, practices changed. The use of guidelines was less successful in the community not only because they did not influence practice but also because usual care had grown closer to the proposed guidelines . Ecological exposure in individual level studies | A frequently encountered particularity of outcomes research study design is the presence of both ecological exposure and individual level covariates in the same analysis. Because the unit of analysis is a group, but inferences are made about the impact of a given practice pattern on individual outcomes, many outcomes research analyses have elements of both individual and ecological analyses . In our study of regional variations in the treatment of acute myocardial infarction, measures describing practice patterns at the regional level, ecological exposure, (proportion of patients receiving angiography, angioplasty, and coronary artery bypass surgery) were linked to the outcome measures of mortality adjusting for individual level variables that measured severity of disease. Then, inferences were made about the use of these procedures at the patient level. Although the unit of analysis is the region, which would demand an ecological analysis, there are individual level covariates, which are likely to be correlated within each region, that need to be taken into account. 
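The within-region correlation of individual-level covariates mentioned above is often summarized by an intraclass correlation coefficient (ICC). The sketch below is one common estimator, the one-way analysis-of-variance ICC for equally sized groups, and is not the analysis used in the studies cited; the region labels and severity scores are hypothetical, and the function name icc_oneway is illustrative.

```python
def icc_oneway(groups):
    """One-way ANOVA estimate of the intraclass correlation for balanced groups.

    groups: dict mapping a group label (e.g. region) to a list of individual values.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW), where k is the common group size.
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch handles equally sized groups only")
    k = sizes.pop()          # individuals per group
    g = len(groups)          # number of groups
    if k < 2 or g < 2:
        raise ValueError("need at least two groups of at least two individuals")

    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {name: sum(values) / k for name, values in groups.items()}

    # Between-group and within-group mean squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in means.values())
    ss_within = sum((v - means[name]) ** 2 for name, values in groups.items() for v in values)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical severity scores for three patients in each of three regions.
severity_by_region = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
icc = icc_oneway(severity_by_region)
print("ICC:", round(icc, 3))
```

A value near 1 indicates that patients within a region resemble each other far more than patients in different regions, in which case standard errors that treat individuals as independent would be too small.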
When group measures are used that contain individual-level variability with some degree of correlatedness (within region) and aggregate-level variability (between regions), specific analytic tools must be used. It has been suggested that hierarchical logistic regression modeling be used to examine the interplay between sources of variation in the use of health-care services, that is, between ecological-level and individual-level sources. This type of modeling is designed to separate true variability across areas from observed variability. An application of this method is the work by Gatsonis et al. who found that practice variations across regions of the U.S. in the use of angiography after acute myocardial infarction were largely explained by differences in patient characteristics and geographic region. However, states that had more on-site availability of angiography still tended to have higher angiography rates after accounting for between-region and within-region variability. After analysis for sources of variability, more reliable inferences about the associations between practice patterns and outcomes can be made. Sources of data | The application of the proposed methodological framework for outcomes research largely depends on the sources of data that are used to evaluate the effect of the practice variations on outcomes . Most commonly, the study design is a retrospective cohort analysis and the dataset that is used has been obtained either for administrative purposes (discharge databases) or for a randomized clinical trial that addressed a different question . Less often, a prospective cohort study is designed to evaluate a particular set of practice guidelines . Although a prospective design provides more control in data collection than a retrospective analysis, both designs are subject to selection, information and confounding biases. 
The ideal database to use for the evaluation of practice guidelines is one that allows the precise measurement of the practice patterns (exposure) and outcomes (disease) as well as the measurement of potential confounders (severity of illness, precision of diagnosis, socioeconomic characteristics). Unfortunately, such a database probably does not exist. The strength of administrative databases, such as that of Medicare, is that they allow the observation of large numbers of patients for whom practice patterns can be evaluated as they occur in actual clinical practice. Furthermore, administrative databases allow the observation of associations between practice patterns and outcomes in large numbers of unselected patients. However, the limitations of such databases include missing information about potential confounding factors, such as severity of illness, and the limited ability to measure exposure and outcome accurately. Many databases that are not designed for clinical research either mismeasure patient outcomes or fail to capture outcomes that are important to both physicians and patients (such as quality of life and functional status). The control of these biases was the basis of the methodological framework for outcomes research proposed in this chapter. The application of outcomes research methods to practice guideline evaluation | The application of outcomes research methods to practice guideline evaluation can accomplish several goals. One important goal is the evaluation of practice guidelines, that is, to determine to what extent the guidelines accomplished their primary goals after their dissemination. We have suggested the model of chronic disease epidemiology as the methodological framework for outcomes research to evaluate practice guidelines. The steps to evaluate practice guidelines using outcomes research when the basic design is a retrospective cohort study are summarized in Figure 3. Some limitations to the application of this model exist.
The reasons for the inability of the proposed methodological framework to deal completely with the intrinsic biases in outcomes research are listed in Figure 4. They relate mostly to the databases usually used in studies of outcomes research. Figure 3 | Steps to evaluate practice guidelines using outcomes research Figure 4 | Reasons for the inability of the proposed methodological framework to deal with biases in outcomes research Summary : The proposed methodological framework for outcomes research to evaluate practice guidelines reflects the selection, information and confounding biases inherent in its observational nature, which must be accounted for in both the design and the analysis phases of any outcomes research study. Indeed, a major limitation of outcomes research is the inability to account for unobserved heterogeneity that directly correlates with practice patterns and/or health outcomes. This may bias any inferences made about practice variations and outcomes. "Researchers cannot correct for the subtle reason doctors choose one treatment over another for a particular patient. That bias, in turn, can undermine the entire premise of outcomes research". These are intrinsic properties of outcomes research that can be dealt with only in part by applying the principles of chronic disease epidemiology. Thus, this proposed methodology can serve as a framework for the conduct of outcomes research in the evaluation of practice guidelines, but its application will be limited.
Competing interests : none declared PMID- 11914162 TI - Organization specific predictors of job satisfaction: findings from a Canadian multi-site quality of work life cross-sectional survey AB - Abstract | Background | Organizational features can affect how staff view their quality of work life. Determining staff perceptions about quality of work life is an important consideration for employers interested in improving employee job satisfaction. The purpose of this study was to identify organization specific predictors of job satisfaction within a health care system that consisted of six independent health care organizations. Methods | 5,486 full, part and casual time (non-physician) staff on active payroll within six organizations (2 community hospitals, 1 community hospital/long-term care facility, 1 long-term care facility, 1 tertiary care/community health centre, and 1 visiting nursing agency) located in five communities in Central West Ontario, Canada were asked to complete a 65-item quality of work life survey. The self-administered questionnaires collected staff perceptions of: co-worker and supervisor support; teamwork and communication; job demands and decision authority; organization characteristics; patient/resident care; compensation and benefits; staff training and development; and impressions of the organization. Socio-demographic data were also collected. Results | Depending on the organization, between 15 and 30 (of the 40 potential predictor) variables were found to be statistically associated with job satisfaction (univariate analyses). Logistic regression analyses identified the best predictors of job satisfaction and these are presented for each of the six organizations and for all organizations combined.
Conclusions | The findings indicate that job satisfaction is a multidimensional construct and although there appear to be some commonalities across organizations, some predictors of job satisfaction appear to be organization and context specific. Keywords: Background : There appears to be no one commonly accepted definition for quality of work life. In healthcare organizations, quality of work life (QWL) has been described as referring to the strengths and weaknesses in the total work environment . Characteristics that describe the overall organization are viewed as part of the behaviour and reward system of the staff working in that setting. Organizational features such as policies and procedures, leadership style, operations, and general contextual factors of the setting, all have a profound effect on how staff view the quality of their work life. QWL is an umbrella term which includes many concepts. Therefore, concentrating on only one job characteristic, whether it is wages or management style, is an inadequate approach to assessing QWL. Because the perceptions held by employees play an important role in their decisions to enter, stay with or leave an organization, it is important that staff perceptions be included when assessing QWL. And although job satisfaction is not QWL, perception of QWL is often assessed using job satisfaction surveys. Previous studies have shown that low job satisfaction is a major cause of turnover among health care providers . In addition, job satisfaction may affect the quality of service and organizational commitment and may be a contributing factor associated with shortages of health care providers . Such findings have recently increased interest in studying job satisfaction among health care providers . 
The results of a 1993 meta-analysis of 48 studies looking at work satisfaction in over 15,000 nurses revealed that job satisfaction was associated strongly with reduced work stress, organizational commitment, communication with supervisors, autonomy, employee recognition, fairness, locus of control, years of experience, education, and professionalism. This study also found a strong relationship between job satisfaction and QWL for nurses. After reviewing the literature on QWL and job satisfaction, and considering the wide variety of health care settings, situational contexts, and organizational structures (including management styles, reporting structures, staffing complements, and levels of training and experience) in which employees work, we hypothesized that the predictors of job satisfaction would vary depending on the organization. The purpose of this study was to identify organization specific predictors of job satisfaction within a health care system that consisted of six independent and distinct organizations located in five communities in Central West Ontario, Canada. Methods : Setting | The settings for this study included six independent and distinct health care organizations providing varying levels and types of care. All six organizations were affiliated with the St. Joseph's Health System (SJHS) located in five Central West Ontario communities. Collectively, the SJHS is one of the largest corporations in Canada devoted to health care. At the time of the study (2000), the SJHS employed 5,486 full, part and casual time (non-physician) staff. Additional information about each of the six organizations and their respective communities is provided in Table 1. Table 1 | Characteristics of the Organizations within the St. Joseph's Health System.
Questionnaire development | Items included in the "Quality of Work Life Survey 2000" were selected after a review of the literature and extensive consultation between research team members and the QWL Task Force (a management group consisting of representatives from each of the six SJHS organizations). The initial selection of items was influenced by a recently published Canadian study and reports from two meta-analyses. The QWL Task Force then refined these items to consider, among other things, issues of accuracy, relevance, readability, grammar, potential for offensiveness, and appearance of cultural or gender bias. After several months of development, the instrument was pretested on a small group of staff at two of the participating organizations (Site 2 and Site 4 -- see Table 1). This pretesting was done to ensure that individuals could follow the instructions associated with the format, to obtain estimates of the time required to complete the survey instrument, to identify items that were poorly written or ambiguous, and to identify an appropriate implementation strategy. The questionnaire and implementation strategies were revised accordingly. The final 65-item survey contained nine sections representing topic areas considered relevant to assessing QWL in the SJHS. Eight scale scores were developed from the individual items (see below and : Statistically Significant Organization Specific (Univariate) Predictors of Job Satisfaction). The Co-worker and supervisor support section included 10 closed-ended and 1 open-ended questions. A 3-item supervisor social support scale included questions about supervisor helpfulness, concern about the welfare of employees, and ability to facilitate effective interaction among employees. Co-worker support was measured by a 7-item scale reflecting the extent to which co-workers were seen as competent, understanding, and supportive of employees. Both scales were adapted from Woodward et al. (1999).
The Teamwork and Communication section included 9 closed-ended and 1 open-ended questions. For determining teamwork, a 7-item scale was adapted from Taylor and Bowers (1972) to measure the extent to which one's work unit coordinates efforts, solves problems and works together effectively . A 2-item scale developed for this project measured how communication was practiced within the organization. The Job Demands and Decision Authority section included 15 closed-ended and 1 open-ended questions. It included a 4-item scale adapted from Brosnan and Johnson (1980) to measure clarity regarding responsibilities, workloads and conflicting demands . There was also a 9-item scale adapted from Karasek et al. (1998) to measure the extent to which respondents' jobs gave them autonomy or decision-making latitude , and 2 questions which reflected the demands of one's work . The Characteristics of Your Organization section included 6 closed-ended and 1 open-ended questions. This section was adapted from Woodward et al. (1999) and included a 4-item scale that inquired about the extent to which the organization encouraged the best efforts from staff, and how employees were treated . Two additional questions examined the extent to which staff were kept informed, and organizational recognition of employee contributions. The Patient/Resident Care section included 5 closed-ended and 1 open-ended questions. The questions (developed for this project) were used to measure employees' perceptions of the quality and timeliness of care provided for patients and residents at their respective organizations. The Compensation and Benefits section included 10 closed-ended and 1 open-ended questions. These questions were developed for this project to determine employee satisfaction concerning a number of employee benefits and level of pay. The Staff Training and Development section included 6 closed-ended and 1 open-ended questions. 
These questions (developed for this project) measured the extent to which each organization supports its staff in training, educational development and opportunities for advancement. The Overall Impressions of Your Organization section included 4 closed-ended and 4 open-ended questions. All of the questions (developed for this project) assessed staffs' impressions of and overall satisfaction with their organization. The question "Overall, how satisfied are you with your job?" was used as the outcome variable in this study. The Staff Socio-Demographic Information section included 10 closed-ended questions (developed for this project) to collect information on gender, age, marital status, education, length of employment, supervisory status, time spent on job activities, job status and job classification. Within each of the first 8 sections, employees were asked to circle the response that best described their feelings using 5-point Likert scales. Employees were also asked for written comments pertaining to each of the sections and were provided space to comment on other issues they felt were important. Survey Procedure | Because of the diversity of organizations and staff within the SJHS, it was decided by the QWL Task Force, organization administrators and researchers that the implementation of the survey would be customized to best fit each of the organizations. It was felt that a varied approach would be more feasible for the organizations and that this would help maximize response rates. 
Although the procedures were not identical, all of the organizations provided as a minimum: advance notification (written or voice mail) of the survey to all staff (eligibility was based on whether the worker was active on the organization's pay roll at the time of the study and was not a physician); access to questionnaires for all staff (the QWL Task Force felt that each staff member in the SJHS should have the opportunity to complete a questionnaire); one or more reminder notices (e.g., letters, newsletters, voice mail, personal communication); and sealed drop off boxes for completed questionnaires. Pilot testing of the questionnaire revealed that employees felt that tracking individual employees for the purpose of follow-up (i.e., to increase response rate), violated the perception of anonymity and confidentiality. Therefore, to help ensure anonymity and confidentiality, follow-up attempts were limited to general reminder notices to all staff. Analysis | All closed-ended (or quantitative) responses were entered directly from the questionnaires into SPSS (version 10.0.5 for Windows, SPSS, Inc., Chicago, 1999). Prior to data analysis, most of the survey questions were re-coded. Questions which asked participants to select one response within a five point scale (never to always; very dissatisfied to very satisfied; very poor to very good; no, definitely not to yes, definitely) were collapsed into two categories. For example, for the response scale (1=very dissatisfied, 2=dissatisfied, 3=not sure, 4=satisfied, 5=very satisfied) those who indicated they were either satisfied or very satisfied were re-coded as "satisfied" while all others were re-coded "not satisfied" by default. In several instances, it was appropriate to combine two or more of the questions into a composite scale score. 
See "Questionnaire Development" section and : Statistically Significant Organization Specific (Univariate) Predictors of Job Satisfaction for additional details on how the composite scale scores were calculated. In total, there were eight scale scores (supervisor social support; co-worker support; teamwork; communication; role clarity; decision latitude; organization/staff relations; patient/resident care). Scale scores were generated by summing the participant responses (i.e. one to five) for all questions that made up the scale. In the rare situation where a participant failed to answer one or more of the questions that made up a scale score, missing values were replaced with mean values for that organization. Scale scores were categorized into meaningful dichotomous categories prior to analysis (e.g., satisfied or not satisfied). For the purpose of this study, QWL was operationally defined using the global question "Overall, how satisfied are you with your job?". Employees rated job satisfaction from very dissatisfied to very satisfied using a five point scale (very dissatisfied, dissatisfied, not sure, satisfied, very satisfied). For the analysis, however, those indicating they were either satisfied or very satisfied were considered to be "satisfied" with their jobs. All others were considered "not satisfied" with their jobs. Prior to analysis, study researchers reached a consensus on which survey questions to include as potential predictors of job satisfaction. In total, there were eight scale scores and 32 questions that were rationalized a priori as potential predictors of job satisfaction. Data from each of the organizations, as well as all of the organizations combined (representing the SJHS), were analyzed separately to identify predictors of job satisfaction. 
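The re-coding and scale-score steps described above can be sketched as follows. This is a minimal illustration and not the authors' SPSS procedure: the function names are hypothetical, and the sketch assumes that a missing item is replaced by the mean of that same item among respondents from the same organization, which is one reading of "mean values for that organization".

```python
from statistics import mean

# Codes 4 ("satisfied") and 5 ("very satisfied") on the 1-5 response scale.
SATISFIED_CODES = {4, 5}


def dichotomize(response):
    """Collapse a 1-5 response into 'satisfied' or 'not satisfied'.

    Every other value, including 3 ("not sure"), falls into
    'not satisfied' by default, as described in the text.
    """
    return "satisfied" if response in SATISFIED_CODES else "not satisfied"


def scale_scores(rows):
    """Sum item responses into one scale score per respondent.

    rows: item responses for the respondents of ONE organization,
    one list per respondent, with None marking an unanswered item.
    Assumption: a missing item is replaced by that item's mean among
    the organization's respondents who answered it.
    """
    n_items = len(rows[0])
    item_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_items)
    ]
    return [
        sum(r[j] if r[j] is not None else item_means[j] for j in range(n_items))
        for r in rows
    ]


# Hypothetical 3-item scale answered by three respondents.
example_rows = [[4, 5, 3], [2, None, 4], [5, 3, None]]
example_scores = scale_scores(example_rows)
```

The text does not state the cut-points used to dichotomize the summed scale scores, so that step is left out of the sketch.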
T-tests, chi-square analyses and, when appropriate, Fisher exact tests were used to determine which of the variables were statistically associated with job satisfaction, i.e., were potential predictors of job satisfaction. Descriptive information (numbers and percentages) for each of the variables was calculated by whether or not staff were satisfied with their jobs. In addition, p-values, odds ratios, and 95% confidence intervals for the odds ratios were calculated for each potential predictor of job satisfaction. Separate logistic regression analyses were used to identify the best predictors of job satisfaction for each organization and for all organizations combined (SJHS). Only variables which had a statistically significant association with job satisfaction were included in these analyses. Adjusted odds ratios and corresponding 95% confidence intervals are reported for each organization and the SJHS. The logistic regression analyses produce odds ratios which have been simultaneously adjusted for all other variables in their respective final models. The goodness of fit of the logistic regression models was assessed using the rho-squared statistic. A rho-square value between 0.20 and 0.40 suggests a very good fit of the model. A probability level of <0.05 was used to determine statistical significance. SPSS and Epi-Info (version 6.04a, Centers for Disease Control and Prevention, Atlanta, 1995) were used for statistical computations. Results : Table 1 provides additional information about each of the six health care organizations, including the type of organization, number of staff, number of beds or visits/year, and the size of the community where the organization was located. Respondent participation rate | Response rates are often used as an indicator of the representativeness of a sample of respondents. Of the combined 5,486 staff, 1,819 (33.2%) returned a completed questionnaire. Organization specific response rates varied from 25.3% to 55.3%.
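As an illustration of the univariate statistics named in the Analysis section, the sketch below computes an unadjusted odds ratio with a 95% confidence interval from a 2x2 table, and a rho-squared value from two log-likelihoods. Two assumptions are made here that the text does not confirm: the interval uses the log-based (Woolf) method, and rho-squared is taken to be McFadden's likelihood ratio index. The counts and log-likelihoods are invented for the example; the adjusted odds ratios reported in this paper came from logistic regression models fitted in SPSS, which this sketch does not reproduce.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf (log-based) confidence interval.

    2x2 table layout (hypothetical labels):
      a = predictor present, satisfied     b = predictor present, not satisfied
      c = predictor absent,  satisfied     d = predictor absent,  not satisfied
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def rho_squared(ll_model, ll_null):
    """McFadden's likelihood ratio index: 1 - LL(model) / LL(null).

    Assumption: this is the rho-squared statistic meant in the text.
    """
    return 1.0 - ll_model / ll_null


# Invented counts: 40/10 satisfied/not satisfied with the predictor,
# 20/30 without it.
or_, ci_low, ci_high = odds_ratio_ci(40, 10, 20, 30)

# Invented log-likelihoods for a fitted model and an intercept-only model.
fit = rho_squared(-70.0, -100.0)
```

With these invented inputs the odds ratio is 6.0 and the rho-squared value is 0.30, which falls inside the 0.20 to 0.40 range the text describes as a very good fit.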
In an attempt to assess the representativeness of respondents, a comparison was made of available socio-demographic information between respondents and all staff within each of the organizations. Overall, female employees were more likely to respond than male employees (it should be noted, however, that the vast majority of staff (82% to 98%), were females within each of these organizations), as were full-time employees compared to part-time, casual or temporary employees. There were also some differences in respondents, across organizations, based on job classification. All organizations, however, had respondents within each job classification. A statistical estimating procedure was also used to assess how accurately respondents represent staff at each of the organizations . This calculation suggests that the organization specific findings were accurate plus or minus 3.6% to 8.8%, 19 times out of 20 . Table 2 | Response rates and accuracy of responses by organization. Potential predictors of job satisfaction | Organization specific and combined SJHS (univariate) analyses (t-test, chi-square analyses and, when appropriate, Fisher exact tests) were used to determine which of the potential predictor variables were statistically associated with job satisfaction. Included in these analyses were the 40 potential predictor variables (8 scale scores and 32 individual questions). See : Statistically Significant Organization Specific (Univariate) Predictors of Job Satisfaction for a list of all variables. The number of statistically significant variables ranged from 15 to 30 depending on the organization and 32 for all organizations (SJHS) combined (see : Statistically Significant Organization Specific (Univariate) Predictors of Job Satisfaction). Best predictors of job satisfaction | Separate logistic regression analyses were then used to identify the best predictors of job satisfaction for each organization and for all organizations combined (SJHS). 
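The "plus or minus ... 19 times out of 20" statement above is a margin of error at the 95% confidence level. The text does not give the estimating procedure, so the sketch below is only one common way to obtain such a figure: the normal approximation for a proportion with a finite population correction, using the most conservative proportion of 0.5. The function name and the choice of formula are assumptions. Only the combined counts reported in the text (1,819 respondents out of 5,486 staff) are used, because the organization-level counts are not given in this section.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    n: number of respondents; population: number of eligible staff.
    Uses the normal approximation with a finite population correction.
    Assumption: this is not necessarily the procedure used in the study.
    """
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc


# Combined SJHS figures reported in the text.
response_rate = 1819 / 5486
combined_moe = margin_of_error(1819, 5486)
```

Under these assumptions the combined sample gives a margin of roughly 1.9 percentage points. The organization-specific margins quoted in the text are wider, as expected for the smaller number of respondents at each site.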
All variables found to be statistically associated with job satisfaction from the univariate analyses were entered into these logistic regression analyses. The best predictors of job satisfaction are presented in Table 3. The ranking assigned to these variables relates to the order in which variables were added to the logistic regression models. For example, the rank "1" refers to the first variable that was added to the model, i.e., the variable which best improved the fit of the model (or the most important variable). A more detailed description of the magnitude (as represented by the size of the odds ratios) and statistical significance (as represented by the 95% confidence intervals of the odds ratios) of the association between each of these predictors and job satisfaction is presented below for each organization and all organizations combined (SJHS). The best predictors of job satisfaction are again ranked according to their importance. All of the odds ratios presented below have been simultaneously adjusted for all other variables in their respective final logistic regression models. All logistic regression models achieved a rho-square between 0.20 and 0.40, suggesting they were very good (fitting) models for predicting job satisfaction. Table 3 | Best Predictors of Job Satisfaction Ranked by Organization. Site 1 (community hospital) | The most important predictors of job satisfaction were: 1) being satisfied with the organization's recognition of employee contributions (OR 5.01, 95% CI 1.59 to 15.81), 2) good decision authority (OR 7.91, 95% CI 1.46 to 42.92), 3) being satisfied with patient/resident care (OR 4.66, 95% CI 1.36 to 15.97), and 4) good role clarity (OR 4.24, 95% CI 1.16 to 15.49). The final model achieved a rho-square of 0.30.
Site 2 (community hospital/long-term care facility) | The most important predictors of job satisfaction were: 1) good open communication between staff (OR 2.55, 95% CI 1.03 to 6.35), 2) good supervisor social support (OR 6.27, 95% CI 1.36 to 29.00), 3) organization keeps staff informed (OR 3.73, 95% CI 1.51 to 9.20), 4) good decision authority (OR 3.49, 95% CI 1.25 to 9.73), and 5) being satisfied with pay level (OR 2.47, 95% CI 1.14 to 5.34). The final model achieved a rho-square of 0.24. Site 3 (visiting nurse organization) | The most important predictors of job satisfaction were: 1) less frequently (never/seldom/sometimes) asked to do an excessive amount of work (OR 7.22, 95% CI 2.22 to 23.46), 2) being satisfied or very satisfied that the organization keeps employees informed (OR 4.52, 95% CI 1.43 to 14.32), 3) belief the organization carries out its Mission statement (OR 11.17, 95% CI 2.04 to 61.14), and 4) good decision authority (OR 5.29, 95% CI 1.32 to 21.22). The final model achieved a rho-square of 0.34. Site 4 (long-term care facility) | The most important predictors of job satisfaction were: 1) belief the organization carries out its Mission statement (OR 4.63, 95% CI 1.77 to 12.51), 2) good supervisor social support (OR 3.32, 95% CI 1.22 to 9.04), 3) good decision latitude (OR 11.61, 95% CI 1.33 to 101.8), 4) often or always given enough time to get the job done (OR 3.05, 95% CI 1.00 to 9.35), and 5) spending 38 hours or more on the job or job related activities (OR 3.55, 95% CI 1.32 to 9.59). The final model achieved a rho-square of 0.34.
Site 5 (community hospital) | The most important predictors of job satisfaction were: 1) belief the organization carries out its Mission statement (OR 3.42, 95% CI 1.82 to 6.43), 2) satisfied that the organization keeps staff informed (OR 2.62, 95% CI 1.48 to 4.65), 3) not being asked frequently to do an excessive amount of work (OR 2.41, 95% CI 1.36 to 4.27), 4) good decision latitude (OR 5.65, 95% CI 2.09 to 15.25), 5) being satisfied with pay level (OR 2.41, 95% CI 1.37 to 4.23), 6) being female (OR 2.99, 95% CI 1.29 to 6.90), and 7) good role clarity (OR 2.45, 95% CI 1.02 to 5.86). The final model achieved a rho-square of 0.25. Site 6 (tertiary care hospital/community health centre) | The most important predictors of job satisfaction were: 1) belief the organization carries out its Mission statement (OR 3.99, 95% CI 2.52 to 6.31), 2) good communication (OR 3.00, 95% CI 1.85 to 4.88), 3) being given enough time to get the job done (OR 2.63, 95% CI 1.58 to 4.40), 4) being a member of the nursing staff (OR 2.73, 95% CI 1.75 to 4.26), 5) good organization support for training and development (OR 3.51, 95% CI 1.59 to 7.76), 6) good decision latitude (OR 2.57, 95% CI 1.30 to 5.09) and 7) being satisfied with the organization's recognition of employee contributions (OR 2.05, 95% CI 1.07 to 3.91). The final model achieved a rho-square of 0.25. 
All sites combined (SJHS) | The most important predictors of job satisfaction after adjusting for site were: 1) belief the organization carries out its Mission statement (OR 2.79, 95% CI 2.07 to 3.77), 2) good communication (OR 1.87, 95% CI 1.33 to 2.62), 3) less frequently being asked to do an excessive amount of work (OR 1.80, 95% CI 1.33 to 2.43), 4) good decision latitude (OR 3.28, 95% CI 2.09 to 5.17), 5) being satisfied with pay level (OR 1.61, 95% CI 1.21 to 2.15), 6) being satisfied with the organization's recognition of employee contributions (OR 1.57, 95% CI 1.07 to 2.29), 7) being female (OR 2.83, 95% CI 1.81 to 4.42), 8) good role clarity (OR 1.73, 95% CI 1.17 to 2.56), 9) being satisfied that the organization keeps employees informed (OR 1.35, 95% CI 1.00 to 1.85), 10) good teamwork (OR 1.45, 95% CI 1.01 to 2.09), 11) being given enough time to get the job done (OR 1.57, 95% CI 1.10 to 2.23), and 12) good organization/staff relations (OR 2.02, 95% CI 1.13 to 3.62). The final model achieved a rho-square of 0.26. Discussion : The results of this survey were intended to assist decision-makers in identifying key workplace issues, as perceived by employees, in order to develop strategies to address and improve the quality of working conditions for staff within each of the individual health care organizations and the SJHS as a whole. This research represents the first step of an ongoing process to ensure better QWL for employees. In addition to the findings presented here, information from the survey's open-ended written comments has also been summarized for each of the six organizations (L Lohfeld, K Brazil, P Krueger, G Edward, D Lewis, E Tjam, personal communication, 2001) and the SJHS as a whole (St. Joseph's Health System Quality of Work Life Technical Reports 2000). This open-ended information complements the findings provided in this report.
Together, these findings are currently being used by decision-makers at each of the organizations, and the SJHS, in an effort to improve employee QWL. It should be noted that at the time of this survey, all of the hospitals included in this study (as well as all other hospitals within the Province of Ontario) were operating in an environment of restructuring and change. This was a time of anxiety for many health care professionals, hospital staff and the general public. In 1996, the Ontario government created a Health Services Restructuring Commission (HSRC) with a four year mandate to restructure Ontario's hospitals and health services system. The HSRC was given authority under the Public Hospitals Act and The Ministry of Health Act to direct public hospitals to change their roles, transfer services and programs, amalgamate or close. The HSRC completed its mandate, announced its decisions and was terminated in March 2000. This study took place after the decisions of the HSRC were announced. All of the organizations included in this study were affected to varying degrees, either directly or indirectly, by the HSRC decisions. The most notable impacts occurred at Site 1 and Site 2. Site 1 (a community hospital) was ordered closed effective March 2001 with programs and services to be transferred to the other local community hospital, while Site 2 (a community hospital/long-term care facility) was ordered to transfer its acute care services to the other local hospital in its community thereby becoming a long-