MEDICAL EDUCATION
From the Division of Functional Biology (Dr Cope) and the Division of Clinical Sciences (Drs Baker, Foster, and Boisvert) at the West Virginia School of Osteopathic Medicine in Lewisburg.
Address correspondence to Michael K. Cope, PhD, West Virginia School of Osteopathic Medicine, 400 N Lee St, Lewisburg, WV 24901-1128. E-mail: hbaker{at}wvsom.edu
At the West Virginia School of Osteopathic Medicine (WVSOM) in Lewisburg,
an electronic rating form was created to assist preceptors in evaluating
student performance for third- and fourth-year clinical rotations. Multiple
preceptors, often in rural locations, rated the clinical performance of 70
students in the WVSOM graduating class of 2005. The current study analyzes
these ratings. Using Cronbach α, subscore reliability coefficients were
obtained for all rotations: clinical knowledge, 0.80; data collection, 0.59;
application of knowledge, 0.65; and professionalism, 0.78. For the three
required family medicine rotations, which were almost always supervised by
osteopathic physicians, reliability of the rating for osteopathic principles
and practice (OPP) was 0.44. Intercorrelations among these five subscores
ranged from 0.46 to 0.94, all statistically significant at the .01 level.
Ratings for the five subscores were compared with 19 measures of student
performance in other parts of the WVSOM curriculum; many correlations were
significant at the .01 level. Clinical knowledge correlated 0.59 with year 2
grade point average (GPA), 0.51 with years 1 and 2 OPP GPA, 0.50 with
Comprehensive Osteopathic Medical Licensing Examination USA Level 2 Cognitive
Evaluation, and 0.45 with years 1 and 2 physical diagnosis GPA. Application of
knowledge correlated 0.55 with year 2 GPA and 0.50 with the organization score
on the college's year 3 objective structured clinical evaluation.
Professionalism correlated 0.51 with year 2 GPA and 0.49 with OPP years 1 and
2 written examination score. The average preceptor rating using the new
electronic form was 92.6, compared with 96.8 when the previous paper-based
form was used for the WVSOM class of 1998 (change significant at .05 level).
These moderate correlations provide some support for the validity of the
Clinical Education Grade Form.
The present study describes a new electronic form, the Clinical Education Grade Form, used by clinical preceptors to grade osteopathic medical student performance on third- and fourth-year rotations at the West Virginia School of Osteopathic Medicine (WVSOM) in Lewisburg. This study also describes procedures used to evaluate the extent to which the Clinical Education Grade Form decreased grade inflation, and the extent to which preceptor ratings were correlated with other measures of academic ability.
The use of evaluation forms and systems in clinical clerkships has a long history that has run parallel with concerns over the validity of those forms and systems. Articles evaluating medical student clerkship performance generally fall into three categories: surveys and suggestions; evidence of validity, reliability, and generalizability; and correlation with other measures of medical student performance.
Surveys and Suggestions
In 1987, Tonesk and
Buchanan1 conducted
a pilot study of 10 medical schools that described common problems with
clerkship evaluation systems. Some problems cited were the evaluators'
unwillingness to record negative evaluations, inadequately defined evaluation
criteria, and lack of training for evaluators. In 1990, Magarian and
Mazur2 conducted a
survey of 101 medical schools on the type of clerkship evaluation system used
to assess student performance. The survey found that 68% of schools used a
pass-fail grading system, 28% used letter grades, and 4% reported numeric
scores. In 1992,
Hunt3 described a
model evaluation system containing four stages and provided a list of
"signs and symptoms" one could consult to detect an evaluation
system with problems. Other
articles4-7
express a general desire for a valid and reliable clerkship evaluation system
that provides adequate and appropriate evaluation and feedback concerning the
developing clinical skills of medical students.
Evidence of Validity, Reliability, and Generalizability
Many articles have also addressed the validity, reliability, and
generalizability of data gathered from existing clerkship evaluation
systems.7-12
While some
studies7,8
have supported the validity of rating forms used in these evaluation systems,
many researchers argue that the topic of form reliability has not been
adequately addressed in the literature. For example, Levine and
colleagues9
determined that one way to increase form reliability would be to have multiple
preceptors rate each student. Reznick and
coauthors10 used
interrater agreement coefficients to look at relative and absolute
reliabilities and determined that only five raters per student were required
to obtain the commonly recommended reliability of 0.80. Carline and
coinvestigators11
determined that a minimum of seven raters per student were required to obtain
a reliability of 0.80. One
study12 noted that
"raters usually have had limited opportunities to observe a student's
skills before completing the evaluation, affecting the reliability of the
assessment."
Researchers12
further posited that three raters were required to obtain rating
reliability.
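The relationship between the number of raters and the reliability of their averaged rating can be illustrated with the Spearman-Brown prophecy formula. This is only a sketch under stated assumptions: the studies cited above used their own methods (interrater agreement coefficients and generalizability analysis), the function names are hypothetical, and the single-rater reliabilities are invented inputs rather than values from any cited study.

```python
def spearman_brown(single_rater_r, k):
    """Projected reliability of the mean of k raters (Spearman-Brown formula)."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


def raters_needed(single_rater_r, target=0.80, max_raters=1000):
    """Smallest number of raters whose averaged rating reaches the target reliability."""
    if not (0 < single_rater_r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    for k in range(1, max_raters + 1):
        if spearman_brown(single_rater_r, k) >= target:
            return k
    raise ValueError("target not reached within max_raters")


# Invented single-rater reliabilities, for illustration only.
for r in (0.30, 0.45, 0.60):
    print(r, raters_needed(r))
```

Under this model, the lower the agreement of a single rater with the others, the more raters are needed to reach 0.80, which is one way to read the differing rater counts reported across studies.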
Correlations With Other Performance Measures
In the process of demonstrating the validity of clerkship evaluation forms,
many
authors13-17
have shown correlations between these evaluation scores. Correlations with
other academic measures or correlations with measures from outside agencies
such as the National Board of Osteopathic Medical Examiners (NBOME) or the
National Board of Medical Examiners (NBME) have also been
demonstrated.13-17
Lazaro and
colleagues13
reported the relative lack of correlation between clinical performance and
written examinations whether they were generated internally or by national
agencies. Metheny14
also found a weak relationship between clinical evaluations and NBME Subject
Examination scores. Campos-Outcalt and
colleagues15
compared ratings of clinical performance and problem-solving skills with
preadmission measures (ie, undergraduate GPA and Medical College Admission
Test [MCAT] score) and also with NBME Parts I and II subtests and total score
values. Their study found no correlation between clinical evaluations and
problem-solving skills and any preadmission measures. In addition, their study
found mild correlations between clinical evaluations and NBME Parts I and II
scores. They found no correlation between problem-solving skills and NBME Part
I, and only a moderate correlation between problem-solving skills and NBME
Part II. Although Silver and
Hodgson16 reported
no correlation between admissions measures (ie, undergraduate GPA and MCAT
score) and clinical performance, they found a good correlation between
admissions measures and NBME Part I performance. Callahan and
coauthors17
reported a high (0.01) correlation between family medicine, internal medicine,
obstetrics and gynecology, pediatrics, and surgery clerkship evaluations and
United States Medical Licensing Examination (USMLE) Step 2 and a high (0.01)
correlation between family medicine, internal medicine, and pediatrics
clerkship evaluations and USMLE Step 3. Ferguson and
Kreiter18 reported
a high correlation between preclinical and clinical training measures such as
objective structured clinical evaluations (OSCEs), case-based reports,
multiple-choice clinical skills examinations, and clinical preceptor
evaluation forms. They also noted that basic science grades, undergraduate
GPA, and MCAT scores did not correlate with clinical preceptor evaluation
forms.
To determine form validity, we looked at clerkship evaluation systems and forms and their correlation to institutional and national measures of both preclinical and clinical performance. The Clinical Education Grade Form was created in Microsoft Word 98 for Windows (Microsoft Corporation, Redmond, Wash). It was studied previously19 and researchers found that student grades were distributed more accurately (ie, more even distribution and lower mean) using the Clinical Education Grade Form than WVSOM's previous paper-based form.
Preceptor Evaluations
Data analysis was used to evaluate the extent to which the Clinical Education Grade Form decreased grade inflation and correlated with other measures of academic success, including OSCE performance. Other than those found in the OSCE and COMLEX-USA examinations, the current investigation does not include analysis of written tests or case studies required during phase 3 for the class of 2005. This omission is due in part to changes in process at WVSOM with regard to how those components are assigned.
Methods
Clinical Rotations
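Preceptors are asked to evaluate student performance on clinical rotations.
Five performance factors are addressed
(Figure 1): clinical knowledge (subscore 1), data collection (subscore 2),
application of knowledge (subscore 3), osteopathic principles and practice
(OPP; subscore 4), and professionalism (subscore 5).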
Possible ratings were "failure," "needs improvement" (61-68), "adequate" (73-77), "good" (81-89), "excellent" (92-96), "truly exceptional" (100), or "not observed." A brief description of the behaviors associated with each rating is included in the Clinical Education Grade Form. All ratings were converted to numeric scores using the values listed. The subscore values for all core clerkships, selective clerkships, and elective clerkships were averaged for use in analysis. Figure 2 lists the various clinical rotations available at WVSOM and describes their respective durations at the time of the study.
For a few rotations (predominately electives), students had the option of spending 50% of the rotation with one preceptor and the other 50% with a different preceptor (9.1% of total rotations, 2.5% of required and selective rotations, and 36% of elective rotations). In these instances, the subscore values assigned by each preceptor were averaged to determine the rotation grade.
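The averaging described above can be sketched as follows. This is a minimal illustration, not the procedure actually used at WVSOM: the function names are hypothetical, the scores are invented, and skipping "not observed" ratings when averaging is an assumption, because the text does not say how such ratings were handled.

```python
from statistics import mean

NOT_OBSERVED = None  # hypothetical marker for a "not observed" rating


def rotation_subscore(preceptor_scores):
    """Average one subscore across the preceptors of a single rotation.

    A rotation split between two preceptors contributes two scores;
    an ordinary rotation contributes one.
    """
    observed = [s for s in preceptor_scores if s is not NOT_OBSERVED]
    return mean(observed) if observed else NOT_OBSERVED


def student_subscore(rotations):
    """Average one subscore across all of a student's rotations."""
    per_rotation = [rotation_subscore(r) for r in rotations]
    observed = [s for s in per_rotation if s is not NOT_OBSERVED]
    return mean(observed) if observed else NOT_OBSERVED


# Invented example: the second rotation was split between two preceptors,
# and the third preceptor marked the item "not observed".
rotations = [[92], [85, 89], [NOT_OBSERVED]]
print(student_subscore(rotations))
```

Averaging within a split rotation first keeps each rotation's weight equal, so a rotation with two preceptors does not count twice in the student's overall subscore.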
Student Performance on COMLEX-USA
The National Board of Osteopathic Medical Examiners administers a series of
licensing examinations designed to evaluate student competency at different
stages in the educational process. At the end of year 2, students take
COMLEX-USA Level 1; during year 4, students take COMLEX-USA Level 2-CE. In the
spring of 2005, COMLEX-USA Level 2-PE was introduced. All members of WVSOM's
class of 2005 were required to pass COMLEX-USA Level 1 and COMLEX-USA Level
2-CE in order to graduate. All members of WVSOM's class of 2005 were required
to participate in, but not necessarily pass, COMLEX-USA Level 2-PE. Numeric
scores were available for both the Level 1 and Level 2-CE examinations, and
pass-fail status was established for the Level 2-PE examination. The scores
for the COMLEX-USA Level 2-PE examination were coded as "1" for
pass and "0" for fail.
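Coding pass-fail status as 1 or 0 allows it to be correlated with numeric ratings using the ordinary Pearson coefficient, which in this case is equivalent to a point-biserial correlation. The sketch below is illustrative only: the helper names are hypothetical, the data are invented, and the text does not state which software or routine was used for the study's analysis.

```python
from math import sqrt


def code_pass_fail(status):
    """Code a pass-fail result as 1 for pass and 0 for fail."""
    return 1 if status == "pass" else 0


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Invented data: one subscore per student and a pass-fail examination result.
ratings = [90, 92, 85, 95]
results = ["pass", "pass", "fail", "pass"]
coded = [code_pass_fail(s) for s in results]
print(round(pearson_r(ratings, coded), 2))
```

If every student passes, the coded variable is constant and the correlation is undefined, so the function raises an error rather than returning a misleading value.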
School-Based Performance Measures
Results
Correlations with selected academic performance measures are shown in the Table.
Reliability of Clinical Education Grade Form Subscores
The reliability of the subscores from the Clinical Education Grade Form was
varied. Subscore 1 had a reliability of 0.80; subscore 2, 0.59; subscore 3,
0.65; and subscore 5, 0.78.
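For readers unfamiliar with the reliability coefficient reported here, Cronbach α can be computed as sketched below. The sketch makes assumptions the text does not confirm: each rotation's rating of a given subscore is treated as one "item," every student has a rating for every item, and sample variances are used. The function name and the data are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a complete students-by-items table of scores.

    rows: one list per student, each holding that student's k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 students and 2 items, with no missing scores")
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - item_variances / total_variance)


# Invented ratings: four students, the same subscore rated on three rotations.
ratings = [
    [80, 85, 90],
    [70, 72, 75],
    [90, 95, 93],
    [60, 66, 70],
]
print(round(cronbach_alpha(ratings), 2))
```

The requirement for complete data matters in practice: a subscore that many preceptors leave blank cannot be analyzed this way without first restricting the dataset to rotations where it was rated.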
For subscore 4, there were not enough data to determine reliability because all WVSOM students in this graduating class took at least one rotation with a preceptor (sometimes an allopathic physician [MD]) who did not evaluate students for OPP. The clerkships that had the largest proportion of osteopathic physicians (DOs) serving as preceptors, namely the three family medicine clerkships, were isolated from the rest of the dataset and used to determine the reliability of the OPP subscore. All other subscores for these clerkships were also reevaluated for reliability based on this reduced dataset.
The reliability ratings for the family medicine clerkship subscores were as follows: subscore 1, 0.52; subscore 2, 0.53; subscore 3, 0.51; subscore 4, 0.44; and subscore 5, 0.34.
All subscores were correlated, ranging from a low correlation of 0.46 to a high correlation of 0.94. The highest correlation was between subscore 1, clinical knowledge, and subscore 3, application of knowledge.
Correlations With COMLEX-USA Results
The correlations of subscores 2 through 5 with COMLEX-USA Level 1 were not
statistically significant. Subscore 1 was significantly correlated with
student performance on COMLEX-USA Level 1 (P=.05). The correlation of
subscores 1, 2, 3, and 5 with student performance on COMLEX-USA Level 2-CE was
statistically significant (P=.01). Subscore 4 was not correlated with
student performance on COMLEX-USA Level 2-CE. Subscores 1 through 4 were not
correlated with student performance on COMLEX-USA Level 2-PE. However,
subscore 5 did have a statistically significant correlation with student
performance on COMLEX-USA Level 2-PE (P=.05).
Correlations With Academic Measures
Subscore 1, subscore 3, and subscore 5 were significantly correlated with
all academic measures (GPAs for both preclinical phases and years, as well as
for physical diagnosis and OPP courses) (P=.01). Subscore 2 was
significantly correlated both with phase 2 and year 2 GPAs (P=.01) as
well as with GPAs from physical diagnosis and OPP courses (P=.05).
Subscore 2 had no significant correlation with either phase 1 GPA or year 1
GPA. Subscore 4 was correlated with GPAs for both preclinical phases and years
as well as OPP courses (P=.01). Subscore 4 was significantly
correlated with GPA for physical diagnosis courses (P=.05).
The correlations for WVSOM's OSCE were as follows: subscore 1 was
significantly correlated with total score, history, SOAP Note Form,
organization, and professionalism (P=.01), as well as with physical
examination (P=.05). Subscore 1 was not correlated with
communication. Subscore 2 was significantly correlated with organization
(P=.01), total score, and SOAP Note Form (P=.05). Subscore 2
was not correlated with history, physical examination, communication, or
professionalism. Subscore 3 was significantly correlated with total score,
history, SOAP Note Form, organization, and professionalism (P=.01),
as well as with physical examination (P=.05). Subscore 3 was not
correlated with communication. Subscore 4 was significantly correlated with
total score and physical examination (P=.01). Subscore 4 was not
correlated with physical examination, SOAP Note Form, communication,
organization, or professionalism. Subscore 5 was significantly correlated with
total score, organization, and professionalism (P=.01), as well as
with physical examination (P=.05). Subscore 5 was not correlated with
communication.
Overall Grade Distribution
As determined by the weighted averages of all clinical clerkship grades in
the last 2 years of training, the average WVSOM student GPA was 92.6 as
compared to the average GPA for the first 2 years of training, 87.3. The
distribution of grades for the final 2 years of training was A, 75.9%; B,
21.2%; C, 2.9%; and F, 2.9%, as compared to the distribution of grades for the
first 2 years of training: A, 27.1%; B, 70%; C, 2.9%; and F, 2.9%.
The electronic Clinical Education Grade Form was developed to help preceptors conduct better evaluations of student performance in clinical rotations. The Clinical Education Grade Form had an average preceptor rating of 92.6, somewhat lower than the average rating of 96.8 for the class of 1998 (P=.05), which used the old paper-based form that was discontinued in 2002. While we would like to believe that students excelled on their clinical rotations, we believe it is more likely that preceptor ratings remain inflated with the Clinical Education Grade Form. We will continue to expand faculty development activities to better define what WVSOM expects of students who receive an "A" grade.
With the previous paper-based grading form, the correlation between preceptor ratings and COMLEX-USA Level 2-CE was low (0.16). With the new electronic Clinical Education Grade Form, the correlation between subscore 1 and COMLEX-USA Level 2-CE was 0.50, which seems more appropriate. Correlations with other written and performance examination results seem to suggest the validity of the new instrument. For example, the correlation between subscore 1 and performance on WVSOM's OSCE is 0.43, which supports the validity of both measures.
All subscores were correlated, with the highest correlation (0.94) between subscore 1 and subscore 3. Because these two variables (clinical knowledge and application of knowledge) are logically related, we believe this high correlation is appropriate. We intend to leave both items on the Clinical Education Grade Form because students' final grades are determined by calculating the simple average of the subscores.
Preceptors did not always fully complete the new form. Consequently, students did not receive a rating for all subscores. The subscore most frequently left incomplete by preceptors was subscore 4, OPP. We must grant that some rotations are not always appropriate for the demonstration of the clinical skills included in the OPP subscore. In addition, preceptors were either DOs or MDs. Most MDs did not provide a rating for subscore 4. We assume the omission of this rating was often intentional and that most MDs decided not to rate students on this aspect of their performance due to their relative lack of training in OPP. However, some DOs also did not rate students on this item. Because of incomplete results for subscore 4, we were unable to determine the reliability of the OPP subscore. When the dataset was reduced to rotations with a high proportion of preceptors who were DOs (eg, family medicine clerkships), subscore 4 reliability was 0.44. This reliability is lower than expected—and low enough to reduce the possibility that these ratings would highly correlate with academic variables documented earlier in the medical school curriculum (ie, phase 1 or phase 2). Some faculty development in the application of the OPP subscore has already occurred. We intend to work more closely with preceptors to standardize and build confidence in their abilities to evaluate students for OPP.
The current study has several limitations. First, the data were collected at only one rural, osteopathic medical school, with a graduating class composed of 70 students. The ability of schools with other missions and structures to adopt the new electronic Clinical Education Grade Form may be limited; adaptation may be preferable. Also, the current analysis does not consider the impact of written examinations taken at or near the end of rotations. Similarly, it does not include the impact of case studies on rotation grades. Neither does this investigation address the extent to which preceptor ratings may correlate with such measures. Beginning with WVSOM's class of 2005, changes in end-of-rotation examinations have been implemented, so analysis of their impact would have no value for curriculum planning. However, for the continued improvement of the program, eventually it will be necessary to analyze how much end-of-rotation examinations correlate with preceptor ratings.
As noted, changes to the electronic Clinical Education Grade Form have been ongoing and other changes are underway. First, as initially implemented, the numeric values associated with the grade scale were intentionally removed in the hope that preceptors would focus on the characteristics described (eg, "consistently demonstrates expected, good knowledge") rather than the associated grade value (eg, "85," a mid-B grade). However, discussions with students at graduation exit interviews revealed that some students wrote these numbers on the forms in the hope of receiving a better grade. Student-altered forms are a possible source of error variation (ie, variability not related to the skills and abilities we wish to measure). Therefore, to prevent some students from having an unfair advantage, we placed numeric grade values on the electronic evaluation form.
Second, the Clinical Education Grade Form was developed before an emphasis on "core competencies"19,20 was implemented at the predoctoral level for colleges of osteopathic medicine, and before WVSOM's faculty expanded the application of contemporary models for teaching and evaluating communication skills.21,22 The new electronic evaluation form incorporates taking a patient's medical history as one of the examples for subscore 2, and the item "worked well with other healthcare team members" as an example for subscore 5. The Clinical Education Grade Form is also being revised to make interpersonal communication a separate subscore: interpersonal communication and professionalism. Accordingly, some descriptors will be modified to emphasize communication skills. Other ways of measuring the core competencies (eg, basic knowledge of OPP and osteopathic manipulative treatment) will also be pursued.20
Finally, it is clear that WVSOM must continue to expand faculty-development activities for clinical preceptors, specifically with regard to faculty expectations for WVSOM students. At its midwinter continuing medical education conferences, WVSOM has offered special sessions regarding contemporary practices in physical examination, communication skills, writing physician progress notes, and osteopathic manipulative medicine. In February 2005, session participants were asked to take part in a three-station OSCE. In February 2006, they were asked to critique three aspects of student performance: a video of osteopathic manipulative treatment, a role play of communication skills, and sample written progress notes. A major objective of these programs was to help preceptors become more confident with WVSOM's current expectations for students' clinical performance. Similar programs are being planned, as well as additional training for preceptors that specifically focuses on the new electronic evaluation form.
Conclusion
Submitted April 3, 2006; revision received July 18, 2006; accepted August 2, 2006.
References
2. Magarian GJ, Mazur DJ. A national survey of grading systems used in medicine clerkships. Acad Med. 1990;65:636-639.
3. Hunt DD. Functional and dysfunctional characteristics of the prevailing model of clinical evaluation systems in North American medical schools. Acad Med. 1992;67:254-259.
4. Ravelli C, Wolfson P. What is the "ideal" grading system for the junior surgery clerkship? Am J Surg. 1999;177:140-144.
5. Turnbull J, MacFadyen J, Van Barneveld C, Norman G. Clinical work sampling: a new approach to the problem of in-training evaluation. J Gen Intern Med. 2000;15:556-561. Available at: http://www.blackwell-synergy.com/doi/full/10.1046/j.1525-1497.2000.06099.x. Accessed February 23, 2007.
6. Ogburn T, Espey E. The R-I-M-E method for evaluation of medical students on an obstetrics and gynecology clerkship. Am J Obstet Gynecol. 2003;189:666-669.
7. Buckwalter JA, Schumacher R, Albright JP, Cooper RR. The validity of orthopaedic in-training examination scores. J Bone Joint Surg Am. 1981;63:1001-1006.
8. Levine HG, McGuire CH. Rating habitual performance in graduate medical education. J Med Educ. 1971;46:306-311.
9. Levine HG, Yunker R, Bee D. Pediatric resident performance. The reliability and validity of rating forms. Eval Health Prof. 1986;9:62-74.
10. Reznick RK, Colliver JA, Williams RG, Folse JR. Reliability of different grading systems used in evaluating surgical students. Am J Surg. 1989;157:346-349.
11. Carline JD, Paauw DS, Thiede KW, Ramsey PG. Factors affecting the reliability of ratings of students' clinical skills in a medicine clerkship. J Gen Intern Med. 1992;7:506-510.
12. Kreiter CD, Ferguson K, Lee WC, Brennan RL, Densen P. A generalizability study of a new standardized rating form used to evaluate students' clinical clerkship performances. Acad Med. 1998;73:1294-1298.
13. Lazaro EJ, Hobson RW 2nd, Kerr JC, Spillert CR, Casey KF. A critical analysis of clerkship grading procedures. J Natl Med Assoc. 1983;75:1083-1086.
14. Metheny WP. Limitations of physician ratings in the assessment of student clinical performance in an obstetrics and gynecology clerkship. Obstet Gynecol. 1991;78:136-141.
15. Campos-Outcalt D, Witzke DB, Fulginiti JV. Correlations of family medicine clerkship evaluations with scores on standard measures of academic achievement. Fam Med. 1994;26:85-88.
16. Silver B, Hodgson CS. Evaluating GPAs and MCAT scores as predictors of NBME I and clerkship performances based on students' data from one undergraduate institution. Acad Med. 1997;72:394-396.
17. Callahan CA, Erdmann JB, Hojat M, Veloski JJ, Rattner S, Nasca TJ, et al. Validity of faculty ratings of students' clinical competence in core clerkships in relation to scores on licensing examinations and supervisors' ratings in residency. Acad Med. 2000;75(10 suppl):S71-S73.
18. Ferguson KJ, Kreiter CD. Using a longitudinal database to assess the validity of preceptors' ratings of clerkship performance. Adv Health Sci Educ Theory Pract. 2004;9:39-46.
19. Cope MK, Baker HH, Boisvert CS, Foster RW. Development of a numerical grading system for a family practice rotation [abstract]. J Am Osteopath Assoc. 2003;103:393. C40.
20. Accreditation of colleges of osteopathic medicine: COM accreditation standards and procedures page. DO-Online.org Web site. Available at: http://www.do-online.org/pdf/acc_predoccom2007.pdf. Accessed March 6, 2007.
21. Makoul G. Essential elements of communication in medical encounters: the Kalamazoo consensus statement [review]. Acad Med. 2001;76:390-393.
22. Buyck D, Lang F. Teaching medical communication skills: a call for greater uniformity. Fam Med. 2002;34:337-343.