<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="other" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <front>
    <journal-meta>
      <journal-id journal-id-type="pmc">JRMS</journal-id>
      <journal-id journal-id-type="pubmed">J Res Med Sci</journal-id>
      <journal-id journal-id-type="publisher-id">Journal of Research in Medical Sciences</journal-id>
      <journal-title>Journal of Research in Medical Sciences</journal-title>
      <issn pub-type="ppub">1735-1995</issn>
	<issn pub-type="epub">1735-7136</issn>
      <publisher>
        <publisher-name>Medknow Publications Pvt Ltd</publisher-name>
	<publisher-loc>India</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">JRMS-18-887</article-id>
      <article-id pub-id-type="pmid">24497861</article-id>
      <article-categories>
	<subj-group subj-group-type="headings">
		<subject>Educational Forum</subject>
	</subj-group>
      </article-categories>
      <title-group>
        <article-title>Assessing the reliability of the borderline regression method as a standard setting procedure for objective structured clinical examination</article-title>
      </title-group>
	<contrib-group>
<contrib contrib-type="author">
<name><surname>Hejri</surname>
<given-names>Sara M</given-names></name>
<xref ref-type="aff" rid="aff1"/></contrib>
<contrib contrib-type="author">
<name><surname>Jalili</surname>
<given-names>Mohammad</given-names></name>
<xref ref-type="aff" rid="aff2"/><xref ref-type="corresp" rid="cor1"/></contrib>
<contrib contrib-type="author">
<name><surname>Muijtjens</surname>
<given-names>Arno M</given-names></name>
<xref ref-type="aff" rid="aff3"/></contrib>
<contrib contrib-type="author">
<name><surname>Van Der Vleuten</surname>
<given-names>Cees P</given-names></name>
<xref ref-type="aff" rid="aff4"/></contrib>
</contrib-group>
<aff id="aff1">Department of Medical Education; School of Medicine, Tehran University of Medical Sciences, Tehran, Iran</aff><aff id="aff2">Department of Emergency Medicine, Department of Medical Education; School of Medicine, Tehran University of Medical Sciences, Tehran, Iran</aff><aff id="aff3">Department of Medical Education and Research, Faculty of Health, Medicine and Life Sciences, Maastricht University; Maastricht, The Netherlands</aff><aff id="aff4">Department of Educational Development and Research, School of Health Professions Education, Maastricht University; Maastricht, The Netherlands</aff>

      <author-notes>
	<corresp id="cor1"><bold>Address for correspondence:</bold> Mohammad Jalili, 7<sup>th</sup> floor, Tehran University of Medical Sciences, Ghods Street, Keshavarz Blvd, Tehran, Iran <email xlink:href="mjalili@tums.ac.ir">mjalili@tums.ac.ir</email></corresp>

      </author-notes>
      <pub-date pub-type="ppub">
        <season>October</season>
        <year>2013</year>
      </pub-date>
      <volume>18</volume>
      <issue>10</issue>
      <fpage>887</fpage>
      <lpage>891</lpage>   
      
<history>
<date date-type="received"><day>12</day><month>1</month><year>2013</year></date>

<date date-type="rev-recd"><day>28</day><month>1</month><year>2013</year></date>
</history>

      <permissions>
        <copyright-statement>Copyright: &#x000a9; Journal of Research in Medical Sciences</copyright-statement>
        <copyright-year>2013</copyright-year>
        <license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by-nc-sa/3.0"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
</license>
      </permissions>
      <abstract><sec id="st1"><title>Background:</title><p> One of the methods used for standard setting is the borderline regression method (BRM). This study aimed to assess the reliability of the BRM when the pass-fail standard in an objective structured clinical examination (OSCE) is calculated by averaging the BRM standards obtained for each station separately.</p>
</sec>
<sec id="st2"><title>Materials and Methods:</title><p> In nine stations of the OSCE with direct observation, the examiners gave each student a checklist score and a global score. Using a linear regression model for each station, we calculated the checklist score cut-off on the regression equation for the global scale cut-off set at 2. The OSCE pass-fail standard was defined as the average of the station standards. To determine the reliability, the root mean square error (RMSE) was calculated. The R<sup>2</sup> coefficient and the inter-grade discrimination were calculated to assess the quality of the OSCE.</p>
</sec>
<sec id="st3"><title>Results:</title><p> The mean total test score was 60.78. The OSCE pass-fail standard and its RMSE were 47.37 and 0.55, respectively. The R<sup>2</sup> coefficients ranged from 0.44 to 0.79. The inter-grade discrimination score varied greatly among stations.</p>
</sec>
<sec id="st4"><title>Conclusion:</title><p> The RMSE of the standard was very small, indicating that the BRM is a reliable method of setting standards for an OSCE, with the added advantage of providing data for quality assurance.</p>
</sec>
</abstract>
      <kwd-group><kwd>Borderline regression method</kwd>
<kwd>objective structured clinical examination</kwd>
<kwd>reliability</kwd>
<kwd>standard setting</kwd>
</kwd-group>	
      
    </article-meta>
  </front>
  <body>
	<sec><title/>
</sec><sec><title>Introduction</title><p>The pass-fail standard is a cut-score on a test that indicates the minimal adequate level of competence and identifies the students who performed satisfactorily. Although standards may be set through arbitrary decisions, standard setting is properly a judgmental process that yields pass-fail standards in a systematic, reproducible, and defensible manner. <sup><xref ref-type="bibr" rid="ref1">1</xref></sup>,<sup><xref ref-type="bibr" rid="ref2">2</xref></sup>,<sup><xref ref-type="bibr" rid="ref3">3</xref></sup> Many studies on standard setting methods have been conducted in the area of written assessments. More recent studies, however, have focused on setting cut-scores for performance tests such as objective structured clinical examinations (OSCEs). <sup><xref ref-type="bibr" rid="ref4">4</xref></sup>,<sup><xref ref-type="bibr" rid="ref5">5</xref></sup>,<sup><xref ref-type="bibr" rid="ref6">6</xref></sup>,<sup><xref ref-type="bibr" rid="ref7">7</xref></sup>,<sup><xref ref-type="bibr" rid="ref8">8</xref></sup>,<sup><xref ref-type="bibr" rid="ref9">9</xref></sup>,<sup><xref ref-type="bibr" rid="ref10">10</xref></sup>,<sup><xref ref-type="bibr" rid="ref11">11</xref></sup></p>

<p> Standard setting procedures can be categorized as either exam-centered, in which the content of the test is reviewed by expert judges (e.g., the Angoff method), or examinee-centered, in which expert decisions are based on the actual performance of the examinees. <sup><xref ref-type="bibr" rid="ref2">2</xref></sup>,<sup><xref ref-type="bibr" rid="ref3">3</xref></sup>,<sup><xref ref-type="bibr" rid="ref12">12</xref></sup>,<sup><xref ref-type="bibr" rid="ref13">13</xref></sup> One of the latter methods is the borderline regression method (BRM). In the BRM, a rater evaluates each student's performance at each station by completing a checklist and a global rating scale. The checklist marks from all examinees at each station are then regressed on the attributed global rating scores, providing a linear equation. The global score representing borderline performance (e.g., 2 on the global performance rating scale) is substituted into the equation to predict the pass-fail cut-score for the checklist marks. <sup><xref ref-type="bibr" rid="ref5">5</xref></sup></p>

<p> There are several advantages to this method: It is based on the actual performance of all examinees, it uses the judgments of expert examiners, and it is not time consuming. <sup><xref ref-type="bibr" rid="ref5">5</xref></sup>,<sup><xref ref-type="bibr" rid="ref8">8</xref></sup>,<sup><xref ref-type="bibr" rid="ref14">14</xref></sup> Yet another important advantage of the BRM is that it can be used to generate metrics for evaluating the quality of an OSCE. These include the R<sup>2</sup> coefficient, the adjusted value of R<sup>2</sup>, and the inter-grade discrimination. <sup><xref ref-type="bibr" rid="ref15">15</xref></sup></p>

<p> Considering the above-mentioned advantages of the BRM, it is important to establish whether it is a reliable procedure for standard setting. Earlier studies have calculated the precision for a single application of the BRM (average checklist score vs. average global score). <sup><xref ref-type="bibr" rid="ref6">6</xref></sup>,<sup><xref ref-type="bibr" rid="ref10">10</xref></sup> The aim of this study is to assess the reliability of the BRM as a standard setting method for a pre-internship OSCE, where the overall OSCE pass-fail standard was calculated by averaging the BRM standards obtained for each station separately.</p>


</sec><sec><title>Methods</title><p>In this study, a 14-station OSCE was administered to 105 medical students prior to the internship phase at Tehran University of Medical Sciences in 2010. The fourteen 4-min stations represented different domains of clinical skills relevant to clerkship experience. Five stations that used written questions were excluded from the analysis. In the remainder of the paper, we use the term OSCE to indicate the nine-station performance-based subtest. In the nine stations with patient encounters, the examiners directly observed each student's performance and gave two scores: The checklist score (percentage correct, 0-100) and the global rating score (1: Fail, 2: Borderline, 3: Sufficient, 4: Good, and 5: Excellent). The raters were instructed to give the global score based on their overall impression of the candidate's performance and not to convert the checklist score into a global rating. To further discourage such conversion, the raters were asked not to sum up the candidate's checklist scores at the station. The total test score was calculated by averaging the station checklist scores. The global rating was used only for standard setting purposes.</p>

<p>The BRM was applied to establish a standard. For each station, we used a linear regression model in which the students' checklist scores and global scores were the dependent and independent variables, respectively. We then calculated the checklist score cut-off from the regression equation for the global scale cut-off set at 2. The corresponding pass-fail standard for the OSCE (PFS<sub>OSCE</sub>) was defined as the average of the nine station cut-scores. The percentage of students whose total test score reached this standard is reported as the pass rate.</p>
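
<p>The procedure in the preceding paragraph can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names (<monospace>fit_line</monospace>, <monospace>station_cut_score</monospace>, <monospace>osce_standard</monospace>) and the toy data are hypothetical, and ordinary least squares is assumed as the fitting method.</p>
<preformat>
```python
from statistics import mean


def fit_line(global_scores, checklist_scores):
    """Least-squares fit of checklist score (dependent) on global score (independent)."""
    g_mean = mean(global_scores)
    c_mean = mean(checklist_scores)
    sxx = sum((g - g_mean) ** 2 for g in global_scores)
    sxy = sum((g - g_mean) * (c - c_mean)
              for g, c in zip(global_scores, checklist_scores))
    slope = sxy / sxx
    intercept = c_mean - slope * g_mean
    return slope, intercept


def station_cut_score(global_scores, checklist_scores, g0=2.0):
    """Checklist score predicted by the station's regression line at global score g0."""
    slope, intercept = fit_line(global_scores, checklist_scores)
    return intercept + slope * g0


def osce_standard(stations, g0=2.0):
    """Average of the station cut-scores (PFS_OSCE in the text)."""
    return mean(station_cut_score(g, c, g0) for g, c in stations)


# Hypothetical toy data: (global scores, checklist scores) for two stations.
toy_stations = [
    ([1, 2, 3, 4, 5], [20, 40, 60, 80, 100]),
    ([1, 2, 3, 4, 5], [35, 45, 55, 65, 75]),
]
print(osce_standard(toy_stations))
```
</preformat>
<p>In this sketch each station contributes one cut-score and the OSCE standard is their unweighted mean, mirroring the averaging described above; weighting stations differently would be a separate design choice.</p>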

<p>To assess the quality of OSCE, the following metrics were calculated for each station: The R<sup>2</sup> coefficient (the squared linear correlation between the checklist score and the global rating score), and the inter-grade discrimination (the slope of the regression line).</p>
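
<p>As a hedged illustration of these two station metrics, the following Python sketch computes R<sup>2</sup> as the squared Pearson correlation and the inter-grade discrimination as the least-squares slope. The function name and the toy scores are hypothetical and are not taken from the study data.</p>
<preformat>
```python
from statistics import mean


def r_squared_and_slope(global_scores, checklist_scores):
    """Return (R^2, slope) for checklist score regressed on global score."""
    g_mean = mean(global_scores)
    c_mean = mean(checklist_scores)
    sxx = sum((g - g_mean) ** 2 for g in global_scores)
    syy = sum((c - c_mean) ** 2 for c in checklist_scores)
    sxy = sum((g - g_mean) * (c - c_mean)
              for g, c in zip(global_scores, checklist_scores))
    r_squared = sxy ** 2 / (sxx * syy)   # squared linear correlation
    slope = sxy / sxx                    # checklist points per global grade
    return r_squared, slope


# Hypothetical toy scores for one station.
r2, slope = r_squared_and_slope([1, 2, 3, 4, 5], [10, 30, 20, 40, 50])
print(r2, slope)
```
</preformat>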

<p>To determine the reliability of the PFS <sub>OSCE</sub> , the root mean square error (RMSE) of the estimated standard was calculated: The lower the RMSE, the more reliable the standard is. For this purpose, the regression-based method to calculate the precision for a single application of the BRM (OSCE average checklist score vs. OSCE average global score) presented in Muijtjens et al. was extended. <sup><xref ref-type="bibr" rid="ref6">6</xref></sup> The extension provides an estimate of the RMSE for the current situation where the OSCE standard is obtained by averaging the checklist cut-off scores that were obtained by applying BRM for each station separately. <sup><xref ref-type="bibr" rid="ref10">10</xref></sup></p>

<p> Assuming that the errors in the checklist cut-off scores are independent across the M stations of the OSCE, the following holds for the error in the OSCE checklist standard:</p>

<p>[INLINE:1]</p>

<p>where M is the number of stations, n is the number of candidates attending the OSCE, s<sub>regr,i</sub> is the standard error of estimate of the regression (the estimate of the standard deviation (SD) of the residual error in the regression) for the i<sup>th</sup> station, Mean<sub>G,i</sub> and SD<sub>G,i</sub> are the mean and SD of the students' global scores G<sub>i</sub> for the i<sup>th</sup> station, respectively, and G<sub>0</sub> is the cut-off value of the global score, which is identical for all stations.</p>
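
<p>For orientation, one expression that is consistent with the symbols defined above is given here. It is an assumption, not a reproduction of the displayed equation: it combines the textbook standard error of a fitted value in simple linear regression with the variance of a mean of M independent terms, and it takes SD<sub>G,i</sub> to be the sample SD with divisor n - 1.</p>
<preformat>
```latex
\[
\mathrm{RMSE}\left(\mathrm{PFS}_{\mathrm{OSCE}}\right)
  = \frac{1}{M}
    \sqrt{\sum_{i=1}^{M} s_{\mathrm{regr},i}^{2}
      \left[\frac{1}{n}
        + \frac{\left(G_{0} - \mathrm{Mean}_{G,i}\right)^{2}}
               {(n-1)\,\mathrm{SD}_{G,i}^{2}}\right]}
\]
```
</preformat>
<p>If the SD is instead defined with divisor n, the factor (n - 1) in the denominator would be replaced by n.</p>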

<p>For each station separately, say station i, the corresponding RMSE can be obtained from the expression above with two straightforward modifications: Dropping the summation so that only the i<sup>th</sup> term remains, and setting M equal to one.</p>
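
<p>A rough Python sketch of this calculation follows. It assumes that each station's error term is the textbook squared standard error of a fitted regression value, evaluated at the global cut-off, and that station errors are independent; the function names and toy data are hypothetical, and the code is not the authors' implementation.</p>
<preformat>
```python
from math import sqrt
from statistics import mean


def station_squared_error(global_scores, checklist_scores, g0=2.0):
    """Squared standard error of the regression-predicted cut-score at g0."""
    n = len(global_scores)
    g_mean = mean(global_scores)
    c_mean = mean(checklist_scores)
    sxx = sum((g - g_mean) ** 2 for g in global_scores)
    sxy = sum((g - g_mean) * (c - c_mean)
              for g, c in zip(global_scores, checklist_scores))
    slope = sxy / sxx
    intercept = c_mean - slope * g_mean
    # Residual variance with n - 2 degrees of freedom (s_regr squared).
    sse = sum((c - (intercept + slope * g)) ** 2
              for g, c in zip(global_scores, checklist_scores))
    s_regr_sq = sse / (n - 2)
    return s_regr_sq * (1 / n + (g0 - g_mean) ** 2 / sxx)


def rmse_osce_standard(stations, g0=2.0):
    """RMSE of the averaged standard, assuming independent station errors."""
    m = len(stations)
    total = sum(station_squared_error(g, c, g0) for g, c in stations)
    return sqrt(total) / m


# Hypothetical toy data for one station; a single-station list gives that station's RMSE.
toy = ([1, 2, 3, 4, 5], [10, 30, 20, 40, 50])
print(rmse_osce_standard([toy]), rmse_osce_standard([toy, toy]))
```
</preformat>
<p>Passing a list with a single station reproduces the per-station case described above (M equal to one, no summation).</p>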


</sec><sec><title>Results</title><p>For each of the nine stations in the OSCE, <xref ref-type="fig" rid="F1">Figure 1</xref> shows the scatter plot of the checklist score versus the global score for the 105 candidates attending the OSCE. Each circle indicates the result of a candidate. However, it should be noted that the scores of some students may be identical, resulting in coinciding circles in the plot. This is clearly demonstrated in the panel of the splinting station [<xref ref-type="fig" rid="F1">Figure 1</xref>, second row, second column], where the circle at the point (global score = 1, checklist score = 0) represents 65 candidates having the same result. Each panel presents the linear regression of checklist score versus global score (solid line), the pass-fail cut-off value for the global score (equal to two, vertical broken line), and the corresponding BRM pass-fail cut-off value for the checklist score (horizontal broken line). The lower right panel (Total) shows the scatter plot for the mean global score and mean checklist score (total test score), where the mean is taken by averaging a candidate's scores over the nine stations of the OSCE. The broken line indicates the OSCE checklist standard, which is obtained by averaging the BRM cut-off scores of the nine stations in the OSCE.<fig id="F1"><label>Figure 1</label><caption><p>Scatter plots of the checklist score versus the global score for the nine stations in the pre-internship objective structured clinical examination (OSCE) with 105 candidates. Each panel presents the linear regression of checklist score versus global score (solid line), the pass-fail cut-off value for the global score (equal to 2, vertical broken line), and the corresponding pass-fail cut-off value for the checklist score (horizontal broken line) according to the borderline regression method (BRM). 
The lower right panel (total) shows the scatterplot of the mean global and checklist scores over the nine stations for the 105 candidates, the broken line indicating the pass-fail cut-off score for the mean checklist score (total score); the latter cut-off score was obtained by averaging the BRM cut-off scores of the nine stations in the OSCE</p>
</caption><alt-text>Figure 1</alt-text><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JResMedSci_2013_18_10_887_124892_u3.tif"/></fig></p>

<p>Performance of students in the pre-internship OSCE resulted in a mean total test score of 60.78 (SD = 8.04). The pass-fail standard of the OSCE was 47.37. The RMSE of the standard was 0.55, which is very small compared with the SD of the total test score (8.04), indicating that the standard is sufficiently reliable. The percentage of students passing the whole exam was 95.2&#x0025; [see lower right panel of <xref ref-type="fig" rid="F1">Figure 1</xref>]. Descriptive statistics, including the pass-fail standards and the corresponding RMSE and pass rate for each station, are presented in <xref ref-type="table" rid="T1">Table 1</xref>. The mean student checklist scores and standard deviation for each station are also displayed. As shown in the table, the lowest and highest pass rates were obtained in the splinting (19.0&#x0025;) and breast examination (89.5&#x0025;) stations, respectively. The most accurate BRM standard was found for the abdominal examination station (RMSE equal to 0.98), while the least accurate standard was found for the breast examination station (RMSE equal to 2.27).{Table 1}</p>

<p>The degree of linear correlation (R<sup>2</sup>) between the checklist score and the overall global rating ranged from 0.44 to 0.79, with the highest value pertaining to the abdominal examination station, and fell below the threshold of 0.5 in only one station (breast examination). The slope of the regression line varied greatly among stations. In the splinting station, for instance, a one-point increment in the global rating score corresponded to an increase of more than 25 points in the checklist score [<xref ref-type="table" rid="T1">Table 1</xref>].</p>


</sec><sec><title>Discussion</title><p>The BRM as a standard setting method is much more convenient and less resource consuming than other procedures such as Angoff. Furthermore, because a global grade is awarded in addition to the checklist score, the BRM has the advantage of generating a number of indices that are useful in measuring the quality of OSCEs. Considering that the BRM is widely used as a standard setting method, assessing its reliability is of paramount importance. The focus of this study was to evaluate the reliability of the BRM, using the RMSE, for a pre-internship OSCE in which the OSCE pass-fail standard was calculated by averaging the BRM standards obtained for each station separately.</p>

<p>Overall, the low RMSE of the total OSCE cut-score shows a high reliability of the standard setting procedure. The results are comparable with those of several other studies that employed a similar technique to assess the reliability of the BRM [<xref ref-type="table" rid="T2">Table 2</xref>]. The standard error is approximately half a point on a percentage scale. For decision making, we might multiply this standard error by 1.96 for a confidence level of 95&#x0025;. That means that the BRM produces a standard with an uncertainty of roughly &#177;1&#x0025; on the checklist scoring scale. If we had set our pass mark 1&#x0025; lower, our pass rate would have been the same (95.2&#x0025;). If we had set our pass mark 1&#x0025; higher, the pass rate would have been 93.3&#x0025;. That means that the noise caused by the BRM leads to a shift of approximately 1.9&#x0025; in pass/fail decisions. With an increasing number of examinees and/or an increasing number of stations, these results might improve further, because the RMSE would decrease (and the reliability would increase). <sup><xref ref-type="bibr" rid="ref10">10</xref></sup>{Table 2}</p>
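
<p>The arithmetic behind these statements can be written out as follows, using only the values reported in the text (standard 47.37, RMSE 0.55, pass rates 95.2&#x0025; and 93.3&#x0025;); the normal-approximation interval is the usual reading of the 1.96 multiplier.</p>
<preformat>
```latex
\[
1.96 \times 0.55 \approx 1.08,
\qquad
47.37 \pm 1.08 \;\Rightarrow\; [46.29,\; 48.45]
\]
\[
95.2\% - 93.3\% = 1.9 \text{ percentage points}
\]
```
</preformat>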

<p> The relatively low RMSE of the BRM standard for the abdominal examination station is consistent with the strong correlation expressed by the high R<sup>2</sup> for this station. It is due to the spread of points over the whole range of the two score scales (checklist and global) in combination with a fairly strong relation between the two. It indicates that the station is of adequate difficulty and sufficiently sensitive to tap performance differences consistently from both perspectives. The opposite situation is found for the breast examination: Low R<sup>2</sup> and high RMSE. This point merits further explanation: With this station, global scores are mainly concentrated at levels three and four and within each of these levels the checklist scores are widely spread. These characteristics indicate that this station lacks discriminative power, and the validity of the checklist and/or the global score is questionable.</p>

<p>Generally, in all except one station, higher overall global ratings corresponded with higher checklist scores, giving rise to greater values of the R<sup>2</sup> coefficient (0.55-0.79). This is similar to the study conducted by Homer and Pell, in which, at each station, the two variables always showed a significant positive correlation, varying in size from 0.659 to 0.865. <sup><xref ref-type="bibr" rid="ref16">16</xref></sup> As shown in <xref ref-type="table" rid="T1">Table 1</xref>, station three (breast examination) is less satisfactory in this regard, with an R<sup>2</sup> value of 0.44. The main problem with this station is a wide spread of checklist scores for each global grade [<xref ref-type="fig" rid="F1">Figure 1</xref>]. Such an unsatisfactory relationship may reflect some degree of non-linearity. Pell et al. suggest that in this situation, methods other than the linear regression model may provide a better explanation. <sup><xref ref-type="bibr" rid="ref15">15</xref></sup> In our case, adding a quadratic and/or cubic term does not change the fitted relation considerably, and hardly increases the R<sup>2</sup> (linear &#x002B; quadratic: R<sup>2</sup> = 0.440, linear &#x002B; quadratic &#x002B; cubic: R<sup>2</sup> = 0.443). We think this kind of low correlation between global and checklist scores indicates that one or both of the measures are unreliable and/or invalid, or that they capture very different aspects of performance.</p>

<p>On the other hand, we should be cautious when interpreting the R<sup>2</sup> values, because if raters had automatically translated the checklist score into a corresponding global score, the R<sup>2</sup> would have been artificially inflated. <sup><xref ref-type="bibr" rid="ref15">15</xref></sup> Other psychometric indicators of quality should be used to identify possible problems. <sup><xref ref-type="bibr" rid="ref15">15</xref></sup> As an example, station four, which had a high failure rate, also showed an unacceptable inter-grade discrimination. Although no clear guidance on an "ideal" value for inter-grade discrimination exists, Association for Medical Education in Europe guide no. 49 recommends that this value should be "of the order of a 10<sup>th</sup> of the maximum available checklist mark". <sup><xref ref-type="bibr" rid="ref15">15</xref></sup> Hence, we considered values below 20 as tolerable (the maximum checklist score was 100). For the splinting station, the distribution of the points in the scatter plot is not adequate for a reliable regression result: The large majority of points are concentrated at the lower left, and only a few very influential points at the upper right support the steep regression line. The extreme skewness of the score distribution is also indicated by the very low mean value for this station: 11.24. Apparently, the station was too difficult or the candidates were not adequately trained in the skills required for this station. In summary, although considering a station to be flawed solely on the basis of a high number of failures is an incorrect assumption, <sup><xref ref-type="bibr" rid="ref15">15</xref></sup> scrutiny of station performance may inform curriculum effectiveness.</p>

<p>There are some limitations in our study. First, the generalizability of the results may be limited by the fact that the study was based on one rather small sample of 105 students in a single test. However, this study confirms the results of Kramer et al. and Schoonheim-Klein et al.; thus, we believe that the findings can be extended to a wider context. Second, we used data from only nine of the 14 stations of the original OSCE. Finally, the main disadvantage of using the RMSE approach to assess the reliability of the BRM procedure is its statistical complexity.</p>


</sec><sec><title>Conclusion</title><p>The current study confirms that the RMSE is an efficient means of assessing the reliability of the BRM. It also indicates that the BRM is a reliable method of setting standards for an OSCE and has the advantage of providing data for quality assurance.</p>


</sec>
  </body>
  <back>
	<ack><p>The authors would like to thank Azim Mirzazadeh MD, Director of the Education Development Office, School of Medicine, TUMS, and Ali Labaf MD, Director of the Clinical Skills Centre, School of Medicine, TUMS, for their aid with the design and implementation of the OSCE, and also for their constant support during this project.</p>
</ack>
	
	    <ref-list><ref id="ref1">
<label>1</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Cusimano</surname>
  <given-names>MD</given-names>
</name>
</person-group><article-title>Standard setting in medical education</article-title><source>Acad Med</source>
<year>1996</year>
<volume>71</volume>
<fpage>112</fpage>
<lpage>20</lpage>
</nlm-citation>
</ref>
<ref id="ref2">
<label>2</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Norcini</surname>
  <given-names>JJ</given-names>
</name>
</person-group><article-title>Setting standards on educational tests</article-title><source>Med Educ</source>
<year>2003</year>
<volume>37</volume>
<fpage>464</fpage>
<lpage>9</lpage>
</nlm-citation>
</ref>
<ref id="ref3">
<label>3</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Cizek</surname>
  <given-names>GJ</given-names>
</name>
<name> 
  <surname>Bunch</surname>
  <given-names>MB</given-names>
</name>
</person-group><source>Standard setting: A guide to establishing and evaluating performance standards for tests</source>
<publisher-loc>Thousand Oaks, CA</publisher-loc>
<publisher-name>Sage Publications, Inc</publisher-name>
<year>2007</year>
<fpage>20</fpage>
<lpage>2</lpage>
</nlm-citation>
</ref>
<ref id="ref4">
<label>4</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Wilkinson</surname>
  <given-names>TJ</given-names>
</name>
<name> 
  <surname>Newble</surname>
  <given-names>DI</given-names>
</name>
<name> 
  <surname>Frampton</surname>
  <given-names>CM</given-names>
</name>
</person-group><article-title>Standard setting in an objective structured clinical examination: Use of global ratings of borderline performance to determine the passing score</article-title><source>Med Educ</source>
<year>2001</year>
<volume>35</volume>
<fpage>1043</fpage>
<lpage>9</lpage>
</nlm-citation>
</ref>
<ref id="ref5">
<label>5</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Kramer</surname>
  <given-names>A</given-names>
</name>
<name> 
  <surname>Muijtjens</surname>
  <given-names>A</given-names>
</name>
<name> 
  <surname>Jansen</surname>
  <given-names>K</given-names>
</name>
<name> 
  <surname>D&#252;sman</surname>
  <given-names>H</given-names>
</name>
<name> 
  <surname>Tan</surname>
  <given-names>L</given-names>
</name>
<name> 
  <surname>van der Vleuten</surname>
  <given-names>C</given-names>
</name>
</person-group><article-title>Comparison of a rational and an empirical standard setting procedure for an OSCE. Objective structured clinical examinations</article-title><source>Med Educ</source>
<year>2003</year>
<volume>37</volume>
<fpage>132</fpage>
<lpage>9</lpage>
</nlm-citation>
</ref>
<ref id="ref6">
<label>6</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Muijtjens</surname>
  <given-names>AM</given-names>
</name>
<name> 
  <surname>Kramer</surname>
  <given-names>AW</given-names>
</name>
<name> 
  <surname>Kaufman</surname>
  <given-names>DM</given-names>
</name>
<name> 
  <surname>van der Vleuten</surname>
  <given-names>C</given-names>
</name>
</person-group><article-title>Using resampling to estimate the precision of an empirical standard setting method</article-title><source>Appl Meas Educ</source>
<year>2003</year>
<volume>16</volume>
<fpage>245</fpage>
<lpage>56</lpage>
</nlm-citation>
</ref>
<ref id="ref7">
<label>7</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Hobma</surname>
  <given-names>SO</given-names>
</name>
<name> 
  <surname>Ram</surname>
  <given-names>PM</given-names>
</name>
<name> 
  <surname>Muijtjens</surname>
  <given-names>AM</given-names>
</name>
<name> 
  <surname>Grol</surname>
  <given-names>RP</given-names>
</name>
<name> 
  <surname>van der Vleuten</surname>
  <given-names>CP</given-names>
</name>
</person-group><article-title>Setting a standard for performance assessment of doctor-patient communication in general practice</article-title><source>Med Educ</source>
<year>2004</year>
<volume>38</volume>
<fpage>1244</fpage>
<lpage>5</lpage>
</nlm-citation>
</ref>
<ref id="ref8">
<label>8</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Wood</surname>
  <given-names>TJ</given-names>
</name>
<name> 
  <surname>Humphrey-Murto</surname>
  <given-names>SM</given-names>
</name>
<name> 
  <surname>Norman</surname>
  <given-names>GR</given-names>
</name>
</person-group><article-title>Standard setting in a small scale OSCE: A comparison of the Modified Borderline-Group Method and the Borderline Regression Method</article-title><source>Adv Health Sci Educ Theory Pract</source>
<year>2006</year>
<volume>11</volume>
<fpage>115</fpage>
<lpage>22</lpage>
</nlm-citation>
</ref>
<ref id="ref9">
<label>9</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Boursicot</surname>
  <given-names>KA</given-names>
</name>
<name> 
  <surname>Roberts</surname>
  <given-names>TE</given-names>
</name>
<name> 
  <surname>Pell</surname>
  <given-names>G</given-names>
</name>
</person-group><article-title>Using borderline methods to compare passing standards for OSCEs at graduation across three medical schools</article-title><source>Med Educ</source>
<year>2007</year>
<volume>41</volume>
<fpage>1024</fpage>
<lpage>31</lpage>
</nlm-citation>
</ref>
<ref id="ref10">
<label>10</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Schoonheim-Klein</surname>
  <given-names>M</given-names>
</name>
<name> 
  <surname>Muijtjens</surname>
  <given-names>A</given-names>
</name>
<name> 
  <surname>Habets</surname>
  <given-names>L</given-names>
</name>
<name> 
  <surname>Manogue</surname>
  <given-names>M</given-names>
</name>
<name> 
  <surname>van der Vleuten</surname>
  <given-names>C</given-names>
</name>
<name> 
  <surname>van der Velden</surname>
  <given-names>U</given-names>
</name>
</person-group><article-title>Who will pass the dental OSCE&#x003F; Comparison of the Angoff and the borderline regression standard setting methods</article-title><source>Eur J Dent Educ</source>
<year>2009</year>
<volume>13</volume>
<fpage>162</fpage>
<lpage>71</lpage>
</nlm-citation>
</ref>
<ref id="ref11">
<label>11</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Jalili</surname>
  <given-names>M</given-names>
</name>
<name> 
  <surname>Hejri</surname>
  <given-names>SM</given-names>
</name>
<name> 
  <surname>Norcini</surname>
  <given-names>JJ</given-names>
</name>
</person-group><article-title>Comparison of two methods of standard setting: The performance of the three-level Angoff method</article-title><source>Med Educ</source>
<year>2011</year>
<volume>45</volume>
<fpage>1199</fpage>
<lpage>208</lpage>
</nlm-citation>
</ref>
<ref id="ref12">
<label>12</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Kane</surname>
  <given-names>M</given-names>
</name>
</person-group><article-title>Choosing between examinee-centered and test-centered standard-setting methods</article-title><source>Educ Assess</source>
<year>1998</year>
<volume>5</volume>
<fpage>129</fpage>
<lpage>45</lpage>
</nlm-citation>
</ref>
<ref id="ref13">
<label>13</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Liu</surname>
  <given-names>M</given-names>
</name>
<name> 
  <surname>Liu</surname>
  <given-names>KM</given-names>
</name>
</person-group><article-title>Setting pass scores for clinical skills assessment</article-title><source>Kaohsiung J Med Sci</source>
<year>2008</year>
<volume>24</volume>
<fpage>656</fpage>
<lpage>63</lpage>
</nlm-citation>
</ref>
<ref id="ref14">
<label>14</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Davison</surname>
  <given-names>I</given-names>
</name>
<name> 
  <surname>Cooper</surname>
  <given-names>R</given-names>
</name>
<name> 
  <surname>Bullock</surname>
  <given-names>A</given-names>
</name>
</person-group><article-title>The objective structured public health examination: A study of reliability using multi-level analysis</article-title><source>Med Teach</source>
<year>2010</year>
<volume>32</volume>
<fpage>582</fpage>
<lpage>5</lpage>
</nlm-citation>
</ref>
<ref id="ref15">
<label>15</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Pell</surname>
  <given-names>G</given-names>
</name>
<name> 
  <surname>Fuller</surname>
  <given-names>R</given-names>
</name>
<name> 
  <surname>Homer</surname>
  <given-names>M</given-names>
</name>
<name> 
  <surname>Roberts</surname>
  <given-names>T</given-names>
</name>
<collab>International Association for Medical Education</collab>
</person-group><article-title>How to measure the quality of the OSCE: A review of metrics - AMEE guide no. 49</article-title><source>Med Teach</source>
<year>2010</year>
<volume>32</volume>
<fpage>802</fpage>
<lpage>11</lpage>
</nlm-citation>
</ref>
<ref id="ref16">
<label>16</label>
<nlm-citation citation-type="journal">
<person-group person-group-type="author"><name> 
  <surname>Homer</surname>
  <given-names>M</given-names>
</name>
<name> 
  <surname>Pell</surname>
  <given-names>G</given-names>
</name>
</person-group><article-title>The impact of the inclusion of simulated patient ratings on the reliability of OSCE assessments under the borderline regression method</article-title><source>Med Teach</source>
<year>2009</year>
<volume>31</volume>
<fpage>420</fpage>
<lpage>5</lpage>
</nlm-citation>
</ref>
</ref-list>

  </back>
	
</article> 




