Chapter 5 An Illustrative Example of Case 3 Best–Worst Scaling
Author: Hideo Aizaki
First Release: 29 March 2021
5.1 Before you get started
This tutorial presents the entire process of Case 3 (multi-profile case) best–worst scaling (BWS)—from constructing the choice sets to analyzing the responses using R (R Core Team 2021b). The demonstration assumes some familiarity with R. If you are completely unfamiliar with R, you are advised to consult R Core Team (2021a) or other general support materials before attempting the example.
5.2 Overview of Case 3 BWS
This section explains Case 3 BWS briefly. For complete details on the approach please refer to works such as Marley and Pihlens (2012), Lancsar et al. (2013), and Louviere, Flynn, and Marley (2015 Chapter 4). For an outline of three variants of BWS, see Section 3.2.
Similar to discrete choice experiments (see Chapter 2), a Case 3 BWS question consists of three or more profiles. Each profile has two or more attributes with each attribute having two or more levels. Consequently, the profile is expressed as a combination of attribute-levels. Respondents are asked to choose the best and worst profiles from a Case 3 BWS question (i.e., a choice set). This style of question is repeated until all the choice sets are evaluated.
There are various approaches to construct the choice sets for Case 3 BWS. This example uses a basic design approach with a fractional factorial design (FFD)/orthogonal array (OA) and balanced incomplete block design (BIBD), as explained by Louviere, Flynn, and Marley (2015 Chapter 4). This approach creates profiles from an FFD and then assigns the profiles into choice sets using a BIBD. Therefore, the FFD and BIBD used for the approach need to satisfy a condition in which the number of rows of the FFD is equal to the number of treatments in the BIBD. An example with R code chunks will be shown later.
There are two approaches for analyzing the responses to Case 3 BWS questions similar to the Case 1 and Case 2 BWS—the counting and modeling approaches. Since the modeling approach seems to be major in applied studies, we focus only on this approach below. The modeling approach uses discrete choice models to analyze the responses. When using the modeling approach, a model type must be selected according to the assumptions made about the respondents’ choices to the Case 3 BWS questions. Following this, the dataset must be formatted as per the selected model. There are three standard models of analysis: maximum difference (maxdiff), sequential, and rank-ordered models (Lancsar et al. 2013; Marley and Pihlens 2012). Although the three models all assume that respondents derive utility from each profile in a choice set and select the best and worst profiles in the choice set based on the subjective utilities for profiles, the assumption regarding how they select the best and worst profiles from the choice set differs across the three models. The maxdiff model assumes that respondents select profile \(i\) as the best and profile \(j\) (\(i \neq j\)) as the worst because the difference in utility between the two profiles represents the greatest utility difference among all the utility differences. The number of utility differences in a pair is equal to the number of possible pairs in which profile \(i\) is selected as the best and profile \(j\) is selected as the worst from \(P\) profiles, that is, \(P \times (P - 1)\). The sequential model assumes that respondents select profile \(i\) as the best from \(P\) profiles in the choice set since the utility for profile \(i\) is the maximum, and then select profile \(j\) as the worst from the remaining \(P - 1\) profiles since the utility for profile \(j\) is the minimum. The rank-ordered model assumes that respondents select profile \(i\) as the best among \(P\) profiles since the utility for profile \(i\) is the maximum, profile \(k\) (\(k \neq i, j\)) as the second best from the remaining \(P - 1\) profiles since the utility of profile \(k\) is the second maximum, profile \(l\) (\(l \neq i, j, k\)) as the third best from the remaining \(P - 2\) profiles since the utility of profile \(l\) is the third maximum, and so on.
The probability of selecting profiles from a choice set \(S\) for each model can be expressed using the conditional logit model. Under assumptions such as a choice set consists of four profiles (\(S = \{1, 2, 3, 4\}\)); respondent \(n\) selected Profile 2 as the first best (FB), Profile 4 as the first worst (FW), Profile 1 as the second best (SB), and Profile 3 as the second worst (SW = the third best [TB] for the rank-ordered model); and the systematic component of utility of profile \(i\) for respondent \(n\) is \(v_{in}\), the choice probability for the maxdiff, sequential, and rank-ordered models is given as follows (\(n\) is omitted for the simplicity):
\[\begin{equation} \mathrm{Pr}(\textrm{FB} = 2, \textrm{FW} = 4) = \frac{\exp{(v_{2} - v_{4})}}{\sum_{p, q \subset S, p \neq q} \exp{(v_{p} - v_{q})}}. \end{equation}\]
\[\begin{equation} \mathrm{Pr}(\textrm{FB} = 2, \textrm{FW} = 4, \textrm{SB} = 1) = \frac{\exp{(v_{2})}}{\sum_{p \subset S} \exp{(v_{p})}}\frac{\exp{(-v_{4})}}{\sum_{q \subset S - \{2\}} \exp{(-v_{q})}}\frac{\exp{(v_{1})}}{\sum_{r \subset S - \{2, 4\}} \exp{(v_{r})}}. \end{equation}\]
\[\begin{equation} \mathrm{Pr}(\textrm{FB} = 2, \textrm{SB} = 1, \textrm{TB} = 3) = \frac{\exp{(v_{2})}}{\sum_{p \subset S} \exp{(v_{p})}}\frac{\exp{(v_{1})}}{\sum_{q \subset S - \{2\}} \exp{(v_{q})}}\frac{\exp{(v_{3})}}{\sum_{r \subset S - \{2, 1\}} \exp{(v_{r})}}. \end{equation}\]
where \(v_{i} = x_{i}'\beta\); \(x_{i}\) is a vector of attribute-level variables for profile \(i\); and \(\beta\) is a vector of the coefficients to be estimated.
The following R code example, given in this chapter, focuses only on the maxdiff model.
5.3 Packages for Case 3 BWS
We use five CRAN packages support.BWS3 (Aizaki 2019b), support.CEs (Aizaki 2012), DoE.base (Grömping 2018), crossdes (Sailer 2013), and survival (Therneau 2020) to explain how to implement Case 3 BWS in R. The package support.BWS3 provides functions to convert an FFD/OA and BIBD into Case 3 BWS questions, create a dataset for the analysis from the two designs and the responses to the questions, and generate artificial responses to the questions. The function oa.design()
in DoE.base constructs OAs, whereas the function find.BIB()
in crossdes generates BIBDs. The function clogit()
in survival fits the conditional logit model. Apart from clogit()
, other packages such as mlogit (Croissant 2020b), gmnl (Sarrias and Daziano 2017), and apollo (Hess and Palma 2019, 2021) are also used for the conditional logit model analysis. The functions gofm()
and mwtp()
in support.CEs calculate the goodness-of-fit measures for conditional logit model fitted by clogit()
and marginal willingness to pays for the non-monetary variables in the fitted model, respectively. Details for the CRAN packages regarding experimental designs and discrete choice models can be found on CRAN Task View: https://CRAN.R-project.org/view=ExperimentalDesign for experimental designs; and https://CRAN.R-project.org/view=Econometrics for econometrics including discrete choice models.
Once the packages have been downloaded and installed, we then load the packages needed for the example into the current session:
library("support.BWS3")
library("support.CEs")
library("survival")
library("crossdes")
library("DoE.base")
Note that we printed out six digits in the example. To set the same appearance format in R, execute
5.4 Designing a choice situation
We explain how to use the functions on the basis of a hypothetical survey focusing on potential tourists’ valuation of agritourism, which is constructed according to the survey given in Chapter 4. Agritourism refers to various activities offered by farms to visitors, such as hands-on farm work or outdoor recreation. In the hypothetical survey, a total of 100 individuals were asked to evaluate agritourism packages provided by dairy farms.
As the first step of implementing Case 3 BWS, it is necessary to determine the features of agritourism that are potentially available. In this example, the agritourism package consists of the following three types of activities and the fee, each with three levels (names used in R code chunks are in parenthesis):
- Hands-on farm chore attribute (
chore
)- Milking a cow (
milking
) - Feeding a cow (
feeding
) - Nursing a calf (
nursing
)
- Milking a cow (
- Hands-on food processing attribute (
food
)- Butter making (
butter
) - Ice-cream making (
ice
) - Creamy caramel making (
caramel
)
- Butter making (
- Outdoor recreation attribute (
outdoor
)- Horse riding (
horse
) - Tractor riding (
tractor
) - Walking with cows (
cow
)
- Horse riding (
- Fee per person of the agritourism package attribute (
fee
) (Note that USD 1 \(\neq\) JPY 104 as of January 2021)- JPY 3000
- JPY 6000
- JPY 9000
5.5 Generating designs
As mentioned above, this example uses a design method for Case 3 BWS as explained by Louviere, Flynn, and Marley (2015 Chapter 4). Therefore, the second step is to prepare an FFD/OA and BIBD corresponding to the choice situation.
Firstly, we create an OA with four three-level factors (attributes) using the function oa.design()
in DoE.base. Although the function has many arguments (see the oa.design
help document for details), this example uses two arguments:
nlevels
: A vector containing the number of levels corresponding to the OMED that you wish to create.randomize
: A logical variable denoted byTRUE
when randomizing the order of the rows in the resultant design orFALSE
when not doing so.
The following lines of code create the OMED used in this example and store it in an object oa
:
## A B C D
## 1 1 1 1 1
## 2 1 2 3 2
## 3 1 3 2 3
## 4 2 1 3 3
## 5 2 2 2 1
## 6 2 3 1 2
## 7 3 1 2 2
## 8 3 2 1 3
## 9 3 3 3 1
## class=design, type= oa
The resultant design is a matrix with nine rows and four columns. The columns of the OA correspond to attributes, while the rows correspond to profiles. Therefore, the first row denotes Profile 1 that consists of level 1 of each attribute, and the last (ninth) row denotes Profile 9 that consists of level 3 of attribute 1, 2, and 3 each, and level 1 of attribute 4.
Next, we create a BIBD with nine treatments, three treatments per block, and 12 blocks using the find.BIB()
function in crossdes. Three arguments are used:
trt
: A number of treatments.k
: A number of treatments per block.b
: A number of blocks.
R code for creating the BIBD for this example is given as:
## [,1] [,2] [,3]
## [1,] 2 4 5
## [2,] 3 4 9
## [3,] 2 3 6
## [4,] 1 6 9
## [5,] 1 3 5
## [6,] 2 8 9
## [7,] 5 7 9
## [8,] 3 7 8
## [9,] 1 4 8
## [10,] 4 6 7
## [11,] 1 2 7
## [12,] 5 6 8
##
## [1] The design is a balanced incomplete block design w.r.t. rows.
As the find.BIB()
uses random numbers internally, a seed for random number generator is set in advance by set.seed()
for reproducibility. The result of executing the crossdes function isGYD()
with the output from find.BIB()
indicates that the resultant design is a BIBD.
As mentioned above, the resultant BIBD is expressed as a matrix with 12 rows and three columns. According to Louviere, Flynn, and Marley (2015 Chapter 4), we can generate 12 choice sets for Case 3 BWS by replacing the treatment numbers in the BIBD such as 1, 2, …, and 9 with the corresponding profiles such as Profile 1, Profile 2, …, and Profile 9 created above. In this example, a choice set corresponding to the first row (block) of the BIBD consists of Profile 2, Profile 4, and Profile 5, while one corresponding to the ninth row of the BIBD consists of Profile 5, Profile 6, and Profile 8. This assignment process can be done using the support.BWS3 function bws3.design()
with the following arguments:
bibd
: A data frame or matrix containing a BIBD.ffd
: A data frame or matrix containing an FFD.attribute.levels
: A list containing the names of the attributes and their levels.
The following code chunk stores the names of the attributes and their levels into an object atr
, and then creates a Case 3 BWS design for this example from the objects bibd
, oa
, and atr
:
atr <- list(
chore = c("milking", "feeding", "nursing"),
food = c("butter", "ice", "caramel"),
outdoor = c("horse", "tractor", "cow"),
fee = c(3000, 6000, 9000))
bws3dsgn <- bws3.design(
bibd = bibd,
ffd = oa,
attribute.levels = atr)
class(bws3dsgn)
## [1] "cedes" "list"
The resultant design is an object S3 class "cedes"
, which is an S3 class for discrete choice experimental design generated using functions in support.CEs (see Chapter 2).
5.6 Creating questions
The third step is to create Case 3 BWS questions from the design generated above. As the design generated from bws3.design()
is an object S3 class "cedes"
, the function questionnaire()
in support.CEs is used to create Case 3 BWS questions. The following is the R code chunk for this example:
questionnaire(bws3dsgn)
Block 1
Question 1
alt.1 alt.2 alt.3
chore "milking" "feeding" "feeding"
food "ice" "butter" "ice"
outdoor "cow" "cow" "tractor"
fee "6000" "9000" "3000"
...
Question 12
alt.1 alt.2 alt.3
chore "feeding" "feeding" "nursing"
food "ice" "caramel" "ice"
outdoor "tractor" "horse" "horse"
fee "3000" "6000" "9000"
As you can see, the first question has three different agritourism packages. The first package (alt.1
), which comes at JPY 6000, consists of three activities which are milking a cow, ice-cream making, and walking with cows; the second package (alt.2
) including feeding a cow, butter making, and waking with cows is priced at JPY 9000; and the third package (alt.3
) of feeding a cow, ice-cream making, and tractor riding comes at JPY 3000. When an opt-out option (i.e., I would like to select none of these packages) is available, respondents are asked to select the best and worst alternative from among four including the opt-out option. We can the copy the resultant questions from the R console and then paste them into other software packages such as a word processor to create the actual questionnaire.
5.7 Conducting a survey/Synthesizing responses to questions
In practice, the fourth step is to conduct the actual questionnaire survey using the Case 3 BWS questions created above. As this example assumes a hypothetical survey, we skip this step and instead synthesize responses to the Case 3 BWS questions (designs) using the function bws3.response()
in support.BWS3. This function generates artificial responses on the basis of a maxdiff model and has the following arguments:
design
: An object of the S3 class"cedes"
.categorical.attributes
: A vector containing the names of the attributes that are treated as dummy-coded variables in the model. If there are no categorical variables, it is set asNULL
(default).continuous.attributes
: A vector containing the names of the attributes that are treated as continuous variables in the model. If there are no continuous variables, it is set asNULL
(default).asc
: A vector containing binary values that takes the value of 1 when an alternative specific constant (ASC) is included in the utility function of an alternative and 0 otherwise. The \(i\)-th element in the vector corresponds to the \(i\)-th alternative. If there are no ASCs, it is set asNULL
(default).common
: A named vector containing a fixed combination of levels corresponding to a common alternative (base/reference) in the choice sets. If the common alternative is an opt-out (no choice) option, use the argumentoptout
instead. If there is no common option, it is set asNULL
(default).optout
: A logical variable describing whether the opt-out option is included in the Case 3 BWS questions. IfTRUE
(default), the opt-out option is included; ifFALSE
, it is not.b
: A vector containing parameters of independent variables in the model. The vector is used to calculate utilities for alternatives.n
: An integer value showing the number of respondents in the resultant dataset.detail
: A logical variable. IfTRUE
, the dataset is returned in a detailed format; and ifFALSE
(default), the dataset is returned in a simple format.seed
: Seed for a random number generator.
Before using the function bws3.response()
, we have to consider assumptions of an opt-out option and systematic component of utility function. Assumptions for this example are given as: (i) an opt-out option is included in each choice set (i.e., respondents are asked to select one from a set of three profiles and the “none of these” option); (ii) three activities are treated as categorical (qualitative) attributes, and the third level in each activity attribute (i.e., nursing a calf for hands-on farm chore, creamy caramel making for hands-on food processing, and walking with cows for outdoor recreation) is a reference (base level) for dummy variables; (iii) the fee is treated as a continuous (quantitative) attribute; and (iv) an alternative specific constant (ASC) is included in a utility function for the opt-out option, while it is not included for agritourism packages. Accordingly, the systematic component of a utility function for three agritourism packages (\(i = 1, 2, 3\)) and that for the opt-out option (\(i = 4\)) are given as follows:
\[\begin{align}
v_{i} = &\beta_{1}milking_{i} + \beta_{2}feeding_{i} + \beta_{3}butter_{i} + \beta_{4}ice_{i} + \beta_{5}horse_{i} + \beta_{6}tractor_{i} + \beta_{7}fee_{i},
\end{align}\]
\[\begin{align}
v_{4} = ASC,
\end{align}\]
where \(milking_{i}\) is a dummy variable taking the value of 1 if agritourism package \(i\) has a “milking a cow” as the hands-on farm chore activity, 0 otherwise; \(feeding_{i}\) is a dummy variable taking the value of 1 if agritourism package \(i\) has a “feeding a cow” as the hands-on farm chore activity, 0 otherwise; \(butter_{i}\) is a dummy variable taking the value of 1 if agritourism package \(i\) has a “butter making” as the hands-on food processing activity, 0 otherwise; \(ice_{i}\) is a dummy variable taking the value of 1 if agritourism package \(i\) has a “ice-cream making” as the hands-on food processing activity, 0 otherwise; \(horse_{i}\) is a dummy variable taking the value of 1 if agritourism package \(i\) has a “horse riding” as the outdoor recreation, 0 otherwise; \(tractor_{i}\) is a dummy variable taking the value of 1 if agritourism package \(i\) has a “tractor riding” as the outdoor recreation, 0 otherwise; \(fee_{i}\) is a numerical variable for fee per person of agritourism package \(i\) (unit: JPY); \(ASC\) is an alternative specific constant; and \(\beta\)s are coefficients (parameters) for these variables. Further, this example assumes that the values of the parameters, including \(ASC\), are as follows: \(\beta_{1} = 0.5\); \(\beta_{2} = -0.2\); \(\beta_{3} = 1\); \(\beta_{4} = 0.1\); \(\beta_{5} = 0.1\); \(\beta_{6} = 0.1\); \(\beta_{7} = -0.0003\); and \(ASC = -1\). Note that the function bws3.response()
assumes that the opt-out option is located as the last alternative in each choice set, if the option is available.
After calculating the utility values (by adding the calculated values of the systematic component of the utility and random numbers generated from a type I extreme value distribution), the function bws3.response()
finds the pair with the highest difference in utility from the \(P \times (P - 1)\) differences in utility. Given the assumptions mentioned above, the code chunk for synthesizing 100 individuals’ responses to Case 3 BWS questions for this example is as follows:
bws3rsp <- bws3.response(
design = bws3dsgn,
b = c(0, 0, 0, -1, 0.5, -0.2, 0, 1, 0.1, 0, 1, 0.1, 0, -0.0003),
n = 100,
optout = TRUE,
categorical.attributes = c("chore", "food", "outdoor"),
continuous.attributes = c("fee"),
asc = c(0,0,0,1),
detail = F,
seed = 123)
Here, we provide additional explanation on the settings for arguments b
and asc
. The numerical vector assigned to the argument b
is c(0, 0, 0, -1, 0.5, -0.2, 0, 1, 0.1, 0, 1, 0.1, 0, -0.0003)
. The first, second, third, and fourth elements correspond to ASC in the first, second, third, and fourth alternatives, respectively, in a choice set. The value of 0 for ASCs means the corresponding ASC does not exist: the first, second, and third alternative has no ASC. The fifth, sixth, and seventh elements correspond to the parameter values of dummy variables regarding the first attribute: milking, feeding, and nursing. Note that the seventh element has the value of 0, meaning that nursing a calf is treated as the reference for dummy variables regarding the hands-on farm chore attribute. The eighth, ninth, and tenth elements correspond to the parameter values of dummy variables regarding the second attribute: butter, ice, and caramel (the reference for hands-on food processing attribute). The eleventh, twelfth, and thirteenth elements correspond to the third attribute: horse, tractor, and cow (the reference for outdoor recreation attribute). The last element corresponds to the variable for the fourth attribute: fee.
When the systematic component of the utility includes ASCs, the argument asc
is set according to the assumption on ASCs. The \(i\)-th element of a binary vector assigned to the argument asc
takes the value of 1 if the \(i\)-th alternative has an ASC and takes the value of 0 otherwise. For this example, since a choice set consists of four alternatives and only the fourth alternative (i.e., the opt-out option) has an ASC, a vector c(0, 0, 0, 1)
is assigned to the argument asc
.
The resultant dataset is given as follows:
## 'data.frame': 100 obs. of 26 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ BLOCK: num 1 1 1 1 1 1 1 1 1 1 ...
## $ B1 : num 4 1 3 1 3 3 4 3 1 3 ...
## $ W1 : num 2 2 4 3 1 1 2 2 3 2 ...
## $ B2 : num 4 4 4 4 1 3 4 3 4 3 ...
## $ W2 : num 3 2 1 3 2 4 1 1 1 1 ...
## $ B3 : num 3 4 3 2 3 2 3 3 3 2 ...
## $ W3 : num 1 2 2 3 2 4 2 4 2 3 ...
## $ B4 : num 1 1 1 1 1 1 1 2 1 1 ...
## $ W4 : num 2 2 4 3 3 3 2 4 2 3 ...
## $ B5 : num 1 1 4 4 3 1 1 1 4 1 ...
## $ W5 : num 3 2 2 2 2 4 2 2 3 3 ...
## $ B6 : num 3 1 3 2 1 4 4 1 3 3 ...
## $ W6 : num 2 3 2 1 2 2 2 2 2 1 ...
## $ B7 : num 4 2 1 2 2 1 2 3 1 4 ...
## $ W7 : num 3 4 4 3 1 4 3 4 2 1 ...
## $ B8 : num 2 4 2 2 3 4 2 2 1 4 ...
## $ W8 : num 1 1 1 1 1 1 3 3 4 2 ...
## $ B9 : num 1 1 1 1 1 3 1 1 4 1 ...
## $ W9 : num 2 3 3 2 3 2 2 2 2 2 ...
## $ B10 : num 4 4 2 2 1 2 2 4 4 2 ...
## $ W10 : num 1 1 1 4 2 1 3 2 1 1 ...
## $ B11 : num 2 1 3 1 1 1 1 1 1 1 ...
## $ W11 : num 4 2 4 3 4 2 2 4 4 4 ...
## $ B12 : num 3 1 4 4 4 1 4 2 4 1 ...
## $ W12 : num 1 2 2 2 2 2 2 1 1 2 ...
The dataset contains 100 observations (individuals) with 26 variables. The id
variable is the respondents’ unique id variable. The BLOCK
variable is the block number of choice sets. For this example, the choice sets are not divided into sub blocks. As such, the BLOCK
variable takes only the value of 1. The remaining 24 variables are response variables B1
to W12
that correspond to the Case 3 BWS questions: B
denotes a response as the best, W
denotes a response as the worst, and the numbers 1
to 12
indicate the Case 3 BWS question number. The alternative numbers selected as the best and worst are stored in the response variables. For example, respondent 1 (id = 1
) selected the opt-out option as the best (B1 = 4
) and the second alternative as the worst (W1 = 2
) in the first question.
When you conduct a survey, you would have to create a respondent dataset on the basis of the completed questionnaires. The structure of your respondent dataset needs to follow the basic structure of the dataset mentioned above: it must include the id
, BLOCK
, response variables. The response variables must be constructed such that the best selection alternates with the worst by question, which is similar to the synthesized respondent dataset: it contains 24 response variables B1
, W1
, B2
, W2
, …, B12
, and W12
. Other variables in the respondent dataset are treated as the respondents’ characteristics such as gender and age.
5.8 Creating a dataset
As the fifth step, we need to create a dataset for the analysis by combining the Case 3 BWS design generated in the second step and the responses to the Case 3 BWS questions. A dataset for the analysis is created using the support.BWS3 function bws3.dataset()
with the following arguments:
data
: A data frame containing a respondent dataset.id
: A character representing the respondent identification number variable in the respondent dataset.response
: A named list containing the names of response variables in the respondent dataset.choice.sets
: An object of the S3 class"cedes"
.categorical.attributes
: A vector containing the names of attributes treated as dummy-coded variables in the analysis, or a named vector containing the reference levels of attributes treated as effect-coded variables. If there are no categorical variables, it is set asNULL
(default).continuous.attributes
: A vector containing the names of attributes treated as continuous variables in the analysis. If there are no continuous variables, it is set asNULL
(default).common
: A named vector containing a fixed combination of levels corresponding to a common alternative (base/reference) in the choice sets. If the common alternative is an opt-out (no choice) option, use the argumentoptout
instead. If there is no common option, it is set asNULL
(default).optout
: A logical variable describing whether the opt-out option is included in the BWS questions. IfTRUE
(default), the opt-out option is included;FALSE
otherwise it is not.asc
: A vector containing binary values that takes the value of 1 when an ASC is included in the utility function of an alternative and 0 otherwise. The \(i\)-th element in the vector corresponds to the \(i\)-th alternative. If there are no ASCs, it is set asNULL
(default).model
: A character showing a type of dataset created by the function:"maxdiff"
for a maximum difference model;"sequential"
for a sequential model; and"rank"
for a rank-ordered model.detail
: A logical variable describing whether the response variables assigned to the argumentresponse
are also stored in the dataset created by the function.
As you can see, the definitions of the arguments data
, choice.sets
, categorical.attributes
, continuous.attributes
, common
, optout
, and asc
are similar to those of the function bws3.response()
. Note that the order of questions in the respondent dataset has to be the same as that in the Case 3 BWS design assigned to choice.sets
.
In this example, we store the names of the response variables used in the synthesized respondent dataset into the vector bws3rsp.var
, which will be assigned to the argument response
of bws3.dataset()
:
## [1] "B1" "W1" "B2" "W2" "B3" "W3" "B4" "W4" "B5" "W5" "B6" "W6"
## [13] "B7" "W7" "B8" "W8" "B9" "W9" "B10" "W10" "B11" "W11" "B12" "W12"
According to the settings mentioned above, the following code chunk creates the dataset for the maxdiff model (note the argument model = "maxdiff"
):
bws3dat <- bws3.dataset(
data = bws3rsp,
response = bws3rsp.var,
choice.sets = bws3dsgn,
categorical.attributes = c("chore", "food", "outdoor"),
continuous.attributes = c("fee"),
optout = TRUE,
asc = c(0, 0, 0, 1),
model = "maxdiff")
In this example, each question contains four alternatives, including the none-of-these option, and thus the number of possible pairs per question is 12 (\(= 4 \times 3\)). Accordingly, the resultant dataset consists of 14,400 rows (= 100 respondents \(\times\) 12 questions \(\times\) 12 possible pairs per question):
## [1] 14400 24
The 24 variables, corresponding to the columns of the dataset, are as follows:
## [1] "id" "BLOCK" "QES" "PAIR" "BEST" "WORST" "RES.B"
## [8] "RES.W" "RES" "ASC1" "ASC2" "ASC3" "ASC4" "milking"
## [15] "feeding" "nursing" "butter" "ice" "caramel" "horse" "tractor"
## [22] "cow" "fee" "STR"
Among these variables, the variables RES
and ASC4
to STR
are used for the modeling analysis explained below. The following shows the first 12 rows, which correspond to the 12 possible pairs (PAIR
= 1
to 12
) in the first question (QES
= 1
) for respondent 1 (id
= 1
):
## id QES PAIR RES ASC4 milking feeding nursing butter ice caramel horse
## 1 1 1 1 0 0 1 -1 0 -1 1 0 0
## 2 1 1 2 0 0 1 -1 0 0 0 0 0
## 3 1 1 3 0 -1 1 0 0 0 1 0 0
## 4 1 1 4 0 0 -1 1 0 1 -1 0 0
## 5 1 1 5 0 0 0 0 0 1 -1 0 0
## 6 1 1 6 0 -1 0 1 0 1 0 0 0
## 7 1 1 7 0 0 -1 1 0 0 0 0 0
## 8 1 1 8 0 0 0 0 0 -1 1 0 0
## 9 1 1 9 0 -1 0 1 0 0 1 0 0
## 10 1 1 10 0 1 -1 0 0 0 -1 0 0
## 11 1 1 11 1 1 0 -1 0 -1 0 0 0
## 12 1 1 12 0 1 0 -1 0 0 -1 0 0
## tractor cow fee STR
## 1 0 0 -3000 1010
## 2 -1 1 3000 1010
## 3 0 1 6000 1010
## 4 0 0 3000 1010
## 5 -1 1 6000 1010
## 6 0 1 9000 1010
## 7 1 -1 -3000 1010
## 8 1 -1 -6000 1010
## 9 1 0 3000 1010
## 10 0 -1 -6000 1010
## 11 0 -1 -9000 1010
## 12 -1 0 -3000 1010
The variable RES
, which will be used as a response variable in the modeling analysis below, takes the value of 1 if the corresponding pair is selected by the respondent, 0 otherwise. As such, we can understand that respondent 1 selected the pair corresponding to the 11th row in the first BWS question. Although four ASCs (i.e., ASC1
, ASC2
, ASC3
, and ASC4
; see the output from names()
mentioned above) are created by the function bws3.dataset()
in this example, only the variable ASC4
will be used in the following analysis. This is because the function bws3.dataset()
creates ASCs according to the length of a vector assigned to the argument asc
. This example assigned a vector with four elements; as such four ASCs are included in the resultant dataset. The ASCs, corresponding to the elements with the value of 1 in the vector assigned to the argument asc
, are used in the following analysis, while the remaining ASCs, which have only the value of 0 in all the observations (rows) of the resultant dataset, are not used. The nine variables milking
to cow
correspond to the dummy variables regarding three activity attributes. Note that although three dummy variables are created according to the three levels in each attribute, one of three dummy variables for each attribute is not used in the modeling analysis because the non-used one is treated as the reference level in the attribute. The variable fee
corresponds to the fee per person of the agritourism package. These variables regarding the four attributes are named according to the level names in the design assigned to the argument choice.sets
and will be used as the independent variables in the following analysis. The last variable STR
is a stratification variable that has different values depending on the combinations of respondents and BWS questions.
5.9 Measuring preferences
Finally, we fit the conditional logit model to the responses on the basis of the maxdiff model. The function clogit()
in survival is used to fit the models. Although this function has many arguments, only two are used in this example:
formula
: A model formula.data
: A data frame containing the variables used in the model formula.
The structure of the model formula is similar to that for a linear regression function, lm()
: the dependent variable is set on the left-hand side of ~
, whereas the independent variables are set on the right-hand side according to a systematic component of the utility function to be fitted. Unlike lm()
, however, a stratification variable must be added to the end of the right-hand side of ~
using the strata()
function. As shown above, the stratification variable used in clogit()
is automatically generated as the variable STR
and stored in the dataset created using the function bws3.dataset()
. Three level variables—nursing
, caramel
, and cow
—are omitted from the model as those are assumed as the reference in the corresponding attributes (see the systematic component of the utility function for this example mentioned in Section 5.7). Therefore, the model formula for clogit()
is described (and stored as an object bws3mf
) as:
and the model is fitted using clogit()
with the dataset bws3dat
as follows:
## Call:
## clogit(formula = bws3mf, data = bws3dat)
##
## coef exp(coef) se(coef) z p
## ASC4 -8.98e-01 4.07e-01 9.83e-02 -9.14 < 2e-16
## milking 4.35e-01 1.55e+00 7.10e-02 6.13 8.6e-10
## feeding -1.76e-01 8.39e-01 6.17e-02 -2.85 0.0044
## butter 9.73e-01 2.65e+00 7.26e-02 13.40 < 2e-16
## ice 1.79e-01 1.20e+00 6.11e-02 2.93 0.0034
## horse 1.10e+00 3.01e+00 7.27e-02 15.14 < 2e-16
## tractor 1.34e-01 1.14e+00 6.22e-02 2.16 0.0308
## fee -2.97e-04 1.00e+00 1.29e-05 -23.03 < 2e-16
##
## Likelihood ratio test=1087 on 8 df, p=<2e-16
## n= 14400, number of events= 1200
In this output, the column coef
contains the estimated coefficients, column exp(coef)
is the exponential function of the estimated coefficients, column se(coef)
is the standard error of the estimated coefficients, and columns z
and p
give the \(z\)-value and \(p\)-value under the null hypothesis, in which the coefficient is zero. According to our model assumption on references for dummy variables regarding the three activity attributes, tourists’ preference for feeding is smaller than that for nursing (the reference in the hands-on farm chore), while that for milking is larger than that for the reference; those for butter and ice cream makings are larger than that for creamy caramel making (the reference in the hands-on food processing); and those for horse and tractor ridings are larger than that for walking with cows (the reference in the outdoor recreation). The negative estimate for fee means that tourists prefer an agritourism package with a lower fee.
The summary statistics for the model fitted with clogit()
can be calculated using the function gofm()
in support.CEs:
##
## Rho-squared = 0.182222
## Adjusted rho-squared = 0.179539
## Akaike information criterion (AIC) = 4893.04
## Bayesian information criterion (BIC) = 4933.76
## Number of coefficients = 8
## Log likelihood at start = -2981.89
## Log likelihood at convergence = -2438.52
Further, when using clogit()
for fitting the conditional logit model, marginal willingness to pay (MWTP) for the levels can be calculated with the function mwtp()
in support.CEs. Three arguments are used for this example:
output
: An output fromclogit()
.monetary.variables
: A character vector containing the monetary variable.nonmonetary.variables
: A character vector containing the non-monetary variables used to calculate the MWTPs.
The command for this example is given as:
mwtp(
output = bws3md.cl,
monetary.variables = c("fee"),
nonmonetary.variables = c("milking", "feeding", "butter",
"ice", "horse", "tractor"))
##
## MWTP 2.5% 97.5%
## milking 1465.6 1016.3 1919.8
## feeding -591.9 -1006.7 -185.6
## butter 3275.9 2805.9 3765.3
## ice 603.4 194.6 1010.5
## horse 3705.9 3221.4 4200.7
## tractor 451.9 40.3 850.1
##
## method = Krinsky and Robb
According to the 95% confidence intervals for the MWTPs, MWTPs of activities milking, butter, ice, horse, and tractor are significant positives at 5% level, while that of feeding is a significant negative at 5% level.
References
Aizaki, H. 2012. “Basic Functions for Supporting an Implementation of Choice Experiments in R.” Journal of Statistical Software, Code Snippets 50 (2): 1–24. https://doi.org/10.18637/jss.v050.c02.
Aizaki, H. 2019b. Support.BWS3: Tools for Case 3 Best-Worst Scaling. https://CRAN.R-project.org/package=support.BWS3.
Croissant, Y. 2020b. “Estimation of Random Utility Models in R: The Mlogit Package.” Journal of Statistical Software 95 (11): 1–41. https://doi.org/10.18637/jss.v095.i11.
Grömping, U. 2018. “R Package DoE.base for Factorial Experiments.” Journal of Statistical Software 85 (5): 1–41. https://doi.org/10.18637/jss.v085.i05.
Hess, S., and D. Palma. 2019. “Apollo: A Flexible, Powerful and Customisable Freeware Package for Choice Model Estimation and Application.” Journal of Choice Modelling 32: 100170. https://doi.org/10.1016/j.jocm.2019.100170.
Hess, S., and D. Palma. 2019. “Apollo: A Flexible, Powerful and Customisable Freeware Package for Choice Model Estimation and Application.” Journal of Choice Modelling 32: 100170. https://doi.org/10.1016/j.jocm.2019.100170.
2021. Apollo: A Flexible, Powerful and Customisable Freeware Package for Choice Model Estimation and Application. Choice Modelling Centre. http://www.ApolloChoiceModelling.com.Lancsar, E., J. Louviere, C. Donaldson, G. Currie, and L. Burgess. 2013. “Best Worst Discrete Choice Experiments in Health: Methods and an Application.” Social Science & Medicine 79: 74–82. https://doi.org/10.1016/j.socscimed.2012.10.007.
Louviere, J. J., T. N. Flynn, and A. A. J. Marley. 2015. Best–Worst Scaling: Theory, Methods and Applications. Cambridge, UK: Cambridge University Press.
Marley, A. A. J., and D. Pihlens. 2012. “Models of Best-Worst Choice and Ranking Among Multiattribute Options (Profiles).” Journal of Mathematical Psychology 56: 24–34. https://doi.org/10.1016/j.jmp.2011.09.001.
R Core Team. 2021a. An Introduction to R. https://CRAN.R-project.org/manuals.html.
R Core Team. 2021b. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sailer, M. O. 2013. Crossdes: Construction of Crossover Designs. https://CRAN.R-project.org/package=crossdes.
Sarrias, M., and R. Daziano. 2017. “Multinomial Logit Models with Continuous and Discrete Individual Heterogeneity in R: The Gmnl Package.” Journal of Statistical Software, Articles 79 (2): 1–46. https://doi.org/10.18637/jss.v079.i02.
Therneau, T. M. 2020. Survival: Survival Analysis. https://CRAN.R-project.org/package=survival.