Chapter 2 Data sources

The dataset for our project is the Speed Dating dataset provided by Anna Montoya. It is publicly available on [Data.com] as an Excel file.

The data contains responses to questionnaires filled out before, during, and after a speed dating experiment conducted in 2002-2004 by Professors Ray Fisman and Sheena Iyengar of Columbia Business School. The primary purpose of the experiment is discussed in their paper Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment.

2.1 Experimental Design

The experimental design was based on the format of a speed dating event, in which male and female participants have a one-on-one four-minute conversation and then decide whether they want to meet each other again. Participants receive each other’s contact information only if both parties “accept”. Each participant meets a large number of participants of the opposite sex during the experiment.
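The mutual-acceptance rule can be sketched in a few lines of Python. This is only an illustration: the function name `is_match` and the subject-side argument `dec` are assumptions made here (the metadata in Section 2.2 lists the partner's decision as `dec_o` and the outcome as `match`), and decisions are assumed to be coded 1 for yes and 0 for no.

```python
def is_match(dec: int, dec_o: int) -> int:
    """Return 1 only when both the subject (dec) and the partner (dec_o) said yes."""
    return int(dec == 1 and dec_o == 1)


# Walk through all four combinations of decisions:
# only mutual acceptance produces a match.
for dec in (0, 1):
    for dec_o in (0, 1):
        print(f"dec={dec} dec_o={dec_o} match={is_match(dec, dec_o)}")
```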

Subjects in this experiment were students in graduate and professional schools at Columbia University at the time of the experiment. They learned about the event through mass e-mail and fliers distributed on campus. They had to sign up for the event by providing their names and email addresses and completing a pre-event survey on a website, which asked for the following information:

  • Gender, Location, Income, Age, etc.
  • Their interest in each of 17 listed hobbies
  • Their self-evaluation

There were 20 sessions of the experiment, all of which had the same setting. All participants were randomly assigned to one of the sessions. On the session day, participants, who were unaware of the total number of participants on that day, had to fill in a scorecard that contained:

  • Spaces to write the ID number of each person they met
  • A Yes/No decision on whether the subject wants to meet the person again
  • Six attributes on which a subject was to rate the person they met. The attributes are:
    • Attractive
    • Sincere
    • Intelligent
    • Fun
    • Ambitious
    • Shared Interests

There were roughly the same number of female and male participants in each session. Subjects only had conversations with those of the opposite gender. Each female subject met all male subjects in her session. The scorecard was to be filled in after each four-minute conversation.

The day after the Speed Dating event, participants were asked to complete a follow-up online questionnaire in order to obtain their matches. For more details on the experimental procedure, see the study.

2.2 Data Description

The raw dataset from the source has 195 variables and 8378 rows. The number of rows does not represent the total number of participants in the experiment: the data provider reshaped the dataset so that each row pairs a subject with one of their partners, so a subject appears once for each partner they met. We have further cleaned the dataset so that it includes only the important variables (see the data cleaning section for details). The variables used in the analysis are summarized in the metadata below.
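Before turning to the metadata, the difference between rows and participants can be illustrated with a minimal Python sketch. The rows below are invented toy values, not records from the dataset; only the column names `iid`, `pid`, and `match` come from the metadata. The sketch assumes one row per subject-partner meeting, seen from the subject's side.

```python
# Toy data: two subjects (iid 1, 2) each meet two partners (iid 11, 12),
# and every meeting is recorded once from each side.
rows = [
    {"iid": 1, "pid": 11, "match": 0},
    {"iid": 1, "pid": 12, "match": 1},
    {"iid": 2, "pid": 11, "match": 0},
    {"iid": 2, "pid": 12, "match": 0},
    {"iid": 11, "pid": 1, "match": 0},
    {"iid": 11, "pid": 2, "match": 0},
    {"iid": 12, "pid": 1, "match": 1},
    {"iid": 12, "pid": 2, "match": 0},
]

n_rows = len(rows)                              # one row per meeting per side
n_subjects = len({row["iid"] for row in rows})  # distinct subjects

print(f"{n_rows} rows, {n_subjects} subjects")  # 8 rows, 4 subjects
```

Applied to the real data, the same idea (counting distinct `iid` values rather than rows) gives the number of subjects.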

Variable    Description
iid         Unique subject number
wave        Session number
gender      Female/male
race        Race of a subject
              1 = Black/African American
              2 = European/Caucasian-American
              3 = Latino/Hispanic American
              4 = Asian/Pacific Islander/Asian-American
              5 = Native American
              6 = Other
from        Where are you from originally (before coming to Columbia)?
field       Field of study
field_cd    Field code
              1 = Law
              2 = Math
              3 = Social Science, Psychologist
              4 = Medical Science, Pharmaceuticals, and Bio Tech
              5 = Engineering
              6 = English/Creative Writing/Journalism
              7 = History/Religion/Philosophy
              8 = Business/Econ/Finance
              9 = Education, Academia
              10 = Biological Sciences/Chemistry/Physics
              11 = Social Work
              12 = Undergrad/undecided
              13 = Political Science/International Affairs
              14 = Film
              15 = Fine Arts/Arts Administration
              16 = Languages
              17 = Architecture
              18 = Other
[activity]  How interested are you in [activity] on a scale of 1-10?
            [activity] is one of sports, tvsports, exercise, dining, museums, art, hiking, gaming, clubbing, reading, tv, theater, movies, concerts, music, shopping, and yoga
pid         Partner’s iid number
match       Whether iid and pid are matched. 1=yes, 0=no
dec_o       Decision of partner the night of event
samerace    Whether iid and pid have the same race
race_o      Race of partner
attr        How attractive is a person you met, on a scale of 1-10?
sinc        How sincere is a person you met, on a scale of 1-10?
intel       How intelligent is a person you met, on a scale of 1-10?
fun         How fun is a person you met, on a scale of 1-10?
amb         How ambitious is a person you met, on a scale of 1-10?
shar        How much do you and a person you met share the same interests, on a scale of 1-10?
like        Overall, how much do you like this person? (1=don’t like at all, 10=like a lot)
prob        How probable do you think it is that this person will say ‘yes’ for you? (1=not probable, 10=extremely probable)
goal        A subject’s primary goal in participating in the event
              1 = Seemed like a fun night out
              2 = To meet new people
              3 = To get a date
              4 = Looking for a serious relationship
              5 = To say I did it
              6 = Other
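
Coded variables such as race and goal are easier to read once the numeric codes are replaced by their labels. The following Python sketch shows one way to do this with plain dictionaries. The helper `decode` and the sample `record` are hypothetical and for illustration only; the code-to-label pairs are copied from the table above.

```python
# Code-to-label lookups copied from the metadata table.
RACE_LABELS = {
    1: "Black/African American",
    2: "European/Caucasian-American",
    3: "Latino/Hispanic American",
    4: "Asian/Pacific Islander/Asian-American",
    5: "Native American",
    6: "Other",
}
GOAL_LABELS = {
    1: "Seemed like a fun night out",
    2: "To meet new people",
    3: "To get a date",
    4: "Looking for a serious relationship",
    5: "To say I did it",
    6: "Other",
}


def decode(code, labels):
    """Return the label for a numeric code, or None if the code is missing or unknown."""
    return labels.get(code)


# Hypothetical record with invented values, not taken from the dataset.
record = {"iid": 1, "race": 4, "goal": 2}
decoded = {
    **record,
    "race": decode(record["race"], RACE_LABELS),
    "goal": decode(record["goal"], GOAL_LABELS),
}
print(decoded)
```

Returning None for unknown codes keeps missing answers visible instead of silently mapping them to "Other".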

Participants were also asked to rate five (and six in some cases) attributes for themselves and for the people they met (i) before the event (if applicable), (ii) one day after the event, and (iii) two weeks after the event. The variable names take the form [attr][#1]_[#2], with the components defined in the table below.

Code        Description
[attr]      attr = attractive
            sinc = sincere
            fun = fun
            intel = intelligent
            amb = ambitious
            shar = share the same interest
[#1]        1 = Rate yourself
            3 = Rate a person you meet
[#2]        1 = questions asked before the event (if applicable)
            2 = questions asked one day after the event
            3 = questions asked two weeks after the event
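
The naming scheme above can be unpacked programmatically. The Python sketch below splits a variable name into its attribute, the first number, and the second number, using only the codes listed in the table. The function `parse_rating_name` and the sample names are hypothetical. Names that do not follow the pattern, including any digits the raw dataset may use beyond those listed here, return None.

```python
import re

# Lookups copied from the code table above.
ATTRIBUTES = {
    "attr": "attractive",
    "sinc": "sincere",
    "fun": "fun",
    "intel": "intelligent",
    "amb": "ambitious",
    "shar": "share the same interest",
}
TARGETS = {"1": "rate yourself", "3": "rate a person you meet"}
TIMINGS = {
    "1": "before the event (if applicable)",
    "2": "one day after the event",
    "3": "two weeks after the event",
}

# [attr][#1]_[#2], restricted to the codes listed in the table.
NAME_PATTERN = re.compile(r"^(attr|sinc|fun|intel|amb|shar)([13])_([123])$")


def parse_rating_name(name):
    """Split a name such as 'sinc1_2' into (attribute, target, timing) labels."""
    found = NAME_PATTERN.match(name)
    if found is None:
        return None  # not a rating variable in this scheme
    attribute, target, timing = found.groups()
    return ATTRIBUTES[attribute], TARGETS[target], TIMINGS[timing]


print(parse_rating_name("sinc1_2"))
print(parse_rating_name("match"))
```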

For more information about the raw data and questionnaire, see the metadata from the dataset provider.