Chapter 7 Conclusion

7.1 Limitations

The dataset we use has several limitations. First, part of it comes from a questionnaire about subjective matters. For instance, two people may not give the same level of attractiveness the same score. This potentially introduces bias into the dataset and could undermine the results. Second, the dataset, although very interesting, is almost 20 years old, and relationships among variables may have changed over the years. Third, there is an obvious selection bias in the data collection process: only students are included in the experiment, and outgoing people are more likely to participate. Moreover, the setting pairs only female and male students together, which excludes a large group of people who do not identify with a binary gender. As a result, what we find in the analyses may not be applicable today.

7.2 Future Directions

A first next step would be to add more interactive components to the analyses to highlight the findings and engage readers.

Some of the current findings are not conclusive enough, which suggests that there may be more factors we should explore. Based on our results, appearance-related features, such as participants' posture and the way they talk and walk, are good candidates, although they are not easily quantified. If possible, we would also like to conduct statistical tests to verify the conclusions we reached.

As mentioned in the Limitations section, things change over time, and we see this as a good direction in which to extend our project. If we could gather a similar dataset from the present day, we would like to run the same study on it and compare how the trends have changed.

7.3 Lessons Learned

We have learned a lot from working on this project. We are aware that the interactive part is very important for engaging readers. However, creating an interactive graph in D3 that appropriately conveys our intent took much longer than we expected. Some of the graphs were dropped from the final version because we found that static versions better suited our purpose.

GitHub can be a very useful resource for working collaboratively, but it can also cause a number of issues that take some time to fix. We had to overwrite the entire remote master branch to get past some of these problems.

When we moved from EDA to presentation, we found that choosing appropriate forms of graphs is critical. We need to pick presentations of the results that are easy for readers to understand. Nonetheless, we sometimes found that the presentations that highlight the results the most are not the easiest to understand. It is hard to judge where the right balance lies between easy-to-read and impactful graphs.