The project is a group effort that explores one aspect of computation for information science in depth. Its main goal is to complete a mini computational study using one or more of the techniques discussed in class (e.g., probability, statistics, graph theory). A variety of projects are possible, and each group should submit a FINAL REPORT documenting its project. The FINAL REPORT should contain the following information:
(a) Introduction: background on the project;
(b) Goals/problem statement: the purpose of the study;
(c) Results: the methods used and the results obtained; and
The project grade is weighted as follows:
1) Presentation: a coherent, well-written paper with appropriate figures and references (25%)
2) Depth of analysis: use of techniques discussed in class, and more advanced ones (70%)
3) Project selection and progress report (5%)
Some project suggestions are given below; feel free to propose your own project. (WPRDC, the Western Pennsylvania Regional Data Center, is a good source of local data.)
1. Benchmarking Computational Tools (e.g., Random Number Generators)
Mathematics-based computational software (MATLAB, R, Excel, etc.) includes a number of built-in computational tools. However, the implementations and algorithms behind these tools vary from one package to another; for example, the random number generators used in different software can be quite different. Perform a comparative evaluation of at least three random number generators from different software tools, evaluating them from statistical, computational, and subjective standpoints.
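One possible starting point for the statistical and computational comparison is sketched below. Python cannot call the MATLAB, R, or Excel generators directly, so as a stand-in it compares three generators reachable from Python: CPython's Mersenne Twister (`random.Random`), the OS entropy source (`random.SystemRandom`), and a hand-written Park-Miller "minimal standard" LCG chosen purely for illustration. For the actual project, samples exported from each tool could be fed to the same functions. The chi-square uniformity test, lag-1 autocorrelation, and timing are only a minimal battery; a full study would add further tests.

```python
import random
import time

def minstd_lcg(seed=1):
    """Park-Miller 'minimal standard' LCG: x_{n+1} = 16807 * x_n mod (2^31 - 1)."""
    m, a = 2**31 - 1, 16807
    x = seed % m or 1  # state must be nonzero
    while True:
        x = (a * x) % m
        yield x / m  # uniform value in (0, 1)

def chi_square_uniform(samples, bins=10):
    """Pearson chi-square statistic for uniformity of values in [0, 1)."""
    counts = [0] * bins
    for u in samples:
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

def lag1_autocorrelation(samples):
    """Sample correlation between consecutive values; near 0 for a good RNG."""
    n = len(samples)
    mean = sum(samples) / n
    num = sum((samples[k] - mean) * (samples[k + 1] - mean) for k in range(n - 1))
    den = sum((u - mean) ** 2 for u in samples)
    return num / den

def benchmark(name, draw, n=100_000):
    """Time n draws, then report the chi-square statistic and lag-1 autocorrelation."""
    start = time.perf_counter()
    samples = [draw() for _ in range(n)]
    elapsed = time.perf_counter() - start
    return {"name": name, "seconds": elapsed,
            "chi2": chi_square_uniform(samples),
            "lag1": lag1_autocorrelation(samples)}

if __name__ == "__main__":
    mt = random.Random(42)          # Mersenne Twister (CPython's default generator)
    sysrng = random.SystemRandom()  # OS entropy source
    lcg = minstd_lcg(42)
    for result in (benchmark("MT19937", mt.random),
                   benchmark("SystemRandom", sysrng.random),
                   benchmark("MINSTD LCG", lambda: next(lcg))):
        print(result)
```

With 10 bins the statistic has 9 degrees of freedom, so values well above about 16.9 (the 5% critical value) would suggest non-uniformity.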
2. Sports Analytics
The annual MIT Sloan Sports Analytics Conference regularly features sports-focused research papers that use techniques discussed in class (basic statistics, regression, graph theory, probabilistic analysis, etc.); similar papers appear at other conferences and in journals (e.g., ACM ASONAM, PLOS). Repeat the analysis in one of these papers, or repeat it with a different or updated dataset.
For example, the paper "Graphical Model of Basketball" could be updated with more recent data or a different team, or contrasted with an analysis of a WNBA team.
Many other options exist, such as Markov models for basketball or tennis, or statistical analysis of sports datasets (sources of data include Bigdataball, Kaggle, Basketball Reference, and NBA Data in R).
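As an illustration of the Markov-model option, the sketch below computes the probability that a tennis server wins a game. It assumes, as the simplest such models do, that every point is won independently with the same probability `p`; the score states form a Markov chain, and deuce is solved in closed form from w = p² + 2pq·w. The function names are hypothetical. A closed-form expression for the same chain is included as a cross-check.

```python
from functools import lru_cache

def game_win_probability(p):
    """Probability the server wins a game when each point is won
    independently with probability p (Markov chain over point scores)."""
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def win_from(a, b):
        # a, b = points won so far by server and receiver
        if a >= 4 and a - b >= 2:
            return 1.0
        if b >= 4 and b - a >= 2:
            return 0.0
        if a == b and a >= 3:
            # Deuce: solving w = p^2 + 2pq*w gives w = p^2 / (p^2 + q^2)
            return p * p / (p * p + q * q)
        return p * win_from(a + 1, b) + q * win_from(a, b + 1)

    return win_from(0, 0)

def game_win_closed_form(p):
    """Closed form of the same chain: wins at 4-0, 4-1, 4-2, or via deuce."""
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * deuce

if __name__ == "__main__":
    for p in (0.5, 0.55, 0.6, 0.65):
        print(p, round(game_win_probability(p), 4))
```

The same recursion extends to sets and matches by treating games (and then sets) as the "points" of the next level up; point-win probabilities could be estimated from match statistics in the datasets listed above.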
3. Broadband Internet Access Analysis
Ookla maintains an open database of consumer Internet connection speed tests. Repeat the GitHub R example's county-by-county analysis of broadband in Kentucky, but for a different state (e.g., Pennsylvania or West Virginia).
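The R example itself is not reproduced here. The Python sketch below shows only the core aggregation step of a county-level summary: a test-weighted mean speed per county. It assumes tile records with Ookla open-data style fields `avg_d_kbps`, `avg_u_kbps`, and `tests`, and it assumes each tile has already been spatially joined to a county (the `county` key is hypothetical; the spatial join itself is not shown).

```python
from collections import defaultdict

def county_speed_summary(tiles):
    """Test-weighted mean download/upload speed (Mbps) per county.

    Each tile is a dict with 'county', 'avg_d_kbps', 'avg_u_kbps', and 'tests'.
    The 'county' key is assumed to come from a prior spatial join of tiles
    to county boundaries, which is not shown here.
    """
    totals = defaultdict(lambda: {"d": 0.0, "u": 0.0, "tests": 0})
    for t in tiles:
        acc = totals[t["county"]]
        acc["d"] += t["avg_d_kbps"] * t["tests"]  # weight each tile by its test count
        acc["u"] += t["avg_u_kbps"] * t["tests"]
        acc["tests"] += t["tests"]
    return {
        county: {
            "mean_d_mbps": acc["d"] / acc["tests"] / 1000,  # kbps -> Mbps
            "mean_u_mbps": acc["u"] / acc["tests"] / 1000,
            "tests": acc["tests"],
        }
        for county, acc in totals.items()
        if acc["tests"] > 0
    }
```

Weighting by test count keeps a tile with a single test from counting as much as a heavily tested one; weighting by device count would be an alternative worth comparing.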
4. Analysis of the Impact of COVID on Housing Markets
Redfin and Zillow provide housing market datasets. Examine the housing market data for the Pittsburgh metro area and see whether you can build a model correlating sales and prices with COVID outbreaks (using zip-code-level data from the county health department) and/or with other per-zip-code factors (e.g., poverty level, walkability index, unemployment level). Alternatively, identify nearby "Zoom towns" where younger incoming residents are buying houses.
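A first pass at "correlating sales and prices with COVID outbreaks" could compute a Pearson correlation and a simple least-squares line between two per-zip-code series, for example a case rate and a median sale price (both hypothetical inputs, aligned by zip code). This is a minimal sketch only; controlling for the other factors listed above would require multiple regression (for example, statsmodels' OLS), and a correlation alone does not establish causation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ols(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx
```

Usage would look like `pearson_r(case_rate, median_price)` and `simple_ols(case_rate, median_price)`, where both lists are hypothetical per-zip-code values in the same zip-code order.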
5. Cybersecurity Modeling
As with the sports analytics above, graph theory, differential equation models, Markov models, and statistics have been used to model cybersecurity. Reproduce the results of a research paper in the cybersecurity domain using one of the techniques discussed in class (e.g., "Mathematical modeling of the propagation of malware: A review" by A. Martín del Rey et al.; "Markov Models of Cyber Kill Chains with Iterations" by R. Hoffmann; "Attack Countermeasure Trees (ACT): towards unifying the constructs of attack and defense trees" by A. Roy et al.).
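For the differential-equation option, compartmental epidemic models such as SIR are a common starting point for malware propagation. The sketch below integrates a basic SIR model with the forward Euler method; the parameter names and values are illustrative assumptions, not taken from any of the papers above. S counts susceptible hosts, I infected hosts, and R hosts removed from the process (e.g., patched or cleaned).

```python
def simulate_sir(n_hosts, initial_infected, beta, gamma, days, dt=0.01):
    """Forward-Euler integration of the SIR model:
        dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    beta = infection rate, gamma = removal (patch/cleanup) rate, N = n_hosts.
    Returns a list of (day, S, I, R) sampled once per day."""
    s, i, r = float(n_hosts - initial_infected), float(initial_infected), 0.0
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n_hosts * dt
            removals = gamma * i * dt
            s -= new_infections
            i += new_infections - removals
            r += removals
        history.append((day, s, i, r))
    return history

if __name__ == "__main__":
    for day, s, i, r in simulate_sir(10_000, 10, beta=0.5, gamma=0.1, days=60)[::10]:
        print(f"day {day:3d}: S={s:8.1f} I={i:8.1f} R={r:8.1f}")
```

Euler steps are used here for brevity; a higher-order solver (e.g., Runge-Kutta) would be preferable when reproducing published curves precisely.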
6. Network Reliability and Weather Models
Analyze data from Pittmesh, the community Wi-Fi network operated by Metamesh, to determine time periods when performance is poor, and try to correlate them with National Weather Service data. The same idea applies to Pitt ISS satellite data and space weather data from NASA.
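A simple first check for a weather link is to compare how often performance is poor with and without precipitation. The sketch below assumes the network and weather data have already been aligned into per-period pairs of (throughput in Mbps, precipitation in mm); the record layout and the fixed `threshold_mbps` that defines "poor" are both hypothetical choices.

```python
def poor_performance_rates(records, threshold_mbps):
    """Compare how often throughput is poor with and without precipitation.

    records: iterable of (throughput_mbps, precipitation_mm) pairs, one per
    time bucket (e.g., per hour), already aligned by timestamp.
    Returns (P(poor | precipitation), P(poor | no precipitation))."""
    counts = {True: [0, 0], False: [0, 0]}  # wet? -> [poor count, total count]
    for throughput, precip in records:
        wet = precip > 0
        counts[wet][1] += 1
        if throughput < threshold_mbps:
            counts[wet][0] += 1

    def rate(wet):
        poor, total = counts[wet]
        return poor / total if total else float("nan")

    return rate(True), rate(False)
```

Whether a difference between the two rates is meaningful could then be tested on the underlying 2×2 table of counts, for example with a chi-square test of independence or Fisher's exact test; the same structure works for satellite link quality against a space weather index.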
7. Environmental Stress Analysis
Analyze EPA and/or SmellPGH data for the Pittsburgh metro area and determine the most environmentally stressed areas. Do these areas correlate with population or demographic data (poverty rate, unemployment rate, education level, etc.)?
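Environmental and demographic measures are often skewed and related in non-linear ways, so a rank correlation may be a more robust first test than Pearson correlation. The sketch below computes Spearman's rank correlation (with average ranks for ties) between two per-area series, for instance a hypothetical pollution or odor-report indicator and a poverty rate aligned by area; scipy.stats.spearmanr provides the same statistic along with a p-value.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Because it depends only on ordering, rho equals 1 for any strictly increasing relationship, linear or not.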
8. Network Topology Analysis
Consider some real communication network topologies (http://www.topology-zoo.org/, CAIDA Internet Topology Datakit, SNDlib) and analyze them using graph theory metrics such as node degree, centrality measures, betweenness centrality, etc.
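Two of the metrics mentioned above can be computed with the standard library alone, as sketched below: node degree and unnormalized betweenness centrality via Brandes' algorithm for unweighted, undirected graphs. Loading a Topology Zoo or SNDlib file into the `(u, v)` edge list is assumed and not shown. For larger analyses, networkx's `betweenness_centrality(G, normalized=False)` should give the same values for an undirected graph.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from an edge list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degrees(adj):
    """Number of neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS: shortest-path counts and predecessor lists
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted from both ends
```

In a star topology, for example, the hub lies on the shortest path between every pair of leaves, so with four leaves its betweenness is C(4, 2) = 6 while each leaf scores 0.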