This project may be done in groups of 2-3. We will hold groups of 3 to higher standards, but either way we expect a substantial project to which you devote significant effort. "Significant effort" is difficult to quantify, but the grading rubric clarifies what we expect. We are not looking for a perfect project; instead, we are interested in seeing how you identify the challenges in your data and devise strategies to handle them. You may include failed attempts in the report, along with what you learned from them and how you improved your work to arrive at your final submission.
Use techniques and tools such as (but not limited to) those covered in class to manipulate, analyze, and visualize the data, and possibly to build regression (or K-NN) models, in order to achieve your objectives. Some of you will end up developing a data processing pipeline, in which each step transforms or otherwise manipulates some or all of your data to get it into the best form for answering your questions or otherwise achieving your objectives. Some of you will end up building models that may help with decision making and prediction.
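As an illustration of the K-NN regression idea mentioned above, here is a minimal pure-Python sketch (no particular library or course tool assumed): the prediction for a new point is the average target value of its k nearest training rows under Euclidean distance. The function name `knn_regress` and the toy data are hypothetical.

```python
import math
from typing import Sequence


def knn_regress(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int = 3) -> float:
    """Hypothetical helper: predict a numeric target as the mean of the k nearest targets."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Euclidean distance from the query point to every training row
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Sort by distance and keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Regression output: plain (unweighted) average of the neighbours' targets
    return sum(y for _, y in nearest) / k


# Toy data: two numeric features per row and one numeric target
X = [[1, 1], [2, 2], [3, 3], [10, 10]]
y = [10, 20, 30, 100]
print(knn_regress(X, y, [2, 2], k=3))  # mean of 20, 10, 30
```

In practice you would also scale the features first, since raw Euclidean distance lets large-valued columns dominate, and choose k by validation rather than fixing it.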
In many cases, the early steps in a pipeline are more about preparing the data (correcting mistakes, filling in missing values, creating consistent representations, mapping corresponding values), while the later steps focus on summarization and analysis. If you use one of the recommended datasets, your preparation steps may be minimal.
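The preparation steps described above can be sketched as a small pipeline of functions, each taking rows and returning new rows. This is a minimal pure-Python sketch under assumed data: the column names `dept` and `salary`, the alias table, and the choice of median imputation are all hypothetical examples, not requirements of the project.

```python
from statistics import median
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical alias table: maps the many spellings found in raw data
# to one consistent department label.
DEPT_ALIASES = {
    "hr": "Human Resources",
    "human resources": "Human Resources",
    "it": "IT",
    "info tech": "IT",
}


def normalize_text(rows: List[Row]) -> List[Row]:
    """Correct simple entry mistakes: strip stray whitespace from string fields."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()} for r in rows]


def map_departments(rows: List[Row]) -> List[Row]:
    """Map department aliases to a single consistent representation."""
    return [{**r, "dept": DEPT_ALIASES.get(r["dept"].lower(), r["dept"])} for r in rows]


def fill_missing_salary(rows: List[Row]) -> List[Row]:
    """Fill missing salaries with the median of the known values."""
    known = [r["salary"] for r in rows if r["salary"] is not None]
    fill = median(known) if known else None
    return [{**r, "salary": fill if r["salary"] is None else r["salary"]} for r in rows]


def run_pipeline(rows: List[Row], steps: List[Callable[[List[Row]], List[Row]]]) -> List[Row]:
    """Apply each step in order; every step returns new rows instead of mutating."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [
    {"dept": " hr ", "salary": 50000},
    {"dept": "Info Tech", "salary": None},
    {"dept": "IT", "salary": 70000},
]
clean = run_pipeline(raw, [normalize_text, map_departments, fill_missing_salary])
print(clean)
```

Keeping each step as a separate function makes it easy to document in your report what each stage changed, and to reorder or drop a step after a failed attempt.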
You can focus on a few interesting business questions about your dataset. Then explore your data and build a visualization dashboard or perform some modeling in Power BI or Tableau. You may use Excel for some of the analysis, but for no more than 25% of your work. The number of questions is entirely up to you and depends on the dataset you choose; some questions are more challenging to answer than others. We recommend 2-5 questions. Below is a list of sample business questions, provided only to give you a rough idea. You can think of your own unique business questions depending on the dataset you're working on.
Human Resources
• What do our employees value most at work? (You could even segment this into top 10% of performers; by department; by role, and so on.)
• What do our longest-tenured/best-performing employees have in common?
• What do employees who leave within 1-2 years have in common?
• How effective are our onboarding programs? Training programs?
• How can we spot a valuable employee at risk of leaving?
• What percentage of employees are disengaged from the organization and from their work?
• What combination of compensation and bonuses will most motivate performance?
Outcomes: Attract and retain the best talent, maximize productivity and employee satisfaction, pinpoint recruitment efforts, reduce churn expenses.
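Several of the HR questions above reduce to simple grouped percentages. As one example, "What do employees who leave within 1-2 years have in common?" could start by comparing early-leaver rates across departments. This is a minimal pure-Python sketch assuming hypothetical record fields `dept`, `tenure_years`, and `left`, and a hypothetical 2-year cutoff; a dashboard in Power BI or Tableau would express the same grouping visually.

```python
from collections import defaultdict
from typing import Any, Dict, List


def early_leaver_rate_by_dept(employees: List[Dict[str, Any]],
                              max_tenure_years: float = 2.0) -> Dict[str, float]:
    """Hypothetical helper: share of each department's employees who left within the cutoff."""
    totals: Dict[str, int] = defaultdict(int)
    early: Dict[str, int] = defaultdict(int)
    for e in employees:
        totals[e["dept"]] += 1
        # Count only people who actually left, and did so within the cutoff
        if e["left"] and e["tenure_years"] <= max_tenure_years:
            early[e["dept"]] += 1
    return {dept: early[dept] / n for dept, n in totals.items()}


staff = [
    {"dept": "Sales", "tenure_years": 1.5, "left": True},
    {"dept": "Sales", "tenure_years": 4.0, "left": False},
    {"dept": "IT", "tenure_years": 0.8, "left": True},
    {"dept": "IT", "tenure_years": 3.0, "left": True},
    {"dept": "IT", "tenure_years": 6.0, "left": False},
    {"dept": "IT", "tenure_years": 2.0, "left": False},
]
print(early_leaver_rate_by_dept(staff))
```

A rate like this only identifies where early attrition concentrates; explaining what those employees have in common would require comparing their other attributes against those who stayed.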
Supply Chain
• Where are the biggest holdups in paperwork and procurement?
• What are the most significant causes of delay?
• Which inspection errors occur most frequently?
• How resilient is the chain to external forces? How can we prepare?
• What hidden inefficiencies can we find and correct?
• Can any steps be eliminated?
• What are the biggest opportunities for additional supply