Description:
Create cloud simulators for evaluating executions of applications in cloud datacenters with different characteristics and deployment models.
You will describe your design of the map/reduce implementation of the simulation.
You will create a simulation that shows how a broadcast storm is created in the cloud.
In this homework, you will experiment with creating cloud computing datacenters and running jobs on them. Of course, creating real cloud computing datacenters takes hundreds of millions of dollars, acres of land, and a lot of complicated equipment, and you don't want to spend your money and resources creating physical cloud datacenters for this homework. Instead, we have a cloud simulator, a software package that models cloud environments and operates the different cloud models that we study in the lectures. We will use Cloud2Sim, a simulation framework that is available from SourceForge. It is an extension of CloudSim, a framework and a set of libraries for modeling and simulating cloud computing infrastructure and services. It is a publicly available project on GitHub.
The CloudSim website contains a wealth of information, and it is your starting point. It is recommended that you learn more about CloudSim: you will find an online course on CloudSim and a new resource on CloudSim. Your first step is to download and configure CloudSim and to run the examples provided in the GitHub repo. You will find those examples under the Documentation section of the main CloudSim website. You will notice that the tutorials are a bit dated and refer to older versions of Eclipse. You should be able to adjust them accordingly for IntelliJ. Those who want to read more about modeling physical systems and creating simulations can find ample resources on the Internet. I recommend the paper by Anu Maria, "Introduction to Modeling and Simulation."
This homework script is written using a retroscripting technique, in which the homework outline is generally and loosely drawn, and individual students improvise to create an implementation that fits their refined objectives. In doing so, students are expected to stay within the basic requirements of the homework, and they are free to experiment. Asking questions is important, so please ask away on Piazza!
Once you have installed and configured Cloud2Sim (and/or CloudSim), your job is to run the examples supplied with the frameworks and perform two or more simulations in which you evaluate two or more datacenters with different characteristics (e.g., operating systems, costs, devices) and policies. Imagine that you are a cloud computing broker: you purchase computing time in bulk from different cloud providers and sell this time to your customers, so that they can execute their jobs, i.e., cloudlets, on the infrastructure of these cloud providers, which have different policies and constraints. As a broker, your job is to buy the computing time cheaply and sell it at a good markup. One way to achieve this is to take cloudlets from your customers and estimate how long they will execute. Then you charge a fixed fee for executing each cloudlet, which represents your overall cost of resources. Some cloudlets may execute longer than you expected, and others may finish faster. If your revenue exceeds your expenses for buying the cloud computing time in bulk, you are in business; otherwise, you will go bankrupt!
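The broker's arithmetic described above can be sketched as follows. This is a minimal illustration, not part of CloudSim or Cloud2Sim: the class BrokerLedger, its method names, and all the numbers are hypothetical, and it assumes that a cloudlet's length is given in million instructions (MI) and a VM's speed in MIPS.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical bookkeeping for the broker; none of these names come from CloudSim or Cloud2Sim.
final class BrokerLedger {

    /** Estimated run time in seconds, assuming length in MI and VM speed in MIPS. */
    static double estimateSeconds(long lengthMi, double vmMips) {
        return lengthMi / vmMips;
    }

    /**
     * Profit = (fixed fee per cloudlet * number of cloudlets)
     *        - (sum of actual seconds used * bulk price per second).
     */
    static double profit(List<Double> actualSeconds, double feePerCloudlet, double costPerSecond) {
        double revenue = feePerCloudlet * actualSeconds.size();
        double expenses = 0.0;
        for (double seconds : actualSeconds) {
            expenses += seconds * costPerSecond;
        }
        return revenue - expenses;
    }

    public static void main(String[] args) {
        // A 40,000 MI cloudlet on a 1,000 MIPS VM is estimated at 40 seconds.
        double estimate = estimateSeconds(40_000, 1_000);
        System.out.println("estimated seconds: " + estimate);

        // Fee of 25 per cloudlet, bulk price of 0.5 per second (made-up numbers).
        System.out.println(profit(Arrays.asList(30.0, 40.0, 60.0), 25.0, 0.5));  // prints 10.0
        // One cloudlet runs much longer than estimated and the broker loses money.
        System.out.println(profit(Arrays.asList(30.0, 40.0, 100.0), 25.0, 0.5)); // prints -10.0
    }
}
```

In a real simulation, the actual execution times would come from the simulator's output rather than from a hard-coded list.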
Datacenters can use different policies for allocating virtual machines (VMs) to hosts, scheduling them for execution on those hosts, determining how network bandwidth is provisioned, and scheduling cloudlets to execute on different VMs. Randomly assigning cloudlets to different datacenters may result in a situation where the cloudlets execute inefficiently and take a long time. As a result, you exhaust your supply of purchased cloud time, you may have to refund your customers because you cannot fulfil the agreement, and you will go bankrupt. Modeling and simulating the execution of cloudlets in your clouds may help you choose a proper model for your business.
Once you have installed and configured Cloud2Sim and run its examples, your next job is to create simulations in which you evaluate a large cloud provider with many datacenters that have different characteristics (e.g., operating systems, costs, devices) and policies. You will form a stream of jobs dynamically and feed them into your simulation. You will design your own datacenter with your own network switches and network links. You can organize cloudlets into tasks that accomplish the same job (e.g., a map/reduce job in which some cloudlets represent mappers and the other cloudlets represent reducers). As noted above, datacenters can use different policies for allocating VMs to hosts, scheduling them for execution on those hosts, provisioning network bandwidth, and scheduling cloudlets on different VMs, and randomly assigning cloudlets to different datacenters may result in execution that is inefficient and takes a long time. Using a cleverer algorithm, such as assigning tasks to the specific clusters where their data is located, may lead to more efficient cloud provider services.
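To make the difference between random and data-locality placement concrete, here is a small standalone sketch. It does not use the CloudSim or Cloud2Sim API: the class PlacementDemo, the cost model (a fixed compute time per task plus a fixed transfer penalty whenever a task runs away from its data), and the numbers are all assumptions chosen for illustration.

```java
import java.util.Random;

// Hypothetical toy model; the names and cost numbers are illustrative only.
final class PlacementDemo {

    /**
     * Total machine time when task i runs on cluster placement[i].
     * A task placed on a cluster other than the one holding its data pays transferSec extra.
     */
    static double totalTime(int[] dataCluster, int[] placement, double computeSec, double transferSec) {
        double total = 0.0;
        for (int i = 0; i < dataCluster.length; i++) {
            total += computeSec;
            if (placement[i] != dataCluster[i]) {
                total += transferSec; // data must be moved across clusters
            }
        }
        return total;
    }

    /** Random policy: each task goes to a uniformly chosen cluster (seeded for repeatability). */
    static int[] randomPlacement(int tasks, int clusters, long seed) {
        Random rng = new Random(seed);
        int[] placement = new int[tasks];
        for (int i = 0; i < tasks; i++) {
            placement[i] = rng.nextInt(clusters);
        }
        return placement;
    }

    /** Locality policy: each task runs on the cluster that already stores its data. */
    static int[] localityPlacement(int[] dataCluster) {
        return dataCluster.clone();
    }

    public static void main(String[] args) {
        int[] dataCluster = {0, 1, 2, 0, 1, 2, 0, 1}; // where each task's input lives
        double random = totalTime(dataCluster, randomPlacement(8, 3, 42L), 10.0, 5.0);
        double local = totalTime(dataCluster, localityPlacement(dataCluster), 10.0, 5.0);
        System.out.println("random placement:   " + random);
        System.out.println("locality placement: " + local); // prints 80.0
    }
}
```

This toy model ignores load balancing: if all the data sits on one cluster, a pure locality policy would overload it, which is one of the trade-offs your simulations can explore.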
Consider the code snippet below from one of the Cloud2Sim examples. In it, a network cloud datacenter is created with network hardware that organizes hosts into a connected network. VMs can exchange packets/messages using a chosen network topology. Depending on how you construct your simulation, you may observe different levels of performance.
Your homework can be divided roughly into the following steps. First, you learn how Cloud2Sim is organized and what your building blocks are. I suggest that you load the source code of Cloud2Sim into IntelliJ and explore its classes, interfaces, and dependencies. Second, you design your own cloud provider organization down to the rack/cluster level, as we will study in the lecture on cloud infrastructure. You will add various policies and load balancing heuristics, such as randomly allocating tasks to machines or using data locality to guide task allocation. Third, you will implement the simulation(s) of your cloud provider using Cloud2Sim. Fourth, you will run multiple simulations with different parameters, statistically analyze the results, and report them in your documentation with explanations of why some cloud architectures are more efficient than others in your simulations.
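As one possible starting point for the statistical analysis step, the sketch below computes the mean and the sample standard deviation of a measured quantity over repeated simulation runs. The class RunStats and the sample values are hypothetical; in your project the values would come from your own simulation output.

```java
// Hypothetical helper for summarizing repeated simulation runs.
final class RunStats {

    /** Arithmetic mean of the observations. */
    static double mean(double[] xs) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("need at least one observation");
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Made-up completion times (in seconds) from three runs of the same configuration.
        double[] completionTimes = {10.0, 12.0, 14.0};
        System.out.println(mean(completionTimes));         // prints 12.0
        System.out.println(sampleStdDev(completionTimes)); // prints 2.0
    }
}
```

Reporting the spread alongside the mean helps you judge whether a difference between two cloud architectures is larger than the run-to-run variation.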
Your absolute minimum gradeable baseline project can be based on the examples that come from the Cloud2Sim repo. To be considered for grading, your project should include at least one simulation program of your own written in Java, your project should be buildable using SBT or Gradle, and your documentation must specify how you create and evaluate your simulated clouds based on the cloud models that we learn in class. Your documentation must include the results of your simulation, measurements of the runtime parameters of the simulator (e.g., CPU and RAM utilization), and your explanations of how these results help you with your simulation objectives (e.g., choosing the right cloud model and configuration). Simply copying Java programs from the examples and modifying them a bit (e.g., renaming some variables) will result in your submission being desk-rejected.
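One way to collect the simulator's runtime parameters from inside the JVM is sketched below, using only standard Java classes (System.nanoTime, Runtime, and ThreadMXBean). The class RuntimeProbe and the placeholder workload are hypothetical; the sketch assumes the simulation runs on the calling thread, so work done on other threads is not counted in the CPU time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical probe that wraps a simulation run and records basic JVM measurements.
final class RuntimeProbe {

    /**
     * Runs the given work and returns {wall time in ns, CPU time of this thread in ns
     * (or -1 if the JVM cannot measure it), heap bytes in use after the run}.
     */
    static long[] measure(Runnable work) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        Runtime runtime = Runtime.getRuntime();

        long wallStart = System.nanoTime();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : -1L;

        work.run();

        long cpuEnd = cpuSupported ? threads.getCurrentThreadCpuTime() : -1L;
        long wallNs = System.nanoTime() - wallStart;
        long cpuNs = (cpuStart >= 0 && cpuEnd >= 0) ? cpuEnd - cpuStart : -1L;
        long usedHeap = runtime.totalMemory() - runtime.freeMemory();
        return new long[] {wallNs, cpuNs, usedHeap};
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a call that starts your simulation.
        long[] result = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i;
            }
            if (sum < 0) System.out.println("unreachable");
        });
        System.out.println("wall time (ms): " + result[0] / 1_000_000);
        System.out.println("cpu time (ms):  " + (result[1] < 0 ? -1 : result[1] / 1_000_000));
        System.out.println("used heap (MB): " + result[2] / (1024 * 1024));
    }
}
```

The heap figure is a snapshot taken after the run and is affected by garbage collection, so treat it as an approximation; external tools such as a profiler can give a more detailed picture.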
You can post questions and replies, statements, comments, discussion, etc. on Piazza. For this homework, feel free to share your ideas, mistakes, code fragments, commands from scripts, and some of your technical solutions with the rest of the class. You can also ask and advise others on Piazza about where resources and sample programs can be found on the Internet and how to resolve dependency and configuration issues. When posting questions and answers on Piazza, please select the appropriate folder, i.e., hw1, to ensure that all discussion threads can be easily located. Active participants and problem solvers will receive bonuses from the big brother :-) who is watching your exchanges on Piazza (i.e., your class instructor and your TA). However, you must not describe your simulation or specific details related to how you construct your models!
This is an individual homework. Separate repositories will be created for each of your homeworks and for the course project. You will find a corresponding entry for this homework. You will fork this repository, and your fork will be private: no one besides you, the TA, and your course instructor will have access to your fork. Please remember to grant read access to your repository to your TA and your instructor. In the future, for the team homeworks and the course project, you should grant write access to your forkmates, but not for this homework.
You can commit and push your code as many times as you want. Your code will not be visible, and it should not be visible, to other students (except for your forkmates on a team project, but not for this homework). When you push the code into the remote repo, your instructor and the TAs will see your code in your separate private fork. Making your fork public or inviting other students to join your fork for an individual homework will result in losing your grade. For grading, only the latest push timed before the deadline will be considered. If you push after the deadline, your grade for the homework will be zero.
For more information about using Git, and Bitbucket specifically, please use this link as the starting point. For those of you who struggle with Git, I recommend Ry's Git Tutorial, a book by Ryan Hodson. Another book, Pro Git, is written by Scott Chacon and Ben Straub and published by Apress, and it is freely available. There are multiple videos on YouTube that go into the details of Git's organization and use.
Please follow this naming convention when submitting your work: "Firstname_lastname_hw1" without quotes, where you specify your first and last names exactly as you are registered with the university system, so that we can easily recognize your submission. I repeat, make sure that you give both your TA and the course instructor read/write access to your private forked repository so that we can leave the file feedback.txt with the explanation of the grade assigned to your homework.
As mentioned above, you can post questions and replies, statements, comments, discussion, etc. on Piazza. Remember that you cannot share your code and your solutions privately, but you can ask and advise others on Piazza, Stack Overflow, or other developer networks about where resources and sample programs can be found on the Internet and how to resolve dependency and configuration issues. Your implementation must be your own, and you cannot share it. Likewise, you cannot copy and paste someone else's implementation and put your name on it. Your submissions will be checked for plagiarism. Copying code from your classmates or from sites on the Internet will result in severe academic penalties, up to the termination of your enrollment in the university. When posting questions and answers on Piazza, please select the appropriate folder, i.e., hw1, to ensure that all discussion threads can be easily located.