Solving Auction Data Set essay in Sydney

Question 1

Download the data set ‘auction data.xls’ from the Assignment folder in the resources section of Interact. The data given in the worksheet tab “Data” show the sales results (as compiled by the Australian Property Manager (APM)) for properties listed for auction in Sydney on Saturday 20 May, 2017. The variables in this data set are: Suburb, Address, Bedrooms, Type, Price, Result and Agent. The key for entries in the variable Type and Result are provided in the worksheet tab ‘Key’.

Identify a variable from the data set which is both
qualitative and nominal;
qualitative and ordinal;
quantitative and ratio;
quantitative and ordinal.

If you believe there is no variable which satisfies both labels then state ‘no variable’

Copy and then complete the table below by matching the correct overall outcome to the appropriate results. Use the information given in the worksheet tab ‘key’ in the data set when completing the table.

Overall Outcomes: Sold prior to the auction, sold at the auction, sold after the auction,

Most real data sets you encounter will contain problem data such as typographical errors, transcription errors, coding errors and possible outliers. This data set is no exception. In a real situation, we would make a note of these anomalies and ask for them to be investigated or checked. Since we cannot contact the owner of this data set, for the purpose of this assignment, we will ignore the anomalies and work with the data as best we can. Read the document ‘working with real data sets.pdf’ found in the Assignment folder, which explains some ways of identifying errors in a data set and how to deal with them.

There are four properties where the number of bedrooms is missing from the data set.
Identify these four properties by Suburb and Address.

John and Julie are property owners in the Sydney region and are planning to sell their four bedroom house over the next few months. They are considering putting it up for auction and are concerned the market may be slowing so are interested in using these data to gain an insight into the current Sydney auction market. Using the complete data set, generate a three way pivot table report of ‘type’ by ‘bedrooms’ by ‘result’. Use ‘type’ and ‘bedrooms’ as row labels.

Hint: Include ‘result’ in both the column and in the body of the table. (3 marks)

Use the data in the pivot table to answer the following vendors’ questions in parts e. and f., about the properties listed for auction in Sydney on 20 May 2017.

Hint: You may find it easier to read the table if you right justify the column labels (along the top of the table).

John and Julie would like to calculate the clearance rate for all properties listed for auction that week.

How many properties were originally listed for auction for the day in question?
How many of these were sold (at auction, prior or after)?
Express the number of properties sold (at auction, prior or after) as a percentage of all properties listed for auction.

Given that they will be selling their four bedroom house soon, they are also interested in the clearance rate of all four bedroom houses that week. They would like to compare the clearance rate of four bedroom houses with the overall clearance rate.

How many four bedroom houses were originally listed for auction for the day in question?

How many of these were sold (at auction, prior or after)?
Express the number of four bedroom houses sold (at auction, prior or after) as a percentage of all four bedroom houses listed for auction.

Was the clearance rate for four bedroom houses worse, the same or better than the clearance rate for all properties that week?

John and Julie would like to compare the sales outcomes of all the properties that were listed for auction that week.

For all the properties that were listed for auction that week, generate a two way pivot table of ‘Type’ by ‘Result’.

Hint: Include ‘Result’ in both the column and in the body of the table. Also, you may find it easier to read the table if you right justify the column labels (along the top of the table).

Use this pivot table to generate a single horizontal 100% component bar chart with type along the vertical axis and the different types of ‘result’ making up the components of each of the four bars. Once generated hide the field buttons and insert an appropriate label on the vertical axis and an appropriate title on the chart.

Use the graph to identify the two types of properties which had approximately the same proportion of properties passed in that week.

Include both the pivot table and the component bar chart generated in parts i. and ii. with your assignment submission.
No pivot table provided
Pivot table generated using Excel but with one or more flaws
Pivot table generated correctly using Excel

Bar chart not provided
100% component bar chart generated correctly in Excel but with 3 or 4 errors in the chart (missing labels/titles, field buttons not hidden, legend missing)
100% component bar chart generated correctly in Excel but with 1 or 2 errors in the chart (missing labels/titles, field buttons not hidden, legend missing)
100% component bar chart generated correctly using Excel with field buttons hidden, appropriate title and label, and legend

Sort the data by the ‘type’ variable and then extract the house data only to a separate file. We will use this new file to generate a table of descriptive statistics for the variable ‘price’. We first need to clean up the data set however so we can calculate the correct descriptive statistics. You might like to make a copy of this new file before you start deleting entries just in case you delete an entry you shouldn’t have.

Sort the house data in the new file by ‘result’.
Delete any results for which there are no selling prices.
You will notice that there are some properties which were passed in yet have a price recorded. Each of these is clearly an error since the property was not sold. This price may represent the highest bid, which was below the reserve and hence passed in. In any case since the property was not sold the price is not the selling price of the property.

There are prices associated with the result vendor bid. As these represent bids made by the vendor only, not selling prices, they should also be deleted.

There is one other error in the sold result which must be deleted. You will need to find this before continuing or you may have already deleted it earlier.

Question 1

a) The following variables from the data set fall into the respective categories:
i) Qualitative and nominal variable: Suburb, Address, Type and Agent
ii) Qualitative and ordinal variable: no variable
iii) Quantitative and ratio variable: Price
iv) Quantitative and ordinal variable: Bedrooms

b) The overall outcomes are shown in the following table:

Result Code	Overall Outcome
PI, NB, VB	Did not sell
SP, PN	Sold prior to the auction
S, SN	Sold at auction
SA, SS	Sold after auction
W	Withdrawn from sale

c) The four properties where the number of bedrooms is missing are as follows:

Suburbs	Address of the properties
Kirribilli	49/20, Carabella Street
North Sydney	307/54, High Street
Manly	1/19 – 23, Pittwater Road
Darlington	9/299, Abercrombie Street

The two properties of the above four, which are errors, are as follows:

Suburbs	Address	Price	Result
North Sydney	307/54, High Street	NA	PN
Darlington	9/299, Abercrombie St.	NA	PN

The two above properties are errors due to the reasons, discussed as follows:

Both the properties were sold before auction was conducted, which implies at the time of the auction, the properties were no longer available.
The prices of both the properties were not revealed in the data set.

The pivot table for gaining insight into the auction market of Sydney is as follows:

Count of Result	Column Labels
Row Labels	PI	PN	S	SN	SP	VB	W	Total
h	36	8	159	28	72	5	5	313
1						1		1
2	2		18	3	6	1		30
3	15	4	60	12	29			120
4	6	4	54	7	27	2	2	102
5	7		22	4	9	1	2	45
6	3		4	1	1		1	10
7	3			1				4
8			1					1
studio			2					2
(blank)			2					2
t	1	3	15		3	1		23
2		1	3		1			5
3	1	2	10		2	1		16
4			1					1
5			1					1
u	6	5	68	3	46		1	129
1			9		12			21
2	4	3	45	1	30		1	84
3	1		12	2	4			19
4	1		2					3
(blank)		2						2
Total	43	16	244	31	121	6	6	467

The calculations for the clearance rate are as follows:

Based on the above pivot table, a total of 467 properties were originally listed for auction on that day.
396 properties (244 + 31 + 121, the totals for S, SN and SP) were sold at, before or after the auction.
The percentage of listed properties that were sold = 396/467 = 84.8%.

The comparison of clearance rates that John and Julie asked for is as follows:

Across all property types, 106 four bedroom properties (102 of type h, 1 of type t and 3 of type u) were listed in total on that day.

Of these, 91 were sold at, before or after the auction.

The percentage of four bedroom properties sold = 91/106 = 85.8%.

The clearance rate of four bedroom properties, 85.8%, is higher than the clearance rate of all listed properties, which is 84.8%.
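The two clearance rates can be re-derived from the pivot table counts with a short script. This is only a sketch: the dictionaries are typed in by hand from the three-way table, and the choice of which result codes count as sold (S, SN and SP) is an assumption that mirrors the figures used in the answer.

```python
# Column totals copied from the three-way pivot table above.
all_listed = {"PI": 43, "PN": 16, "S": 244, "SN": 31, "SP": 121, "VB": 6, "W": 6}
# Four-bedroom rows for types h, t and u, summed per result code.
four_bed = {"PI": 7, "PN": 4, "S": 57, "SN": 7, "SP": 27, "VB": 2, "W": 2}

# Assumption: only these codes count as sold, matching the worked answer.
SOLD_CODES = ("S", "SN", "SP")


def clearance(counts, sold_codes=SOLD_CODES):
    """Return (sold, listed, sold / listed) for a dict of result-code counts."""
    listed = sum(counts.values())
    sold = sum(counts.get(code, 0) for code in sold_codes)
    return sold, listed, sold / listed


overall = clearance(all_listed)
four_bedroom = clearance(four_bed)
for label, (sold, listed, rate) in [("all", overall), ("4-bed", four_bedroom)]:
    print(f"{label}: {sold}/{listed} = {rate:.1%}")
```

The key table lists PN under "Sold prior to the auction". Passing `sold_codes=("S", "SN", "SP", "PN")` would count those properties as well, raising the sold totals to 412 and 95.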

i) The two-way pivot table for all the listed properties is as follows:

Count of Result	Column Labels
Row Labels	PI	PN	S	SN	SP	VB	W	Total
h	36	8	159	28	72	5	5	313
studio			2					2
t	1	3	15		3	1		23
u	6	5	68	3	46		1	129
Total	43	16	244	31	121	6	6	467

ii) Based on the above two-way pivot table, the 100% component bar chart is as follows:

iii) Based on the 100% component bar chart, townhouse (t) and unit/duplex (u) properties have almost identical proportions of properties passed in during that week.
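The proportions behind the 100% component bar chart can also be computed directly from the two-way pivot table. The sketch below copies the counts by hand and looks only at the passed-in (PI) share for each type. The variable names are illustrative.

```python
# Counts copied from the two-way pivot table (blank cells omitted).
two_way = {
    "h": {"PI": 36, "PN": 8, "S": 159, "SN": 28, "SP": 72, "VB": 5, "W": 5},
    "studio": {"S": 2},
    "t": {"PI": 1, "PN": 3, "S": 15, "SP": 3, "VB": 1},
    "u": {"PI": 6, "PN": 5, "S": 68, "SN": 3, "SP": 46, "W": 1},
}

# Share of each type's listings that were passed in (result code PI).
passed_in_share = {
    prop_type: row.get("PI", 0) / sum(row.values())
    for prop_type, row in two_way.items()
}

for prop_type, share in passed_in_share.items():
    print(f"{prop_type}: {share:.1%}")
```

Types t and u come out within half a percentage point of each other, while type h is clearly higher.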

i) The descriptive statistics are as follows:
ii) The median selling price was $1,566,250 and the standard deviation was $963,279.

iii) The cheapest house sold in that particular week went for $428,500.

As can be seen from the table, the cheapest property is in San Remo. The property is a three-bedroom flat.

iv) The sample variance of the above data is as follows:

	Sample Variance
Actual Number Value	927906035169.86
Scientific Notation	9.27906E+11
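As a consistency check, the standard deviation is the square root of the sample variance, so the two reported figures should agree. A minimal sketch using only the values quoted above:

```python
from math import sqrt

sample_variance = 927906035169.86  # value reported in the table above

# The standard deviation is the square root of the variance.
standard_deviation = sqrt(sample_variance)

print(f"variance = {sample_variance:.5E}")
print(f"standard deviation = {standard_deviation:,.0f}")
```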

i) The frequency distribution of the selling prices of the houses sold is as follows:

Selling price (bin upper limit, $)	Frequency
700000	6
1500000	100
2300000	78
3100000	29
3900000	8
4700000	7
5500000	1
6300000	0
7100000	0
7900000	0
8700000	0
9500000	1
Above	0

Based on the above frequency distribution table, the Histogram of the selling prices is as follows:

House prices in Sydney are usually quoted as a median rather than a mean. The mean uses every observation in the data and is pulled strongly towards any extreme values in the series. The median marks the 50% point of the data and is not affected by those extreme values. Therefore, the median price is a more appropriate summary than the mean price.

In the above data, the mean price is $1,790,575 whereas the median price is $1,566,250, which means that fifty percent of the observations lie above $1,566,250 and the rest lie below it. The mean exceeds the median because a small number of very expensive houses stretch the right tail of the distribution, which is the pattern shown in the above histogram.
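The frequency table can be used to confirm which price class contains the median. The sketch below assumes the listed values are inclusive upper limits of each bin, as Excel's histogram tool normally reports them.

```python
from itertools import accumulate

# Bin upper limits and counts copied from the frequency table above.
upper_limits = [700000, 1500000, 2300000, 3100000, 3900000, 4700000,
                5500000, 6300000, 7100000, 7900000, 8700000, 9500000]
frequencies = [6, 100, 78, 29, 8, 7, 1, 0, 0, 0, 0, 1]

n = sum(frequencies)
cumulative = list(accumulate(frequencies))
middle_position = (n + 1) / 2

# First bin whose cumulative count reaches the middle position.
index = next(i for i, c in enumerate(cumulative) if c >= middle_position)
median_bin = (upper_limits[index - 1], upper_limits[index])

print(f"n = {n}, median lies in the class {median_bin}")
```

The class found this way contains the reported median of $1,566,250. The single sale in the top bin is the kind of observation that pulls the mean upwards.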

Let X be the number of repairs that a car needs.
P(X=1) = 0.17

P(X=2) = 0.08

P(X>2) = 0.06

Therefore, P(X=0) = 1 – 0.17 – 0.08 – 0.06 = 0.69

This implies that the probability that the car needs no repair is 0.69.

ii) P(X≤1) = 0.69+0.17 = 0.86

This means that the probability that the car chosen will not require more than one repair is 0.86.

iii) P(X≥1) = 1 – 0.69 = 0.31

The result shows that the probability that the car will need some repairs is 0.31.
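The three answers all follow from the complement rule, which can be checked in a few lines. The variable names are illustrative only.

```python
# Probabilities given in the question.
p_one, p_two, p_more_than_two = 0.17, 0.08, 0.06

# Complement rule: all probabilities must sum to 1.
p_none = 1 - (p_one + p_two + p_more_than_two)
p_at_most_one = p_none + p_one
p_at_least_one = 1 - p_none

print(f"P(X=0) = {p_none:.2f}")
print(f"P(X<=1) = {p_at_most_one:.2f}")
print(f"P(X>=1) = {p_at_least_one:.2f}")
```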

Given the above table, the mean number of cars repaired on a random day is as follows:

Mean = 6*0.15 + 7*0.25 + 8*0.30 + 9*0.23 + 10*0.07 = 7.82

Therefore, approximately 8 cars are repaired on any day.

The Standard Deviation is obtained from the squared deviations about the mean, each weighted by its probability:

The Standard Deviation = [Σ P_i(X_i-μ)^2]^(1/2) = [1.3276]^(1/2) = 1.15

The SD is approximately 1.15 cars.
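The mean and standard deviation of a discrete distribution can be checked numerically. The sketch below assumes the distribution takes the values 6 to 10. The outcome 7 with probability 0.25 is an inference: it is the only entry that makes the probabilities sum to 1 and reproduces the stated mean of 7.82.

```python
from math import sqrt

# Assumption: the outcome 7 with probability 0.25 is inferred, not quoted.
distribution = {6: 0.15, 7: 0.25, 8: 0.30, 9: 0.23, 10: 0.07}

# A valid probability distribution must sum to 1.
total_probability = sum(distribution.values())

mean = sum(x * p for x, p in distribution.items())
variance = sum(p * (x - mean) ** 2 for x, p in distribution.items())
sd = sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.4f}, sd = {sd:.2f}")
```

Because every outcome lies between 6 and 10, the standard deviation cannot exceed 2, which is a quick sanity check on any hand calculation.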

The sample is biased for the following reasons:
The survey is gender biased, as it primarily took females into account, which skews the data set.
The survey was conducted on a single day only, which rules out the variation that can occur on other days. It would have been better if the survey had covered more than one day (at least four) within a specific period, as that would have made the results more robust and reliable (Ross, 2014).
i) Paul should sample (300/900)*90 = 30 male members aged between 31 and 50 years.
ii) He should survey (200/900)*90 = 20 female members.
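Both sample sizes come from proportional allocation: each stratum contributes to the sample in proportion to its share of the membership. A minimal sketch, where the function name is hypothetical and the figures are those used above:

```python
def proportional_allocation(stratum_size, population_size, sample_size):
    """Number of units to sample from a stratum under proportional allocation."""
    return stratum_size * sample_size / population_size


# 900 members in total, with a planned sample of 90.
males_31_to_50 = proportional_allocation(300, 900, 90)
females = proportional_allocation(200, 900, 90)

print(f"males aged 31 to 50: {males_31_to_50:.0f}")
print(f"females: {females:.0f}")
```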

The problem involves two events, “day shift work” and “worker not turning up”. Let T denote that the worker does not come to work and D that the worker is on the day shift. Then D’ denotes the night shift and T’ denotes that the worker comes to work.

From the probability tree drawn above, the following results can be derived:

P(D) = 70%, which implies P(D’) = 30%.

P(T|D) = 2%, therefore P(T’|D) = 98%, and as P(T|D’) = 4%, P(T’|D’) = 96%.

This implies that the percentage of workers who are on the day shift and absent on a random day is P(D∩T) = 0.70 × 0.02 = 1.4%, and the same for the night shift is P(D’∩T) = 0.30 × 0.04 = 1.2%.

Therefore, the total percentage of workers who are absent is P(T) = 1.4% + 1.2% = 2.6%.

Independence would require P(D).P(T) = P(D∩T). Here P(D).P(T) = 0.70 × 0.026 = 0.0182, whereas P(D∩T) = 0.014, so the condition fails. This implies that absenteeism is dependent on the working shift, as the different absence rates of 2% and 4% already suggest.
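The probability tree and the product rule for independence can be checked numerically. This sketch uses only the percentages quoted above. The variable names are illustrative.

```python
from math import isclose

p_day = 0.70                  # P(D)
p_absent_given_day = 0.02     # P(T|D)
p_absent_given_night = 0.04   # P(T|D')

# Joint probabilities along each branch of the tree.
p_day_and_absent = p_day * p_absent_given_day
p_night_and_absent = (1 - p_day) * p_absent_given_night

# Law of total probability.
p_absent = p_day_and_absent + p_night_and_absent

# D and T are independent only if P(D) * P(T) equals P(D and T).
independent = isclose(p_day * p_absent, p_day_and_absent)

print(f"P(T) = {p_absent:.3f}")
print(f"P(D)P(T) = {p_day * p_absent:.4f}, P(D and T) = {p_day_and_absent:.4f}")
print(f"independent: {independent}")
```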

Z is assumed to be a standard normal variable:
So, P(Z>0.4) = 1 – P(Z<0.4) = 1 – 0.6554 = 0.3446
P(-1.35 ≤ Z ≤ 1.25) = 0.8944 – 0.0885 = 0.8059
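The table look-ups can be reproduced with Python's standard library. The sketch reads the bounds of the second probability as -1.35 and 1.25, which are the z-values that correspond to the table entries 0.0885 and 0.8944.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Upper tail via the complement of the cumulative probability.
p_above_04 = 1 - z.cdf(0.4)

# Probability between two z-values is the difference of the two CDF values.
p_between = z.cdf(1.25) - z.cdf(-1.35)

print(f"P(Z > 0.4) = {p_above_04:.4f}")
print(f"P(-1.35 <= Z <= 1.25) = {p_between:.4f}")
```

Small differences in the fourth decimal place against the hand calculation come from rounding in the printed tables.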
Probability of him making 30 sales is equal to BINOM.DIST(30, 300, 0.12, FALSE) = 0.042100743.
Probability of >30 sales is equal to 1 – BINOM.DIST(30, 300, 0.12, TRUE), which is approximately 0.836.
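The Excel results can be cross-checked without a spreadsheet. The helper functions below are hypothetical stand-ins for BINOM.DIST with the cumulative flag set to FALSE and TRUE respectively. They use the same n = 300 and p = 0.12.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k), summed term by term."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


exactly_30 = binom_pmf(30, 300, 0.12)
# "More than 30" is the complement of "30 or fewer".
more_than_30 = 1 - binom_cdf(30, 300, 0.12)

print(f"P(X = 30) = {exactly_30:.9f}")
print(f"P(X > 30) = {more_than_30:.4f}")
```

The upper tail needs the cumulative probability up to 30. Subtracting a single point probability from 1 would only exclude that one outcome.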
Probability of occurrence of at most 2 collisions within a period of six months is POISSON.DIST(2, 1.8, TRUE) = 0.730621086.
The probability of exactly one collision in a period of two months uses the two-month mean, 1.8/3 = 0.6:

POISSON.DIST(1, 0.6, FALSE) = 0.329286982.
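For a Poisson process, the mean scales with the length of the interval, so a shorter period needs a rescaled mean rather than a rescaled probability. The sketch below assumes that 1.8 is the six-month mean, as used above. The helper names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k), summed term by term."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


six_month_mean = 1.8
at_most_two = poisson_cdf(2, six_month_mean)

# Two months is one third of six months, so the mean is scaled by 1/3.
two_month_mean = six_month_mean / 3
exactly_one_in_two_months = poisson_pmf(1, two_month_mean)

print(f"P(X <= 2 in six months) = {at_most_two:.6f}")
print(f"P(X = 1 in two months) = {exactly_one_in_two_months:.6f}")
```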

The random variable in this problem follows a Poisson distribution, which is defined as follows:

P(X = x) = e^(-λ) * λ^x / x! (Ghasemi & Zahediasl, 2012)

Here, λ signifies the mean. The value of the daily mean is 13/30 = 0.433.

The probability of the company receiving a minimum of 13 emergency calls within a particular month, taking the monthly mean as 13, is:

P(X>=13) = 1 – P(X<=12) = 1 – 0.4631

So, the probability is 0.5369.

Here, as λ = 0.433, the probability of the company getting more emergency calls than it will be able to handle is P(X>3) = 1 – P(X<=3) = 1 – 0.9990, which is approximately 0.0010.
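Both tail probabilities are complements of a cumulative Poisson probability. The sketch below rests on two assumptions read from the working above: a monthly mean of 13 calls, and a capacity of 3 calls per day with a daily mean of 13/30.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(k + 1))


monthly_mean = 13          # assumption: 13 calls expected per month
daily_mean = monthly_mean / 30

# "At least 13" is the complement of "12 or fewer".
at_least_13_in_month = 1 - poisson_cdf(12, monthly_mean)

# "More than 3" is the complement of "3 or fewer".
more_than_3_in_day = 1 - poisson_cdf(3, daily_mean)

print(f"P(X >= 13 in a month) = {at_least_13_in_month:.4f}")
print(f"P(X > 3 in a day) = {more_than_3_in_day:.5f}")
```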

The random variable in this problem follows a Binomial distribution, which is: P(X = x) = C(n, x) * p^x * q^(n-x) (Ghasemi & Zahediasl, 2012).
Here, the Mean = pn = 0.75*20 = 15.

In a sample of 25 random customers, the probability that exactly 20 of them will be satisfied = C(25,20)*(0.75)²⁰*(0.25)⁵ = 0.164537588.

The expected number of dissatisfied customers (n=50) is 50*0.25 = 12.5, or about 13.
The probability that <100 customers out of 150 will be satisfied is 0.013618601.
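These binomial figures can be recomputed directly from the formula. In the sketch below the helper is hypothetical, p = 0.75 is the satisfaction probability used above, and both "fewer than 100" and "at most 100" are printed because the two readings give different values.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_satisfied = 0.75

exactly_20_of_25 = binom_pmf(20, 25, p_satisfied)
expected_dissatisfied_of_50 = 50 * (1 - p_satisfied)

# Strictly fewer than 100 sums k = 0..99; at most 100 adds the k = 100 term.
fewer_than_100 = sum(binom_pmf(k, 150, p_satisfied) for k in range(100))
at_most_100 = fewer_than_100 + binom_pmf(100, 150, p_satisfied)

print(f"P(X = 20 of 25) = {exactly_20_of_25:.9f}")
print(f"expected dissatisfied of 50 = {expected_dissatisfied_of_50}")
print(f"P(X < 100 of 150) = {fewer_than_100:.9f}")
print(f"P(X <= 100 of 150) = {at_most_100:.9f}")
```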
The distribution in the concerned problem is normal distribution. Here, μ signifies the mean and σ shows the SD.
Here, the proportion of tyres that will fail before the warranty period is:

P(X<50000) = P[Z<(50000-55000)/ 2000] = P(Z<-2.5) = 0.621%.

The claims of the company can be evaluated with the help of the following calculations:

P(X>58500) = P[Z>(58500-55000)/2000] = P(Z>1.75) = 4.01%.

Based on the above result it can be concluded that the claim of the company is not right.

The probability that the average life-time of a sample of 100 tyres is less than 54,700 km is as follows. The sample mean X̄ has standard deviation 2000/(100)^(1/2) = 200, so:

P(X̄<54700) = P[Z<(54700-55000)/{2000/(100)^(1/2)}] = P(Z<-1.5)

= 0.066807201

= 6.68% (approximately).
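The three normal probabilities can be verified with the standard library. The sketch assumes a mean life of 55,000 km, a standard deviation of 2,000 km and a sample of 100 tyres, all taken from the working above.

```python
from math import sqrt
from statistics import NormalDist

tyre_life = NormalDist(mu=55000, sigma=2000)

# Proportion failing before the 50,000 km warranty distance.
fail_before_warranty = tyre_life.cdf(50000)

# Proportion lasting longer than 58,500 km.
last_beyond_58500 = 1 - tyre_life.cdf(58500)

# The mean of n tyres has standard deviation sigma / sqrt(n).
sample_mean = NormalDist(mu=55000, sigma=2000 / sqrt(100))
mean_below_54700 = sample_mean.cdf(54700)

print(f"P(X < 50000) = {fail_before_warranty:.4%}")
print(f"P(X > 58500) = {last_beyond_58500:.4%}")
print(f"P(sample mean < 54700) = {mean_below_54700:.4%}")
```

The last calculation differs from the first two only in the standard deviation, which shrinks from 2,000 to 200 because it describes an average of 100 tyres, not a single tyre.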

References

Ghasemi, A., & Zahediasl, S. (2012). Normality tests for statistical analysis: a guide for non-statisticians. International journal of endocrinology and metabolism, 10(2), 486.

Ross, S. M. (2014). Introduction to probability models. Academic press.
