Course Coordinator: Damian Hills (dhills1@usc.edu.au) School: School of Science, Technology and Engineering
| UniSC Southbank | Blended learning | Most of your course is on campus but you may be able to do some components of this course online. |
| Online | Online | You can do this course without coming onto campus, unless your program has specified a mandatory onsite requirement. |
Please go to unisc.edu.au for up-to-date information on the teaching sessions and campuses where this course is usually offered.
This course examines big data processing and analysis, using a modern framework such as Hadoop or Apache Spark. You will learn how to build data processing tools that can run on cloud computing systems and can scale up to process massive data sets. You will apply these skills to build tools that can generate business insights.
| Activity | Hours | Beginning Week | Frequency |
| Blended learning | | | |
| Online – Pre-recorded concept videos and associated activity | 1 hr | Not applicable | 12 times |
| Tutorial/Workshop 1 – On-campus tutorial | 2 hrs | Not applicable | 11 times |
| Online | | | |
| Online – Pre-recorded concept videos and associated activity | 1 hr | Not applicable | 12 times |
| Tutorial/Workshop 1 – Interactive Zoom tutorial | 2 hrs | Not applicable | 11 times |
Spark Runtime and RDD
Pair RDD and Files
DataFrame and SparkSQL
Hadoop
MapReduce
Parallel Computing
Machine Learning with Spark
Advanced Spark Programming
700 Level (Specialised)
12 units
| No. | Course Learning Outcomes: on successful completion of this course, you should be able to... | Graduate Qualities: completing these tasks successfully will contribute to you becoming... |
| 1 | Design and build programs that can load, transform, analyse and store big data using cloud computing techniques. | Knowledgeable; Creative and critical thinker |
| 2 | Apply data mining, analysis and visualisation techniques to big data to gain business insights. | Creative and critical thinker; Empowered |
| 3 | Research and apply theory and practice of scalable distributed data analysis within the discipline. | Knowledgeable; Empowered |
| 4 | Demonstrate and justify the use of big data analysis skills to develop innovative solutions to business problems. | Creative and critical thinker; Engaged |
| 5 | Demonstrate critical and creative thinking to identify and solve complex business problems and arrive at innovative solutions. | Creative and critical thinker |
Refer to the UniSC Glossary of terms for definitions of “pre-requisites, co-requisites and anti-requisites”.
ICT705 and ICT706 and enrolled in a Postgraduate Program
Not applicable
Not applicable
Not applicable
Not applicable
Standard Grading (GRD)
High Distinction (HD), Distinction (DN), Credit (CR), Pass (PS), Fail (FL).
Task 1 is a test involving basic concepts, principles, and skills of data science practice, which will be the basis for the understanding of Spark programming.
| Delivery mode | Task No. | Assessment Product | Individual or Group | Weighting % | What is the duration / length? | When should I submit? | Where should I submit it? |
| All | 1 | Examination - not Centrally Scheduled | Individual | 20% | 60 min | Week 5 | Online Test (Quiz) |
| All | 2 | Examination - not Centrally Scheduled | Individual | 50% | 90 min | Week 9 | Online Assignment Submission with plagiarism check |
| All | 3 | Artefact - Technical and Scientific, and Written Piece | Individual | 30% | Big data analysis + 1,000 word report | Week 12 | Online Assignment Submission with plagiarism check |
| All - Assessment Task 1: Big data test | |
| Goal: | To build your knowledge of big-data processing skills and problem-solving techniques. |
| Product: | Examination - not Centrally Scheduled |
| Authorship Statement: | |
| Format: | Coding test based on the content of Weeks 1–4. This task will help to build your knowledge of basic Spark programming. Further details of this assessment will be given on Blackboard. |
| Criteria: | |
| Generic Skills: | |
| All - Assessment Task 2: Mid-semester test | |
| Goal: | To demonstrate understanding of the theory and practice of scalable distributed data analysis. |
| Product: | Examination - not Centrally Scheduled |
| Authorship Statement: | |
| Format: | This is an individual assessment. Answer a set of questions about big data analysis theory and practice. |
| Criteria: | |
| Generic Skills: | |
| All - Assessment Task 3: Big data assignment | |
| Goal: | To demonstrate a comprehensive view of big data analysis in terms of definitions and concepts, techniques, and producing big-data solutions to business problems. |
| Product: | Artefact - Technical and Scientific, and Written Piece |
| Authorship Statement: | |
| Format: | A program that uses big-data analysis techniques to solve a business problem, plus a report (1,000 words) describing and justifying the design of that program. |
| Criteria: | |
| Generic Skills: | |
A 12-unit course will have a total of 150 learning hours, which will include directed study hours (including online if required), self-directed learning and completion of assessable tasks. Student workload is calculated at 12.5 learning hours per unit.
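As a quick check of the workload figures, the short Python sketch below applies the stated rule of 12.5 learning hours per unit and compares the result with the timetabled activities listed earlier in this outline. Treating the concept videos and tutorials as the only scheduled hours is an assumption made for illustration; the function and variable names are hypothetical and not part of any UniSC system.

```python
# Figures taken from this outline; the split below is an illustrative assumption.
HOURS_PER_UNIT = 12.5  # stated workload rule: learning hours per unit


def total_learning_hours(units: int) -> float:
    """Total learning hours for a course worth `units` units."""
    return units * HOURS_PER_UNIT


def scheduled_hours(activities: list[tuple[float, int]]) -> float:
    """Sum (hours per session * number of sessions) over timetabled activities."""
    return sum(hours * times for hours, times in activities)


# Timetable from the activity table: 1 hr x 12 concept videos, 2 hrs x 11 tutorials.
timetable = [(1, 12), (2, 11)]

total = total_learning_hours(12)        # 12 units * 12.5 hours per unit
scheduled = scheduled_hours(timetable)  # hours spent in timetabled activities
remaining = total - scheduled           # left for self-directed study and assessment

print(total, scheduled, remaining)
```

Under these assumptions, the 150 total hours include 34 timetabled hours, which leaves roughly 116 hours for self-directed learning and assessable tasks.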
Please note: Course information, including specific information on recommended readings, learning activities, resources, weekly readings, etc., is available on the course Canvas site. Please log in as soon as possible.
You need regular access to the resource(s) below. Many texts are available as ebooks through the Library at no additional cost.
| Required? | Author | Year | Title | Edition | Publisher |
| Required | Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia | 2015 | Learning Spark: Lightning-fast data analysis | | O'Reilly Media, Inc |
You must have a computer (desktop or laptop) on which you can install Python and Spark, so that you can practise the programming skills outside lecture and workshop times.
Academic integrity is the ethical standard of university participation. It ensures that students graduate as a result of proving they are competent in their discipline. This is integral in maintaining the value of academic qualifications. Each industry has expectations and standards of the skills and knowledge within that discipline and these are reflected in assessment.
Academic integrity means that you do not engage in any activity that is considered to be academic fraud, including plagiarism, collusion or outsourcing any part of any assessment item to any other person. You are expected to be honest and ethical by completing all work yourself and indicating in your work which ideas and information were developed by you and which were taken from others. You cannot provide your assessment work to others. You are also expected to provide evidence of wide and critical reading, usually by using appropriate academic references.
In order to minimise incidents of academic fraud, this course may require that some of its assessment tasks, when submitted to Canvas, are electronically checked through Turnitin. This software allows for text comparisons to be made between your submitted assessment item and all other work to which Turnitin has access.
For more information on Academic Learning & Teaching policies and procedures, visit https://www.usc.edu.au/explore/policies-and-procedures#academic-learning-and-teaching
UniSC is committed to excellence in teaching, research and engagement in an environment that is inclusive, inspiring, safe and respectful. The Student Charter sets out what students can expect from the University, and what in turn is expected of students, to achieve these outcomes.