BIG DATA

Degree course: 
First cycle degree in COMPUTER SCIENCE
Academic year when starting the degree: 
2023/2024
Year: 
2
Academic year in which the course will be held: 
2024/2025
Course type: 
Compulsory subjects, characteristic of the class
Language: 
Italian
Credits: 
6
Period: 
Second semester
Standard lectures hours: 
48
Breakdown of lecture hours: 
Lesson (48 hours)
Requirements: 

Knowledge of a programming language (Java or Python) and of databases is recommended, along with a good reading knowledge of English, which gives access to the large body of educational material, publications, manuals, and software available on the subject.

The oral exam ascertains that students have acquired the knowledge and skills described in the "Course objectives" section, assessing both their level of knowledge and, above all, their ability to apply and combine the techniques and topics covered in class.

The oral test focuses on the student's ability to synthesize the theoretical knowledge acquired, in particular the ability to identify the theoretical elements needed for the design and processing of big data.

The oral test complements the evaluation of the knowledge and skills demonstrated in the practical tasks assigned periodically during the course. Each task is worth a maximum of 2 or 3 points, for a total of 28 points across all practical tasks and projects. The oral exam carries a weight of 3 points, which are added to or subtracted from the score obtained in the practical tasks and projects.

The grade is expressed on a scale of thirty.
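The grading arithmetic described above can be sketched as follows. This is only an illustrative sketch, not the official grading procedure: the function name `final_grade`, the limit of the oral adjustment to the range -3 to +3, and the clamping of the result to the 0-30 scale are assumptions that the syllabus does not state explicitly.

```python
def final_grade(task_points, oral_adjustment, cap=30):
    """Combine practical-task points with the oral exam adjustment.

    Assumptions (not stated in the syllabus): the oral adjustment lies
    in the range [-3, +3], and the result is clamped to 0..cap.
    """
    practical = sum(task_points)
    if practical > 28:
        raise ValueError("practical tasks award at most 28 points in total")
    if not -3 <= oral_adjustment <= 3:
        raise ValueError("oral adjustment must be between -3 and +3")
    # Add (or subtract) the oral points, then keep the grade on the 30-point scale.
    return max(0, min(cap, practical + oral_adjustment))


# Hypothetical student: ten tasks, each worth up to 2 or 3 points.
tasks = [3, 2, 3, 2, 3, 2, 3, 2, 3, 2]  # 25 points from practical work
print(final_grade(tasks, oral_adjustment=2))  # 25 + 2 = 27
```

Under these assumptions, a student with the full 28 practical points and a fully positive oral exam would be capped at 30.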

Assessment: 
Final grade

The course illustrates the conceptual and applied aspects of big data, open data, and the statistical and Artificial Intelligence methodologies used to analyze data and generate knowledge from the vast amount of data that surrounds us.

The course aims to provide basic skills in:
1. identifying the data sources to be used;
2. manipulating data;
3. generating knowledge from data.

The student therefore acquires knowledge and understanding of:
1. the characteristics of big data at the hardware, infrastructure, and software levels;
2. the methods for processing big data using real-time information analysis models;
3. the graphic and visual representation of the data.

The skills and abilities acquired by the student at the end of the course are:
1. ability to identify the most appropriate data sources;
2. ability to design infrastructure solutions and data processing software;
3. ability to apply the most appropriate models of data analysis.

Introduction to Big Data, definitions, and the 5V model 2 hours
Examples and Case Studies: Scenarios and application domains for Big Data (ability 1) 2 hours
Use of data in the context of open data, e-government, and open-gov (ability 1) 4 hours
Introduction to Data Visualization: tools, approaches, and technologies to design and create interactive dashboards to organize and represent data (ability 2) 4 hours
Social and Citizen Data Science: what it is and which kinds of data sources to use (Facebook API, Twitter API, Instagram API) (ability 1) 4 hours
Introduction to infrastructural technologies to manage and store Big Data (ability 2) 2 hours
Models of Big Data (based on predictive statistical approaches, Machine Learning, and Artificial Intelligence) (ability 3) 10 hours
Practical development of Big Data Solutions (abilities 1,2,3) 20 hours
Total 48 hours

See the Section "Contents"

Conventional

The course consists of 48 hours of lectures, divided as detailed in the Course Contents section. The lectures aim to give students the tools needed to understand the theoretical aspects and apply them in practical, real-world contexts. The lessons therefore have a strongly practical orientation and apply the theory to case studies throughout. In the classroom, students solve practical tasks in small teams, so that they can put into practice what they have learned and assess their own work and preparation.

The student's independent study and review is calibrated on the standard value of 25 total hours per CFU.
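As a rough worked example of this workload figure, the sketch below combines the values given on this page (6 credits, 48 lecture hours, 25 hours per CFU). It assumes that the 25 hours per CFU cover the student's total workload, lectures included. The syllabus does not spell out this split, so the resulting number is indicative only, and the function name is hypothetical.

```python
HOURS_PER_CFU = 25   # total workload per credit, as stated in the syllabus
CREDITS = 6          # credits awarded by the course
LECTURE_HOURS = 48   # standard lecture hours


def independent_study_hours(credits=CREDITS,
                            lecture_hours=LECTURE_HOURS,
                            hours_per_cfu=HOURS_PER_CFU):
    """Estimate the hours of independent study.

    Assumption: the per-credit figure includes lecture time, so the
    independent study is the total workload minus the lecture hours.
    """
    total_workload = credits * hours_per_cfu
    return total_workload - lecture_hours


print(independent_study_hours())  # 6 * 25 - 48 = 102
```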

The lecture slides are made available in PDF format on the University e-learning platform, together with scientific articles and specific literature on the topic (reports, datasets, etc.). A textbook is therefore not required.
Optionally, students may purchase:
- Steven S. Skiena, The Data Science Design Manual, 2017.
- A. Clerici et al., Learning Python, Vol. 1 and 2, EGEA.

The teacher receives students by appointment, to be requested by e-mail at davide.tosi@uninsubria.it. The teacher replies only to signed e-mails sent from the students.uninsubria.it domain.
