Data Source & Data Quality (ILV)

Course number M2.08760.11.031
Course code DSDQ
Semester of degree program Semester 1
Mode of delivery Presence- and Telecourse
ECTS credits 5.0
Language of instruction English

Knowledge of existing data sources, ranging from hardware sensors through human sensors to standard web services.
List and describe the primary and secondary techniques of data capture
Explain the principles of data transfer
Differentiate between different data formats
Carry out (advanced) searches for digital data
Describe data collection workflows
Analyze practical issues associated with managing data capture projects
Students are able to describe, record and estimate the quality of data and to take into account error propagation in analytical processes.
Students know the basics of quality management in the processing of large amounts of data.
They are familiar with typical problems of heterogeneous data sources and are able to deal with them.

This course will cover the theoretical and practical background of different sensor networks and sensor technologies in the context of acquiring environmental, industrial, scientific, and social data.

  • Sensor network design
  • Acquisition, integration, exchange and dissemination of sensor-derived data using standardized Web technologies
  • Difference between primary and secondary data acquisition
  • Identification of required data (application domain and user perspective)
  • National and global data sources, open government data, data catalogues, geodata infrastructures and other data brokers
  • Information harvesting and data collection methods for attributive, spatial and temporal data
  • Data integration processes: evaluation strategies and workflows for cleaning, conversion and integration explained at the conceptual and implementation level (incl. basic data types and models)
  • Setting up and managing a data collection project
  • Data transfer: Norms and standards, format transformations, interoperability (Open API)
  • Data types (dimensions, cross-sectional, longitudinal, time series data)
  • Metadata, metadata standards, data value
  • Data ethics and legal issues, copyright, and open licenses
  • Introduction to quality management (DIN EN ISO 9001)
  • Aspects of data quality (accuracy, precision, timeliness, completeness)
  • Error estimation, error propagation law
  • Errors and residuals in statistics
  • Experimental uncertainty analysis
  • Measurement uncertainty
  • Quantification of uncertainty
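
As an illustration of the error propagation topic listed above, the following is a minimal sketch of the first-order (Gaussian) propagation law for independent inputs, sigma_f = sqrt(sum((df/dx_i * sigma_i)^2)). It is not taken from the course material; the function name `propagate_uncertainty` and the rectangle measurements are hypothetical and chosen only for illustration.

```python
import math


def propagate_uncertainty(partials, sigmas):
    """Combined standard uncertainty of f(x_1, ..., x_n).

    Assumes independent (uncorrelated) inputs and a first-order
    approximation of f around the measured values.

    partials: partial derivatives df/dx_i evaluated at the measured values
    sigmas:   standard uncertainties sigma_i of the inputs
    """
    if len(partials) != len(sigmas):
        raise ValueError("partials and sigmas must have the same length")
    return math.sqrt(sum((p * s) ** 2 for p, s in zip(partials, sigmas)))


# Hypothetical measurement: area A = w * h of a rectangle.
w, sigma_w = 4.0, 0.1
h, sigma_h = 3.0, 0.2

area = w * h
# Partial derivatives: dA/dw = h and dA/dh = w.
sigma_area = propagate_uncertainty([h, w], [sigma_w, sigma_h])

print(f"A = {area:.2f} +/- {sigma_area:.2f}")  # prints "A = 12.00 +/- 0.85"
```

The sketch ignores correlations between inputs; correlated measurements would need the covariance terms of the general propagation law.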

Lecture script as provided in the course (required)
W.H. Inmon, Daniel Linstedt, Mary Levins, Data Architecture: A Primer for the Data Scientist, 2019
Krish Krishnan, Data Warehousing in the Age of Big Data, 2013
Jack E. Olson, Data Quality: The Accuracy Dimension, Morgan Kaufmann, 2002, ISBN-13: 978-1558608917
Russell G. Congalton & Kass Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, CRC Press, 2nd edition, 2008, ISBN-13: 978-1420055122
Bevington, Philip R.; Robinson, D. Keith (2002), Data Reduction and Error Analysis for the Physical Sciences (3rd ed.), McGraw-Hill, ISBN 978-0-07-119926-1
Fornasini, Paolo (2008), The uncertainty in physical measurements: an introduction to data analysis in the physics laboratory, Springer, p. 161, ISBN 978-0-387-78649-0

Lecture, guest lectures by specialists, exercises in the computer lab (group size 15 persons), project work and presentations

Immanent examination character: presentation, assignment reports, written/oral exam