November 16, 2020, Yokohama, Japan

Data: Acquisition to Analysis

A SenSys/BuildSys 2020 Workshop

Register for the Workshop!

Registration Link

If the link does not work, check the SenSys or BuildSys website.


The workshop will be held in a virtual format; the link will be provided later.

NOTE: Times listed below are in JST.

How to Attend:

Welcome + Keynote!

8:00-9:00 (AM, JST) Link to World Clock

Speaker: Tristan Henderson

Full Paper Presentation

9:00-10:00 (AM, JST) Link to World Clock

Designing Privacy-Preserving Data Sharing Middleware for Internet of Things

9:00 - 9:10 (AM,JST)

Sameera Ghayyur, Primal Pappachan, Guoxi Wang, Sharad Mehrotra, Nalini Venkatasubramanian (University of California Irvine)

Lessons from large scale campus deployment

9:15 - 9:25 (AM,JST)

Adhikary Rishiraj, Pachpande Soham, Nipun Batra (IIT Gandhinagar)

The Quest for Raw Signals: A Quality Review of Publicly Available Photoplethysmography Datasets

9:30 - 9:40 (AM,JST)

Florian Wolling, Kristof Van Laerhoven (University of Siegen)


9:40 - 10:00 (AM,JST)

Accepted Dataset Papers:

Dataset: Multi-city Street-Sidewalk Imagery from Pedestrian Mobile Cameras

Shubham Jain (Stony Brook University)
Dataset: DOI

Dataset: Pollen Video Library for Benchmarking Detection, Classification, Tracking and Novelty Detection Tasks

Nam Cao (TU Graz); Matthias Meyer, Lothar Thiele (ETH Zurich); Olga Saukh (TU Graz and CSH Vienna)
Dataset: DOI

Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones

Shiwei Fang (UNC Chapel Hill); Sirajum Munir (Bosch Research and Technology Center); Shahriar Nirjon (UNC Chapel Hill)
Dataset: DOI

Dataset: Toothbrushing Data and Analysis of its Potential Use in Human Activity Recognition Applications

Zawar Hussain, David Waterworth (Macquarie University); Murtadha Aldeer (Rutgers University); Wei Emma Zhang (The University of Adelaide); Quan Z. Sheng (Macquarie University)
Dataset: DOI

Dataset: LoED: The LoRaWAN at the Edge Dataset

Laksh Bhatia, Michael Breza (Imperial College London); Ramona Marfievici (Digital Catapult UK); Julie A. McCann (Imperial College London)
Dataset: DOI

Dataset: A video set of wooden box assembly

Jiayun Zhang (University of California San Diego); Petr Byvshev, Yu Xiao (Aalto University)
Dataset: DOI





Tristan Henderson

Tristan Henderson is a Senior Lecturer (equivalent Associate Professor) in Computer Science at the University of St Andrews in Scotland. He has worked in the broad area of networked communications for two decades. His research aims to better understand user behaviour in networked systems and use this to build improved systems; an approach which has involved measurements and testbeds for networked games, wireless networks, mobile sensors, smartphones, online social networks and opportunistic networks. Most recently his work has moved into the privacy and ethical aspects of such research, which has led in turn to an interest in the law, and how technology, ethics and the law can be jointly used to regulate behaviour.

Tristan holds an MA in Economics from Cambridge University, an MSc and PhD in Computer Science from University College London, and an LLM in Innovation, Technology and the Law from Edinburgh University. He has served on the JANET UK Wireless Advisory Group, the NetGames steering committee, and the ACM SIGCOMM Exec, and was a JISC Research Data Champion. Perhaps his highest-impact activity is co-founding CRAWDAD, the world's largest wireless network data archive, with over 150 datasets and tools in use by over 11,500 users from 120 countries.

Check Call for Papers for information on submission!


As enthusiasm for and the success of the Internet of Things (IoT), Cyber-Physical Systems (CPS), and Smart Buildings grow, so too do the volume and variety of data collected by these systems. How do we ensure that this data is of high quality, and how do we maximize the utility of collected data so that many projects can benefit from the time, cost, and effort of deployments?

The Data: Acquisition To Analysis (DATA) workshop aims to look broadly at interesting data from interesting sensing systems. The workshop considers problems, solutions, and results from all across the real-world data pipeline. We solicit submissions on unexpected challenges and solutions in the collection of datasets, on novel datasets of interest to the community, and on experiences and results (explicitly including negative results) in using prior datasets to develop new insights.

The workshop aims to bring together application researchers and algorithm researchers in the sensing systems and building domains, promoting breakthroughs by connecting those who generate datasets with those who use them. The workshop will foster cross-domain understanding by clarifying both application needs and data collection limitations.


The workshop seeks contributions across two major thrusts, but is open to a broad view of interesting questions around the collection, dissemination, and use of data as well as interesting datasets:

The collection and use of data

  • Challenges and solutions in data collection, especially around security and privacy
  • Expectations and norms for data collection from sensor networks, especially those that involve human factors
  • Novel insights from existing datasets
  • Metadata management for complex datasets
  • Synthetic data, including its generation, application, and utility
  • Success stories: key properties of useful datasets and how to generalize these
  • Preprocessing, cleaning, and fusing datasets
  • Analysis and visualization of the data
  • Shortcomings of prior datasets, and how to address these in the future
  • Position papers on policies and norms from experimental design through data management and use are explicitly welcomed

New and interesting datasets, including but not limited to:

  • Shopping-related sensing data
  • Animal-related data or sensed data
  • Anonymized or synthetic health-related data
  • Indoor localization, especially unprocessed/unfiltered physical layer measurements
  • Smart building, occupancy, motion data, energy, human comfort, vibration, BIM
  • Vehicular, GPS, cellular, or Wi-Fi traces and remote sensing
  • Reproductions of prior work that validate, refute, or enhance results
  • Anonymized contact tracing, interaction, and exposure notification data

To enable the longevity of submitted datasets, we plan to provide a central repository where the data, and information about the data, can be archived for at least 5 years.

Submission Guidelines

Submissions may range from 1-5 pages in PDF format, excluding references, using the standard ACM conference template. DATA 2020 follows a single-blind review policy: the names and affiliations of all authors must be present in the submitted manuscript. Submissions are strongly encouraged to use only as much space as needed to clearly convey the significance of the work. We fully expect many submissions, especially datasets, to use only 1-2 pages, but wish to allow those interested in fully elucidating positions on data collection and use, or insights from reproducibility efforts, ample space to do so.

Dataset submissions should prefix paper titles with "Dataset: " and must include a description of the dataset as well as a reasonable accompanying data sample. Once accepted, a fully described dataset must be shared in a public repository by the camera-ready deadline. License issues will generally be resolved by following a procedure similar to CRAWDAD's; special treatment, if needed, will be discussed separately with the TPC chairs. Dataset submissions must include a link to the dataset at the time of submission.

Datasets will be reviewed by an artifact evaluation committee. To support this, dataset submissions must include:

  • A link to the full dataset (not just a single sample) at the time of submission
  • An example analysis or result from the dataset (what kind of insights might folks glean?)
  • Steps to run an analysis on the dataset, e.g.:
    • A graph and the steps (sample code) to generate the graph
    • A video demonstrating access and manipulation of the data or execution of queries and results on the data
    • Other evidence or demonstration of how the dataset can be accessed and used

The evaluation committee will work with submitters to ask clarifying questions, etc. The goal is not to be a barrier to submission, but instead to help make sure datasets are usable and useful for folks in the future.
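As a rough illustration of the "steps to run an analysis" item above, a dataset submission might ship a short, self-contained script along these lines. This is only a sketch: the column names `sensor_id` and `value`, the inline sample, and the function `mean_by_sensor` are hypothetical and are not prescribed by the workshop.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical inline sample; a real submission would read its own data file.
SAMPLE_CSV = """sensor_id,value
a,1.0
a,3.0
b,10.0
"""


def mean_by_sensor(csv_text):
    """Return {sensor_id: mean of value} for CSV text with the two columns above."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["sensor_id"]].append(float(row["value"]))
    return {sensor: mean(values) for sensor, values in groups.items()}


if __name__ == "__main__":
    # Print one summary line per sensor, sorted by sensor id.
    for sensor, avg in sorted(mean_by_sensor(SAMPLE_CSV).items()):
        print(f"{sensor}: {avg:.2f}")
```

A script like this uses only the Python standard library, so evaluators can run it without installing anything; a plotting step could be layered on top if a graph is the intended artifact.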

Each accepted submission is required to have at least one author attend the workshop and present to the workshop attendees.

Important Dates

Abstract Registration: September 17, 2020, AOE; extended to October 1, 2020, AOE (via HotCRP)

Submission Deadline: September 24, 2020, AOE; extended to October 8, 2020, AOE

Notifications: October 6, 2020, AOE; moved to October 19, 2020

Camera-ready: October 16, 2020, AOE; moved to October 23, 2020

Workshop: November 16, 2020

Useful links

Submission Site (HotCRP)


Co-Chairs & TPC Chairs

Gabe Fierro (University of California, Berkeley)

Mostafa Mirshekari (Stanford University)

Pat Pannuto (University of California, Berkeley)

Yang Zhao (GE Research)

Steering Committee

Jie Gao (Stony Brook University)

Pei Zhang (Carnegie Mellon University)

Flora Salim (RMIT University)

Mikkel Baun Kjærgaard (University of Southern Denmark)

Shijia Pan (University of California, Merced)

Pat Pannuto (University of California, Berkeley)

Prabal Dutta (University of California, Berkeley)

Jie Liu (Harbin Institute of Technology)

Chien-Chun Ni (Yahoo! Research)

Haeyoung Noh (Carnegie Mellon University)


Shiwei Fang (University of North Carolina at Chapel Hill)

Technical Program Committee

Romain Jacob (ETH Zurich)

Nipun Batra (IIT Gandhinagar)

Deepak Vasisht (University of Illinois Urbana-Champaign)

Rachel Cardell-Oliver (University of Western Australia)

Zoltan Nagy (University of Texas at Austin)

Jorge Ortiz (Rutgers University)

Branden Ghena (Northwestern University)

Clayton Miller (National University of Singapore)

M. Hadi Amini (Florida International University)

Dezhi Hong (University of California, San Diego)

Rong Zheng (McMaster University)

Trevor Pering (Google)

Shiwei Fang (University of North Carolina at Chapel Hill)

Javad Mohammadi (Carnegie Mellon University)

Artifact Evaluation Committee

Colleen Josephson (Stanford University)

Jonathon Fagert (Carnegie Mellon University)

Adeola Bannis (Carnegie Mellon University)

Dhiman Sengupta (University of California, San Diego)

Nishant Bhaskar (University of California, San Diego)

Yue Zhang (University of California, Merced)


The 3rd DATA workshop is part of (co-located with) SenSys/BuildSys 2020.

For venue details, visa information, etc., please visit the SenSys venue page.