Downloads

We are providing the following dataset and models, including both the underlying data and pretrained models, so that you can start understanding hands in your data as soon as possible!

Please cite our work if you use the models. If you use the egocentric detector, please also cite the 3 original datasets it was trained on.

Pretrained Models & Code

We are providing a series of pretrained models:

Category	Links	Details
Full State Detection	[hand_object_detector]	Our GitHub repository (left) contains the hand-object detection model trained on our 100K frames. Our system is built on top of the [faster-rcnn.pytorch] system.
Hand Detection	[hand_detector.d2]	Our GitHub repository (left) contains the hand detection model trained on our 100K frames. This model is for hand detection only and uses [detectron2].

We're interested in data where the model does not work. While many aspects of the system work quite well, we are aware of the following trends:

Occasional false positives in images with no people.
Issues with left/right in egocentric data (Please see below for egocentric models that work far better).
Difficulty parsing the full state when many people are present.

Please send us any interesting failure modes because we want to make many aspects of understanding hands "just work" out of the box.

Videos, Frame Data, Annotations

We are providing a series of downloads for the dataset:

Category	Links	Details
Video Dataset	file.zip [100K visualization code]	This includes: (1) the video IDs for all of the videos in 100DOH; (2) the annotations for the frames, plus specifications for obtaining the frames.
Frame Cache	raw.zip -- readme.txt pascal_voc_format.zip	For non-commercial research purposes only. These are the pre-extracted frames for the dataset, in two formats: (1) raw data; (2) data pre-packaged in PASCAL VOC format.
Shot Detection	shot.zip -- readme.txt	This provides (1) shot segmentation results and (2) fine-grained scene segmentation results, which divide each shot into continuous/stationary/fading classes.
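The PASCAL VOC package can be read with standard XML tooling. The sketch below is a minimal example, assuming the files follow the standard VOC layout (`filename`, `object/name`, `object/bndbox`). Any 100DOH-specific per-object fields (for example hand side or contact state) are not parsed here, and the sample annotation is made up for illustration rather than taken from the dataset.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """One labeled bounding box from a VOC annotation (pixel coordinates)."""
    name: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def _coord(bndbox: ET.Element, tag: str) -> int:
    # VOC coordinates are usually integers, but some exporters write floats.
    text = bndbox.findtext(tag)
    if text is None:
        raise ValueError(f"missing <{tag}> in <bndbox>")
    return int(float(text))


def parse_voc_annotation(xml_text: str) -> Tuple[str, List[Box]]:
    """Return the image filename and all boxes in a standard VOC XML string.

    Assumption: only standard VOC tags are read; extra per-object fields
    are ignored.
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename", default="")
    boxes: List[Box] = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects that carry no box
        boxes.append(
            Box(
                name=obj.findtext("name", default=""),
                xmin=_coord(bndbox, "xmin"),
                ymin=_coord(bndbox, "ymin"),
                xmax=_coord(bndbox, "xmax"),
                ymax=_coord(bndbox, "ymax"),
            )
        )
    return filename, boxes


# Hypothetical annotation for illustration only (not taken from 100DOH).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>hand</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110.0</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

filename, boxes = parse_voc_annotation(SAMPLE_XML)
print(filename, boxes)
```

To read a real annotation file, pass its contents, e.g. `parse_voc_annotation(Path(p).read_text())`; check the archive's readme for the exact tags it uses.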

Egocentric Models & Annotations

While 100DOH does contain 1st-person views, they are fewer in number than 3rd-person views and typically come from overhead cameras. We are releasing tools trained on our 100K frames plus a 56.4K-frame subset of [EPIC-Kitchens2018], [EGTEA] and [CharadesEgo]. If you use the joint model (trained on 100K+ego), you must also cite the papers for these 3 datasets in any publications.

Category	Links	Details
Full State Detection	[hand_object_detector]	Our GitHub repository (left) also provides the full hand state detection model trained on 100DOH + egocentric data.
Hand Detection	[hand_detector.d2]	Our GitHub repository (left) also provides the detectron2 hand detection model trained on 100DOH + egocentric data.
Frame Name	frame_name.zip	Names of the frames the egocentric models were trained on, so that the data splits can be identified.
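If you evaluate the joint egocentric models on EPIC-Kitchens, EGTEA, or CharadesEgo, the frame lists can help keep training frames out of your test set. The sketch below is a minimal example, assuming each file in frame_name.zip lists one frame name per line and that your own frames use the same naming; both are assumptions, so check the archive's contents first. The frame names and helper functions are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def load_frame_names(lines: Iterable[str]) -> Set[str]:
    """Collect frame names, one per line; blank lines are ignored.

    Assumption: the files in frame_name.zip list one frame name per line.
    """
    return {line.strip() for line in lines if line.strip()}


def split_by_training_set(
    frames: Iterable[str], trained_on: Set[str]
) -> Tuple[List[str], List[str]]:
    """Partition frames into (seen in training, unseen), preserving order."""
    seen: List[str] = []
    unseen: List[str] = []
    for name in frames:
        (seen if name in trained_on else unseen).append(name)
    return seen, unseen


# Hypothetical frame names; real names depend on the source dataset.
trained = load_frame_names(["frame_0001.jpg\n", "\n", "frame_0002.jpg\n"])
seen, unseen = split_by_training_set(["frame_0001.jpg", "frame_0100.jpg"], trained)
print("exclude from evaluation:", seen)
print("safe to evaluate on:", unseen)
```

With a real file, iterating an open handle yields its lines, so `with open(path) as f: trained = load_frame_names(f)` works unchanged.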

In practice, we have found that the joint model performs at least as well, if not better. We are happy to share the annotated data for non-commercial research purposes; please email us if you need it for your research.