Learning non-maximum suppression

J. Hosang, R. Benenson, B. Schiele

CVPR 2017 (spotlight, 8% acceptance rate)

Object detectors have hugely profited from moving towards an end-to-end learning paradigm: proposals, features, and the classifier becoming one neural network improved results two-fold on general object detection. One indispensable component is non-maximum suppression (NMS), a post-processing algorithm responsible for merging all detections that belong to the same object. The de facto standard NMS algorithm is still fully hand-crafted, suspiciously simple, and — being based on greedy clustering with a fixed distance threshold — forces a trade-off between recall and precision. We propose a new network architecture designed to perform NMS, using only boxes and their score. We report experiments for person detection on PETS and for general object categories on the COCO dataset. Our approach shows promise providing improved localization and occlusion handling.

In our GCPR 2016 paper we worked on a way to overcome the limitations of the de facto standard non-maximum suppression algorithm, but that approach still relied on the classic NMS algorithm as an input. This work proposes a network that can perform the task without additional help.

@inproceedings{Hosang2017cvpr,
  Author = {Jan Hosang and Rodrigo Benenson and Bernt Schiele},
  Title = {Learning non-maximum suppression},
  Year = {2017},
  booktitle = {CVPR}
}

Simple does it: Weakly supervised instance and semantic segmentation

A. Khoreva, R. Benenson, J. Hosang, M. Hein, B. Schiele

CVPR 2017

Semantic labelling and instance segmentation are two tasks that require particularly costly annotations. Starting from weak supervision in the form of bounding box detection annotations, we propose to recursively train a convnet such that outputs are improved after each iteration. We explore which aspects affect the recursive training, and which is the most suitable box-guided segmentation to use as initialisation. Our results improve significantly over previously reported ones, even when using rectangles as rough initialisation. Overall, our weak supervision approach reaches ~95% of the quality of the fully supervised model, both for semantic labelling and instance segmentation.

@inproceedings{Khoreva2017cvpr,
  Author = {Anna Khoreva and Rodrigo Benenson and Jan Hosang and Matthias Hein and Bernt Schiele},
  Title = {Simple does it: Weakly supervised instance and semantic segmentation},
  Year = {2017},
  booktitle = {CVPR}
}

A convnet for non-maximum suppression

J. Hosang, R. Benenson, B. Schiele

GCPR 2016 (oral presentation)

Standard non-maximum suppression is a greedy clustering with a fixed distance threshold, which forces a trade-off between recall and precision. A high distance threshold merges different objects in crowded scenes, while a low threshold generates duplicate detections for a single object.

We propose a learnable alternative. By posing the problem as a rescoring task, we use a matching loss and joint rescoring of neighboring detections to learn a network that performs non-maximum suppression.
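For reference, the greedy baseline that the learned network is meant to replace can be sketched as follows. This is a minimal illustration, not code from the paper: the box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the use of intersection over union (IoU) as the overlap measure are assumptions. If distance is read as 1 - IoU, a high distance threshold corresponds to a low `iou_threshold` here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept detections, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one box far away (hypothetical data).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
```

With the single fixed threshold, the second box is suppressed whether it is a duplicate of the first or a genuinely different, partly occluded object, which is the trade-off described above.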

@inproceedings{Hosang2016Gcpr,
  Author = {Jan Hosang and Rodrigo Benenson and Bernt Schiele},
  Title = {A Convnet for Non-Maximum Suppression},
  Year = {2016},
  booktitle = {GCPR}
}

How far are we from solving pedestrian detection?

S. Zhang, R. Benenson, M. Omran, J. Hosang, B. Schiele

CVPR 2016

We evaluate a human baseline for Caltech pedestrian detection, analyse the weak areas of current detectors, push detection performance, and provide new, improved dataset annotations.

@inproceedings{Zhang2016Cvpr,
  Author = {Shanshan Zhang and Rodrigo Benenson and Mohamed Omran and Jan Hosang and Bernt Schiele},
  Title = {How Far are We from Solving Pedestrian Detection?},
  Year = {2016},
  booktitle = {CVPR}
}

What makes for effective detection proposals?

J. Hosang, R. Benenson, P. Dollár, B. Schiele

TPAMI 2015

Detection proposals make it possible to avoid exhaustive sliding window search across images, while keeping high detection quality. We provide an in-depth analysis of proposal methods regarding recall, repeatability, and impact on DPM and R-CNN detector performance.

We introduce a novel metric, the average recall (AR), which rewards both high recall and good localisation and correlates surprisingly well with detector performance. Our findings show common strengths and weaknesses of existing methods, and provide insights and metrics for selecting and tuning proposal methods.
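As a rough illustration of how such a metric can reward both recall and localisation, the sketch below averages recall over IoU thresholds between 0.5 and 1. The threshold range, the function name `average_recall`, and the input format (the best IoU achieved by any proposal for each ground-truth box) are assumptions made for this example; the paper gives the authoritative definition.

```python
def average_recall(best_ious):
    """Average recall over IoU thresholds in [0.5, 1].

    best_ious: for each ground-truth box, the highest IoU achieved
    by any proposal (values in [0, 1]).
    """
    if not best_ious:
        return 0.0
    # Integrating recall(t) over t in [0.5, 1] and normalising by the
    # interval length 0.5 reduces to 2 * mean(max(iou - 0.5, 0)).
    total = sum(max(v - 0.5, 0.0) for v in best_ious)
    return 2.0 * total / len(best_ious)


# Hypothetical example: both sets cover every object at IoU >= 0.5,
# but the second one localises the objects more tightly.
loose = average_recall([0.55, 0.60, 0.50])
tight = average_recall([0.90, 0.95, 0.85])
```

Both proposal sets have full recall at the 0.5 threshold, yet the tighter one scores higher, which is the behaviour a plain recall-at-0.5 figure cannot capture.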

@ARTICLE{Hosang2015pami,
  author = {J. Hosang and R. Benenson and P. Dollár and B. Schiele},
  title = {What makes for effective detection proposals?},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year = {2015}
}

Taking a deeper look at pedestrians

J. Hosang, M. Omran, R. Benenson, B. Schiele

CVPR 2015

We show that convolutional networks can reach better results for pedestrian detection than previously reported. We provide new best reported results (on Caltech-USA) under different training regimes (1x, 10x, and ImageNet pre-training).

@INPROCEEDINGS{Hosang2015cvpr,
  author = {J. Hosang and M. Omran and R. Benenson and B. Schiele},
  title = {Taking a deeper look at pedestrians},
  booktitle = {CVPR},
  year = {2015}
}

GyroPen: Gyroscopes for Pen-Input with Mobile Phones

T. Deselaers, D. Keysers, J. Hosang, H. Rowley

IEEE Transactions on Human-Machine Systems 2014

Mobile phones can do so much, and yet typing on them is still quite tedious. Instead of using your fingers to type or handwrite on the phone's touch screen, we propose to use the phone like a pen on paper. The inertial sensors in your phone capture its movements, from which the path that the phone drew on paper is reconstructed. Handwriting recognition software then recognizes the text you wrote.

@ARTICLE{Deselaers2014Thms,
  author = {T. Deselaers and D. Keysers and J. Hosang and H. Rowley},
  title = {GyroPen: Gyroscopes for Pen-Input with Mobile Phones},
  journal = {IEEE Transactions on Human-Machine Systems},
  year = {2014}
}

Ten years of pedestrian detection, what have we learned?

R. Benenson, M. Omran, J. Hosang, B. Schiele

ECCV 2014, CVRSUAD workshop

We review and discuss the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We identify the strategies that have paid off, and the ones that still have to show their promise.

By combining existing approaches we study their complementarity and report the current best known performance on the challenging Caltech-USA dataset.

@INPROCEEDINGS{Benenson2014eccvw,
  author = {R. Benenson and M. Omran and J. Hosang and B. Schiele},
  title = {Ten years of pedestrian detection, what have we learned?},
  booktitle = {ECCV, CVRSUAD workshop},
  year = {2014}
}

How good are detection proposals, really?

J. Hosang, R. Benenson, B. Schiele

BMVC 2014 (oral presentation, 7% acceptance rate)

Detection proposals make it possible to avoid exhaustive sliding window search across images, while keeping high detection quality. We provide an in-depth analysis of proposal methods regarding recall, repeatability, and impact on DPM detector performance.

Our findings show common weaknesses of existing methods, and provide insights to choose the most adequate method for different settings.

@INPROCEEDINGS{Hosang2014Bmvc,
  author = {J. Hosang and R. Benenson and B. Schiele},
  title = {How good are detection proposals, really?},
  booktitle = {BMVC},
  year = {2014}
}

Towards Large-Scale Categorization Using Min-Hash

J. Hosang

Diploma Thesis, 2011, RWTH Aachen University

Min-Hash promises the retrieval of highly similar images in constant time, i.e. independent of the number of indexed images. In this work, we investigate the potential of Min-Hash for near neighbor search in a large-scale categorization setting. We evaluate Min-Hash and several extensions on the ImageNet Large Scale Visual Recognition Challenge 2010 using nearest neighbor categorization. Extensions include tf-idf weighting, permutation grouping, and Geometric Min-Hash as well as a novel generalization of Geometric Min-Hash for small vocabularies.
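The core idea behind Min-Hash can be sketched in a few lines: the probability that two sets share the same minimum under a random hash function approximates their Jaccard similarity. The code below is a generic illustration, not the thesis implementation. The linear hash family, the names `minhash_signature` and `estimate_jaccard`, and the integer "visual word" ids are assumptions, and the sketch covers only the similarity estimate, not the indexing that makes retrieval independent of the database size.

```python
import random

PRIME = 2_147_483_647  # modulus for the hash family (assumed choice)


def minhash_signature(items, hash_params):
    """One minimum per hash function h(x) = (a * x + b) mod PRIME."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which the two signatures agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


rng = random.Random(0)
hash_params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
               for _ in range(256)]

# Hypothetical images described as sets of visual word ids.
words_a = set(range(0, 100))
words_b = set(range(50, 150))    # shares half of its words with words_a
words_c = set(range(1000, 1100)) # shares no words with words_a

sig_a = minhash_signature(words_a, hash_params)
sig_b = minhash_signature(words_b, hash_params)
sig_c = minhash_signature(words_c, hash_params)

overlap_estimate = estimate_jaccard(sig_a, sig_b)
disjoint_estimate = estimate_jaccard(sig_a, sig_c)
```

The true Jaccard similarity of `words_a` and `words_b` is 50/150; the estimate approaches it as the number of hash functions grows. Disjoint sets never share a minimum under this hash family, so their estimate is exactly zero.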

@MASTERSTHESIS{Hosang2011Thesis,
  author = {J. Hosang},
  title = {Towards Large-Scale Categorization Using Min-Hash},
  school = {RWTH Aachen University},
  year = {2011}
}

An Evaluation of Two Automatic Landmark Building Discovery Algorithms for City Reconstruction

T. Weyand, J. Hosang, B. Leibe

ECCV 2010, RMLE workshop

We compare two state-of-the-art landmark mining algorithms: spectral clustering and min-hash. Furthermore, we introduce a new large-scale dataset for the evaluation of landmark mining algorithms consisting of 500k images from the inner city of Paris. We evaluate both algorithms on the well-known Oxford dataset and our Paris dataset and give a detailed comparison of the clustering quality and computation time of the algorithms.

@INPROCEEDINGS{Weyand2010Eccvw,
  author = {T. Weyand and J. Hosang and B. Leibe},
  title = {An Evaluation of Two Automatic Landmark Building Discovery Algorithms for City Reconstruction},
  booktitle = {ECCV, RMLE workshop},
  year = {2010}
}

Data-Mining-Cup 2007

C. Buck, T. Gass, A. Hannig, J. Hosang, S. Jonas, J.-T. Peter, P. Steingrube and J. H. Ziegeldorf

Informatik-Spektrum 2008

The task of the Data-Mining-Cup 2007 was to develop an automatic method to efficiently control a discount couponing system.

Our experiments showed that a single classifier does not solve this task sufficiently, but massive ensembles of various classifiers are able to cope with the problem. By combining up to 2,000 classifiers, we solved the task very successfully: six of our nine submissions ranked among the top ten, and the other three ranked among the top twenty of 230.

@article{Buck2008Spektrum,
  title = {Data-Mining-Cup 2007},
  journal = {Informatik-Spektrum},
  volume = {31},
  number = {6},
  year = {2008},
  month = {December},
  pages = {591--599},
  publisher = {Springer},
  author = {Buck, Christian and Gass, Tobias and Hannig, Andreas and Hosang, Jan and Jonas, Stephan and Peter, Jan-Thorsten and Steingrube, Pascal and Ziegeldorf, Jan Hendrik}
}