Textbook:
[1] Szeliski, Richard. Computer vision: algorithms and applications. Springer Science & Business Media, 2010.
Papers:
[2] Krishna, Ranjay, et al. "Visual genome: Connecting language and vision using crowdsourced dense image annotations." International Journal of Computer Vision 123.1 (2017): 32-73.
[3] Kuznetsova, Alina, et al. "The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale." arXiv preprint arXiv:1811.00982 (2018).
[4] Lu, Cewu, et al. "Visual relationship detection with language priors." European Conference on Computer Vision. Springer, Cham, 2016.
[5] Torralba, Antonio, and Alexei A. Efros. "Unbiased look at dataset bias." CVPR 2011. IEEE, 2011.
[6] Sadeghi, Mohammad Amin, and Ali Farhadi. "Recognition using visual phrases." CVPR 2011. IEEE, 2011.
[7] Dai, Bo, Yuqi Zhang, and Dahua Lin. "Detecting visual relationships with deep relational networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[8] Xu, Danfei, et al. "Scene graph generation by iterative message passing." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[9] Peyre, Julia, et al. "Weakly-supervised learning of visual relations." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[10] Goyal, Yash, et al. "Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[11] Zhang, Peng, et al. "Yin and yang: Balancing and answering binary visual questions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[12] Attenberg, Josh M., Panagiotis G. Ipeirotis, and Foster Provost. "Beat the machine: Challenging workers to find the unknown unknowns." Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence. 2011.