
Video Compression

[1] Yang Liu, Zheng Guo Li, and Yeng Chai Soh, Region-of-Interest Based Resource Allocation for Conversational Video Communication of H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 18, No. 1, January 2008.

[2] Andreas Richter, Mobile videotelephony: Test of 3G telephones. Hjälpmedelsinstitutet (HI) / The Swedish Handicap Institute (SHI), URN-NBN:se:hi-2007-07335-pdf, 2007.

Comments: A review of several Swedish mobile phones for their suitability for sign language use.

[3] Luca Chittaro, Fabio Buttussi, Daniele Nadalutti, MAge-AniM: a system for visual modeling of embodied agent animations and their replay on mobile devices, Proceedings of the working conference on Advanced visual interfaces (AVI), 2006, 344 - 351.

Comments: Users manipulate an avatar in the many positions it would take to create their end of the conversation and then send the resulting signing avatar video to the receiver who then plays the video on their phone.

Harkins, J., Wolff, A., Korres, E., Foulds, R., Galuska, S., Intelligibility Experiments with a Feature Extraction System Designed to Simulate a Low-Bandwidth Video Telephone for Deaf People, Proceedings of the 14th Annual RESNA Conference, 1991, 38-40.

Comments: Although this paper claims very high intelligibility rates, it shows only a small decline in intelligibility from 30 fps down to 10 fps, with a larger drop-off from 10 to 6 fps (several intermediate frame rates were also tested). Useful, because it confirms our results as well.
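Stimuli for this kind of frame-rate experiment can be approximated by temporal subsampling, i.e. keeping every k-th frame of a full-rate source. A minimal sketch (assuming the target rate divides the source rate; this is not the paper's feature-extraction method):

```python
def subsample_fps(frames, src_fps=30, dst_fps=10):
    """Simulate a lower frame rate by keeping every (src_fps // dst_fps)-th
    frame. Assumes dst_fps evenly divides src_fps."""
    step = src_fps // dst_fps
    return frames[::step]
```

For example, subsampling a 30 fps sequence to 10 fps keeps every third frame.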

[4] M. D. Manoranjan and John A. Robinson. Practical Low-Cost Visual Communication Using Binary Images for Deaf Sign Language. IEEE TRANSACTIONS ON REHABILITATION ENGINEERING, VOL. 8, NO. 1, MARCH 2000

[5] Series H: Audiovisual and Multimedia Systems, Application profile - Sign language and lip-reading real-time conversation using low bit-rate video communication. ITU-T, Telecommunication Standardization Sector of ITU.

[6] Keman Yu, Jiang Li, Tielin He, Yunfeng Lin, Jiangbo Lv and Shipeng Li. Microsoft Portrait: A Real-time Mobile Video Communication System IEEE International Conference on Multimedia & Expo (ICME) 2003, July 6-9, Baltimore, Maryland, demonstration III-2. Webpage

Kaoru Nakazono, Yuji Nagashima, and Mina Terauchi. Evaluation of Effect of Delay on Sign Video Communication. ICCHP 2006, pages 659-666

Comments: Participants were instructed to communicate in sign language via video at varying amounts of delay while conducting 1 of 5 tasks. Acceptable delay ranged from 100 ms (for a simple task where partners alternate counting numbers) to 200 ms (for a slightly more complex task of alternately summing numbers). Results were not reported for unaided conversation, although participants were instructed to "just talk" for part of the experiment. These numbers indicate more tolerance for delay than for voice communication (at 45 ms).

[7] B.F. Johnson and J.K. Caird. The effect of frame rate and video information redundancy on the perceptual learning of American Sign Language gestures. Conference on Human Factors in Computing Systems 1996: 121-122 (University of Calgary, Calgary, Alberta, Canada)

[8] George Ghinea, Johnson P Thomas. QoS Impact on User Perception and Understanding of Multimedia Video Clips. Proceedings of ACM Multimedia 1998: 49-54 (University of Reading, U.K.)

Comments: Users' definition of QoS, based on their ability to understand multimedia videos, may differ from the technical definition. A lower frame rate may lead to more information being gathered because each frame is displayed longer.

[9] Wilson S. Geisler and Jeffrey S. Perry. A real-time foveated multiresolution system for low-bandwidth video communication. SPIE Proceedings Vol. 3299, 1998 (University of Texas, Austin)

Comments: Better resolution where people are likely to look. Plus, some research on how to intelligently segment images to mesh with human visual system.
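A crude sketch of the foveation idea: per pixel, select between the original image and progressively blurred copies according to distance from the fixation point. The pyramid depth, box blur, and radius below are illustrative assumptions; the paper's real-time system is considerably more sophisticated.

```python
import numpy as np

def foveate(img, fx, fy, levels=3, radius=40.0):
    """Foveated rendering sketch: full resolution near the fixation
    point (fx, fy), progressively blurrier with distance."""
    pyramid = [img.astype(np.float64)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        blurred = prev.copy()
        # crude 5-tap blur on the interior (borders left unblurred)
        blurred[1:-1, 1:-1] = (prev[:-2, 1:-1] + prev[2:, 1:-1] +
                               prev[1:-1, :-2] + prev[1:-1, 2:] +
                               prev[1:-1, 1:-1]) / 5.0
        pyramid.append(blurred)
    ys, xs = np.indices(img.shape[:2])
    dist = np.hypot(xs - fx, ys - fy)
    # pick a pyramid level per pixel, capped at the coarsest level
    level = np.minimum((dist / radius).astype(int), levels - 1)
    out = np.empty_like(pyramid[0])
    for k in range(levels):
        out[level == k] = pyramid[k][level == k]
    return out
```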

[10] Laura J. Muir and Iain E. G. Richardson. Perception of Sign Language and Its Application to Visual Communications for Deaf People. Journal of Deaf Studies and Deaf Education 2005. Volume 10, Number 4, pp. 390-401 (The Robert Gordon University, Aberdeen, United Kingdom)

Comments: Participants were shown three videos in which zoom, background, and range of sign were varied. Regardless of these factors, participants' viewing patterns were the same, focused on the lower face of the signer. Visual excursions occurred when the hands came close to the face (close enough to "draw" the eyes away from the face briefly), but fingerspelling was not a factor. The conclusion is that ROI coding would be an appropriate and useful technique.

[11] Agrafiotis, D., Canagarajah, N., Bull, D.R., Kyle J., Seers H., and Dye, M. A Video Coding System For Sign Language Communication at Low Bit Rates. International Conference on Image Processing 2004, pp 441-444 (University of Bristol, UK)

[12] Andrew P. Bradley and Fred W.M. Stentiford. Visual attention for region of interest coding in JPEG 2000. Journal of Visual Communication and Image Representation 14(3):232-250. (The University of Queensland, Australia)

Comments: Images, not video. Complicated clustering to find relevant ROIs. The difference between regions is just bits/pixel. User studies indicated that users did *not* prefer the ROI images over the uniformly distorted ones! Perhaps the region was not chosen properly, or not all users prefer the same region...

[13] Richard P Schumeyer, Edwin A. Heredia, Kenneth E. Barner. Region of Interest Priority Coding for Sign Language Videoconferencing. IEEE First Workshop on Multimedia Signal Processing, pp. 531--536, (Princeton), 1997. (Applied Science and Engineering Laboratories and Thomson Consumer Electronics)

Comments: ROI coding of sign language video results in better compression (an old paper, using H.261). Results were not empirically tested.
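The core idea behind this family of ROI coders, a finer quantizer inside the region of interest and a coarser one outside, can be sketched as a per-macroblock QP map. The rectangular ROI and the QP values below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def qp_map(frame_h, frame_w, roi, qp_roi=24, qp_bg=38, mb=16):
    """Per-macroblock quantization-parameter map: finer QP (more bits)
    inside the ROI rectangle (x, y, w, h) in pixels, coarser elsewhere."""
    rows, cols = frame_h // mb, frame_w // mb
    qmap = np.full((rows, cols), qp_bg, dtype=int)
    x, y, w, h = roi
    # round the ROI outward to whole macroblocks
    qmap[y // mb:(y + h + mb - 1) // mb,
         x // mb:(x + w + mb - 1) // mb] = qp_roi
    return qmap
```

An encoder would then quantize each macroblock with its entry from this map, spending most of the bit budget on the signer's face and hands.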

[14] Nariman Habili, Cheng-Chew Lim, Alireza Moini. Segmentation of the face and hands in sign language video sequences using color and motion cues. IEEE Trans. Circuits Syst. Video Techn. 14(8): 1086-1097 (2004)

Comments: One of many skin-detection algorithms, using properties of skin found in the chrominance, motion, and connected components. It just happens to have 'sign language' in the title.
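As a rough illustration of the chrominance cue such detectors rely on: skin pixels cluster in a compact region of the Cb/Cr plane, so a simple threshold already yields a usable mask. The Cb/Cr ranges below are common rule-of-thumb values, not the thresholds from the paper, and the paper additionally uses motion and connected-component cues.

```python
import numpy as np

def skin_mask(frame_rgb):
    """Label pixels as skin by thresholding chrominance (Cb, Cr),
    computed from RGB via the ITU-R BT.601 conversion."""
    frame = frame_rgb.astype(np.float64)
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # rule-of-thumb skin cluster in the Cb/Cr plane
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```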

[15] George Sperling, Michael Landy, Yoav Cohen, and M. Pavel. Intelligible Encoding of ASL Image Sequences at Extremely Low Information Rates. Papers from the second workshop Vol. 13 on Human and Machine Vision II, pp. 256-312, (1986)

[17] Kaoru Nakazono, Yuji Nagashima, and Akira Ichikawa. Digital Encoding Applied to Sign Language Video. IEICE Transactions on Information and Systems, pp. 1893-1900 (2006)

[18] H. Poizner, U. Bellugi, V. Lutes-Driscoll. Perception of American Sign Language in dynamic point-light displays. Journal of Experimental Psychology: Human Perception and Performance, pp. 430-440 (1981)