Monday to Thursday of my trip were dominated by the Electronic Imaging 2011 conference, though I got to do other things as well on Thursday. This was a huge international conference, with something close to a thousand participants from all over the world, representing technology companies, research centres, and universities. As mentioned, I was there with three colleagues from Canon Information Systems Research Australia, and there were also other people from Canon USA as well as from Océ in France, which was recently acquired by Canon. And there were representatives from many of the big players in digital image technology, such as Sony, Hewlett Packard, Microsoft, Nikon, IBM, and so on, as well as Internet technology companies like Facebook and Google.
Electronic Imaging is actually an umbrella conference containing a dozen or so sub-conferences, all taking place simultaneously in the same venue. I was giving my paper in the Digital Photography stream, but there were also streams with names like: Stereoscopic Displays and Applications; Human Vision and Electronic Imaging; Computer Vision and Image Analysis of Art; Real-Time Image and Video Processing; Intelligent Robots and Computer Vision; and Multimedia on Mobile Devices. Each stream used one of the hotel’s many conference rooms for its oral paper presentations. Some of the rooms were small, holding only 30 or so people; others were mid-sized, holding maybe 60 or 70. And then there was the main ballroom, which was decked out with a stage and seating for about 500 people by my estimate. All of these rooms were in use simultaneously, and you needed to juggle which of the dozen or so concurrent talks you wanted to see most. I stayed with the Digital Photography stuff mostly, but this stream ended on the Tuesday, and another stream took over its meeting room for the last two days, so I had the chance to move around and sample some of the stuff being presented in the other streams.
By far the biggest streams were those dedicated to 3D image technology; together they monopolised the enormous ballroom for the entire four days of the conference. Whatever you want to say about the state of digital image technology today, it’s clear that most of the research interest and money is in 3D video, including TV, cinema, and 3D gaming technology. It became very obvious to me that media technology companies like Sony, LG, Panasonic, Samsung, Philips, Toshiba, etc. are absolutely pouring money into research in this field.
It’s easy to sit back as a consumer and say that 3D TV and cinema sucks, because you have to wear dorky glasses and it hurts your eyes and gives you a headache and the films are crap anyway and the 3D effect is more distracting than anything. I suspect a lot of people think that the tech companies are trying to push it on consumers simply to force us to buy new hardware, without caring about how good the technology actually is. I have to say this is completely wrong. They are painfully aware that current 3D technology is not good enough. Most of the talks about 3D tech that I saw began from the premise that the current tech isn’t good enough, and that many viewers actively dislike it. They went so far as to say that the current stereoscopic technologies that rely on glasses are simply not good enough to make 3D video a mainstream technology. Besides the glasses thing, the fact is that all current 3D tech presents views to each eye on the assumption that the separation of the eyes is horizontal. That assumption fails as soon as a viewer tilts their head, forcing the eyes to diverge vertically to merge the stereoscopic view – which is unnatural and causes eye strain very quickly. One speaker said that head tilting is currently the biggest technical problem faced by 3D display technology, and until it can be fixed somehow, 3D is always going to give people headaches.
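To put a rough number on the head-tilt problem (my own back-of-the-envelope sketch in Python, not something from the talks): a stereo pair encodes a purely horizontal disparity between the two eye views, and tilting your head by some angle converts part of that disparity into a vertical component that your eyes must chase by verging up and down – something they never do when looking at real objects.

    import math

    def tilted_disparity(horizontal_disparity_px, tilt_degrees):
        """Split a screen-horizontal stereo disparity into the component
        along the (tilted) interocular axis and the vertical leftover."""
        theta = math.radians(tilt_degrees)
        along_axis = horizontal_disparity_px * math.cos(theta)  # normal fusion
        vertical = horizontal_disparity_px * math.sin(theta)    # unnatural vertical vergence
        return along_axis, vertical

    # A modest 15-degree head tilt turns a 20-pixel horizontal disparity
    # into about 5 pixels of vertical disparity the eyes must resolve.
    print(tilted_disparity(20, 15))  # -> (19.32..., 5.18...)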
Besides all the technological aspects, there were many physiologists, ophthalmologists, and doctors present too, presenting studies of the physical effects of watching 3D video. The conclusions were blunt: current 3D TV technology causes painful eye strain far faster than any conventional TV display. It’s a real problem, it needs to be fixed, and the companies who make 3D TVs are very aware of it. And then there were discussions of how to make content that you actually want to watch in 3D. Several speakers pointed out that cinematographers need to be trained in how to use 3D effectively. Companies like Disney/Pixar, with experience and strong technical expertise, are able to make 3D CGI films that use the effect subtly and immersively, building a 3D world that looks natural rather than jarring. On the other hand you have studios and directors with no experience suddenly making 3D content and simply going bananas, producing schlock that uses the 3D as a gimmick – painfully obvious when you watch the stuff. And there are so many ways to get 3D content wrong that people without some training don’t really have a chance of getting it right.
Beyond the acknowledgement that current 3D tech (and much of the content) simply sucks, there were many ideas for improvement. A speaker from MIT pointed out that current “3D” should really be called “stereoscopic”, since it’s not actually 3D at all. The physical behaviour of the light rays from a stereoscopic TV doesn’t match the behaviour of light from an actual 3D object, and this is the root of the problem. But this can be addressed with new technology now being worked on. Lightfield displays let you control not merely the colour and brightness of every pixel, but the colour and brightness of that pixel as viewed from every possible angle. This allows you to reconstruct the four-dimensional light field (the x, y position of each ray plus the 2D angle at which it travels) as though it were coming from an actual 3D object. Such a display would present a true 3D view to viewers at any angle, with their heads tilted in any direction, without requiring glasses. These things don’t exist yet, but there are plenty of researchers working on ways to make them real, and the MIT guy gave some examples of how you might approach it with a multi-layered active LCD matrix.
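As a mental model of what a lightfield display has to store (my sketch, not the speaker’s implementation), think of replacing the 2D image with a 4D array: two spatial indices and two angular ones. The names and sizes below are made up for illustration.

    import numpy as np

    # Toy discretised lightfield: a 2D pixel grid crossed with a 2D angular grid.
    H, W = 120, 160   # spatial resolution
    U, V = 9, 9       # angular samples per pixel (9 x 9 view directions)
    lightfield = np.zeros((H, W, U, V, 3), dtype=np.float32)  # RGB per ray

    def view_from_direction(lf, u_idx, v_idx):
        """The 2D image seen from one discrete viewing direction. A real
        lightfield display emits all U*V directions at once, so each eye,
        at any head tilt, automatically receives a consistent picture."""
        return lf[:, :, u_idx, v_idx, :]

    print(view_from_direction(lightfield, U // 2, V // 2).shape)  # (120, 160, 3)

Even this toy shows the catch: the two angular dimensions multiply the data by a factor of U×V, which is exactly why compressive approaches like the multi-layered LCD matrix are being investigated.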
The impression I got is that these sorts of technologies will start to become viable within the next 10 years or so, and then trickle down into consumer products. At which point we can all dump our glasses-based 3D TVs and finally get some decent 3D display technology. And hopefully by then the film directors will have figured out how to use it without being stupid about it too. I’m very sceptical about the current 3D technology, and have no desire to get anything that requires glasses or relies solely on the stereoscopic principle to simulate 3D depth. But if the researchers can synthesise a true lightfield at high enough resolution, that will be an amazing breakthrough, and I honestly think it will transform our video watching habits in as far-reaching a way as going from silent movies to talkies. And because the light will be indistinguishable from the photons produced by real 3D objects, there’ll be no chance of any eye strain effects. It’ll be identical to seeing real 3D objects.
Enough of that – it wasn’t even what I went to the conference for! My bit was about Digital Photography, and Andrew also presented a paper in that stream (Chris presented a paper in a different stream). Both our papers were very well received. Andrew won the award for best paper in the Digital Photography stream, and the conference chair told me later that my paper was the runner-up in their considerations for the prize! So we did very well there. My paper was about measuring the optical transfer function of a lens – basically a quantitative measure of how much the lens blurs an image. After I gave my talk, Norman Koren from Imatest and one of his researchers came to talk to me, saying they really liked our technique. That was cool! I met several other interesting and influential people in the field, which was great.
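For anyone wondering what an optical transfer function actually is: it’s the Fourier transform of the lens’s point spread function, and its magnitude (the MTF) tells you how much contrast survives at each spatial frequency. Here’s a toy illustration of that textbook relationship in Python – to be clear, this is just the definition, not the measurement technique from our paper.

    import numpy as np

    def mtf_from_psf(psf):
        """MTF = magnitude of the Fourier transform of the (normalised)
        point spread function. Wider blur -> faster MTF roll-off."""
        otf = np.fft.fft2(psf / psf.sum())  # normalise so MTF(0) == 1
        return np.abs(np.fft.fftshift(otf))

    # Stand-in Gaussian PSFs: the sigma=3 "lens" blurs more, so it passes
    # far less contrast at the same spatial frequency than sigma=1.
    y, x = np.mgrid[-32:32, -32:32]
    for sigma in (1.0, 3.0):
        psf = np.exp(-(x**2 + y**2) / (2 * sigma**2))
        print(f"sigma={sigma}:", round(mtf_from_psf(psf)[32, 48], 4))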
Some other highlights of talks that I went to:
A really impressive talk by a woman from MIT who was doing research into how much information you can perceive in your peripheral vision. She came up with a technique for scrambling an image so that when you look at the scrambled version directly, you get very close to the same information that you get when you see the original image only in your peripheral vision. An example is a photo of a person: in your peripheral vision, you can tell that a face is male or female, and approximately how old the person is, but you can’t detect details like whether an eye and the mouth have been swapped in position! More concretely, for something like a map, her scrambling algorithm preserves features like the thick lines that represent roads, but messes up text. Using volunteer viewers, she showed that in your peripheral vision you can follow where a thick road on a map goes, but can’t read the labels. It sounds basic, but this is just a simple example – the stuff she was showing got a lot more complicated than this. The point is that people who design things like user interfaces or maps can use this to make the most important information easy to distinguish at a glance – useful for an in-car navigation system, for example, where you want the user to get the info they need without having to stare at the display.
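I obviously can’t reproduce her actual algorithm from a talk, but to give a flavour of the kind of operation involved, here’s a crude stand-in of my own: keep the coarse, low-spatial-frequency structure of an image intact and randomise the phase of the fine detail. Thick map roads survive such a treatment; small text doesn’t.

    import numpy as np

    def crude_peripheral_mimic(img, keep_radius=0.08, seed=0):
        """Crude illustration only (nothing like her real technique):
        keep low spatial frequencies, scramble the phase of high ones.
        img: 2D greyscale float array."""
        rng = np.random.default_rng(seed)
        F = np.fft.fftshift(np.fft.fft2(img))
        h, w = img.shape
        fy, fx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
        low = np.hypot(fy / h, fx / w) <= keep_radius  # low-frequency mask
        random_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, F.shape))
        F_mixed = np.where(low, F, np.abs(F) * random_phase)
        return np.real(np.fft.ifft2(np.fft.ifftshift(F_mixed)))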
A guy from Facebook gave a talk about face detection technology they are working on, which not merely detects human faces, but then analyses which faces appear together in which photos, and which faces resemble one another. Given a few hundred photos, the system can infer things like family relationships. He demonstrated it up to the point where, upon adding a photo to your collection, it might say to you, “Hey, I see you have a photo of yourself and your brother here. Do you want to send a copy to your brother?” Someone in the audience asked, “Wouldn’t people find this creepy?” The speaker said some people certainly would, and if they implemented it on Facebook they’d definitely make it opt-in only. But he also said that young users these days are developing very different ideas about privacy to older generations, and may well embrace something like this.
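He didn’t show any internals, of course, but the relationship-inference step presumably sits on top of simple bookkeeping like face co-occurrence counts: which recognised identities appear in photos together, and how often. A toy sketch of that bookkeeping (everything here is hypothetical):

    from collections import Counter
    from itertools import combinations

    # Hypothetical output of a face recogniser: identities per photo.
    photos = [
        {"me", "brother"},
        {"me", "brother", "mum"},
        {"me", "colleague"},
        {"brother", "mum"},
    ]

    cooccurrence = Counter()
    for faces in photos:
        for pair in combinations(sorted(faces), 2):
            cooccurrence[pair] += 1

    # Frequently co-occurring pairs become candidates for close
    # relationships, driving suggestions like "send this to your brother?".
    for pair, count in cooccurrence.most_common(3):
        print(pair, count)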
A talk about using multi-spectral imaging to decipher the diaries of Dr David Livingstone, the famous African explorer. While in Africa he ran out of paper and ink, so he made his own ink from berries and whatnot, and wrote on used paper – either stuff he’d already written on, or newsprint – writing at 90 degrees to the existing text. The ink soaked through the paper, so it appeared almost equally dark on both sides. This, combined with the other writing, made reading what he had written virtually impossible. By imaging the material under a dozen different wavelengths of light and then processing the images in various ways, you can eliminate the newsprint and the writing on the opposite side of the paper, rendering the text easily legible. At the end of the talk, which was all about the technical details, I asked if this was the first time the diaries had been read. The speaker said that in the late 19th century, when the diaries were first returned to Europe, a guy at some museum took on the job of transcribing them. It took years of painstaking work, virtually letter by letter, with a magnifying glass. The results were published, and since then everyone has used that as the canonical source. But this new work has revealed that the original transcriber made lots of errors!
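The talk didn’t hand us the pipeline, but a standard way to attack this kind of separation problem is to treat each pixel as a vector of responses across the dozen wavelengths, then look for linear combinations of the spectral channels in which one ink stands out and the others cancel – principal component analysis is the obvious first tool. A hedged sketch of that general idea (an illustrative standard approach, not the speakers’ exact method):

    import numpy as np

    def separate_inks(stack):
        """stack: (num_wavelengths, H, W) multi-spectral image cube.
        PCA over the spectral axis: different inks have different spectral
        signatures, so they tend to land in different principal components."""
        n, h, w = stack.shape
        pixels = stack.reshape(n, -1).T            # (H*W, n) spectra
        pixels = pixels - pixels.mean(axis=0)
        cov = np.cov(pixels, rowvar=False)         # (n, n) spectral covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]          # strongest components first
        components = pixels @ eigvecs[:, order]    # project onto PCs
        # Each component is itself an image; inspect them to find one where
        # the handwriting stands out and the newsprint/show-through cancels.
        return components.T.reshape(n, h, w)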
A talk by a Ph.D. student about forensic identification of camera lenses, based on their chromatic aberration. This involves examining a digital photo and analysing the coloured fringes around various edges, and from that figuring out what lens was used to take the photo. Not just the model of the lens, but the exact individual lens – since even lenses of the same model have minute variations in their construction that affect the amount and character of the chromatic aberration they cause.
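Lateral chromatic aberration shows up as a tiny radial mis-registration between the colour channels that grows towards the image corners, so one crude way to extract a “fringe fingerprint” (my illustration; the student’s actual forensic features were surely more sophisticated) is to estimate the shift between the red and blue channels in patches at different distances from the image centre:

    import numpy as np

    def channel_shift(patch_r, patch_b):
        """Estimate the integer-pixel shift between red and blue channel
        patches via FFT-based phase correlation."""
        F = np.fft.fft2(patch_r) * np.conj(np.fft.fft2(patch_b))
        corr = np.fft.ifft2(F / (np.abs(F) + 1e-9)).real
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        # Wrap shifts into the range [-size/2, size/2).
        return tuple(p - s if p > s // 2 else p for p, s in zip(peak, corr.shape))

    # Measuring this shift in patches at increasing distance from the image
    # centre traces out the lens's aberration profile - similar across one
    # lens model, but minutely unique to each physical lens.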
An idea for testing people’s eyesight using something like a mobile phone with a 3D display. The principle is based on standard testing of lens aberrations using a dot pattern, which gets distorted in various ways in three dimensions (x, y, and focus) when passed through an imperfect lens. The point is that if you project a 3D pattern of dots into someone’s eye, the dots should merge into a clear pattern on the retina and be seen as perfectly aligned, sharp dots. (You’d need to hold the 3D phone display right up close to your eye to do this.) If the eye’s lens has refractive errors, the user can manipulate a set of controls on the phone until the dots all appear to line up. The resulting settings can then be read off directly as an eyeglass or contact lens prescription! Basically, this guy was saying that as soon as mobile phones have 3D displays, optometrists will become redundant – anyone will be able to download an app and test their own eyesight with at least as much accuracy as a formal optometry test.
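The underlying optics is essentially wavefront sensing run on the eye: for pure defocus, a dot entering the pupil at radius r lands on the retina displaced in proportion to r, and the slope of that linear relationship converts directly into diopters. A toy version of that final conversion step (all the numbers below are made up for illustration):

    import numpy as np

    # Hypothetical dot displacements measured at the retina for rays entering
    # at radial pupil positions r. For pure defocus the offset grows linearly
    # with r; the slope of that line encodes the refractive error.
    r_m = np.array([0.5e-3, 1.0e-3, 1.5e-3, 2.0e-3])      # pupil positions, metres
    offset_m = np.array([8.5e-6, 17e-6, 25.5e-6, 34e-6])  # dot offsets, metres

    # Least-squares fit of offset = slope * r.
    slope = np.linalg.lstsq(r_m[:, None], offset_m, rcond=None)[0][0]

    eye_length = 0.017                  # pupil-to-retina distance, ~17 mm
    delta_power = slope / eye_length    # refractive error in diopters
    print(round(delta_power, 2))        # -> 1.0 D in this made-up example

    # A real system would also fit astigmatism (offsets varying with the
    # dot's angular position, not just radius) to recover cylinder and axis.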
There was much, much more, and much of it was very interesting and potentially transformative. All together, it was a fantastic conference, and very exciting to attend and see what the world of digital imaging research is up to.
Besides the work of the conference, there was of course socialising and eating. Each day for lunch I took the short 15-minute walk into the nearby suburb of Burlingame, where I tried a different place each time. On Monday I went into a “Mediterranean” cafe, which was basically Lebanese, run by a family obviously from that part of the world, making all the dishes familiar from similar establishments here in Sydney: felafels, kebabs, shawermas, tabouli, baba ganoush, etc. I had a lamb shawerma, and it was really delicious. Later in the week, Geoff pointed out that there were several “Mediterranean” places in Burlingame, none of which matched the way we’re used to seeing the word used on restaurants – to indicate Italian or Greek cuisine. They were all more accurately described as Middle Eastern restaurants. In fact Geoff determined that one of them was owned by a guy from Jordan – which isn’t even on the Mediterranean at all! We figured that declaring yourself Middle Eastern in the current climate in the USA would probably be bad for business, so they’re going for this “Mediterranean” euphemism.
On Tuesday I simply had a Subway sandwich for lunch. On Wednesday I had a curry at a Thai place. Burlingame’s Broadway had a good selection of different choices for food, which was nice. It also provided dinner for us on Monday and Tuesday. On Monday we had a Canon group dinner, a total of ten of us from Australia, the US, and France. We went to a steakhouse, where most of us ordered various cuts of beef from a menu that read cryptically to me. I simply chose the smallest one, since it came with a bunch of other stuff: bread, mashed potatoes, corn, and something whose name I can’t remember – “English something” – which turned out to be sort of an eggy soufflé/brioche thing. On Tuesday the four of us from Australia went to a Japanese barbecue place, where we cooked our own dinner on a grill in the middle of the table. This was nicer than the steak place. On Wednesday we had the conference reception, which had lots of nice food: antipasto, Chinese dumplings, salads, stir-fried noodles, chicken skewers, mini hamburgers, and other stuff. There was an open bar (first drink free for registered conference attendees) with local wines and beers. With the food and the professional socialising, it was a very pleasant evening.
I don’t mind the new 3D technology, though the old kind gave me bad headaches right away. I have a friend who hates it, though. And I hate movies that just poke stuff at you. I agree that Disney/Pixar’s movies have made quite good use of the technology. I’m fascinated by the research going on… And that “cellphone optometrist” – would that work with astigmatism, too?
As for “Mediterranean” vs “Middle Eastern”, I think the usage way predates 9/11. Greek restaurants are just called “Greek”, always have been.
Jordan is not *on* the Mediterranean, but it’s close enough that the cuisine is going to be near-identical. If there were some country buried inside Sicily like the Vatican is inside Rome, you wouldn’t object to it having Mediterranean food; Jordan is about as close to the Mediterranean as such a country could be.