WO2009068942A1 - Method and system for processing of images - Google Patents

Method and system for processing of images

Info

Publication number
WO2009068942A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
image
mesh
camera
dimensional
Application number
PCT/IB2008/001369
Other languages
French (fr)
Inventor
Stéphane Jean-Louis JACOB
Original Assignee
Dooworks Fz Co
Application filed by Dooworks Fz Co filed Critical Dooworks Fz Co
Publication of WO2009068942A1

Classifications

    • G06T5/80
    • G06T3/08
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/04: Texture mapping
    • G06T15/10: Geometric effects
    • G06T15/20: Perspective computation
    • G06T15/205: Image-based rendering
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20: Image signal generators
    • H04N13/261: Image signal generators with monoscopic-to-stereoscopic image conversion
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628: Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation

Abstract

A stream of images is either acquired from a camera and fish eye lens assembly or is computer generated. The image data is then texture mapped onto a 3D mesh and a portion of the mapped data is selected under the control of pan, tilt and zoom functions which define the parameters of a virtual camera within the mesh. The selected data is then converted back into 2D data and displayed.

Description

METHOD AND SYSTEM FOR PROCESSING OF IMAGES
FIELD OF THE INVENTION
This invention relates to processing of images and in particular, to the manipulation and display of images derived from very wide or 360° views.
BACKGROUND TO THE INVENTION
There are many situations in which it may be desired to present images to a user in such a manner that the user can control the portion of a 360° image that is viewed on a screen. Such images can be computer-generated or can be real world images recorded using a video camera.
In the field of 3D content display many technologies exist on the market, but they can easily be grouped into 3 main families:
3D Real Time
3D Rendering
360 Still Navigable
3D Real-time refers to 3D engine software that can compute and display in real time all the elements needed in a 3D world, such as meshes, textures, lights and shadows. The rendering unit generates images at the desired FPS (Frames Per Second), each frame being computed and rendered. For example, depending on the complexity of the 3D world and its components, 30 FPS means that the scene is rendered 30 times per second, giving the end user a smooth illusion of movement in that 3D world.
Many different ways exist to render this 3D world in Real-time. They are all dependent on the hardware as regards the quality and speed of the render. For example, if the scene is complex (high number of vertices, complicated lights and shadows, high quality textures) the rendering time for each frame will be huge, and the FPS will drop as a result, losing the impression of movement. But the main constraint is that in 3D Real-time, some very high quality rendering modes such as Ray-Tracing and Radiosity are not allowed because of computing time. For example, an image rendered in Radiosity can take at least 1 hour to render, but gives the end user a sensation of reality. The user has control of images produced by this method and may apply Pan, Tilt and Zoom to a virtual camera to select the portion of the image that is viewed.
The FPS in this case is relative to the rendering process in the 3D engine. This technology provides full interactivity to the 3D environment, but is poor in quality.
3D Rendering relates to video sequences that are rendered in advance in 3D software and are simply displayed using a video player. The 3D video is then composed of sequences of frames (typically 25 Frames Per Second, FPS). The rendering process is computed in advance using technologies such as Ray-Tracing or Radiosity, which are not possible in 3D Real-time. However, in this arrangement, the end user does not have any control of the movie.
The FPS in this case is relative to the final video. This technology provides ultra high quality, but doesn't provide any interactivity control.
360 Still Navigable is a 3D image rendered in any 3D rendering software using Equirectangular settings for the virtual camera parameters. The concept of a virtual camera is well understood in the computer graphics field to refer to an image view computed by the software as if it had 'seen' the scene from a certain coordinate location. The image is then displayed using software that corrects and dewarps the Equirectangular image in Real-time to allow the end user to have a corrected image on the screen. The end user has Pan, Tilt and Zoom (PTZ) control over that static frame, enabling a limited degree of control of the static frame that is displayed.
This technology offers ultra high quality, but provides only limited interactivity. In particular, only still images can be viewed.
The following table represents the different interactive possibilities that are allowed in the different technologies:
[Table: interactive possibilities allowed by 3D Real Time, 3D Rendering and 360 Still Navigable, reproduced as an image in the original publication.]
We have appreciated the desirability of providing moving images to a user, which allow the user to control the portion of a wide field image displayed on a screen using parameters such as pan, tilt and zoom. We have further appreciated that hardware constraints mean that it is not computationally feasible to render each frame of such an image sequence in Real-time.
We have also appreciated a need to process streams of image data derived from wide-angle views, for example, acquired using a camera fitted with a fish eye or other wide angle lens, so as to allow a user to select only portions of the image for viewing from a particular perspective. We have also appreciated that such processing is computationally intensive. We have further appreciated the advantage of processing image data in such a way as to allow the computation to be undertaken in a graphics card in a PC rather than requiring processing using the main CPU of a PC. Such processing can make the processing of wide angle image data commercially viable and not confine it to high-end post production facilities such as are required by existing techniques.
SUMMARY OF THE INVENTION
The invention is defined in the claims to which reference is now directed.
In a broad aspect, the invention involves a new two-stage process: first, pre-rendering pixels of an image using a mesh and, second, playing the sequence of frames derived from the mesh. The pre-rendering is arranged such that it is then computationally simple to convert the image data mapped onto the mesh into video images with user selectable parameters such as pan, tilt and zoom. In particular, the mesh is arranged so that the conversion from the image mapped on the mesh into a 2D representation does not require interpolation. To achieve this, the mesh is arranged such that a conceptual virtual camera, if arranged at a central point within the mesh, would see an equal spacing of image pixels for presentation on a 2D display. The computationally intensive step of pre-rendering can therefore be done in non-Real-time and allows a video player to play the images in Real-time whilst allowing user selectable parameters, such as pan, tilt or other degrees of freedom and zoom, to also be chosen in Real-time. The image source may be video, for example coming from a real time camera stream or a recorded file, or may be computer generated. In the former case, in a preferred embodiment, the mesh is comprised of multiple meshes that have the shape of one, or multiple, camera lenses. In the latter case, in a preferred embodiment, the computer generated images are rendered in equirectangular format by rendering software and then mapped onto a spherical mesh.
BRIEF DESCRIPTION OF THE FIGURES
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying figures in which:
Figure 1 is an overview of the whole image creation, pre-rendering and video play out process;
Figure 2 shows the step of producing an image on a mesh for a single image;
Figure 3 shows examples of images resulting from the process;
Figure 4 shows the process steps for an embodiment using wide angle lens capture;
Figures 5a, 5b and 5c show plan, side and perspective views of a 3D or real environment to be derived using the embodiment of Figure 4;
Figure 6 is an example of a fish eye image acquired from the images of Figures 5a - 5c using the embodiment of Figure 4;
Figure 7 shows example meshes;
Figures 8a and 8b show the conversion between a hemisphere and a hemi-cube;
Figure 9 shows a resulting image with the scene of Figures 5a - 5c projected onto a hemi-cube lens; and
Figure 10 shows examples of a mesh used with a fish eye lens to correct for distortion.
DESCRIPTION OF PREFERRED EMBODIMENT
The invention may be embodied in an image processing system. It is preferred that the invention is embodied in separate systems, namely a pre-rendering system, which operates an embodiment of a first aspect of the invention, and a video player, which operates an embodiment of a second aspect of the invention. Each aspect may be embodied in specific hardware designed to operate the process, such as a specific digital signal processor or other hardware arrangement. Alternatively, the embodiments may be implemented in software. To put both aspects of the invention in context prior to describing the pre-rendering system, on the one hand, and the video player, on the other hand, the advantages of an embodiment of both aspects of the invention, as a whole, will first be described.
In essence, the embodiment of the invention undertakes the computationally expensive process of converting each frame of an image sequence, in such a way that it is then computationally simple to allow a user to select parameters such as pan, tilt and zoom in Real-time. Accordingly, the end result is an image sequence representing a camera (whether virtual or real) moving through space, which can be manipulated by a user to pan, tilt or zoom in different directions. Accordingly, a user may select different pan, tilt and zoom settings and step forwards and backwards through the image sequence in time. However, unlike the computationally very expensive process of 3D Real-time rendering, the user would not be able to cause the apparent virtual camera to move to different locations in space. However, we have appreciated that there are many environments and applications where this is not a requirement.
Throughout the description reference is made to pan, tilt and zoom as these are expressions commonly used in the art. Pan and tilt are examples of degrees of freedom which are the set of independent displacements that specify completely the displaced or deformed position of a body or system. In general, a rigid body in d-dimensions has d(d+1)/2 degrees of freedom (d translations + d(d-1)/2 rotations). It can be reasoned that rotational freedom is the same as fixing a coordinate frame. The first axis of the new frame is unrestricted, except that it has to have the same scale as the original, so it has (d-1) DOFs. The second axis has to be orthogonal to the first, so it has (d-2) DOFs. This leads to d(d-1)/2 rotational DOFs in d dimensions. In 1-, 2- and 3- dimensions there are therefore one, three, and six degrees of freedom. A non-rigid or deformable body may be thought of as a collection of many minute particles (infinite number of DOFs); this is often approximated by a finite DOF system. When motion involving large displacements is the main objective of study a deformable body may be approximated as a rigid body (or even a particle) in order to simplify the analysis.
The 6 degrees of freedom may be described as:
1. Moving up and down (heaving);
2. Moving left and right (swaying);
3. Moving forward and backward (surging);
4. Tilting up and down (pitching);
5. Turning left and right (yawing);
6. Tilting side to side (rolling).
It can be seen that pan equates to yawing and that tilt equates to pitching. The zoom function is not a DoF. In the 3D Real-time engine described below, the system also has the moving DoFs 1-3. In the following description, references to pan and tilt should be treated as references also to other possible degrees of freedom.
The embodiment allows interactive ultra high quality 3D Rendering video sequences to be visualised using standard computer processing power available in the market. In addition, the embodiment combines pan, tilt, zoom and time interactive functions on the content, keeping the original frame rate, as generated with the 3D Rendering application or captured from the real world by camera. Thus sequences may be rendered using the graphics card of a commercially available PC instead of requiring use of the main PC processor or a dedicated processor.
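As a minimal sketch of this relationship, the following Python fragment turns a pan (yaw) angle and a tilt (pitch) angle into a viewing direction for a virtual camera at the origin; the axis conventions and the function name are assumptions made for illustration, not details taken from the embodiment.

```python
import math

def view_direction(pan_deg: float, tilt_deg: float):
    """Illustrative only: convert pan (yaw) and tilt (pitch) angles into a
    unit view direction for a virtual camera at the origin.

    Convention assumed here: pan rotates about the vertical (Y) axis,
    tilt rotates about the camera's horizontal (X) axis, and zero pan and
    tilt looks along +Z. The patent does not fix a convention, so this is
    simply one reasonable choice.
    """
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    x = math.cos(tilt) * math.sin(pan)
    y = math.sin(tilt)
    z = math.cos(tilt) * math.cos(pan)
    return (x, y, z)

# Example: pan 90 degrees right with no tilt looks along +X.
print(view_direction(90.0, 0.0))   # approximately (1.0, 0.0, 0.0)
```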
The following table shows a comparison between the functionality of embodiments of the present invention and the existing technologies described above:
[Table: comparison of the functionality of embodiments of the present invention with 3D Real Time, 3D Rendering and 360 Still Navigable, reproduced as images in the original publication.]
The pre-rendering and video play out may be performed on image data acquired from any source. However, it is particularly suited, but not limited, to data acquired by a camera using a wide or ultra wide angle lens such as a fish eye lens. Such lenses can produce a view of up to 360° and it is desirable to be able to select a portion of such an image for further viewing. Thus, the ability to pan, tilt and zoom is most important. Moreover, in many applications it is desirable to be able to view such images as a series of images in real time, in which case the PTZ capabilities available in existing 3D Real-time and 360 Equirectangular systems are not suitable, bearing in mind, particularly, that the equirectangular system is only suitable for still images. More than one camera may be used as is described below.
Overview of the System
The overall steps in the system will first be described with respect to Figures 1 to 3. Subsequently, the pre-rendering system and video player system will be described separately.
As shown in Figure 1, in an embodiment using computer-generated images, the principal idea is to create a 3D world in 3D software at 10, to pre-render it at 20 using particular parameters, and to display the result in a 3D world at 30 under the control of user defined pan, tilt and zoom parameters input into the system via a graphical user interface at 40. The process is able to match the final frame rate displayed at 50 to the frame rate created at 10.
The 3D world 10 may be created using any known 3D graphic design software. The 3D world 10 is composed of multiple elements to create the illusion of volume and reality. Each object is composed of meshes, textures, lights and movement. Then, a virtual camera is located in that environment.
The virtual camera parameters (focal length, aperture, ratio, projection, movement) are then set up and sent to the renderer 20. The renderer 20 has parameters set up, such as resolution, frame rate with number of frames, and the type of renderer algorithm. Any suitable algorithm may be used. Known algorithms include Phong, Gouraud, Ray Tracing and Radiosity. For the avoidance of doubt, the term rendering applies to the processing of computer generated images. The renderer operates on either CG images or images acquired from other sources such as a video camera.
The main parameter common to the camera and the renderer is to set the view projection as equirectangular, with an image ratio of 2:1. This parameter may be changed according to user preference.
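The equirectangular projection with a 2:1 ratio implies a simple relationship between pixel position and viewing direction. The Python sketch below shows one way this mapping and its inverse could be written; the axis conventions and function names are assumed for illustration only.

```python
import math

def equirect_pixel_to_angles(px: float, py: float, width: int, height: int):
    """Map a pixel in a 2:1 equirectangular image to (longitude, latitude).

    Longitude spans -pi..pi across the image width and latitude spans
    -pi/2..pi/2 down the image height. Purely illustrative; the axis
    conventions vary between renderers.
    """
    lon = (px / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (py / height) * math.pi
    return lon, lat

def angles_to_equirect_pixel(lon: float, lat: float, width: int, height: int):
    """Inverse mapping: (longitude, latitude) back to pixel coordinates."""
    px = (lon + math.pi) / (2.0 * math.pi) * width
    py = (math.pi / 2.0 - lat) / math.pi * height
    return px, py
```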
Then the image sequence is rendered and saved as an image file sequence or a video file. The resulting file or files are then converted into a texture file or files. This is a pre-rendering step.
The texture file is mapped onto a mesh in the form of a 3D sphere (normal orientation = inside / minimum number of vertices = 100) using texture mapping parameters (global illumination = 100%). A virtual camera is then located in the centre of that sphere. The mapping onto the mesh is the key step that saves computation time. This is a rendering step. The virtual camera parameters can be modified in real time. For example, the focal length can be very short, providing a wide angle view. The mesh sphere, the texture and the virtual camera are combined in software, giving the end user control of the pan, tilt and zoom functions of the virtual camera.
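A minimal sketch of how such a spherical mesh with per-vertex texture coordinates might be generated is given below; the vertex counts, names and winding convention are assumptions for illustration and not the embodiment's actual implementation.

```python
import math

def make_sphere_mesh(stacks: int = 16, slices: int = 32, radius: float = 1.0):
    """Build a simple UV-sphere mesh: vertex positions, per-vertex (u, v)
    texture coordinates and triangle indices.

    Sketch only. In the embodiment the triangles would be wound so that
    the surface faces inward (normal orientation = inside), since the
    virtual camera sits at the centre of the sphere.
    """
    vertices, uvs = [], []
    for i in range(stacks + 1):
        v = i / stacks                    # 0..1 down the sphere (phi / pi)
        phi = v * math.pi                 # angle from the z axis
        for j in range(slices + 1):
            u = j / slices                # 0..1 around the sphere (theta / 2pi)
            theta = u * 2.0 * math.pi     # angle around the z axis
            x = radius * math.sin(phi) * math.cos(theta)
            y = radius * math.sin(phi) * math.sin(theta)
            z = radius * math.cos(phi)
            vertices.append((x, y, z))
            uvs.append((u, v))

    faces = []
    for i in range(stacks):
        for j in range(slices):
            a = i * (slices + 1) + j
            b = a + slices + 1
            # Two triangles per quad; reversing the order flips normals inward.
            faces.append((a, b, a + 1))
            faces.append((a + 1, b, b + 1))
    return vertices, uvs, faces
```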
This combination has a 3D frame rate (rendered in real-time). The texture has a 2D frame rate (pre-rendered).
The graphic user interface 40 generates and sends the pan, tilt and zoom parameters to the software virtual camera 30 and then allows the view to be generated and sent to the general display at step 50. Basically, all of this sequence runs in a loop. That loop determines the 3D frame rate.
The 2D frame rate continues throughout the sequence. The 3D frame rate continues until the end user stops the application. The software application can embed normal movie player functions such as pause, rewind, forward and stop. These functions act on the 2D frame rate.
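The following Python sketch illustrates, under assumed names for the engine and GUI callbacks, how a playback loop can keep the pre-rendered 2D frame rate and the interactive 3D frame rate decoupled in the way described above; it is not the actual player code.

```python
import time

def playback_loop(frames, render_view, get_ptz, is_running,
                  video_fps: float = 25.0):
    """Illustrative playback loop (not the actual player implementation).

    The pre-rendered texture sequence advances at its own 2D frame rate
    (video_fps), while the 3D view is re-rendered as fast as the loop runs,
    so pan, tilt and zoom changes take effect immediately, even between
    video frames. `render_view` and `get_ptz` are hypothetical callbacks
    standing in for the 3D engine and the GUI.
    """
    start = time.time()
    while is_running():
        elapsed = time.time() - start
        frame_index = int(elapsed * video_fps) % len(frames)   # 2D frame rate
        texture = frames[frame_index]

        pan, tilt, zoom = get_ptz()                            # user controls
        render_view(texture, pan, tilt, zoom)                  # 3D frame rate
```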
Figure 2 shows the key steps in the process and the available view as a result. As previously described, a 3D view of the world is first created by any known 3D graphic techniques and rendered into a 2D equirectangular view. The equirectangular views are part of a sequence of 2D representations, with a 2D frame rate, representing the view as a camera tracks through a scene. The camera is, of course, a virtual camera in this example as the image is computer-generated. The key step in software is then taking the equirectangular image frames and mapping these onto a mesh, in this case a spherical mesh, in such a way that a virtual camera located at the centre of that mesh views a non-distorted view of the image in any direction. Consequently, as shown in Figure 2, a camera with given pan, tilt and zoom parameters will view a portion of the scene represented on the mesh substantially non-distorted. The user can alter the pan, tilt and zoom parameters of the virtual camera, in any direction, to select a portion of the scene to view. As the user also has control of the whole sequence of frames, the user is able to step forwards or backwards, in the manner of re-winding or playing, as a virtual video player.
Thus, in Figure 2 at (a) an equirectangular image is provided. For simplicity this is shown as a single image although in practice it can be a sequence of images at a real time frame rate. At (b) the 3-D mesh is produced and at (c) texture mapping techniques are used to map the equirectangular image onto the mesh. The remaining parts of the figure show how a portion of the spherical image may be selected for viewing substantially without distortion. Figure 2 (d) is a side view of the 3D world, that is the image mapped onto the sphere. A virtual camera is positioned at the centre of the sphere, that is, at X Y Z co-ordinates (0, 0, 0). The camera can 'see' a portion of the 3D image, with the aspect ratio being set in advance and the actual portion of the 3D image seen being determined by user selected controls including pan, tilt and zoom. Figure 2 (e) shows a front view of the same 3D world, this time showing the image selected by the camera in the X Y directions. Figure 2 (f) shows the same view as Figure 2 (e), but with the selected image portion shown within the camera frame.
It will be appreciated that the virtual camera is a tool for selecting a defined portion of the image rendered onto the 3D mesh for viewing.
Figure 3 shows the same example scene, showing how the user can pan around each image of an image frame sequence, to select portions of the image.
Thus, in Figures 3a and 3b, the full equirectangular image is shown with a different user window selected. In Figure 3a, this window is to the left of centre. In Figure 3b the window is to the right of centre. The window is the portion of the 3D image that is selected for viewing by the virtual camera. The window is moved around the equirectangular image by the pan and tilt function. Of course, this function is operating to select an area of the image projected onto the 3D mesh, but the user will see the equirectangular image on his display and move the window around the displayed equirectangular image. In Figure 3(c) the zoom function is illustrated. Here, the user window occupies a considerably smaller portion of the equirectangular image, resulting in a smaller part of the image filling the display than in the Figure 3(a) and 3(b) examples, giving a zoom function.
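A minimal sketch of how such a window could be extracted from an equirectangular frame for given pan, tilt and zoom values is shown below; here zoom is expressed as a horizontal field of view and nearest-neighbour sampling is used, both of which are assumptions made for illustration rather than details of the embodiment.

```python
import math

def extract_window(equirect, out_w, out_h, pan_deg, tilt_deg, fov_deg):
    """Select a rectilinear 'window' from an equirectangular image for a
    given pan, tilt and zoom (a smaller field of view means zooming in).

    `equirect` is assumed to be a 2-D list of pixel rows with a 2:1 aspect
    ratio. Sketch only; a real player would do this on the graphics card.
    """
    src_h = len(equirect)
    src_w = len(equirect[0])
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    half_w = math.tan(math.radians(fov_deg) / 2.0)
    half_h = half_w * out_h / out_w

    out = [[None] * out_w for _ in range(out_h)]
    for j in range(out_h):
        for i in range(out_w):
            # Ray through this output pixel in camera space (+Z forward).
            x = (2.0 * (i + 0.5) / out_w - 1.0) * half_w
            y = (1.0 - 2.0 * (j + 0.5) / out_h) * half_h
            z = 1.0
            # Apply tilt (rotation about X) then pan (rotation about Y).
            y2 = y * math.cos(tilt) - z * math.sin(tilt)
            z2 = y * math.sin(tilt) + z * math.cos(tilt)
            x3 = x * math.cos(pan) + z2 * math.sin(pan)
            z3 = -x * math.sin(pan) + z2 * math.cos(pan)
            # Direction to equirectangular coordinates.
            lon = math.atan2(x3, z3)
            lat = math.atan2(y2, math.hypot(x3, z3))
            u = (lon + math.pi) / (2.0 * math.pi)
            v = (math.pi / 2.0 - lat) / math.pi
            sx = min(int(u * src_w), src_w - 1)
            sy = min(int(v * src_h), src_h - 1)
            out[j][i] = equirect[sy][sx]
    return out
```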
In the examples of Figures 1 to 3, the 3D world onto which the equirectangular image is mapped is a sphere. This is a convenient shape and particularly suited to mapping computer generated images. However, the choice of 3D shape is defined by the system provider. In the more detailed example which follows, the images are not computer generated but are acquired using a fish eye lens and camera. In this case it is appropriate to use a mesh which approximates the fish eye lens. Thus, a 180° fish eye will use a hemispherical mesh. In practice, the mesh may be adjusted to compensate for optical distortions in the lens as shown later with respect to Figure 10. The present invention is not limited to any particular mesh shape, although for a given input, the correct choice of mesh is key to outputting good distortion free images. Where the image source is video, the equirectangular images of the computer graphics example are replaced by mapped texture images.
The pre-rendering system will first be described followed by the video player system.
Pre-Rendering System
The pre-rendering process takes frames of image data and maps these onto a mesh in such a manner that the subsequent process for displaying the video is computationally simple. The key feature of the pre-rendering process is the choice of mesh. The mesh arrangement is chosen so that if a theoretical virtual camera were located at the centre of the mesh, the view of the camera in any direction would give a substantially non-distorted two-dimensional view of the image. If the original image representing a wide-angle or 360° view is computer generated, then the mesh itself can be calculated. On the other hand, if the input image data is captured in the real world, using a camera system involving one or more fish eye lenses, the mesh may be empirically determined by calculation from captured test images, as described later.
In the present example, the images are captured from a real environment using cameras. The camera arrangement could be a single wide-angle lens camera, but is preferably at least a pair of cameras, each having a lens with a 180° view. In addition, an optional pan, tilt and zoom camera may also be included.
The pre-rendering system will now be described in greater detail with respect to Figures 4 to 10.
In the embodiment of Figure 4, a real environment is captured using a camera system 60. The camera system includes two XHD cameras with ultra wide angle lenses. These two cameras are disposed back-to-back, preferably with their focal axes parallel, or ideally identical. This system is one of the possible ways to provide full 360° vision. In Figure 4 this camera arrangement is shown as dual fish eye capture 62.
Optionally, these cameras are combined with one other camera 64 with a lens having the capability to provide mechanical zoom. That camera is controlled with standard PTZ functions. The protocol used to control that camera could be the PELCO protocol, which aims to reduce time delays in IP based systems, which is desirable to improve operator control. The PTZ controls are managed and sent through a software application 90 which governs behaviour in the 3D world. Optionally, further cameras, for example a night vision camera, could be linked to that system. As a minimum, a single camera is used, but the preferred embodiment uses two 180° cameras. The option to include a further camera is shown at 66.
The multiple streams of image data from the camera system may be transmitted to a converter 30. The converter operates on the XHD format data from the camera and may be any suitable digital video codec. However, it is preferred to use the systems described in our co-pending applications GB 0709711 and GB 0718015.
Once the multiple streams have been processed by the Codec 70 they are converted into texture mapping data by converter 80.
The texture files are mapped onto a 3D mesh, for example with normal orientation = inside and a minimum number of vertices = 100, and using texture mapping parameters, for example global illumination = 100%. For playback, a virtual camera is then located in the centre of that mesh.
The 3D mesh used depends on the focal length used in the cameras and the characteristics of the lenses. A method of empirically determining the mesh is described later.
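As a hedged example of this dependence, the sketch below assigns fish eye texture coordinates to the vertices of a hemispherical mesh assuming an ideal equidistant 180° lens; the real lens behaviour is, as noted, determined empirically, so the lens model used here is only a stand-in for illustration.

```python
import math

def fisheye_uv_for_vertex(x: float, y: float, z: float):
    """Assign a texture coordinate in a circular fish eye image to a
    vertex of a hemispherical mesh.

    Assumes the vertex lies on a unit hemisphere (z >= 0) with the optical
    axis along +Z, and an ideal 180° equidistant fisheye in which the image
    radius is proportional to the angle from the optical axis. Treat this
    model as illustrative only; the patent determines the real lens
    behaviour empirically.
    """
    theta = math.acos(max(-1.0, min(1.0, z)))    # angle from the optical axis
    r = theta / (math.pi / 2.0)                  # 0 at the centre, 1 at the rim
    phi = math.atan2(y, x)                       # angle around the axis
    u = 0.5 + 0.5 * r * math.cos(phi)
    v = 0.5 + 0.5 * r * math.sin(phi)
    return u, v
```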
For example, to have a full 360° view, we combine two hemispherical meshes. The resulting mesh is a sphere. In Figure 4, the mesh generation and the mapping of the image, represented as a texture map, onto the mesh, followed by the selection of a portion of the 3D image by the virtual camera, is performed at 90 under the control of the virtual camera PTZ controls 92 input from the user GUI 96. As shown at 90, the 3D world may store a number of pre-generated meshes including a sphere, a hemisphere, a cube, a hemi-cube and a spline. This is not an exhaustive list and, as mentioned above, the shape may be chosen according to the input characteristics such as the camera lens. However, this mesh is pre-calculated and does not need to be generated on the fly as the system is operating in real time.
The portion of the image from the 3D world is then passed to a user display 100.
Figures 5a - 5c show schematically how the system captures and processes images for display. The camera and fish eye lens may be positioned to form an image for mapping onto a mesh. The image here includes a number of components, namely a car 110, a flower 112, an aeroplane 114, a house 116, the sun 118 and a tree 120. Figure 5a shows a plan of the camera, the fish eye lens and the image components, Figure 5b shows a side view of the camera, the fish eye lens and the image components and Figure 5c is an isometric view. An image of the components captured in any resolution is taken with a fish eye lens. A mapping 80 is performed such that the image is mapped onto a 3D mesh. This allows a virtual camera, located at the focal point of the 3D mesh, to view a non-distorted view in any direction. In Figure 5, the camera represents the real position of the camera in the real world.
Figure 6 shows the first stage of this process. An image of the components is captured using a wide-angle lens camera. This may be referred to as the input image and, as already explained, could be an image from a wide-angle lens such as a fish eye lens, including a 360° lens arrangement, or could be a computer generated image. In either case, the image represents a wide field of view. The embodiment of the invention is particularly advantageous in that a sequence of frames representing a moving image can easily be pre-rendered using the embodiment of the invention. Thus, Figure 6 shows a fish eye view of the image contents of Figure 5 with each component distorted by an amount which is dependent on the distance from the axis of the camera, as is usual with a fish eye lens. Figure 7 shows a mesh in the form of a hemisphere. As is known, a mesh in graphics processing is a group of polygons connected together by shared vertices. The mesh is used to map co-ordinates of the pixels of the input to conceptual 3D locations. A mesh is thus a conceptual arrangement describing the arrangement of pixels of an image on a 3D surface. As such, a mesh is a theoretical arrangement, rather than a physical arrangement, and represents the manner in which data is stored for each frame. The example in Figure 7 is a theoretical perfect hemispherical mesh, which would be used assuming a perfect 180° lens. Generation of meshes is well known in the computer graphics industry. The mesh may be defined using one of a number of known techniques, such as calculating and storing a list of the vertices that make up each polygon in the mesh, or by using an edge array where each edge consists of its vertices. This technique may be enhanced by also storing the polygons that share a given edge. All these techniques describe a 3D object as a polygon mesh such as that shown in Figure 7.
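A minimal sketch of such a polygon-mesh representation, with an edge array derived from the face list, might look as follows; the field names are illustrative and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Mesh:
    """Minimal polygon-mesh container of the kind described above.

    Each face stores indices into the shared vertex list, which corresponds
    to the 'list of the vertices that make up each polygon' representation.
    An edge array can be derived from the faces when needed.
    """
    vertices: List[Tuple[float, float, float]] = field(default_factory=list)
    uvs: List[Tuple[float, float]] = field(default_factory=list)
    faces: List[Tuple[int, int, int]] = field(default_factory=list)

    def edges(self):
        """Derive the unique edge array (pairs of vertex indices)."""
        seen = set()
        for a, b, c in self.faces:
            for e in ((a, b), (b, c), (c, a)):
                seen.add(tuple(sorted(e)))
        return sorted(seen)
```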
Figures 8a, 8b and 9 illustrate the problem of distortion, which is particularly present when using fish eye lenses. Here, the fish eye image is represented using the hemi-cube conversion shown in Figure 8b. A hemi-cube is a well known representation of a fish eye image. As can be seen from Figures 8a and 8b, the front of the fish eye image becomes the front face of the hemi-cube and the remaining annulus of the fish eye image is divided into four equal rectangles shown as the top, bottom, left and right sides of the hemi-cube.
Figure 9 shows how the image of Figure 6 may be represented using the hemi-cube conversion of Figure 8b. It can be seen that there is distortion at the edges of the shape; see, for example, the aeroplane, which appears curved. This distortion needs to be compensated for in the mesh, as described later. As mentioned above, texture mapping is used to map the image data onto the chosen 3D mesh. Surface mapping techniques such as texture mapping are well known in the computer graphics industry. Texture mapping techniques take into account the fact that real life surfaces are not just coloured but have textures and patterns. Many also have small or large surface displacements such as bumps or gouges. Texture mapping simulates these surfaces to make images more realistic. There are three well known techniques for creating the impression of natural colours, textures and appearances: texture mapping, bump mapping and displacement mapping. Texture mapping adds a separately defined texture or pattern to a surface, for example wall paper on a wall; it does not affect the smoothness of the surface, but only changes its colour pattern. In the embodiment of the invention, this pattern is an image that has been generated by a camera with a fish eye lens. The process involves the mathematical mapping of the image from one domain to another and requires knowledge of the 3D World Device Co-ordinates (WDC), that is, the X, Y, Z value of the surface onto which the image is to be mapped. When ray tracing is used, this is the point at which the ray intersects the object. Where scan line graphics is used, this is more complex, as objects are first converted to physical device co-ordinates (PDC) and then scan converted.
Texture or image mapping may be characterised by the dimensionality involved:
1-D texture domain → 3-D (WDC) → 2-D (PDC)
2-D texture domain → 3-D (WDC) → 2-D (PDC)
3-D texture domain → 3-D (WDC) → 2-D (PDC)
In embodiments of the present invention, as described above, image or texture maps are defined in two dimensions (u, v) and are mapped onto a three dimensional object. For each point (X, Y, Z) in 3-D space, a corresponding point in the 2-D map space must be found, and the point in 3-D space is coloured with the value from the texture map. This is easiest to achieve with parametrically defined surfaces, which in general require two parameters to define. Thus, a surface may be defined by equations of the form:
x = X(u, v); y = Y(u, v); z = Z(u, v).
To render the images, an equirectangular image is mapped onto the mesh. In the case of a spherical mesh, this may be done as follows:
An angle θ is defined as the angle from the x axis (0 ≤ θ ≤ 2π), and an angle φ is defined as the angle from the z axis (0 ≤ φ ≤ π).
The equations for a sphere of radius R are:
X = R sin(φ) cos(θ) = R sin(πv) cos(2πu)   (1)   where v = φ/π (0.0 ≤ v ≤ 1.0)
Y = R sin(φ) sin(θ) = R sin(πv) sin(2πu)   (2)   where u = θ/2π (0.0 ≤ u ≤ 1.0)
Z = R cos(φ) = R cos(πv)   (3)
From (3): v = φ/π = arccos(Z/R)/π
From (1): u = [arccos(X/(R sin(πv)))]/2π
Note: θ = arccos(x) is equivalent to x = cos(θ).
Thus, if the point on the surface X, Y, Z is known, a point in the u, v texture space can be computed.
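For illustration only, this inverse computation may be sketched in Python as follows. The sketch is not taken from the patent: it recovers the azimuth with atan2 rather than the arccos of equation (1), a standard reformulation that resolves the quadrant automatically, and the function name is an assumption.

```python
import math

def sphere_point_to_uv(x, y, z, radius=1.0):
    """Map a point (X, Y, Z) on a sphere of radius R to equirectangular
    texture coordinates (u, v), with u = theta/(2*pi) and v = phi/pi."""
    phi = math.acos(max(-1.0, min(1.0, z / radius)))   # angle from the z axis, 0..pi
    theta = math.atan2(y, x) % (2 * math.pi)           # angle from the x axis, 0..2*pi
    return theta / (2 * math.pi), phi / math.pi
```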
Of course, the same method of calculation may be used for any other shape that can be represented parametrically. It will be appreciated that the process described is repeated for each image in a stream of images. The camera used to acquire images may be a video camera acquiring images in XHD or another format, and the image frames will be processed one by one.
The manner of empirically determining the shape of the texture mapping mesh for a given lens will now be described with reference to Figures 10a to 10c. The first step is to view an object of known shape using the particular lens of the camera being used to capture the series of images. The chosen object could, for example, be a cube or other angular shape having a surface pattern of regular shapes. A good example is a half cube patterned with small 1 cm x 1 cm squares on its surface, since the regular pattern reveals any distortion introduced by the capture lens. A single image of this object is then captured with a digital camera using the chosen fish eye lens. The picture is then mapped onto a mesh, with the mesh parameters varied by eye until the edges of the image are no longer distorted. Having determined the appropriate mesh empirically, this mesh can be applied to the entire sequence of images captured with that lens.
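One possible way to parameterise such an empirical correction is sketched below in Python. The quadratic rim correction and the parameter k are assumptions chosen for illustration only; they are not the method actually used to produce the mesh of Figure 10.

```python
import math

def corrected_hemisphere_vertex(phi, theta, radius=1.0, k=0.0):
    """Place one mesh vertex for angles (phi, theta), bulging the rim outwards
    by a factor controlled by k to counter edge distortion of the fish eye
    lens.  k = 0 gives the ideal hemisphere of Figure 7; k > 0 approximates
    the empirically corrected mesh of Figure 10."""
    r = radius * (1.0 + k * (phi / (math.pi / 2)) ** 2)   # grow with angle from the axis
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))
```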
The mesh of Figure 10 is an example of a mesh that may be determined empirically using this method. It will be appreciated from a comparison of
Figures 7 and 10 that the mesh of Figure 10 is not a true hemisphere but that the edge of the hemisphere is curved outwards to compensate for distortion at the edge of the fish eye lens. It will therefore be apparent that the actual shape of the mesh of Figure 10 will depend on the characteristics of the wide angle lens that is used.
It will be understood from the preceding example that the hemispherical, or near hemispherical, mesh can only generate a 180 degree view and that a 360 degree view requires more than one mesh. For example, two identical hemispherical meshes may be used, or two near hemispherical meshes, each shaped to match a respective one of a pair of 180 degree lenses.
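As a hedged illustration, the choice between the two hemispherical meshes (and hence between the two source images) for a given viewing direction could be as simple as the following Python sketch, assuming for the example that the two 180 degree lenses face along the +x and -x axes:

```python
def pick_hemisphere(direction):
    """Return which of the two hemispherical meshes, and hence which source
    image, a viewing direction falls on, for a twin 180 degree lens rig
    assumed to face along +x and -x."""
    dx, dy, dz = direction
    return "front_mesh" if dx >= 0.0 else "back_mesh"
```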
Video Player
The process of playing out the video will now be described.
The process of playing a sequence of video frames, as provided by the pre-rendering process, is computationally simple and may be undertaken by a standard graphics card on a PC. To play each frame of a sequence of images pre-rendered according to the pre-rendering arrangement, a virtual camera is defined and arranged in software to view the pre-rendered scene on the image mesh. User definable parameters, such as pan, tilt and zoom, are received at an input and applied to the virtual camera, so that an appropriate selection of the pixels from the image mesh is made and represented on a 2D screen. The selection of the pixels does not require computationally intensive processes such as interpolation, because the pre-rendering process has ensured that the pixel arrangement, as transformed onto the mesh, is such that a simple selection of a portion of the pixels in any direction (pan or tilt) or of any size (zoom) is already appropriate for display on a 2D display. The selection merely involves selecting the appropriate part of the image mapped onto the mesh. This process can be repeated for each frame of an image stream, thereby creating a video player.
Thus, the 3D image player takes data mapped on the image mesh and transforms this to a 2D image. This step is computationally simple as it only involves taking the pixels mapped on the mesh, and the pan, tilt and zoom parameters input by the user, to select the pixels to be presented in a 2D image.
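This per-frame selection step might be sketched in Python as follows. This is not the patent's implementation: the pinhole virtual-camera model, the rotation order (tilt about a horizontal axis, then pan about the vertical axis) and the sample_uv helper, assumed to return the pixel of the current frame stored at a given texture coordinate, are all assumptions made for illustration.

```python
import math

def render_view(sample_uv, pan, tilt, fov, width, height):
    """For every pixel of the 2D output, find the direction it views, convert
    that direction to (u, v) on the spherical mesh and copy the mapped pixel.
    pan, tilt and fov are in radians; fov is the horizontal field of view."""
    focal = (width / 2) / math.tan(fov / 2)        # the zoom sets the focal length
    out = [[None] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            # Viewing ray in world space (z up; the camera at rest looks along +x).
            xc, yc = px - width / 2, py - height / 2
            x, y, z = focal, -xc, -yc              # camera axes mapped to world axes at rest
            # Tilt about the horizontal (y) axis, then pan about the vertical (z) axis.
            x, z = x * math.cos(tilt) - z * math.sin(tilt), x * math.sin(tilt) + z * math.cos(tilt)
            x, y = x * math.cos(pan) - y * math.sin(pan), x * math.sin(pan) + y * math.cos(pan)
            n = math.sqrt(x * x + y * y + z * z)
            u = (math.atan2(y, x) % (2 * math.pi)) / (2 * math.pi)
            v = math.acos(z / n) / math.pi
            out[py][px] = sample_uv(u, v)          # simple selection, no interpolation needed
    return out
```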
The virtual camera parameters can be modified in real time. For example, the focal length can be very short, providing a wide angle view. The sphere or other mesh shape, the texture and the virtual camera are combined in software, which the end user controls by adjusting the pan, tilt and zoom functions of the virtual camera.
A graphic user interface 50 generates and sends the pan, tilt and zoom parameters to the virtual camera.
An additional possibility in the video player is that each real camera 10 and virtual camera 40 have a common nodal point. If the fish eye cameras and the optional PTZ camera have the same nodal point, the texture relative to the PTZ camera may then be texture mapped onto a spline mesh, inside the 3D world, exactly between the sphere mesh and the virtual camera. If the digital zoom is still in the zoom range, a window can display only the view from the sphere mesh. If the digital zoom is greater than the zoom range, then the window could display the view provided by the PTZ camera instead.
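The switch-over described above could be expressed as a trivial selection on the requested zoom, as in the following sketch; the threshold and the names are assumptions for illustration only.

```python
def choose_source(requested_zoom, digital_zoom_limit):
    """Decide whether the displayed window comes from the sphere mesh (digital
    zoom within range) or from the co-located optical PTZ camera (requested
    zoom beyond the digital range)."""
    return "sphere_mesh_view" if requested_zoom <= digital_zoom_limit else "ptz_camera_view"
```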
Thus, in a practical embodiment of the invention, a user acquires 360° video images using a camera and a twin fish eye lens or some other arrangement. The texture mapping of the image data onto the 3D surface can be performed in real time using a conventional PC graphics card, and the user therefore has real time pan, tilt and zoom control of the images displayed to him.
Many modifications to the embodiments described are possible and will occur to those skilled in the art without departing from the invention which is defined by the following claims.

Claims

1. A method of processing a stream of images for display, comprising:
acquiring the image stream from an image source;
mapping each image of the image stream onto a three dimensional mesh to provide a three dimensional representation of each image;
selecting a portion of the three dimensional representation of each image under the control of a degree of freedom control or zoom control input by a user; and
outputting the selected portion as a two dimensional stream of images.
2. A method according to claim 1, wherein the selection of a portion of each three dimensional representation comprises defining a virtual camera at a focus of the three dimensional mesh, the camera having a pre-defined aspect ratio, and moving the camera to select a desired portion of the three dimensional representation in accordance with the input at a degree of freedom or zoom control.
3. A method according to claim 1, wherein the mapping of the image data is texture mapping.
4. A method according to claim 1 or 2, wherein the images of the image stream are processed to provide equirectangular images prior to mapping.
5. A method according to any of claims 1 to 3, wherein the image source is a camera having a wide angle lens and providing a stream of video images for mapping onto the mesh.
6. A method according to claim 5, wherein the mesh shape is determined by the shape of the camera lens.
7. A method according to claim 6, wherein the mesh is hemispherical.
8. A method according to claim 7, wherein the mesh shape is adjusted to compensate for distortion in images produced by the lens.
9. A method according to any of claims 5 to 8, wherein the camera has a plurality of lenses to acquire substantially 360° images.
10. A method according to any of claims 5 to 9, wherein the camera includes a camera with pan, tilt and zoom controls.
11. A method according to any of claims 1 to 4, wherein the images are computer generated images and the mesh is a sphere.
12. A method according to any preceding claim, wherein the pan, tilt and zoom controls are input by a user via a user interface and control a portion of the images mapped onto the mesh that is selected for conversion back to a two dimensional image and display to the user.
13. A method according to any of claims 5 to 10, wherein the camera provides the video images in XHD format, comprising pre-processing the
XHD format images in a video codec prior to mapping onto the mesh.
14. A method according to any preceding claim, comprising defining, prior to processing the stream of images, a three dimensional world including the mesh, and including at least one of texture, light and movement.
15. A method according to claim 2, wherein the parameters of the virtual camera are set to provide 2D equirectangular images for display.
16. A method according to any preceding claim, wherein the frame rate of the image stream is the same as the frame rate of the output stream of selected image portions.
17. A method of playing out video images processed by a method according to any preceding claim, comprising inputting degree of freedom parameters and a zoom parameter to define a portion of the 3D representation of the images to be displayed, displaying the selected portion of the images as two dimensional images, and varying the portion of the images displayed by varying at least one of the degree of freedom parameters and the zoom parameter to select a different image portion.
18. Apparatus for processing a stream of images for display, comprising:
means for acquiring the image stream from an image source;
means for mapping each image of the image stream onto a three dimensional mesh to provide a three dimensional representation of each image;
means for selecting from the three dimensional representation of each image, under the control of user defined controls comprising a degree of freedom control and/or zoom control, a portion of the three dimensional representation of each image; and
means for presenting the selected portion as a two dimensional image.
19. Apparatus according to claim 18, wherein the selecting means is software defining a virtual camera at a focus of the three dimensional mesh, the camera having a pre-defined aspect ratio and being movable in response to user degree of freedom and/or zoom controls to select a desired portion of the three dimensional representations of the images.
20. Apparatus according to claim 18 or 19, wherein the mapping means performs texture mapping of the image data.
21. Apparatus according to claim 18, 19 or 20, comprising a processor for processing the images to provide equirectangular images to the mapping means.
22. Apparatus according to any of claims 18 to 20 wherein the means for acquiring images comprises a video camera having a wide angle lens and providing a stream of images for mapping on to the mesh.
23. Apparatus according to claim 22, wherein the wide angle lens comprises at least one fish eye lens.
24. Apparatus according to claim 22, wherein the at least one fish eye lens provides a substantially 360° field of view.
25. Apparatus according to claim 22, 23 or 24, wherein the image acquisition means comprises a camera having at least one of pan, tilt and zoom controls.
26. Apparatus according to any of claims 18 to 25, comprising a mesh store for storing the mesh, the mesh being selected in accordance with characteristics of the input images.
27. Apparatus according to claim 26, wherein the input images are computer generated and the mesh is spherical.
28. Apparatus according to claim 26, wherein the images are acquired from a camera and lens assembly and the mesh is chosen in accordance with characteristics of the lens.
29. Apparatus according to claim 28, wherein the lens is a fish eye lens and the mesh is hemispherical.
30. Apparatus according to claim 29, wherein the shape of the hemispherical mesh is adjusted to compensate for distortion in the fish eye lens.
31. Apparatus according to any of claims 18 to 30, comprising a user interface for provision of degree of freedom and zoom controls to the selecting means to enable a desired portion of the three dimensional mapped image to be selected.
32. A video player for playing the two dimensional images selected from mapped three dimensional images by the method of any of claims 1 to 16, comprising an interface for input of degree of freedom and zoom parameters to define a portion of the 3D representation of the images to be displayed, and means for outputting a stream of selected image portions for display, wherein the interface includes means for varying at least one of the degree of freedom and zoom to select a different image portion for output.
PCT/IB2008/001369 2007-11-30 2008-01-22 Method and system for processing of images WO2009068942A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0723538A GB2455498A (en) 2007-11-30 2007-11-30 Method and system for processing of images
GBGB0723538.5 2007-11-30

Publications (1)

Publication Number Publication Date
WO2009068942A1 true WO2009068942A1 (en) 2009-06-04

Family

ID=38962453

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/001369 WO2009068942A1 (en) 2007-11-30 2008-01-22 Method and system for processing of images

Country Status (2)

Country Link
GB (1) GB2455498A (en)
WO (1) WO2009068942A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10154194B2 (en) 2014-12-31 2018-12-11 Logan Gilpin Video capturing and formatting system


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5796426A (en) * 1994-05-27 1998-08-18 Warp, Ltd. Wide-angle image dewarping method and apparatus
JP2002203254A (en) * 2000-08-30 2002-07-19 Usc Corp Curved surface image transforming method and recording medium with the curved surface image transforming method recorded thereon
JP2003141562A (en) * 2001-10-29 2003-05-16 Sony Corp Image processing apparatus and method for nonplanar image, storage medium, and computer program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6243099B1 (en) * 1996-11-14 2001-06-05 Ford Oxaal Method for interactive viewing full-surround image data and apparatus therefor
WO2001095608A2 (en) * 2000-06-09 2001-12-13 Interactive Imaging Systems Inc. A method and apparatus for mapping images and video to create navigable, immersive video and images
US20050007478A1 (en) * 2003-05-02 2005-01-13 Yavuz Ahiska Multiple-view processing in wide-angle video camera
US20070124783A1 (en) * 2005-11-23 2007-05-31 Grandeye Ltd, Uk, Interactive wide-angle video server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CRIMINISI A ET AL: "Image-based interactive exploration of real-world environments", IEEE COMPUTER GRAPHICS AND APPLICATIONS, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 24, no. 3, 1 May 2004 (2004-05-01), pages 52 - 63, XP011112360, ISSN: 0272-1716 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NO343149B1 (en) * 2014-04-22 2018-11-19 Vision Io As Procedure for visual inspection and logging
US10380729B2 (en) 2014-04-22 2019-08-13 Vision Io As Method for visual inspection and logging
WO2016138043A1 (en) * 2015-02-24 2016-09-01 NextVR, Inc. Calibration for immersive content systems
US9865055B2 (en) 2015-02-24 2018-01-09 Nextvr Inc. Calibration for immersive content systems
US9894350B2 (en) 2015-02-24 2018-02-13 Nextvr Inc. Methods and apparatus related to capturing and/or rendering images
WO2017193729A1 (en) * 2016-05-11 2017-11-16 深圳市圆周率软件科技有限责任公司 Method and device for unfolding lens image into panoramic image
US10798300B2 (en) 2016-05-11 2020-10-06 Shenzhen Pisoftware Technology Method and device for unfolding lens image into panoramic image
EP3496042A1 (en) * 2017-12-05 2019-06-12 Shortbite Ltd System and method for generating training images

Also Published As

Publication number Publication date
GB2455498A (en) 2009-06-17
GB0723538D0 (en) 2008-01-09

Similar Documents

Publication Publication Date Title
US10839591B2 (en) Stereoscopic rendering using raymarching and a virtual view broadcaster for such rendering
US10096157B2 (en) Generation of three-dimensional imagery from a two-dimensional image using a depth map
EP3350653B1 (en) General spherical capture methods
US6005611A (en) Wide-angle image dewarping method and apparatus
USRE43490E1 (en) Wide-angle dewarping method and apparatus
CN106688231A (en) Stereo image recording and playback
CA2300529A1 (en) System and method for generating and playback of three-dimensional movies
US20030095131A1 (en) Method and apparatus for processing photographic images
WO2017128887A1 (en) Method and system for corrected 3d display of panoramic image and device
EP4007992A1 (en) Few-shot synthesis of talking heads
WO2017086244A1 (en) Image processing device, information processing device, and image processing method
WO2009068942A1 (en) Method and system for processing of images
Nielsen Surround video: a multihead camera approach
CN107005689B (en) Digital video rendering
TW202240530A (en) Neural blending for novel view synthesis
CN107562185B (en) Light field display system based on head-mounted VR equipment and implementation method
EP3057316B1 (en) Generation of three-dimensional imagery to supplement existing content
US7542035B2 (en) Method for interactively viewing full-surround image data and apparatus therefor
US20220253975A1 (en) Panoramic presentation methods and apparatuses
Price et al. 3D virtual production and delivery using MPEG-4
WO2009109804A1 (en) Method and apparatus for image processing
Katayama et al. A method for converting three-dimensional models into auto-stereoscopic images based on integral photography
CN111063034B (en) Time domain interaction method
Hisatomi et al. A method of video production using dynamic 3D models and its application to making scenes of a crowd
CN112465696A (en) Panoramic presentation method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08751064

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08751064

Country of ref document: EP

Kind code of ref document: A1