WO2014139117A1 - Voice and/or facial recognition based service provision - Google Patents

Voice and/or facial recognition based service provision

Publication number
WO2014139117A1
Authority
WO
WIPO (PCT)
Prior art keywords
identification
recognition engine
user
voice
service
Prior art date
Application number
PCT/CN2013/072590
Other languages
French (fr)
Inventor
James A. Baldwin
Guangli ZHANG
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to KR1020157021017A (KR101731404B1)
Priority to US13/995,476 (US9218813B2)
Priority to JP2015556364A (JP6093040B2)
Priority to EP13877676.0A (EP2974124A4)
Priority to PCT/CN2013/072590 (WO2014139117A1)
Priority to CN201380073090.0A (CN104995865B)
Publication of WO2014139117A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/18Artificial neural networks; Connectionist approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/06Decision making techniques; Pattern matching strategies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H04L9/3231Biological data, e.g. fingerprint, voice or retina
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2111Location-sensitive, e.g. geographical location, GPS
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2113Multi-level security, e.g. mandatory access control

Definitions

  • the present disclosure relates to the field of data processing, in particular, to apparatuses, methods and storage medium associated with voice and/or facial recognition based service provision.
  • Figure 1 illustrates an overview of a computing environment, including a client device, suitable for practicing the present disclosure, in accordance with various embodiments.
  • Figure 2 illustrates an example process of voice and/or facial recognition based service provision, in accordance with various embodiments.
  • Figure 3 illustrates an example computing system suitable for use as a client device, in accordance with various embodiments.
  • FIG. 4 illustrates an example storage medium with instructions configured to enable an apparatus to practice the processes of the present disclosure, in accordance with various embodiments.
  • an apparatus e.g., a set-top box or a computing tablet
  • the apparatus may further include a service agent configured to provide a service to a user of the apparatus, after the user has been identified at least at an identification level required to receive the service.
  • a service agent may include an enhanced media player for consuming multi-media content, or an enhanced browser for conducting ecommerce or online financial transactions.
  • phrase “A and/or B” means (A), (B), or (A and B).
  • phrase “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B and C).
  • the term "module" may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
  • environment 100 may include a number of client devices 102 coupled to a number of servers 104 of online service providers, via networks 106.
  • Servers 104 may be configured to provide a wide range of online services, having different user identification requirements. Examples of such online services and their providers may include, but are not limited to, user customized multi-media content services provided by content distributors, such as cable television providers or online multi-media content providers like Youtube, Netflix, and so forth; ecommerce facilitated by hosts, such as Ebay, Best Buy and so forth; or financial services provided by financial institutions, such as Bank of America, Etrade, and so forth.
  • client devices 102 may be configured to provide potentially a more coherent, user friendly and reliable approach to providing various levels of user identifications to meet the different user identification requirements of the different online services.
  • some online services may require only voice recognition of a user based on the voice characteristics of a user.
  • Other online services may require only facial recognition of a user based on the user's facial features.
  • Still other online services may require both the earlier described voice and facial recognitions of a user, and potentially, even other more sophisticated voice and/or facial recognition identifications, to be described more fully below.
  • a client device 102 may include voice and facial recognition engines 204a and 204b, and a number of service agents 206, coupled with each other as shown. Further, in embodiments, client device 102 may include presentation engine 134, user interface engine 136, display 124 and user input device 126, coupled with each other, engines 204a and 204b and agents 206 as shown. In embodiments, to facilitate cooperative usage of voice and facial recognition engines 204a and 204b, client device 102 may further include a common interface (not shown) to the engines 204a and 204b. In embodiments, voice and facial recognition engines 204a and 204b may be configured to provide, individually or in cooperation with each other, user identifications at a number of identification levels.
  • voice recognition engine 204a may be configured to provide an identification of a user, based on the vocal characteristics of the user's voice
  • facial recognition engine 204b may be configured to provide an identification of a user, based on the user's facial features.
  • voice recognition engine 204a and facial recognition engine 204b may collaborate to provide the above identifications. For example, in some embodiments, voice recognition engine 204a may be first employed to narrow down the identification of a user to a number of potential identifications, and facial recognition engine 204b may then be employed to make the final identification based on the narrowed down list of potential identifications.
  • the cooperation may be reversed, that is, facial recognition engine 204b may be first employed to narrow down the identification of a user to a number of potential identifications, and voice recognition engine 204a may then be employed to make the final identification based on the narrowed down list of potential identifications.
  • a less precise (and typically less computationally intensive) technique may be implemented by the first employed recognition engine, and a more precise (and typically more computationally intensive) technique may be implemented by the latter employed recognition engine.
  • the cooperative approach may yield more accurate identification with fewer computations overall, and thus may be more effective as well as more efficient.
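
The cooperative narrowing described above can be sketched as a two-stage cascade. This is a minimal illustration, not the disclosed implementation: the function `cascade_identify`, the scorer callables, `shortlist_size` and `accept_threshold` are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Optional

# Hypothetical scorer type: maps (sample, candidate user id) to a similarity score.
Scorer = Callable[[object, str], float]


def cascade_identify(
    voice_sample: object,
    image_sample: object,
    users: List[str],
    coarse_voice_score: Scorer,
    fine_face_score: Scorer,
    shortlist_size: int = 3,        # assumed value, for illustration only
    accept_threshold: float = 0.8,  # assumed value, for illustration only
) -> Optional[str]:
    """Narrow candidates with the cheaper engine, then decide with the costlier one."""
    if not users:
        return None
    # Stage 1: rank every enrolled user with the less precise voice scorer.
    ranked = sorted(users, key=lambda u: coarse_voice_score(voice_sample, u), reverse=True)
    shortlist = ranked[:shortlist_size]
    # Stage 2: run the more precise face scorer only on the shortlist.
    best = max(shortlist, key=lambda u: fine_face_score(image_sample, u))
    if fine_face_score(image_sample, best) >= accept_threshold:
        return best
    return None
```

Swapping the two scorer arguments gives the reversed cooperation, in which the facial engine narrows the list and the voice engine makes the final identification.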
  • voice recognition engine 204a may implement any one or more of a wide range of vocal recognition techniques to compare a voice input of a user to a number of voice templates to identify the user.
  • the wide range of vocal recognition techniques may include, but is not limited to, a frequency estimation technique, a Markov model technique, a Gaussian mixture model technique, a pattern matching technique, a neural network technique, a matrix representation technique, a vector quantization technique or a decision tree technique.
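
As one simple instance of the pattern matching technique named above, a voice input can be compared to stored templates by cosine similarity. The sketch assumes the voice input has already been reduced to a fixed-length feature vector; the feature extraction step, the function names and the 0.9 threshold are assumptions for illustration and are not taken from the disclosure.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_template(
    features: Sequence[float],
    templates: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed acceptance threshold
) -> Optional[Tuple[str, float]]:
    """Return (user, score) for the best-matching template, or None if below threshold."""
    best_user, best_score = None, -1.0
    for user, template in templates.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_user, best_score = user, score
    if best_user is not None and best_score >= threshold:
        return best_user, best_score
    return None
```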
  • facial recognition engine 204b may implement any one or more of a wide range of facial recognition techniques to compare an image input of the user to a number of reference images.
  • the wide range of facial recognition techniques may include, but is not limited to, analysis of relative positions, sizes or shapes of eyes, nose, cheekbones, or jaws.
  • voice recognition engine 204a may be further configured to identify the semantic content of a voice input, to enable e.g., a required passphrase to log into an online service to be provided via a voice input.
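
A passphrase check on the semantic content of a voice input might look like the following sketch. It assumes a separate speech-to-text step has already produced a transcript; `normalize_phrase` and `passphrase_matches` are hypothetical helpers, and the normalization rules are one possible choice.

```python
import hmac
import re
import unicodedata


def normalize_phrase(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace in a transcript."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def passphrase_matches(transcript: str, expected: str) -> bool:
    """Compare the recognized phrase to the expected one after normalization."""
    a = normalize_phrase(transcript).encode("utf-8")
    b = normalize_phrase(expected).encode("utf-8")
    # hmac.compare_digest is used to reduce timing side channels in the comparison.
    return hmac.compare_digest(a, b)
```

For example, `passphrase_matches("Open, Sesame!", "open sesame")` is true, because punctuation and letter case are discarded before the comparison.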
  • voice and facial recognition engines 204a and 204b may be further configured to cooperate to identify whether a voice input is attenuation synchronized with the lip movement as seen in a companion series of image inputs. The identification of synchronization may be provided by the common interface to both engines 204a and 204b, based on the analyses of the two engines 204a and 204b.
  • voice and facial recognition engines 204a and 204b may be further configured to cooperate to identify whether a voice input is location synchronized with the companion image input, that is, whether the location of the voice source that provided the voice input is the same as the location of the object of the image input.
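
One way to approximate the lip synchronization check is to correlate the loudness of the voice input with how open the mouth appears in each companion frame. The disclosure does not specify a method; the per-frame series, the use of Pearson correlation, and the 0.6 threshold below are all assumptions made for this sketch.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def lips_in_sync(
    audio_energy: Sequence[float],   # hypothetical per-frame loudness of the voice input
    mouth_opening: Sequence[float],  # hypothetical per-frame mouth opening from the images
    threshold: float = 0.6,          # assumed acceptance threshold
) -> bool:
    """Treat the inputs as synchronized when loudness tracks mouth opening."""
    return pearson(audio_energy, mouth_opening) >= threshold
```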
  • client device 102 may include a location service, such as, a global positioning system (GPS) component, e.g., as one of the other input devices 126.
  • service agents 206 may be configured to provide and/or facilitate various online services for users of client devices 102.
  • Examples of service agents and services facilitated may include, but are not limited to, a multi-media player configured to facilitate provision of multi-media content service, including user customized service, a browser configured to facilitate access to ecommerce or financial services, and so forth.
  • These multi-media players and browsers would be enhanced versions to utilize the multi-level identification services provided by voice and/or facial recognition engines 204a and 204b.
  • service agents 206 are intended to represent a broad range of service agents found on client devices, including, but not limited to, multi-media players, browsers, or service specific applications.
  • presentation engine 134 may be configured to present content to be displayed on display 124, in response to user selections/inputs.
  • User interface engine 136 may be configured to receive the user selections/inputs from a user.
  • presentation engine 134 and user interface engine 136 may be configured to effectuate adaptation of the presentation of a content to enhance user experience during response to some user commands, where the adaptation is in addition to a nominal response to the user commands. See, e.g., U.S. Patent Application 13/727,138, entitled "CONTENT PRESENTATION WITH ENHANCED USER EXPERIENCE," filed December 26, 2012.
  • Display 124 is intended to represent a broad range of display devices/screens known in the art
  • input devices 126 are intended to represent a broad range of input devices known in the art, including, but not limited to, (hard or soft) keyboards and cursor control devices, microphones for voice inputs, cameras for image inputs, and so forth. While shown as part of a client device 102, display 124 and/or user input device(s) 126 may be standalone devices or integrated, for different embodiments of client devices 102.
  • display 124 may be a stand alone television set, Liquid Crystal Display (LCD), Plasma and the like, while elements 204, 206, 134 and 136 may be part of a separate set-top box, and other user input device 126 may be a separate remote control or keyboard.
  • a chassis hosting a computing platform with elements 204, 206, 134 and 136, display 124 and other input device(s) 126 may all be separate stand alone units.
  • elements 204, 206, 134 and 136, display 124 and other input devices 126 may be integrated together into a single form factor.
  • a touch sensitive display screen may also serve as one of the other user input device(s) 126, and elements 204, 206, 134 and 136 may be components of a computing platform with a soft keyboard that also include one of the user input device(s) 126.
  • Networks 106 may be any combination of private and/or public, wired and/or wireless, local and/or wide area networks. Private networks may include, e.g., but are not limited to, enterprise networks. Public networks may include, e.g., but are not limited to, the Internet. Wired networks may include, e.g., but are not limited to, Ethernet networks. Wireless networks may include, e.g., but are not limited to, Wi-Fi, or 3G/4G and beyond networks. It would be appreciated that at the server end, networks 106 may include one or more local area networks with gateways and firewalls, through which servers 104 communicate with client devices 102.
  • networks 106 may include base stations and/or access points, through which client devices 102 communicate with servers 104.
  • at client devices 102 and servers 104, there may be communication/network interfaces, and in between the two ends may be any number of network routers, switches and other networking equipment of the like.
  • process 300 may start at block 302, wherein an initial voice and/or facial identification may be established by voice and/or facial recognition engines 204a and 204b.
  • the initial voice identification may be made by voice recognition engine 204a by comparing a voice input of the user to a number of voice templates, using any one of a number of voice recognition techniques earlier described.
  • the initial facial identification may be made by facial recognition engine 204b by comparing an image input that includes the user to a number of reference images, using any one of a number of facial feature analysis techniques earlier described. Further, as earlier described, the initial voice and facial identifications may be cooperatively made by voice and facial recognition engines 204a and 204b.
  • process 300 may proceed to block 304.
  • a determination may be made, e.g., by each of the service agents 206, on whether a service is requested. If a result of the determination, e.g., by one of the service agents 206, indicates that a service is requested of the service agent 206, for the service agent, process 300 may proceed from block 304 to block 306.
  • another determination may be made, e.g., by the service agent 206, on whether the current level of identification of the user is sufficient or adequate to allow access to the requested service. If a result of the determination indicates that the current level of identification is sufficient or adequate to allow access to the requested service, for the service agent 206, process 300 may proceed from block 306 to block 310.
  • process 300 may proceed from block 306 to block 308.
  • the additional level of identification may be attempted.
  • an additional level of identification may include asking the user to provide a passphrase through another voice input.
  • Voice recognition engine 204a may analyze the semantic content of the additional voice input to determine if the semantic content matches the expected/required passphrase. Further, if needed, additional levels of identification such as identification of lip synchronization, and/or location synchronization may be attempted.
  • process 300 may return to block 306 to confirm the adequate levels of identification are now in place. As described earlier, on confirmation that the required level of identification is now in place, for the service agent 206, process 300, from block 306, may proceed to block 310. At block 310, the service agent 206 may provide or facilitate the requested service.
  • process 300 may proceed to block 314, to return to block 304, rejoining other service agents 206 waiting for service requests. From block 304, process 300 may continue as earlier described. On the other hand, if at block 308, process 300 fails to acquire the necessary additional level(s) of identification to provide the requested service, for the service agent 206, process 300 may proceed to block 312. At block 312, the service agent 206 may deny the requested service, and return to block 304. Again, from block 304, process 300 may continue as earlier described.
  • process 300 may return to block 304, and await a service request. If termination of process 300 is requested, process 300 may end.
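
The flow through blocks 302 to 312 can be summarized in a short sketch. The `Level` ordering, the function `handle_requests` and the `try_upgrade` callback are hypothetical; the disclosure does not define a numeric ordering of identification levels, so treating them as comparable integers is an assumption made here for brevity.

```python
from enum import IntEnum
from typing import Callable, Iterable, List, Tuple


class Level(IntEnum):
    """Hypothetical ordering of identification levels (not defined in the text)."""
    NONE = 0
    VOICE_OR_FACE = 1
    VOICE_AND_FACE = 2
    PASSPHRASE = 3


def handle_requests(
    initial_level: Level,
    requests: Iterable[Tuple[str, Level]],
    try_upgrade: Callable[[Level, Level], Level],
) -> List[Tuple[str, str]]:
    """Walk blocks 304-312 for a stream of (service, required level) requests."""
    current = initial_level                     # block 302: initial identification
    outcomes = []
    for service, required in requests:          # block 304: a service is requested
        if current < required:                  # block 306: is the level sufficient?
            # block 308: attempt the additional level(s) of identification
            current = max(current, try_upgrade(current, required))
        if current >= required:                 # block 306 again, after the attempt
            outcomes.append((service, "provided"))  # block 310
        else:
            outcomes.append((service, "denied"))    # block 312
    return outcomes
```

In this sketch a level acquired for one request is kept for later requests, which is one possible reading of returning to block 304 after a service is provided.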
  • the services may include customized provision of multi-media content for consumption, e-commerce, and/or financial services.
  • customized provision of multi-media content on establishing
  • a multi-media player may adapt a multi-media presentation including, but are not limited to,
  • a service that requires log in may be provided as follows:
  • a user may start the processing by saying to the client device: "Hi."
  • the client device may analyze the voice bio-metric of the voice and find a match in a registered user bio-metric database; the client device may then load the identified user's information and respond to the identified user by voice, saying, e.g., "Hello, dear David, what can I do for you?"
  • the client device may determine that the log-in requires an additional level of identification, and respond by voice, saying, e.g., "please face the camera and say your passphrase."
  • the client device may then confirm that both the user's face and voice match the user's information in the database, and after vocally and facially identifying David, the client device may then proceed to load the user name and password for David's Youtube login, and log David into his Youtube account.
  • identification such as banking service
  • banking service may be provided to a user who wants to transfer money from his bank account to pay for an on-line purchase, as follows:
  • the user may hold his bank card to a camera of the client device and say: "Hi, this is my bank card”;
  • the client device may first identify the bank card number and bank name, and determine that a higher level of identification is required;
  • the client device may respond by saying, e.g., "please face the camera and say your bank passphrase";
  • the user may then respond by facing the camera and saying "it is David, and my birthday is Aug. 1980."
  • the client device may:
  • the client device may then proceed to send the user's log in information and voice passphrase to the bank system;
  • the client device may subsequently inform the user the transaction is successful, after the bank system has returned the successful result of the transaction.
  • computer 400 may include one or more processors or processor cores 402, and system memory 404.
  • processors or processor cores may be considered synonymous, unless the context clearly requires otherwise.
  • computer 400 may include mass storage devices 406 (such as diskette, hard drive, compact disc read only memory (CD-ROM) and so forth), input/output devices 408 (such as display, keyboard, cursor control and so forth) and communication interfaces 410 (such as network interface cards, modems and so forth).
  • the elements may be coupled to each other via system bus 412, which may represent one or more buses. In the case of multiple buses, they may be bridged by one or more bus bridges (not shown).
  • system memory 404 and mass storage devices 406 may be employed to store a working copy and a permanent copy of the programming instructions implementing the operations associated with practicing method 300 of Figure 2 on client devices 102, earlier described.
  • the various elements may be implemented by assembler instructions supported by processor(s) 402 or high-level languages, such as, for example, C, that can be compiled into such instructions.
  • the permanent copy of the programming instructions may be placed into permanent storage devices 406 in the factory, or in the field, through, for example, a distribution medium (not shown), such as a compact disc (CD), or through communication interface 410 (from a distribution server (not shown)). That is, one or more distribution media having an implementation of the agent program may be employed to distribute the agent and to program various computing devices.
  • Figure 4 illustrates an example non-transitory computer-readable storage medium having instructions configured to practice all or selected ones of the operations associated with method 300 of Figure 2, earlier described, in accordance with various embodiments.
  • non-transitory computer-readable storage medium 502 may include a number of programming instructions 504.
  • Programming instructions 504 may be configured to enable a device, e.g., computer 400, in response to execution of the programming instructions, to perform, e.g., various operations of process 300 of Figure 2, e.g., but not limited to, the operations performed in association with establishing one or more levels of user identifications, and providing/facilitating services based on the level of voice/facial identification established.
  • processors 402 may be packaged together with computational logic 422 configured to practice aspects of the process of Figure 2.
  • processors 402 may be packaged together with computational logic 422 configured to practice aspects of the process of Figure 2 to form a System in Package (SiP).
  • at least one of processors 402 may be integrated on the same die with computational logic 422 configured to practice aspects of the process of Figure 2.
  • at least one of processors 402 may be packaged together with computational logic 422 configured to practice aspects of the process of Figure 2 to form a System on Chip (SoC).
  • the SoC may be utilized in, e.g., but not limited to, a computing tablet.
  • Example 1 may be an apparatus having a voice recognition engine and a facial recognition engine configured to provide, individually or in cooperation with each other, identification of a user of the apparatus at a plurality of identification levels.
  • the apparatus may further include a service agent coupled with at least one of the voice recognition engine and the facial recognition engine, and configured to provide a service to the user, after the user has been identified at least at an identification level required to receive the service.
  • Example 2 may be example 1, wherein the voice recognition engine is configured to individually provide identification of a user at a first identification level in response to a voice input, and to cooperate with the facial recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
  • Example 3 may be example 2, wherein the voice recognition engine is configured to individually provide identification of the user at the first identification level in response to the voice input, via comparison of the voice input to a plurality of voice templates.
  • Example 4 may be example 3, wherein the voice recognition engine is configured to compare the voice input to the plurality of voice templates employing one of a frequency estimation technique, a Markov model technique, a Gaussian mixture model technique, a pattern matching technique, a neural network technique, a matrix
  • Example 5 may be example 2, wherein the voice input is a first voice input, and the voice recognition engine is configured to individually provide identification of the user at the first identification level in response to the first voice input, by comparing the first voice input to a plurality of voice templates; wherein the voice recognition engine is further configured to individually provide identification of the user at a third identification level in response to a second voice input, wherein the third identification level is a higher identification level than the second identification level, enabling the user to be eligible for service that requires at least the third identification level.
  • Example 6 may be example 5, wherein the voice recognition engine is configured to determine a semantic content of the second voice input and compare the semantic content of the second voice input to a semantic reference.
  • Example 7 may be example 6, wherein the semantic reference is a passphrase.
  • Example 8 may be example 1, wherein the facial recognition engine is configured to individually provide identification of a user at a first identification level in response to an image input, and to cooperate with the voice recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
  • Example 9 may be example 8, wherein the facial recognition engine is configured to individually provide identification of the user at the first identification level via comparison of the image input to a plurality of reference images.
  • Example 10 may be example 9, wherein the facial recognition engine is configured to compare the image input to the plurality of reference images via at least analysis of relative positions, sizes or shapes of eyes, nose, cheekbones, or jaws.
  • Example 11 may be any one of examples 1-10, wherein the service agent is configured to provide customized multi-media presentation service that requires an identification level that includes first and second identifications of the user by both the voice recognition engine, and the facial recognition engine.
  • Example 12 may be any one of examples 1-10, wherein the service agent is configured to facilitate access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, and a third identification by the voice recognition engine based on semantic content of a second voice input.
  • Example 13 may be any one of examples 1-10, wherein the service agent is configured to facilitate access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, a third identification by the voice recognition engine based on semantic content of a second voice input, and at least a fourth identification that uses both the voice recognition engine and the facial recognition engine.
  • Example 14 may be example 13, wherein the fourth identification comprises identifying synchronization of a real time voice input to the voice recognition engine, with lip movements in a real time image input to the facial recognition engine.
  • Example 15 may be example 13, wherein the fourth identification comprises identifying synchronization of a location of a voice source providing a voice input to the voice recognition engine, with a location of the user determined based on image input to the facial recognition engine.
  • Example 16 may be example 13, wherein the online service comprises an online financial service.
  • Example 17 may be any one of examples 1-10, wherein the apparatus is a selected one of a television set, a set-top box, a smartphone, a computing tablet, an ultrabook, a laptop computer or a desktop computer.
  • Example 18 may be a method for providing service. The method may include providing, by a computing device, identification of a user of the computing device at a plurality of identification levels, via a voice recognition engine, a facial recognition engine, or both, individually or in cooperation with each other; and providing, by the computing device, a service to the user, after the user has been identified at least at an identification level required to receive the service.
  • Example 19 may be example 18, wherein providing identification of a user comprises the voice recognition engine individually providing identification of a user at a first identification level in response to a voice input, and cooperating with the facial recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
  • Example 20 may be example 19, wherein the voice recognition engine individually providing identification of a user comprises the voice recognition engine individually providing identification of the user at the first identification level in response to the voice input, by comparing the voice input to a plurality of voice templates.
  • Example 21 may be example 20, wherein the voice recognition engine comparing the voice input to a plurality of voice templates comprises the voice recognition engine comparing the voice input to the plurality of voice templates employing one of a frequency estimation technique, a Markov model technique, a Gaussian mixture model technique, a pattern matching technique, a neural network technique, a matrix
  • Example 22 may be example 19, wherein the voice input is a first voice input, and the voice recognition engine individually provides identification of the user at the first identification level in response to the first voice input, by comparing the first voice input to a plurality of voice templates; wherein the voice recognition engine further individually provides identification of the user at a third identification level in response to a second voice input, wherein the third identification level is a higher identification level than the second identification level, enabling the user to be eligible for service that requires at least the third identification level.
  • Example 23 may be example 22, wherein the voice recognition engine individually providing identification of the user at a third identification level in response to a second voice input comprises the voice recognition engine determining a semantic content of the second voice input and comparing the semantic content of the second voice input to a semantic reference.
  • Example 24 may be example 23, wherein the semantic reference is a passphrase.
  • Example 25 may be example 18, wherein providing identification of a user of the computing device at a plurality of identification levels, via a facial recognition engine, comprises the facial recognition engine individually providing identification of a user at a first identification level in response to an image input, and cooperating with the voice recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
  • Example 26 may be example 25, wherein the facial recognition engine individually providing identification of a user comprises the facial recognition engine individually providing identification of the user at the first identification level by comparing the image input to a plurality of reference images.
  • Example 27 may be example 26, wherein the facial recognition engine individually providing identification of the user at the first identification level by comparing the image input to a plurality of reference images comprises the facial recognition engine comparing the image input to the plurality of reference images via at least analysis of relative positions, sizes or shapes of eyes, nose, cheekbones, or jaws.
  • Example 28 may be any one of examples 18-27, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises providing customized multi-media presentation service that requires an identification level that includes first and second identifications of the user by both the voice recognition engine, and the facial recognition engine.
  • Example 29 may be any one of examples 18-27, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises facilitating access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, and a third identification by the voice recognition engine based on semantic content of a second voice input.
  • Example 30 may be any one of examples 18-27, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises facilitating access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, a third identification by the voice recognition engine based on semantic content of a second voice input, and at least a fourth identification that uses both the voice recognition engine and the facial recognition engine.
  • Example 31 may be example 30, wherein the fourth identification comprises identifying synchronization of a real time voice input to the voice recognition engine, with lip movements in a real time image input to the facial recognition engine.
  • Example 32 may be example 30, wherein the fourth identification comprises identifying synchronization of a location of a voice source providing a voice input to the voice recognition engine, with a location of the user determined based on image input to the facial recognition engine.
  • Example 33 may be example 30, wherein the online service comprises an online financial service.
  • Example 34 may be at least one storage medium comprising a plurality of instructions configured to cause a client device, in response to execution of the instructions, to perform any one of the methods of examples 18 - 33.

Abstract

Apparatuses, methods and storage medium associated with voice and/or facial recognition based service provision are provided herein. In embodiments, an apparatus may include a voice recognition engine(204a) and a facial recognition engine(204b) configured to provide, individually or in cooperation with each other, identification of a user at a plurality of identification levels. The apparatus may further include a service agent(206) configured to provide a service to a user of the apparatus, after the user has been identified at least at an identification level required to receive the service.

Description

VOICE AND/OR FACIAL RECOGNITION BASED SERVICE PROVISION
Technical Field
The present disclosure relates to the field of data processing, in particular, to apparatuses, methods and storage medium associated with voice and/or facial recognition based service provision.
Background
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Advances in computing, networking and related technologies have led to proliferation in the usage of online services, from consumption of multi-media content, to ecommerce and financial services, to name just a few. Users often prefer to access the wide range of services with the same client device. However, the security requirements often vary greatly between the different services, from one end of the spectrum, like viewing a video file online, to the other end, like conducting banking transactions online. Current art lacks a coherent user-friendly offering that can reliably meet a large range of the security requirements of the different online services.
Brief Description of the Drawings
Embodiments will be readily understood by the following detailed description in conjunction with the accompanying drawings. To facilitate this description, like reference numerals designate like structural elements. Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
Figure 1 illustrates an overview of a computing environment, including a client device, suitable for practicing the present disclosure, in accordance with various embodiments.
Figure 2 illustrates an example process of voice and/or facial recognition based service provision, in accordance with various embodiments.
Figure 3 illustrates an example computing system suitable for use as a client device, in accordance with various embodiments.
Figure 4 illustrates an example storage medium with instructions configured to enable an apparatus to practice the processes of the present disclosure, in accordance with various embodiments.
Detailed Description
Apparatuses, methods and storage medium associated with voice and/or facial recognition based service provision are disclosed herein. In embodiments, an apparatus, e.g., a set-top box or a computing tablet, may include a voice recognition engine and a facial recognition engine configured to provide, individually or in cooperation with each other, identification of a user at a plurality of identification levels. The apparatus may further include a service agent configured to provide a service to a user of the apparatus, after the user has been identified at least at an identification level required to receive the service. Examples of a service agent may include an enhanced media player for consuming multi-media content, or an enhanced browser for conducting ecommerce or online financial transactions.
In the following detailed description, reference is made to the accompanying drawings which form a part hereof wherein like numerals designate like parts throughout, and in which is shown by way of illustration embodiments that may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of embodiments is defined by the appended claims and their equivalents.
Various operations may be described as multiple discrete actions or operations in turn, in a manner that is most helpful in understanding the claimed subject matter. However, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations may not be performed in the order of presentation. Operations described may be performed in a different order than the described embodiment. Various additional operations may be performed and/or described operations may be omitted in additional embodiments.
For the purposes of the present disclosure, the phrase "A and/or B" means (A), (B), or (A and B). For the purposes of the present disclosure, the phrase "A, B, and/or C" means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B and C).
The description may use the phrases "in an embodiment," or "in embodiments," which may each refer to one or more of the same or different embodiments. Furthermore, the terms "comprising," "including," "having," and the like, as used with respect to embodiments of the present disclosure, are synonymous.
As used herein, the term "module" may refer to, be part of, or include an Application Specific Integrated Circuit ("ASIC"), an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
Referring now to Figure 1, wherein a computing environment, including a client device, for practicing the present disclosure, in accordance with various embodiments, is illustrated. As shown, in embodiments, environment 100 may include a number of client devices 102 coupled to a number of servers 104 of online service providers, via networks 106. Servers 104 may be configured to provide a wide range of online services, having different user identification requirements. Examples of such online services and their providers may include, but are not limited to, user customized multi-media content services provided by content distributors, such as Cable Television providers or online multi-media content providers like Youtube, Netflix, and so forth, ecommerce facilitated by hosts, such as Ebay, Best Buy and so forth, or financial services provided by financial institutions, such as Bank of America, Etrade, and so forth. As will be described in more detail below, in embodiments, client devices 102 may be configured to provide a potentially more coherent, user friendly and reliable approach to providing various levels of user identifications to meet the different user identification requirements of the different online services.
In embodiments, some online services may require only voice recognition of a user based on the voice characteristics of a user. Other online services may require only facial recognition of a user based on the user's facial features. Still other online services may require both the earlier described voice and facial recognitions of a user, and potentially, even other more sophisticated voice and/or facial recognition identifications, to be described more fully below.
In embodiments, as shown, a client device 102 may include voice and facial recognition engines 204a and 204b, and a number of service agents 206, coupled with each other as shown. Further, in embodiments, client device 102 may include presentation engine 134, user interface engine 136, display 124 and user input device 126, coupled with each other, engines 204a and 204b and agents 206 as shown. In embodiments, to facilitate cooperative usage of voice and facial recognition engines 204a and 204b, client device 102 may further include a common interface (not shown) to the engines 204a and 204b. In embodiments, voice and facial recognition engines 204a and 204b may be configured to provide, individually or in cooperation with each other, user identifications at a number of identification levels. In embodiments, voice recognition engine 204a may be configured to provide an identification of a user, based on the vocal characteristics of the user's voice, whereas facial recognition engine 204b may be configured to provide an identification of a user, based on the user's facial features. In embodiments, voice recognition engine 204a and facial recognition engine 204b may collaborate to provide the above identifications. For example, in some embodiments, voice recognition engine 204a may be first employed to narrow down the identification of a user to a number of potential identifications, and facial recognition engine 204b may then be employed to make the final identification based on the narrowed down list of potential identifications. In other embodiments, the cooperation may be reversed, that is, facial recognition engine 204b may be first employed to narrow down the identification of a user to a number of potential identifications, and voice recognition engine 204a may then be employed to make the final identification based on the narrowed down list of potential identifications.
Thus, for these cooperative embodiments, a less precise (and typically less computationally intensive) technique may be implemented by the first employed recognition engine, and a more precise (and typically more computationally intensive) technique may be implemented by the latter employed recognition engine. Together, the cooperative approach may yield more accurate identification with fewer computations overall, and thus may be more effective as well as more efficient.
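The cooperative narrowing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the disclosed implementation: the function name `cooperative_identify`, the scorer callables, the shortlist size and the acceptance threshold are all hypothetical, and each scorer stands in for whichever engine (voice or facial) is employed first or second.

```python
from typing import Callable, List, Optional

# Hypothetical scorer type: maps an enrolled user id to a similarity score in [0, 1].
Scorer = Callable[[str], float]


def cooperative_identify(
    users: List[str],
    coarse_score: Scorer,
    fine_score: Scorer,
    shortlist_size: int = 3,
    accept_threshold: float = 0.8,
) -> Optional[str]:
    """Narrow candidates with a cheap scorer, then decide with a costlier one."""
    # Stage 1: the first employed engine ranks every enrolled user cheaply
    # and keeps only the top few as potential identifications.
    shortlist = sorted(users, key=coarse_score, reverse=True)[:shortlist_size]
    if not shortlist:
        return None
    # Stage 2: the latter employed engine runs only on the shortlist.
    best = max(shortlist, key=fine_score)
    # Reject the match if even the best candidate is not convincing enough.
    return best if fine_score(best) >= accept_threshold else None


if __name__ == "__main__":
    # Made-up scores for illustration only.
    voice = {"alice": 0.9, "bob": 0.7, "carol": 0.2, "dave": 0.6}
    face = {"alice": 0.4, "bob": 0.95, "carol": 0.99, "dave": 0.1}
    print(cooperative_identify(list(voice), lambda u: voice[u], lambda u: face[u], shortlist_size=2))
```

Swapping the two scorer arguments gives the reversed cooperation, where the facial engine narrows and the voice engine decides. Note that the precise scorer is evaluated only on the shortlist, which is where the saving in computation would come from.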
Thus, depending on the embodiment, voice recognition engine 204a may implement any one or more of a wide range of vocal recognition techniques to compare a voice input of a user to a number of voice templates to identify the user. The wide range of vocal recognition techniques may include, but are not limited to, a frequency estimation technique, a Markov model technique, a Gaussian mixture model technique, a pattern matching technique, a neural network technique, a matrix representation technique, a vector quantization technique or a decision tree technique. Similarly, facial recognition engine 204b may implement any one or more of a wide range of facial recognition techniques to compare an image input of the user to a number of reference images. The wide range of facial recognition techniques may include, but are not limited to, analysis of relative positions, sizes or shapes of eyes, nose, cheekbones, or jaws.
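As one possible reading of "analysis of relative positions" of facial features, the sketch below compares landmark geometry between an image input and reference images. It assumes that landmark coordinates have already been extracted by some detector; the landmark names, the chosen distance pairs and the function names are hypothetical and are not taken from the disclosure.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical landmark pairs whose distances describe the face geometry.
PAIRS = [
    ("left_eye", "nose"),
    ("right_eye", "nose"),
    ("nose", "jaw"),
    ("left_cheekbone", "right_cheekbone"),
]


def geometry_features(landmarks: Dict[str, Point]) -> List[float]:
    """Landmark distances divided by the inter-eye distance (scale invariant)."""
    eye_span = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    return [math.dist(landmarks[a], landmarks[b]) / eye_span for a, b in PAIRS]


def closest_reference(
    probe: Dict[str, Point], references: Dict[str, Dict[str, Point]]
) -> Tuple[str, float]:
    """Return the reference whose geometry is nearest to the probe, with the gap."""
    probe_vec = geometry_features(probe)

    def gap(name: str) -> float:
        return math.dist(probe_vec, geometry_features(references[name]))

    best = min(references, key=gap)
    return best, gap(best)
```

Dividing by the inter-eye distance makes the features independent of how far the user stands from the camera. A real engine would also need to handle head pose and would apply a rejection threshold to the returned gap; both are omitted here.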
In embodiments, voice recognition engine 204a may be further configured to identify the semantic content of a voice input, to enable e.g., a required passphrase to log into an online service to be provided via a voice input. In other embodiments, voice and facial recognition engines 204a and 204b may be further configured to cooperate to identify whether a voice input is attenuation synchronized with the lip movement as seen with a companion series of image inputs. The identification of synchronization may be provided by the common interface to both engines 204a and 204b, based on the analyses of the two engines 204a and 204b. In other embodiments, voice and facial recognition engines 204a and 204b may be further configured to cooperate to identify whether a voice input is location synchronized with the companion image input, that is, whether the location of the voice source that provided the voice input is the same as the location of the object of the image input. In embodiments, client device 102 may include a location service, such as a global positioning system (GPS) component, e.g., as one of the other input devices 126.
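The disclosure does not specify how synchronization between a voice input and lip movement is measured. One simple way to illustrate the idea, offered here purely as an assumption, is to correlate a per-frame audio energy series with a per-frame mouth-opening series and accept the pair when the correlation is high. The function names and the threshold below are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no timing information
    return cov / math.sqrt(var_x * var_y)


def lips_in_sync(
    audio_energy: Sequence[float],
    mouth_opening: Sequence[float],
    min_corr: float = 0.7,  # hypothetical acceptance threshold
) -> bool:
    """True when loud frames coincide with open-mouth frames."""
    return pearson(audio_energy, mouth_opening) >= min_corr
```

Both series are assumed to be sampled at the same frame rate and already aligned in time. A practical system would likely also search over small time offsets to tolerate capture latency between the microphone and the camera.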
Still referring to Figure 1, service agents 206 may be configured to provide and/or facilitate various online services for users of client devices 102. Examples of service agents and services facilitated may include, but are not limited to, a multi-media player configured to facilitate provision of multi-media content service, including user customized service, a browser configured to facilitate access to ecommerce or financial services, and so forth. These multi-media players and browsers would be enhanced versions to utilize the multi-level identification services provided by voice and/or facial recognition engines 204a and 204b. Thus, except for the usage of the multi-level identification services provided by voice and/or facial recognition engines 204a and 204b, service agents 206 are intended to represent a broad range of service agents found on client devices, including but not limited to, multi-media players, browsers, or service specific applications.
In embodiments, presentation engine 134 may be configured to present content to be displayed on display 124, in response to user selections/inputs. User interface engine 136 may be configured to receive the user selections/inputs from a user. Further, in various embodiments, presentation engine 134 and user interface engine 136 may be configured to effectuate adaptation of the presentation of a content to enhance user experience during response to some user commands, where the adaptation is in addition to a nominal response to the user commands. See, e.g., U.S. Patent Application 13/727,138, entitled "CONTENT PRESENTATION WITH ENHANCED USER EXPERIENCE," filed December 26, 2012.
Display 124 is intended to represent a broad range of display devices/screens known in the art, whereas input devices 126 are intended to represent a broad range of input devices known in the art including, but not limited to, (hard or soft) keyboards and cursor control devices, microphones for voice inputs, cameras for image inputs, and so forth. While shown as part of a client device 102, display 124 and/or user input device(s) 126 may be standalone devices or integrated, for different embodiments of client devices 102. For example, for a television arrangement, display 124 may be a standalone television set, Liquid Crystal Display (LCD), Plasma and the like, while elements 204, 206, 134 and 136 may be part of a separate set-top box, and other user input device 126 may be a separate remote control or keyboard. Similarly, for a desktop computer arrangement, a chassis hosting a computing platform with elements 204, 206, 134 and 136, display 124 and other input device(s) 126 may all be separate standalone units. On the other hand, for a laptop, ultrabook, tablet or smartphone arrangement, elements 204, 206, 134 and 136, display 124 and other input devices 126 may be integrated together into a single form factor. Further, for a tablet or smartphone arrangement, a touch sensitive display screen may also serve as one of the other user input device(s) 126, and elements 204, 206, 134 and 136 may be components of a computing platform with a soft keyboard that also includes one of the user input device(s) 126.
Networks 106 may be any combinations of private and/or public, wired and/or wireless, local and/or wide area networks. Private networks may include, e.g., but are not limited to, enterprise networks. Public networks may include, e.g., but are not limited to, the Internet. Wired networks may include, e.g., but are not limited to, Ethernet networks. Wireless networks may include, e.g., but are not limited to, Wi-Fi, or 3G/4G and beyond networks. It would be appreciated that at the server end, networks 106 may include one or more local area networks with gateways and firewalls, through which servers 104 communicate with client devices 102. Similarly, at the client device end, networks 106 may include base stations and/or access points, through which client devices 102 communicate with servers 104. Within each of client devices 102 and servers 104, there may be communication/network interfaces, and in between the two ends may be any number of network routers, switches and other networking equipment of the like. However, for ease of understanding, these communication/network interfaces, gateways, firewalls, routers, switches, base stations, access points and the like are not shown.
Referring now to Figure 2, wherein an example process for presenting content, in accordance with various embodiments, is illustrated. As shown, process 300 may start at block 302, wherein an initial voice and/or facial identification may be established by voice and/or facial recognition engines 204a and 204b. As described earlier, the initial voice identification may be made by voice recognition engine 204a by comparing a voice input of the user to a number of voice templates, using any one of a number of voice recognition techniques earlier described. The initial facial identification may be made by facial recognition engine 204b by comparing an image input that includes the user to a number of reference images, using any one of a number of facial feature analysis techniques earlier described. Further, as earlier described, the initial voice and facial identifications may be cooperatively made by voice and facial recognition engines 204a and 204b.
From block 302, process 300 may proceed to block 304. At block 304, a determination may be made, e.g., by each of the service agents 206, on whether a service is requested. If a result of the determination, e.g., by one of the service agents 206, indicates that a service is requested of the service agent 206, for the service agent, process 300 may proceed from block 304 to block 306. At block 306, another determination may be made, e.g., by the service agent 206, on whether the current level of identification of the user is sufficient or adequate to allow access to the requested service. If a result of the determination indicates that the current level of identification is sufficient or adequate to allow access to the requested service, for the service agent 206, process 300 may proceed from block 306 to block 310.
If a result of the determination indicates that the current level of identification is insufficient or inadequate to allow access to the requested service, process 300 may proceed from block 306 to block 308. At block 308, the additional level of identification may be attempted. As described earlier, an additional level of identification may include asking the user to provide a passphrase through another voice input. Voice recognition engine 204a may analyze the semantic content of the additional voice input to determine if the semantic content matches the expected/required passphrase. Further, if needed, additional levels of identification such as identification of lip synchronization, and/or location synchronization may be attempted.
If successful (succ), for the service agent 206, process 300, from block 308, may return to block 306 to confirm the adequate levels of identification are now in place. As described earlier, on confirmation that the required level of identification is now in place, for the service agent 206, process 300, from block 306, may proceed to block 310. At block 310, the service agent 206 may provide or facilitate the requested service.
Thereafter, for the service agent 206, process 300 may proceed to block 314, to return to block 304, rejoining other service agents 206 waiting for service requests. From block 304, process 300 may continue as earlier described. On the other hand, if at block 308, process 300 fails to acquire the necessary additional level(s) of identification to provide the requested service, for the service agent 206, process 300 may proceed to block 312. At block 312, the service agent 206 may deny the requested service, and return to block 304. Again, from block 304, process 300 may continue as earlier described.
Back at block 304, if a result of the determination indicates that no service is requested, another determination may be made to determine if termination of process 300 is requested. If not, process 300 may return to block 304, and await a service request. If termination of process 300 is requested, process 300 may end.
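The decision flow of blocks 306 to 312 for a single service request can be sketched as a small step-up loop. This is an illustrative sketch only: the `IdLevel` names, their ordering and the `step_up` callback are hypothetical stand-ins for the identification levels and the additional identification attempts described above.

```python
from enum import IntEnum
from typing import Callable, Tuple


class IdLevel(IntEnum):
    """Hypothetical ordering of identification levels, lowest to highest."""
    NONE = 0
    VOICE_OR_FACE = 1    # initial identification (block 302)
    VOICE_AND_FACE = 2   # both engines agree
    PASSPHRASE = 3       # semantic content of a second voice input
    SYNCHRONIZED = 4     # lip and/or location synchronization


def handle_request(
    current: IdLevel,
    required: IdLevel,
    step_up: Callable[[IdLevel], bool],
) -> Tuple[bool, IdLevel]:
    """Return (granted, resulting level) for one service request."""
    # Block 306: is the current level adequate for the requested service?
    while current < required:
        next_level = IdLevel(current + 1)
        # Block 308: attempt the additional level of identification.
        if not step_up(next_level):
            return False, current  # Block 312: deny the requested service.
        current = next_level
    return True, current  # Block 310: provide or facilitate the service.
```

For example, `handle_request(IdLevel.VOICE_OR_FACE, IdLevel.PASSPHRASE, attempt)` would call the hypothetical `attempt` callback once for `VOICE_AND_FACE` and once for `PASSPHRASE`, and grant the service only if both attempts succeed. The level reached is returned so that a later request from the same user need not repeat identifications already established.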
As described earlier, in embodiments, the services may include customized provision of multi-media content for consumption, e-commerce, and/or financial services. For example, for customized provision of multi-media content, on establishing identification of the required level, a multi-media player may adapt a multi-media presentation, including, but not limited to:
- loading the identified user's preferences automatically
- switching to the identified user's favorite channels or last watched channel/movie
- loading a conversation history between the identified user and a set-top box for better understanding of the current conversation
- loading alerts, notifications and calendar specific to the identified user
- recommending channels/contents based on the watching history of the identified user
- displaying advertisements specifically targeted to the identified user
- retrieving emails of the identified user
- displaying news filtered for the identified user or subscribed to by the identified user
- analyzing the identified user's behaviors for pushing more relevant information
- identifying the speaker of a video telephone call and displaying information of the speaker
- customizing the response to a service call from the identified user
In another service scenario, a service that requires log in may be provided as follows:
1) A user may start the processing by saying to the client device: "Hi." 2) The client device may analyze the voice bio-metric of the voice and finds a match in a registered user bio-metric database; the client device may then load the identified user's information and responses to identified user by voice, saying e.g.,: "Hello, dear David, what can I do for you?".
3) User David may then say to the client device: "Log me in to Youtube."
4) The client device may determine the log-in requires an additional level of identification, and respond by voice, saying e.g.,: "please face the camera and say your passphrase."
5) User David may then face the camera and say "this is David."
6) The client device may then confirm that both the user's face and voice match the user's information in the database, and after vocally and facially identifying David, the client device may then proceed to load the user name and password for David's Youtube login, and log David into his Youtube account.
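As a minimal, non-limiting sketch of the step-up decision in the log-in scenario above, the following Python fragment compares the identification level already established with the level a service requires. The names `IdLevel`, `REQUIRED_LEVEL` and `handle_request`, and the particular mapping of services to levels, are assumptions made for illustration only and are not taken from the disclosure.

```python
from enum import IntEnum


class IdLevel(IntEnum):
    """Hypothetical ordering of identification levels (higher is stronger)."""
    NONE = 0
    VOICE = 1           # voice bio-metric matched a registered user
    VOICE_AND_FACE = 2  # voice and face both matched the same user


# Hypothetical mapping of services to the level each one requires.
REQUIRED_LEVEL = {
    "greeting": IdLevel.VOICE,
    "account_login": IdLevel.VOICE_AND_FACE,
}


def handle_request(service, established):
    """Return ("provide", None) if the established level suffices,
    otherwise ("step_up", required_level) so the device can prompt the user."""
    required = REQUIRED_LEVEL[service]
    if established >= required:
        return ("provide", None)
    return ("step_up", required)


# A user identified by voice only asks to be logged in to an account.
print(handle_request("account_login", IdLevel.VOICE))
```

In this sketch a "step_up" result would correspond to the client device asking the user to face the camera and say a passphrase before the log-in proceeds.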
In still another service scenario, a service that requires very high level identification, such as a banking service, may be provided to a user who wants to transfer money from his bank account to pay for an on-line purchase, as follows:
- the user, with initial identifications, may hold his bank card to a camera of the client device and say: "Hi, this is my bank card";
- the client device may first identify the bank card number and bank name, and determine that a higher level of identification is required;
- on determination, the client device may respond by saying, e.g., "please face the camera and say your bank passphrase";
- the user may then respond by facing the camera and saying "it is David, and my birthday is Aug. 1980";
- in response, in addition to extracting the substance of the voice input, the client device may:
a. check whether the user's lip movement as seen from an image input is in sync with the voice input;
b. check whether the user's location recognized as the voice source is the same as the user's location recognized through visual recognition;
c. check and determine whether the user's environment is consistent with the location identified (to prevent cheating with a video recording);
- on confirmation that all the additional checks/identifications have passed, the client device may then proceed to send the user's log in information and voice passphrase to the bank system;
- further, the client device may subsequently inform the user that the transaction is successful, after the bank system has returned the successful result of the transaction.
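The requirement in the banking scenario above that all additional checks pass before any log in information is sent may be sketched as follows. This is an illustrative assumption rather than the disclosed implementation; the check names and the function `additional_checks_passed` are hypothetical, and the individual checks are represented only by their boolean outcomes.

```python
# Hypothetical names for the three additional checks (a), (b) and (c).
REQUIRED_CHECKS = ("lip_sync", "location_agreement", "environment_consistency")


def additional_checks_passed(results):
    """Given a mapping of check name to boolean outcome, return a pair
    (ok, failed) where ok is True only if every required check passed.
    A check that is missing from the mapping is treated as failed."""
    failed = [name for name in REQUIRED_CHECKS if not results.get(name, False)]
    return (not failed, failed)


ok, failed = additional_checks_passed({
    "lip_sync": True,
    "location_agreement": True,
    "environment_consistency": False,
})
if ok:
    print("forward log in information to the bank system")
else:
    print("refuse; failed checks:", failed)
```

Treating a missing result as a failure is a deliberate choice in this sketch, so that a check which could not be performed never counts as passed.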
Referring now to Figure 3, wherein an example computer suitable for use as a client device, in accordance with various embodiments, is illustrated. As shown, computer 400 may include one or more processors or processor cores 402, and system memory 404. For the purpose of this application, including the claims, the terms "processor" and "processor cores" may be considered synonymous, unless the context clearly requires otherwise. Additionally, computer 400 may include mass storage devices 406 (such as diskette, hard drive, compact disc read only memory (CD-ROM) and so forth), input/output devices 408 (such as display, keyboard, cursor control and so forth) and communication interfaces 410 (such as network interface cards, modems and so forth). The elements may be coupled to each other via system bus 412, which may represent one or more buses. In the case of multiple buses, they may be bridged by one or more bus bridges (not shown).
Each of these elements may perform its conventional functions known in the art. In particular, system memory 404 and mass storage devices 406 may be employed to store a working copy and a permanent copy of the programming instructions implementing the operations associated with practicing method 300 of Figure 2 by client devices 102, earlier described. The various elements may be implemented by assembler instructions supported by processor(s) 402 or high-level languages, such as, for example, C, that can be compiled into such instructions.
The permanent copy of the programming instructions may be placed into permanent storage devices 406 in the factory, or in the field, through, for example, a distribution medium (not shown), such as a compact disc (CD), or through communication interface 410 (from a distribution server (not shown)). That is, one or more distribution media having an implementation of the agent program may be employed to distribute the agent and program various computing devices.
Figure 4 illustrates an example non-transitory computer-readable storage medium having instructions configured to practice all or selected ones of the operations associated with method 300 of Figure 2, earlier described, in accordance with various embodiments. As illustrated, non-transitory computer-readable storage medium 502 may include a number of programming instructions 504. Programming instructions 504 may be configured to enable a device, e.g., computer 400, in response to execution of the programming instructions, to perform, e.g., various operations of process 300 of Figure 2, e.g., but not limited to, the operations performed in association with establishing one or more levels of user identifications, and providing/facilitating services based on the level of voice/facial identification established.
Referring back to Figure 3, for one embodiment, at least one of processors 402 may be packaged together with computational logic 422 configured to practice aspects of the process of Figure 2. For one embodiment, at least one of processors 402 may be packaged together with computational logic 422 configured to practice aspects of the process of Figure 2 to form a System in Package (SiP). For one embodiment, at least one of processors 402 may be integrated on the same die with computational logic 422 configured to practice aspects of the process of Figure 2. For one embodiment, at least one of processors 402 may be packaged together with computational logic 422 configured to practice aspects of the process of Figure 2 to form a System on Chip (SoC). For at least one embodiment, the SoC may be utilized in, e.g., but not limited to, a computing tablet.
The following paragraphs describe examples of various embodiments.
Example 1 may be an apparatus having a voice recognition engine and a facial recognition engine configured to provide, individually or in cooperation with each other, identification of a user of the apparatus at a plurality of identification levels. The apparatus may further include a service agent coupled with at least one of the voice recognition engine and the facial recognition engine, and configured to provide a service to the user, after the user has been identified at least at an identification level required to receive the service.
Example 2 may be example 1, wherein the voice recognition engine is configured to individually provide identification of a user at a first identification level in response to a voice input, and to cooperate with the facial recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
Example 3 may be example 2, wherein the voice recognition engine is configured to individually provide identification of the user at the first identification level in response to the voice input, via comparison of the voice input to a plurality of voice templates.
Example 4 may be example 3, wherein the voice recognition engine is configured to compare the voice input to the plurality of voice templates employing one of a frequency estimation technique, a Markov model technique, a Gaussian mixture model technique, a pattern matching technique, a neural network technique, a matrix representation technique, a vector quantization technique or a decision tree technique.
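By way of a hedged illustration only, comparison of a voice input to a plurality of voice templates might be sketched as a simple nearest-template pattern match. The sketch assumes that the voice input and each template have already been reduced to fixed-length feature vectors; that representation, the function names and the distance threshold are assumptions, and any of the other listed techniques could be substituted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_speaker(features, templates, max_distance):
    """Return the user whose template is nearest to `features`, or None
    when no template lies within `max_distance` (no registered match)."""
    best_user, best_dist = None, float("inf")
    for user, template in templates.items():
        d = euclidean(features, template)
        if d < best_dist:
            best_user, best_dist = user, d
    return best_user if best_dist <= max_distance else None


# Hypothetical enrolled templates, one per registered user.
templates = {"david": [1.0, 0.0, 0.5], "alice": [0.0, 1.0, 0.2]}
print(identify_speaker([0.9, 0.1, 0.5], templates, max_distance=0.5))
```

The threshold gives the sketch a way to report that no registered user matches, rather than always returning the closest template.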
Example 5 may be example 2, wherein the voice input is a first voice input, and the voice recognition engine is configured to individually provide identification of the user at the first identification level in response to the first voice input, by comparing the first voice input to a plurality of voice templates; wherein the voice recognition engine is further configured to individually provide identification of the user at a third identification level in response to a second voice input, wherein the third identification level is a higher identification level than the second identification level, enabling the user to be eligible for service that requires at least the third identification level.
Example 6 may be example 5, wherein the voice recognition engine is configured to determine a semantic content of the second voice input and compare the semantic content of the second voice input to a semantic reference.
Example 7 may be example 6, wherein the semantic reference is a passphrase.
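As one possible, non-authoritative sketch of comparing the semantic content of a second voice input to a passphrase, the fragment below assumes that a speech-to-text step (not shown) has already produced a transcript. The normalization rules and the function names are assumptions made for illustration.

```python
import hmac
import string


def normalize(text):
    """Lower-case the text, strip ASCII punctuation and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def passphrase_matches(transcript, reference):
    """Compare a recognized transcript to the stored semantic reference.
    hmac.compare_digest is used so the comparison time does not depend
    on where the two byte strings first differ."""
    return hmac.compare_digest(
        normalize(transcript).encode("utf-8"),
        normalize(reference).encode("utf-8"),
    )


print(passphrase_matches("This is David.", "this is david"))
```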
Example 8 may be example 1, wherein the facial recognition engine is configured to individually provide identification of a user at a first identification level in response to an image input, and to cooperate with the voice recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
Example 9 may be example 8, wherein the facial recognition engine is configured to individually provide identification of the user at the first identification level via comparison of the image input to a plurality of reference images.
Example 10 may be example 9, wherein the facial recognition engine is configured to compare the image input to the plurality of reference images via at least analysis of relative positions, sizes or shapes of eyes, nose, cheekbones, or jaws.
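An analysis of relative positions of facial features might, purely as an illustrative assumption, be sketched as follows. The sketch presumes that a landmark detector (not shown) has already located the eyes, nose and jaw as (x, y) pixel coordinates; the landmark names, the two ratios used and the tolerance are hypothetical choices, not the disclosed method.

```python
import math


def dist(p, q):
    """Distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def geometry_signature(lm):
    """Reduce landmarks to ratios normalized by the distance between the
    eyes, so the signature does not change with image scale."""
    eye_span = dist(lm["left_eye"], lm["right_eye"])
    eye_mid = (
        (lm["left_eye"][0] + lm["right_eye"][0]) / 2,
        (lm["left_eye"][1] + lm["right_eye"][1]) / 2,
    )
    return (
        dist(eye_mid, lm["nose"]) / eye_span,    # eye line to nose
        dist(lm["nose"], lm["jaw"]) / eye_span,  # nose to jaw
    )


def faces_match(a, b, tolerance=0.05):
    """True when every ratio of the two signatures agrees within tolerance."""
    pairs = zip(geometry_signature(a), geometry_signature(b))
    return all(abs(x - y) <= tolerance for x, y in pairs)


reference = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60), "jaw": (50, 100)}
# The same face captured at twice the resolution.
probe = {name: (2 * x, 2 * y) for name, (x, y) in reference.items()}
print(faces_match(probe, reference))
```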
Example 11 may be any one of examples 1-10, wherein the service agent is configured to provide customized multi-media presentation service that requires an identification level that includes first and second identifications of the user by both the voice recognition engine, and the facial recognition engine.
Example 12 may be any one of examples 1-10, wherein the service agent is configured to facilitate access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, and a third identification by the voice recognition engine based on semantic content of a second voice input.
Example 13 may be any one of examples 1-10, wherein the service agent is configured to facilitate access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, a third identification by the voice recognition engine based on semantic content of a second voice input, and at least a fourth identification that uses both the voice recognition engine and the facial recognition engine.
Example 14 may be example 13, wherein the fourth identification comprises identifying synchronization of a real time voice input to the voice recognition engine, with lip movements in a real time image input to the facial recognition engine.
Example 15 may be example 13, wherein the fourth identification comprises identifying synchronization of a location of a voice source providing a voice input to the voice recognition engine, with a location of the user determined based on image input to the facial recognition engine.
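The comparison of a voice source location with a visually determined user location may be sketched, under stated assumptions, as a tolerance check on two bearings. The sketch assumes that both locations have already been expressed as bearing angles in degrees relative to the device; the function names and the 10 degree default tolerance are illustrative only.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two bearings, in degrees,
    accounting for wrap-around at 360."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def locations_agree(voice_bearing_deg, face_bearing_deg, tolerance_deg=10.0):
    """True when the voice source and the visually located user lie
    within `tolerance_deg` of each other."""
    return angular_difference(voice_bearing_deg, face_bearing_deg) <= tolerance_deg


print(locations_agree(355.0, 3.0))
print(locations_agree(20.0, 45.0))
```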
Example 16 may be example 13, wherein the online service comprises an online financial service.
Example 17 may be any one of examples 1-10, wherein the apparatus is a selected one of a television set, a set-top box, a smartphone, a computing tablet, an ultrabook, a laptop computer or a desktop computer.
Example 18 may be a method for providing service. The method may include providing, by a computing device, identification of a user of the computing device at a plurality of identification levels, via a voice recognition engine, a facial recognition engine, or both, individually or in cooperation with each other; and providing, by the computing device, a service to the user, after the user has been identified at least at an identification level required to receive the service.
Example 19 may be example 18, wherein providing identification of a user comprises the voice recognition engine individually providing identification of a user at a first identification level in response to a voice input, and cooperating with the facial recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
Example 20 may be example 19, wherein the voice recognition engine individually providing identification of a user comprises the voice recognition engine individually providing identification of the user at the first identification level in response to the voice input, by comparing the voice input to a plurality of voice templates.
Example 21 may be example 20, wherein the voice recognition engine comparing the voice input to a plurality of voice templates comprises the voice recognition engine comparing the voice input to the plurality of voice templates employing one of a frequency estimation technique, a Markov model technique, a Gaussian mixture model technique, a pattern matching technique, a neural network technique, a matrix representation technique, a vector quantization technique or a decision tree technique.
Example 22 may be example 19, wherein the voice input is a first voice input, and the voice recognition engine individually providing identification of the user at the first identification level in response to the first voice input, by comparing the first voice input to a plurality of voice templates; wherein the voice recognition engine further individually provides identification of the user at a third identification level in response to a second voice input, wherein the third identification level is a higher identification level than the second identification level, enabling the user to be eligible for service that requires at least the third identification level.
Example 23 may be example 22, wherein the voice recognition engine individually provides identification of the user at a third identification level in response to a second voice input comprises the voice recognition engine determines a semantic content of the second voice input and compares the semantic content of the second voice input to a semantic reference.
Example 24 may be example 23, wherein the semantic reference is a passphrase.
Example 25 may be example 18, wherein providing identification of a user of the computing device at a plurality of identification levels, via a facial recognition engine, comprises the facial recognition engine individually providing identification of a user at a first identification level in response to an image input, and cooperating with the voice recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
Example 26 may be example 25, wherein the facial recognition engine individually providing identification of a user comprises the facial recognition engine individually providing identification of the user at the first identification level by comparing the image input to a plurality of reference images.
Example 27 may be example 26, wherein the facial recognition engine individually providing identification of the user at the first identification level by comparing the image input to a plurality of reference images comprises the facial recognition engine comparing the image input to the plurality of reference images via at least analysis of relative positions, sizes or shapes of eyes, nose, cheekbones, or jaws.
Example 28 may be any one of examples 18-27, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises providing customized multi-media presentation service that requires an identification level that includes first and second identifications of the user by both the voice recognition engine, and the facial recognition engine.
Example 29 may be any one of examples 18-27, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises facilitating access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, and a third identification by the voice recognition engine based on semantic content of a second voice input.
Example 30 may be any one of examples 18-27, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises facilitating access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, a third identification by the voice recognition engine based on semantic content of a second voice input, and at least a fourth identification that uses both the voice recognition engine and the facial recognition engine.
Example 31 may be example 30, wherein the fourth identification comprises identifying synchronization of a real time voice input to the voice recognition engine, with lip movements in a real time image input to the facial recognition engine.
Example 32 may be example 30, wherein the fourth identification comprises identifying synchronization of a location of a voice source providing a voice input to the voice recognition engine, with a location of the user determined based on image input to the facial recognition engine.
Example 33 may be example 30, wherein the online service comprises an online financial service.
Example 34 may be at least one storage medium comprising a plurality of instructions configured to cause a client device, in response to execution of the instructions, to perform any one of the methods of examples 18 - 33.
Although certain embodiments have been illustrated and described herein for purposes of description, a wide variety of alternate and/or equivalent embodiments or implementations calculated to achieve the same purposes may be substituted for the embodiments shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the embodiments discussed herein. Therefore, it is manifestly intended that embodiments described herein be limited only by the claims.
Where the disclosure recites "a" or "a first" element or the equivalent thereof, such disclosure includes one or more such elements, neither requiring nor excluding two or more such elements. Further, ordinal indicators (e.g., first, second or third) for identified elements are used to distinguish between the elements, and do not indicate or imply a required or limited number of such elements, nor do they indicate a particular position or order of such elements unless otherwise specifically stated.

Claims

What is claimed is:
1. An apparatus, comprising:
a voice recognition engine and a facial recognition engine configured to provide, individually or in cooperation with each other, identification of a user of the apparatus at a plurality of identification levels; and
a service agent coupled with at least one of the voice recognition engine and the facial recognition engine, and configured to provide a service to the user, after the user has been identified at least at an identification level required to receive the service.
2. The apparatus of claim 1, wherein the voice recognition engine is configured to individually provide identification of a user at a first identification level in response to a voice input, and to cooperate with the facial recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
3. The apparatus of claim 2, wherein the voice recognition engine is configured to individually provide identification of the user at the first identification level in response to the voice input, via comparison of the voice input to a plurality of voice templates.
4. The apparatus of claim 3, wherein the voice recognition engine is configured to compare the voice input to the plurality of voice templates employing one of a frequency estimation technique, a Markov model technique, a Gaussian mixture model technique, a pattern matching technique, a neural network technique, a matrix representation technique, a vector quantization technique or a decision tree technique.
5. The apparatus of claim 2, wherein the voice input is a first voice input, and the voice recognition engine is configured to individually provide identification of the user at the first identification level in response to the first voice input, by comparing the first voice input to a plurality of voice templates; wherein the voice recognition engine is further configured to individually provide identification of the user at a third identification level in response to a second voice input, wherein the third identification level is a higher identification level than the second identification level, enabling the user to be eligible for service that requires at least the third identification level.
6. The apparatus of claim 5, wherein the voice recognition engine is configured to determine a semantic content of the second voice input and compare the semantic content of the second voice input to a semantic reference.
7. The apparatus of claim 6, wherein the semantic reference is a passphrase.
8. The apparatus of claim 1, wherein the facial recognition engine is configured to individually provide identification of a user at a first identification level in response to an image input, and to cooperate with the voice recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
9. The apparatus of claim 8, wherein the facial recognition engine is configured to individually provide identification of the user at the first identification level via comparison of the image input to a plurality of reference images.
10. The apparatus of claim 9, wherein the facial recognition engine is configured to compare the image input to the plurality of reference images via at least analysis of relative positions, sizes or shapes of eyes, nose, cheekbones, or jaws.
11. The apparatus of any one of claims 1 - 10, wherein the service agent is configured to provide customized multi-media presentation service that requires an identification level that includes first and second identifications of the user by both the voice recognition engine, and the facial recognition engine.
12. The apparatus of any one of claims 1 - 10, wherein the service agent is configured to facilitate access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, and a third identification by the voice recognition engine based on semantic content of a second voice input.
13. The apparatus of any one of claims 1 - 10, wherein the service agent is configured to facilitate access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, a third identification by the voice recognition engine based on semantic content of a second voice input, and at least a fourth identification that uses both the voice recognition engine and the facial recognition engine.
14. The apparatus of claim 13, wherein the fourth identification comprises identifying synchronization of a real time voice input to the voice recognition engine, with lip movements in a real time image input to the facial recognition engine.
15. The apparatus of claim 13, wherein the fourth identification comprises identifying synchronization of a location of a voice source providing a voice input to the voice recognition engine, with a location of the user determined based on image input to the facial recognition engine.
16. A computer-implemented method for providing service, comprising:
providing, by a computing device, identification of a user of the computing device at a plurality of identification levels, via a voice recognition engine, a facial recognition engine, or both, individually or in cooperation with each other; and
providing, by the computing device, a service to the user, after the user has been identified at least at an identification level required to receive the service.
17. The method of claim 16, wherein providing identification of a user comprises the voice recognition engine individually providing identification of a user at a first identification level in response to a voice input, and cooperating with the facial recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
18. The method of claim 16, wherein the voice input is a first voice input, and the voice recognition engine individually providing identification of the user at the first identification level in response to the first voice input, by comparing the first voice input to a plurality of voice templates; wherein the voice recognition engine further individually provides identification of the user at a third identification level in response to a second voice input, wherein the third identification level is a higher identification level than the second identification level, enabling the user to be eligible for service that requires at least the third identification level.
19. The method of claim 18, wherein the voice recognition engine individually provides identification of the user at a third identification level in response to a second voice input comprises the voice recognition engine determines a semantic content of the second voice input and compares the semantic content of the second voice input to a semantic reference.
20. The method of claim 16, wherein providing identification of a user of the computing device at a plurality of identification levels, via a facial recognition engine, comprises the facial recognition engine individually providing identification of a user at a first identification level in response to an image input, and cooperating with the voice recognition engine to provide identification of the user at a second identification level that is a higher identification level than the first identification level, enabling the user to be eligible for service that requires at least the second identification level.
21. The method of any one of claims 16 - 20, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises providing customized multi-media presentation service that requires an identification level that includes first and second identifications of the user by both the voice recognition engine, and the facial recognition engine.
22. The method of any one of claims 16 - 20, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises facilitating access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, and a third identification by the voice recognition engine based on semantic content of a second voice input.
23. The method of any one of claims 16 - 20, wherein providing a service to the user, after the user has been identified at least at an identification level required to receive the service comprises facilitating access to an online service, that requires an identification level that includes first and second identifications of the user by both the voice recognition engine and the facial recognition engine based correspondingly on a first voice input and an image input, a third identification by the voice recognition engine based on semantic content of a second voice input, and at least a fourth identification that uses both the voice recognition engine and the facial recognition engine.
24. The method of claim 23, wherein the fourth identification comprises identifying synchronization of a real time voice input to the voice recognition engine, with lip movements in a real time image input to the facial recognition engine.
25. The method of claim 23, wherein the fourth identification comprises identifying synchronization of a location of a voice source providing a voice input to the voice recognition engine, with a location of the user determined based on image input to the facial recognition engine.
26. At least one storage medium comprising a plurality of instructions configured to cause a client device, in response to execution of the instructions, to perform any one of the methods of claims 18 - 25.
PCT/CN2013/072590 2013-03-14 2013-03-14 Voice and/or facial recognition based service provision WO2014139117A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020157021017A KR101731404B1 (en) 2013-03-14 2013-03-14 Voice and/or facial recognition based service provision
US13/995,476 US9218813B2 (en) 2013-03-14 2013-03-14 Voice and/or facial recognition based service provision
JP2015556364A JP6093040B2 (en) 2013-03-14 2013-03-14 Apparatus, method, computer program, and storage medium for providing service
EP13877676.0A EP2974124A4 (en) 2013-03-14 2013-03-14 Voice and/or facial recognition based service provision
PCT/CN2013/072590 WO2014139117A1 (en) 2013-03-14 2013-03-14 Voice and/or facial recognition based service provision
CN201380073090.0A CN104995865B (en) 2013-03-14 2013-03-14 Service based on sound and/or face recognition provides

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/072590 WO2014139117A1 (en) 2013-03-14 2013-03-14 Voice and/or facial recognition based service provision

Publications (1)

Publication Number Publication Date
WO2014139117A1 (en) 2014-09-18

Family

ID=51535810

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/072590 WO2014139117A1 (en) 2013-03-14 2013-03-14 Voice and/or facial recognition based service provision

Country Status (6)

Country Link
US (1) US9218813B2 (en)
EP (1) EP2974124A4 (en)
JP (1) JP6093040B2 (en)
KR (1) KR101731404B1 (en)
CN (1) CN104995865B (en)
WO (1) WO2014139117A1 (en)

Families Citing this family (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10270748B2 (en) 2013-03-22 2019-04-23 Nok Nok Labs, Inc. Advanced authentication techniques and applications
US9305298B2 (en) 2013-03-22 2016-04-05 Nok Nok Labs, Inc. System and method for location-based authentication
US9887983B2 (en) 2013-10-29 2018-02-06 Nok Nok Labs, Inc. Apparatus and method for implementing composite authenticators
US9961077B2 (en) 2013-05-30 2018-05-01 Nok Nok Labs, Inc. System and method for biometric authentication with device attestation
US10276026B2 (en) * 2013-12-06 2019-04-30 Vivint, Inc. Voice annunciated reminders and alerts
US9654469B1 (en) 2014-05-02 2017-05-16 Nok Nok Labs, Inc. Web-based user authentication techniques and applications
US9508360B2 (en) * 2014-05-28 2016-11-29 International Business Machines Corporation Semantic-free text analysis for identifying traits
US10148630B2 (en) 2014-07-31 2018-12-04 Nok Nok Labs, Inc. System and method for implementing a hosted authentication service
US9431003B1 (en) 2015-03-27 2016-08-30 International Business Machines Corporation Imbuing artificial intelligence systems with idiomatic traits
CN107635431B (en) 2015-06-08 2022-01-11 化妆品科技有限责任公司 Automatic dispensing system for cosmetic samples
KR20170052976A (en) * 2015-11-05 2017-05-15 삼성전자주식회사 Electronic device for performing motion and method for controlling thereof
JP6756503B2 (en) * 2016-03-30 2020-09-16 株式会社ユニバーサルエンターテインメント Information display device
US11275446B2 (en) * 2016-07-07 2022-03-15 Capital One Services, Llc Gesture-based user interface
US10769635B2 (en) * 2016-08-05 2020-09-08 Nok Nok Labs, Inc. Authentication techniques including speech and/or lip movement analysis
WO2018027148A1 (en) * 2016-08-05 2018-02-08 Nok Nok Labs, Inc. Authentication techniques including speech and/or lip movement analysis
US10637853B2 (en) 2016-08-05 2020-04-28 Nok Nok Labs, Inc. Authentication techniques including speech and/or lip movement analysis
CN108134767A (en) * 2016-12-01 2018-06-08 中兴通讯股份有限公司 Access method and server
US20180189471A1 (en) * 2016-12-30 2018-07-05 Facebook, Inc. Visual CAPTCHA Based On Image Segmentation
US10091195B2 (en) 2016-12-31 2018-10-02 Nok Nok Labs, Inc. System and method for bootstrapping a user binding
US10237070B2 (en) 2016-12-31 2019-03-19 Nok Nok Labs, Inc. System and method for sharing keys across authenticators
US10178432B2 (en) 2017-05-18 2019-01-08 Sony Corporation Identity-based face and voice recognition to regulate content rights and parental controls using consumer profiles
GB2578386B (en) 2017-06-27 2021-12-01 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201713697D0 (en) 2017-06-28 2017-10-11 Cirrus Logic Int Semiconductor Ltd Magnetic detection of replay attack
GB2563953A (en) 2017-06-28 2019-01-02 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201801527D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201801528D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201801532D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for audio playback
GB201801526D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
GB201801530D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
JP7123540B2 (en) 2017-09-25 2022-08-23 キヤノン株式会社 Information processing terminal that accepts input by voice information, method, and system including information processing terminal
GB201801661D0 (en) * 2017-10-13 2018-03-21 Cirrus Logic International Uk Ltd Detection of liveness
GB2567503A (en) 2017-10-13 2019-04-17 Cirrus Logic Int Semiconductor Ltd Analysing speech signals
GB201801664D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of liveness
GB201803570D0 (en) 2017-10-13 2018-04-18 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201801663D0 (en) * 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of liveness
GB201804843D0 (en) 2017-11-14 2018-05-09 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201801874D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Improving robustness of speech processing system against ultrasound and dolphin attacks
KR102399809B1 (en) * 2017-10-31 2022-05-19 엘지전자 주식회사 Electric terminal and method for controlling the same
GB201801659D0 (en) 2017-11-14 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of loudspeaker playback
US11868995B2 (en) 2017-11-27 2024-01-09 Nok Nok Labs, Inc. Extending a secure key storage for transaction confirmation and cryptocurrency
US10910001B2 (en) * 2017-12-25 2021-02-02 Casio Computer Co., Ltd. Voice recognition device, robot, voice recognition method, and storage medium
US11831409B2 (en) 2018-01-12 2023-11-28 Nok Nok Labs, Inc. System and method for binding verifiable claims
US10656775B2 (en) 2018-01-23 2020-05-19 Bank Of America Corporation Real-time processing of data and dynamic delivery via an interactive interface
US11264037B2 (en) 2018-01-23 2022-03-01 Cirrus Logic, Inc. Speaker identification
US11475899B2 (en) 2018-01-23 2022-10-18 Cirrus Logic, Inc. Speaker identification
US11735189B2 (en) 2018-01-23 2023-08-22 Cirrus Logic, Inc. Speaker identification
US10733996B2 (en) * 2018-03-30 2020-08-04 Qualcomm Incorporated User authentication
US10720166B2 (en) * 2018-04-09 2020-07-21 Synaptics Incorporated Voice biometrics systems and methods
US10818296B2 (en) * 2018-06-21 2020-10-27 Intel Corporation Method and system of robust speaker recognition activation
CN110767221A (en) * 2018-07-26 2020-02-07 珠海格力电器股份有限公司 Household appliance and method for determining control authority
US10692490B2 (en) 2018-07-31 2020-06-23 Cirrus Logic, Inc. Detection of replay attack
US10460330B1 (en) 2018-08-09 2019-10-29 Capital One Services, Llc Intelligent face identification
US10915614B2 (en) 2018-08-31 2021-02-09 Cirrus Logic, Inc. Biometric authentication
US11037574B2 (en) 2018-09-05 2021-06-15 Cirrus Logic, Inc. Speaker recognition and speaker change detection
CN109065058B (en) * 2018-09-30 2024-03-15 合肥鑫晟光电科技有限公司 Voice communication method, device and system
US20210256099A1 (en) * 2019-01-18 2021-08-19 Nec Corporation Information processing method
US11792024B2 (en) 2019-03-29 2023-10-17 Nok Nok Labs, Inc. System and method for efficient challenge-response authentication
CN110196914B (en) * 2019-07-29 2019-12-27 上海肇观电子科技有限公司 Method and device for inputting face information into database
EP3912063A4 (en) * 2020-03-24 2021-11-24 Rakuten Group, Inc. Liveness detection using audio-visual inconsistencies
US20230245127A1 (en) * 2022-02-02 2023-08-03 Kyndryl, Inc. Augmented user authentication

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2317205A1 (en) * 2000-08-31 2002-02-28 Upper Canada Systems Information management system and method
US20060174119A1 (en) * 2005-02-03 2006-08-03 Xin Xu Authenticating destinations of sensitive data in web browsing
US20070041323A1 (en) * 2005-08-16 2007-02-22 Kddi Corporation Traffic control system, traffic control method, communication device and computer program
CN101111053A (en) * 2006-07-18 2008-01-23 中兴通讯股份有限公司 System and method for defending network attack in mobile network
CN101651541A (en) * 2008-08-14 2010-02-17 中华电信股份有限公司 System and method for authentication of network user
WO2011049784A2 (en) * 2009-10-23 2011-04-28 Microsoft Corporation Authentication using cloud authentication

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0816788A (en) * 1994-06-30 1996-01-19 Yuuseidaijin Authenticating method for person using plural physical features peculiar to the person
US6219639B1 (en) * 1998-04-28 2001-04-17 International Business Machines Corporation Method and apparatus for recognizing identity of individuals employing synchronized biometrics
JP2001331801A (en) * 2000-03-15 2001-11-30 Cai Kk Device and method for personal identification and recording medium
JP4330107B2 (en) * 2000-12-21 2009-09-16 圭一 加藤 Wireless mobile terminal user authentication system
JP3827600B2 (en) * 2001-04-17 2006-09-27 松下電器産業株式会社 Personal authentication method and apparatus
DE10163814A1 (en) * 2001-12-22 2003-07-03 Philips Intellectual Property Method and device for user identification
AU2003262746A1 (en) * 2002-08-20 2004-03-11 Fusionarc, Inc. Method of multiple algorithm processing of biometric data
US20130282580A1 (en) * 2003-02-28 2013-10-24 Payment Pathways, Inc. SYSTEMS AND METHODS FOR EXTENDING IDENTITY ATTRIBUTES AND AUTHENTICATION FACTORS IN AN ePAYMENT ADDRESS REGISTRY
US7161465B2 (en) * 2003-04-08 2007-01-09 Richard Glee Wood Enhancing security for facilities and authorizing providers
US7343289B2 (en) * 2003-06-25 2008-03-11 Microsoft Corp. System and method for audio/video speaker detection
US7506363B2 (en) * 2004-08-26 2009-03-17 International Business Machines Corporation Methods, systems, and computer program products for user authorization levels in aggregated systems
JP4696605B2 (en) * 2005-03-11 2011-06-08 富士通株式会社 Biometric authentication program, apparatus and method
JP2007052496A (en) * 2005-08-15 2007-03-01 Advanced Media Inc User authentication system and user authentication method
JP2007088803A (en) * 2005-09-22 2007-04-05 Hitachi Ltd Information processor
US20070130588A1 (en) 2005-12-06 2007-06-07 Greg Edwards User-customized sound themes for television set-top box interactions
JP2007156974A (en) * 2005-12-07 2007-06-21 Kddi Corp Personal identification/discrimination system
US20090157560A1 (en) * 2007-12-14 2009-06-18 Bank Of America Corporation Information banking and monetization of personal information
US9495583B2 (en) * 2009-01-05 2016-11-15 Apple Inc. Organizing images by correlating faces
US8311522B1 (en) * 2010-09-28 2012-11-13 E.Digital Corporation System and method for managing mobile communications
JP5710748B2 (en) * 2011-04-19 2015-04-30 株式会社日立製作所 Biometric authentication system
CN102457377A (en) * 2011-08-08 2012-05-16 中标软件有限公司 Role-based web remote authentication and authorization method and system thereof
CN102510426A (en) * 2011-11-29 2012-06-20 安徽科大讯飞信息科技股份有限公司 Personal assistant application access method and system
CN103310339A (en) * 2012-03-15 2013-09-18 凹凸电子(武汉)有限公司 Identity recognition device and method as well as payment system and method
US20140007154A1 (en) * 2012-06-29 2014-01-02 United Video Properties, Inc. Systems and methods for providing individualized control of media assets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2974124A4 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170129203A (en) * 2015-03-13 2017-11-24 알리바바 그룹 홀딩 리미티드 Method and device for starting a service by voice in communication software
EP3270550A4 (en) * 2015-03-13 2018-12-12 Alibaba Group Holding Limited Method and corresponding device for starting service through voice in communication software
US10353666B2 (en) 2015-03-13 2019-07-16 Alibaba Group Holding Limited Starting network-based services using a vocal interface with communication software on a mobile computing device
KR102388512B1 (en) * 2015-03-13 2022-04-20 어드밴스드 뉴 테크놀로지스 씨오., 엘티디. Method and device for starting a service by voice in communication software
CN107430854A (en) * 2015-04-13 2017-12-01 Bsh家用电器有限公司 Home appliances and the method for operating home appliances
CN107430854B (en) * 2015-04-13 2021-02-09 Bsh家用电器有限公司 Household appliance and method for operating a household appliance
JP2019522840A (en) * 2016-05-19 2019-08-15 アリババ グループ ホウルディング リミテッド Identity authentication method and apparatus
US10789343B2 (en) 2016-05-19 2020-09-29 Alibaba Group Holding Limited Identity authentication method and apparatus
CN108494836A (en) * 2018-03-09 2018-09-04 上海星视度科技有限公司 Information-pushing method, device and equipment
CN108494836B (en) * 2018-03-09 2024-03-29 上海领秀眼镜有限公司 Information pushing method, device, equipment, server, pushing system and medium

Also Published As

Publication number Publication date
KR101731404B1 (en) 2017-04-28
CN104995865B (en) 2018-06-08
EP2974124A4 (en) 2016-10-19
EP2974124A1 (en) 2016-01-20
US9218813B2 (en) 2015-12-22
KR20150103264A (en) 2015-09-09
JP6093040B2 (en) 2017-03-08
JP2016517548A (en) 2016-06-16
US20150134330A1 (en) 2015-05-14
CN104995865A (en) 2015-10-21

Similar Documents

Publication Publication Date Title
US9218813B2 (en) Voice and/or facial recognition based service provision
US10586541B2 (en) Communicating metadata that identifies a current speaker
US9514333B1 (en) Secure remote application shares
US10380380B1 (en) Protecting client personal data from customer service agents
US10673851B2 (en) Method and device for verifying a trusted terminal
US11228683B2 (en) Supporting conversations between customers and customer service agents
US9363375B1 (en) Interaction using content
US20140137220A1 (en) Obtaining Password Data
CN108573393B (en) Comment information processing method and device, server and storage medium
US8984612B1 (en) Method of identifying an electronic device by browser versions and cookie scheduling
WO2020078050A1 (en) Comment information processing method and apparatus, and server, terminal and readable medium
WO2020233009A1 (en) Identity authentication method and apparatus, computing device, and storage medium
US10936705B2 (en) Authentication method, electronic device, and computer-readable program medium
US11283806B2 (en) Adaptive security system
US10789385B1 (en) Dynamic tagging of media for service sessions
US11409856B2 (en) Video-based authentication
US20190304336A1 (en) Editing tool for math equations
CN112837159B (en) Transaction guiding method and device based on scene element, electronic equipment and medium
US10276169B2 (en) Speaker recognition optimization
US20140282955A1 (en) Controlled Password Modification Method and Apparatus
US11456886B2 (en) Participant identification in mixed meeting
CN114357425A (en) Application processing method and device
CN115547329A (en) Training data acquisition method, voice recognition method and device
CN111861483A (en) Communication method, computer equipment and storage medium

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 13995476

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13877676

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20157021017

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2013877676

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2015556364

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE