Tuesday, 2 October 2012

Project “Audio Internet - Look into the Future”


Today is the age of the mobile Internet. The rapid growth and proliferation of wireless networks and the mobile devices associated with them allow users to get relevant content virtually anywhere and at any time.


However, the traditional (visual) form in which Internet content is provided is not always practical on mobile devices. Much of the content comes as text, tables, images and video clips that require continuous visual contact. In addition, navigating and browsing require manual operation.
It is my understanding that mobile means "in motion", "on the fly", i.e. just the way one uses his or her mobile phone for talking. One has no need to stop and concentrate on the screen. Instead one just holds the phone to the ear or uses hands-free talking to receive and transmit information. The mobile Internet as it exists now requires visual attention focused on the screen and manual control. The need to keep visual contact in order to receive and transmit information does not allow the user to combine it with such tasks as driving a car, walking or performing mechanical labour.

Until recently, the dominance of visual content was justified by the weak technical base, low-memory devices, and low-speed data channels. However, the modern technical base allows storage and instant wireless transmission of large amounts of information in various forms, so that the usual array of visual information can be supplemented with large amounts of audio data, the processing of which does not require visual attention.

The essence of the idea is as follows: the user receives information as audio, played back from audio files stored on a remote server or on the mobile device. In addition, audio content can be generated automatically from text files by special programs at the user's request. The user controls the flow of content through voice commands and queries, or creates new information in the form of audio files stored in the memory of the mobile device or on a remote server. This architecture is similar to a radio channel with feedback that provides any information on the user's demand. Henceforth I will refer to this system as Audio Internet (AI).

AI would look like the traditional Internet, the only difference being that it does not require a screen for browsing and can be operated through voice commands. AI frees not only our eyes but also our hands. This arrangement gives us complete mobility and the ability to do a number of necessary things at the same time, such as driving, cooking, walking and who knows what else, from washing in the morning to the evening walk with the dog.

To date, the Internet has accumulated a huge amount of audio information in a variety of audio files, most of which are publicly accessible. These audio files should form the basis for AI. As for the commercial component of AI, its potential is not inferior to that of the traditional Internet.

What kind of information transmitted in the form of audio content should be of primary interest to the consumer?

  1. Entertainment content in the form of music, audio books, and radio programs;
  2. Cognitive information in the form of news releases, encyclopedic reference, audio guides, etc.;
  3. Training information in the form of training and development programs, quizzes;
  4. Social information in the form of audio-blogging, audio-chats, audio-forums;
  5. Personal (confidential) information in the form of audio-mail, Internet-telephone, audio-calendar, and so on;
  6. Business information in the form of audio-sites of various companies and organizations, audio business cards, etc.

In fact, this is a whole new industry on the Internet, which will provide an impetus for the creation of new software products: audio browsers, audio search engines, audio e-mail programs and others. AI will also require a constant supply of new audio content.

I am sure that AI will be used by the majority of mobile phone users, and at some point, when wireless access to the Internet becomes ubiquitous, it could replace cellular telephone service itself. At the initial stage AI will bring together those who are already actively using the mobile Internet and those who use mobile audio players.

The first AI users will be young and middle-aged people who lead an active life and have a constant need to stay up to date. Then they will be joined by older people, including those who currently do not actively use the Internet. This category of consumers should be attracted by the relative ease of use, which does not require any technical skill or knowledge. AI should greatly simplify the dialogue between man and computer, bringing it to the level of human interaction.

In addition, the AI can play a huge role in the social life of many people.

Other potential beneficiaries of AI are people with special needs. Although some Internet resources are adapted for blind or visually impaired people, their choice of content is heavily limited and access is often too complicated. The introduction of AI will allow them to enjoy Internet content and services on a par with the rest of us. This issue is especially significant given that there are 37 million blind people in the world and another 124 million with very poor eyesight, and their numbers are constantly growing. Another group that might benefit from AI is people with physical disabilities, who may have difficulty manipulating input devices such as a mouse or a keyboard. For them, replacing manual control with a voice command system is a must.

In addition, the use of the traditional Internet requires a certain level of literacy. Unfortunately, in the world today there are about 800 million illiterate adults and 113 million illiterate children who do not attend school. For them, AI can be a window into the modern world.

Ongoing access to a variety of content and a simple audio interface can radically change the way unskilled, manual labor is performed and will allow people to develop and grow, regardless of their level of education, type of activity, social group and age.

Tuesday, 12 June 2012

Why the idea of control over audio-content in mobile computers is innovative


The main goal is to simplify human-computer interaction, bringing it to the level of inter-human communication. This would signify abandoning the approach of turning the user into a computer-like creature in favour of making the computer more human-like.
At the end of the eighties a similar breakthrough occurred with the emergence of a new concept, the “user interface” shell, the most common of which was Microsoft Windows. Through this GUI, millions of users without special training could start using computers, as they began to perceive the computer not as a machine that understands only special machine instructions, but as an ordinary desktop with a set of tools needed to perform everyday human tasks. Today the development of computers is moving towards miniaturization, which provides mobility and access to the Internet over wireless networks. The main idea is to give the user the ability to obtain the necessary information at any place and at any time. But the attachment to the visual interface (GUI shells) greatly complicates human-computer interaction because of the screen limitations of mobile computers and the need to switch all of the user's attention to controlling the device. The emergence of a new audio interface will greatly simplify the control of mobile computers, allow the user to do other tasks at the same time and greatly speed up the delivery of the information he or she needs. This audio interface must be included in operating systems, primarily those of mobile devices such as PDAs, smartphones and others. To this end, standards for the audio interface must be developed to facilitate its use by other programs and applications.

For a mobile computer user it is easier to receive most information as audio content through headphones. Accordingly, it is more convenient to manage such information through voice commands. Tentatively, such communication with a computer can be referred to as Question-Answer. Such a system would require some new programs; let’s call them an audio browser and an audio search engine. These programs will allow us to search, organize and play audio files, either downloaded from the Internet or stored on the computer, without using the screen. Such use will be most effective if the machine is able to understand the question correctly and to select the desired response, in the form of an audio file, quickly and accurately. This requires indexing the content of audio files. Of course, some of the required labels can be set manually in the form of certain phrases, but for the search to be most accurate a special search engine will have to analyze the audio content and set the labels automatically. This approach will not only organize audio files, but will also make it possible to find the desired places within the files and, if necessary, to create bookmarks, etc. Such a system will also require a number of new applications, such as applications that determine the language or the music genre of an audio file.
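
As a rough illustration of the indexing idea above, here is a minimal Python sketch of an audio index in which each file carries phrase labels tied to time offsets, so that a spoken question (assumed to be already converted to text by some speech recognizer, which is not shown) can be matched to a place inside a file. The names AudioFile, AudioIndex, add_label and search, the file name, and the simple word-overlap ranking are all my assumptions for the example, not an existing standard or product.

```python
from dataclasses import dataclass, field


@dataclass
class AudioFile:
    """One indexed audio file. All fields are hypothetical."""
    path: str
    language: str
    # Each label is a lower-case phrase mapped to the offsets (in seconds)
    # at which that phrase is relevant inside the file.
    labels: dict[str, list[float]] = field(default_factory=dict)
    # Offsets the listener has bookmarked by hand.
    bookmarks: list[float] = field(default_factory=list)

    def add_label(self, phrase: str, offset: float) -> None:
        """Attach a label, set manually or by an automatic analyzer."""
        self.labels.setdefault(phrase.lower(), []).append(offset)


class AudioIndex:
    """A toy index: search returns places inside files, not just files."""

    def __init__(self) -> None:
        self.files: list[AudioFile] = []

    def add(self, audio: AudioFile) -> None:
        self.files.append(audio)

    def search(self, query: str) -> list[tuple[str, float]]:
        """Return (path, offset) pairs whose label shares a word with the query.

        Results are ordered by the number of shared words, best match first.
        """
        words = set(query.lower().split())
        hits = []
        for audio in self.files:
            for phrase, offsets in audio.labels.items():
                overlap = len(words & set(phrase.split()))
                if overlap:
                    for offset in offsets:
                        hits.append((overlap, audio.path, offset))
        hits.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
        return [(path, offset) for _, path, offset in hits]


if __name__ == "__main__":
    index = AudioIndex()
    news = AudioFile("news.mp3", language="en")  # hypothetical file name
    news.add_label("weather forecast", 95.0)
    news.add_label("sports results", 240.0)
    index.add(news)
    # The player could start playback of news.mp3 at the returned offset.
    print(index.search("what is the weather forecast"))
```

A real audio search engine would need far better matching than counting shared words, but the sketch shows why labels with offsets are useful: the same structure serves organizing files, jumping to a place within a file, and keeping bookmarks.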

What should the audio content look like, or rather sound like, then? Will it be generated automatically by special programs that create sound out of text files? Currently, most audio content must be created in recording studios by live people who, ideally, have special training and skills, since intonation plays an important role in the perception of a spoken text. A person does not simply describe events; his intonation shows his attitude towards them. The machine cannot yet express the nuances of intonation since it is devoid of emotion. That is a task for the future, perhaps the near future, but not for today. However, it is safe to entrust the machine with reading commands, tables of contents, menus, brief newsletters, announcements, letters and short messages.

These additional computer capabilities will greatly simplify communication between man and machine, expand the number of users and make new programs possible. These may include programs for managing music streams, news programs, fiction, scientific and business literature, tutorials, games and much more. For example, it may be a navigator that leads you to a given point or takes you along a certain route and tells you about local attractions on the way. But most importantly, the audio interface will be another step towards the convergence of man and machine, and this step will be directed towards the person, towards his way of life, and it will be in line with his natural predispositions.

Saturday, 2 June 2012

What makes the idea of controlling audio content on mobile computers innovative?



The main goal is to simplify the dialogue between a person and a computer, bringing it to the level of human communication. It means taking the path on which the computer moves closer to the person, rather than the reverse, where the person comes to resemble a computer more and more.
At the end of the 1980s such a breakthrough came with the emergence of a new interface in the form of "user shells", the most widespread of which is Microsoft Windows. Thanks to this graphical shell, millions of users without special training were able to start using computers, because they began to perceive the computer not as a machine that understands only special machine commands, but as an ordinary desktop with a set of familiar human tools needed to get tasks done. Today the development of computers is moving towards miniaturization, which provides mobility and access to the Internet over wireless networks. The main idea is to let the user obtain the necessary information in any place and at any time. But the attachment to visual interfaces (graphical shells) considerably complicates the dialogue between person and computer. This is due both to the limited capabilities of mobile computer screens and to the need to switch the user's full attention to controlling the device. The emergence of a new audio interface will considerably simplify the control of mobile computers. It will let the user carry out other tasks at the same time and will significantly speed up the process of getting the information the person needs. Such an audio interface should be included in operating systems, first of all those of mobile devices such as PDAs, smartphones and others. To this end, audio interface standards need to be developed so that other application programs can use it.
For the user of a mobile computer, a considerable part of the information is convenient to take in as audio content received through headphones. Accordingly, it will also be more convenient to control such information with voice commands through a microphone. Such communication with a computer can tentatively be called Question-Answer. This creates a need for a number of new programs; let us call them an audio browser and an audio search engine. That is, these are programs that make it possible to search for, organize and play audio files from the Internet or stored on your computer without using the screen. Such use will be most effective if the machine can understand your question correctly and quickly and accurately select the answer you need, i.e. an audio file. Hence the need to index the content, i.e. the audio files. Of course, some of the necessary labels can be set manually in the form of certain phrases, but the search will be most accurate if a special search engine analyzes the content itself and sets the labels. This approach will make it possible not only to organize audio files but also to find the right places within them, to make bookmarks when needed, and so on. A number of programs will be required, for example for identifying the language of an audio file or the genre of a piece of music.
How should audio content look, or rather sound? Will it be generated automatically by special programs that produce sound as they read text files? Today the bulk of audio content has to be created in recording studios by live people, preferably with special training and skill. The point is that when a text is read aloud, intonation plays a huge role in how it is perceived. A person does not simply describe events but shows an attitude towards them through intonation. The machine cannot yet express an attitude towards the text it reads; it is devoid of emotion. That is a task for the future, perhaps the near future, but not for today. What can safely be entrusted to the machine is reading commands, tables of contents, menus, brief news summaries, announcements, short letters and messages.
Such additional capabilities will considerably simplify communication between person and machine, expand the number of users and make new programs possible. These may include music stream management, news programs, fiction, scientific and business literature, educational programs, games and much more. For example, it could be a navigator that brings you to a given point or guides you along a certain route and tells you about local attractions on the way. But most importantly, the audio interface will make it possible to take one more step towards bringing person and machine closer together, and this step will be directed towards the person, towards his or her way of life, in keeping with his or her view of the world.

Saturday, 17 March 2012

Here are some ideas about the future of the Internet

The rapid development of wireless networks, smartphones, PDAs and other mobile Internet devices enables the user to obtain the desired content almost anywhere and at any time. But there is one obstacle that complicates the use of the content, or rather the absorption of the information. Most of the information is supplied in visual form, such as text, images, tables, etc. To absorb it, a screen and visual attention are required. The need to stay in visual contact does not allow the information consumer to do anything else, for example to drive a car, walk or perform any other manual task. Until recently, such distribution of information was justified by the weak development of technical facilities, the limited memory capacity of servers and low-speed data channels. But today technical resources allow one to store large amounts of information and transmit them instantly over wireless communication channels.


The essence of my idea is as follows: the user receives information (content) in the form of audio files stored on servers and manages the information flow through voice commands and queries, or creates his or her own information (content). That is, this would be similar to a radio channel with feedback, which provides you with any information you need at your request. Such sites can carry a variety of audio information: news, music, books, training and development programs, games, translators, perhaps forums in which you can take part; it may be just a phone conversation with your friends, facilitated by Skype-like networks. From now on I'll use audiointernet to designate the part of the Internet that can be used by a consumer without a screen, and audiosites to designate the specific information channels.

Who needs it, and why? It is needed to give the modern consumer instant access to various types of information, even when she or he is busy with a task that requires constant visual control (driving, hiking, physical work, etc.), that is, at a moment when she or he is either unable or unwilling to use the screen. Such a system of human interaction with the Internet will allow the consumer to reach a new level of access to information, limited only by the capacity of providers to ensure access to the audiointernet. The consumer will gain the opportunity to remain at all times within the information field of various audiosites.

In my view, the audiointernet will be in high demand among almost all current mobile users, primarily among those who are already actively using the mobile Internet, namely young, modern people leading active lifestyles who need a constant flow of information. They will be joined by middle-aged and older people who, until now, have had little exposure to the Internet. This segment of the population might be attracted by the simplicity of the interface, which does not require any specific knowledge or training. The structure of the voice commands and queries should be kept simple and understandable, and the audiosites’ interface should be based on typical structures.

In addition, the audiointernet can play a huge role in the social lives of many people. Blind or visually impaired people have only limited access to Internet content. But with the advent of the audiointernet they will be able to enjoy the same level of service as other users. Currently there are 37 million blind people in the world, and 124 million more suffer from serious visual impairment. In addition, in the world today there are about 800 million illiterate adults and 113 million children who do not attend school. For them the audiointernet can serve as a window into the modern world.

Discarding the screen will allow a simplified, budget model of the device with access to the audiointernet, which will especially suit low-income people. The ability to have constant access to a variety of information can fundamentally change the attitude towards unskilled, manual labor and will allow a person to develop and grow regardless of the nature of his or her occupation. In sum, it turns out that virtually the entire adult population of the globe will be inclined to use the audiointernet.

Although the modern level of technical development makes the audiointernet possible, to date there is no coherent policy in this matter: no developed basic approaches, no specific standards, model structures or protocols. There are also no special audiobrowsers, search engines or other necessary applications. But it's only a matter of time. There is already a vast amount of content ready for the audiointernet, and it can serve as a basis for the creation of new content. For example, Wikipedia could obtain stable funding by offering its users a paid audio version. Generally, with regard to the commercial component of the audiointernet, the opportunities are on a par with the conventional visual Internet.