Leio – the making of

22–24 August 2024. My English translation of my own article in Interlingua.

Introduction

Before I created the video on this page, in which I recite a very short paragraph in Portuguese and Interlingua, I conducted various experiments with hardware and software, in order to learn how to obtain optimum conditions that would also be applicable to musical videos that I intend to produce one day.

I don’t want to tire the reader with everything I tried, so I’ll concentrate on the final results.

Caveat: I write from a Linux perspective. I permanently abandoned MS Windows in July 2019.

Camera and microphone used

I used the internal camera of my laptop, an Acer One 14 Z2-485, and also the built-in microphone. Results with a cheap external microphone were not so good. Results using the internal microphone may vary per laptop. It is always important that the distance between the sound source and the microphone is not too big. About 50 to 70 cm (20 to 30 inches) is optimal. With too great a distance, the recorded sound will seem hollow: spoken word becomes difficult to understand, and instruments and musical tones are hard to distinguish with precision.

I found the command lsusb, so I know that the identification of my camera is “Chicony Electronics Co., Ltd Chicony USB2.0 Camera”. There is also the command
v4l2-ctl -d /dev/video0 --list-formats-ext
and according to the man page this v4l2-ctl is “An application to control video4linux drivers”. See also the website webcamtests.com.

From those I know that the camera in my laptop has 922 kilopixels, with a maximum resolution of 1280×720 pixels (HD, high definition). In comparison with digital photo cameras that isn’t so much: already in May 2001 I was using a Fujifilm FinePix1300 with 1.3 megapixels, and from June 2009 onwards a Casio Exilim EX-Z85 with 9.1 megapixels. Much more than the 0.922 MP of the webcam. But we must consider the number of images per second, which requires quite some data transmission capacity.

HD at 1280×720 doesn’t produce the very sharp videos that I sometimes see on YouTube. But those are more likely in 4K UHD at 3840×2160: three times the number of pixels in both directions, so nine times the amount of data.
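The pixel counts quoted above are easy to check. The following is only a small sketch; the helper name `pixels` is made up for the illustration.

```python
def pixels(width, height):
    """Number of pixels in one frame of the given size."""
    return width * height

hd = pixels(1280, 720)      # the laptop webcam
uhd = pixels(3840, 2160)    # 4K UHD

print(f"HD:     {hd} pixels ({hd / 1e6:.3f} MP)")
print(f"4K UHD: {uhd} pixels ({uhd / 1e6:.3f} MP)")
print(f"4K UHD has {uhd // hd} times as many pixels as HD")
```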

Illumination

Illumination (or lighting) is important. Without good lighting, a video might appear blotchy and spotty, as if the resolution were quite low.

Too little light is a problem, but too much light isn’t right either. What matters is that the light, whether natural light from the sun, or artificial light from lamps, falls on the objects to be filmed, and not into the camera, which will then reduce its exposure, and make everything too dark.

The incidence of light is essential.

Programs

First I used cheese (version 43.0), a clear and easy-to-use program. For a short while I tried kamoso by KDE, which seemed to produce sharper videos than cheese. Later I came to like guvcview more and more. Its videos also seemed sharper than those by cheese, but I wonder if that might be just an illusion.

I didn’t know right away how to record a video with guvcview. The program opens two windows, one with the current camera footage, and one with various “Settings”. Although it is rather logical and obvious, it just didn’t occur to me that the button “Cap. Video (V)” would be intended for that. Perhaps strange that I think like that, or perhaps strange that such a button and function is located in a Settings window. It happens often that I have problems with the intuitiveness of software. Whether that is really caused by the software itself, or by my way of thinking and my expectations, I don’t know.

The name guvcview isn’t easy to remember. It comes from GTK+ UVC Viewer, in which GTK stands for GIMP ToolKit, and UVC for USB video class.

`Guvcview` settings

I tested with version 2.0.8. The window that appears first is under the tab “Image Controls”, with sliders for the various image parameters. In the Settings menu there is an option Hardware Defaults. I don’t like those. Too pale and colourless, not expressive. This is the case especially when in the tab Video Controls the Camera Output is set to “YUYV - YUYV 4:2:2”.

To compensate for that, my changes were: Contrast 32 becomes 44, Saturation 32 goes to 50. But if the Camera Output is “MJPG - Motion JPEG” (which is better, see below), that is too much, and the Saturation had better be 40 rather than 50.

Interestingly, the settings are not only effective for the program guvcview itself, but also for cheese and kamoso. It seems the parameters are not stored in a configuration file for guvcview, but somewhere in the operating system, in hardware, or in a driver.

An additional test: if I slide the controller Hue completely to the right, Zoom and Verbling also show me all in green, like The Hulk. But without the muscles, of course.

Sound

Discontinuous

With the settings reached thus far, my trial videos had a serious problem: the sound did not continue, but stopped and restarted all the time. (In Dutch we call this: haperen or horten, but in both English and Interlingua there are no clear-cut translations for those.)

After quite a bit of time, I happened to run into the cause, and so had the solution: in guvcview, in the tab Audio Controls, the parameter Audio API was set to PORTAUDIO. But in my Linux system I have PULSEAUDIO! Apparently this setting caused a lot of conversion work for the processors, so a continuous operation became infeasible. After setting the Audio API to PULSEAUDIO, the problem was gone.

Avoid clipping

Clipping means that the momentary amplitude of the sound exceeds the analog or digital range of the hardware. So the amplitude is abruptly cut off to a hard limit.

Perhaps this can be repaired afterwards, but prevention is better than cure.
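What clipping does to a waveform can be shown with a few lines of Python. This is a minimal sketch, assuming 16-bit signed samples (the sample format of the actual recording is not stated here); the function names are made up for the illustration.

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768

def sine(amplitude, n=48):
    """One cycle of a sine wave with the given peak amplitude, n samples."""
    return [amplitude * math.sin(2 * math.pi * i / n) for i in range(n)]

def clip_to_int16(samples):
    """Round to integers and cut off everything outside the 16-bit range."""
    return [max(INT16_MIN, min(INT16_MAX, round(s))) for s in samples]

loud = sine(amplitude=45000)       # peaks exceed what 16 bits can hold
recorded = clip_to_int16(loud)
flattened = sum(1 for s in recorded if s in (INT16_MIN, INT16_MAX))

print(f"peak wanted: {max(loud):.0f}, peak recorded: {max(recorded)}")
print(f"{flattened} of {len(recorded)} samples were cut off at the limit")
```

The tops of the wave are replaced by flat stretches at the limit, which is heard as distortion. A quieter wave, for example `sine(amplitude=10000)`, passes through unchanged apart from rounding.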

Especially in the sounds from musical instruments, e.g. a guitar, there may be sudden peaks. Therefore I found that in the Audio Mixer, Input Devices, a microphone level of 22 to 20% is optimal. Relative to 100%, this corresponds to about −39 to −41 dB.
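How a percentage maps to decibels depends on the mixer. The sketch below assumes the cubic volume mapping that PulseAudio uses for its software volumes (amplitude factor = fraction cubed); that this is where the figures above come from is an assumption, not something stated in the text. A plain amplitude scale is shown for comparison, and both function names are made up for the illustration.

```python
import math

def percent_to_db_cubic(percent):
    """dB value for a volume percentage, assuming amplitude = (percent/100)**3."""
    return 20 * math.log10((percent / 100) ** 3)

def percent_to_db_plain(percent):
    """dB value if the percentage were a plain amplitude factor."""
    return 20 * math.log10(percent / 100)

for p in (22, 20):
    print(f"{p}%: cubic {percent_to_db_cubic(p):.1f} dB, "
          f"plain {percent_to_db_plain(p):.1f} dB")
```

Under the cubic assumption the two levels come out close to the values quoted above. With a plain amplitude scale they would be only about 13 to 14 dB below full level.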

But at too low a level, of course the risk of noise increases. So always look for the optimal compromise for your situation.

Resolution and compression

The command mentioned above,
v4l2-ctl --list-formats-ext
told me that my camera can produce data in these two formats (I leave out the lower resolutions, although they are also possible):

[0]: 'MJPG' (Motion-JPEG, compressed)
	Size: Discrete 1280x720
		Interval: Discrete 0.033s (30.000 fps)
[1]: 'YUYV' (YUYV 4:2:2)
	Size: Discrete 1280x720
		Interval: Discrete 0.100s (10.000 fps)

Only 10 fps, 10 frames per second? Why so few? That is not enough for smoothly flowing videos! I believe the answer lies in data transmission. I don’t fully understand YUYV, which seems quite complicated. But let’s assume that this is the native format in which the camera can produce its images. In YUYV 4:2:2, two neighbouring pixels share their colour values, so every pixel requires two bytes on average. The data to transmit for 10 frames a second is then:
1280 × 720 × 2 × 10 = 921 600 × 2 × 10 = 18 432 000 bytes per second
Then the number of bits/s is:
18 432 000 × 8 = 147 456 000 ≈ 147 Mb/s

The theoretical maximum capacity of USB 2.0, which is used internally for the data transmission from the camera to the central processing unit, is 480 Mb/s. At 30 frames per second the same data would need three times as much, about 442 Mb/s. That is barely below the theoretical maximum, and the capacity that is usable in practice is lower, which would explain why sending 30 frames per second is not possible.
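The arithmetic can be redone for other assumptions with a few lines of Python. This is only a sketch: the function name `raw_bitrate` and the constant are made up for the illustration, and the number of bytes per pixel is left as a parameter (2 is what the packed YUYV 4:2:2 layout uses, 3 would be plain 24-bit colour).

```python
USB2_MAX_BITS_PER_S = 480_000_000   # theoretical maximum of USB 2.0

def raw_bitrate(width, height, bytes_per_pixel, fps):
    """Bits per second needed to send uncompressed frames."""
    return width * height * bytes_per_pixel * fps * 8

for bytes_per_pixel in (2, 3):
    for fps in (10, 30):
        rate = raw_bitrate(1280, 720, bytes_per_pixel, fps)
        share = 100 * rate / USB2_MAX_BITS_PER_S
        print(f"{bytes_per_pixel} bytes/pixel at {fps} fps: "
              f"{rate / 1e6:.0f} Mb/s ({share:.0f}% of USB 2.0)")
```

Either way, 10 frames per second fit comfortably, while 30 frames per second come very close to the theoretical limit or exceed it.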

My speculation is that the camera has its own internal processor, which can do the MJPG compression. To compress the film data, MJPG looks at every image, every frame separately. As a result, the compression is not very good, but it also requires only a limited processing power in the camera. The achieved smaller amount of data means that now instead of 10, also 30 frames per second can be sent.

The video compression scheme that is now widely used is H264 (AVC), and H265 (HEVC) compresses even better. These not only compress the ‘photos’ in a video, but also look for compression possibilities between the photos (the frames, or images). This makes the files much smaller, but of course the algorithms also require a lot of processing power and memory.
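Why looking between the frames pays off can be shown with a toy experiment. This is a sketch under strong simplifications, not how MJPG or H264 really work: the ‘frames’ are blocks of random bytes that differ in only a few places, the general-purpose compressor zlib stands in for the codec, and all names are made up for the illustration.

```python
import random
import zlib

def make_frames(n_frames=10, size=10_000, changes=50, seed=1):
    """Fake video: each frame equals the previous one except for a few bytes."""
    rng = random.Random(seed)
    frame = bytearray(rng.randrange(256) for _ in range(size))
    frames = [bytes(frame)]
    for _ in range(n_frames - 1):
        for _ in range(changes):
            frame[rng.randrange(size)] = rng.randrange(256)
        frames.append(bytes(frame))
    return frames

def size_frame_by_frame(frames):
    """Compress every frame on its own, the way MJPG treats frames."""
    return sum(len(zlib.compress(f)) for f in frames)

def size_with_differences(frames):
    """Compress the first frame, then only the difference to the previous frame."""
    total = len(zlib.compress(frames[0]))
    for previous, current in zip(frames, frames[1:]):
        difference = bytes(a ^ b for a, b in zip(previous, current))
        total += len(zlib.compress(difference))
    return total

frames = make_frames()
print("frame by frame:  ", size_frame_by_frame(frames), "bytes")
print("with differences:", size_with_differences(frames), "bytes")
```

Because consecutive frames are almost identical, the differences consist mostly of zeros and compress to very little. Real codecs go much further, for example by following objects that move between frames.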

It seems obvious to me that the MJPG delivered by the camera first needs to be decoded back to YUV, and then compressed again, now as H264.

And during my experiments, I noted that my laptop is not fast and powerful enough to do the recording and compression work immediately, on line, in real time. My ‘Pentium Gold’, more in detail a Dual core Intel Pentium 4415U, dating from late 2017, with a graphics processor of type Mesa Intel UHD Graphics - CometLake-U GT2, certainly has enough power for normal day-to-day operation. But for intensive work with video, in real time, it is just too slow. While playing my experimental videos, I noticed a bad synchronisation between sound and images. Cause: while recording, many frames were skipped, because the processor cores couldn’t keep up.

Solution: in guvcview, in the menu Video, Video Codec, do not specify “MPEG4-AVC (H264)”, but rather “MJPG - compressed”, the same as what the Camera Output was set to. Thus the program only needs to record the data it receives from the camera, and does not have to do any compression work itself. The compression can then take place later, not in real time, but as batch processing, for example with the program kdenlive, after the post-processing. It is not a problem if 1 minute of video requires 3 minutes of compression and encoding. There is plenty of time.

Post-processing

With `vlc`

The well-known program vlc can be used for playing audio and video. The program cheese can take photos and record videos. It has a button to view the videos just recorded in vlc. Thus when trying out several things I discovered that vlc (version: 3.0.21 Vetinari) boasts effects as post-processing. So I decided to use these for my videos.

But there was a problem: I couldn’t save the video including the effects. So the effects were only temporarily effective, which of course isn’t very useful. Or rather: I found descriptions of how to save the changes in a new file, but the procedure was very complicated, and eventually I got an error message that was hard to understand. So sadly, I gave up, and preferred other effects, by a different program.

However I did find the effects interesting, therefore I describe them here anyway.

Sound

Equaliser: reduce the noise a bit, by attenuating high frequencies. At my age (69), above 6 kHz I probably hear very little, but younger people still do. Around 170 Hz I reduced the sound a bit to compensate for a resonance at the A string, which at least the guitar I was then using had. And around 1 and 3 kHz a bit stronger, for a ‘presence’ effect sometimes used for electric guitars. But I was using a classical or Spanish guitar.

Compression: this reduces the volume of louder passages, and amplifies quieter parts. In other words: the dynamics of the music are reduced. The setting Make-up gain is important: it’s better when this is a bit lower, to avoid distortion by clipping. This Make-up gain also causes a kind of quantisation noise, or at least that was the impression I had.
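The interplay between compression and make-up gain can be sketched in a few lines. This is a deliberately simple, sample-by-sample model with made-up parameter values; it is not how the compressor in vlc is implemented, and real compressors follow the loudness over time with attack and release settings.

```python
def compress(sample, threshold=0.5, ratio=4.0, makeup_gain=1.0):
    """Reduce the part of a sample (range -1..1) above the threshold, then amplify."""
    magnitude = abs(sample)
    if magnitude > threshold:
        magnitude = threshold + (magnitude - threshold) / ratio
    magnitude *= makeup_gain
    return magnitude if sample >= 0 else -magnitude

quiet, loud = 0.1, 1.0
for gain in (1.0, 1.5, 2.0):
    q = compress(quiet, makeup_gain=gain)
    l = compress(loud, makeup_gain=gain)
    note = "  <- exceeds 1.0, would clip" if abs(l) > 1.0 else ""
    print(f"make-up gain {gain}: quiet {q:.3f}, loud {l:.3f}{note}")
```

The loud sample is pulled closer to the quiet one, so the dynamics are reduced. With a moderate make-up gain everything becomes louder without harm, but with too much gain the loud sample ends up above full scale and would be clipped.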

All of this together created a more direct sound, as if the microphone were closer to the sound source than it actually was. Of course this way the sound is no longer natural, but what does that matter if this is the way I like it?

Video

Video effects by vlc that I used: Tools, Effects, Colors Gradient, and a bit of Sepia. That made me unrecognisable, well, but I don’t have film star ambitions, on the contrary. The left-hand finger movement remained visible, and that is an objective of future videos.

Also Advanced, Motion Blur, this seemed to make finger motions even better visible. But is that really so?

On 3 July 2024 however I decided: I won’t use any video effects, I leave everything the way it is, but with contrast and saturation adjusted a bit, as mentioned before. This decision I made not only because I didn’t manage to save vlc results.

With `kdenlive`

See below.

Combining videos

On 8 July 2024 I had a passable video of the part in Portuguese, and one of the Interlingua part, of the video whose prehistory and making this article tries to describe. I wanted to trim some useless seconds at the end of the first part, and combine the second part with the trimmed result. Glue two videos together. Concatenate them. That shouldn’t be too difficult, should it?

I had seen a demonstration video of kdenlive on YouTube, in which this was the first thing the demonstrator showed. Yet, when I tried it, whatever I did, I couldn’t manage. In my opinion kdenlive (version: 22.12.3) is completely counter-intuitive; it is incomprehensible how things are meant to be done in it. Perhaps that is a problem in me, and not in the program. Yet my experience with various types of software began in 1975, and I still work with programs every day.

Then I saw a mention of shotcut. Allegedly much easier. I tried it, after having watched parts of an instruction video.

Indeed shotcut is easier and more understandable. I could trim and glue. There was a loud tick at the transition. Solution: slide the beginning of the second part slightly over and past the end of the first part, and the program automatically creates a smooth transition. This is intuitive! A program that all by itself simply does a useful thing that is obviously what is wanted! Very good.
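The idea behind such a smooth transition can be sketched for the sound alone. The example assumes a plain linear crossfade over the overlapping samples; what shotcut really does in the overlap is not stated here, and the function name is made up for the illustration.

```python
def crossfade(first, second, overlap):
    """Join two lists of samples, blending the last `overlap` samples of
    `first` with the first `overlap` samples of `second`."""
    if overlap <= 0 or overlap > min(len(first), len(second)):
        raise ValueError("overlap must fit inside both clips")
    start = len(first) - overlap
    blended = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)    # rises gradually from 0 towards 1
        blended.append(first[start + i] * (1 - weight) + second[i] * weight)
    return list(first[:start]) + blended + list(second[overlap:])

# A constant level of 1.0 followed by silence: a hard cut would jump from
# 1.0 to 0.0 in a single step, which is heard as a tick.
result = crossfade([1.0] * 10, [0.0] * 10, overlap=4)
print(result)
```

Instead of one abrupt jump, the level now goes down in several small steps across the overlap.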

Triple stereo

With kdenlive (yes, this time I could do it) I added three stereo effects to the clip, which was originally recorded with just one microphone, namely the internal one in the laptop. To my taste all three effects (Haas, Extra, Widener) were really necessary, and in that order. The stereophonisation of the sound as such isn’t the most important result. But, as in my trials with vlc (by other means!), the result is a proximity effect, as if the microphone were closer to the sound source than it in fact was during the recording.

Concrete steps in kdenlive: menu Project, Add clip or folder, select the MP4 file or some other video file. At the left, about halfway down the screen, press Effects. Choose Stereo and Binaural Images. Double-click on three effects in succession: 1) Haas Stereo Enhancer (inspired by the Haas effect, described in 1949 by Helmut Haas), 2) Extra Stereo, 3) Stereo Widener.

Then: Project, Render, and choose the compression method for the new file to be saved.
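The principle behind the first of the three effects can be sketched as follows. The Haas effect is about a sound and a copy of it that arrives a few milliseconds later: the ear fuses the two into one sound. A mono recording can therefore be given some width by delaying one channel slightly. The delay of 15 ms and the function name are assumptions for the illustration; how the effect in kdenlive is actually implemented is not known here.

```python
def haas_stereo(mono, sample_rate=48_000, delay_ms=15.0):
    """Turn a list of mono samples into (left, right), with the right
    channel delayed by delay_ms milliseconds."""
    delay = round(sample_rate * delay_ms / 1000)    # delay in samples
    left = list(mono) + [0.0] * delay               # padded to equal length
    right = [0.0] * delay + list(mono)
    return left, right

# Tiny example with an unrealistically low sample rate, to keep it readable.
left, right = haas_stereo([1.0, 0.5, 0.25], sample_rate=1000, delay_ms=2)
print(left)
print(right)
```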
