Lightweight and hybrid transformer-based solution for quick and reliable deepfake detection

Introduction

Rapid advancements in artificial intelligence, and generative AI in particular, have enabled the creation of fake images and videos that appear highly realistic. According to a report published in 2022, approximately 71% of people are taken in by fake videos and become victims of blackmail. Moreover, these fake videos and images are used to tarnish the reputation of popular public figures.

This has increased the demand for deepfake detection techniques. However, the accuracy of the techniques proposed in the literature so far varies as fake content generation methods change. Additionally, these techniques are computationally intensive.

The techniques discussed in the literature are based on convolutional neural networks, Linformer models, or transformer models for deepfake detection, each with its own advantages and disadvantages.

Methods

In this manuscript, a hybrid architecture combining transformer and Linformer models is proposed for deepfake detection. This architecture converts an image into patches and applies positional encoding to retain the spatial relationships between patches.
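As a rough illustration of the patch-and-encode step, the PyTorch sketch below splits a frame into fixed-size patches and adds learnable positional encodings. The image size, patch size, and embedding dimension are assumed values for illustration, not the settings reported in the study.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and add learnable positional encodings.

    Illustrative sketch only: image_size, patch_size, and embed_dim are assumed
    hyperparameters, not values taken from the paper.
    """
    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=256):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution maps each non-overlapping patch to one embedding vector.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable positional encodings preserve the spatial order of the patches.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, x):
        x = self.proj(x)                  # (B, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
        return x + self.pos_embed         # add positional information

# Example: a 224x224 RGB frame becomes a sequence of 196 patch embeddings.
tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 256])
```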

Its encoder captures contextual information from the input patches, and the Gaussian Error Linear Unit (GELU) activation helps resolve the vanishing gradient problem.

Results

The Linformer component reduces the size of the attention matrix, which cuts the execution time roughly in half without compromising accuracy.
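The speed-up comes from Linformer's low-rank attention: keys and values are projected from sequence length n down to a fixed dimension k, so the attention matrix shrinks from n x n to n x k. A minimal single-head sketch, continuing the example above with assumed dimensions, could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinformerSelfAttention(nn.Module):
    """Single-head Linformer-style attention (illustrative sketch).

    Keys and values of length seq_len are projected down to length k, so the
    attention matrix is (seq_len x k) instead of (seq_len x seq_len). The
    dimensions below are assumptions for illustration.
    """
    def __init__(self, dim=256, seq_len=196, k=64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        # Low-rank projection matrices for keys and values.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.scale = dim ** -0.5

    def forward(self, x):                                    # x: (B, n, d)
        q = self.to_q(x)                                     # (B, n, d)
        k, v = self.to_kv(x).chunk(2, dim=-1)                # each (B, n, d)
        k = torch.einsum('bnd,nk->bkd', k, self.proj_k)      # (B, k, d)
        v = torch.einsum('bnd,nk->bkd', v, self.proj_v)      # (B, k, d)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, n, k)
        return attn @ v                                      # (B, n, d)
```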

Moreover, the model combines the distinctive strengths of transformer and Linformer models to enhance the robustness and generalization of deepfake detection. Its low computational requirements and high accuracy of 98.9% make it practical for real-time use, helping to prevent blackmail and other harm to the public.
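The abstract does not spell out how the two attention mechanisms are combined. As one possible arrangement, the sketch below (continuing the code above) stacks encoder blocks whose attention module may be either a standard full-attention layer or the Linformer-style layer, each followed by a GELU feed-forward network:

```python
class HybridEncoderBlock(nn.Module):
    """One pre-norm encoder block (sketch): attention followed by a GELU feed-forward.

    `attn` can be any module mapping (B, n, d) -> (B, n, d); here it defaults to
    the LinformerSelfAttention defined above. How the paper interleaves full
    attention and Linformer attention is an assumption, not stated in the text.
    """
    def __init__(self, dim=256, attn=None):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = attn if attn is not None else LinformerSelfAttention(dim=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),                     # GELU activation, as described in the text
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        x = x + self.attn(self.norm1(x))   # residual connection around attention
        x = x + self.ffn(self.norm2(x))    # residual connection around feed-forward
        return x
```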

Discussion

The proposed hybrid model leverages the strength of the transformer model in capturing complex patterns in data. It uses the efficient self-attention of the Linformer model to reduce computation time without compromising accuracy. Moreover, the models were evaluated with patch sizes of 6 and 11.

It is evident from the obtained results that increasing the patch size improves the performance of the model. A larger patch allows the model to capture fine-grained features and learn more effectively from the same set of videos. It also enables the model to better preserve spatial details, which contributes to improved feature extraction.
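To make the patch-size trade-off concrete, the short calculation below assumes a hypothetical 132 x 132 input (chosen only because it divides evenly by both 6 and 11, not a resolution reported in the study) and shows how the patch size changes the number of tokens and the number of pixels each token covers:

```python
# Hypothetical 132x132 frame, chosen only because it divides evenly by both patch sizes.
for p in (6, 11):
    n_patches = (132 // p) ** 2
    print(f"patch size {p}: {n_patches} patches, {p * p} pixels per patch")
# patch size 6: 484 patches, 36 pixels per patch
# patch size 11: 144 patches, 121 pixels per patch
```

With the larger patch size, each token spans more pixels of the face, while the token sequence the attention layers must process is much shorter.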
