Deep Content Inspection
Web Safety can perform deep scanning of web pages for explicit adult words and phrases. This is a very effective way of blocking because it allows individual parts of any web site to be blocked and does not rely on a huge site categorization database.
Deep content inspection scans all downloaded textual pages (HTML, JSON and plain text) and calculates the weight of each page by summing the weights of all words found. Commonly used words have zero weight; adult-specific phrases have positive weights. The more mature a word is, the more weight it has. If the total weight of a page is more than the maximum configured weight, the page is blocked.
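The weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Web Safety engine: the phrase weights in SAMPLE_WEIGHTS, the function names page_weight and is_blocked, and the choice to count every occurrence of a phrase are all hypothetical.

```python
import re

# Hypothetical phrase weights, for illustration only; the real values live in
# the weighted phrase database described in this section.
SAMPLE_WEIGHTS = {
    "explicit": 30,
    "adult only": 25,
    "mature": 10,
}


def page_weight(text, phrase_weights):
    """Sum the weights of all listed phrases found in the text."""
    # Lowercase the text and reduce it to space-separated alphanumeric words
    # so that punctuation does not hide a phrase.
    words = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    total = 0
    for phrase, weight in phrase_weights.items():
        # Assumption: every occurrence of a phrase counts towards the total.
        hits = re.findall(r"\b" + re.escape(phrase) + r"\b", words)
        total += weight * len(hits)
    return total


def is_blocked(text, phrase_weights, max_weight):
    """A page is blocked when its weight is more than the configured maximum."""
    return page_weight(text, phrase_weights) > max_weight
```

In this sketch, words that are not listed contribute nothing, which corresponds to commonly used words having zero weight. A page is blocked only when its total is strictly greater than the maximum passed in as max_weight.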
The database of adult phrases is in DansGuardian format and is stored in /opt/websafety/var/spool/adult/weighted.conf. Unfortunately, this file cannot be changed from the Admin UI; you must edit it manually if the weight of an individual adult phrase needs further adjustment. Do not forget to click Save and Restart in the Admin UI afterwards.
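Before editing the file by hand, it may help to see how its entries could be read. The sketch below assumes the usual DansGuardian weighted phrase layout, in which each entry is written as <phrase><weight>, several comma-separated <...> parts form a combination, and lines starting with # are comments. The function name parse_weighted_lines and the sample entries are hypothetical and are not taken from the shipped database.

```python
import re


def parse_weighted_lines(lines):
    """Parse DansGuardian-style weighted phrase lines.

    Returns a list of (phrases, weight) tuples, where phrases is a tuple
    of one or more strings and weight is an integer.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = re.findall(r"<([^<>]*)>", line)
        if len(parts) < 2:
            continue  # not a <phrase><weight> entry, e.g. an include directive
        *phrases, weight = parts
        try:
            entries.append((tuple(p.strip() for p in phrases), int(weight)))
        except ValueError:
            continue  # the last <...> part is not a number, ignore the line
    return entries


# Hypothetical sample content, for illustration only.
sample_lines = [
    "# sample weighted phrases",
    "",
    "<explicit><30>",
    "<adult>,<only><25>",
]
entries = parse_weighted_lines(sample_lines)
```

Running a check like this on a copy of the edited file is one way to spot malformed entries before clicking Save and Restart.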
The following screenshot shows a deep content inspection rule configured for a default policy.
By default, deep content inspection is switched on with the maximum text weight set to 80. To keep the amount of memory used during scanning manageable, the deep content inspection engine does not scan texts larger than 2 MB.
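The defaults above can be summarized as a small settings object with a size check. This is a sketch only: the InspectionSettings class, its field names, and the should_scan function are hypothetical, and the size limit is assumed to mean 2 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionSettings:
    """Hypothetical container for the defaults described in this section."""
    enabled: bool = True                   # inspection is switched on by default
    max_weight: int = 80                   # default maximum weight of text
    max_scan_bytes: int = 2 * 1024 * 1024  # assumed meaning of the size limit


def should_scan(body: bytes, settings: InspectionSettings) -> bool:
    """Return True when the text is small enough to be scanned."""
    # Texts exceeding the limit are not scanned at all, which keeps the
    # memory used during scanning bounded.
    return settings.enabled and len(body) <= settings.max_scan_bytes


defaults = InspectionSettings()
```

Under these assumptions, a text of exactly the limit size is still scanned, while anything larger is left unscanned.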
If a certain amount of adult-only material is acceptable, it is recommended to switch on the Trusted Categories rule. This means that if a given domain is known to belong to a non-blocked category, deep content inspection is skipped for that domain. This proves to be a very effective way of decreasing false positives, for example when an article on a well-known news site is blocked because it contains some adult-only words.
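The effect of the Trusted Categories rule can be sketched as a simple check that runs before any scanning. The function name needs_deep_inspection, its parameters, and the category sets are hypothetical; how Web Safety actually maps a domain to its categories is not shown here.

```python
def needs_deep_inspection(domain_categories, trusted_categories, rule_enabled=True):
    """Return False when the domain belongs to at least one trusted category."""
    if not rule_enabled:
        # Without the Trusted Categories rule, every domain is inspected.
        return True
    # Skip inspection as soon as any category of the domain is trusted.
    return not (set(domain_categories) & set(trusted_categories))


# Hypothetical example data, for illustration only.
trusted = {"HEALTH AND FITNESS"}
fitness_site = {"HEALTH AND FITNESS"}
unknown_site = set()
```

With these sample values, pages from the fitness site would bypass deep content inspection, while pages from a domain with no known trusted category would still be scanned.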
The list of trusted categories can be configured in Settings / Trusted Categories, as shown in the following screenshot.
The following table shows the default recommended trusted categories.
|HEALTH AND FITNESS||yes|