Deep Content Inspection

Web Safety can perform deep scanning of web pages for explicit adult words and phrases. This is a very effective way of blocking because it allows individual parts of any web site to be blocked and does not rely on a huge site categorization database.

Deep content inspection scans all downloaded textual pages (HTML, JSON and TEXT) and calculates the weight of each page by summing the weights of all words found. Commonly used words have zero weight; adult-specific phrases have positive weights. The more mature a word is, the more weight it has. If the total weight of a page exceeds the maximum configured weight, the page is blocked.
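The weighting scheme described above can be sketched in a few lines of Python. This is only an illustration, not the product's actual engine: the phrase table, the function names and the choice to count every occurrence of a phrase are assumptions made for the example.

```python
import re

# Hypothetical phrase weights, for illustration only.
# The real database is far larger and lives in weighted.conf.
PHRASE_WEIGHTS = {
    "explicit": 30,
    "adult only": 40,
    "mature": 15,
}


def page_weight(text, weights=PHRASE_WEIGHTS):
    """Sum the weights of all listed phrases found in the text.

    Assumption: every occurrence counts, and phrases match on word
    boundaries. Words that are not listed contribute zero.
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    total = 0
    for phrase, weight in weights.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += weight * len(re.findall(pattern, normalized))
    return total


def is_blocked(text, max_weight=80, weights=PHRASE_WEIGHTS):
    """A page is blocked only when its weight exceeds the maximum."""
    return page_weight(text, weights) > max_weight
```

With this table, a page containing "explicit" and "adult only" once each weighs 70 and passes a maximum of 80, while adding a single "mature" raises the weight to 85 and the page is blocked.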

The database of adult phrases is in DansGuardian format and is stored in /opt/websafety/var/spool/adult/weighted.conf. Unfortunately, it is not possible to change this file from Admin UI; if the weight of an adult phrase needs further adjustment, you must edit the file manually. Please do not forget to click Save and Restart in Admin UI after that.
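Before editing the file by hand it can help to see how such a list might be read. The sketch below assumes the commonly seen DansGuardian weighted phrase syntax, where each line holds one or more phrases in angle brackets followed by a weight in angle brackets, and where `#` starts a comment. Check the actual contents of weighted.conf before relying on this; the function names are hypothetical.

```python
import re

# Matches one <...> group; a line is a run of such groups.
GROUP = re.compile(r"<([^<>]*)>")


def parse_weighted_line(line):
    """Parse one line into (phrases, weight), or None if not an entry.

    Assumed syntax: <phrase><weight> or <phrase1>,<phrase2><weight>.
    Phrases are kept as written (only lowercased), including any
    spaces inside the brackets.
    """
    line = line.split("#", 1)[0].strip()  # drop comments (simplification)
    if not line:
        return None
    groups = GROUP.findall(line)
    if len(groups) < 2:
        return None
    *phrases, weight = groups
    try:
        return [p.lower() for p in phrases], int(weight)
    except ValueError:
        return None


def parse_weighted_lines(lines):
    """Parse an iterable of lines, skipping anything that is not an entry."""
    entries = []
    for line in lines:
        parsed = parse_weighted_line(line)
        if parsed is not None:
            entries.append(parsed)
    return entries
```

For example, `parse_weighted_lines(["# sample", "<foo><10>", "<foo>,<bar><25>"])` yields two entries, the second holding both phrases with a weight of 25.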

The following screenshot shows a deep content inspection rule configured for the default policy.

../../../../_images/rule_adult2.png

By default, deep content inspection is switched on with the maximum weight of text set to 80. To keep the amount of memory used during scanning manageable, the deep content inspection engine does not scan texts exceeding 2 MB.
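The defaults above amount to a simple eligibility check before any scanning starts. The following sketch is an assumption-laden illustration: the MIME types chosen for "HTML, JSON and TEXT", the reading of the size limit as 2 * 1024 * 1024 bytes, and all names are hypothetical rather than taken from the product.

```python
# Assumed MIME types for the textual page kinds that are scanned.
TEXTUAL_TYPES = {"text/html", "application/json", "text/plain"}

# Assumption: the 2 MB limit is interpreted as 2 * 1024 * 1024 bytes.
MAX_SCAN_BYTES = 2 * 1024 * 1024

# Default maximum page weight, as stated in the text.
DEFAULT_MAX_WEIGHT = 80


def should_scan(content_type, body_size, enabled=True):
    """Return True if a response qualifies for deep content inspection."""
    if not enabled:
        return False
    # Strip parameters such as "; charset=utf-8" from the header value.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in TEXTUAL_TYPES and body_size <= MAX_SCAN_BYTES
```

A response that fails this check is simply not scanned, so an oversized text is passed on without a weight being computed for it.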

It is also possible to scan HTML links (anchors) within the text, as well as embedded JavaScript and CSS contents. By default these types of scans are off, but they may be switched on in very strict non-adult environments.
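One way to picture these optional scans is to split a page into visible text, link targets, script bodies and style bodies, and then feed only the selected parts to the weighting step. The sketch below uses Python's standard `html.parser`; treating "links" as the `href` values of anchors is an assumption, and the class and option names are hypothetical.

```python
from html.parser import HTMLParser


class PartCollector(HTMLParser):
    """Collect visible text, anchor hrefs, script and style contents."""

    def __init__(self):
        super().__init__()
        self.parts = {"text": [], "links": [], "scripts": [], "css": []}
        self._current = "text"

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.parts["links"].append(href)
        elif tag == "script":
            self._current = "scripts"
        elif tag == "style":
            self._current = "css"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._current = "text"

    def handle_data(self, data):
        if data.strip():
            self.parts[self._current].append(data.strip())


def text_to_scan(html, scan_links=False, scan_scripts=False, scan_css=False):
    """Return the text that would be weighed; extra scans are off by default."""
    collector = PartCollector()
    collector.feed(html)
    collector.close()
    selected = list(collector.parts["text"])
    if scan_links:
        selected += collector.parts["links"]
    if scan_scripts:
        selected += collector.parts["scripts"]
    if scan_css:
        selected += collector.parts["css"]
    return " ".join(selected)
```

Keeping the extra parts off by default mirrors the behaviour described above: script and style bodies rarely carry readable content, so scanning them mostly adds cost and false positives outside very strict environments.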

Trusted Categories

If a certain amount of adult-only material is acceptable, it is recommended to switch on the Trusted Categories rule. With this rule on, if a given domain is known to be part of a non-blocked category, deep content inspection is skipped for this domain. This proves to be a very effective way of decreasing false positives, for example when an article on a well-known news site is blocked because it contains some adult-only words.

../../../../_images/trusted_categories2.png
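The effect of the Trusted Categories rule can be sketched as a lookup that runs before deep content inspection. The domain-to-category table, the sample domains and the function name below are hypothetical; the real categorization database and rule evaluation are internal to the product.

```python
# A few categories treated as trusted for the purpose of this example.
TRUSTED_CATEGORIES = {"NEWS MEDIA", "GOVERNMENT", "EDUCATIONAL INSTITUTIONS"}

# Hypothetical categorization lookup: domain -> category (None if unknown).
DOMAIN_CATEGORIES = {
    "news.example.com": "NEWS MEDIA",
    "games.example.net": "GAMES",
    "unknown.example.org": None,
}


def skip_deep_inspection(domain, rule_enabled=True,
                         trusted=TRUSTED_CATEGORIES,
                         categories=DOMAIN_CATEGORIES):
    """Return True if deep content inspection is skipped for the domain."""
    if not rule_enabled:
        return False
    category = categories.get(domain)
    # Uncategorized domains are never trusted and are always inspected.
    return category is not None and category in trusted
```

In this sketch an article on `news.example.com` is never weighed at all, which is how the rule removes false positives on well-known sites, while uncategorized domains are still inspected.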

The list of trusted categories can be configured in Settings / Trusted Categories, as shown in the following screenshot.

../../../../_images/trusted_categories_settings2.png

The following table shows the default recommended trusted categories.

Category                    Trusted
ADVERTISING                 yes
AUCTIONS                    yes
AUTOMOTIVE                  yes
BUSINESS SERVICES           yes
ECOMMERCE SHOPPING          yes
EDUCATIONAL INSTITUTIONS    yes
FINANCIAL INSTITUTIONS      yes
GOVERNMENT                  yes
HEALTH AND FITNESS          yes
JOBS EMPLOYMENT             yes
MOVIES                      yes
MUSIC                       yes
NEWS MEDIA                  yes
NON PROFITS                 yes
POLITICS                    yes
RADIO                       yes
RELIGIOUS                   yes
RESEARCH REFERENCE          yes
SEXUALITY                   yes
SOCIAL NETWORKING           yes
SOFTWARE TECHNOLOGY         yes
SPORTS                      yes
TELEVISION                  yes
TRAVEL                      yes
WEBMAIL                     yes