Deep Content Inspection

Web Safety can deep-scan web pages for explicit adult words and phrases. This is a very effective way of blocking because it can block individual parts of any web site and does not rely on a huge site categorization database.

Deep content inspection scans all downloaded textual pages (HTML, JSON and plain text) and calculates the weight of each page by summing the weights of all words and phrases found. Commonly used words have zero weight; adult-specific phrases have positive weights. The more mature a word is, the more weight it has. If the total weight of a page exceeds the configured maximum weight, the page is blocked.
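The scoring rule described above can be illustrated with a minimal Python sketch. It is not the actual Web Safety engine: the phrase table, the function names and the choice to count every occurrence of a phrase are assumptions made for illustration only.

```python
import re

# Hypothetical phrase weights, for illustration only.
# The real phrase database is much larger and lives on disk.
PHRASE_WEIGHTS = {
    "explicit phrase": 40,
    "mature word": 25,
    "mild word": 10,
}


def page_weight(text, weights=PHRASE_WEIGHTS):
    """Sum the weights of all known phrases found in the text.

    Assumption: every occurrence of a phrase contributes its weight.
    """
    # Lowercase and collapse whitespace so matching is case-insensitive
    # and not affected by line breaks inside a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    total = 0
    for phrase, weight in weights.items():
        total += normalized.count(phrase) * weight
    return total


def is_blocked(text, max_weight=80, weights=PHRASE_WEIGHTS):
    """A page is blocked only when its weight is more than the maximum."""
    return page_weight(text, weights) > max_weight
```

With this table, a page containing "explicit phrase" twice and "mild word" once scores 90 and would be blocked at a maximum weight of 80, while a page scoring exactly 80 would still pass.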

The database of adult phrases is in DansGuardian format and is stored in /opt/websafety/var/spool/adult/weighted.conf. Unfortunately, this file cannot be changed from the Admin UI; if the weights of individual adult phrases need further adjustment, you must edit the file manually. Do not forget to click Save and Restart in the Admin UI afterwards.
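When editing the file by hand, it can help to check each line before restarting. The following sketch assumes the commonly seen DansGuardian weighted phrase syntax, in which an entry looks like `<phrase><weight>` or `<phrase one>,<phrase two><weight>` and lines starting with `#` are comments. The function name is hypothetical, and the exact syntax accepted by Web Safety should be verified against the shipped weighted.conf.

```python
import re

# Assumed entry syntax: one or more <phrase> groups separated by commas,
# followed by a final <weight> group holding an integer.
ENTRY = re.compile(r"^((?:<[^<>]+>,)*<[^<>]+>)<(-?\d+)>$")


def parse_weighted_line(line):
    """Return (phrases, weight) for an entry line, or None otherwise.

    Blank lines, comments and anything that does not match the assumed
    entry syntax yield None so the caller can skip or report them.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = ENTRY.match(line)
    if match is None:
        return None
    # Keep the phrase text exactly as written between the angle brackets.
    phrases = tuple(re.findall(r"<([^<>]+)>", match.group(1)))
    return phrases, int(match.group(2))
```

For example, `parse_weighted_line("<one>,<two><30>")` returns `(("one", "two"), 30)`, while a comment line returns `None`.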

The following screenshot shows the deep content inspection rule configured for the default policy.

../../../../_images/rule_adult2.png

By default, deep content inspection is switched on with the maximum text weight set to 80. To keep the amount of memory used during scanning manageable, the deep content inspection engine does not scan texts larger than 2 MB.
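The eligibility check implied by the text types and the size limit might look like the following sketch. The MIME type names, the interpretation of the limit as 2 * 1024 * 1024 bytes and the function name are assumptions, not taken from the product.

```python
# Assumed byte value of the documented size limit.
MAX_SCAN_BYTES = 2 * 1024 * 1024

# Assumed MIME types for the HTML, JSON and plain text pages mentioned above.
SCANNABLE_TYPES = {"text/html", "application/json", "text/plain"}


def is_scannable(content_type, body):
    """Return True if a response body is a candidate for deep inspection."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in SCANNABLE_TYPES and len(body) <= MAX_SCAN_BYTES
```

A response that fails this check is simply not scanned; whether it is allowed or blocked is then decided by the other rules of the policy.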

It is also possible to scan HTML links (anchors) within the text, as well as embedded JavaScript and CSS content. These types of scans are off by default, but they may be switched on in very strict non-adult environments.

Trusted Categories

If a certain amount of adult-only material is acceptable, it is recommended to switch on the Trusted Categories rule. With this rule enabled, deep content inspection is skipped for any domain known to belong to a non-blocked category. This is a very effective way of reducing false positives, for example when an article on a well-known news site is blocked because it contains some adult-only words.

../../../../_images/trusted_categories2.png
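The effect of the Trusted Categories rule can be sketched as a simple decision function. This is an illustration under assumptions: the function name and the set-based inputs are hypothetical, and a domain is treated as trusted when at least one of its categories is both in the trusted list and not blocked by the policy.

```python
def should_inspect(domain_categories, trusted_categories, blocked_categories):
    """Decide whether deep content inspection runs for a domain.

    Inspection is skipped when the domain belongs to a category that is
    trusted and not blocked; otherwise the page is scanned as usual.
    """
    for category in domain_categories:
        if category in trusted_categories and category not in blocked_categories:
            return False
    return True


# Hypothetical example: a news site is trusted, an uncategorized site is not.
trusted = {"NEWS MEDIA", "GOVERNMENT"}
blocked = {"GAMBLING"}
news_is_scanned = should_inspect({"NEWS MEDIA"}, trusted, blocked)
unknown_is_scanned = should_inspect(set(), trusted, blocked)
```

In this example the news site skips deep inspection, while the uncategorized site is still scanned and scored.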

The list of trusted categories can be configured in Settings / Trusted Categories, as shown in the following screenshot.

../../../../_images/trusted_categories_settings2.png

The following table shows the default recommended trusted categories.

Category                  Trusted
------------------------  -------
ADVERTISING               yes
AUCTIONS                  yes
AUTOMOTIVE                yes
BUSINESS SERVICES         yes
ECOMMERCE SHOPPING        yes
EDUCATIONAL INSTITUTIONS  yes
FINANCIAL INSTITUTIONS    yes
GOVERNMENT                yes
HEALTH AND FITNESS        yes
JOBS EMPLOYMENT           yes
MOVIES                    yes
MUSIC                     yes
NEWS MEDIA                yes
NON PROFITS               yes
POLITICS                  yes
RADIO                     yes
RELIGIOUS                 yes
RESEARCH REFERENCE        yes
SEXUALITY                 yes
SOCIAL NETWORKING         yes
SOFTWARE TECHNOLOGY       yes
SPORTS                    yes
TELEVISION                yes
TRAVEL                    yes
WEBMAIL                   yes