How should we learn the PHP language? Personally, I believe we need to consolidate theoretical knowledge through practice in order to reach our learning goals. We also need to keep that knowledge up to date: every language evolves quickly, and without continuous learning what we know soon becomes obsolete. For example, many of us have built content-collection CMSs. The title and body are relatively easy to handle, but in most cases the keywords are very hard to extract. Automatic keyword extraction has therefore become a "traditional problem" for today's PHP CMSs.
A PHP CMS can extract keywords automatically, and the process can be divided into the following three steps:
1. The PHP CMS uses a word segmentation algorithm to segment the title and the content separately, extracting candidate keywords and their frequencies
For segmenting the content, the two main approaches at present are the Chinese Academy of Sciences' ICTCLAS and hidden Markov models. Both are demanding, have a certain barrier to entry, and only support C++/Java. For PHP, two options are currently recommended: PSCWS and HTTPCWS.
SCWS version 1.0.0 was officially released on 2008-03-08, and the latest version is now 1.0.4. PSCWS is its PHP version.
HTTPCWS was developed by Zhang Yan and was previously called PHPCWS. PHPCWS first calls the API of the "ICTCLAS 3.0 shared version of the Chinese word segmentation algorithm" for initial segmentation, then applies its own "reverse maximum matching algorithm" to split and merge words, and adds punctuation filtering to produce the final segmentation result. It currently supports only Linux/Unix systems.
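The reverse maximum matching idea mentioned above, together with the frequency count from step 1, can be sketched in a few lines of PHP. This is only a minimal illustration and not the PHPCWS or PSCWS implementation: the function names `utf8_chars`, `reverse_max_match` and `word_frequencies`, the tiny dictionary, and the maximum word length of 4 are all assumptions made for the example.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function utf8_chars(string $text): array
{
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Reverse maximum matching: scan from the end of the text and, at each
// position, take the longest dictionary word that ends there.
// $dictionary is a set stored as array keys (word => true).
function reverse_max_match(string $text, array $dictionary, int $maxLen = 4): array
{
    $chars = utf8_chars($text);
    $words = [];
    $end = count($chars);
    while ($end > 0) {
        $matched = false;
        // Try the longest candidate first, down to two characters.
        for ($len = min($maxLen, $end); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $end - $len, $len));
            if (isset($dictionary[$candidate])) {
                $words[] = $candidate;
                $end -= $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // No dictionary word ends here: emit a single character.
            $words[] = $chars[$end - 1];
            $end--;
        }
    }
    // Words were collected back to front, so restore reading order.
    return array_reverse($words);
}

// Count how often each segmented word occurs, most frequent first.
function word_frequencies(array $words): array
{
    $freq = array_count_values($words);
    arsort($freq);
    return $freq;
}

// Example dictionary (made up for this sketch).
$dictionary = ['研究' => true, '研究生' => true, '生命' => true, '起源' => true];
$words = reverse_max_match('研究生命起源', $dictionary);
// $words is ['研究', '生命', '起源']
```

Scanning from the end is what lets this sketch split the sample as 研究 / 生命 / 起源; a forward scan with the same dictionary would greedily take 研究生 first. A real segmenter would use a far larger dictionary and extra rules for punctuation and unknown words.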
2. The PHP CMS compares the extracted results against an existing vocabulary and keeps the keywords that best fit the rules
This depends mainly on the thesaurus. We can define our own vocabulary or use an existing, mature one.
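As a rough sketch of this comparison, the candidates from step 1 can be intersected with the vocabulary. The function name `filter_by_vocabulary` and the array formats (word => frequency for candidates, word => true for the vocabulary) are assumptions for illustration only; a real CMS may store and match its thesaurus quite differently.

```php
<?php
// Keep only candidates that appear in the vocabulary, preserving their
// frequencies and ordering the result from most to least frequent.
// $candidates: word => frequency; $vocabulary: word => true (assumed formats).
function filter_by_vocabulary(array $candidates, array $vocabulary): array
{
    $kept = array_intersect_key($candidates, $vocabulary);
    arsort($kept);
    return $kept;
}

// Made-up sample data.
$candidates = ['PHP' => 5, 'we' => 9, 'keyword' => 3, 'CMS' => 4];
$vocabulary = ['PHP' => true, 'CMS' => true, 'keyword' => true];
$keywords = filter_by_vocabulary($candidates, $vocabulary);
// 'we' is dropped because it is not in the vocabulary.
```

Storing the vocabulary as array keys keeps each lookup cheap, which matters once the thesaurus grows to tens of thousands of entries.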
3. The PHP CMS compares the two sets of keywords, from the title and from the content, to obtain the final keywords for the current content
This stage has to be analyzed case by case. Current PHP CMSs each have their own keyword extraction system. The most widely circulated one on the web is the DEDECMS word segmentation source code. I also tested it in my own POPCMS, and the results were fairly good. However, meaningless words such as "we" are extracted as keywords far too often, and sometimes even the HTML space is taken as a keyword, which leaves something to be desired. Still, as an auxiliary feature, it is already quite good.
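One possible way to combine the two keyword sets while avoiding the problems described above is sketched below. It is not how DEDECMS or any particular CMS works: the function names `clean_html` and `merge_keywords`, the stop-word list, the title weight of 3, and the assumption that the "HTML space" is an entity such as `&nbsp;` are all choices made for this example.

```php
<?php
// Remove markup and decode HTML entities before segmentation, so that
// entity fragments cannot surface as keywords.
function clean_html(string $html): string
{
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Turn non-breaking spaces (U+00A0) and whitespace runs into one space.
    $text = preg_replace('/[\x{00A0}\s]+/u', ' ', $text) ?? $text;
    return trim($text);
}

// Merge title and content frequencies into one ranked keyword list.
// Title hits count $titleWeight times (an assumed heuristic), stop words
// are removed, and the top $limit words are returned.
function merge_keywords(
    array $titleFreq,
    array $contentFreq,
    array $stopWords,
    int $titleWeight = 3,
    int $limit = 5
): array {
    $scores = $contentFreq;
    foreach ($titleFreq as $word => $count) {
        $scores[$word] = ($scores[$word] ?? 0) + $count * $titleWeight;
    }
    // $stopWords is a set stored as array keys (word => true).
    $scores = array_diff_key($scores, $stopWords);
    arsort($scores);
    return array_slice(array_keys($scores), 0, $limit);
}

// Made-up sample data.
$titleFreq   = ['PHP' => 1, 'keywords' => 1];
$contentFreq = ['we' => 9, 'PHP' => 4, 'CMS' => 3, 'keywords' => 2];
$stopWords   = ['we' => true];
$top = merge_keywords($titleFreq, $contentFreq, $stopWords, 3, 2);
// $top is ['PHP', 'keywords']
```

Weighting title words more heavily reflects the idea that a word appearing in both the title and the body is more likely to describe the article, but the right weight and stop-word list depend on the site and need tuning.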
Other PHP CMSs, such as DISCUZ, also have very powerful automatic keyword extraction.