CN1720524A - Knowledge system method and apparatus - Google Patents

Knowledge system method and apparatus

Info

Publication number
CN1720524A
Authority
CN
China
Prior art keywords
word
inquiry
word strings
language
strings
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 03825729
Other languages
Chinese (zh)
Other versions
CN100380373C (en
Inventor
埃里·阿博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/281,997 external-priority patent/US7711547B2/en
Application filed by Individual filed Critical Individual
Publication of CN1720524A publication Critical patent/CN1720524A/en
Application granted granted Critical
Publication of CN100380373C publication Critical patent/CN100380373C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

The present invention provides a method and apparatus for automating the acquisition, reconstruction and generation of knowledge bases of associated ideas (figure 1) and using such knowledge bases in many applications including machine translation of human languages, search and retrieval of unstructured text, or other data, based on concept search, voice recognition, data compression and artificial intelligence systems.

Description

Knowledge system method and device
Related application
This application is a continuation of U.S. Patent Application No. 10/281,997, filed October 29, 2002, which is a continuation of U.S. Patent Application No. 10/157,894, filed May 31, 2002, which is in turn a continuation of U.S. Patent Application No. 10/024,473, filed December 21, 2001; the application also claims the benefit of U.S. Provisional Application No. 60/276,107, filed March 16, 2001, and U.S. Provisional Application No. 60/299,472, filed June 21, 2001. This application is also a continuation of U.S. Patent Application No. 10/146,441, filed May 16, 2002, which is a continuation of U.S. Patent Application No. 10/116,047, filed April 5, 2002, which is in turn a continuation of U.S. Patent Application No. 10/024,473, filed December 21, 2001. This application is further a continuation of U.S. Patent Application No. 10/194,322, filed July 15, 2002, which is a continuation of U.S. Patent Application No. 10/024,473, filed December 21, 2001. All of the above applications are incorporated herein by reference.
Table of contents
Background of the invention
I. Introduction
II. State of the art in machine translation
III. State of the art in statistical natural language processing for semantic acquisition
IV. State of the art in artificial intelligence
Summary of the invention
I. Introduction
II. Word strings as units of meaning
III. Methods and systems for language translation and for natural language understanding for text mining, natural language interfaces, and other applications
A. Overview
B. Methods and systems
IV. Prior art
Detailed description
I. Introduction
II. Cross-state knowledge base acquisition methods and apparatus
A. Acquisition using parallel text
B. Acquisition using multi-state text
C. Acquisition using target document flooding
1. Parallel text flooding
2. Target language flooding
D. Acquisition using multi-method difference
III. Cross-state knowledge reconstruction methods and apparatus
A. Document translation using an association database and the dual-anchor overlap technique
B. Knowledge acquisition using dual-anchor overlap
C. Other related applications
IV. Single-state frequency association database creation and frequency analysis methods and apparatus
A. Introduction
B. Creation of the frequency association database (FAD)
1. Overview
2. FAD using a recurring word string index
C. Common frequency analysis: knowledge base acquisition and generation by association methods and apparatus
1. Independent common frequency analysis (ICFA)
2. Related common frequency analysis (RCFA)
3. Secondary frequency analysis (RCFA or ICFA)
V. Single-state knowledge acquisition using CFA
A. Knowledge acquisition list generation using ICFA
B. Knowledge acquisition list generation using RCFA
C. Knowledge acquisition list ranking and filtering
1. Association using direct correlation and semantic triangulation
2. Association using query and signature overlap
3. Association using synonym flooding
4. Word string cradle or signature pattern ranking
VI. Single-state knowledge lists for cross-state knowledge acquisition and reconstruction (translation)
VII. Single-state knowledge reconstruction
VIII. Scope of CFA applications
A. Overview
B. Data compression
IX. Single-state CFA for intelligent applications
Technical field
The present invention relates to knowledge systems and, more particularly, to the application of knowledge systems to machine translation, natural language processing, and artificial intelligence systems.
Background of the invention
I. Introduction
For decades, researchers in every field of computer science have tried to develop methods that enable machines to understand, in a scalable and automated way, the natural languages that humans speak and write (e.g., English, Chinese, Arabic). Although computers can be programmed to perform specific tasks, the current state of the art provides no automatic, general-purpose method or system for understanding the meaning of words and phrases in context.
Many applications, including machine translation (MT) of human languages, speech recognition, search, retrieval and text mining systems, and artificial intelligence applications, require automated natural language understanding to perform optimally. The significant benefits of such widely supported applications have led universities, governments, and businesses to invest decades of time and billions of dollars in the search for methods that enable computers to process and understand written or spoken natural language. Because great effort in these fields has produced no breakthrough, many in the scientific community have begun to doubt whether true machine understanding of natural language is possible at all. Even many proponents who firmly believe that computers will one day achieve human-level understanding think that day is still far off.
II. State of the art in machine translation
To date, most language translation has been performed by skilled human translators, at very high cost. Automating the translation process would bring significant economic benefits, including substantially lower translation costs and support for new translation applications such as real-time cross-language text or voice communication and time-sensitive multilingual daily news.
Devices and methods for automatically translating documents from one language to another are known in the prior art. However, these devices and methods usually cannot translate sentences accurately from one language to another, so the output requires extensive human editing to correct its many errors before it can be used for most applications. Current state-of-the-art systems can accurately translate 60% to 80% of the words in translations between Romance languages, but the percentage of sentences they translate at publishable quality across broad domains is usually below 40%. Existing machine translation systems are even less accurate for non-Romance languages. The only exceptions are machine translation systems customized for narrow specialist domains, which do not handle content outside those domains. Moreover, most commercial machine translation systems require decades of human development effort for each direction of each language pair.
Accurate machine translation is far more complex than providing devices and methods that translate a document word by word. Because the meaning of each word depends heavily on its context, simple word-by-word translation of a sentence results in incorrect word choices, incorrect word order, and incoherent grammatical units.
To overcome these shortcomings, existing translation devices are designed to select the translation of a word within the context of its sentence based on a combination or set of lexical, morphological, syntactic, and semantic rules. These systems, known as "rule-based" machine translation (rule-based MT) systems, have been under development for more than 40 years. They are flawed because the rules have too many exceptions, so they cannot deliver consistently accurate translations. The best-known company offering machine translation primarily by rule-based methods is Systran, which began developing its MT engine in the 1960s. Creating rule sets is laborious and the result is always incomplete, because it is extremely difficult, if possible at all, for human developers to capture every nuance of a language in a finite set of rules.
In addition to rule-based MT, a newer approach known as "example-based" machine translation (EBMT) has been developed over the last two decades. EBMT uses sentences (and possibly parts of sentences) stored in two different languages in a cross-language database. When a source-language translation query matches a sentence in the database, the database returns that sentence's translation in the target language, providing an accurate translation. If part of the query matches part of a sentence in the database, these devices attempt to determine exactly which part of the target sentence (the one mapped to the source-language sentence) is the translation of the query. Here, "source" refers to the language or state of the content to be translated into another language or state, and "target" refers to the language or state into which the source is to be translated.
Prior-art EBMT systems cannot provide accurate translation across broad domains, because a database that would need to hold an infinite set of cross-language sentences will always be "incomplete." Moreover, because EBMT systems cannot reliably translate partial matches and sometimes combine correctly translated parts incorrectly, their accuracy is similar to that of rule-based engines.
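The exact-match behavior described for EBMT, and the reason a sentence database is always "incomplete," can be illustrated with a minimal sketch. The database contents, the English-French pairing, and the function name `ebmt_translate` are hypothetical and are not taken from any particular EBMT system.

```python
from typing import Optional

# Hypothetical toy cross-language sentence database (English -> French).
SENTENCE_DB = {
    "good morning": "bonjour",
    "thank you very much": "merci beaucoup",
}

def ebmt_translate(source_sentence: str) -> Optional[str]:
    """Return the stored translation on an exact sentence match, else None."""
    # Normalize case and whitespace so trivially different queries still match.
    key = " ".join(source_sentence.lower().split())
    return SENTENCE_DB.get(key)

print(ebmt_translate("Thank you very much"))  # stored sentence: translation found
print(ebmt_translate("Thank you so much"))    # unseen sentence: no translation
```

Any sentence that is not stored verbatim falls through to the partial-match problem described above, which is where the accuracy of such systems degrades.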
Another machine translation approach, usually used on its own or in conjunction with EBMT, is statistical machine translation (SMT). SMT systems attempt to automate translation using a corpus of paired translated documents combined with a corpus of documents in the target language only. Compared with rule-based MT, both EBMT and SMT significantly reduce the time needed to develop a translation engine for a given language pair. The accuracy of SMT systems, however, is similar to that of rule-based MT and EBMT systems and is therefore insufficient to produce document translations across broad domains.
SMT systems use what the prior art calls "n-gram models" and translate on the basis of Shannon's "noisy channel model." These methods assume that translation is always imperfect; by design, SMT methods produce translations according to the probability of a correct translation estimated from a training corpus. When translating each word, they make a "best guess" based on two, or at most three, other adjacent words in the source and target languages. As the cross-language and target-language training corpora grow, the marginal gains from these methods diminish, and only slight improvements have been made in the past few years. For example, the developers of one of the highest-quality SMT systems, built at the University of Southern California, recently published test results for their system over the past several years. After training on a domain-specific corpus (Canadian parliamentary proceedings), their system correctly translated 40% of the sentences in the text (AMTA 2002 proceedings, October 2002).
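For reference, the noisy channel formulation is commonly written as follows. The notation is an assumption introduced here and does not appear in the text: f is the source sentence, e a candidate target sentence of m words e_1 ... e_m, and the language model P(e) is shown with a trigram approximation, which conditions each word on its two predecessors.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e),
\qquad
P(e) \;\approx\; \prod_{i=1}^{m} P(e_i \mid e_{i-2},\, e_{i-1})
```

The short conditioning window in the second expression corresponds to the "best guess" from two or three adjacent words described above.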
Some translation devices combine rule-based MT, SMT, and/or EBMT engines (known as multi-engine machine translation, or MEMT). Although these hybrid approaches may be more accurate than any single system, their results are still not good enough to use without substantial human intervention and editing.
III. State of the art in statistical natural language processing for semantic acquisition
The field of statistical natural language processing (NLP) covers research and development in automated machine learning from text for a variety of applications. One application of NLP is SMT for machine translation, as described above. Although various NLP methods attempt to extract meaning from natural language, the current state of the art is still far from a real solution, as the authoritative textbook on the subject explains: "The holy grail of lexical acquisition is the acquisition of meaning. There are many tasks (like text understanding and information retrieval) for which Statistical NLP could make a big difference if we could automatically acquire meaning. Unfortunately, how to represent meaning in a way that can be operationally used by an automatic system is a largely unsolved problem." (Manning and Schütze, "Foundations of Statistical Natural Language Processing", 5th printing, 2002, p. 312).
Organizations of all kinds badly need to manage their knowledge better and to obtain the knowledge held in unstructured text such as word-processing documents, PDF files, and e-mail messages. Although information previously stored in databases can be searched and retrieved efficiently, a process known in the prior art as data mining, it is still not possible with current state-of-the-art systems to mine unstructured text (which represents 80% or more of the world's data) by searching broadly for concepts and ideas. Boolean logic and other keyword search methods use the words contained in the user's query to find information, but most concepts and ideas can be expressed in numerous other ways, many of which do not contain the particular keywords or other search terms exactly or even approximately. As a result, a keyword search misses many relevant documents that a "concept-based" search (one not limited to the words supplied by the user) would identify.
For example, if the word string "terms and conditions" is submitted in quotation marks (indicating an exact string) as part of a keyword search, the system will find references to "terms and conditions" but will not identify other words and word strings (a word string is two or more adjacent words in a particular order) or abbreviations or expressions that convey the same concept, such as "conditions of use", "restrictions", "tos", "terms of service", and "rules and regulations", even though the user may be interested in them. A system's ability to add approximate semantic equivalents of the search query when looking for relevant information would enhance the quality and efficiency of search in many ways. Moreover, no dictionary contains a complete phrase-level thesaurus of synonyms or near-synonyms. The reason is that there are too many two-word and three-word entries, not to mention all entries longer than three words, to compile an alphabetical list manually. Existing methods that use patterns in text to generate thesauri automatically have had limited success in broad semantic acquisition for natural language. State-of-the-art methods use patterns of words appearing in text to extract concepts; these include similarity measures such as vector space models using various metrics. Some of these methods attempt to find synonyms or related words by representing words as points in a context space.
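The gap between exact-string keyword search and a search that also tries semantically equivalent word strings can be sketched as follows. The documents, the hand-written `EQUIVALENTS` table, and both function names are hypothetical; the table merely stands in for the phrase-level thesaurus that, as noted above, no dictionary provides.

```python
# Hypothetical document collection.
DOCS = {
    1: "Please read the terms and conditions before signing up.",
    2: "Our terms of service changed last month.",
    3: "See the rules and regulations posted at the entrance.",
    4: "The weather was pleasant all week.",
}

# Hypothetical, hand-written list of equivalent word strings for one query.
EQUIVALENTS = {
    "terms and conditions": [
        "conditions of use",
        "terms of service",
        "rules and regulations",
    ],
}

def keyword_search(query, docs):
    """Return ids of documents that contain the exact query string."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())

def concept_search(query, docs, equivalents):
    """Search for the query plus every word string listed as equivalent to it."""
    phrases = [query] + equivalents.get(query.lower(), [])
    hits = set()
    for phrase in phrases:
        hits.update(keyword_search(phrase, docs))
    return sorted(hits)

print(keyword_search("terms and conditions", DOCS))
print(concept_search("terms and conditions", DOCS, EQUIVALENTS))
```

In this toy collection the exact-string search reaches only the first document, while the expanded search also reaches the two documents that express the same concept in other words.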
Some methods consider words at varying distances from the query and focus on the proximity and co-occurrence of words relative to the query. These include n-gram-based methods (Martin, Ney: Algorithms for bigram and trigram word clustering, Speech Communication 24, pp. 19-37, 1998; Brown et al.: Class-based n-gram models of natural language, Computational Linguistics 18(4), pp. 467-479, 1992) and window-based methods (Brown et al.). Other related work in this field includes Finch and Chater (1992, Bootstrapping syntactic categories using statistical methods); Schütze and Pedersen (1997, A cooccurrence-based thesaurus and two applications to information retrieval); and many others. Although contextual information can yield some results, the breadth and accuracy of the results obtained with these methods remain limited, so their practical use in commercial products for search and retrieval, content management, and information management is also very limited.
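A window-based co-occurrence count of the general kind these methods rely on can be sketched as below. This is a generic illustration, not the algorithm of any of the cited works; the function name, the window size, and the sample text are assumptions.

```python
from collections import Counter

def window_cooccurrence(tokens, query, window=2):
    """Count the words that appear within `window` positions of each query occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != query:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the query occurrence itself
                counts[tokens[j]] += 1
    return counts

text = "the rock band played loud rock music while the band toured"
counts = window_cooccurrence(text.split(), "rock", window=2)
print(counts.most_common())
```

Such counts operate on single words inside a fixed window, which is one reason the breadth and accuracy of the results stay limited.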
Many advanced search and text mining applications involve manually specified linguistic rules, semantic knowledge, ontologies, and taxonomies. Such methods and systems can supply semantic cues for purposes such as meta-tagging data by category. In addition, some systems incorporate various supervised and unsupervised statistical learning and extraction methods, including Bayesian methods that estimate the relevance probabilities added to search and/or classification analysis. These systems cannot mine text effectively because the methods do not produce consistently accurate (that is, relevant) search results. Furthermore, because meta-tagging involves classifying information in advance so that it can be used as part of an enhanced search, the classification decisions require attaching static labels to multidimensional concepts whose classification may evolve or change over time. These systems are designed to mine information and to find other words and phrases equivalent in meaning to the query terms.
A system's ability to identify alternative, semantically equivalent expressions of a word or word string within a language has many applications. Besides text mining, the ability to generate synonymous expressions for any expression is also a very effective component of any corpus-based machine translation system. In addition, the ability to recognize expressions of equivalent meaning is a form of machine understanding of natural language, and it can provide a foundation for artificial intelligence (AI) applications.
IV. State of the art in artificial intelligence
The most ambitious goal of machine understanding of human language is its use in intelligent systems that achieve full human-level intelligence, that is, systems that reason and exhibit the common sense that humans possess. Computing in this area is called "strong AI," and its ultimate goal is to enable computers to understand natural language, interact with people or other computers in natural language, learn concepts, form understanding, and perform cognitive tasks. Whereas a machine translation system only needs to understand information at the level required to translate it into another form, a strong AI application must be able not only to understand new information and its other forms and states, but also to process that information in ways that trigger the system to learn, answer questions, and perform other cognitive tasks, such as drawing conclusions from premises, discovering relationships through observation, and setting sub-goals to seek further knowledge in anticipation of future needs.
The mathematician Alan Turing proposed the Turing test in 1950 as a conceptual design for checking whether a machine has achieved human-level intelligence. Although a machine that passes the Turing test would not necessarily fulfill every goal of strong AI, even the most optimistic proponents of strong AI believe that no computer will convincingly pass the Turing test within the next few decades.
AI methods known in the prior art vary in approach. Compared with the goals of strong AI, most commercial AI applications handle only narrowly scoped tasks. These applications are sometimes called "weak AI"; at most they produce "idiot savant" systems that can accomplish only a narrow task, such as playing chess at grandmaster level. The methods used to build such systems include hand-coding knowledge and rules, as well as systems that learn to generalize specifically encoded knowledge to perform narrow tasks. Other methods of training systems to learn, such as neural networks, have been developed, but these too remain confined to very narrow domains. In the absence of a real breakthrough in broad machine understanding of natural human language, focusing on narrow problems makes it possible to build practical applications for specific tasks.
Preliminary attempts at strong AI software are relatively few. A typical prior-art strong AI system uses knowledge manually encoded in a special-purpose computer language and then processes that knowledge in an attempt to aggregate it to answer questions or perform tasks. The best-known example of a strong AI system built on a manually created body of encoded knowledge is the Cyc system developed by computer scientist Doug Lenat at Cycorp. Cyc requires humans to hand-code large amounts of common-sense knowledge, as well as domain-specific knowledge (and the different representations of that knowledge), as the "rules" the system follows. Examples of hand-coded rules or knowledge in Cyc include "once people die, they stop buying things" and "trees are usually outdoors." Cyc has been under development since 1984 but has not produced a system with broad human intelligence. To date, fewer than 2 million specific rules have been encoded.
A key breakthrough in strong AI would have far-reaching effects. The pace of technological progress would increase dramatically as scalable computer processing and memory are applied, with the help of human-level intelligence, to the issues and problems we face. A fundamental breakthrough in strong AI would essentially change the whole world as we know it.
Summary of the invention
I. Introduction
The present invention provides methods and apparatus for automating the acquisition, reconstruction, and generation of knowledge bases of associated concepts, and for using such knowledge bases in many applications, including machine translation of human languages, search and retrieval of unstructured text (or other data) based on concept search (rather than keywords), speech recognition, data compression, and artificial intelligence systems. In the present invention, because concepts recur, a knowledge base of associated concepts is created by studying the relationships between concepts in unstructured information. Representations of concepts may, but need not, be similar in number, length, or size, and they may be expressed or represented in any medium (e.g., text, visual images, sound, infrared waves, smell, symbols).
The present invention also provides methods and apparatus for creating and using a knowledge base to convert concepts from one state to other states, or for processing a knowledge base for use in practical applications.
In one embodiment of the invention, the created knowledge base is reconstructed, with unlimited derivation, for use in human language translation applications. Another embodiment of the invention can be used to create a knowledge base of associations between concepts so as to establish their relationships to one another. When two or more types of concepts appear together in a particular way, the associations/relationships of those concepts can serve as trigger events in artificial intelligence applications.
The basic aspects of the invention comprise knowledge base acquisition, knowledge base reconstruction, knowledge base generation, and the use of knowledge bases to convert concepts or to process the knowledge base for practical applications. The knowledge base acquisition aspect identifies concepts and their representations in different states. Thus, for applications that process written text, the invention identifies the meaning of units of words and word strings, including concepts that are translations of one another in different languages and concepts that express the same meaning within one language. The knowledge acquisition portion of the invention also identifies non-synonymous words and word strings that are semantically related in some way (e.g., antonyms, members of the same class, commonly related concepts).
The knowledge reconstruction aspect of the invention pieces together, with unlimited derivation, the units of meaning learned through knowledge acquisition to form more complex concepts. This allows a knowledge base of associated concepts to serve as building blocks for processing concepts across different states or within a single state over a broad range. A knowledge base of associated concepts can therefore be used to translate entire documents into a target language and to represent complex concepts in different forms within the same language, which enables automated understanding in applications such as concept search, natural language interfaces, and speech recognition.
The knowledge generation aspect of the invention uses identified patterns of connected complex concepts to trigger the use of previously learned knowledge (or the learning of new knowledge) to perform cognitive tasks. The invention achieves these and other objectives by identifying the multiple ways in which each recurring concept is expressed and by establishing relationships between different concepts. Thus, in one embodiment of the invention, concepts are represented in human language, and the system makes associations by recording the frequency with which two or more concepts occur in proximity and co-occur in text. As noted above, concepts are represented by word strings of any size.
II. Word strings as units of meaning
Unlike prior-art SMT systems, vector-space measures of semantic similarity, and other supervised or unsupervised NLP learning, the present invention matches and/or associates recurring word strings of any size with the patterns of other recurring word strings of any size. Every aspect of the invention applies this approach of examining exact word strings in unstructured text as units of meaning, including word strings that contain stop words (words such as "it", "an", "a", "of", "as", "in"). By identifying and focusing on recurring words or word strings of any length as single units, the invention can capture the meaning of words in context.
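Identifying recurring word strings, stop words included, can be sketched as a simple count over every contiguous word sequence. This is a minimal illustration and not the patent's own procedure; the function name, the `max_len` cap (a practical limit added here, whereas the text allows any length), and the sample text are assumptions.

```python
from collections import Counter

def recurring_word_strings(text, min_len=2, max_len=5, min_count=2):
    """Return every contiguous word string that occurs at least `min_count` times."""
    tokens = text.lower().split()
    counts = Counter()
    # Stop words are deliberately kept: they are part of the exact word string.
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {s: c for s, c in counts.items() if c >= min_count}

sample = ("he was caught between a rock and a hard place "
          "and she was caught between a rock and a hard place too")
found = recurring_word_strings(sample)
print(found["between a rock and a"])
```

Word strings that occur only once, such as "he was", are dropped, while "between a rock and a" is retained as a single candidate unit together with its stop words.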
For example, the present invention depend on context with " rock " be considered as representing various meanings (as, stone or a kind of music).When you checked word strings, further meaning becomes obvious: " rock " can represent stalwart in stone or the hard time, and " rock band " can represent one group of musician who plays rock music.Similarly, the word " between a rock " that occurs continuously depends on their residing longer word strings and has different meanings.If they are present in the word strings " between a rock band ' s sets ", then its meaning and they appear in " between a rock and a hard place " very inequality.Moreover " between a rock and a hard place " such its integral body of expression has the meaning that can not easily understand by analyzing its part.
The present invention reappears word with in the language each and handles automatic semantic acquisition methods formation sharp contrast with existing mechanical translation and machine perception as notion independently.In addition, the present invention handles each the reproduction word strings in the language also as single concept and contrasts with theoretical formation of modern languages, and the latter pays close attention to the semantic values of word in the context of other words formations.In linguistic theory, term " collocation " and " idiom " refer to that have can not be by checking meaning that the composition word is easily understood with word strings special circumstances as a whole because many words are expressed.In fact, the composition word has lost their independent semantic values, and only is associated with the notion of expressing when as the part of integral body.
For example, " pitch black " is exactly the example of collocation, and " between a rock and a hard place " is exactly the example of idiom.By contrast, the present invention is just with all words, collocation and the idiom atomic unit as meaning, and all word strings are handled as the atomic unit of possible meaning.Depend on their residing definite word strings, the present invention allow the word in the word strings keep their cores semantic values, change their core semantic values in delicate mode, or be different from their typical meaning fully.
For example, "baseball" is a game, "a baseball" is a round object, "a baseball team" is a sports team, and "a baseball player" is a person. When processing units of meaning in applications that require machine perception of natural language, the present invention treats each of these different word strings containing the shared word ("baseball") as an independent concept. Although the present invention does not use linguistic grammar rules and does not tag the parts of speech of word strings, its method still allows the context of a word string to be processed as a unit and to retain its semantic character.
III. Methods and Systems for Language Translation and for Natural Language Understanding in Text Mining, Natural Language Interfaces, and Other Applications
A. Overview
The present invention provides several methods and apparatus for creating and supplementing cross-language association databases (i.e., knowledge bases) of concepts. These databases generally associate data in a first form or state that represents a particular concept or piece of information with data in a second form or state that represents the same concept or information. The databases are then used, for example, to efficiently translate documents containing those concepts from one state to another by means of the knowledge reconstruction method that the present invention calls dual-anchor overlap.
One method of building a cross-language word string translation database uses documents that have been translated by humans (parallel text) to identify the co-occurrence of word strings in the translated documents. A second method of the present invention derives word string translations for a language pair from known word string translations between those languages and several other languages. Another method of the present invention uses a cross-language dictionary, a large target-language corpus, and a specific search method to identify translations of word strings. A further method, called dual-anchor overlap, automatically expands the cross-language word string database by deriving new associations from associations already learned (this is also referred to as the knowledge reconstruction aspect of the present invention).
Another method and system of the knowledge acquisition aspect of the present invention creates a knowledge base of associated concepts by examining the repeated occurrences, within a single language or state, of concepts expressed in that language or state. For example, the present invention can create a knowledge base of associated concepts for English by examining the recurrence, across different English documents, of concepts represented by words and word strings. The present invention performs knowledge acquisition on concepts expressed in a single language (by words or word strings) by examining the co-occurrence of surrounding concepts (represented by adjacent words or word strings) and then identifying other words and/or word strings in the same language that have similar patterns, so that the system can identify words and word strings that are semantically equivalent (or otherwise semantically related) to the initial (query) word or word string. In one embodiment of the present invention, knowledge acquisition within a single state or language uses the method of common frequency analysis (CFA). In general, common frequency analysis is the method by which the present invention associates two or more words and/or word strings with one another and with other, third words and word strings.
The knowledge reconstruction aspect of the present invention, which connects adjacent data segments, is the dual-anchor overlap method; in this embodiment the data segments are represented by word strings. This aspect assembles adjacent word strings by connecting a word string to the word strings on either side of it that share overlapping words (or word strings) with it. Using dual-anchor overlap, the system can connect adjacent known building-block word strings in combinations it has not yet encountered, generating new complex concepts or expressing known concepts in new forms. The dual-anchor overlap method is used to connect concepts represented by word strings (or other data segments), both to translate documents across two languages and to connect adjacent concepts within a single language.
The knowledge generation aspect of the present invention allows a user to trigger a subsequent step based on the co-occurrence (found by common frequency analysis) of a third word string whose association is shared by two different word strings that occur near each other. This knowledge generation aspect will support strong AI applications. The system uses CFA to trigger, within a user-designed logic chain, subsequent CFA steps that solve general problems. The system analyzes a question or sentence by parsing it into all possible sets of known word strings. The system then analyzes the different possible combinations of word strings to identify known patterns (i.e., two or more words and/or word strings expressed together in a particular order) that trigger subsequent steps in the analysis.
B. Methods and Systems
In the field of machine translation, the system uses any of several methods of cross-language knowledge acquisition to obtain word string translations, and uses the knowledge reconstruction method to combine those translations. This significantly improves on the quality of existing translation technologies and systems and represents an advance in the state of the art.
One method of cross-language knowledge acquisition is implemented using documents in two or more languages. The documents may be exact translations of each other, i.e., "parallel text" documents, or may be texts in the two languages about the same subject, i.e., "comparable text" documents. This acquisition can be carried out directly between the source language and the target language (using parallel or comparable text). When used for language translation, the system can automatically build a cross-language database of semantically equivalent concepts (represented by words or word strings) across the two languages.
One embodiment of the method and system of the present invention selects, in the first language (source language), every word and word string that occurs multiple times, i.e., that has at least a first and a second occurrence in the available cross-language documents. A first word range and a second word range are then selected in the document in the second language (target language), where these target-language ranges correspond approximately to the positions of the first and second occurrences of the selected word or word string in the source document (so that they have a high probability of containing the translation of the source word or word string). Next, the system examines those target-language ranges: it compares the words and word strings found in the first word range with those found in the second word range (and in every other target word range corresponding to another occurrence of the word or word string in the source language), locates the words and word strings common to the different word ranges, and stores those common words and word strings in a cross-concept database. The present invention then associates, in the cross-concept database, the common words and word strings located in the target-language ranges with the word or word string selected in the source language and, after the association frequencies are adjusted as shown in Figure 1, ranks them by association frequency (number of recurrences). By identifying the co-occurrence of words and word strings across languages in parallel or comparable text, the system can identify more associations as more parallel or comparable text becomes available.
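The range-comparison step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes, for simplicity, that the target "range" for each occurrence is the whole aligned target sentence, it omits the frequency adjustment of Figure 1, and the names `word_strings` and `candidate_translations` are hypothetical.

```python
from collections import Counter


def word_strings(tokens, max_len=3):
    """Return every contiguous word string of 1..max_len words."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def candidate_translations(source_query, sentence_pairs, max_len=3):
    """Count target word strings shared by the ranges of each occurrence.

    Assumption: the target range for an occurrence is simply the aligned
    target sentence; a real system would use an approximate position window.
    """
    ranges = [
        word_strings(target.split(), max_len)
        for source, target in sentence_pairs
        if f" {source_query} " in f" {source} "
    ]
    counts = Counter()
    for target_range in ranges:
        counts.update(target_range)
    # Keep only word strings common to at least two ranges.
    return {s: c for s, c in counts.items() if c >= 2}


pairs = [
    ("the red house", "la casa roja"),
    ("a red car", "un coche rojo"),
    ("red house here", "casa roja aqui"),
]
print(candidate_translations("red house", pairs))
```

In this toy corpus the strings common to both ranges are "casa", "roja", and "casa roja"; the adjustment step the text refers to would then discount the shorter strings that are accounted for by the longer one.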
Once associations have been made based on the frequencies of words and word strings in the target-language ranges, those candidate target-language word string translations can be further verified by looking up, in reverse, their corresponding ranges in the source-language document. The system can then find the most frequent words and word strings in the source-language ranges and check whether the originally selected word or word string is among the most frequent source-language words and word strings obtained by this reverse learning.
By automatically building translations between word strings that recur frequently in parallel text (regardless of the size of the word strings), the present invention obtains translations that carry the embedded context each word in the string requires. These accurate, context-embedded word string translations provide building blocks that can be combined appropriately (using the knowledge reconstruction aspect of the present invention) to translate documents. As the system learns word string translations, it stores them in a data repository so that translation is faster when they are needed again to translate documents in the future. The system can operate by learning recurring word strings from documents in the order in which they appear in the parallel text being examined, or it can learn selected recurring word strings from the specific parallel documents entered into the system because they contain source-language words that need to be translated into the target language. The latter mode of operation is "learning by doing", an example of on-the-fly learning.
The present invention also provides cross-language knowledge acquisition methods and apparatus that bring together the databases built automatically by the present invention for different languages, in order to derive word string translations between two languages that have not yet been learned directly from parallel text. This multilingual leverage method generates word string translations indirectly, using the results held in common when the source language is translated into known intermediate languages and the intermediate languages are then translated into the target language.
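As a rough illustration of this indirect route, the following Python sketch lets each intermediate language cast one vote for every target word string it can reach. The table layout and the name `pivot_translations` are assumptions made for the example, not structures defined in the source.

```python
from collections import Counter


def pivot_translations(source_string, source_to_mid, mid_to_target):
    """Vote for target word strings reached through intermediate languages.

    source_to_mid: {language: {source string: [intermediate strings]}}
    mid_to_target: {language: {intermediate string: [target strings]}}
    Both tables are hypothetical stand-ins for learned translation databases.
    """
    votes = Counter()
    for language, table in source_to_mid.items():
        reached = set()
        for mid in table.get(source_string, []):
            reached.update(mid_to_target.get(language, {}).get(mid, []))
        votes.update(reached)  # one vote per intermediate language
    return votes


english_to_mid = {
    "french": {"big house": ["grande maison"]},
    "german": {"big house": ["grosses Haus"]},
}
mid_to_spanish = {
    "french": {"grande maison": ["casa grande", "gran casa"]},
    "german": {"grosses Haus": ["casa grande"]},
}
print(pivot_translations("big house", english_to_mid, mid_to_spanish).most_common())
```

A target string reached through several intermediate languages collects more votes, which mirrors the idea that results held in common are the more reliable ones.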
The multilingual leverage method of cross-language knowledge acquisition, which translates into the same intermediate third languages and then into the target language, can also be implemented using any prior-art machine translation system for those languages. Even if each such system has low accuracy when used alone, routing through intermediate third languages yields fewer results in common in the target language, and when several results agree, the accuracy of the translation is higher. Moreover, these results can be confirmed by using dual-anchor overlap to require, before confirmation, that adjacent word string translations in the target language have more overlap (e.g., a two-word, three-word, or four-word overlap on each side).
The next cross-language knowledge acquisition method of the present invention, when building associations between word strings in different languages, uses a monolingual target-language corpus and/or parallel text together with any one or more of the following: a prior-art machine translation system, a prior-art cross-language dictionary, and/or a custom cross-language dictionary. These methods use a technique called "flooding", in which prior-art or custom dictionaries or systems (which usually produce several translation possibilities for each word) are used to generate all available translations of each word in the source-language word string (a target translation may be a word or a phrase), even if some or all of the translation possibilities are unsuitable for the specific context. Different combinations of these word-to-word (and/or word-to-phrase) translation possibilities are used to search the target-language documents (monolingual corpus or parallel text) to identify candidate translations of the source-language word string. The process is called "flooding" because these word-to-word (and/or word-to-phrase) combinations "flood" the target documents. Compared with cross-language learning from parallel text, the flooding method of word string translation requires more computation, but because it can build word string translations without parallel text, it provides wider coverage for language translation.
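A minimal Python sketch of this search follows. It makes two simplifying assumptions that the source does not: every dictionary entry is a single word (no word-to-phrase translations), and a candidate is any window of target words that consists of exactly one translation per source word, in any order. The function name `flood` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def flood(source_string, dictionary, target_sentences):
    """Search target text for every combination of per-word translations."""
    options = [dictionary.get(word, []) for word in source_string.split()]
    if not all(options):
        return Counter()  # some source word has no dictionary entry
    # Word order may differ between languages, so compare sorted tuples.
    combos = {tuple(sorted(combo)) for combo in product(*options)}
    size = len(options)
    hits = Counter()
    for sentence in target_sentences:
        words = sentence.split()
        for i in range(len(words) - size + 1):
            window = words[i:i + size]
            if tuple(sorted(window)) in combos:
                hits[" ".join(window)] += 1
    return hits


dictionary = {
    "hard": ["duro", "dificil"],
    "work": ["trabajo", "obra", "funcionar"],
}
corpus = [
    "el trabajo duro vale la pena",
    "es un trabajo duro",
    "una obra dificil",
]
print(flood("hard work", dictionary, corpus).most_common())
```

Most combinations never occur in the corpus; the ones that do, and that recur, become candidate translations of the whole word string.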
Beyond knowledge base acquisition, the dual-anchor overlap technique of the present invention uses the entries in the knowledge base to reconstruct larger concepts (e.g., by piecing smaller units together into coherent larger units). The present invention therefore also provides a method and apparatus for converting an entire document from one language or state to another using building-block concepts expressed as word strings across the two languages. The present invention has, or builds, a database of source-language data segments associated with target-language data segments. The present invention translates text using the cross-language word string translation database, accepting only translations of word strings that have overlapping words or word strings on both sides in both the source language and the target language (unless the word string is the first or last in the translated segment).
In a preferred embodiment, the present invention translates text by accessing the above database and identifying the longest word string (measured in number of words) in the document to be translated that begins with the first word of the document and is present in the database. The system then retrieves from the database the target-language word string associated with the located source-language word string. The system then selects (from the document to be translated) a second word string that is present in the database and has an overlapping word or word string with the previously identified word string, and retrieves from the database the target-language word string associated with the second source-language word string. If the target-language word string associations have an overlapping word or word string, the target-language word strings are combined (eliminating the redundancy in the overlap) to form the translation. If not, other target-language associations for the source-language word strings are retrieved from the database (or learned on the fly), and combinations are tested for word overlap until one succeeds. If no overlapping target-language word string translation can be identified or learned, alternative overlapping source-language word strings (shorter or longer) can be used, and their corresponding target-language associations tested for overlap, until one succeeds. The next word string in the source document is selected by finding in the database the longest word string that has an overlapping word or word string with the previously identified source-language word string, and the above process continues until the entire source document has been translated into the target document. Only word strings that have one or more overlapping words with their left and right neighboring word strings in both the source and target languages are confirmed as a conceptual translation combination. The start and end points of a chain of overlapping word string translations can be defined by the start and end of a sentence or of any other identifiable text unit (e.g., a phrase, title, paragraph, article, or chapter).
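The overlap procedure described above can be sketched in Python as a small backtracking search. This is an illustrative reading of the text, not the patented implementation: the database is assumed to be a plain dictionary from source word strings to lists of target word strings, and the names `overlap_len`, `extend`, and `translate` are hypothetical.

```python
def overlap_len(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def extend(words, db, end, target, min_overlap):
    """Extend a partial translation that already covers words[:end]."""
    if end == len(words):
        return target
    for stop in range(len(words), end, -1):           # prefer longer strings
        for start in range(1, end - min_overlap + 1):  # must overlap the source
            for candidate in db.get(" ".join(words[start:stop]), []):
                cand = candidate.split()
                k = overlap_len(target, cand)
                if k >= min_overlap:                   # must overlap the target too
                    result = extend(words, db, stop, target + cand[k:], min_overlap)
                    if result is not None:
                        return result
    return None


def translate(sentence, db, min_overlap=1):
    """Translate by chaining word strings that overlap in both languages."""
    words = sentence.split()
    for first_end in range(len(words), 0, -1):         # longest first string
        for candidate in db.get(" ".join(words[:first_end]), []):
            result = extend(words, db, first_end, candidate.split(), min_overlap)
            if result is not None:
                return " ".join(result)
    return None


db = {
    "i want to buy": ["quiero comprar"],
    "to buy a car": ["comprar un coche"],
}
print(translate("i want to buy a car", db))
```

Here "to buy" is the source-side overlap and "comprar" the target-side overlap; the redundant copy is dropped when the two target strings are joined. When no chain overlaps on both sides, the function returns None rather than guessing.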
The cross-language dual-anchor overlap method and process described above increases the likelihood that each word string translation combines with its adjacent word strings in a way that is contextually and grammatically appropriate. The number of overlapping words required to confirm a connection between adjacent segments is user-defined. The higher the user-defined minimum number of overlapping words required between adjacent segments to confirm a word string combination, the more accurate the result. The cross-language dual-anchor overlap technique can solve the "boundary friction" problem encountered by existing EBMT systems and increases the likelihood that the correct context is used throughout the translation.
In addition, a word string translation that has become a candidate through cross-language learning (or another knowledge acquisition method) but that cannot be confirmed at the user-defined level of statistical significance can be required, according to user-defined requirements, to have more overlapping words with its two adjacent word strings before it is confirmed. Smaller subsets of the word string (i.e., internal word strings) with known translations can also be used to check the non-overlapping middle portion of a longer, unconfirmed candidate word string translation whose ends are confirmed by cross-language overlap. Note that the translation method is not limited to word strings of equal length or to word strings located in the same position in the source-language and target-language sentences, so it is very flexible.
The present invention also provides a general method and apparatus, called frequency association database creation, for creating frequency tables of the proximity relationships between words and/or word strings in a single language. The common frequency analysis of the present invention then uses these proximity relationships to associate a word or word string with other words and/or word strings based on shared associations within a single language. The knowledge acquisition method used within a single language relies on the context (represented by words and word strings) surrounding each recurring concept (represented by a word or word string). Semantic relationships can be identified and used to significantly improve search and text mining applications, machine translation, and artificial intelligence applications.
The present invention allows a knowledge base to be acquired within a single state, such as a single language, using the common frequency analysis method of the present invention. In one embodiment that uses common frequency analysis, the system identifies words and word strings that represent synonymous concepts and other types of relationships between concepts.
For example, by examining English text, associations can be made between words or word strings that are semantically equivalent (i.e., synonymous), such as "nation's largest" and "biggest in the country". The present invention also provides a method and apparatus for analyzing a word or word string to find and associate, when they exist, words and word strings that express the opposite concept; words and word strings that express definitions, examples, and other related concepts, including members of a common class (e.g., "red" relative to "blue" and "lime green", all members of the class of colors); and other related information (e.g., the query "Mount Everest" may return "highest point in the world").
The present invention identifies these relationships between words and/or word strings by identifying word strings of any size adjacent to the word or word string under analysis, and by noting whether each adjacent word string lies to the left or to the right of it. Words and word strings that share many of the same left and right adjacent word strings have a strong semantic relationship to one another. In general, the words and word strings that share the largest number of different left and right context word strings, and that share longer (more words) left and right context word strings, are the closest or most related semantically.
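The shared-context comparison described above can be sketched in Python as follows. The scoring rule (one point per word of each shared context string, so that longer shared contexts count for more) is an assumption chosen to reflect the stated preference for more and longer shared contexts; the source does not give a formula, and the names `contexts` and `relatedness` are hypothetical.

```python
def contexts(tokens, query, max_ctx=2):
    """Collect the left and right context word strings around each occurrence."""
    q = query.split()
    n = len(q)
    left, right = set(), set()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == q:
            for k in range(1, max_ctx + 1):
                if i - k >= 0:
                    left.add(" ".join(tokens[i - k:i]))
                if i + n + k <= len(tokens):
                    right.add(" ".join(tokens[i + n:i + n + k]))
    return left, right


def relatedness(tokens, a, b, max_ctx=2):
    """Score two word strings by the contexts they share on each side."""
    left_a, right_a = contexts(tokens, a, max_ctx)
    left_b, right_b = contexts(tokens, b, max_ctx)
    shared = (left_a & left_b) | {"> " + s for s in right_a & right_b}
    # Assumed weighting: a shared context of more words contributes more.
    return sum(len(s.replace("> ", "").split()) for s in shared)


text = (
    "the nation's largest bank said today . "
    "the biggest in the country bank said today . "
    "a small dog barked"
)
tokens = text.split()
print(relatedness(tokens, "nation's largest", "biggest in the country"))
print(relatedness(tokens, "nation's largest", "small dog"))
```

In this toy text the two synonymous word strings share the left context "the" and the right contexts "bank" and "bank said", while the unrelated word string shares none.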
The knowledge acquired and assembled in a single-language database (including knowledge generated on the fly) can be used to extend prior-art keyword search and text mining methods. For example, these methods can be enhanced by searching for the keyword together with words and word strings that are semantically equivalent to, or closely related to, the keyword entered. The aspect of the present invention that identifies semantically equivalent items by identifying shared left and right context word strings can also be used to decode semantic codes. If a word or word string that is unsuitable or rare (in the given context) is used as a code for a meaning other than its one or more common meanings, its repeated use in the rare context enables the present invention to identify the real meaning underlying the semantic code.
Appendix A (179 pages; translator's note: the page count refers to the original text) provides example association results from applying RCFA to various queries. The first 15 examples show partial results for each query (i.e., the top 20 to 25 returns), while the final example (for the query "it is important to note") shows a total of 1,000 returns. These results reflect an automated semantic acquisition method far more powerful than any prior art. The key to these results is treating the word strings flowing into the query concept (i.e., to the left of the query in English) and flowing out of it (i.e., to the right of the query in English) as single units of context, and using the word string context on both sides to find other semantic units, represented by words and word strings, that share some of the same left and right word string contexts.
Using the dual-anchor overlap technique of the present invention, the same concepts expressed by different word strings of the same language can also be substituted for one another within a chain of overlapping concepts, producing multiple sentences composed of overlapping, semantically equivalent concepts that together express the same larger concept. By supplying a database of semantically equivalent concepts in a single language to the dual-anchor overlap technique (as described above for cross-language translation), the present invention can reproduce the same larger concept in many different variations. Dual-anchor overlap, the knowledge reconstruction part of the present invention, is very useful for speech recognition and other natural language recognition applications, and provides an expanded search over combinations of the same concepts expressed with different word string combinations. This capability can also provide a very effective method for text mining tasks (such as co-reference and entity-relationship tracking) and other tasks.
The aspect of the present invention that generates semantic equivalents by the single-language knowledge acquisition method can also be used as a component in machine translation applications. For a source-language word string that cannot be translated because of missing information or for any other reason, an alternative source-language word string that can be translated can be generated in its place. In addition, semantic equivalents of the source-language word string and/or of the candidate target-language translation both help confirm the correct translation.
The present invention also provides a common frequency analysis method and apparatus that, in intelligent applications, answers questions in any number of ways using the relationships between recurring words and/or word strings, by identifying the associations with third words and/or word strings that two or more words or word strings share, based on their proximity to one another in text. The databases created for intelligent applications can be built from documents in a single language (or, alternatively, from cross-language text). The appearance in a question, request, or sentence of two or more words and/or word strings that are adjacent or overlapping (or that have some other close proximity relationship) can trigger different types of common frequency analysis, either designed by the user or learned by the system.
The triggered common frequency analysis identifies words and word strings that do not appear in the question, request, or sentence but that, in other available text, share a proximity relationship with the two or more words and/or word strings supplied to the system in the question, request, or sentence. The third word or word string associations shared by the supplied words and/or word strings can be used to identify the next step in the chain of common frequency analyses, in order to understand the question or command and to provide an answer or perform a task.
The present invention provides a method for strong AI that supplies the basis for creating knowledge bases dynamically and automatically, by means of the language association levels and classification of any concept expressed as a word or word string in context. Given enough training text, this capability can supply a knowledge base for every situation that the triggers of an intelligent application can exploit.
In a sense, the user trains the present invention how to think about a class of situations represented by a common pattern of concepts. The user does so by building "triggers" for the next step to be taken when the system identifies words and/or word strings, based on the semantic classes to which they belong, within specific known patterns (semantic equivalents and equivalence classes being identified by the common frequency analysis of the present invention). By recognizing specific patterns of words and/or word strings (and/or known semantic equivalents), and by recognizing the general classes of concepts whose presence satisfies a broader generalized pattern, the system can, once trained to do so by the user, trigger a strategy when those common patterns are recognized and carry out the next logical step (a knowledge base lookup or a further common frequency analysis). Once the user has created enough "general strategy triggers", the system can automatically learn appropriate triggers for recognizing many other situations. The initial triggers set by the user can include triggers designed to teach the system how to set triggers automatically for various purposes.
Another object of the present invention is to associate the sound wave frequencies produced by human speech and other sources with their corresponding concepts in each different language, for use in speech recognition and other applications that rely on the interpretation of audible sound.
Another object of the present invention is to associate generalized patterns of pixel matrices and other visual data representations with the corresponding concepts expressed in different languages, for use in visual recognition for information gathering and artificial intelligence applications.
Another aspect of the present invention is the use of a single symbol or token, such as a number or a point on a wave frequency, to represent semantically equivalent concepts, which can serve as a data compression method.
IV. Prior Art
Prior-art systems cannot achieve what the present invention describes. For example:
U.S. Patent No. 5,724,593 to Hargrave discloses translation memory tools that assist human translators, in which texts and their corresponding translations are loaded into memory. The source-language text is parsed into n-grams. The source-language n-grams are analyzed to determine their frequency of occurrence in the source-language text and are assigned entropy weights. N-grams with particularly high or particularly low entropy weights are removed because they are not sufficiently useful for translation purposes. The remaining n-grams and their corresponding translations are used, with an inverted index, for machine-assisted translation by finding "fuzzy matches" of the input in the translation memory, for review by a human translator.
Hargrave does not use parallel text to perform word string association analysis, in which a recurring word string of any size in the source document is associated with recurring words and word strings of any size in the target document based on the frequency with which they occur in approximately the same positions in the parallel text (after the counts of larger word strings are subtracted from those of their substrings). Hargrave does not derive translations of words and word strings indirectly through other, third languages.
Hargrave does not "flood" target-language text with translations of the source-language words that make up source-language word strings and source-language context words and word strings. Hargrave does not use word strings of any size to the left and right of a query to perform association analysis between words and word strings in a single language. Nor does Hargrave require that the document to be translated be parsed into overlapping word strings in the source language, or that a target-language translation be confirmed only if it is a word string that shares overlapping words or word strings with the adjacent translations to its left and/or right.
U.S. Patent No. 6,085,162 to Cherny discloses a three-dimensional topical database for translating between languages, in which each layer of the database represents a user-selectable topic relevant to the translation. The database is built by parsing texts representing sources in at least two different languages into words. In separate branches of the processing sequence, the parsed words from the two sources are assigned to different classes, based in part on information such as their grammatical function, grammatical form, and direct meaning. A bilingual dictionary is then used to translate the input words in each branch, producing one or more translations or associations for each word. The word associations from each branch are processed together, for example using a neural network, to produce forward and backward association frequencies. The database used for translation consists of layers, each representing a topic and each containing the association frequencies and the classes assigned to all the words in that topic.
Cherny does not use parallel text to perform word string association analysis, in which a recurring word string of any size in the source document is associated with recurring words and word strings of any size in the target document based on the frequency with which they occur in approximately the same positions in the parallel text (after the counts of larger word strings are subtracted from those of their substrings). Cherny does not derive translations of words and word strings indirectly through other, third languages. Cherny does not "flood" target-language text with translations of the source-language words that make up source-language word strings and source-language context words and word strings. Cherny does not use word strings of any size to the left and right of a query to perform association analysis between words and word strings in a single language. Nor does Cherny require that the document to be translated be parsed into overlapping word strings in the source language, or that a target-language translation be confirmed only if it is a word string that shares overlapping words or word strings with the adjacent translations to its left and/or right.
U.S. Patent No. 5,867,811 to O'Donoghue discloses how to improve the quality of aligned corpora generated by other prior art methods by using matched-word frequencies to correct an aligned corpus while removing as little of it as possible. An aligned corpus is two or more bodies of text divided into aligned portions, such that each portion of the corpus in the first language is mapped to a corresponding portion of the corpus in the second language. Each portion may comprise a single sentence or phrase, but may also comprise a single word or an entire paragraph. Prior art systems that generate aligned corpora automatically are not always reliable. This invention uses a statistical database containing frequency tables of corresponding word pairings that occur across the two languages to detect probable errors in aligned text portions. It also uses statistical methods to provide an alignment score for "chunks of words" by accumulating the word-pair scores of all matched words in each chunk pairing.
O'Donoghue does not use parallel text to perform word string association analysis in which recurring word strings of any size in the source document are associated with recurring words and word strings of any size in the target document, based on how frequently they occur at the same approximate locations in the parallel text (after larger word strings are subtracted from their substrings). O'Donoghue does not translate words and word strings indirectly through a third language. O'Donoghue does not "flood" target language text with translations of the source language words that make up a source language word string and its source language context words and word strings. O'Donoghue does not perform association analysis between words and word strings within a single language using word strings of any size to the left and right of a query. Nor does O'Donoghue require that a document input for translation be parsed into overlapping word strings in the source language, or that the target language translations be parsed into word strings whose translation is confirmed by sharing an overlapping word or word string with the adjacent translation on its left and/or right.
U.S. Patent No. 5,579,224 to Hirakawa discloses a system for creating a dictionary. A document in a first language and a document in a second language are loaded into memory. A word or character string is extracted from the first-language document, and corresponding candidate words are selected from the second-language document based on morphological and syntactic analysis of the words in the second-language document. Each candidate word selected from the second-language document is compared with the word extracted from the first-language document by comparing the words near the extracted word in the first document with the words near the candidate in the second-language document. The candidate words are scored based on context and proximity.
Hirakawa does not use parallel text to perform word string association analysis in which recurring word strings of any size in the source document are associated with recurring words and word strings of any size in the target document, based on how frequently they occur at the same approximate locations in the parallel text (after larger word strings are subtracted from their substrings). Hirakawa does not translate words and word strings indirectly through a third language. Hirakawa does not "flood" target language text with translations of the source language words that make up a source language word string and its source language context words and word strings. Hirakawa does not perform association analysis between words and word strings within a single language using word strings of any size to the left and right of a query. Nor does Hirakawa require that a document input for translation be parsed into overlapping word strings in the source language, or that the target language translations be parsed into word strings whose translation is confirmed by sharing an overlapping word or word string with the adjacent translation on its left and/or right.
U.S. Patent No. 5,991,710 to Papineni discloses a system that translates a source language into a target language by statistically scoring candidate sets of target language words and identifying the candidate set with the highest score. The system uses a statistical model to select the most probable translation among the target language candidates, and is designed for applications in which the domain is essentially limited to a finite number of possible translations that match the input query.
Papineni does not use parallel text to perform word string association analysis in which recurring word strings of any size in the source document are associated with recurring words and word strings of any size in the target document, based on how frequently they occur at the same approximate locations in the parallel text (after larger word strings are subtracted from their substrings). Papineni does not translate words and word strings indirectly through a third language. Papineni does not "flood" target language text with translations of the source language words that make up a source language word string and its source language context words and word strings. Papineni does not perform association analysis between words and word strings within a single language using word strings of any size to the left and right of a query. Nor does Papineni require that a document input for translation be parsed into overlapping word strings in the source language, or that the target language translations be parsed into word strings whose translation is confirmed by sharing an overlapping word or word string with the adjacent translation on its left and/or right.
U.S. Patent No. 6,092,034 to McCarley discloses a statistical translation system and method that uses a fertility model and a sense model to perform fast word sense disambiguation and translation of source language words. The fertility model is a language model that describes the probability of the fertility of a source language word, given the source language word and its context, using prior art methods such as a maximum-entropy trigram model. The sense model is a language model that describes the probability that a target language word is the correct translation of a source language word, given the source language word and its context, using a trigram model and other prior art methods.
McCarley does not use parallel text to perform word string association analysis in which recurring word strings of any size in the source document are associated with recurring words and word strings of any size in the target document, based on how frequently they occur at the same approximate locations in the parallel text (after larger word strings are subtracted from their substrings). McCarley does not translate words and word strings indirectly through a third language. McCarley does not "flood" target language text with translations of the source language words that make up a source language word string and its source language context words and word strings. McCarley does not perform association analysis between words and word strings within a single language using word strings of any size to the left and right of a query. Nor does McCarley require that a document input for translation be parsed into overlapping word strings in the source language, or that the target language translations be parsed into word strings whose translation is confirmed by sharing an overlapping word or word string with the adjacent translation on its left and/or right.
U.S. Patent No. 6,393,389 to Chanod discloses a method of translating text by parsing the source text into sub-segments. Each sub-segment is then translated into the target language using any of several prior art methods. For any sub-segment with multiple candidate translations, whether because several methods were used to translate it or because a single method offered several options, the options are ranked by a user-defined method. The highest-ranked candidates for each segment are then combined in sequence into a word string that is presented to the user, in an attempt to convey the meaning of the source input in the target language. In further embodiments, the user can swap in lower-ranked segments or display multiple options for a segment.
Chanod does not use parallel text to perform word string association analysis in which recurring word strings of any size in the source document are associated with recurring words and word strings of any size in the target document, based on how frequently they occur at the same approximate locations in the parallel text (after larger word strings are subtracted from their substrings). Chanod does not translate words and word strings indirectly through a third language. Chanod does not "flood" target language text with translations of the source language words that make up a source language word string and its source language context words and word strings. Chanod does not perform association analysis between words and word strings within a single language using word strings of any size to the left and right of a query. Nor does Chanod require that a document input for translation be parsed into overlapping word strings in the source language, or that the target language translations be parsed into word strings whose translation is confirmed by sharing an overlapping word or word string with the adjacent translation on its left and/or right.
U.S. Patent No. 6,138,085 to Richardson discloses a system that determines whether a semantic relation that does not appear in a lexical knowledge base can nevertheless be inferred despite its absence from that knowledge base. Richardson attempts only to define relationships between words. By finding one or more paths between the words, the relationship between two given words is restricted to one of a limited number of manually defined categories (such as synonym, location, user, and so on). A path consists of other words in the database that are connected by manually annotated or derived relations.
Richardson does not use parallel text to perform word string association analysis in which recurring word strings of any size in the source document are associated with recurring words and word strings of any size in the target document, based on how frequently they occur at the same approximate locations in the parallel text (after larger word strings are subtracted from their substrings). Richardson does not translate words and word strings indirectly through a third language. Richardson does not "flood" target language text with translations of the source language words that make up a source language word string and its source language context words and word strings. Richardson does not perform association analysis between words and word strings within a single language using word strings of any size to the left and right of a query. Nor does Richardson require that a document input for translation be parsed into overlapping word strings in the source language, or that the target language translations be parsed into word strings whose translation is confirmed by sharing an overlapping word or word string with the adjacent translation on its left and/or right.
Description of the drawings
Fig. 1 shows an embodiment of the frequency association database of the present invention;
Fig. 2 shows an embodiment of a computer system that implements the method of the present invention;
Fig. 3 shows a memory device of the computer system of the present invention, together with the program package that implements the method of the present invention.
Detailed description
I. Introduction
As described above, one aspect of the present invention provides several different methods and apparatus for creating and supplementing a knowledge base (knowledge acquisition) and for using that knowledge base to convert content from a first state to a second state (knowledge reconstruction). As used herein, a "document" is a collection of information and concepts represented by symbols and characters fixed in a medium. For example, a document may be an electronic document stored on magnetic or optical media, or a paper document such as a book. The symbols and characters contained in a document represent concepts and information expressed using one or more systems of expression so that they can be understood by users of the document. The present invention processes a document in a first state (that is, containing information expressed in one system of expression) to produce a document in a second state (that is, containing essentially the same information expressed in a second system of expression). The invention can therefore process or translate documents between systems of expression, for example translating written and spoken languages such as English, Hebrew, and Cantonese into other languages. In another aspect, the invention can identify different alternative representations of a concept or group of concepts within a single state or language and, when different groups of concepts are presented together, automatically retrieve relevant associations learned previously or on the fly (knowledge generation).
For all aspects of the present invention, a word string is defined, as noted above, as a group of two or more adjacent words in an exact order. A word, as described in this specification, may appear on its own or as part of a word string. It may be a conventional word found in a dictionary, a conventional symbol found in a dictionary (such as a Chinese character), or any other character or symbol that has a recognizable semantic value in some language or culture, including abbreviations (such as "inc." or "dept."), symbols (such as "MSFT"), acronyms (such as "ASAP" or "NCAA"), and so on. Depending on user-defined parameters, words may or may not include punctuation and any other marks used in written expression. When the invention is applied more broadly to forms of media input other than text (such as visual images), a word refers to the smallest unit of an independent concept expressed in that medium, and a word string refers to a string of such meaning units that is expressed in that medium and used as a complete unit of meaning.
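The definition of a word string can be illustrated with a short sketch. The code assumes whitespace-separated text tokens; the function name `word_strings` and the `max_len` cap are hypothetical and are not part of the disclosure, which places no limit on word string size.

```python
def word_strings(tokens, max_len=3):
    """Yield every run of 2..max_len adjacent tokens, preserving exact order.

    `max_len` is an illustrative cap only; the specification allows
    word strings of any size.
    """
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


tokens = "the most popular mode".split()
strings = list(word_strings(tokens))
# Two-word strings come first, then three-word strings.
print(strings)
```

Single words are handled separately from word strings here because the specification defines a word string as two or more words.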
A system or apparatus that implements the knowledge base creation and content conversion or content processing methods of the present invention may be the computer system 200 shown in Fig. 2. Computer system 200 includes a processor 202 coupled via a bus 214 to memory 208, an input device 210, and an output device 212. Computer system 200 may also include a storage device 204 and a network interface 206. Processor 202 accesses data and programs stored in memory 208. By executing the programs in memory 208, the processor can control computer system 200 and can carry out steps to manipulate data and to control devices including, for example, input device 210, output device 212, storage device 204, network interface 206, and memory 208. Programs stored in memory 208 may include steps for carrying out the methods of the present invention, such as content conversion, associating words and word strings, and database creation and supplementation.
Storage device 204 records and stores information for later retrieval by memory 208 or processor 202, and may include storage devices known in the art, such as non-volatile memory devices, disk drives, tape drives, and optical storage devices. Storage device 204 can store programs and data, including databases, that can be transferred to memory 208 for use by processor 202. A complete database or portions of a database may be transferred to memory 208 for access and manipulation by processor 202. Network interface 206 provides an interface between computer system 200 and a network 216 such as the Internet, and converts signals from computer system 200 into a form that can be transmitted over network 216, and vice versa. Input device 210 may include, for example, a keyboard and a scanner for entering data into memory 208 and storage device 204. Input data may include the text of documents to be stored in a document database for analysis and content conversion. Output device 212 includes devices for presenting information to the user of the computer system, and may include, for example, a display screen and a printer.
The following is a detailed description of the present invention, including the various database creation methods and apparatus (knowledge acquisition) and the conversion methods and apparatus (that is, knowledge reconstruction).
Section II describes different methods of creating cross-state databases. Section III describes knowledge reconstruction methods and apparatus that use the databases to convert documents between states (for example, translation). Section IV describes the methods and systems called frequency association database (FAD) creation and common frequency analysis (CFA), which provide the foundation for creating a knowledge base of associated concepts within a single state. Section V describes a method that uses one embodiment of the CFA of Section IV to identify semantic associations and relations between words and word strings and other words and word strings (knowledge acquisition lists). Section VI describes several methods and systems that combine other methods of the invention and use single-state knowledge acquisition to assist language translation. Section VII describes how to reconstruct, in a chained manner, words and word strings that are semantically equivalent concepts (identified as part of the knowledge base built using the methods of Section V) in order to generate alternative forms of the same complex concept within a single state or language. Section VIII describes other applications of the methods and systems of the invention. Section IX describes the use of the methods and systems of Sections IV and V in intelligent applications.
II. Cross-state knowledge base acquisition methods and apparatus
The present invention provides several principal methods for cross-state knowledge acquisition, represented in one embodiment by the translation of words and word strings between two languages. In a first aspect of the invention, the knowledge base is acquired by analyzing documents and identifying similar concepts expressed in different states or languages. One method of acquiring the knowledge base is to examine and compare different documents that express the same concepts (equivalent, or as close to equivalent as possible). Using this method to build associations between two states involves examining the same concepts as represented in two states or languages in text or other material.
The second method of the invention, called multi-state association or multilingual leverage, also builds associations between concepts expressed in two states, in this case by using known translations that were constructed with the methods of the invention or with existing translation systems.
The third method of the invention, called target language flooding, builds associations between word strings in different languages using a monolingual corpus in the target language and/or parallel text, together with any one or more of the following: prior art machine translation systems, prior art cross-language dictionaries, and/or custom cross-language dictionaries. The system generates alternative candidate translations for the words in a source language word string (the target translation of a source word may be a word or a phrase) and searches target language documents for word strings that contain various combinations of the different word translations in close proximity to one another.
A. Acquisition using parallel text
One method of the present invention for creating a cross-language or cross-state concept knowledge base involves examining and operating on previously translated documents in two languages. The methods and apparatus of the invention are used to create a database containing associations across the two states, that is, accurate conversions, or more specifically, associations between concepts expressed in one state and concepts expressed in the other. For each recurring word or word string in the first language, the corresponding ranges in the second-language documents are analyzed for words and word strings that recur across those ranges (after the subtraction adjustment shown in Fig. 1). As more documents are examined and operated on, the associations between translations and other relevant items in the two states become stronger, that is, more frequent. Once a sufficiently large "sample" of documents has been processed, the most common associations become evident, and the method and apparatus can then be used to convert new word strings in the first language into word strings in the second language.
Another embodiment of the present invention uses a computing device, such as a personal computer system of the kind available in the prior art. Although the computing device is typically a common personal computer (standalone or in a networked environment), other computing devices such as PDAs, wireless devices, servers, and mainframes can likewise be used. However, the methods and apparatus of the invention do not necessarily require such a computing device, and can be carried out directly by other means, including manual creation of the cross-associations. The way in which successive documents are examined to enlarge the document "sample" and build cross-association knowledge can vary: documents can be provided for analysis and processing manually, by automatic feeding (such as a prior art automatic paper feeder), by using search techniques such as Web crawlers or other Web search tools on the Internet to find relevant documents automatically, or by any other method that presents text in digital form.
Note that the present invention can produce the association database by examining comparable text in addition to (or instead of) parallel text. Moreover, the method examines all available documents collectively when searching for recurring words or word strings in a language.
According to this embodiment of the invention, cross-language documents are examined to build a knowledge base, namely a cross-language frequency association database of word string translations between two or more languages. These word strings serve as building blocks for answering longer translation queries. For purposes of illustration, assume that the following two documents contain the same content (or the same general meaning or concepts) in two different languages. Document A is in language A. Document B is in language B.
The first step of the invention is to calculate the word range that is used to determine the approximate location of possible associations for any given word or word string. Analyzing cross-language text word by word does not produce effective results (that is, word 1 in document A is not usually the literal translation of word 1 in document B), and equivalent concepts may appear at different positions (or in a different order) in a sentence of one language compared with a sentence of another. The database creation method of the invention therefore associates each word or word string in the first language with all words and word strings that appear within a selected range of the second-language document. This is also important because one language often expresses a concept with a longer or shorter word string than another. The range is determined by examining the two documents, and is used to compare the words and word strings in the second document against each word or word string in the first document. That is, the words and word strings within the range in the second document are examined as possible associations for each recurring word and word string in the first document. By testing against this range, the database creation method obtains a set of second-language words and word strings that may be equivalent to, and translations of, the first-language words and word strings.
Two attributes must be determined in order to define the range in the second-language document within which associations for any given word or word string in the first-language document are sought. The first attribute is the size of the range (as applied in the second document), measured by the number of words in the range (for example, 50 words). The second attribute is the location of the range in the second document, measured by the position of the range's midpoint. Both attributes are user-defined; examples of preferred embodiments are given below. In determining the size and location of the range, the goal is to ensure a high probability that the second-language translation of the first-language word or word string being analyzed falls within the range.
Various methods can be used to determine the size, or value, of the range, including common statistical methods such as deriving a bell curve based on the number of words in the document. With a statistical method such as a bell curve, the range at the beginning and end of a document can be smaller than the range in the middle of the document. A bell-shaped distribution of ranges offers a reasonable chance of extrapolating the translation, whether it is derived from the absolute number of words in the document or from a particular percentage of the words in the document. There are other methods of calculating the range, such as a "step" method, in which the range is at one level for a first percentage of the words, at a second, higher level for the middle percentage of the words, and at a third level equal to the first for the remaining percentage of the words. Again, all range attributes can be user-defined, or determined from other possible parameters, with the goal of capturing useful associations for the first-language word or word string being analyzed.
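The "step" method can be sketched as follows. This is an illustration under stated assumptions only: the function name `step_range_size`, the 20% edge fraction, and the two range sizes are hypothetical values chosen for the example, since the specification leaves all range attributes to the user.

```python
def step_range_size(position, doc_len, edge_fraction=0.2,
                    edge_size=20, middle_size=50):
    """Return the range size (in words) for a 0-based word position.

    Words in the first and last `edge_fraction` of the document get the
    smaller `edge_size`; words in the middle get the larger `middle_size`.
    All three parameters are illustrative assumptions.
    """
    if not 0 <= position < doc_len:
        raise ValueError("position outside document")
    edge = edge_fraction * doc_len
    if position < edge or position >= doc_len - edge:
        return edge_size
    return middle_size


sizes = [step_range_size(p, 100) for p in (5, 50, 95)]
print(sizes)
```

A bell-curve variant would replace the two fixed levels with a smooth function of the position; the step form is shown because it is the simpler of the two methods named in the text.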
The user can define the range, or the system can determine the final range dynamically by starting from a narrow definition (for example, ten words) and expanding the range iteratively, examining and adjusting until a threshold is reached or the required information is found in the target language.
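One way to picture the iterative expansion is the sketch below. It assumes that the "required information" is any one of a set of known target-language words, that positions are 0-based, and that the threshold is a maximum range size; the function name `expand_until_found` and the parameter values are hypothetical.

```python
def expand_until_found(target_tokens, midpoint, wanted,
                       start=10, step=10, max_size=50):
    """Widen a window around `midpoint` until a wanted token appears.

    Returns (lo, hi) as a half-open slice into `target_tokens`. Expansion
    stops as soon as any token in `wanted` falls inside the window, or
    when the window size reaches `max_size` (the assumed threshold).
    """
    size = start
    while True:
        half = size // 2
        lo = max(0, midpoint - half)
        hi = min(len(target_tokens), midpoint + half + 1)
        window = target_tokens[lo:hi]
        if size >= max_size or any(w in window for w in wanted):
            return lo, hi
        size += step


doc_b = [f"w{i}" for i in range(100)]
found = expand_until_found(doc_b, 50, {"w62"})
not_found = expand_until_found(doc_b, 50, {"missing"})
print(found, not_found)
```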
The location of the range in the second-language document depends on a comparison of the number of words in the two documents. The criteria for what counts as a document when locating the range are user-defined; examples include paragraphs, aligned sentences, news articles, chapters of a book, and any other discretely identifiable unit of content made up of multiple pieces of data. If the word counts of the two documents are approximately equal, the location of the range in the second language (that is, the range midpoint) approximately coincides with the position of the word or word string being analyzed in the first language. If the numbers of words in the two documents are not equal, a ratio is used to position the range correctly. For example, if document A has 50 words and document B has 100 words, the ratio between the two documents is 1:2. The midpoint of document A is word position 25. If word 25 of document A is being analyzed, using word position 25 as the range midpoint in document B is not effective, because that position is not the midpoint of document B. Instead, the range midpoint in document B for the analysis of word 25 of document A can be determined (1) from the word ratio between the two documents (which places the range midpoint in document B at word 50), (2) by manually locating the midpoint of document B, or (3) by many other methods.
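The ratio-based placement in option (1) amounts to a single proportion. The sketch below reproduces the worked example from the text; the function name `range_midpoint` and the rounding to the nearest whole word are assumptions.

```python
def range_midpoint(source_pos, source_len, target_len):
    """Scale a word position in the source document to the target document.

    Uses the ratio of the two word counts; rounding to the nearest
    whole word position is an assumption of this sketch.
    """
    return round(source_pos * target_len / source_len)


# Document A has 50 words, document B has 100 words (ratio 1:2).
midpoint_b = range_midpoint(25, 50, 100)
print(midpoint_b)  # 50
```

When the two documents have the same word count the function returns the source position unchanged, which matches the equal-length case described above.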
The user-defined range size can be made large to ensure a high likelihood that the translation of the first-language word or word string is located in the second-language document. For example, the range might be defined to include the 25 words to the left of the range midpoint and the 25 words to its right (a range of 51 words in total). In this example, the 51-word range would run from word 25 to word 75. Parsing and analyzing every combination of words and word strings within a 51-word range requires a great deal of computation.
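The size of that computation can be made concrete. The sketch below computes the 1-based bounds of the example range and counts how many distinct contiguous words and word strings a window contains; the function names `range_bounds` and `count_candidates` are hypothetical.

```python
def range_bounds(midpoint, half_width, doc_len):
    """Return inclusive 1-based (first, last) word positions of the range."""
    first = max(1, midpoint - half_width)
    last = min(doc_len, midpoint + half_width)
    return first, last


def count_candidates(window_len, max_len=None):
    """Count contiguous words and word strings of length 1..max_len.

    A window of n words contains n - k + 1 strings of length k.
    """
    if max_len is None:
        max_len = window_len
    return sum(window_len - k + 1
               for k in range(1, min(max_len, window_len) + 1))


first, last = range_bounds(50, 25, 100)
window_len = last - first + 1
print(first, last, window_len)           # 25 75 51
print(count_candidates(window_len))      # 1326
print(count_candidates(window_len, 5))   # 245
```

Even before any frequency statistics are computed, a single 51-word range yields 1,326 candidate words and word strings, which is the cost that a narrower range avoids.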
A more efficient way to determine the range is to establish the 51-word range as described above and then search it for known translations of particular words and word strings that immediately precede the word or word string being analyzed in the source (first) document, and for known translations of words and word strings that immediately follow it. Identifying translations of a user-defined number of words and word strings before and after the first-language word or word string being analyzed narrows the beginning and end of the range over which the cross-language association algorithm is run on recurring second-language words and word strings. Using the known translations of the preceding and following words and word strings to "mark off" a smaller range reduces the final range size, and with it the number of parsed words and word strings for which statistics must be computed.
For example, suppose the system is analyzing the English word string "the most popular" in order to learn its associations with words and word strings of language X from parallel text in English and language X. Suppose further that one sentence in the English document is "The car is the most popular mode of transportation in America". Rather than analyzing all word strings within 25 words on either side of the ratio-based range midpoint in the corresponding second-language document, one embodiment looks within the initial 51-word range in language X for known translations of English word strings that precede "the most popular" in the English document, such as the language X translation of the word string "The car". In this process, the invention can also identify word strings that follow the word string being analyzed in the English document, such as "in America", and locate their known language X translations within the initial range. By identifying these known language X translations of the English word strings, the range that is parsed for all recurring words and word strings contains far fewer possible combinations while still capturing the translation. Likewise, if the source language word string being analyzed contains a unique (user-defined) word or marker known to the system, the range midpoint can be set efficiently by placing it at the position of the translation of that marker word at approximately the same location in the target language document.
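The narrowing step can be sketched as follows. The language X tokens are invented placeholders (`X_the`, `X_car`, and so on), not real translations, and the function names are hypothetical. The sketch also assumes that language X keeps the two known neighbors in the same left-to-right order as English, which will not hold for every language pair.

```python
def find_sub(tokens, sub, start, end):
    """Return the index of the first occurrence of `sub` in tokens[start:end]."""
    n = len(sub)
    for i in range(start, end - n + 1):
        if tokens[i:i + n] == sub:
            return i
    return None


def narrow_range(target_tokens, lo, hi, left_known, right_known):
    """Tighten the half-open range [lo, hi) using known neighbor translations.

    `left_known` is the known translation of the word string preceding the
    one under analysis; `right_known` is that of the word string following.
    Either bound is left unchanged if its translation is not found.
    """
    new_lo, new_hi = lo, hi
    i = find_sub(target_tokens, left_known, lo, hi)
    if i is not None:
        new_lo = i + len(left_known)
    j = find_sub(target_tokens, right_known, new_lo, hi)
    if j is not None:
        new_hi = j
    return new_lo, new_hi


# Placeholder language X rendering of the example sentence.
target = ["X_the", "X_car", "t3", "t4", "t5", "t6", "t7", "t8",
          "X_in", "X_america"]
narrowed = narrow_range(target, 0, len(target),
                        ["X_the", "X_car"], ["X_in", "X_america"])
print(narrowed, target[narrowed[0]:narrowed[1]])
```

Only the six tokens between the two known translations remain to be parsed for recurring words and word strings.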
By noting the position of a word or word string in a document and recording, as described above, all words and word strings that fall within the range in the parallel-language document, the cross-language frequency association database creation method of the invention returns a set of words and/or word strings in the second-language document that may translate each word or word string under analysis in the first-language document. As the database creation method is applied, the set of words and/or word strings that qualify as possible translations narrows as the association frequencies develop. Thus, after a document pair has been examined, the invention will have created association frequencies between words and/or word strings of one language and words and/or word strings of the second language. After a number of document pairs have been examined according to the invention, the cross-language association database creation method will return higher and higher association frequencies for certain words and/or word strings. After a sufficiently large sample has been processed, the highest association frequencies indicate the possible translations. The final threshold at which an association frequency is deemed an accurate translation is, of course, defined by the user, and other interpretive translation methods can also be applied (such as those described in provisional patent application 60/276,107, filed March 16, 2001, entitled "Method and apparatus for content processing", which is incorporated herein by reference).
As indicated above, the invention examines not only words but also word strings. As mentioned, depending on user-defined parameters, word strings may include all punctuation and other marks. This is usually desirable if enough cross-language text exists to include punctuation as part of word strings. After analyzing the single words of the first language, the database creation method of the invention analyzes two-word strings, then three-word strings, and so on in incremental fashion. This makes it possible for a word or word string in one language to be translated as a shorter or longer word string (or word) in another language, as often happens. If a word or word string occurs only once in all available documents of the first language, the process immediately moves on to the next word or word string, where the analysis cycle begins again. The analysis stops when all words and word strings that occur more than once in the first language have been analyzed across all available parallel and comparable text.
After the ranges have been determined, all documents should be aggregated and treated as a single document when searching for recurring words and word strings. For a word or word string to count as non-recurring, it must occur only once across all available parallel and comparable text. Alternatively, as another embodiment, the range corresponding to every word and word string can be examined regardless of whether it occurs more than once in the available parallel and comparable text.
As another embodiment, the database can be built by parsing, on the fly, the specific words and word strings entered as part of a query, rather than being built in advance. When words and word strings requiring translation are entered, the invention can look for multiple occurrences of those words and word strings in cross-language documents that are stored in memory but have not yet been analyzed; it can locate cross-language text on the Internet using web crawlers, web search tools and other means; and, based on the analysis of the query and where sufficient cross-language data is not available, it can ultimately ask the user to supply the missing association. Building the knowledge base on the fly in this way represents "learning by doing": the system builds associations for words and word strings when they are needed, and also stores them in the database for future reference.
The invention thus operates in a manner that analyzes word strings, and in a manner that accounts for the context of word choice as well as grammatical idiosyncrasies such as phrasing, style or abbreviations.
Occurrences of a subset word or word string are returned as associations both independently and as part of a larger word string. In one embodiment of the invention, after the frequencies of recurring words and word strings in the cross-language text have been tabulated, the system accounts for occurrences of subset words or word strings that are also part of larger word strings. The invention accounts for these patterns by subtracting, from a word's or word string's frequency count, the number of times it was returned as part of a larger word string, as shown in Fig. 1. For example, proper names are sometimes given in full (as in "John Doe"), abbreviated to the first or last name ("John" or "Doe"), or abbreviated in some other way ("Mr. Doe"). The invention will most likely return more single-word returns than word-string returns (that is, more returns for the first or last name than for the full-name word string "John Doe"), because the words that make up a word string are necessarily counted individually as well as being counted as part of the phrase. A mechanism for changing the ranking should therefore be used. For example, in a given document the name "John Doe" might occur 100 times, while "John", by itself or as part of "John Doe", might occur 120 times, and "Doe", by itself or as part of "John Doe", might occur 110 times. Without adjustment, the association method would rank "John" higher than "Doe", and both higher than the word string "John Doe", when attempting to analyze the word string "John Doe". Subtracting the number of occurrences of the larger word string from the number of occurrences of the subset (or single return) gives the correct ranking (although, of course, other methods can be used to obtain similar results). Thus, subtracting 100 (the number of occurrences of "John Doe") from 120 (the number of occurrences of the word "John") gives an adjusted return of 20 for "John". This analysis produces an adjusted frequency of 100 for the word string "John Doe", 20 for the word "John" and 10 for the word "Doe", thereby creating the appropriate associations. When ranking the associations between the second language and the first language, the system subtracts the number of occurrences of the larger word-string association from the association frequencies of all of its subsets. This concept is reflected in Fig. 1.
In this embodiment, the counts for words and word strings that recur in the second-language range are adjusted by subtracting, from the frequency of each word or word string, the adjusted frequencies of all larger word strings of which it is a subset. Other user-defined methods can be used to adjust the final frequency counts of a word string's component words and word strings when the word string appears in the range.
For example, suppose a word string in a hypothetical language X means "very good year". If this word string is analyzed to build a translation association from language X to English using parallel text, and the word string "very good year" occurs 80 times in the English ranges, then the word strings "very good" and "good year" and the single words "very", "good" and "year" will each be counted by the system at least 80 times in those ranges, because they are parts of the three-word string. One embodiment of the system adjusts the frequency counts of words and word strings when they are part of a larger recurring word string, to prevent the counts from being distorted. The following example, based on a partial list of hypothetical frequency counts for words and word strings in the English document ranges, shows how the frequencies are adjusted for the language X word string under analysis:
Word or word string    Frequency count    Adjusted frequency count
Very good year         80                 80
Good year              130                50
Good                   158                23
Year                   140                10
Very good              85                 5
Very                   87                 2
These results are produced by adjusting each frequency count, subtracting the adjusted counts of all word strings that contain it. For instance, the adjusted count for the word "good" (23) is obtained by subtracting from its count (158) the adjusted counts of "very good year" (80), "good year" (50) and "very good" (5), which are the longer word strings recurring in the range of which it is a part: 158 - 80 - 50 - 5 = 23.
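The subtraction rule can be sketched in a few lines. This is a minimal illustration, assuming the raw counts have already been tabulated; the function names `contains` and `adjust_counts` are hypothetical, and the input figures are the hypothetical counts from the example above.

```python
def contains(longer, shorter):
    """True if the word sequence `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def adjust_counts(raw_counts):
    """Map each word string to its raw count minus the adjusted counts of
    every longer recurring word string that contains it."""
    adjusted = {}
    # Process the longest strings first, so that every containing string
    # has already been adjusted before its subsets are handled.
    for phrase in sorted(raw_counts, key=lambda p: len(p.split()), reverse=True):
        words = phrase.split()
        supersets = sum(
            adj for longer, adj in adjusted.items()
            if len(longer.split()) > len(words) and contains(longer.split(), words)
        )
        adjusted[phrase] = raw_counts[phrase] - supersets
    return adjusted


raw = {"very good year": 80, "good year": 130, "very good": 85,
       "good": 158, "year": 140, "very": 87}
adjusted = adjust_counts(raw)
```

Applied to the counts from the earlier proper-name example (100, 120 and 110), the same function gives 100 for "John Doe", 20 for "John" and 10 for "Doe".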
By counting the co-occurrence of recurring word strings of any size located in approximately the same relative regions of parallel text, the method of the invention provides a cross-concept database that can be used for document content processing and conversion. Fig. 1 shows an embodiment of a cross-concept frequency association database created by the invention using parallel text. This embodiment of the cross-concept database comprises a list of associated data fragments in the first and second columns. A data fragment is a symbol or group of characters that represents a particular concept in an expression system.
For example, when the expression system of a document is a human language that uses words, a fragment can be a word or a word string. Thus the system A fragments in column 1 are the data fragments (in the present invention, words or characters with semantic value) Da1, Da2, Da3 and Da4, which represent various concepts and combinations of concepts in a hypothetical expression system A. The system B fragments in column 2 are the data fragments Db1, Db2, Db3, Db4, Db5, Db6, Db7, Db9, Db10 and Db12, which represent various concepts (words or characters with semantic value) and certain combinations of those concepts in a hypothetical expression system B, ordered by their frequency of association with the data fragments of expression system A. Column 3 shows the direct frequency, which is the number of times the fragment or fragments of language B were associated with the listed fragment or fragments of language A. Column 4 shows the frequency after subtraction: the number of times the data fragment or fragments of language B were associated with the fragment or fragments of language A, minus the number of times they were associated as part of a longer fragment.
As shown in Fig. 1, a single fragment may be most appropriately associated with multiple fragments; for example, Da1 is associated with the three fragments Db1, Db3 and Db4 together. The higher the frequency after subtraction between data fragments, the higher the probability that the system A fragment is equivalent to the system B fragment. Besides measuring the adjusted frequency by total number of occurrences, it can also be measured by, for example, calculating the percentage of the time that a particular system A fragment corresponds to a particular system B fragment. When the database is used to translate a document, the highest-ranked associated fragment is retrieved from the database first. However, when the dual-anchor overlap method used to combine fragment translations shows that the higher-ranked association is incompatible with the context to its left or right, a different, lower-ranked association is used instead.
For example, if the database is queried for an association for Da1, it will return Db1+Db3+Db4. If the dual-anchor overlap process, which accurately combines data fragments for translation, rules out Db1+Db3+Db4, the database returns the next choice, Db9+Db10, which is then tested to see whether it overlaps accurately with the adjacent associated fragment or fragments and can be used in the translation.
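The retrieve-then-fall-back behaviour can be sketched as below. This is an assumption-laden illustration: the frequencies are invented rather than taken from Fig. 1, the names `ranked_associations` and `best_acceptable` are hypothetical, and the dual-anchor overlap test is replaced by a simple stand-in predicate.

```python
def ranked_associations(associations):
    """Sort candidate system B fragments by frequency after subtraction, highest first."""
    return sorted(associations, key=lambda item: item[1], reverse=True)


def best_acceptable(associations, is_compatible):
    """Return the highest-ranked candidate accepted by the compatibility check, or None."""
    for fragment, _freq in ranked_associations(associations):
        if is_compatible(fragment):
            return fragment
    return None


# Hypothetical frequencies after subtraction for fragment Da1 (not from Fig. 1).
da1 = [("Db9+Db10", 12), ("Db1+Db3+Db4", 25), ("Db2", 3)]

# Stand-in for the dual-anchor overlap test: it simply rejects one candidate.
rejected = {"Db1+Db3+Db4"}
choice = best_acceptable(da1, lambda fragment: fragment not in rejected)
```

With no rejection, the top-ranked Db1+Db3+Db4 is returned; once it is ruled out, the next candidate in rank order, Db9+Db10, is offered instead.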
In addition, the database can be instructed to ignore common words when counting association frequencies for single words; in English, for example, words such as "it", "an", "a", "of", "as" and "in" (known in the art as "stop words") can be disregarded. This keeps common words from skewing the analysis without additional subtraction calculations (reducing noise and unnecessary computation). Note that even if these or any other common words that are subsets of larger word strings are not "subtracted" from the association database, they will not ultimately be accepted as a translation unless appropriate, because the dual-anchor overlap process (described in more detail below) will not accept them.
It should be noted that stop words are still normally included in the analysis of the word strings that contain them. For example, although the system can be instructed to ignore occurrences of words such as "a" and "is" in the range when determining single-word frequencies, it would not normally ignore the words "a" and "is" when they form part of a recurring word string such as "she is a good student".
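A minimal sketch of this stop-word rule follows. The stop-word list and the function name `filter_candidates` are illustrative assumptions; the only behaviour taken from the text is that stop words are dropped as single-word candidates but kept inside longer word strings.

```python
# Illustrative stop-word list only; a real list would be user-defined.
STOP_WORDS = {"it", "an", "a", "of", "as", "in", "is"}


def filter_candidates(candidates):
    """Drop single-word candidates that are stop words; keep every multi-word
    string, even when it contains stop words."""
    kept = []
    for phrase in candidates:
        words = phrase.lower().split()
        if len(words) == 1 and words[0] in STOP_WORDS:
            continue
        kept.append(phrase)
    return kept


candidates = ["a", "is", "student", "she is a good student", "good student"]
kept = filter_candidates(candidates)
```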
Other calculations can be made to adjust the association frequencies so that the number of co-occurrences of words and word strings is reflected accurately. For example, when the ranges of the words under analysis overlap, an adjustment can be made to avoid double counting, as described below. Such adjustments are desirable in these cases to obtain more accurate association frequencies.
The following example uses the two documents shown in Table 1 to illustrate an embodiment of the method and apparatus of the invention for creating and supplementing a cross-concept frequency association database:
Table 1
Document A (language A)    Document B (language B)
X Y Z X W V Y Z X Z        AA BB CC AA EE FF GG CC
Although this example concerns recurring words and word strings in very short parallel texts, this is for illustration only. In the invention, all available parallel and comparable text is aggregated and analyzed for recurring words and word strings. As noted above, when multiple texts are combined, the ranges are first determined by examining each document pair, and the recurring words and word strings within the ranges are then counted across all of the aggregated documents.
Using the parallel documents listed above (Document A in the first, or source, language and Document B in the second, or target, language), the following steps of the database creation method are carried out.
Step 1. First, determine the size and position of the range. As noted, the size and position may be user-defined or may be approximated by various methods, including but not limited to comparing the word counts of the source and target documents, looking for known vocabulary anchors, looking for corresponding sentence boundaries, or any other method. In this example the word counts are used; because the two documents are approximately equal in length (10 words in Document A, 8 words in Document B), the range midpoint is placed at the same position as the word or word string in Document A. (Note: because the word-count ratio between the two documents is 80%, the range position could alternatively be determined by multiplying by the fraction 4/5.) In this example a variable range size is used to approximate a bell curve: the range is (+/-) 1 word at the beginning and end of the document and (+/-) 2 words in the middle. As noted, however, the range size and position (or the method used to determine them) are entirely user-defined, and the range would probably be much larger than shown here (the ranges here were chosen simply to illustrate the concepts), so as to increase the likelihood that the translation of the source-language word or word string falls within the target-language range of the parallel text.
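The range placement in Step 1 can be sketched as follows. The exact cut-off between "beginning/end" and "middle" is not stated in the text, so the rule used here (radius 1 for the first two and last two source positions, radius 2 elsewhere) is an assumption chosen to reproduce the ranges used in Steps 3 to 15; the names `radius` and `target_range` are hypothetical.

```python
def radius(position, source_len):
    """Assumed bell-like schedule: +/-1 for the first two and last two source
    positions, +/-2 everywhere else."""
    return 1 if position <= 2 or position > source_len - 2 else 2


def target_range(position, source_len, target_len, scale=False):
    """Return the 1-based target positions covered by the range for a given
    1-based source position, truncated to the target document."""
    # With scale=True the midpoint is scaled by the word-count ratio
    # (4/5 for the Table 1 documents); otherwise the same position is used.
    midpoint = round(position * target_len / source_len) if scale else position
    r = radius(position, source_len)
    low = max(1, midpoint - r)
    high = min(target_len, midpoint + r)
    return list(range(low, high + 1))


# Document A has 10 words, Document B has 8 words.
range_x1 = target_range(1, 10, 8)   # first occurrence of X
range_x2 = target_range(4, 10, 8)   # second occurrence of X
range_x3 = target_range(9, 10, 8)   # third occurrence of X
```

A range that falls entirely beyond the end of the target document comes back empty, which corresponds to the case of an occurrence with no returns.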
Step 2. Next, the first word in Document A is examined, and Document A is checked to determine the number of times that word occurs in the document. In this example the first word in Document A is X, which occurs three times in Document A, at positions 1, 4 and 9. The position number of a word or word string is simply its position in the document relative to the other words. Position numbers therefore correspond to the word numbering of the document and ignore punctuation. For example, if a document has 10 words and the word "king" occurs twice, the position numbers of "king" are the positions (out of the 10 words) at which the word occurs.
Because word X occurs more than once in the document, the process moves on to the next step. Had word X occurred only once, it would be skipped and the process would move on to the next word, continuing the creation process.
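Finding position numbers and deciding whether a word or word string recurs can be sketched as below. The helper names `tokenize` and `positions` are hypothetical, and the tokenizer makes the simplifying assumption that punctuation appears as separate whitespace-delimited tokens.

```python
import re


def tokenize(text):
    """Split on whitespace and drop tokens that are pure punctuation,
    so that position numbers count words only."""
    return [t for t in text.split() if re.search(r"\w", t)]


def positions(tokens, phrase):
    """1-based start positions of every occurrence of `phrase` in `tokens`."""
    n = len(phrase)
    return [i + 1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


doc_a = tokenize("X Y Z X W V Y Z X Z")
x_positions = positions(doc_a, ["X"])
yz_positions = positions(doc_a, ["Y", "Z"])
xy_recurs = len(positions(doc_a, ["X", "Y"])) > 1
```

A word or word string is analyzed further only when more than one position is found; otherwise the process moves on, as described above.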
Step 3. Identify the possible target-language translations for the source-language word X at position 1. Applying the range to Document B yields the words at positions 1 and 2 (1 +/- 1) of Document B: AA and BB. All possible combinations are returned as potential translations or relevant associations for X: AA, BB and AA BB (as a word-string combination). Thus X1 (the first occurrence of word X) returns AA, BB and AA BB as associations.
Step 4. Analyze the next position of word X. This occurrence (X2) is at position 4. Because position 4 is near the center of the document, the range (determined as described above) is two words on either side of position 4. Possible associations are returned by looking at word 4 in Document B and applying the range (+/-) 2; that is, the two words before word 4 and the two words after word 4 are returned along with it. The words at positions 2, 3, 4, 5 and 6 are therefore returned. These positions correspond to the words BB, CC, AA, EE and FF in Document B. All forward contiguous permutations of these words (and the word strings they form) are considered. X2 therefore returns BB, CC, AA, EE, FF, BB CC, BB CC AA, BB CC AA EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE, AA EE FF and EE FF as possible associations.
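The forward contiguous permutations of a range are simply all contiguous word strings inside it. A minimal sketch, with the hypothetical helper name `forward_strings`, that also records where each string starts so that later steps can tell identical strings at different positions apart:

```python
def forward_strings(tokens, low, high):
    """All contiguous word strings inside 1-based positions low..high,
    returned as (start_position, string) pairs."""
    out = []
    for start in range(low, high + 1):
        for end in range(start, high + 1):
            out.append((start, " ".join(tokens[start - 1:end])))
    return out


doc_b = "AA BB CC AA EE FF GG CC".split()
x1 = forward_strings(doc_b, 1, 2)   # range for X at position 1
x2 = forward_strings(doc_b, 2, 6)   # range for X at position 4
```

A range of n words yields n(n+1)/2 candidate strings, so the five-word range for X2 produces the 15 returns listed in Step 4.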
Step 5. Compare the returns for the first occurrence of X (X1, position 1) with the returns for the second occurrence of X (X2, position 4) and determine the matches. Note that a return that appears in two overlapping ranges and consists of the very same word or word string should be reduced to a single occurrence. In this example, the word at position 2 is BB; it is returned both for the first occurrence of X and for the second occurrence of X when the range is applied to each. Because the same word position is returned for both X1 and X2, the word is counted as one occurrence. If, however, the same word is returned in overlapping ranges but from two different word positions, the word is counted twice and the association frequency is recorded. In this case the return for word X is AA, because that word appears in the association returns for both X1 and X2 (at positions 1 and 4 respectively). Note that the other word appearing in both sets of association returns is BB. As explained above, however, because this word comes from the same position (and is therefore the same word) in the ranges applied to the first and second occurrences of X, it can be disregarded (that is, it is treated as a single occurrence within these ranges).
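The matching rule of Step 5 can be sketched by comparing returns as (position, string) pairs. This is a simplified illustration: the names `forward_strings` and `matches` are hypothetical, and a match is recorded only when the same string is returned from two different target positions.

```python
def forward_strings(tokens, low, high):
    """All contiguous word strings inside 1-based positions low..high,
    returned as (start_position, string) pairs."""
    return [(start, " ".join(tokens[start - 1:end]))
            for start in range(low, high + 1)
            for end in range(start, high + 1)]


def matches(returns_a, returns_b):
    """Strings returned for both occurrences, ignoring pairs that are the very
    same words at the same target position (overlapping ranges)."""
    found = set()
    for start_a, text_a in returns_a:
        for start_b, text_b in returns_b:
            if text_a == text_b and start_a != start_b:
                found.add(text_a)
    return found


doc_b = "AA BB CC AA EE FF GG CC".split()
x_matches = matches(forward_strings(doc_b, 1, 2), forward_strings(doc_b, 2, 6))
y_matches = matches(forward_strings(doc_b, 1, 3), forward_strings(doc_b, 5, 8))
```

BB is returned for both occurrences of X but from the same position 2, so it is not recorded; AA comes from positions 1 and 4 and is.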
Step 6. Analyze the next position of word X (X3), at position 9. Applying the range of (+/-) 1 (near the end of the document) would return the associations at positions 8, 9 and 10 of Document B. Because Document B has only 8 positions, the result is truncated, and only the word at position 8 is returned as a possible value for X: CC. (Note: alternatively, user-defined parameters could require a minimum of two characters as part of the analysis, in which case position 8 and the next closest position, GG at position 7, would be returned.)
Comparing the returns for X3 with the returns for X1 shows no match, and therefore no association.
Step 7. The next position of word X would now be analyzed; however, there are no more occurrences of X in Document A. At this point, the association frequency between word X of language A and word AA of language B is established as one (1).
Step 8. Because there are no more occurrences of word X, the process increments by one word and a word string is examined. In this case the word string examined is "X Y", the first two words of Document A. The same method described in Steps 2-7 is applied to this phrase.
Step 9. Examining Document A shows that the word string X Y occurs only once. The incrementing process stops at this point, and no database creation takes place. Because the end point has been reached, the next word is examined (this happens whenever a word string has no match); in this case, the word at position 2 of Document A is "Y".
Step 10. Applying the process of Steps 2-7 to the word "Y" gives the following results:
Word Y occurs twice (positions 2 and 7), so the database creation process continues (again, if Y had occurred only once in Document A, Y would not be examined).
The size of the range at position 2 is (+/-) 1 word.
Applying this range to Document B (at position 2, where word Y first occurs) returns the results at positions 1, 2 and 3 of Document B.
The corresponding foreign-language words at those positions are AA, BB and CC.
Considering forward permutations only, Y1 yields the following possibilities: AA, BB, CC, AA BB, AA BB CC and BB CC.
The next position of Y (position 7) is analyzed.
The size of the range at position 7 is (+/-) 2 words.
Applying this range to Document B (at position 7) returns the results at positions 5, 6, 7 and 8: EE, FF, GG and CC.
All forward permutations yield the following possibilities for Y2: EE, FF, GG, CC, EE FF, EE FF GG, EE FF GG CC, FF GG, FF GG CC and GG CC.
Matching these against the results for Y1 returns CC as the only match.
Combining the matches for Y1 and Y2 yields CC as the association for Y.
Step 11. End-of-range incrementing. Because the only possible match for word Y (the word CC) occurs at the end of the range for the first occurrence of Y (CC is at position 3 of Document B), the range for the first occurrence is incremented by 1, returning positions 1, 2, 3 and 4: AA, BB, CC and AA, or the following forward permutations: AA, BB, CC, AA BB, AA BB CC, AA BB CC AA, BB CC, BB CC AA and CC AA. With this result, CC remains the only possible translation of Y. The range is incremented because the match returned lies at the end of the range for the first occurrence (the base occurrence of the word "Y"). Whenever this pattern occurs, end-of-range incrementing is carried out as a sub-step (or alternative step) to make sure the concept is not truncated.
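End-of-range incrementing can be sketched as a small check applied after a match is found. The function name `extend_if_at_end` and its parameters are hypothetical; the sketch widens the range by one word only when the match ends on the last position of the range and the document has a further word.

```python
def extend_if_at_end(tokens, low, high, match_start, match_len=1):
    """If the matched string ends on the last position of the 1-based range
    low..high, widen the range by one word when the document allows it."""
    match_end = match_start + match_len - 1
    if match_end == high and high < len(tokens):
        high += 1
    return low, high


doc_b = "AA BB CC AA EE FF GG CC".split()
low, high = extend_if_at_end(doc_b, 1, 3, match_start=3)  # CC sits at position 3
extended = doc_b[low - 1:high]
```

The widened range is then re-enumerated and re-matched, which guards against a translation that straddles the original range boundary being cut short.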
Step 12. Because there are no more occurrences of "Y" in Document A, the analysis increments by one word in Document A and the word string "Y Z" is examined (Z being the word that follows Y). Incrementing to the next string (Y Z) and repeating the process gives the following results:
The word string Y Z occurs twice in Document A, at positions 2 and 7. The possibilities for the first occurrence of Y Z (Y Z1) are AA, BB, CC, AA BB, AA BB CC and BB CC (alternatively, the range parameters could be defined so that the range size expands as the language A word string under analysis gets longer).
The possibilities for the second occurrence of Y Z (Y Z2) are EE, FF, GG, CC, EE FF, EE FF GG, EE FF GG CC, FF GG, FF GG CC and GG CC.
Matching the results gives CC as the possible association for the word string Y Z.
Expanding the range (end-of-range incrementing) yields the following results for Y Z: AA, BB, CC, AA, AA BB, AA BB CC, AA BB CC AA, BB CC, BB CC AA and CC AA.
With these results, CC remains the association for the word string Y Z.
Step 13. Because there are no more occurrences of "Y Z" in Document A, the analysis increments by one word in Document A and the word string "Y Z X" is examined (adding the word that follows Z, at position 3 of Document A). Incrementing to the next word string (Y Z X) and repeating the process (Y Z X occurs twice in Document A) gives the following results:
The range for the first occurrence of Y Z X comprises positions 1, 2, 3, 4 and 5;
The permutations are AA, BB, CC, AA, EE, AA BB, AA BB CC, AA BB CC AA, AA BB CC AA EE, BB CC, BB CC AA, BB CC AA EE, CC AA, CC AA EE and AA EE;
The range for the second occurrence of Y Z X comprises positions 5, 6, 7 and 8;
The permutations are EE, FF, GG, CC, EE FF, EE FF GG, EE FF GG CC, FF GG, FF GG CC and GG CC.
Comparing the two gives CC as the association for the word string Y Z X. Again, EE is discarded as an association return because it appears in both cases as the same word (that is, at the same position).
Step 14. Incrementing to the next word string (Y Z X W) finds only one occurrence, so database creation for this word string ends and the next word is examined: Z (position 3 of Document A).
Step 15. Applying the steps described above to Z, which occurs three times in Document A, gives the following results:
The returns for Z1 are: AA, BB, CC, AA, EE, AA BB, AA BB CC, AA BB CC AA, AA BB CC AA EE, BB CC, BB CC AA, BB CC AA EE, CC AA, CC AA EE and AA EE.
The returns for Z2 are: FF, GG, CC, FF GG, FF GG CC and GG CC.
Comparing Z1 and Z2 gives CC as the association for Z.
Z3 (position 10) has no returns within the range as defined. However, if a parameter were added requiring at least one return for every word or word string of language A, the return for Z3 would be CC.
Comparing the returns for Z3 and Z1 gives CC as the association for word Z. However, this association is not counted, because the CC at word position 8 was already counted in the association for Z2 above. When overlapping ranges would cause the process to count an occurrence twice, the system can reduce the association frequency to reflect the true number of occurrences more accurately.
Step 16. Incrementing to the next word string gives Z X, which occurs twice in Document A. Applying the steps described above to Z X gives the following results:
The returns for Z X1 are: AA, BB, CC, AA, EE, FF, AA BB, AA BB CC, AA BB CC AA, AA BB CC AA EE, AA BB CC AA EE FF, BB CC, BB CC AA, BB CC AA EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE, AA EE FF and EE FF.
The returns for Z X2 are: FF, GG, CC, FF GG, FF GG CC and GG CC.
Comparing these returns gives an association between the word string Z X and CC.
Step 17. Incrementing again, the next phrase is Z X W. It occurs only once, so the next word in Document A (X) is examined.
Step 18. Word X has already been examined at its first position. However, the second position of word X has not yet been examined, relative to the other document, for possible returns for word X. Word X (at its second position) is therefore now processed in the same way as the first occurrence of word X, moving forward through the document:
The returns for X at position 4 are: BB, CC, AA, EE, FF, BB CC, BB CC AA, BB CC AA EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE, AA EE FF and EE FF.
The return for X at position 9 is: CC.
Comparing the results for position 9 with those for position 4 gives CC as a possible match for word X, and an association frequency is assigned to it.
Step 19. Increment to the next word string (searching forward in the document, there are no further occurrences of X to compare with the second occurrence of X), which gives the word string X W. This word string does not occur more than once in Document A, so the process moves on to examine the next word (W). The word "W" occurs only once in Document A, so the process increments, not to the next word string, since "W" occurs only once, but to the next word in Document A, "V". "V" occurs only once in Document A, so the next word (Y) is examined. The word "Y" does not occur at any position after position 7 in Document A, so the next word (Z) is examined. After position 8, the word "Z" occurs again at position 10.
Step 20. Applying the above process to the second occurrence of word Z gives the following results:
The returns for Z at position 8 are: GG, CC and GG CC.
The return for Z at position 10 is: CC.
Comparing the results for position 10 with those for position 8 yields no association for word Z.
Again, the word CC is returned as a possible association. However, because CC represents the same word position reached from the analysis of Z at position 8 and of Z at position 10, the association is ignored (that is, it is treated as a single occurrence within these ranges).
Step 21. Incrementing by one word gives the word string Z X. This word string does not occur at any further (forward) position in Document A, so the process restarts at the next word in Document A, "X". Word X does not occur at any further (forward) position in Document A, so the process restarts again. However, the end of Document A has been reached, so the analysis terminates.
Step 22. The final association frequencies are tabulated by combining all of the results above and, as described earlier, subtracting duplicates and the occurrences of substrings that appear as part of larger strings (as reflected in Fig. 1).
Obviously, this amount of data is insufficient to return conclusive results for the words and word strings in Document A. As more document pairs are examined that contain the words and word strings with the associations examined above, the association frequencies will grow, building stronger associations for translating words and word strings between language A and language B. Although a typical user-defined range would be far larger than three words, to ensure that the translation is included in it, the range calculations above still illustrate the concept.
To further strengthen the associations built using parallel text and the above process, the process can also be run in the opposite direction. The system can take the target-language word-string translation candidates that occur most frequently in the target-language range and use the available parallel text to build associations from those target-language words and word strings back to source-language word strings. If the source-language word or word string that originally produced a target-language translation candidate ranks high enough (based on a user-defined frequency or percentage) on that target-language candidate's own list, the target-language candidate can be confirmed as a legitimate translation of the source-language item (word or word string). This is called the "two-way locking mechanism" of the invention. Ultimately, association databases can be built in both directions using the parallel text for each language pair.
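The two-way check can be sketched as below. The reverse-direction counts are invented for illustration, and the function name `confirm_translation` and the `top_n` threshold are hypothetical stand-ins for the user-defined frequency or percentage cut-off.

```python
def confirm_translation(source, target, reverse_candidates, top_n=1):
    """Confirm `target` as a translation of `source` if `source` ranks within
    the top_n candidates when the analysis is run from target to source."""
    ranked = sorted(reverse_candidates.get(target, {}).items(),
                    key=lambda item: item[1], reverse=True)
    top = [candidate for candidate, _count in ranked[:top_n]]
    return source in top


# Hypothetical reverse-direction counts: for each language B string, the
# frequency of language A strings found in its ranges.
reverse = {
    "CC": {"Y Z": 7, "Y": 4, "W": 1},
    "GG": {"V": 5, "Y Z": 2},
}

locked = confirm_translation("Y Z", "CC", reverse)
not_locked = confirm_translation("Y Z", "GG", reverse)
```

Loosening the threshold (for example `top_n=2`) admits more candidates, at the cost of weaker confirmation.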
In other embodiments of cross-language association using parallel text, a range in the target language is selected for each recurring source-language word or word string under analysis, the corresponding target-language ranges being determined by the method described above. All recurring words and word strings in those ranges are then added together to obtain their frequency counts. The frequency counts of larger word strings are subtracted from the frequencies of the words and word strings in the range, to avoid counting the smaller parts of larger word strings, as described and shown above in Fig. 1. Compared with the embodiment above, which associates the words and word strings of each range independently of all other ranges, this gives the most frequent word strings less weight. This embodiment therefore usually needs more documents to build reliable translations.
For example, suppose the word string "ll mm pp" of Language X is being analyzed in order to find its association in a parallel document in Language Y. If "ll mm pp" occurs four times in the Language X document, four ranges of Language Y words and word strings are determined in the Language Y document, each corresponding to one occurrence of "ll mm pp" in the parallel document. If a correct translation in Language Y is "KK BB ZZ" and it appears in all four ranges, this embodiment produces a frequency count of 4. The earlier embodiment (which analyzes each range independently of all the others) would produce a frequency count of 6 for "KK BB ZZ". Once the ranges are determined, various user-defined methods can be used to tabulate the frequencies of recurring words and word strings; depending on the tabulation method, individual results can be given higher or lower relative weight. The methods above illustrate two preferred embodiments of tabulation.
These languages can be any type of conversion and are not necessarily limited to spoken or written languages. For example, a conversion can involve computer languages or specific data codes such as ASCII. The database is dynamic: it grows as content is input into the translation system, and the translation system uses previously input content in later iterations.
As shown, this embodiment represents one method of the present invention for creating associations. The methods of the invention are not limited to language translation. In a broad sense, they can be applied to any two associable expressions of the same idea; foreign-language translation is, in essence, just the paired association of the same idea represented by different words or word strings. The invention can therefore be applied to associating data, sound, music, video, computer programming languages, or any of a wide range of representations of ideas, including ideas embodied through any sense (hearing, sight, smell and so on). All the invention requires is an analysis of two related concrete forms of the same idea through their co-occurrence in time (or, in the case of documents, their co-occurrence in position).
For words or word strings that cannot be translated using cross-language documents, another embodiment of the invention (described later) can generate words and word strings that are semantically equivalent to a word or word string in the target or source language, providing a further way to identify alternative word or word-string translations. This method also allows the exchange of specific members of broad classes (such as names and numbers) that share the same context and can sometimes have an unlimited number of members.
In addition, if the available cross-language documents cannot provide statistically significant translation results, user-defined parameters can bring in the other cross-language word-string association methods of the invention, in place of or in combination with the parallel-text method. As a last resort, the user can also review the translation candidates, together with their rankings, that did not meet the user-defined confirmation threshold, and manually confirm and rank the appropriate choices.
B. Acquisition using multi-state text
Another embodiment of the invention provides a method of building associations between equivalent or similar ideas in two languages or states by using the association that each of those two states has with a further, third state. As more documents in more language pairs are examined, the method and apparatus of the invention can begin to fill in "derived associations" between language pairs, based on languages that each have an association with a third language but no direct association with each other. This kind of indirect translation through various states is called "multilingual lever".
A derived association obtained through the multilingual lever method can be produced between a pair of languages when the source-language word string being translated has known translations in one or more third languages, and those third-language translations in turn have known translations in the target language. For example, if there is not enough cross-language text to translate the Language A phrase "aa dd pz" directly into Language B, deriving the association can include comparing that phrase with its translations in Languages C, D, E and F, as shown in Table 2. The translations of "aa dd pz" in Languages C, D, E and F can then be translated into Language B, as shown in Table 3. Deriving the association between the Language A phrase "aa dd pz" and a Language B phrase further includes comparing the Language B phrases obtained from the Language C, D, E and F translations of "aa dd pz". Some of those Language B phrases may be identical, and in this preferred embodiment of the invention that agreement indicates the correct Language B translation of the Language A phrase "aa dd pz". As shown in Table 3, the translations from Languages C, D and F into Language B produce the same Language B phrase, which gives the correct Language B translation, "UyTByM". A derived association can therefore be created between the Language A phrase and its Language B translation. The translation from Language E into Language B produces a different Language B phrase, "ZnVPiO". This indicates either that the Language A phrase "aa dd pz" or the Language E phrase "153" has more than one meaning, or that the Language B phrases "UyTByM" and "ZnVPiO" are semantically equivalent (or nearly so), which would be confirmed when another language translates indirectly to "ZnVPiO" or when some other method produces that translation.
Table 2
Language A    Language C    Language D    Language E    Language F
aa dd pz      A1d           Zyp           153           1AAAA))$

Table 3
Language      Translation of Language A "aa dd pz"    Translation into Language B
Language C    A1d                                     UyTByM
Language D    Zyp                                     UyTByM
Language E    153                                     ZnVPiO
Language F    1AAAA))$                                UyTByM
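The agreement check illustrated by Tables 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name derive_association and the min_agreement threshold are hypothetical stand-ins for the user-defined number of common results, and the two dictionaries simply restate the table contents.

```python
from collections import Counter

# Table 2: translations of the Language A phrase "aa dd pz" into third languages.
via_third_language = {"C": "A1d", "D": "Zyp", "E": "153", "F": "1AAAA))$"}

# Table 3: known translations of those third-language phrases into Language B.
to_language_b = {"A1d": "UyTByM", "Zyp": "UyTByM", "153": "ZnVPiO", "1AAAA))$": "UyTByM"}

def derive_association(pivots, to_target, min_agreement):
    """Return (confirmed phrase or None, vote counts) for the indirect translations.

    min_agreement is a hypothetical name for the user-defined number of
    third languages that must yield the same target-language phrase.
    """
    votes = Counter(to_target[p] for p in pivots.values() if p in to_target)
    if not votes:
        return None, votes
    phrase, count = votes.most_common(1)[0]
    return (phrase if count >= min_agreement else None), votes

best, votes = derive_association(via_third_language, to_language_b, min_agreement=3)
print(best, dict(votes))
```

With a threshold of three, the phrase reached through Languages C, D and F is confirmed, while the single result reached through Language E is left as an unconfirmed alternative.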
In another embodiment, the multilingual lever method and apparatus of the invention can be used to improve the accuracy of prior-art translation systems. When an existing translation system (for example, rule-based MT or SMT) takes a query and produces a result translated from Language A into Language B, that result can be compared with the results of translating the query from Language A into other languages (such as Languages C, D, E and F) using prior-art systems and apparatus, and then translating from those languages into Language B, again using prior-art systems and apparatus.
To confirm a translation, an embodiment of multilingual lever that uses existing machine translation systems can require each target-language word string (translated indirectly through some third language) to appear among a user-defined number of common results in the target language, as described above. Requiring a user-defined number of indirectly translated target-language word strings (obtained through intermediate third languages using existing translation systems) to match one another exactly before confirmation improves the accuracy of each translated word string. Although prior-art translation systems are not highly accurate, a sufficient number of common target-language results can be obtained through different intermediate third languages if enough third-language translation systems are used. Moreover, the accuracy of this embodiment can be further checked and improved by connecting these indirect target-language translations under the dual-anchor overlap aspect of the invention (described in detail later) with a high user-defined required overlap.
By combining the word-string translations learned across languages and stored in the database of the invention with prior-art translation systems, another embodiment of the multilingual lever method can use translations from the source language into intermediate third languages and from those third languages into the target language. The same basic principle is used to confirm the target-language translation: a user-defined number of common indirect target-language translation results must be obtained through different third languages.
The required number of common target-language results and the number of intermediate languages used for multilingual lever are user-defined. The more indirect translations through other languages are used to verify the translation of a word string or any other data segment, the more certain it is that the invention produces an accurate translation. As a final confirmation check, based on user-defined criteria, the target-language translation result can be translated back into the source language through one or more third languages using the same method described above. If the translation back into the source language is the original source-language word string, or is semantically equivalent to it (as determined by the common frequency analysis of the invention described later), the target-language translation can be confirmed.
C. Acquisition using target document mighty torrent
Another aspect of the present invention is used the single language corpus and/or the parallel text of target language, and any or multiple in the following method, make up the association between the word strings of different language: machine translation system of the prior art, the language dictionaries of striding of the prior art, and/or customization stride language dictionaries.These methods are used " mighty torrent " of the present invention technology, use the system or the system of the prior art of customization, generate the possible target language translation of the word in each word strings that from the source language inquiry, parses, (even some possible word translation is wrong) as described above, ferret out Language Document then, the candidate that the various combination (the target language translation of source language word can be a word or expression) of searching possible word translation produces the target language word string translates tabulation.
In another embodiment that uses the mighty torrent technique, the dictionary includes source-language collocations and idioms made up of two or more words. In this embodiment, each source-language query word string is first examined to identify any idiom or collocation that forms part or all of it. If an idiom or collocation is identified in the query, its translation is retrieved from the dictionary and used as part of the mighty torrent processing to search the target-language corpus, instead of the translations of the individual words that make up the idiom or collocation. Any other source-language word string can likewise be added to the dictionary and translated into the target language as a unit during mighty torrent processing, rather than word by word.
1. Parallel text mighty torrent
In one embodiment, parallel text and a prior-art translation system (or cross-language dictionary) are used together. To build the target-language associations of a source-language word string, each occurrence of the word string is located in the source document and the corresponding range is determined in the target document of the parallel text. The target-language range is determined in the same way as when parallel text is used to build cross-language associations, as described above. A prior-art machine translation system, a prior-art dictionary or a custom dictionary generates a translation of the source-language query word string (or several translations, if several systems are used). These translations are then used to search the ranges in the target document (even though some of the translations may be wrong) and to identify words and word strings as translation candidates. If any of the identified word or word-string translation candidates occurs a user-defined number or percentage of times in the ranges processed by mighty torrent, the association can be confirmed as a translation. If a cross-language dictionary is used rather than a prior-art MT engine, each word in the source-language word string is translated using all of its known possible translations (the target-language translation of a source-language word may be a word or a phrase, as described above), and the different combinations of word translations are identified in the target-language side of the parallel text using the target-language mighty torrent method described in the next section. In addition, the source-language query word string can be searched for idioms or collocations (using the source-language entries for idioms and collocations in the cross-language dictionary); if the query word string contains an idiom and/or collocation, its translation can be used together with the word-for-word (and/or word-to-phrase) translations to carry out mighty torrent processing of the target-language corpus, as described herein.
2. Target language mighty torrent
In another method and embodiment of the mighty torrent approach, a word string can be translated from the source language into the target language by translating each word in the word string with a cross-language dictionary (or a prior-art translation system) and using a target-language corpus to find those groups of translated words among all the available target-language word strings. This method does not rely on parallel text and needs only a large target-language corpus (for example, a document database or the World Wide Web). Because it needs only a corpus of target-language documents, and not corresponding translations of those documents in another language, it expands the opportunities for the invention to identify cross-language word-string associations. As with all methods of identifying word-string translations in the invention, the word strings to be translated can be generated for immediate translation analysis by parsing the source document into word strings of a user-defined size (that is, number of words per string) with a user-defined minimum number of overlapping words (as described below), or word strings can be examined so that they can be added to the translation knowledge base.
With the target-language mighty torrent method, each word in the word string (the source-language query word string) is first translated into the target language word for word (and/or word to phrase) using a cross-language dictionary (or another prior-art translation system). A dictionary usually offers several options or candidates, and all of the target-language translation candidates that the dictionary provides for each word of the word string under analysis are identified. The dictionary may also contain translations of a source-language word into a target-language word string (that is, a phrase). In that case, the target-language word string can be treated as a single unit when searching the target-language corpus. The dictionary may also contain translations of common source-language idioms and collocations. The source-language query word string can be searched for idioms or collocations, and if it contains any, their translations can also be used for mighty torrent processing of the target-language corpus, as described herein. Mighty torrent processing of the target-language corpus with idioms and/or collocations can be carried out before, or at the same time as, the mighty torrent processing that uses the translation candidates generated word for word (and/or word to phrase). In addition, if the invention is used with a source language in which a single word can be formed by combining words in a particular way, the system can be adapted to parse words of that type into two or more separate components that are translated into two or more separate target-language words.
For example, in Hebrew, the letter meaning "and" (the Hebrew letter "vuv") is attached to the front of the word it refers to, rather than standing as a separate word meaning "and". In this case, the invention can parse the leading "vuv" from the rest of the word and generate a translation for "and" as well as a translation for the rest of the Hebrew word to which the "vuv" was attached. In addition, if a prior-art translation system is used to translate the words into the target language individually, such systems usually produce two or more target-language words for those combined words in the source language. The rules of different languages, including word combinations, inflections and other changes to root words caused by tense, singular, plural and so on, can be applied to expand the dictionary words used, so that the search of the target-language corpus finds the intended semantic units accurately.
Next, after independent target-language word translations have been generated for each word (or idiom or collocation) in the source-language query word string, the system searches the target-language corpus for word strings that have a user-defined maximum length and contain a user-defined minimum number (or percentage) of the translation candidates generated for the words of the source-language query word string (along with any other user-defined search conditions). For the purpose of meeting the user-defined search requirement, no more than one candidate generated for each source-language word is counted in a target-language word string. A target-language word string of the user-defined maximum length qualifies if it contains any combination, in any order, of at least the user-defined minimum number of candidates generated from different source-language words.
Qualifying word strings are returned on what is called the "query string mighty torrent list". User-defined requirements can also set parameters for this list based on the proximity of source-language words and of their target-language counterparts. For example, a user-defined parameter can require the target-language translation of a source-language word to appear within a user-defined number of words of the target-language translation of an adjacent source-language word. Candidates can be retrieved using other user-defined search parameters, including the relationship between the distances separating words in the source-language word string and the distances separating their corresponding translations in the candidate target-language word string. Any user-defined parameter can also include these and/or other factors in ranking the target-language translation candidates. These selection and ranking settings are based on the relationship between the structures of the two languages and differ from one language pair to another.
To illustrate the mighty torrent method using only a target-language corpus, consider a four-word string to be translated from Language X:
“aa bb cc dd”
The system can translate each word in the string into the target language, Language Y. Suppose that, in the cross-language dictionary, each word in the Language X word string above has the following definitions in Language Y:
Word in Language X    Translations in Language Y
aa                    AA1, AA2, AA3, AA4, AA5, AA6
bb                    BB1, BB2, BB3
cc                    CC1, CC2, CC3, CC4
dd                    DD1, DD2, DD3, DD4, DD5
The system can then search the corpus of target-language documents to locate a user-defined minimum number of these word translations within a user-defined range (counting at most one candidate for any particular source-language word toward the minimum). In this example, suppose the parameters require that at least three translated words (counting only one translation of any source-language word) appear in a string of six or fewer words in total, regardless of word position or the order in which they occur. For this example, a partial list of qualifying word strings that might appear in a hypothetical target-language corpus could be:
Query string mighty torrent list (partial)
1. DD1 AA2 CC2 BB3
2. AA1 BB1 CC3 EE1
3. BB2 FF1 KK1 AA2 LL3 DD5
4. DD4 PP1 UU1 AA6 CC4 BB2
5. CC1 KK1 RR2 BB3 DD4
6. BB1 CC3 EE1 DD4
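The qualification rule applied to these returns (at least three source words translated, at most six words in the string, one candidate counted per source word) can be sketched as follows. This is a minimal sketch, not the patented implementation: the names DICTIONARY, covered_source_words and qualifies are hypothetical, and the thresholds are the example's user-defined settings.

```python
# Cross-language dictionary from the example (Language X word -> Language Y candidates).
DICTIONARY = {
    "aa": {"AA1", "AA2", "AA3", "AA4", "AA5", "AA6"},
    "bb": {"BB1", "BB2", "BB3"},
    "cc": {"CC1", "CC2", "CC3", "CC4"},
    "dd": {"DD1", "DD2", "DD3", "DD4", "DD5"},
}

def covered_source_words(target_string):
    """Source words with at least one candidate in the string (each counted once)."""
    tokens = set(target_string.split())
    return {src for src, candidates in DICTIONARY.items() if tokens & candidates}

def qualifies(target_string, min_translated=3, max_length=6):
    """Apply the example's thresholds: at least 3 translated words in at most 6 words."""
    return (len(target_string.split()) <= max_length
            and len(covered_source_words(target_string)) >= min_translated)

returns = [
    "DD1 AA2 CC2 BB3",
    "AA1 BB1 CC3 EE1",
    "BB2 FF1 KK1 AA2 LL3 DD5",
    "DD4 PP1 UU1 AA6 CC4 BB2",
    "CC1 KK1 RR2 BB3 DD4",
    "BB1 CC3 EE1 DD4",
]
print([qualifies(r) for r in returns])
```

A string such as "AA1 AA2 BB1 EE1" would not qualify under these settings, because two of its three dictionary hits translate the same source word and are counted only once.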
The query string mighty torrent list can be extended further by identifying any two results on the list that can be joined through an overlapping word string to form a larger word-string result. These combined word strings can be added to the query string mighty torrent list as possible word-string translations. For example, in the list above, the second return, "AA1 BB1 CC3 EE1", and the sixth return, "BB1 CC3 EE1 DD4", can be combined through their overlapping word string to form "AA1 BB1 CC3 EE1 DD4", which can be added to the query string mighty torrent list.
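The joining of two returns through an overlapping word string can be sketched as below. The function name merge_on_overlap and the min_overlap parameter are hypothetical; the sketch assumes the overlap must be a suffix of the first return and a prefix of the second.

```python
def merge_on_overlap(left, right, min_overlap=1):
    """Join two word strings when a suffix of `left` equals a prefix of `right`.

    Returns the combined string, or None when no overlap of at least
    min_overlap words exists. The longest overlap is tried first.
    """
    a, b = left.split(), right.split()
    for size in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a[-size:] == b[:size]:
            return " ".join(a + b[size:])
    return None

combined = merge_on_overlap("AA1 BB1 CC3 EE1", "BB1 CC3 EE1 DD4")
print(combined)
```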
The returns on the query string mighty torrent list are ranked according to user-defined criteria, which usually include at least (1) the largest number (or percentage) of source-language word translations in the target-language string (counting only one target-language translation of each source-language word), and (2) the smallest target-language word string (fewest words) that satisfies the first criterion for a minimum number of source-language word translations. For example, based on these two criteria (and giving the first criterion more weight than the second), the returns above can be ranked as follows:
1. DD1 AA2 CC2 BB3
2. AA1 BB1 CC3 EE1 DD4
3. DD4 PP1 UU1 AA6 CC4 BB2
4. AA1 BB1 CC3 EE1
5. BB1 CC3 EE1 DD4
6. CC1 KK1 RR2 BB3 DD4
7. BB2 FF1 KK1 AA2 LL3 DD5
This ranking reflects a user definition in which the first criterion (the number of translated words in the word string) carries more weight than the second (the smallest word string that satisfies the first criterion). The highest-ranked result contains all four translated words in a four-word string. The second-ranked result was created by overlapping two other returns (and added to the query string mighty torrent list) and contains all four translated words in a five-word string. The third-ranked result contains all four translations in a six-word string. The fourth- and fifth-ranked results are tied, because both contain three of the four translated words in a four-word string. The sixth-ranked result contains three translated words in a five-word string, and the lowest-ranked result contains three translated words in a six-word string.
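The two-criterion ranking can be reproduced with an ordinary sort, as in the following sketch. The helper names are hypothetical, and treating the first criterion as strictly dominant (a lexicographic sort key) is an assumption that stands in for "giving the first criterion more weight".

```python
# Cross-language dictionary from the example (Language X word -> Language Y candidates).
DICTIONARY = {
    "aa": {"AA1", "AA2", "AA3", "AA4", "AA5", "AA6"},
    "bb": {"BB1", "BB2", "BB3"},
    "cc": {"CC1", "CC2", "CC3", "CC4"},
    "dd": {"DD1", "DD2", "DD3", "DD4", "DD5"},
}

def translated_word_count(target_string):
    """Number of source words translated in the string, one candidate per source word."""
    tokens = set(target_string.split())
    return sum(1 for candidates in DICTIONARY.values() if tokens & candidates)

def rank(returns):
    """Most translated words first; among equals, the shortest string first."""
    return sorted(returns, key=lambda s: (-translated_word_count(s), len(s.split())))

returns = [
    "DD1 AA2 CC2 BB3",
    "AA1 BB1 CC3 EE1",
    "BB2 FF1 KK1 AA2 LL3 DD5",
    "DD4 PP1 UU1 AA6 CC4 BB2",
    "CC1 KK1 RR2 BB3 DD4",
    "BB1 CC3 EE1 DD4",
    "AA1 BB1 CC3 EE1 DD4",  # added by joining two overlapping returns
]
ranked = rank(returns)
for position, string in enumerate(ranked, start=1):
    print(position, string)
```

Because Python's sort is stable, the two tied four-word returns keep their original relative order.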
User-defined criteria based on the distance between source-language words and their target-language counterparts can also be used. For example, if the user-defined criterion requires the translations of adjacent source-language words to be within three words of each other to enter the query string mighty torrent list, the third-ranked member (DD4 PP1 UU1 AA6 CC4 BB2) and the sixth-ranked member (CC1 KK1 RR2 BB3 DD4) would be excluded. Note that a smaller word string that is a subset of the third-ranked result (words four to six, AA6 CC4 BB2, of DD4 PP1 UU1 AA6 CC4 BB2) would meet the condition for entering the query string mighty torrent list. Note also that when a source-language word (or collocation or idiom) translates into a target-language word string, that target-language word string is always treated as a single unit for the purposes of mighty torrent processing of the target-language corpus (that is, the words in it must remain adjacent and in the same order), except in the unusual case where particular features of the language mean that the words of the target-language translation are not all contiguous.
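One possible reading of the proximity criterion is sketched below. The text does not say exactly how distance is measured, so this sketch makes two labeled assumptions: distance is the difference between word positions in the target string, and a pair of adjacent source words is checked only when both have a translation present. Under those assumptions the sketch excludes exactly the two returns named above; all names are hypothetical.

```python
# Cross-language dictionary from the example (Language X word -> Language Y candidates).
DICTIONARY = {
    "aa": {"AA1", "AA2", "AA3", "AA4", "AA5", "AA6"},
    "bb": {"BB1", "BB2", "BB3"},
    "cc": {"CC1", "CC2", "CC3", "CC4"},
    "dd": {"DD1", "DD2", "DD3", "DD4", "DD5"},
}
SOURCE_ORDER = ["aa", "bb", "cc", "dd"]  # word order of the source query string

def position_of(source_word, tokens):
    """Index of the first token that translates source_word, or None if absent."""
    for index, token in enumerate(tokens):
        if token in DICTIONARY[source_word]:
            return index
    return None

def within_proximity(target_string, max_gap=3):
    """True when translations of adjacent source words lie within max_gap positions.

    Assumption: pairs in which either translation is missing are skipped.
    """
    tokens = target_string.split()
    positions = [position_of(word, tokens) for word in SOURCE_ORDER]
    for left, right in zip(positions, positions[1:]):
        if left is not None and right is not None and abs(left - right) > max_gap:
            return False
    return True

ranked = [
    "DD1 AA2 CC2 BB3",
    "AA1 BB1 CC3 EE1 DD4",
    "DD4 PP1 UU1 AA6 CC4 BB2",
    "AA1 BB1 CC3 EE1",
    "BB1 CC3 EE1 DD4",
    "CC1 KK1 RR2 BB3 DD4",
    "BB2 FF1 KK1 AA2 LL3 DD5",
]
kept = [s for s in ranked if within_proximity(s)]
print(kept)
```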
Another embodiment of the invention for ranking the returns on the query string mighty torrent list can use a scoring system that adds points for each word in the target-language word string that is a translation of a source-language word in the source-language query word string, and deducts points for each word in the qualifying target-language word string that is not a translation of a word in the source-language query word string. Words can also earn more or fewer points according to their general frequency in the language. For example, non-stop words can carry more weight than stop words.
For example, a user-defined setting could score each target-language word string on the query string mighty torrent list by (1) adding or deducting 5 points for each stop word that appears in the target-language word string, depending on whether it is a translation of a source-language word in the source-language query word string, and (2) adding or deducting 20 points for each non-stop word in the returned target-language word string (that is, a word other than the most frequently recurring words such as "it", "and" or "the"), again depending on whether it is a translation of a source-language word in the source-language query word string.
To illustrate such scoring with the earlier example, suppose "aa" and "cc" are stop words and "bb" and "dd" are not. Under the user-defined scoring parameters above, the word string "AA1 BB1 CC3 EE1" would score 25 (5+20+5-5=25) if EE1 is a stop word, and 10 (5+20+5-20=10) if EE1 is not a stop word. Any other scoring scheme can be used that is based on the number of words in the word strings on the query string mighty torrent list that are translations of words in the source-language query word string.
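The scoring arithmetic in this example can be checked with a short sketch. It assumes, as the example implies, that the target-language translations of a source stop word are themselves scored as stop words; the function and parameter names are hypothetical.

```python
# Cross-language dictionary from the example (Language X word -> Language Y candidates).
DICTIONARY = {
    "aa": {"AA1", "AA2", "AA3", "AA4", "AA5", "AA6"},
    "bb": {"BB1", "BB2", "BB3"},
    "cc": {"CC1", "CC2", "CC3", "CC4"},
    "dd": {"DD1", "DD2", "DD3", "DD4", "DD5"},
}
SOURCE_STOP_WORDS = {"aa", "cc"}

def score(target_string, other_stop_words=frozenset(), stop_points=5, content_points=20):
    """Add points for words that translate a query word, deduct points for the rest.

    Assumption: translations of source stop words are scored as stop words;
    other_stop_words lists any additional target-language stop words.
    """
    translations = set().union(*DICTIONARY.values())
    stop_words = set(other_stop_words)
    for source_word in SOURCE_STOP_WORDS:
        stop_words |= DICTIONARY[source_word]
    total = 0
    for token in target_string.split():
        points = stop_points if token in stop_words else content_points
        total += points if token in translations else -points
    return total

print(score("AA1 BB1 CC3 EE1", other_stop_words={"EE1"}))  # EE1 treated as a stop word
print(score("AA1 BB1 CC3 EE1"))                            # EE1 treated as a non-stop word
```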
The returns generated by this process can include correct, partially correct and incorrect target-language translations of the word string. As described below, the invention translates a source-language document by parsing it into overlapping word strings and combining overlapping target-language word-string translations. Requiring a large overlapping word string (that is, many words) between translated word strings can eliminate returns on the query string mighty torrent list that are not correct translations of the word string, because their overlap with the translations of the other word strings does not reach the user-defined size (as described below).
As described below, returns on the query string mighty torrent list, or any returns (produced by any method) that have not yet met the user-defined criteria for confirmation as accurate translations, can be used in a large overlapping chain, but only when the first and last word strings of the translated unit have previously been confirmed as accurate word-string translations. In addition, the leftmost word string of the translation must be accurate on its left side, and the rightmost word string must be accurate on its right side. Sandwiching unconfirmed translations, with large overlaps (as described below), between two translations that are known to be accurate word-string translations, or that are at least confirmed to be accurate on their outer sides, can provide the basis for an accurate translation.
The query string mighty torrent list can be refined by eliminating returns that are not correct translations, without examining overlapping word strings, by performing the same query string mighty torrent analysis described above on larger word strings that consist of the original query word string plus additional words on both sides. This embodiment requires a source-language corpus that contains the source-language query word string and its surrounding context words and/or word strings, but the source-language corpus need not be a parallel text of the target-language corpus. Continuing the example above, the system searches the source-language text for a user-defined number of source-language word strings that contain "aa bb cc dd" plus a user-defined number of words on each side. User-defined criteria can require these longer source-language word strings to be parsed into a user-defined number of segments of a user-defined size that contain "aa bb cc dd", which are then used for mighty torrent processing of the target-language documents as described above.
For example, if the user requests five word strings, each with three additional words on each side of the original string, the five source-language word strings returned from the source-language corpus could be:
1. “zz xx yy aa bb cc dd ll mm nn”
2. “kk rr ll aa bb cc dd aa kk oo”
3. “kg lh wk aa bb cc dd ql io rr”
4. “ck nk ak aa bb cc dd bk sk jk”
5. “dm ea jc aa bb cc dd tg ms jf”
The process then parses these strings into a user-defined number of word strings of a user-defined size (in this example, a minimum of five words), based on the user-defined criteria described below, to create the source-language word strings used for mighty torrent processing of the target-language corpus. If the user requires all possible parses that contain the original query string to be analyzed, the following parsed word combinations are generated for the first word string identified above:
“zz xx yy aa bb cc dd ll mm nn”
“zz xx yy aa bb cc dd ll mm”
“zz xx yy aa bb cc dd ll”
“zz xx yy aa bb cc dd”
“xx yy aa bb cc dd ll mm nn”
“xx yy aa bb cc dd ll mm”
“xx yy aa bb cc dd ll”
“xx yy aa bb cc dd”
“yy aa bb cc dd ll mm nn”
“yy aa bb cc dd ll mm”
“yy aa bb cc dd ll”
“yy aa bb cc dd”
“aa bb cc dd ll mm nn”
“aa bb cc dd ll mm”
“aa bb cc dd ll”
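The fifteen parsed combinations listed above can be generated mechanically. The following sketch enumerates every contiguous sub-string of a context string that contains the query and has at least five words; the function name context_parses and its parameters are hypothetical.

```python
def context_parses(context, query, min_words=5):
    """List every contiguous sub-string of `context` that contains `query`
    and has at least min_words words, longest left context first."""
    words, q = context.split(), query.split()
    start = None
    for index in range(len(words) - len(q) + 1):
        if words[index:index + len(q)] == q:
            start = index
            break
    if start is None:
        raise ValueError("query does not occur in context")
    end = start + len(q)
    parses = []
    for left in range(0, start + 1):                  # drop 0..start leading words
        for right in range(len(words), end - 1, -1):  # drop trailing words one at a time
            piece = words[left:right]
            if len(piece) >= min_words:
                parses.append(" ".join(piece))
    return parses

parses = context_parses("zz xx yy aa bb cc dd ll mm nn", "aa bb cc dd")
print(len(parses))
for parse in parses:
    print(parse)
```

The bare query "aa bb cc dd" is omitted from the output because it falls below the five-word minimum used in this example.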
The mighty torrent process described above can be used to produce possible target-language translations for each word in these word strings. Each word string is analyzed by translating each word individually with a dictionary or an existing machine translation system, and then searching the target-language documents for target-language word strings that contain the word translations, based on the user-defined requirement that a minimum number of word translations be contained within a maximum number of words (and/or other requirements). The resulting list of target-language returns is called the "query+context mighty torrent list". The system can then generate a query+context mighty torrent list for the parsed word strings derived from each of the remaining original source-language word strings (that is, the source-language query word string plus context words on the left and right; in this example, the remaining four ten-word strings, numbered 2 to 5 above). Alternatively, a larger number of word strings having one context word, or a context word string of user-defined size, to the left and right of the query word string can be obtained by searching the source-language corpus, and each string can be used whole to create the query+context mighty torrent list without being parsed further into shorter word strings.
Next, the system takes each result on the Query String Flood List and searches for it among the substrings of the larger word strings on all of the Query+Context Flood Lists generated from the source language word strings formed by adding context word strings to the left and/or right of the original query. The system counts the total number of times each return on the Query String Flood List appears as a substring of (or alone as) a longer word string result on the Query+Context Flood Lists.
These counts are then adjusted by subtracting the number of times a smaller word string (on the Query String Flood List) appears as part of a larger word string (also on the Query String Flood List). For example, suppose the two word strings "DD1 AA2 CC2" and "DD1 AA2 CC2 BB3" are both on the Query String Flood List. If "DD1 AA2 CC2" appears 120 times as a substring of word strings on the Query+Context Flood Lists, and the count for "DD1 AA2 CC2 BB3" is 100, the frequency count for "DD1 AA2 CC2" is adjusted by subtracting the number of times it appears as part of the larger word string "DD1 AA2 CC2 BB3": 120 minus 100, leaving 20. This subtraction adjustment is conceptually similar to the one made when this method builds cross-language associations from parallel text, in which, as shown in Fig. 1, the occurrences of a smaller word string as part of a larger recurring word string are subtracted.
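The arithmetic of the subtraction adjustment can be sketched in Python as follows. The function names are hypothetical, and the sketch handles only the simple case in the example; where three or more nested strings are on the list, a real implementation would need to avoid subtracting the same occurrences twice.

```python
def contains_words(longer, shorter):
    """True if `shorter` occurs inside `longer` as a contiguous run of whole words."""
    return longer != shorter and f" {shorter} " in f" {longer} "


def adjust_counts(raw_counts):
    """Subtract from each string's count the counts of the longer strings
    on the same list that contain it."""
    adjusted = dict(raw_counts)
    for short in raw_counts:
        for longer, n in raw_counts.items():
            if contains_words(longer, short):
                adjusted[short] -= n
    return adjusted


raw = {"DD1 AA2 CC2": 120, "DD1 AA2 CC2 BB3": 100}
adjusted = adjust_counts(raw)
print(adjusted)  # {'DD1 AA2 CC2': 20, 'DD1 AA2 CC2 BB3': 100}
```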
The word strings on the Query String Flood List are then re-ranked (after the subtraction adjustment described in the preceding paragraph) by the total number of times each result appears on the Query+Context Flood Lists as a substring of a larger word string (or alone). In addition, user-defined parameters can require that the ranking be based in part on other specific factors, including the number of words in the context word strings in which the result appears as a substring, and the balance between the number of times the substring appears with left-side context words or word strings and the number of times it appears with right-side context words or word strings.
At this stage of processing, if user-defined parameters require that only the left-side, or "edge", word string of a larger translation query be confirmed as an accurate translation, because it is the first word string in a larger chain of overlapping word strings, then only left-side context words or word strings are used for the Query+Context Flood Lists. If it is the right-side word string in a long chain of overlapping word strings, then only right-side context words and word strings are used together with the query to generate the Query+Context Flood Lists.
As another embodiment, the Query+Context Flood Lists can be generated without generating a Query String Flood List. Instead, each word string on the Query+Context Flood Lists is treated like a target language range in cross-state learning from parallel text, and each is analyzed in the same way for recurring word strings. The counts of the recurring word strings are tabulated, and the counts of shorter word strings are adjusted by subtracting the number of times they appear as part of longer strings. For best results with this method, different context words or word strings should be used to generate the Query+Context Flood Lists (rather than parsing the same string at different lengths). Alternatively, the context word strings can also be parsed, but the system should then ignore the translations of the context words in the context word strings when counting recurring word strings among the members of the Query+Context Flood Lists.
There are other methods for refining the Query String Flood List. One of them uses the common frequency analysis aspect of the invention, described below, to generate approximate semantic equivalents of the query. Once additional source language word strings expressing ideas semantically similar to the query have been generated, the flooding method described above can be applied to each option using a cross-language dictionary. This method can expand the number of source language translation options and is especially useful when the original query word string contains an idiomatic expression (one not in the cross-language dictionary). In an idiomatic expression, individual words may lose their semantic character entirely.
The same process can be applied to the top-ranked results on the Query String Flood List. Using the aspect of the invention described below that identifies semantically similar word strings, a user-defined number of target language word strings from the Query String Flood List (for example, the top five) can be used to build a user-defined number of semantically similar target language word strings (for example, five for each). These groups of synonymous word strings can be searched for strings common to multiple lists, to confirm a word string translation that meets a user-defined minimum number or percentage of common word strings on any of the returned semantic equivalent lists (as described below). In addition, these groups of synonymous word strings can be translated back into the source language word for word, to check which group has the greatest number of translations in common with the source language synonyms of the query (and with the query itself). The group of synonymous target language word strings whose back-translations yield the greatest number of word matches with the source language word string or its synonyms is the correct group of target language translations.
Other methods for refining the Query String Flood List combine the multilingual leverage method with the flooding method. In this embodiment, the source language query word string can be translated word for word (and/or word to phrase) into one or more third languages, using all possible translations of each word, and flooded by searching each third-language text corpus, as described above, for sentences and other word strings that contain a user-defined minimum number of the translated words within a user-defined maximum total number of words. The qualifying third-language word strings are then translated word for word (and/or word to phrase) into the target language and used to search for target language word strings that meet the user-defined flooding criteria described above. Alternatively, the translated words in the third language can be translated directly into the target language and used to search for qualifying target language word strings, without searching the third-language corpus for third-language word strings as described in the preceding step. Target language word strings that appear on the Query String Flood Lists produced through more than one intermediate third language provide further confirmation of a translation. Translations can be confirmed further by generating synonymous word strings in the source language, the target language, and the intermediate third languages as described above and by using cross-language dictionaries.
The multilingual leverage aspect of the invention is also useful for building and expanding word-level dictionaries, which can be used for the target language flooding embodiments of the invention or for any other purpose. If existing or custom dictionaries are incomplete, either because they have no entry for a source language word or because they have an entry but not a complete list of possible target language translations, the invention can supplement them by using the existing translations of the source language word in one or more third languages. The system obtains all of the third-language words and then identifies their known target language translations. The target language translations produced most frequently through the intermediate third languages are confirmed as translations. User-defined criteria specify how many common results are needed for a candidate to qualify as a translation. If necessary, a human editor can review the resulting list and remove incorrect translations. Dictionaries can also be built with the cross-language frequency association methods and systems, by examining individual words in the source language. Target language translation entries can also be expanded with the method of the invention that uses common frequency analysis (described below) to identify semantically similar words and word strings within a single state or language.
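The dictionary-expansion step can be pictured as a vote across intermediate languages. The Python sketch below is a minimal illustration under stated assumptions: the function name `pivot_translations`, the language labels, and all dictionary contents are hypothetical, and each third language is given one vote per target word, which is only one possible reading of "most frequent".

```python
from collections import Counter


def pivot_translations(word, src_to_third, third_to_tgt, min_support):
    """Rank target words reached from `word` through third languages and keep
    those supported by at least `min_support` third languages."""
    votes = Counter()
    for lang, table in src_to_third.items():
        reached = set()
        for third_word in table.get(word, []):
            reached.update(third_to_tgt.get(lang, {}).get(third_word, []))
        votes.update(reached)  # one vote per third language per target word
    return [tgt for tgt, count in votes.most_common() if count >= min_support]


# Hypothetical dictionaries: source -> third language, third language -> target.
src_to_third = {"L3a": {"w": ["a1", "a2"]}, "L3b": {"w": ["b1"]}, "L3c": {"w": ["c1"]}}
third_to_tgt = {
    "L3a": {"a1": ["t1"], "a2": ["t2"]},
    "L3b": {"b1": ["t1", "t3"]},
    "L3c": {"c1": ["t1", "t2"]},
}
confirmed = pivot_translations("w", src_to_third, third_to_tgt, min_support=2)
print(confirmed)  # ['t1', 't2']
```

Here `min_support` plays the role of the user-defined criterion for how many common results qualify a candidate; the candidate "t3", reached through only one third language, is left for a human editor or another method to assess.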
D. Multi-Method Differential Acquisition
If the word string translation candidates produced by any one method for identifying cross-state associations fail to reach the statistical certainty required by the user-defined criteria for a correct translation, the partial results of two or more methods can be used together to confirm the association as a correct translation or, when it cannot be confirmed, to move on to the next candidate translation. This is desirable when the text available for analysis does not contain enough associated word strings to reach statistical certainty. Confirming word string translations with partial results from different methods is also useful as a way of building associations with less computation (which saves processing power and processing time). In addition, as indicated above, the methods of the invention for identifying semantically equivalent word strings can be used to help any word string translation method, whether of the invention or of any other system, identify or confirm word string translations.
It should be noted that the invention can track the outcomes of the user-defined parameters used to decide which results are confirmed as translations (and any other outputs of the semantic equivalence methods described below and of the other methods of the invention). Evaluating the results in this way allows the system to use them to define effective parameters automatically. These requirements generally include the statistical certainty with which a combination of the various methods yields a combined return that is an accurate translation.
III. Cross-State Knowledge Reconstruction Method and Apparatus
Another aspect of the invention relates to a method and apparatus for creating, from a first document composed of data in a first state, form, or language, a second document composed of data in a second state, form, or language, such that the first and second documents ultimately express essentially the same ideas or information, wherein the method and apparatus use a cross-idea association database. The database entries can be built in advance using any of the methods of the invention, or built "on demand" (on the fly).
One embodiment of the translation method uses the dual-anchor overlap method to obtain accurate translations of ideas from one state to another. Other embodiments allow overlapping source language word strings whose direct translations do not overlap in adjacent target language segments to be confirmed through a third language, provided that their translations overlap in the third language and their indirect translations into the target language also overlap. Using the dual-anchor overlap method, the invention joins building-block word strings of the second language, form, or state organically, so that in the correct context they form an accurate translation of the original words and phrases, just as a native speaker of the second language would write or say them. This method resolves the boundary friction problem encountered by existing EBMT systems.
In one embodiment of the invention, the word string association database creation methods are combined with the overlap method to provide accurate language translation of documents of any length. By parsing any source language input into a series of word strings, each of which overlaps the parsed word strings before and after it by a user-defined number of words, and by checking the target language translations of those word strings for overlapping words or word strings, the method and system can translate a document by piecing building-block ideas together in a chain. The more overlapping words the user-defined setting requires, the more accurate the resulting combination of word string translations in the target language.
Moreover, word string translation results assembled manually or by any automated method, including any of the methods of the invention for building cross-language word and word string associations (for example, parallel text, multilingual leverage, or target language flooding), can be checked for accuracy by requiring that a word string translation, when it is part of a longer translation query, have longer overlapping word strings (that is, more overlapping words) with the adjacent word strings on both sides, provided that known word string translations on both sides serve as anchors. The dual-anchor overlap method does not admit a translation that is semantically correct but does not fit the specific context of the longer translation query, and the dual-anchor overlap also eliminates semantically incorrect translations. Therefore, when a method cannot by itself reach the user-defined confirmation threshold for a word string translation, the dual-anchor overlap method can be used to confirm or eliminate candidate word string translations identified by any of the cross-language association methods of the invention. For example, if the source document is parsed only into word string segments in which every word of each word string is fully overlapped, and the leftmost and rightmost word string translations are known to be accurate, then no semantically or grammatically incorrect target language translation candidate will be accepted.
Moreover, once word string translation candidates have been confirmed through long overlaps with known word string translations serving as anchors, the newly confirmed word string units are added to the database as known accurate translations. In addition, the overlapping word strings in the two languages between two known word string translations can be confirmed as independent word string translations.
A. Document Translation Using the Association Database and the Dual-Anchor Overlap Technique
As another preferred embodiment, the invention can translate a document in a first language into a document in a second language by using the cross-language database described above. The database may already contain word string translation entries, or such entries can be built on the fly using any of the cross-language word string translation methods described above.
One embodiment of this aspect of the invention first locates, using any of the methods described above for identifying possible target language word string translations, the longest word string at the beginning of each sentence of the document to be translated (the source document), together with all of its possible translations that meet the user-defined criteria. Next, the method identifies a second word string in each sentence of the source document that overlaps the previously identified word string by a user-defined number of words, together with its possible translations (the required overlap length, that is, the number of words, is user-defined). If a target language translation of the second identified word string of the (source language) sentence has the user-defined minimum overlap with one of the translations of the first word string of the sentence, the combined translations are confirmed as a combined translation unit. If no overlapping translations are found, different parses of the source language word strings (that is, different start and/or end positions) that have the user-defined minimum overlap are identified, and their corresponding target language translations are checked to see whether they can be combined through an overlap of one word or of a word string of user-defined size. Next, a third source language word string that overlaps the second identified source language word string by the user-defined minimum number of words is identified, together with its target language translations. If any translation of the third identified word string has overlapping words with the translation of the second identified word string, the combination is confirmed as a translation. The next source language word string that has the user-defined minimum overlap with the previously identified source language word string is then identified, and the process repeats until: (1) every overlapping word string in the source document (with at least the user-defined minimum overlap length) and its possible target language translations have been identified; (2) every word string in both the source language and the target language has overlapping word strings of at least the user-defined minimum length on both its left and its right (the overlap can be a single word, if the user so defines), except that the first string overlaps only on its right and the last string only on its left; and (3) the translations of the longest strings that satisfy conditions 1 and 2 above are selected as the final output. Alternatively, based on user-defined criteria, shorter target language word strings (that is, strings with fewer words) that have longer overlaps can be selected in preference to longer strings with shorter overlaps. The trade-off between overlap ratio and string length is a programmable parameter that can be optimized manually or by an automated optimization routine.
Because cross-language word string translations carry appropriate built-in context for every word in the word string, and the dual-anchor overlap method provides accurate combinations of word string translations, the accuracy of the translated document is far better than that of any existing translation method. The invention uses the association database creation methods to build word string building-block ideas and uses the cross-language dual-anchor overlap method to combine those building-block ideas into any number of larger combined ideas.
The boundaries of the chains of translation query unit strings translated with the dual-anchor overlap method are user-defined (in the embodiment above, the user defined the translation query unit string as a sentence). For example, instead of a sentence, the concept can be extended to require that all adjacent word strings and their translations overlap, in both the source and target languages, within a shorter unit (for example, between punctuation marks) or a longer unit (for example, a paragraph, including its punctuation). Because the units at the beginning and the end are confirmed by overlap on only one side, the user-defined criteria for building word string translations should be stricter when a first or last word string is accepted as a translation. Moreover, the aspect of the invention that identifies semantically equivalent word strings can be used to confirm the translation of any word string (by providing an additional check against the translations of source language and/or target language synonyms).
For example, consider a database of Hebrew-English word and word string translations (built using any of the methods of the invention, or manually) and the following sentence, entered in English and to be translated into Hebrew: "In addition to my need to be loved by all the girls in town, I always wanted to be known as the best player to ever play on the New York state basketball team".
Through the process described above, the method determines that the phrase "In addition to my need to be loved by all the girls" is the longest word string in the source document that starts at the first word of the source document and is present in the database. It is associated in the database with several word strings, including the Hebrew word string "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot". The process then uses the method described above to determine the following translations, that is, the longest English word strings in the same text (and present in the database) that overlap the previously identified English word string by one word (or, alternatively, by a word string of user-defined minimum length) and whose Hebrew translations also have overlapping segments. For example:
"loved by all the girls in town" is translated as "ahuv al yeday kol habahurot buir";
"the girls in town, I always wanted to be known" is translated as "habahurot buir, tamid ratzity lihiot yahua";
"I always wanted to be known as the best player" is translated as "tamid ratzity lihiot yahua bettor hasahkan hachi tov"; and
"the best player to ever play on the New York state basketball team" is translated as "hasahkan hachi tov sh hay paam sihek bekvutzat hakadursal shel medinat new york".
From these database returns, the process compares the overlapping words and word strings and eliminates the redundancy. Using the method of the invention, the system takes the English segments "In addition to my need to be loved by all the girls" and "loved by all the girls in town" and the returned Hebrew segments "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot" and "ahuv al yeday kol habahurot buir", and determines the overlap.
In English, the phrases are:
"In addition to my need to be loved by all the girls" and "loved by all the girls in town". Removing the overlap yields "In addition to my need to be loved by all the girls in town".
In Hebrew, the phrases are:
"benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot" and "ahuv al yeday kol habahurot buir". Removing the overlap yields "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir".
The invention then continues the process with the next parsed segment. In this example, the process operates on the phrase "the girls in town, I always wanted to be known". The corresponding Hebrew word string is "habahurot buir, tamid ratzity lihiot yahua". In English, the overlap process operates as follows: "In addition to my need to be loved by all the girls in town" and "the girls in town, I always wanted to be known" yield "In addition to my need to be loved by all the girls in town, I always wanted to be known".
In Hebrew, the overlap process operates as follows:
"benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir" and "habahurot buir, tamid ratzity lihiot yahua" yield "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid ratzity lihiot yahua".
The invention continues this type of operation on the remaining words and word strings in the document to be translated. In the example of the preferred embodiment, the next English word strings are "In addition to my need to be loved by all the girls in town, I always wanted to be known" and "I always wanted to be known as the best player". The Hebrew translations returned by the database for these phrases are "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid ratzity lihiot yahua" and "tamid ratzity lihiot yahua bettor hasahkan hachi tov". Removing the English overlap yields "In addition to my need to be loved by all the girls in town, I always wanted to be known as the best player". Removing the Hebrew overlap yields "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid ratzity lihiot yahua bettor hasahkan hachi tov".
The process continues: the next word strings are "In addition to my need to be loved by all the girls in town, I always wanted to be known as the best player" and "the best player to ever play on the New York state basketball team". The corresponding Hebrew phrases are "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid ratzity lihiot yahua bettor hasahkan hachi tov" and "hasahkan hachi tov sh hay paam sihek bekvutzat hakadursal shel medinat new york". Removing the English overlap yields "In addition to my need to be loved by all the girls in town, I always wanted to be known as the best player to ever play on the New York state basketball team". Removing the Hebrew overlap yields "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid ratzity lihiot yahua bettor hasahkan hachi tov sh hay paam sihek bekvutzat hakadursal shel medinat new york", which is the translation of the text to be translated.
When this process is complete, the invention returns and outputs the final translated text.
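The overlap-and-merge steps of the worked example can be traced with a short Python sketch. The function names are hypothetical, and the sketch covers only the merging of segments that have already been retrieved; it assumes that the overlap is compared word by word with punctuation and case ignored, so that "town" matches "town,". Selecting candidates from the database is not shown.

```python
import string


def _norm(token):
    """Compare words without surrounding punctuation or case."""
    return token.strip(string.punctuation).lower()


def merge_overlap(left, right, min_overlap=1):
    """Join two segments on their longest shared boundary words, or return
    None if fewer than `min_overlap` words overlap."""
    l, r = left.split(), right.split()
    for k in range(min(len(l), len(r)), max(min_overlap, 1) - 1, -1):
        if [_norm(t) for t in l[-k:]] == [_norm(t) for t in r[:k]]:
            return " ".join(l[:-k] + r)  # keep the right segment's punctuation
    return None


def chain(segments, min_overlap=1):
    """Merge a sequence of overlapping segments into one string."""
    result = segments[0]
    for segment in segments[1:]:
        result = merge_overlap(result, segment, min_overlap)
        if result is None:
            raise ValueError(f"no overlap with segment: {segment!r}")
    return result


english = chain([
    "In addition to my need to be loved by all the girls",
    "loved by all the girls in town",
    "the girls in town, I always wanted to be known",
    "I always wanted to be known as the best player",
    "the best player to ever play on the New York state basketball team",
])
hebrew = chain([
    "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot",
    "ahuv al yeday kol habahurot buir",
    "habahurot buir, tamid ratzity lihiot yahua",
    "tamid ratzity lihiot yahua bettor hasahkan hachi tov",
    "hasahkan hachi tov sh hay paam sihek bekvutzat hakadursal shel medinat new york",
])
print(english)
print(hebrew)
```

A candidate translation whose neighbors do not overlap it makes `merge_overlap` return `None`, which corresponds to the case in which a return is rejected and another candidate is tried.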
It should be noted that these returns are the end result of the database returning overlapping associations according to the process described above. Through this process, the system ultimately does not accept returns that do not connect naturally in the second (target) language, that is, returns whose left and right sides do not overlap the adjacent language segments as described above, except for the first and last segments. If a Hebrew return does not overlap accurately with the adjacent Hebrew word string association, it is rejected and replaced with the highest-ranked Hebrew word string association for that English word string that does overlap the adjacent Hebrew word string; alternatively, different overlapping English word strings (shorter or longer) and their Hebrew translations can be retrieved from the database and checked for accurate overlap in Hebrew.
Appendix B (page 253; translator's note: page number in the original text) shows a printout of the invention with examples of translation using the dual-anchor overlap method in combination with the method of acquisition from parallel text in two states.
Appendix C (page 297; translator's note: page number in the original text) shows a printout of the invention with examples of translation using the dual-anchor overlap method in combination with acquisition from parallel text in two states and with the multi-state acquisition method.
Appendix D (page 308; translator's note: page number in the original text) shows a printout of the invention with examples of translation using the dual-anchor overlap method in combination with the target language flooding method.
Various user-defined parameters can be set for the overlap criteria. For example, when one or more of the overlapping words are stop words (such as "the", "it", or "in"), the required number of overlapping words can be larger, because such common words make the junction between combined word strings unreliable. The longer the overlapping word strings between a translation candidate and the two translations it overlaps, the more certain the word string translation. If a translation is incorrect, it cannot have long overlaps with the adjacent translations on both of its sides.
Therefore, the user-defined minimum overlap requirement can be dynamic, requiring fewer or more overlapping words between the parsed word string translations depending on whether a translation is known to be correct or is only a candidate identified by one of the different methods of the invention for building word string associations. Moreover, overlapping stop words can be disregarded when counting toward the minimum number of overlapping words required to confirm a translation.
For example, suppose the user-defined requirement calls for two or more overlapping non-stop words to confirm a combination of two word string translations, and the overlapping parsed word strings "and I know it is good" and "it is good to run two miles" are presented to the system as part of a longer word string to be translated. The system will not accept this parse, because the overlapping word string "it is good" does not contain two non-stop words and therefore fails the user-defined overlap requirement. The segments need more words in common to meet the requirement (for example, "and I know it is good" and "know it is good to run"), after which the corresponding target language translations are checked for overlap.
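The stop-word rule in this example can be sketched in Python as follows. The stop-word list and the function names are assumptions made for illustration; the text names only "the", "it", and "in" as examples, and the sketch adds "is" because the example treats "it is good" as having a single non-stop word.

```python
# Assumed stop-word list; a real system would make this user-defined.
STOP_WORDS = {"the", "it", "in", "is", "a", "an", "of", "to", "and"}


def overlap_words(left, right):
    """Return the longest run of words that ends `left` and begins `right`."""
    l, r = left.lower().split(), right.lower().split()
    for k in range(min(len(l), len(r)), 0, -1):
        if l[-k:] == r[:k]:
            return r[:k]
    return []


def overlap_is_sufficient(left, right, min_content_words=2):
    """Accept a junction only if enough non-stop words overlap."""
    content = [w for w in overlap_words(left, right) if w not in STOP_WORDS]
    return len(content) >= min_content_words


rejected = overlap_is_sufficient("and I know it is good", "it is good to run two miles")
accepted = overlap_is_sufficient("and I know it is good", "know it is good to run")
print(rejected, accepted)  # False True
```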
If a word string translation candidate identified by any method of the invention or by any other automatic translation method, or created manually, cannot be confirmed as an accurate translation, the dual-anchor overlap method can require that every word string (except the first and the last) have each of its words overlapped by the translation of the adjacent word string on its left or right. For example, one possible "fully overlapping" parse of a word string to be translated is as follows:
Source language (English) translation query: "The best time of the year is the summer because you can sit in the sun and then jump in the pool".
One possible fully overlapping parse:
“the best time of the year”
“time of the year is the summer because you”
“year is the summer because you can sit in the sun”
“because you can sit in the sun and then”
“sun and then jump in”
“jump in the pool”
A more complete overlapping scheme advances each successive word string by one word when parsing the source language translation query into overlapping word strings. For example:
“the best time of”
“best time of year”
“time of year is”
“of year is the”
“year is the summer”
The process begun above continues until every word of the parsed translation query is maximally overlapped.
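The one-word-advance parse is a sliding window over the words of the query. The following Python sketch is illustrative: the function name `sliding_parse` and the fixed window size of four words are assumptions, chosen because they reproduce the five strings listed above from the words "the best time of year is the summer".

```python
def sliding_parse(text, size):
    """Parse `text` into word strings of `size` words, each advanced by one word."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


windows = sliding_parse("the best time of year is the summer", size=4)
for w in windows:
    print(w)
```

Each interior word string shares `size - 1` words with each neighbor, so every one of its words is covered by an adjacent string; only the first and last strings have a word that no neighbor covers.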
Because the word strings are fully overlapped on the left and right (except the first and last word strings, which can receive only some additional confirmation from their one-sided overlap), their translation candidates cannot be accepted if they are incorrect (or correct only in a different surrounding context). The first word string on the left should be confirmed independently as an accurate translation (at least on its non-overlapped left side) by one of the association methods of the invention (or manually), and the last word string at the end of the sentence should be confirmed independently as an accurate translation (at least on its non-overlapped right side). In the example above, the word strings "the best time of the" and "jump in the pool" should each be confirmed independently as accurate translations, or at least be confirmed on their left and right sides, respectively. These confirmed translations provide accurate end points that serve as anchors for the chain of overlapping word string translation candidates.
The same overlap method can be applied to join word strings into longer unified word strings for use in single-state or single-language applications, as described below.
B. Knowledge acquisition using dual-anchor overlap
Moreover, each time two translations with overlapping word strings are combined, two additional cross-language translation database entries, corresponding to the confirmed word strings, can be confirmed from the overlap result and added to the database. First, the whole combined overlapping translation can be confirmed as a single unit for future use. Second, the unit of overlapping words in the source language and the target language constitutes a word-string translation under the present invention, and it can be added to the database for future use.
For example, suppose the cross-language database contains the following language X word strings and their known translations in language Y:
Word string in language X      Translations in language Y
1. “EE KK GG XX”               1a. “ll bb ee”    1b. “ee kk gg xx”
2. “GG XX BB YY”               2a. “gg ll bb yy”    2b. “gg xx bb yy”    2c. “gg xx mm ll”
Based on the above database entries, the following additional database entries can be confirmed and entered as valid translations:
3. “EE KK GG XX BB YY”         3a. “ee kk gg xx bb yy”    3b. “ee kk gg xx mm ll”
4. “GG XX”                     4a. “gg xx”
Entry 3 is the combined word-string translation after the duplicated overlapping words in the source and target languages are removed. Entry 4 is the overlapping word string in the source and target languages; this shorter word string within the overlap is confirmed as an independent word-string translation.
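The derivation of entries 3 and 4 from entries 1 and 2 can be sketched as follows. This is a simplified illustration, not the method of the invention as claimed: the helper names `overlap_len` and `derive_entries` are hypothetical, and the sketch assumes that the overlap appears as a suffix of the left translation and a prefix of the right translation in both languages, which real language pairs need not satisfy.

```python
def overlap_len(left: list[str], right: list[str]) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def derive_entries(src_a, tgts_a, src_b, tgts_b):
    """Return (combined_entry, overlap_entry), or None if the sources do not overlap."""
    a, b = src_a.split(), src_b.split()
    k = overlap_len(a, b)
    if k == 0:
        return None
    combined_src = " ".join(a + b[k:])   # overlap counted once
    overlap_src = " ".join(b[:k])
    combined_tgts, overlap_tgts = [], set()
    for ta in tgts_a:
        for tb in tgts_b:
            x, y = ta.split(), tb.split()
            j = overlap_len(x, y)
            if j:  # only translation pairs that also overlap are confirmed
                combined_tgts.append(" ".join(x + y[j:]))
                overlap_tgts.add(" ".join(y[:j]))
    return (combined_src, combined_tgts), (overlap_src, sorted(overlap_tgts))


entry3, entry4 = derive_entries(
    "EE KK GG XX", ["ll bb ee", "ee kk gg xx"],
    "GG XX BB YY", ["gg ll bb yy", "gg xx bb yy", "gg xx mm ll"],
)
print(entry3)
print(entry4)
```

Translations 1a and 2a drop out because they overlap no neighboring translation, which is the filtering effect the overlap requirement is meant to provide.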
Translation candidates on a query string's flooding list that have not been confirmed as accurate translations by the target language flooding method (or by any other method) can be checked by means of longer overlapping word strings in the source and target languages. If overlapping word-string translation candidates are linked together by long overlaps, and they overlap known word-string translations at the beginning and the end of the longer translation unit, then the translation candidates, and each pair of corresponding word strings in the overlaps across the two languages, can be confirmed as translations. The above method of identifying translations within overlapping word strings can expand any cross-language database by exploiting overlaps among existing translations across two languages, whether those translations were generated automatically or assembled manually, for use by EBMT systems, by translation memory systems, or for any other purpose.
C. Other related applications
The embodiments above, which combine the cross-language association database with the cross-language dual-anchor overlap translation method, can evidently be used to improve the quality of prior-art technologies that attempt to convert information from one state into its equivalent in another state, such as prior-art speech recognition software and optical character recognition (OCR) scanning devices, as well as to associate information across multiple sources and to translate between different jargons or dialects within one language. These technologies (and others) can use the present invention to check the results (output) of their systems, by using the translation methods of the invention to test whether those results can be translated. When no translation that overlaps the adjacent word strings can be found, the user can be alerted and queried, or the system can be programmed to look in the database for other close alternatives for the non-overlapping portion of the translation. The various criteria for finding alternative word strings that overlap the adjacent word strings include those by which the embodiments using the association database generate semantic equivalents within one language based on context (described below). In all cases, what is returned to the user should be converted back into the original language.
In addition to assisting prior-art technologies in these applications, the methods of the present invention, including cross-state learning and the dual-anchor overlap method, can also be used directly to build these applications. For OCR, visual representations of letters and words can be used to build associations between the visual representations of words and word strings and computer codes such as UTF-8 and other machine-language conventions. Texts that teach how to use a computer language can be arranged so that they align with the sequence of computer-language code they describe, serving as textual descriptions for training and thereby building associations between human language and computer language. Written descriptions of code, together with the computer code itself, can also serve as a parallel text corpus for building associations using the methods of the present invention. For speech recognition, sound waves and written text can be analyzed to make associations between ideas common to the two different state representations (the system is trained using word strings of written text together with the audio of those texts as "parallel text"), as described below.
IV. Single-state frequency association database creation and common frequency analysis method and apparatus
A. Introduction
Another embodiment of the present invention provides (1) a method and apparatus for creating a frequency association database (FAD) of ideas represented by words and word strings in a single language (e.g., Japanese or English), and (2) a method and system that use the FAD to identify common relationships between two or more words and/or word strings. The second method and system, called common frequency analysis (CFA), can be used to generate lists of related ideas in various applications.
In this embodiment, once the FAD is created, it stores information about the proximity relationships between two or more recurring word-string patterns in text. Once these proximities have been established and stored by the first process, they provide the basis for the second process, CFA, which analyzes and identifies third words or word strings whose associations are shared by two or more words and/or word strings. The CFA process provides the basis for various knowledge acquisition and knowledge generation applications.
The frequency associated program can be realized some method of the present invention, is used to make up database of the present invention, and the information of analyzing stored in database is determined the association between word and/or the word strings. Fig. 2 and Fig. 3 The storer 208 of demonstrating computer system 200 has wherein been stored intelligent use 302, associated program 304, database 306 and operating system 308 by processor 202 visits.Associated program 304 can be determined the word association by analytical database 306, passes through the inquiry that input equipment is directly submitted to inquiry or the response user who responds from intelligent use 302.Database 306 can comprise, for example, and FAD and document database.
The FAD system and method work by parsing the text of all documents input to the system and storing information about how the parsed text fragments relate to one another, based on their frequency of occurrence and their specific positions relative to other fragments in the documents. As noted above, the parsed text fragments can include words and word strings, or characters and character strings in languages whose characters carry independent semantic value (e.g., Chinese characters). Before processing by the FAD system, documents can be stored in the document database so that they can be accessed, parsed and analyzed.
By performing FAD analysis on the words and word strings within each user-defined range, the present invention identifies words and word strings that frequently occur close to one another in documents. These associated words and word strings can be used by the second process, CFA, to identify ideas (represented in this embodiment by words or word strings) that are strongly related to one another based on their common relationships with other, third ideas (likewise represented here by words and word strings).
The CFA process creates a knowledge base made up of lists of related ideas by operating on the associated word strings stored in the FAD. In one embodiment of the invention, these lists of related ideas (represented in this embodiment by words and word strings) are referred to interchangeably as knowledge acquisition lists or semantic equivalent lists. Using this embodiment of CFA, the system generates a list for a query word or word string by identifying word strings that occur in specific patterns around or near the query. Such patterns, called "left signatures" and "right signatures" or, in combination, "cradles", are shared by third words and/or word strings. The results generated for a query on a particular word or word string identify closely associated ideas, including semantic equivalents of the word or word string, opposite ideas, examples of the idea, and other related ideas represented by words and word strings. Once these signatures, cradles and knowledge acquisition lists have been established in the knowledge base of each language, they can be used in machine translation applications, search and text mining applications, data compression, and many other applications, including artificial intelligence or intelligent applications that allow a user to ask the system to learn, to provide answers to questions, or to perform actions.
By using the FAD of the invention to provide input to CFA, the system can determine third word and/or word-string associations that are common to two or more words or word strings. When operating the FAD, the user can define the range to be examined in the documents as any number of words and/or word strings of user-defined length adjacent to each occurrence of each selected word or word string.
Once these word and word-string relationships have been established and stored in the FAD, the system performs one or more CFAs based on instructions from intelligent application 302 (see Fig. 3), searching for words and/or word strings common to the ranges of two or more words and/or word strings selected by intelligent application 302. When the system performs CFA, it can look up the frequency with which a word or word string occurs within the range of each selected word or word string if that information was previously stored in the FAD (or it can analyze on the fly, using text in the document database or any other available text, including text on the Internet, any information that was not previously analyzed and stored in the FAD).
Creating a single-state FAD is similar to the above-described use of parallel text to identify word-string translations when creating a cross-language FAD. In that case, ranges are determined in the target documents, and recurring words and word strings within the ranges are counted to determine frequency of occurrence. When creating a FAD for a single language or state, the principle is the same, but the frequency and proximity of word strings are used to determine contextual patterns of words and word strings within the single language or state, rather than cross-language translations of words and word strings.
Another way to construct a FAD that records the proximity relationships of each recurring word or word string is to identify the positions and frequencies of the words and word strings that recur in the document database, store them in a simpler recurrence database, and build a word-string frequency index; an example is shown in Table 4. Using the recurrence database as a word-string frequency index rather than a FAD, association program 304 can identify all identical word-string patterns and, based on user-defined weights or other criteria, determine the highest-ranked third word and word-string relationships shared by the two or more words and/or word strings selected by intelligent application 302 (see Fig. 3).
B. Creation of the frequency association database (FAD)
1. Overview
Disclosed here is a method for building a FAD. The method can be applied to documents in a single language to build a database of associated words and word strings based on their frequency of occurrence in text and their proximity to one another. The FAD provides the building blocks for the CFA of the present invention. The method comprises:
a. Assembling a text corpus in a single language (which can be stored in the document database).
b. Searching the assembled corpus for all multiple occurrences of any word or word string.
c. Determining a user-defined number of words and/or word strings of user-defined length that appear on either side (or both sides) of the word or word string being analyzed. This serves as the range. Besides being defined as a specific number of words, the range can be defined broadly (e.g., all words in the particular text in which the word or word string occurs) or narrowly (e.g., word strings of a specific length (i.e., number of words) at a specific proximity to the word or word string being analyzed); the user can choose different definitions of the range for particular applications.
d. Searching the corpus and determining the frequency with which each word and word string occurs within the range around the selected word or word string being analyzed and, if desired, their proximity to the selected word or word string.
If the range is defined to include, for example, up to 30 words on each side, the system records the frequency with which each word and word string occurs within 30 words on either side of the word or word string being analyzed. If the range is defined as a three-word string to the right of the query word or word string and a four-word string to its left, then only three-word strings on the right of the query and four-word strings on its left are counted as recurrences of the pattern. The system can also record the proximity between each word or word string and the word or word string being analyzed.
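Steps a through d can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of the invention: the names `ngrams` and `build_fad`, the two-word limit on word-string length and the three-word range are all hypothetical, and the quadratic scan is chosen for readability, not efficiency.

```python
from collections import Counter, defaultdict


def ngrams(words, max_len):
    """Yield (start, end, word_string) for every string of 1..max_len words."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield i, i + n, " ".join(words[i:i + n])


def build_fad(documents, max_len=2, window=3):
    """Map each recurring word string to a Counter of recurring neighbors in range."""
    docs = [text.lower().split() for text in documents]
    # Step b: keep only word strings that occur more than once in the corpus.
    totals = Counter(g for words in docs for _, _, g in ngrams(words, max_len))
    recurring = {g for g, count in totals.items() if count > 1}

    fad = defaultdict(Counter)
    for words in docs:
        spans = [s for s in ngrams(words, max_len) if s[2] in recurring]
        for s1, e1, analyzed in spans:
            for s2, e2, neighbor in spans:
                # Steps c and d: the neighbor must lie entirely within
                # `window` words to the right or left of the analyzed string.
                on_right = s2 >= e1 and e2 <= e1 + window
                on_left = e2 <= s1 and s2 >= s1 - window
                if on_left or on_right:
                    fad[analyzed][neighbor] += 1
    return fad


corpus = ["kids love ice cream", "kids love ice cream before going to bed"]
fad = build_fad(corpus)
print(fad["kids love"]["ice cream"])
```

A narrower range, such as an exact offset from the analyzed word string, would replace the two range tests with an equality check on the positions.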
As noted above, for particular applications the system can be directed to identify and ignore common words such as "I", "a", "to" and so on. However, depending on the goals of the particular application, those common words can also be taken into account. Accordingly, a FAD can be built based on the frequency with which words and word strings occur at a position exactly a user-defined number of words to the left or right of the word or word string being analyzed. In such a case, the user narrowly defines the range for that application as a word or word string of a specific length at a specific proximity to the word or word string being analyzed.
For example, the system can analyze the available documents and determine that they contain the phrase "go to the game" a total of 10,000 times, and may find that "go to the game" occurs 87 times within a range of 20 words of the word "Jets". In addition, the system may determine that "go to the game" occurs eight times exactly seven words before the word "Jets", counting from the first word, "go", of the word string ("before" means to the left in English, and to the right in a language read from right to left, such as Hebrew).
Any combination of recurring word and word-string patterns can also be recorded based on the number of words between them. For example, the database can record the number of sentences in the database in which the word "Jets" occurs three words before "go to the game" and "tickets" occurs nine words after "go to the game". This pattern might occur three times, and an application can use the frequency of this word pattern in text to derive the meaning of ideas, to help provide answers to questions posed by a user, or to help carry out requests made by a user.
In the prior art, there are methods that "search" for words or word strings based on user-defined proximity. Search applications use them to present to the user, based on proximity, the documents that contain the requested search terms, using results obtained with user-defined search parameters. However, those search methods do not use an application to search on such parameters automatically (for example, based on frequency in text), nor do they store this information and use it, through further automated steps, to acquire or learn knowledge automatically.
The FADs of the present invention can be generated individually using a series of narrowly defined ranges, based on recurring word-string patterns in text identified by their proximity to one another (measured by the number of words between them). In general, however, the most frequent and most useful word and word-string patterns are those that are adjacent or very close (on the left and the right) to the word or word string being examined.
2. FAD using a recurring word-string index
Building, as described above, a database that contains every proximity and frequency relationship among all recurring word patterns in the available text can require considerable computation time. Many of the relationships constructed by such exhaustive processing may never be used by an application. The following method indexes recurring word strings and avoids the processing needed to determine exact relationships that may never be used.
In addition, the following indexing process can serve as an alternative to the above method of automatically determining and analyzing the frequency of exact patterns based on the position of a particular word or word string within a range. This embodiment of the present invention is a method of building a recurrence database, which contains only the position of each recurring word and word string in the document database and does not contain proximity relationships with other entries. The method is as follows. First, search the available text for all recurrences of words and word strings. Second, record the "position" of each word and word string that occurs more than once in the database; this can be done by recording its position in each document in which it occurs, for example by identifying the word number of the first word in the string and the document number in the document database. Alternatively, only the document number of the document in which the word or word string occurs can be stored. In that case, the position of the word or word string can be searched for and determined on the fly when responding to a particular query.
Table 4 is an example of entries in a recurrence database.
Table 4
Word or word string                                Frequency and positions
“kids love a warm hug”                             20 times (word 58/document 1678, word 45/document 560, word 187/document 45,231, word 689/document 123 ...)
“kids love ice cream”                              873 times (word 765/document 129, word 231/document 764,907, word 652/document 4501 ...)
“kids love a warm hug before going to bed”         12 times (word 58/document 1678, word 45/document 560, word 187/document 45,231 ...)
“kids love ice cream before going to bed”          10 times (word 765/document 129, word 231/document 764,907 ...)
“kids love staying up late before going to bed”    17 times (word 23/document 561, word 431/document 76,431 ...)
“before going to bed”                              684 times (word 188/document 28, word 50/document 560, word 769/document 129, word 436/document 76,431 ...)
As shown, each occurrence of a word or word string that occurs more than once in the document database is added to the frequency count, and its position is recorded by the word-number position assigned within the document together with the number identifying the document in which it occurs, or by any other identifier of the position of the word or word string in the document database.
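A recurrence database of the kind shown in Table 4 can be sketched as a mapping from each recurring word string to its list of positions. The sketch below is illustrative only: the function name `build_recurrence_db`, the four-word limit on word-string length and the use of 1-based word numbers are assumptions, not details given in the text.

```python
from collections import defaultdict


def build_recurrence_db(documents, max_len=4):
    """Index recurring word strings by (word number, document number).

    `documents` maps a document number to its text. Word numbers are
    1-based and refer to the first word of the word string (an assumption).
    """
    positions = defaultdict(list)
    for doc_no, text in documents.items():
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                positions[" ".join(words[i:i + n])].append((i + 1, doc_no))
    # Keep only word strings that occur more than once in the database.
    return {ws: locs for ws, locs in positions.items() if len(locs) > 1}


db = build_recurrence_db({
    1: "kids love ice cream",
    2: "at night kids love ice cream",
})
entry = db["kids love ice cream"]
print(len(entry), "times", entry)
```

The frequency is simply the length of the position list, so no separate count needs to be stored, and proximity relationships can be computed later from the positions when a query requires them.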
If a recurrence database (including word-number positions and document numbers) has been fully and completely generated for all documents in the document database, the position information allows the system to calculate any general frequency relationship, as described above, or any specific word-string pattern frequency relationship. Until the recurrence database has been fully built, the system can perform FAD analysis on the fly on two or more ranges in documents in the document database after identifying positions from the recurrence database, or can perform a general search of the document database for word strings on the fly using any prior-art search method. When the system responds to a query by directly analyzing documents in the document database, any recurring word or word string that does not yet appear in the recurrence database can be added, to supplement the analysis of the recurrence database. After the information obtained by directly analyzing documents in the document database has been used for the specific task for which it was generated, it can be stored in the recurrence database for future use. Whether the system uses the recurrence database to build a FAD for analysis, or instead creates those relationships on the fly by searching the documents with the query as keywords, the system will identify the relationships between any recurring ideas represented by words or word strings.
C. Common frequency analysis: knowledge base acquisition and generation by association method and apparatus
Common frequency analysis (CFA) is the method of the present invention for generating a list of ideas (represented by words or word strings) that have a common relationship with two or more ideas (words and/or word strings) being analyzed. Several different embodiments of CFA can be used to generate different kinds of knowledge acquisition lists or related ideas. These lists can be used in a variety of applications, including intelligent applications. In intelligent applications, other embodiments of CFA are used to perform additional analysis to retrieve new information and help answer questions or perform tasks.
Referring now to Fig. 3, in CFA processing, intelligent application 302 can query the frequency association database or recurrence database, through association program 304, with two or more words and/or word strings in order to identify which third words and/or word strings are frequently associated, within user-defined ranges, with some or all of the words and/or word strings provided. In another embodiment of the CFA aspect of the invention, when the system receives a query consisting of a word or word string (from, for example, a user or intelligent application 302), it uses two or more FAD entries for the query to identify two or more words and/or word strings, and then makes associations among the two or more words and/or word strings so identified. This type of CFA is used, as part of the knowledge acquisition list generation process, to identify word-string signatures and cradles in order to identify semantic equivalents and other relationships between words and/or word strings (described below).
There are two distinct methods of performing CFA: (1) independent common frequency analysis (ICFA), and (2) related common frequency analysis (RCFA). In addition, after using either of the two processes, the system can perform further statistical analysis by applying them for one or more additional generations, or by combining the results and/or fragments of any CFA for use in further CFA.
1. Independent common frequency analysis (ICFA)
When intelligent application 302 provides two or more words and/or word strings to association program 304 (see Fig. 3) for CFA, the system can refer to the FAD of the invention to identify all words and word strings frequently associated with each word and/or word string provided. The system can then identify, based on user-defined criteria, those words and/or word strings that are frequently associated with some or all of the two or more words and/or word strings provided.
The system can rank the common associations it identifies among the provided words and/or word strings in various user-defined ways. For example, the system can rank associations by adding (or multiplying, or applying any other user-defined weighting to) the frequencies with which the common word or word string is associated with each of the provided words and/or word strings. As another example of a user-defined parameter, a minimum frequency (measured by position in the list, raw number of occurrences, or any other measure) can be required on the tables of all of the provided words and/or word strings.
For example, using the entries in the recurrence database above, if the task is to find ideas associated with both of the word strings "kids love" and "before going to bed", the system can take a third idea, such as "ice cream", and calculate the frequency with which it occurs together with the first idea, "kids love", within the user-defined range in all available documents as a first analysis, and calculate the frequency with which "ice cream" occurs together with the second idea, "before going to bed", as a second analysis. The application can then use the values of the frequencies from each independent relationship relative to one another. The ranking will be based on how high (in user-defined absolute or relative terms) the frequency of "ice cream" (based on the user-defined range) is on the frequency table of "kids love" and on the frequency table of "before going to bed".
Based on user-defined values, after analyzing "ice cream" the method identifies "a warm hug" by locating the relative frequency of "a warm hug" (based on the user-defined range or the proximity requirements of the application) on the frequency table of "kids love", and then locating "a warm hug" on the frequency table of "before going to bed". All other frequent associations on the two frequency tables (possibly user-defined), for example "staying up late", can be compared and scored based on user-defined values for the relative frequencies combined from the two tables. The system will produce the highest-ranked word strings based on the user-defined weighting of each frequency association.
The result of this analysis may be that the system can derive that, although "kids love" "ice cream" more than "kids love" "warm hugs", when it comes to "before going to bed", "kids love" "warm hugs" more than "kids love" "ice cream".
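The ranking step of ICFA can be sketched as follows. This is a hedged illustration, not the method as claimed: the function name `icfa`, the `combine` and `min_freq` parameters and the two frequency tables are hypothetical, and the frequencies are illustrative values loosely read off Table 4 under an adjacent-range assumption rather than results stated in the text.

```python
import math


def icfa(tables, combine=sum, min_freq=1):
    """Rank word strings that appear on every frequency table.

    `tables` holds one {associated word string: frequency} dict per query
    item; each table is built independently of the others.
    """
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    scored = {}
    for candidate in shared:
        freqs = [table[candidate] for table in tables]
        if min(freqs) >= min_freq:  # user-defined minimum on every table
            scored[candidate] = combine(freqs)
    # Highest combined score first; ties broken alphabetically.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative frequency tables (assumed values, see the note above).
kids_love = {"ice cream": 873, "a warm hug": 20, "staying up late": 17}
before_bed = {"ice cream": 10, "a warm hug": 12, "staying up late": 17}

print(icfa([kids_love, before_bed]))                     # ranked by sum
print(icfa([kids_love, before_bed], combine=math.prod))  # ranked by product
```

Passing a different `combine` function stands in for the user-defined weighting; a relative-frequency weighting would first normalize each table before combining.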
2. Related common frequency analysis (RCFA)
Instead of finding the common word and word-string associations that each query word or word string has independently, an embodiment can seek to identify only the words and/or word strings that occur frequently within the user-defined ranges of those documents that contain all of the two or more words and/or word strings being analyzed. Related common frequency analysis differs from independent common frequency analysis in that the words and/or word strings undergoing RCFA analysis occur together within the user-defined range in a document, whereas ICFA considers only their independent occurrences. This embodiment of RCFA uses the following steps:
First, locate in the available corpus all documents that contain the two or more words and/or word strings provided. For example, if the documents are stored in the document database, they can be located by returning the specific document numbers that denote the documents containing the two or more provided words and/or word strings. Document numbers are those assigned by an indexing scheme of the prior art or by one described in this application.
Next, identify each word and word string that is close to the provided words and/or word strings within some user-defined range, and record the frequency of each such word and word string within the range. Again, the user-defined range can be narrow and include only recurring words or word strings at a specific proximity (e.g., contiguous) to the provided words or word strings.
For example, suppose the two word strings "kids love" and "before going to bed" are provided to the system for RCFA analysis. Suppose further that the recurrence database contains the following entries:
“kids love a warm hug”                             20 times
“kids love ice cream”                              873 times
“kids love a warm hug before going to bed”         12 times
“kids love ice cream before going to bed”          10 times
“kids love staying up late before going to bed”    17 times
“before going to bed”                              684 times
When performing RCFA analysis with two words and/or word strings, the recurrence database points the system to the documents in the document database that contain both fragments (e.g., "kids love" and "before going to bed"), because the same document numbers are associated with them. Typically, the system locates only those documents in which the word strings are within a user-defined number of words of each other, or within any other user-defined qualifying proximity to each other.
Once the system has identified all documents in the document database that contain "kids love" and "before going to bed" within the specified proximity, the system can build a frequency table of all recurring words and word strings within the user-defined range around the two submitted word strings. In the example, based on the limited amount of text in the database (and assuming the user-defined range requires the words and word strings to be adjacent to the words or word strings being analyzed), "ice cream" occurs 10 times within the range of the two provided phrases and therefore has a frequency of 10, "staying up late" occurs 17 times and therefore has a frequency of 17, and "a warm hug" occurs 12 times and therefore has a frequency of 12.
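The RCFA frequency table for this example can be reproduced with a short sketch. It is a simplification under stated assumptions: it works directly on the recurrence-database counts rather than on documents, it assumes the adjacent range described above (the third word string sits directly between the two query strings), and the function name `rcfa_adjacent` is hypothetical.

```python
from collections import Counter


def rcfa_adjacent(entries, first, second):
    """Count word strings found directly between `first` and `second`.

    `entries` maps recurring word strings to their frequencies.
    """
    table = Counter()
    prefix, suffix = first + " ", " " + second
    for word_string, freq in entries.items():
        if word_string.startswith(prefix) and word_string.endswith(suffix):
            middle = word_string[len(prefix):len(word_string) - len(suffix)]
            if middle:  # skip entries with nothing between the two strings
                table[middle] += freq
    return table


entries = {
    "kids love a warm hug": 20,
    "kids love ice cream": 873,
    "kids love a warm hug before going to bed": 12,
    "kids love ice cream before going to bed": 10,
    "kids love staying up late before going to bed": 17,
    "before going to bed": 684,
}
table = rcfa_adjacent(entries, "kids love", "before going to bed")
print(table.most_common())
```

Note that "ice cream", by far the most frequent companion of "kids love" on its own, ranks last once both query strings must occur together, which is the distinction between RCFA and ICFA.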
If the range relative to the two RCFA word strings is expanded, the recurrence database of Table 4 may also contain other word strings that add to the above frequency counts, depending on the user-defined word-string range. For example, the same text may contain recurring words and word strings that are near "kids love" and "before going to bed" but not directly adjacent to them (e.g., "kids love ice cream and other sweets before going to bed"). This also means that if the phrase "ice cream and other sweets" recurs, it is also an independent answer to the query. The aspect of the invention that identifies semantic equivalents would (based on user-defined criteria in the application) group "ice cream" and "ice cream and other sweets" into a single semantic category (e.g., sweets). In addition, the order of the ideas can differ while the meaning stays the same (e.g., "before going to bed, kids love ice cream"), and it is desirable to include this in the analysis. The aspect of the invention that identifies semantically similar ideas (in combination with the dual-anchor overlap method) will identify different idea orders with the same meaning as semantic equivalents.
In addition, as another embodiment of the present invention, known or previously determined semantic equivalents can be substituted for the words and word strings being searched (using RCFA or ICFA), so that recurring words and word strings are also sought within range of those equivalents. For example, the system can also search for "kids like", "kids really love", "kids enjoy", "children enjoy" or "children love" in place of "kids love". The same method can replace "before going to bed" with equivalents known to the system, such as "before bed", "before going to sleep" or "before bedtime".
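As a minimal sketch of this substitution, the pairs of phrases to search can be enumerated as the Cartesian product of each phrase's known equivalents. The `equivalents` table and the `expand_query` function are hypothetical illustrations; how the system stores its known equivalents is not specified here.

```python
from itertools import product

# Hypothetical table of equivalents already known to the system.
equivalents = {
    "kids love": ["kids love", "kids like", "kids really love",
                  "kids enjoy", "children enjoy", "children love"],
    "before going to bed": ["before going to bed", "before bed",
                            "before going to sleep", "before bedtime"],
}

def expand_query(first, second, table):
    """Return every (first, second) pair obtained by substituting known equivalents."""
    firsts = table.get(first, [first])      # fall back to the phrase itself
    seconds = table.get(second, [second])
    return list(product(firsts, seconds))

pairs = expand_query("kids love", "before going to bed", equivalents)
print(len(pairs))  # 24 phrase pairs to run through RCFA or ICFA
```

Each pair can then be analyzed in the same way as the original query, and the resulting frequency counts combined.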
Both the word order problem and the semantic substitution problem described above are handled by the present invention's ability to detect word string patterns. As described below, the common frequency techniques of the present invention will produce a large number of semantically equivalent words and/or word strings, which can be used to expand a search broadly with many more relevant semantic terms. In addition, as described below, by identifying patterns of word strings from common categories that appear together in larger common patterns, the present invention can also recognize concepts that differ in order but share the same meaning (e.g., "the boy and the spotted dog" and "the dog with the spots and the boy" will be treated as equivalent larger semantic units). The method of determining semantically equivalent concepts, and the method of identifying larger concepts that are semantically equivalent even though their component building-block concepts are arranged in a different order, are both additional aspects of the present invention's ability to acquire knowledge for understanding natural language.
3. Secondary frequency analysis (RCFA or ICFA)
In another embodiment, the system can perform a CFA between a third word or word string selected from the results of a CFA (i.e., one of the returned results) and the first word or word string of the query, the second, or both, which adds new information to the analysis being performed by the application. For example, if the common association selected, based on the frequency of all words and word strings in the shared range of "before going to bed" (the first) and "kids love" (the second), is "ice cream" (the third), then this embodiment generates an RCFA or ICFA between "before going to bed" (the first) and "ice cream" (the third), or between "kids love" (the second) and "ice cream" (the third), and selects associations based on that second frequency analysis. For example, "ice cream" and "before going to bed" may have a high-frequency association with "stomach", which may be useful in the analysis performed by an application of the present invention. Moreover, the same method can be used to analyze any two or more words and/or word strings in any number of combinations defined by the user or by an intelligent application. Particular applications will automate the analysis, identifying from the result of each successive CFA which kind of CFA to perform next. More sophisticated applications will identify two or more frequency analyses to be performed before two or more independent results are combined.
V. Single-language knowledge acquisition using CFA
Words and/or word strings that express the same concept in a language can be identified as members of the same semantic family based on the patterns of word strings that frequently appear around them in that language. These patterns become apparent by examining how frequently specific words and word strings appear before a given word or word string (in English, to its left) and after it (in English, to its right). The list-generation aspect of the present invention's knowledge acquisition therefore uses two specific kinds of CFA, both designed to exploit the fact that words and word strings representing similar concepts (or sharing some other semantic relationship) have in common the types and order of the words and word strings that frequently appear before and after them.
By using RCFA or ICFA in this embodiment to create a knowledge acquisition list, the system can generate a comprehensive database of words and word strings containing highly related concepts, based on the word strings that the related concepts frequently share on their left and right. Although other related information also ranks highly, the most closely associated words and word strings (i.e., those that share the same frequently occurring left and right context word strings) are usually semantic equivalents. Other related concepts include opposites (e.g., if the query is "hard", the return "soft" may also rank highly), related concepts within a larger class (e.g., if the query is "dark blue", the return "orange" may also rank highly), examples (e.g., if the query is "massive fraud", the return "skewing documents and misrepresenting data" may also rank highly), and other related knowledge.
For example, if the system is asked to identify words and/or word strings that have the same or nearly the same meaning as another word or word string (i.e., semantically similar words and word strings, or synonyms), it can perform a first CFA to find the words and word strings that frequently appear to the left and right of the query, and then a second CFA to identify all other words and word strings in the language that most frequently share the same left and right context word strings. In general, the more closely the left and right context word strings shared by two different words and/or word strings match, the closer their meanings are. Although antonyms also share high-frequency common associations, they differ markedly with respect to certain important context relationships, and these relationships create an "antonym signature" pattern that the system can recognize. The system can thus filter antonyms of the query out of the results, or provide a list of antonyms for use in other applications.
The association between a concept represented by a word or word string and any other concept represented by a word or word string is defined by the relationship, as identified by the system, between their respective signature sets. The system uses the association database to detect specific word formations that recur frequently within user-defined ranges; these ranges are tailored to detect the word patterns that surround a concept and define its relationships to other concepts. The left signature and right signature of a word or word string (together called a cradle when RCFA is used) thus comprise all the contexts represented by the different word strings that surround it. Taking the most frequent left and right context word strings and finding which other word strings frequently occur between those closely similar signatures identifies synonyms or near-synonyms, or other highly related phrases (word strings) and/or words.
Other word strings with a semantic relationship can also share the same left and right context word strings. Members of the same general class, such as places, colors, names, numbers, dates, sports and so on, have many context word strings in common, and the system can identify them through these context word strings. Other relationships, such as words and word strings that are examples of the query word or word string, or word strings that express other facts associated with the query, will also share specific context word strings; those specific shared context word strings, identified by the CFA aspect of the present invention, define that particular relationship.
The character of each relationship is defined by the context word strings that are shared and those that are not. The user supplies the system with examples of words and/or word strings that define the relationship. The methods of the present invention that help identify semantic equivalents in a knowledge acquisition list include (1) determining the direct relationship that two word strings have on each other's knowledge acquisition lists, (2) determining on which different knowledge acquisition lists two words and/or word strings both appear, and (3) generating synonymous expressions of the query plus its left signature and of the query plus its right signature, and checking whether they overlap.
The following describes how the system uses the association database and intelligent application 302 (see Fig. 3) to detect semantic equivalents and other associated knowledge in a single language through CFA. The system can also perform both ICFA and RCFA on the supplied words and word strings and combine the results using user-defined weights. The knowledge acquisition list filtering and sorting methods of the present invention are then described.
A. Generating the knowledge acquisition list using ICFA
One embodiment performs ICFA on the specific word formations surrounding a word or word string. This identifies words and/or word strings that are equivalent or nearly equivalent in semantic value (i.e., meaning) to the query, as well as other words and word strings related to it. This embodiment comprises the following steps:
Step 1: Receive a query (the query phrase) consisting of the word or word string to be analyzed and, using the FAD aspect of the present invention, return a user-defined number of words and/or word strings (the return phrases) that have a user-defined minimum and maximum length and that occur most frequently directly to the left of the query phrase in all available documents. The longer the user-defined recurring word strings, the more precise (specific) the final results usually are.
Step 2: Perform an FAD analysis on each of a user-defined number of the highest-ranked results of step 1, using a range of words or word strings to the right of each word or word string analyzed. (The system ranks the results by how frequently the recurring words and word strings appear to the right of each word or word string returned in step 1 and analyzed in step 2.) Then add together the frequencies of all identical words and word strings produced in step 2.
Step 3: Perform an FAD analysis on the query and return a user-defined number of words and/or word strings (the return phrases) that have a user-defined minimum and maximum length and that occur most frequently directly to the right of the query phrase in all available documents. (Again, for accuracy it is usually desirable that these be word strings of at least two words.)
Step 4: Perform an FAD analysis on each of a user-defined number of the highest-ranked words and word strings returned in step 3, using a range of words or word strings directly to the left of each word and word string analyzed. Again, the results are ranked by how frequently the words and word strings appear in front of each word and word string returned in step 3 and analyzed in step 4. Then add together the frequencies of all common word and word string results of step 4.
Step 5: Identify all words and/or word strings produced by both step 2 and step 4. In one embodiment, the frequency of each word and word string returned in step 2 is multiplied by the frequency of the same word and/or word string produced in step 4. The highest-ranked words and/or word strings (based on the product of the step 2 and step 4 frequencies) are usually the words and word strings closest to semantic equivalence with the query. The list that this process produces is called the knowledge acquisition list.
As an additional embodiment, in step 5 the returns of steps 2 and 4 can be ranked by the total number of different word strings returned in steps 1 and 3 that they share with the query.
Steps 1 and 3 together are an embodiment of ICFA in which a single word or word string is used to identify various groups of words and/or word strings associated with the query. Steps 2, 4 and 5 together are another embodiment of ICFA, in which two words and/or word strings are used to identify a third word and/or word string associated with both.
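The five steps can be sketched end to end on a toy corpus. This is an illustrative reading under stated assumptions, not the patented implementation: sentences are scanned directly instead of querying a FAD, tokenization is by whitespace, signatures have a single fixed length, and the names `neighbors`, `anchor_totals`, `icfa`, `sig_len`, `top_k` and `result_lengths` are all hypothetical.

```python
from collections import Counter

def neighbors(sentences, phrase, side, lengths):
    """Count word strings of the given lengths directly left or right of `phrase`."""
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != target:
                continue
            for k in lengths:
                if side == "left" and i - k >= 0:
                    counts[" ".join(tokens[i - k:i])] += 1
                elif side == "right" and i + n + k <= len(tokens):
                    counts[" ".join(tokens[i + n:i + n + k])] += 1
    return counts

def anchor_totals(sentences, signatures, side, lengths):
    """Steps 2 and 4: sum each candidate's frequency over all signature entries."""
    totals = Counter()
    for signature in signatures:
        totals.update(neighbors(sentences, signature, side, lengths))
    return totals

def icfa(sentences, query, sig_len=2, top_k=3, result_lengths=(1,)):
    query = query.lower()
    # Step 1: most frequent word strings directly left of the query.
    left_sigs = [s for s, _ in neighbors(sentences, query, "left", [sig_len]).most_common(top_k)]
    # Step 2: what appears directly right of each left signature.
    left_anchor = anchor_totals(sentences, left_sigs, "right", result_lengths)
    # Step 3: most frequent word strings directly right of the query.
    right_sigs = [s for s, _ in neighbors(sentences, query, "right", [sig_len]).most_common(top_k)]
    # Step 4: what appears directly left of each right signature.
    right_anchor = anchor_totals(sentences, right_sigs, "left", result_lengths)
    # Step 5: keep candidates produced on both sides; weight by frequency product.
    scores = {w: left_anchor[w] * right_anchor[w]
              for w in left_anchor.keys() & right_anchor.keys() if w != query}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

corpus = [
    "the suspect was detained for questioning",
    "the suspect was held for questioning",
    "the suspect was held for questioning",
    "the man was detained on charges",
    "the man was held on charges",
    "the suspect was arrested on charges",
]
knowledge_acquisition_list = icfa(corpus, "detained")
print(knowledge_acquisition_list)  # [('held', 9), ('arrested', 1)]
```

In this toy corpus "held" shares both left and right contexts with "detained" more often than "arrested" does, so it ranks first.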
The following example illustrates these embodiments, using a hypothetical database to create associations among the words and word strings in the system's document database and then using ICFA to create associations. Suppose the user enters the word "detained" to determine all words and word strings known to the system that are equivalent to it (along with other related words and word strings).
In step 1, taking only the top three results to simplify the illustration (although the number of returns analyzed by the present invention is usually much larger and is user-defined), the system first determines the three word strings that occur most frequently directly to the left of "detained". The length of the word strings directly to the left of the analyzed word ("detained") can be a single length or a range of lengths and is user-defined (in this example, three-word strings). The result of this analysis, a list of word strings of user-defined length to the left of the supplied word, is called the "left signature list". Suppose the system returns the following results in this example:
1. “the suspect was __”
2. “was arrested and __”
3. “continued to be __”
In step 2, the system operates on the returned left signature list. It locates the words and/or word strings that occur most frequently after the three returned three-word strings, that is, the words and/or word strings to the right of the members of the returned left signature list. The length of the word strings that the system returns in this operation is user-defined and can be restricted. The result of this analysis, a separate list of the words and/or word strings to the right of each left signature list entry, is called the "left anchor list". Suppose the system returns the following left anchor lists in this example:
Left signature list          Left anchor list (frequency)
1. “the suspect was __”      a. “arrested” (240)  b. “held” (120)  c. “released” (90)
2. “was arrested and __”     a. “held” (250)  b. “convicted” (150)  c. “released” (100)
3. “continued to be __”      a. “healthy” (200)  b. “confident” (150)  c. “optimistic” (120)
In step 2, the frequencies of identical returns in the left anchor lists can also be added together. The only common returns in the left anchor lists are:
a.“held” 120+250=370
b.“released” 90+100=190
In step 3, the system determines the three two-word strings that occur most frequently directly to the right of the selected query "detained" in the documents of the database. Again, the number of most frequent word strings analyzed is user-defined (as in step 1, the system returns the top three). The length of the word strings directly to the right of the analyzed word ("detained") is also user-defined; in this example it is two words (note that word strings of any length or range of lengths can be used in steps 1 and 3). The result of this analysis, a list of word strings of user-defined length to the right of the supplied word, is called the "right signature list". Suppose the system returns the following right signature list in this example:
1. “__ for questioning”
2. “__ on charges”
3. “__ during the”
In step 4, the system operates on the returned right signature list. It locates the words and/or word strings that occur most frequently in front of the three returned two-word strings, that is, the words and/or word strings to the left of the returned two-word strings. The length of the word strings that the system returns in this operation can be user-defined or unlimited. The result of this analysis, a separate list of the words and/or word strings to the left of each right signature list entry, is called the "right anchor list". Suppose the system returns the following right anchor lists in this example:
Right signature list         Right anchor list (frequency)
1. “__ for questioning”      a. “held” (300)  b. “wanted” (150)  c. “brought in” (100)
2. “__ on charges”           a. “held” (350)  b. “arrested” (200)  c. “brought in” (150)
3. “__ during the”           a. “beautiful” (500)  b. “happy” (400)  c. “people” (250)
As in step 2, the frequencies of common returns in the right anchor lists generated from the different right signature list entries can be added together. The only common returns in the right anchor lists above are:
a.“held” 300+350=650
b.“brought in” 100+150=250
In step 5, the ICFA is performed and the system returns a ranking. In this example, the frequencies of the returns common to steps 2 and 4 (i.e., returns that appear on both a left anchor list and a right anchor list) are multiplied to produce a weighted frequency, giving the following knowledge acquisition list:
1. “held” 650×370=240,500
2. “arrested” 200×240=48,000
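The arithmetic of steps 2, 4 and 5 in this example can be checked with a short sketch that uses the hypothetical anchor list frequencies from the two tables above. The variable and function names are illustrative only.

```python
from collections import Counter

# Anchor lists from the example tables: one dict per signature list entry.
left_anchor_lists = [
    {"arrested": 240, "held": 120, "released": 90},
    {"held": 250, "convicted": 150, "released": 100},
    {"healthy": 200, "confident": 150, "optimistic": 120},
]
right_anchor_lists = [
    {"held": 300, "wanted": 150, "brought in": 100},
    {"held": 350, "arrested": 200, "brought in": 150},
    {"beautiful": 500, "happy": 400, "people": 250},
]

def total_frequencies(anchor_lists):
    """Add up each return's frequency across all anchor lists on one side."""
    totals = Counter()
    for anchor_list in anchor_lists:
        totals.update(anchor_list)  # Counter.update adds counts from a mapping
    return totals

left_totals = total_frequencies(left_anchor_lists)
right_totals = total_frequencies(right_anchor_lists)

# Step 5: keep returns found on both sides and multiply their summed frequencies.
weighted = {term: left_totals[term] * right_totals[term]
            for term in left_totals.keys() & right_totals.keys()}
ranking = sorted(weighted.items(), key=lambda kv: -kv[1])
print(ranking)  # [('held', 240500), ('arrested', 48000)]
```

"released" and "brought in" drop out because each appears on only one side.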
Another ranking embodiment disregards the specific weighted frequencies. Instead, all results produced on at least one left anchor list and at least one right anchor list are ranked by the total number of anchor lists on which they appear. In the example above, the ranking under this embodiment would be:
Rank   Knowledge acquisition list   Number of anchor lists
1      “held”                       4
2      “arrested”                   2
Although "released" and "brought in" were each produced twice in the analysis, neither appears on both a left anchor list and a right anchor list ("released" was produced twice on left anchor lists, and "brought in" twice on right anchor lists). Other user-defined weighting schemes that combine the number of anchor lists and the total frequency can also be used. For example, one embodiment can rank results by the total number of different anchor lists on which they appear, and further rank any returns that appear on the same number of anchor lists by total frequency.
Another ranking embodiment multiplies the number of left anchor lists on which a result appears by the number of right anchor lists on which it appears. In the example above, this gives the following ranking:
Rank   Knowledge acquisition list   Anchor list product
1      “held”                       4
2      “arrested”                   1
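Both count-based rankings (the total number of anchor lists, and the product of left and right anchor list counts) can be reproduced from the example data with a brief sketch. The names are hypothetical, and frequencies are deliberately ignored here.

```python
from collections import Counter

left_anchor_lists = [
    ["arrested", "held", "released"],
    ["held", "convicted", "released"],
    ["healthy", "confident", "optimistic"],
]
right_anchor_lists = [
    ["held", "wanted", "brought in"],
    ["held", "arrested", "brought in"],
    ["beautiful", "happy", "people"],
]

def list_counts(anchor_lists):
    """Count on how many anchor lists each return appears."""
    counts = Counter()
    for anchor_list in anchor_lists:
        counts.update(set(anchor_list))  # each list counts at most once per return
    return counts

left_counts = list_counts(left_anchor_lists)
right_counts = list_counts(right_anchor_lists)

# Only returns that appear on at least one list on each side qualify.
common = left_counts.keys() & right_counts.keys()

by_sum = sorted(((t, left_counts[t] + right_counts[t]) for t in common),
                key=lambda kv: (-kv[1], kv[0]))
by_product = sorted(((t, left_counts[t] * right_counts[t]) for t in common),
                    key=lambda kv: (-kv[1], kv[0]))
print(by_sum)      # [('held', 4), ('arrested', 2)]
print(by_product)  # [('held', 4), ('arrested', 1)]
```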
The illustration above is based on a relatively small number of documents in the document database. Document databases will usually be larger and can include documents that the system accesses remotely over a network such as the Internet. In one embodiment of the present invention, the user not only defines the number of results to include in a signature list but can also have the analysis stop once the specified number of results with a user-defined minimum frequency has been found. This serves as a cutoff and can save processing power when large databases are used.
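One possible reading of this cutoff is sketched below: signatures are counted as occurrences stream in, and scanning stops as soon as enough signatures have reached the minimum frequency. The function name, its parameters, and the idea of a stream of already-extracted signatures are assumptions made for illustration.

```python
from collections import Counter

def signatures_with_cutoff(occurrences, wanted, min_frequency):
    """Scan signature occurrences until `wanted` signatures reach `min_frequency`.

    Returns the qualifying signatures and the number of occurrences scanned.
    """
    counts = Counter()
    qualified = set()
    scanned = 0
    for signature in occurrences:
        scanned += 1
        counts[signature] += 1
        if counts[signature] >= min_frequency:
            qualified.add(signature)
            if len(qualified) >= wanted:
                break  # cutoff reached: the rest of the stream is never read
    return sorted(qualified), scanned

stream = ["a", "b", "a", "c", "b", "a", "c", "c"]
found, scanned = signatures_with_cutoff(stream, wanted=2, min_frequency=2)
print(found, scanned)  # ['a', 'b'] 5
```

Because the stream is consumed lazily, a generator over a large or remote document collection would stop being read once the cutoff is met.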
Other examples of user-defined parameters for the ICFA that generates the knowledge acquisition list of a query word or word string take into account frequently recurring words and/or word strings of various lengths to the left and right of the query. Thus, instead of giving the word strings returned in the left and right signature lists a user-defined fixed length, an embodiment can give them a user-defined variable length by specifying a minimum and maximum word string length. Using the most frequent word strings of different lengths in the analysis to the left and right of the query provides more "context angles" from which to identify related words and word strings. In addition, this embodiment can include a minimum number of occurrences that a returned word or word string must have to qualify for a signature list.
In an embodiment that uses the variable word string analysis of this aspect of the present invention, the query from the previous example ("detained") can be analyzed as follows:
In step 1, a left signature list containing a user-defined number of the word strings (of user-defined minimum and maximum length) that occur most frequently to the left of the query is generated from the available database. This is the same as step 1 in the previous example, except that word strings of various lengths are used here rather than word strings of a fixed length. If the user-defined parameters are (1) return the eight most frequent word strings, (2) a minimum word string length of two words and a maximum of four words, and (3) a minimum of 500 occurrences in the corpus, the results for the earlier example might be as follows (again, using a hypothetical corpus):
Left signature list              Frequency
1. “people were”                 1,000
2. “arrested and”                950
3. “were reportedly”             800
4. “passengers were”             775
5. “was being”                   700
6. “the people were”             650
7. “was arrested and”            575
8. “they were reportedly”        500
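A variable-length left signature list with the three user-defined parameters (number of returns, minimum and maximum length, minimum occurrences) might be built as in the following sketch. It scans a toy list of sentences directly; the function and parameter names are hypothetical, and ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter

def left_signatures(sentences, query, min_len, max_len, top_k, min_count):
    """Return up to `top_k` word strings of `min_len`..`max_len` words that occur
    directly left of `query` at least `min_count` times, most frequent first."""
    target = query.lower().split()
    n = len(target)
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != target:
                continue
            for k in range(min_len, max_len + 1):
                if i - k >= 0:
                    counts[" ".join(tokens[i - k:i])] += 1
    frequent = [(s, c) for s, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda kv: (-kv[1], kv[0]))  # frequency, then alphabetical
    return frequent[:top_k]

corpus = [
    "the people were detained",
    "ten people were detained",
    "he was arrested and detained",
    "she was arrested and detained",
    "they were reportedly detained",
]
signature_list = left_signatures(corpus, "detained", min_len=2, max_len=3,
                                 top_k=3, min_count=2)
print(signature_list)
# [('arrested and', 2), ('people were', 2), ('was arrested and', 2)]
```

The same function with the scan direction reversed would give the right signature list of step 3.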
In step 2, as in the previous example, the left anchor lists are generated from the results of the left signature list by locating the words and word strings that occur most frequently directly to the right of the step 1 returns.
In step 3, the right signature list is generated using the same parameters defined in step 1 of this example, giving the following results:
Right signature list             Frequency
1. “for questioning”             1,750
2. “on charges”                  1,520
3. “during the”                  1,350
4. “because of”                  1,000
5. “due to”                      750
6. “in connection”               600
7. “on charges of”               575
8. “for questioning after”       500
In step 4, as in the previous example, the right anchor lists are generated from the results of the right signature list by locating the words and word strings that recur most frequently to the left of the step 3 returns.
In step 5, all results produced on at least one left anchor list and at least one right anchor list are ranked by the total number of lists on which they appear. Alternatively, the ranking can be determined by multiplying the total number of left anchor lists on which a result appears by the total number of right anchor lists on which it appears. The ranking can also be weighted by total frequency. As noted above, various user-defined weighting schemes can be used.
It should be noted that although the example query above is a single word ("detained"), the system can also produce semantic equivalents for a word string of any length, provided the word string represents a semantically recognizable concept. For example, if the system is queried with "car race", it can produce possible semantic equivalents of "car race". By performing the same steps described in the embodiments above, using ICFA to determine approximate semantic equivalents, the system might produce "stock car race", "auto race", "drag race", "NASCAR race", "Indianapolis 500", "race", and other semantically related words and word strings. The system accepts a query and uses the same process to produce associated concepts regardless of the length of the query or of the returned word strings. The knowledge acquisition list will also include other related items, for example "contest", "sporting event", "Dale Earnhardt, Jr." or "boat race".
B. Generating the knowledge acquisition list using RCFA
Another embodiment of the present invention for creating a knowledge acquisition list that includes semantically equivalent associations is based on the related common frequency analysis (RCFA) described above rather than on independent common frequency analysis (ICFA). The same basic methods and principles used for semantic acquisition with ICFA can also be applied with RCFA. The RCFA method of the present invention for generating a knowledge acquisition list that includes semantic equivalents and other relationships comprises the following steps:
Step 1: Receive a query consisting of the word or word string whose semantically equivalent words and word strings (and other related words and word strings) are to be found, and search the document database, another database of the system, or the FAD to identify the portions of the documents, of user-defined word string length, that contain this word or word string. In one example, the word string "initial public offering" is entered as the query and RCFA is used to identify its semantic equivalents. The system then searches for and identifies the portions of the documents that contain the word string "initial public offering". The user can define and limit the number of portions returned.
Step 2: For each occurrence of the query word string found in step 1, analyze the returned portions by recording the frequency of occurrence of (i) the words and/or word strings of user-defined length to the left of the query in combination with (ii) the words and/or word strings of user-defined length to the right of the query. This step creates combined left and right signatures that "cradle" the query, called "left/right signature cradles" or simply cradles. This step is an embodiment of RCFA in which a query of one word or word string is used to generate two related word strings.
In our example, the user-defined left word strings can be set to two or three words in length, and the user-defined right word strings to two or three words. A cutoff is applied by returning a user-defined number of cradles (for example, 100) that occur a user-defined minimum number of times (for example, five times). This process might yield the following hypothetical set of portions returned for the query "initial public offering":
1. “announced a successful __ of common stock”
2. “shares at an __ price of”
3. “announced the __ of its”
4. “it considers an __ of common stock”
5. “completed an __ raising a”
6. “announced its __ of shares”
7. “announced the proposed __ for its common”
8. “announced an __ of stock”
9. “completed its __ of shares”
10. “in representing __ underwriters for”
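Steps 1 and 2 of the RCFA method can be sketched as follows: every occurrence of the query is located and each combination of a left word string and a right word string around it is recorded as a cradle with its frequency. The sketch assumes whitespace tokenization and a direct scan of sentences; the function name `cradles` and its parameters are hypothetical.

```python
from collections import Counter

def cradles(sentences, query, left_lengths=(2, 3), right_lengths=(2, 3)):
    """Count (left word string, right word string) pairs that surround `query`."""
    target = query.lower().split()
    n = len(target)
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != target:
                continue
            for l in left_lengths:
                for r in right_lengths:
                    # Skip combinations that would run past either end.
                    if i - l >= 0 and i + n + r <= len(tokens):
                        left = " ".join(tokens[i - l:i])
                        right = " ".join(tokens[i + n:i + n + r])
                        counts[(left, right)] += 1
    return counts

corpus = [
    "the company announced its initial public offering of shares today",
    "the firm announced its initial public offering of shares yesterday",
]
cradle_counts = cradles(corpus, "initial public offering")
print(cradle_counts[("announced its", "of shares")])  # 2
```

A cutoff such as "the 100 most frequent cradles occurring at least five times" could then be applied to `cradle_counts` before step 3.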
Step 3: Search the document database for the words and word strings (optionally limited to a user-defined maximum length) that occur most frequently between the left and right word strings of each left/right signature cradle produced in step 2. Identifying these other words and/or word strings that frequently occur between the word strings of the left/right signature cradles yields possible semantic equivalents (and other related words or word strings). Optionally, a return can be required to fill a user-defined minimum number or percentage of the left/right signature cradles in order to qualify. This step is an embodiment of RCFA in which two words and/or word strings are used to identify a related third word and/or word string.
Step 4: Rank the resulting words and/or word strings that appear between the word strings of the left/right signature cradles (i.e., the other words and word strings that "fill" each cradle) by the total number of different left/right signature cradles filled, by total frequency, or by some other method or combination of methods.
In a preferred embodiment, the returns are first ranked by the total number of different left/right signature cradles they fill. Returns that fill the same number of different cradles are then ranked by the total frequency across all the cradles they fill. Another embodiment of the ranking criteria can also weight the returns by the frequency of the left/right signature cradles that produced them, or can assign special weights based on the length of the word strings in the left/right signature cradles.
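Steps 3 and 4, with the preferred ranking (number of different cradles filled, then total frequency), might look like the sketch below. The cradles are supplied by hand, the corpus is hypothetical, and the names `fill_cradles` and `max_len` are illustrative assumptions.

```python
from collections import defaultdict

def fill_cradles(sentences, cradle_list, query, max_len=3):
    """Find word strings of up to `max_len` words that sit between the left and
    right word strings of each cradle, then rank them."""
    stats = defaultdict(lambda: [set(), 0])  # filler -> [cradles filled, total frequency]
    for sentence in sentences:
        tokens = sentence.lower().split()
        for left, right in cradle_list:
            l, r = left.split(), right.split()
            for i in range(len(tokens) - len(l) + 1):
                if tokens[i:i + len(l)] != l:
                    continue
                start = i + len(l)
                for k in range(1, max_len + 1):
                    end = start + k
                    if tokens[end:end + len(r)] == r:
                        filler = " ".join(tokens[start:end])
                        if filler != query:  # the query itself is not a result
                            stats[filler][0].add((left, right))
                            stats[filler][1] += 1
    # Rank by different cradles filled, then total frequency, then alphabetically.
    ranked = sorted(stats.items(), key=lambda kv: (-len(kv[1][0]), -kv[1][1], kv[0]))
    return [(filler, len(filled), freq) for filler, (filled, freq) in ranked]

corpus = [
    "acme announced its ipo of shares",
    "beta announced its ipo of shares",
    "gamma completed its ipo of shares",
    "delta announced its stock offering of shares",
    "acme announced its initial public offering of shares",
]
cradle_list = [("announced its", "of shares"), ("completed its", "of shares")]
results = fill_cradles(corpus, cradle_list, "initial public offering")
print(results)  # [('ipo', 2, 3), ('stock offering', 1, 1)]
```

Each result is reported as (word string, different cradles filled, total frequency).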
In the example above, the top results of step 3 might be the words and/or word strings "IPO", "ipo" (results may be case sensitive), "Initial Offering", "offering", "Public Offering" and "stock offering", all of which "fill" the unresolved portion (the position the query occupies) of some of the left/right signature cradles.
When ICFA or RCFA is used to determine semantic equivalents, varying numbers of word strings of various lengths can be used together as left signatures, right signatures or left/right signature cradles in the ICFA or RCFA analysis shown above. The more word strings of various lengths that are used as left signatures, right signatures and left/right signature cradles in the analysis, the more angles the system has from which to identify concepts that share the semantic value of the query word or word string.
One embodiment can take the most frequent word strings within a certain range of lengths, for example the 1,000 most frequent word strings of three to five words on the left and right of the query, to form the left/right signature cradles. As another example of an embodiment, the system can define the left/right signature cradles as a user-defined number of the most frequent three-word strings to the left and right of the query, plus a user-defined number of the most frequent four-word strings to the left and right of the query, plus a user-defined number of the most frequent five-word strings to the left and right of the query. The number of words in the word strings of the left/right signature cradles is user-defined and can include any combination of word string length ranges that lead into or out of the concept (represented by a word or word string) being analyzed. The resulting words and word strings produced by filling the cradles can be ranked by the total number of different cradles filled, with user-defined weights given to results produced by cradles of different sizes or to the frequency counts of the cradles filled. Any specific embodiment that uses ICFA to find semantic equivalents or to identify any other relationship can be implemented using RCFA, and vice versa.
Appendix A shows examples of association results obtained by applying RCFA to various queries. The first 15 examples show partial results for each query (i.e., the top 20 to 25 returns), and the last example (for the query "it is important to note") shows the top 1,000 returns. The user-defined settings for these results were: (1) find the first 1,000 occurrences of the query; (2) record all cradles formed by two-word and three-word strings on the left and two-word and three-word strings on the right; (3) rank the cradles by their frequency of occurrence; (4) find all words and word strings that fill the left/right signature cradles; (5) return results based on the total number of different cradles filled; and (6) rank results that fill the same number of cradles by the total frequency across all the cradles filled (results that fill higher-frequency cradles can also be weighted more heavily). The corpus used to produce these results consists of approximately 2,400,000,000 words. Note that the "relative score" listed in Appendix A represents a user-defined measure that, as described above, reflects the confidence that a particular return is semantically related. The lower the score, the lower the confidence. With a larger corpus, the confidence in some of these low-scoring returns can rise to a higher level if, under the user-defined measure, they occur more frequently.
Another embodiment of the invention associates two or more words and/or word strings with the third words and word strings that appear on all of their knowledge acquisition lists (and that also meet any user-defined ranking requirement). This embodiment, called common list member analysis, can be used to enhance applications that benefit from semantic association, such as search, text mining and AI applications. For example, when two or more knowledge acquisition lists have been examined and their common resulting words and word strings identified, the common items can be used to enhance search over unstructured text. Thus, if a particular search query enters the terms "Bonds" and "San Francisco" into a prior-art search engine as two keywords, the invention can supplement the search with additional keywords by identifying the words and word strings that appear, with a user-defined minimum rank (and user-defined weights), on the knowledge acquisition lists of both initial keywords. "baseball" and "the Giants" could therefore be added, so that content about Barry Bonds rather than financial bonds is retrieved and ranked.
In addition, the items common to the knowledge acquisition lists (that is, the lists generated for the keywords themselves, or lists derived from items on the keywords' lists) can be used to rank results by relevance, or to create categories for organizing results (by examining the common items that appear on the lists and clustering them into categories). In the example above, if the text in the database includes information about financial bond trading in San Francisco, the knowledge acquisition lists for "Bonds" and "San Francisco" may both contain highly ranked returns such as "bond trading" and "debentures", which the system can use as additional keywords or factors to support enhanced search, document classification, or categorization of the returned results. In such a case the system can identify categories such as "baseball" and "finance" and offer the user the option of choosing between them. Likewise, as described below, a knowledge acquisition list can be filtered to find the synonyms of the query (or keyword), which can be used to extend the results of a particular search beyond documents containing the keyword to documents containing synonyms of the keyword.
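Common list member analysis can be sketched as a rank-limited intersection of knowledge acquisition lists. The function name `common_members`, the `max_rank` cutoff, and both example lists are hypothetical; real lists would come from ICFA or RCFA.

```python
def common_members(lists, max_rank=10):
    """Return items found in the top max_rank of every list, best combined rank first."""
    tops = [{item: rank for rank, item in enumerate(lst[:max_rank])} for lst in lists]
    shared = set(tops[0]).intersection(*tops[1:])
    return sorted(shared, key=lambda item: (sum(t[item] for t in tops), item))

# Hypothetical knowledge acquisition lists, ordered from highest to lowest rank.
bonds_list = ["barry bonds", "baseball", "debentures", "the giants", "home run"]
san_francisco_list = ["bay area", "the giants", "baseball", "golden gate"]

extra_keywords = common_members([bonds_list, san_francisco_list])
print(extra_keywords)
```

The shared items could then be appended to the original keywords or used as category labels; how they are weighted is left to the user-defined settings.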
C. Knowledge Acquisition List Sorting and Filtering
Generating knowledge acquisition lists with ICFA and RCFA will place on the lists some results that fit the left/right signature cradles (or appear on the left and right anchor lists) but are not semantic equivalents. This is especially so when the user-defined number of shared signatures or cradles required for a return to qualify is not high. For example, many words and/or word strings with the opposite meaning to the query word or word string, as well as other related but not semantically equivalent words and word strings, fit many of the same left/right signature cradles as the query.
For example, suppose RCFA is run on the query "in favor of" and yields the cradles "the court ruled __ the plaintiff" and "the senator voted __ the amendment". It is easy to see that both a synonym of the query, such as "for", and an antonym, such as "against", fit these cradles and can appear on the knowledge acquisition list.
Although these non-equivalent word strings are useful in many applications, if an application requires that the query's list contain only semantic equivalents, the filtering methods of the invention can be used to produce a knowledge acquisition list containing only semantic equivalents. These filtering methods, described below, include the following. (1) Direct mutual relationship: this considers not only the rank of a return on the query's ICFA or RCFA knowledge acquisition list, but also the rank of the query on the return's own CFA knowledge acquisition list. (2) Semantic triangulation: a method and system that considers the number of knowledge acquisition lists on which both the query and the return appear (and their ranks on those lists). This filter helps identify a return as an approximate semantic equivalent of the query even when the return ranks low on the query's knowledge acquisition list. It does so by identifying low-ranked returns that appear, with a user-defined rank and/or frequency, on a user-defined number of knowledge acquisition lists generated for other queries that share an approximate semantic relation with the query (that is, that appear together with the query on a number of different lists). (3) Query + signature overlap: in one embodiment, this method uses the overlap technique within a single language to identify semantic equivalents. The overlap technique does this in the same way that it connects adjacent ideas (represented by word strings) in a logical chain. The returns on the knowledge acquisition lists of (i) the query word or word string together with its left signature and (ii) the query word or word string together with its right signature are checked for overlap. Synonymous expressions of the word or word string being analyzed can be identified as the overlapping words within the overlapping word strings.
Furthermore, another method of the invention can use word-string patterns to automatically sort the words and word strings returned on a knowledge acquisition list into distinct lists that the user can label to reflect precisely their semantic character relative to the query term (for example, antonyms of the query (query: "hot", return: "cold") or members of a class to which the query also belongs (query: "blue", return: "purple")).
This method, described below, is called the signature pattern sorting method of the invention. The direct mutual relationship and semantic triangulation methods can also be used to sort words and word strings according to their semantic relation to one another. When the user gives the system training examples of items that embody a relation (for example, "hot" and "cold" as antonyms), the method and system of the invention can identify the patterns that characterize that relation, based on the occurrence and rank of the words and word strings on each other's knowledge acquisition lists. The invention can then use the generalized pattern to associate words and word strings that share it as items bearing the identified relation.
1. Association Using Direct Mutual Relationship and Semantic Triangulation
The direct mutual relationship method filters the results of a knowledge acquisition list by generating, with RCFA or ICFA as described above, an independent knowledge acquisition list for each return on the query's list. By creating independent lists for all the returns on the query's list, the system can determine whether the initial query ranks above a user-defined threshold on each return's list. The higher the query and a return rank on each other's knowledge acquisition lists, the more likely the return is a semantic equivalent of the query.
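As a rough sketch of this filter, assume the knowledge acquisition lists have already been generated and are held in a dictionary keyed by query. The function name `reciprocal_filter`, the `max_rank` threshold, and the example lists are hypothetical.

```python
def reciprocal_filter(query, lists, max_rank=3):
    """Keep the returns whose own list ranks the query within the top max_rank."""
    kept = []
    for ret in lists[query]:
        own_list = lists.get(ret, [])
        if query in own_list[:max_rank]:
            kept.append(ret)
    return kept

# Hypothetical lists, ordered from highest to lowest rank.
lists = {
    "car": ["automobile", "vehicle", "road"],
    "automobile": ["car", "vehicle"],
    "vehicle": ["truck", "car"],
    "road": ["street", "highway", "lane", "car"],
}

equivalents = reciprocal_filter("car", lists, max_rank=2)
print(equivalents)
```

Here "road" is dropped because "car" ranks too low on the list generated for "road", even though "road" appears on the list for "car".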
The semantic triangulation method of the invention also uses the independent knowledge acquisition lists generated for each return of the query to determine which returns are approximate semantic equivalents of the query. Semantic triangulation examines the lists independently generated for the returns to identify the words and word strings that appear, ranked above a user-defined threshold, on a user-defined number of the different knowledge acquisition lists on which the query also appears as a return. For any return on the query's knowledge acquisition list that also appears on a user-defined number or percentage of other knowledge acquisition lists that include the query as a return (based on their ranks on the shared lists), a knowledge acquisition list is generated and direct mutual relationship analysis is performed to further refine the semantic relation between the return and the query, no matter how low the return ranks on the query's own list.
As just described, the direct mutual relationship and semantic triangulation methods can be used together to rank returns by their semantic closeness to the query. Direct mutual rank and list membership can be scored for each return on the initial query's list, with special weight given to the query's rank on each return's own list. Based on user-defined criteria, these scores can then determine which returns on the initial query's knowledge acquisition list can be used in applications that require strict semantic equivalence.
For example, if "IPO" is entered into the system for semantic equivalence analysis, a system using RCFA or ICFA produces a knowledge acquisition list with various results such as "initial public offering", "stock sale", "initial offering" and "stock market", among others. Although "stock market" is a concept related to the query "IPO", it is not a semantic equivalent. Using the filtering methods above, independent knowledge acquisition lists can be generated for "initial public offering", "stock sale", "initial offering" and "stock market".
Once these lists are generated, the direct mutual relationship aspect of the invention may determine that "IPO" (the initial query) occurs markedly less often on the list generated for "stock market" than on the lists of the other returns, and the semantic triangulation method may determine that "stock market" appears on the independent lists generated for "initial public offering", "stock sale" and "initial offering" less often than the query and the other returns do. Given this, for applications such as translation, speech recognition, search and others that prefer only close semantic equivalents, user-defined parameters can remove "stock market" from the knowledge acquisition list for "IPO".
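The triangulation step in the "IPO" example can be sketched by counting, for each return, the other lists on which it appears together with the query. The function name `triangulation_counts`, the `max_rank` cutoff, the threshold of 2, and the list contents are all assumptions made for illustration.

```python
def triangulation_counts(query, lists, max_rank=5):
    """For each return of the query, count the other lists that contain both it and the query."""
    counts = {}
    for cand in lists[query]:
        shared = 0
        for owner, members in lists.items():
            if owner in (query, cand):
                continue
            top = members[:max_rank]
            if query in top and cand in top:
                shared += 1
        counts[cand] = shared
    return counts

# Hypothetical knowledge acquisition lists, ordered from highest to lowest rank.
lists = {
    "IPO": ["initial public offering", "stock sale", "initial offering", "stock market"],
    "initial public offering": ["IPO", "initial offering", "stock sale"],
    "stock sale": ["initial offering", "IPO", "initial public offering"],
    "initial offering": ["initial public offering", "IPO", "stock sale", "stock market"],
    "stock market": ["wall street", "equities", "IPO"],
}

counts = triangulation_counts("IPO", lists)
kept = [cand for cand, n in counts.items() if n >= 2]  # the threshold of 2 is an assumed setting
print(counts)
print(kept)
```

With these invented lists, "stock market" shares only one list with the query and falls below the threshold, while the three closer equivalents each share two.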
The results of the two analyses above can be applied according to user-defined settings. In one embodiment, for efficient processing, only a user-defined number of the top-ranked phrases on the query's knowledge acquisition list are tested independently by generating their own CFA lists. However, if a phrase ranks low on the query's knowledge acquisition list (or does not appear at all) but appears (even at a low rank) on a user-defined number of the lists of items already established as semantic equivalents of the query, that phrase can also be examined by generating an independent knowledge acquisition list for it and checking for a "mutual" relationship (that is, whether the query does rank on that phrase's list).
When the user provides the system with multiple words and/or word strings that are synonyms, and then with a training set of paired words and/or word strings that are related but not synonymous, the knowledge acquisition list occurrence and ranking patterns unique to synonyms or to non-synonyms can be used to identify words and word strings that are synonyms of each other in the future.
Similarly, the system can use user-provided examples of items that are not synonyms but have a particular relation to each other (for example, antonyms or class members) as training examples, attempt to identify any general pattern in how items bearing this relation appear on each other's knowledge acquisition lists, and look for those relative patterns on other knowledge acquisition lists. The system can then use these patterns to identify the general relation between any two items that share them.
The direct mutual relationship and semantic triangulation methods can be used to identify patterns that embody other semantic relations, based on occurrence and rank on knowledge acquisition lists. For example, after the user gives the system training examples of words and word strings that are members of a common class (for example, "New York" and "Los Angeles", both cities in the United States), the system can identify patterns of list occurrence and rank, generalize them, and use them to identify other words and word strings that denote U.S. cities.
In addition, the list occurrence and ranking patterns shared across different groups of class members can reveal a more general pattern indicating that two words and/or word strings are members of the same class. For example, if the system analyzes knowledge acquisition lists using user-provided training words and word strings denoting U.S. cities, colors, names and numbers, and finds list occurrence and ranking patterns that characterize the general relation between class members, the system can use such patterns to identify, in the future, the general relation between any two items that are members of one class.
2. Association Using Query + Signature Overlap
This method uses a word-overlap requirement as a filter so that only semantic equivalents remain on the knowledge acquisition list. It can refine an existing knowledge acquisition list or be used to create an independent list containing only semantic equivalents of the query. The method takes a query word or word string and identifies a user-defined number of cradles (or independent left and right signatures) over a user-defined range of word-string lengths. Next, the query is joined to each of a user-defined number of left signatures to form a longer word-string unit (left signature + query), which is analyzed with RCFA (or ICFA) to produce a knowledge acquisition list for that left signature + query word string. Next, the query is joined to each of a user-defined number of right signatures as a unit, and knowledge acquisition lists are produced for the selected query + right signature word strings. Next, a user-defined number of the top-ranked members on the lists for the left signature + query word strings are examined, looking for words and word strings on their right-hand side that overlap with the left-hand side of a user-defined number of members on the lists for the query + right signature word strings. The overlapping word or words in each overlapping word string identified in this last step are usually semantic equivalents of the query.
For example, using the query "initial public offering", each identified left signature is prepended to the query, and a knowledge acquisition list is generated for each of these longer strings. Thus, analyzing a left signature + query such as "for an initial public offering" generates semantic equivalents in the same way as for the query itself, and the same can be done with other left signature + query strings such as "announced the initial public offering" and "the proposed initial public offering".
Next, query + right signature word strings such as "initial public offering price of" and "initial public offering of stock" are used as queries to generate knowledge acquisition lists (of possible synonymous word strings) for these phrases.
Next, the right-hand side of the members on the left signature + query lists is checked for overlap with the left-hand side of the user-defined qualifying members on the query + right signature lists. The overlapping words and word strings are semantically equivalent to the initial query (here, "initial public offering"). As an example of such a result, if the left signature + query word string "announced the initial public offering" generates a list containing "went public with the IPO", and the query + right signature word string "initial public offering of stock" has the qualifying list member "IPO of equity", then "IPO" is the overlapping word or word string and is therefore taken to be a synonymous idea for "initial public offering".
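The final overlap check can be sketched as a suffix-to-prefix comparison between members of the two kinds of lists. The function name `overlap_candidates` is hypothetical, and the two input lists are assumed to have been produced already by RCFA or ICFA; the query string itself is included on each list only to show that it is skipped.

```python
def overlap_candidates(left_list, right_list, query):
    """Find word strings that end a left-list member and begin a right-list member."""
    found = []
    for left_member in left_list:
        lw = left_member.split()
        for right_member in right_list:
            rw = right_member.split()
            # Try the longest possible overlap first.
            for k in range(min(len(lw), len(rw)), 0, -1):
                if lw[-k:] == rw[:k]:
                    cand = " ".join(rw[:k])
                    if cand != query and cand not in found:
                        found.append(cand)
                    break
    return found

# Members of the lists for "announced the initial public offering" and
# "initial public offering of stock" (illustrative values only).
left_list = ["went public with the IPO", "announced the initial public offering"]
right_list = ["IPO of equity", "initial public offering of stock"]

synonyms = overlap_candidates(left_list, right_list, "initial public offering")
print(synonyms)
```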
The query + signature overlap filter can be combined with the other filtering methods. In one embodiment, direct mutual relationship and/or semantic triangulation can be applied as a first step before the query + signature overlap filter.
3. Association Using Synonym Flooding
In addition to the methods and systems just described for identifying semantically similar words and word strings, the invention can also include a single-state, or single-language, flooding method that further helps to identify word strings semantically equivalent to a query word string or to refine the results of CFA. This embodiment uses a thesaurus of word-to-word or word-to-phrase entries to identify synonyms of words. Besides words, the thesaurus can also contain idioms and collocations associated with their semantic equivalents.
The query word string can be broken into words (and/or idioms and collocations), and the thesaurus (and/or word-to-word or word-to-phrase semantic equivalents obtained with CFA) used to list the semantic equivalents of each word (and/or each idiom and collocation). A text corpus is then searched for word strings, up to a user-defined maximum length, that contain a synonym for at least a minimum number of the words in the query word string (for the purpose of the minimum, only one synonym is counted per word). An original word of the query word string, rather than one of its synonyms, can also satisfy the search criterion. The method is conceptually similar to the target language flooding method of the invention, which builds word-string translations between two languages, except that in this embodiment a thesaurus is used instead of a cross-language dictionary. For example, with a technical dictionary that defines technical terms in common words, the method can translate between two forms of a language (for example, technical terms and lay language). If the thesaurus contains an entry equating "localized" with "non-metastasized" and an entry equating "oncological mass" with "cancer", then, depending on the user-defined search parameters and the text used for flooding, the phrase "non-metastasized oncological mass" can be equated with the phrases "localized oncological mass", "non-metastasized cancer" and "localized cancer", among other possible phrases.
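A simplified sketch of synonym flooding follows. It assumes the query has already been split into units and requires every unit to be matched by itself or a synonym, which is a stricter setting than the minimum-count criterion described above. The function name `flood`, the thesaurus contents, and the corpus sentence are hypothetical.

```python
from itertools import product

def flood(units, thesaurus, corpus_text):
    """Return attested phrases built from each unit or one of its listed synonyms."""
    options = [[unit, *thesaurus.get(unit, [])] for unit in units]
    original = " ".join(units)
    padded = f" {corpus_text} "  # pad so matches fall on word boundaries
    found = []
    for combo in product(*options):
        phrase = " ".join(combo)
        if phrase != original and f" {phrase} " in padded:
            found.append(phrase)
    return found

# Hypothetical thesaurus entries and corpus text.
thesaurus = {"non-metastasized": ["localized"], "oncological mass": ["cancer"]}
corpus_text = ("the scan showed a localized cancer and later "
               "a non-metastasized cancer was treated")

equivalents = flood(["non-metastasized", "oncological mass"], thesaurus, corpus_text)
print(equivalents)
```

Only candidate phrases that actually occur in the corpus are returned, so "localized oncological mass" is not reported for this particular text.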
4. Word String Cradle or Signature Pattern Sorting
The invention can also be trained to identify the patterns of signature and cradle word strings, to the left and right of any word or word string, that mark the relation between a knowledge acquisition list result and the query (for example, antonym, class member, concept and example, or other related knowledge). The user can give the system a set of examples that characterize the relation and then let the system learn the word-string signature and/or cradle patterns that characterize it.
For example, to train the system to identify opposite ideas, the user can provide the following three queries, each with three members of its initial knowledge acquisition list that are opposites of the query:
Query                        Antonyms
1. “good”                    “bad”, “very bad”, “awful”
2. “world class scholar”     “stupid”, “dumb”, “moron”
3. “cold”                    “hot”, “very hot”, “boiling”
The user can also provide additional examples of synonyms of the queries and of their antonyms for further training. The system then looks for left and/or right signatures (or cradles) that are unique to the antonyms of a query.
As in generating knowledge acquisition lists, this embodiment uses CFA to determine both the left signatures and the right signatures shared between two different groups of words and/or word strings (or, in some cases, the shared cradles). Importantly, this embodiment can also examine the left signature word strings of the query and compare them with the right signature word strings of the items the user entered and identified as antonyms of the query, looking for exact matches between them. It likewise examines the right signature word strings of the query and compares them with the left signature word strings of the user-entered antonyms, again looking for exact matches. Patterns of this kind, in which the same string appears on opposite sides (contexts) of a query and its antonym, can often indicate a particular relation. When the user gives the system examples that characterize a relation, the system can check which left signatures of a query example (or of its synonyms) are identical to right signatures of the words and word strings that express the opposite idea, and vice versa. Finding word strings that are both a right signature of the query and a left signature of its antonym, or both a left signature of the query and a right signature of its antonym, helps identify the word-string patterns that characterize the relation. When the system later meets, on a CFA knowledge acquisition list, a return it has not encountered before that has such an "antonym signature" relative to the query, it can identify the relation between the return and the query as antonymy.
These signature and cradle patterns unique to antonyms can form a pattern that trains the system to identify antonyms in the future. Patterns generalized from different antonym pairs can identify other specific antonyms the system has not yet encountered. Training on earlier antonym cradles or signatures may not capture a new antonym relation that the system meets while running RCFA or ICFA for related knowledge (including semantic equivalents). When this happens, and the user indicates to the system that a result on the knowledge acquisition list is a semantic opposite of the query word string, the query word string and the opposite word string returned can be used to train the system further to identify the signatures (or cradles) of this kind of antonymy.
The same kind of training described for antonyms can be used to train the system to identify other relations. The system uses the examples to find the word-string context patterns of signatures (or cradles) unique to the relation, and can thereby define it. For example, the system can be trained to identify members of the same class as the query, or examples of the query, by giving it different word-string examples that characterize that semantic relation. The system can then identify the cradle (or signature) patterns unique to each group of words and/or word strings and use them to identify such relations in the future.
This method and system identifies exact matches between the right signatures of the query and the left signatures of the return, and between the left signatures of the query and the right signatures of the return, and uses these signature word-string patterns to determine the relation. It also identifies the cradles that are unique to the antonym and do not fit true semantic equivalents (or other relations). This processing uses the standard CFA method of comparing left signatures with left signatures and right signatures with right signatures, except that here the system looks for cradles that the query does not share with its antonym, rather than only for cradles they have in common. Once the cradles unique to the antonym are identified, this word-string pattern can help identify other items that are antonyms of the query.
For example, the unique signature or cradle patterns that a query does not share with its antonym generally include the antonym itself as part of the cradle or signature word string, as shown below. Three hypothetical cradles for "hot" that occur in a corpus of documents might be:
“it’s not __ it’s cold”
“I’m not __ I’m cold”
“you promised it would be __ but it’s cold”
The antonym "cold" is part of the word strings that make up signatures unique to the query word "hot", signatures that "hot" does not share with the word "cold". These and other word-string signatures or cradles unique to "hot" and "cold" can identify "cold" as an antonym of "hot", even though, before this embodiment or the other knowledge acquisition list filtering and sorting embodiments of the invention are applied, "cold" may rank highly on the knowledge acquisition list obtained by running CFA on the item "hot".
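One narrow piece of this idea, the observation that a query's unique cradles often contain the antonym itself, can be sketched as a simple count. This is only an illustrative heuristic, not the full signature pattern training described in this section; the function name `antonym_hints` and the candidate list are hypothetical, and apostrophes are written in plain ASCII.

```python
def antonym_hints(query_cradles, candidates):
    """Count the query cradles whose left or right signature contains each candidate."""
    hints = {}
    for cand in candidates:
        cand_words = cand.split()
        n = len(cand_words)
        count = 0
        for left, right in query_cradles:
            for sig in (left.split(), right.split()):
                if any(sig[i:i + n] == cand_words for i in range(len(sig) - n + 1)):
                    count += 1
                    break
        hints[cand] = count
    return hints

# The three hypothetical cradles for "hot" given above, as (left, right) pairs.
hot_cradles = [
    ("it's not", "it's cold"),
    ("I'm not", "I'm cold"),
    ("you promised it would be", "but it's cold"),
]

hints = antonym_hints(hot_cradles, ["cold", "warm", "boiling"])
print(hints)
```

A candidate with a high count, such as "cold" here, would be a stronger antonym candidate than one that never appears inside the query's cradles.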
The result is a pattern, made up of signatures (or cradles), that characterizes a particular type of relation. The system can then use this pattern to identify other pairs of words and/or word strings that share the "relation-identifying" pattern formed by comparing their signatures (or cradles). Thus, in one embodiment of the invention, when a word or word string is submitted as a query to find words and/or word strings of opposite meaning, the system (1) identifies the most frequent words and/or word strings that occur around the query; (2) identifies a list of words and/or word strings that share some signatures (or cradles) with the query, but not with the type, number or percentage of commonality that would identify them as synonyms; (3) compares the signatures (or cradles) of these related (but not synonymous) words and/or word strings with those of the query (making, as described above, both kinds of comparison: left to right and right to left, and left to left and right to right); and (4) compares the results of step 3 with the signatures of previously identified antonym pairs. If any comparison generated in step 3 yields a pattern sufficiently similar (under a user-defined standard) to the pattern obtained by comparing the signatures of known antonyms (based on the signatures or cradles identified in step 3 as indicating antonymy), the system identifies the word or word string from step 2 whose comparison with the query produced the pattern as an antonym of the query.
The same principles apply to a system that identifies any relation between a knowledge acquisition list return and the query, including not only synonymy and antonymy but also membership in a common class (for example, "red" and "blue" are both colors, and "New York" and "Paris" are both place names) and any other semantic relation. By locating the signatures shared between two words and/or word strings left to right and right to left, as well as left to left and right to right, patterns that characterize these relations can be obtained, so that the system can in the future automatically identify the relation in pairs of items that share the relation defined by those signatures. The system can also automatically "cluster" groups of words and/or word strings by the signatures and cradles that are common and unique to the group, and identify their relations to other groups.
It should be noted that the user-defined parameters the system uses to produce word-string equivalents (or any other relation) can include word strings that are near the query on the left or right without being directly adjacent to it. Adjusting the user-defined parameters in this way is especially needed in applications where meaning is expressed less efficiently or with less conventional structure (for example, conversations held in media such as Internet "chat rooms" and other kinds of dialogue).
VI. Single-State Knowledge Lists for Cross-State Knowledge Acquisition and Reconstruction (Translation)
An additional embodiment of the invention uses the systems and methods for generating lists of semantic equivalents to assist the language translation applications of the invention. This can be done in place of, or in conjunction with, any method of the invention that identifies word-string translations and adds them to the cross-language database.
The methods and systems of the invention can be used to produce semantic equivalents that assist any corpus-based machine translation system (for example, EBMT), including the machine translation aspects of the invention. Any number of embodiments that use semantic equivalents of source-language and target-language word strings can be used to generate, check and verify accurate translations. Furthermore, other embodiments can use translations of signatures or cradles to help produce accurate translations.
For example, if a word string must be translated to complete a translation, but it does not appear in the cross-language association database and cannot be built from the available parallel text, the system can generate semantic equivalents of this word string in the source language and check whether any of those equivalent word strings has a known translation in the target language in the database, or one that can be learned from the available cross-language text.
In addition, a target-language word-string translation may be in the cross-language association database and yet not overlap with the adjacent word-string translations on both sides, as the dual-anchor overlap technique requires. In that case the translation cannot be confirmed under the dual-anchor overlap requirement, but the target-language translation can be used to generate semantically equivalent target-language word strings, which can then be checked for overlap with their neighbors so that one of them can be confirmed as part of the complete translation.
Another example of how the systems and methods for generating lists of semantic equivalents can be used with a translation database is as follows:
First, generate concrete signatures of user-defined length to the left and right of the as-yet-unresolved portion of the source document. For example, suppose the system is translating the sentence "I went to the ball park to watch the baseball game". Suppose further that "I went to the", "went to the ball park", "to watch the", and "watch the baseball game" have cross-language overlapping translations known to the system. The system has no target-language translation for a word string that overlaps both "went to the ball park" and "to watch the", for example "ball park to watch" (which is thereby known to be the unresolved phrase or portion), and it needs one to supply the overlapping connection that confirms a translated sentence in which adjacent word strings overlap in both languages. If the user-defined parameters specify a three-word string directly to the left of the unresolved phrase and a three-word string directly to its right, the invention returns two three-word strings: the "concrete left signature word string" ("went to the") and the "concrete right signature word string" ("the baseball game").
Second, using any of the embodiments described above for building semantic-equivalence associations, generate signature lists for the unresolved source-language phrase from the document database (ICFA is used in this example). The lists that the semantic-equivalence systems and methods described above create for the unresolved phrase are called the left signature list and the right signature list.
Third, translate the concrete left signature word string and all entries on the left signature list into the target language. The translations can be obtained using any method of the present invention or any prior-art device. The results produced by a prior-art translation system can be improved by using the multilingual leverage embodiment of the invention described above. The result of this process is the "left target signature list". A similar translation process applied to the concrete right signature word string and all entries on the right signature list creates the "right target signature list".
Fourth, apply steps 2 and 4 of the semantic-equivalence process described above to the target document database to generate lists of target-language anchors from the left and right target signature lists. The resulting lists are the left target anchor lists and the right target anchor lists, respectively.
Finally, compare the returns on the left target anchor lists and the right target anchor lists. Results that appear on at least one left target anchor list and at least one right target anchor list are possible translations of the query and are ranked according to the total number of anchor lists on which they appear. For higher precision, extra ranking weight can be given to appearances on the anchor lists derived from the concrete context word strings. The ranking can also be determined by the product of the number of left anchor lists and the number of right anchor lists on which a result appears. In addition, the total frequency of the returns and/or any other user-defined criterion can be given some weight as a factor when ranking results.
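Two parts of the procedure above lend themselves to a short sketch: extracting the concrete left and right signature word strings of a user-defined length, and ranking candidates by the product of the number of left and right anchor lists on which they appear. The function names and the placeholder candidates (`T1`, `T2`, `T3`) are hypothetical; the generation and translation of the lists themselves is not shown.

```python
def concrete_signatures(tokens, start, end, n=3):
    """Return the n words directly left and right of tokens[start:end]."""
    left = tokens[max(0, start - n):start]
    right = tokens[end:end + n]
    return " ".join(left), " ".join(right)


def rank_candidates(left_lists, right_lists):
    """Rank candidates found on at least one left and one right anchor list.

    The score is (number of left lists containing the candidate) times
    (number of right lists containing it), one of the rankings in the text.
    """
    on_left = set().union(*left_lists)
    on_right = set().union(*right_lists)
    scores = {}
    for cand in on_left & on_right:
        n_left = sum(cand in lst for lst in left_lists)
        n_right = sum(cand in lst for lst in right_lists)
        scores[cand] = n_left * n_right
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


tokens = "I went to the ball park to watch the baseball game".split()
# The unresolved phrase "ball park to watch" occupies tokens[4:8].
left_sig, right_sig = concrete_signatures(tokens, 4, 8, n=3)

# Placeholder anchor lists (assumed data, not from the patent).
left_lists = [{"T1", "T2"}, {"T1"}]
right_lists = [{"T1", "T3"}, {"T1", "T2"}]
ranking = rank_candidates(left_lists, right_lists)
```

Extra weight for the lists derived from the concrete signatures, or for total frequency, could be added as further terms in the score.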
Of course, as with any application that uses ICFA, the above embodiment can similarly be implemented using RCFA in combination with the concrete context cradle of the query and other high-frequency general cradles. In that case, the concrete cradle of the exact context and the general cradles are generated in the source language and then translated into target-language cradles. The target-language cradles are then applied to the target-language corpus, and other target-language word strings are found to fill them.
Another embodiment that uses semantic equivalents to build a database of possible translations of the unresolved phrase in a query is as follows:
First, generate anchor lists using only the concrete signature word strings around the unresolved phrase in the query, as described above. Then generate left anchor lists and right anchor lists using the left signature list and right signature list (rather than the concrete left and right signature word strings), as described above. Next, rank the results that appear on (a) at least one left anchor list and/or the anchor list derived from the concrete left signature word string and (b) at least one right anchor list and/or the anchor list derived from the concrete right signature word string, according to the total number of anchor lists on which they appear. Extra ranking weight can be given to appearances on the anchor lists derived from the concrete context word strings. In addition, the product of the numbers of right anchor lists and left anchor lists on which a return appears, or any other user-defined method, can be used for ranking.
Next, translate into the target language the unresolved portion of the query and the semantic-equivalence list generated by the above ranking. The translations can be obtained using the parallel-text database builder of the present invention (with available parallel text), any other method of the invention for building word string translations, or other prior-art translation devices. The results obtained from a prior-art translation system can be improved using the multilingual leverage embodiment described above. If a user-defined number of translation results are identical, that result can be designated a possible translation. For further analysis, in another embodiment, the system generates a semantic-equivalence list for each translation result using the target-language text database. The initial target-language translation that appears on the largest number of those lists (at least two lists) and meets a minimum ranking threshold (absolute and/or relative) on them is designated the possible translation of the unresolved portion of the query.
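The agreement rule in this paragraph (designate a result as a possible translation when a user-defined number of translation results are identical) can be sketched as a simple vote. The name `agreed_translation` and the sample strings are hypothetical.

```python
from collections import Counter


def agreed_translation(results, min_agreement):
    """Return the most common translation if it occurs at least
    `min_agreement` times among `results`, otherwise None."""
    if not results:
        return None
    best, count = Counter(results).most_common(1)[0]
    return best if count >= min_agreement else None


# Translations of the unresolved phrase and of its semantic equivalents
# (placeholder strings, assumed for illustration).
results = ["X", "X", "Y", "X"]
accepted = agreed_translation(results, min_agreement=3)
rejected = agreed_translation(results, min_agreement=4)
```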
All embodiments that use semantic-equivalence analysis to help translate an unresolved word string can also produce additional signatures or cradles by using the concrete context word strings and performing CFA to generate semantic equivalents of the concrete left signature word string (or cradle) and of the concrete right signature word string (or cradle). These semantic equivalents of the concrete signatures or cradles can be used as additional signatures or cradles to build semantic equivalents in the source language, or they can be translated directly into the target language and the translated signatures or cradles used to build target-language semantic equivalents.
In another embodiment that uses ICFA or RCFA to translate a document from one language into another, the sentences and other document fragments to be translated are parsed word by word, and a knowledge acquisition list is generated for each word to be translated together with its corresponding left and right signature word strings. Using the source-language words and a cross-language dictionary between the two languages, the possible target-language translations of each word are obtained. These target-language words are used to generate a knowledge acquisition list for each target-language word. A variant of the dual-anchor overlap method then finds overlapping word strings in the knowledge acquisition lists of words that are adjacent or close together in the source language and does the same in the target language. By applying the cross-language dictionary to the words in the overlapping word strings from the knowledge acquisition lists, those strings can be confirmed as translations. The dual-anchor overlap method can further be used to connect a translation with adjacent word strings and so verify the word string translation. The same method can be applied when the text is parsed into units larger than one word (e.g., two words), and the translation aspects of the present invention or a prior-art translation engine can replace the cross-language dictionary as the translation bridge between the languages.
In addition, the invention's specific methods for identifying the qualities and semantic relationships that a word or word string shares with other words or word strings can be used in translation applications, by allowing interchangeable semantic items to be tokenized when source-language and/or target-language word strings are searched for a translation. For example, suppose one of the methods of the present invention is used to translate into English a language X word string meaning "tell Bob to come downstairs". If the language X and/or English text does not contain this word string but does contain the word strings "tell Jim to come downstairs" and "tell Mary to come downstairs", it is desirable to use those word strings to help identify the translation by using a "name token" in place of the word "Bob" and then replacing the name token with "Bob" in the final translation output.
Methods known in the prior art use class tokens in translation for known equivalence classes, such as names, dates, numbers, and days of the week, whose members are usually interchangeable in translation, so that a translation of one such form can serve as the translation for all class members. These prior-art methods attempt to pre-populate the equivalence classes with their known members so that the members are recognized when encountered. Although this works well for known class members that belong to only one class, the prior art cannot use a class token when searching the target text for translation candidates if the system encounters a word that belongs to two or more classes, or a word or word string that is an unfamiliar member of a particular class (e.g., a name).
The present invention provides a method of using class tokens for words and word strings that the system does not know to be class members. The method analyzes any word string that does not appear in the cross-language database or corpus and checks whether any word or substring of the longer unknown word string (or of an expansion created by adding adjacent words before and/or after the unknown word string) is a signature (or cradle) that identifies a word or word string in the unknown string as a class member that can be tokenized.
For example, if the word string to be translated means "tell Jerome to come downstairs", and the system neither has a translation of this word string in its database nor can find one in the available documents, the system can recognize that the cradle "tell __ to come downstairs" is a possible "name class" indicator, and that the word "Jerome" appears in enough other word strings in the corpus to satisfy the user-defined number or percentage of name cradles required to classify it as a name token. Once the name Jerome has been tokenized, the system can use this information to build a translation of "tell Jerome to come downstairs" from any word string in the corpus that consists of the cradle "tell __ to come downstairs" filled with any other name.
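A minimal sketch of the cradle test described above follows. It assumes a toy corpus and two hypothetical "name cradles" ("ask __ for help" and "call __ tomorrow"); the helper names `fillers` and `is_class_member` are also assumptions, and a word is treated as a class member when it fills at least a user-defined number of the class's cradles.

```python
import re


def fillers(cradle, corpus):
    """Single words that fill the `__` slot of `cradle` in the corpus."""
    left, right = cradle.split("__")
    pattern = re.compile(r"\b" + re.escape(left) + r"(\w+)" + re.escape(right))
    return {m.group(1) for line in corpus for m in pattern.finditer(line)}


def is_class_member(word, class_cradles, corpus, min_cradles):
    """True if `word` fills at least `min_cradles` of the class's cradles."""
    hits = sum(word in fillers(c, corpus) for c in class_cradles)
    return hits >= min_cradles


# Toy corpus and name cradles, assumed for illustration only.
corpus = [
    "tell Jim to come downstairs",
    "tell Mary to come downstairs",
    "ask Jerome for help",
    "ask Jim for help",
    "call Jerome tomorrow",
    "call Mary tomorrow",
]
name_cradles = ["ask __ for help", "call __ tomorrow"]

jerome_is_name = is_class_member("Jerome", name_cradles, corpus, min_cradles=2)
# Names that already fill the target cradle and can stand in for "Jerome".
substitutes = fillers("tell __ to come downstairs", corpus)
```

With "Jerome" tokenized as a name, any known translation of "tell Jim to come downstairs" or "tell Mary to come downstairs" can serve as the template, with the name swapped back in at the end.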
Furthermore, when a word or word string has two meanings and only one of them belongs to a particular class, the concrete cradle (or the independent left and right signatures) determines which meaning applies. For example, if the sentence is "give me the blue paint before you go", the system can tokenize "blue" as a color, based on the cradle "give me the __ paint" and other known color signatures. If, however, the word string is "I feel blue since the breakup", the system will not tokenize "blue" as a color, because this cradle does not fit the color class; instead, based on the method described above, "blue" can be replaced by a word such as "sad" that belongs with it to the "emotion" class.
VII. Single-State Knowledge Reconstruction
Just as the dual-anchor overlap method pieces together appropriate translations of adjacent word strings across languages, the same overlap method can be used within a single language to express any longer concept in many different ways: parse the longer concept into overlapping subunits, generate semantic equivalents of the subunits, and substitute a synonymous subunit for the original wherever the synonymous subunit overlaps its neighbors (which may be the original text or synonyms of it). This is useful for text mining, search and retrieval, natural language recognition, natural language interfaces, and more complex artificial intelligence applications.
For example, consider the sentence "when I get home from school I must do my homework before I go out to play with my friends". By performing RCFA or ICFA knowledge acquisition analysis and the semantic-equivalence filtering methods, the system can learn semantically equivalent phrases for the parsed subunits below:
1. "when I get home from school I must"
a. "when I come home from school I must"
b. "when I come home from school I better"
c. "as soon as I come home from school I have to"
2. "I must do my homework before I go out"
a. "I have to do my homework before I go out"
b. "I better do my schoolwork before I head out"
c. "I must get my homework done before I leave the house"
3. "go out to play with my friends"
a. "head out to play with my friends"
b. "leave the house to hang out with my posse"
c. "go out to hang with my buddies"
The lists of semantically equivalent word strings above, combined with the overlap method, can provide various alternative ways of expressing the entire original sentence. For example, one alternative expression of the sentence can be built from:
when I come home from school I better
I better do my schoolwork before I head out
head out to play with my friends
After the redundant overlaps are removed, the system provides "when I come home from school I better do my schoolwork before I head out to play with my friends" as a synonymous expression of the original query.
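The overlap-and-merge step can be sketched as below, using word strings 1.b, 2.b, and 3.a from the numbered lists. The function names are hypothetical, and the sketch only joins strings whose trailing and leading words coincide; how the equivalents are chosen is outside its scope.

```python
from typing import Optional


def merge_overlapping(left: str, right: str) -> Optional[str]:
    """Join two word strings on their longest shared boundary words."""
    a, b = left.split(), right.split()
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            # Keep the shared words once (the redundancy is removed).
            return " ".join(a + b[k:])
    return None


def chain(pieces):
    """Merge a sequence of word strings, each overlapping the next."""
    result = pieces[0]
    for piece in pieces[1:]:
        merged = merge_overlapping(result, piece)
        if merged is None:
            raise ValueError(f"no overlap between {result!r} and {piece!r}")
        result = merged
    return result


pieces = [
    "when I come home from school I better",        # 1.b
    "I better do my schoolwork before I head out",  # 2.b
    "head out to play with my friends",             # 3.a
]
sentence = chain(pieces)
```

A combination such as 1.a followed by 2.b would be rejected, because "I must" and "I better" do not overlap.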
VIII. Scope of CFA Applications
A. Overview
At its core, the association database building method comprises (i) obtaining units of data organized in a linear or otherwise ordered fashion, (ii) decomposing the data into all possible contiguous subsets, and (iii) building relationships among all the data subsets based on how frequently the subsets recur close (usually very close) to one another across all the data units available for study. At the core of CFA, the system identifies frequently recurring proximity relationships among groups of data fragments in order to discover specific associations shared by two or more recurring data fragments. The same methods used in database building and common frequency analysis can therefore be used to recognize patterns in data mining, text mining, object recognition, and any other application that requires recognizing patterns among associated concepts. These tasks are not limited to finding word string patterns in text.
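Steps (ii) and (iii) can be illustrated with a small sketch that enumerates contiguous word strings and counts how often two of them recur within a given gap of each other. The parameters `max_len` and `window`, the function names, and the toy data are assumptions made for illustration; a real association database would use much larger corpora and its own range settings.

```python
from collections import Counter
from itertools import combinations


def contiguous_subsets(tokens, max_len):
    """All contiguous word strings of up to max_len words, with start index."""
    return [(i, tuple(tokens[i:i + n]))
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]


def proximity_counts(units, max_len=2, window=3):
    """Count ordered pairs of non-overlapping word strings that occur
    with at most `window` words between them, across all data units."""
    counts = Counter()
    for unit in units:
        spans = contiguous_subsets(unit.split(), max_len)
        for (i, a), (j, b) in combinations(spans, 2):
            first, second = ((i, a), (j, b)) if i <= j else ((j, b), (i, a))
            # Words between the end of the first span and the start of the second.
            gap = second[0] - (first[0] + len(first[1]))
            if 0 <= gap <= window:
                counts[(first[1], second[1])] += 1
    return counts


units = ["the cat sat", "the cat ran"]
adjacent = proximity_counts(units, max_len=1, window=0)
```

With single-word strings and a window of zero, the counter records only directly adjacent pairs, so ("the",) followed by ("cat",) is counted twice in this toy data.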
For language translation, the concrete form of a concept is its representation in documents; for music, the concrete form may be notes, or numbers representing the sound frequencies of the same composition, and so on. With both video and audio media, similar methods can be used to obtain the association between a video clip of a baseball player swinging and missing and the word string "strike out". A consistent generalized visual representation of a baseball player swinging and missing and then returning to the dugout, and the word string "strike out" (or the sound frequencies known to mean "strike out"), will have a very high cross-concept association frequency over a significant sample size. The system can operate in this setting once a mechanism has been developed that generalizes the recognition of a swing and a miss when encoded as visual data.
As another example, a common goal of vision software is to have the system analyze a visual image and determine automatically whether a person is in the image. Although this is a difficult task at the current state of visual or image recognition technology, the present invention can use CFA to learn the signature of a "person" by searching for features adjacent (e.g., within a given radius) to the parts of an image that correspond to a person. In this embodiment the system is given a corpus of images and trained on it to find the factors that distinguish pixel arrays that make up a person from pixel arrays that make up anything else. One approach has the system use images captured through both a light-sensitive lens and an infrared sensor that identifies heat-emitting objects. The system is then trained to recognize the light-sensitive pixel patterns that define the relationship between objects that emit heat and those that do not. With such heat-based grouping, the system can further refine its training on pixel patterns to distinguish people from non-human heat-emitting elements (other animals, fire, and so on).
In general, the present invention defines any given "main concept" by the sequences of representations that appear around it in all contexts of that main concept. In a sense, the invention defines each main concept by all the concepts that surround it, including those that appear before it and those that appear after it, regardless of the form in which the concepts are expressed. When a concept is expressed in written language, there is a "time" dimension that surrounds and defines it (expressed as flow, order, or sequence). A left signature in English represents the different concepts that appear in "time" before a query, and a right signature in English represents the different concepts that appear in "time" after the query concept.
Representing a concept in a medium other than text adds "space" dimensions that surround the main concept. In addition to the context that multiple units of time provide, these additional dimensions supply further context that defines the main concept. For example, spoken language adds context (signatures) to each concept in the sequence in the form of timbre, intonation, pitch modulation, and so on (while the concepts immediately before and after the main concept remain important to its identification). The physical (or perceptual) dimensions around the visual representation of a concept provide additional context for a concept that does not move over time; if it does move over time, the sequence of representations before and after it provides context as well. Of course, in addition to the important context provided by the surrounding sequence over multiple units of time, audio-visual and other multi-sensory representations of a concept also add dimensions of context that help define each isolated concept at a point in time.
B. Data Compression
Once CFA has been used (within a single state, or across states using cross-state knowledge acquisition) to generate a conceptual knowledge base, the various words and word strings that express the same concept, within each language and across languages, can be identified collectively by giving each concept a number or some other unique identification label or token. This naturally yields a very powerful data compression method and system. If a representation in one state has been given a specific association with a data point in another state and cataloged in the database, conversion between those two states is possible.
For example, each "concept" expressed in some form, state, or language can be assigned a number (or a frequency on the electromagnetic spectrum). When associated ideas are to be transmitted from one location to another, they can be parsed into overlapping concepts, and the representations of the parsed concepts can be converted into their assigned tokens (e.g., numbers, electromagnetic frequencies, and so on). Using these tokens (with an encoder at the sending end and a decoder at the receiving end) compresses the amount of data, and therefore the electromagnetic or other bandwidth, required for transmission from one location to another.
Transmitting a concept requires sending the pair (concept, unique number) the first time and only the number thereafter. In a multiprocessor implementation of the technology, the same efficient internal transmission of concepts between processors (e.g., by unique number) can be realized as for long-distance transmission. Once concepts have been transmitted, they are decoded by replacing their unique identifiers with the concept descriptions, regardless of how the unique identifiers are encoded: numbers, electromagnetic frequencies, or any other identifier will do.
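The transmission rule (send the pair on first use, only the number afterwards) can be sketched with a pair of small classes. The class names and the choice of sequential integers as identifiers are assumptions; the text allows any unique identifier.

```python
class ConceptEncoder:
    """Sender side: assigns each new concept the next integer identifier."""

    def __init__(self):
        self._ids = {}

    def encode(self, concepts):
        message = []
        for concept in concepts:
            if concept in self._ids:
                # Already transmitted once: send the identifier only.
                message.append((self._ids[concept], None))
            else:
                self._ids[concept] = len(self._ids)
                # First transmission: send the (identifier, concept) pair.
                message.append((self._ids[concept], concept))
        return message


class ConceptDecoder:
    """Receiver side: learns pairs as they arrive and expands identifiers."""

    def __init__(self):
        self._concepts = {}

    def decode(self, message):
        decoded = []
        for concept_id, concept in message:
            if concept is not None:
                self._concepts[concept_id] = concept
            decoded.append(self._concepts[concept_id])
        return decoded


encoder, decoder = ConceptEncoder(), ConceptDecoder()
original = ["go out", "to play", "go out"]
message = encoder.encode(original)
restored = decoder.decode(message)
```

The saving grows with repetition: every later occurrence of "go out" costs one identifier instead of the full word string.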
IX. Single-State CFA for Intelligent Applications
In another embodiment, the user can direct the invention to perform a specific CFA automatically upon recognizing a particular pattern of two or more different word strings appearing together in a question, request, or sentence. The user can control the system so that the appearance of a pattern of two or more different word strings (recognized as a combination of word strings in a particular proximity or order, after parsing into two or more word strings of various lengths in various ways) is part of a complex classification that triggers a specific CFA. These CFAs may require the system to access information previously learned through earlier CFAs and now stored in the knowledge base, or may require it to learn new information from the document database (or the Web or another available corpus), use it, and store it in the knowledge base for future use. For each CFA result, the system either retrieves information from the knowledge base or automatically performs the next CFA (or a series of CFAs triggered by previous CFAs), based on prior training and on triggers set by the user (or learned by the system), until the system answers the question or performs the task.
The invention can use its methods to generate knowledge acquisition lists and use the filtering methods to identify all semantically equivalent words and word strings for the words and word strings parsed from a request, question, or sentence. In one embodiment, the methods and systems can be trained to recognize different types of questions. For example, if the system is asked a question such as "Where can I see kangaroos in America?", it may have been trained to recognize the portion that the user can classify as belonging to the "Where Does One Find __" category, which the user previously trained and labeled. The user can train the system on one or more examples of this type of question, using the semantic-equivalence generator (and overlap method) described above to recognize the various other forms of the query. Once the system has been trained and can recognize the various concrete instances of the question type, the user can set a trigger that fires when a question of this type is recognized and starts a predetermined next CFA that supplies the answer.
For example, through semantic-equivalence analysis and filtering, the system can learn that "where can I go to see __", "where can you tell me to go to see __", and "where can I find __" are all members of the "Where Does One Find __" question category.
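Once such members have been learned, recognizing that a new question belongs to the category amounts to matching it against the stored cradles. The sketch below uses regular expressions for that matching step only; the function names and the dictionary layout are hypothetical, and the learning of the cradles by semantic-equivalence analysis is not shown.

```python
import re


def compile_cradle(cradle):
    """Turn a cradle such as 'where can I find __' into a regex whose
    single group captures whatever fills the `__` slot."""
    left, right = (part.strip() for part in cradle.split("__"))
    return re.compile(
        r"^\s*" + re.escape(left) + r"\s+(.+?)\s*" + re.escape(right) + r"\s*\??\s*$",
        re.IGNORECASE,
    )


def classify(question, categories):
    """Return (category, filler) for the first matching cradle, else None."""
    for category, cradles in categories.items():
        for cradle in cradles:
            match = compile_cradle(cradle).match(question)
            if match:
                return category, match.group(1)
    return None


categories = {
    "Where Does One Find __": [
        "where can I go to see __",
        "where can you tell me to go to see __",
        "where can I find __",
    ],
}
hit = classify("Where can I find kangaroos?", categories)
miss = classify("How tall are kangaroos?", categories)
```

A recognized category can then serve as the trigger for the next CFA, with the captured filler ("kangaroos") as its object.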
Similarly, the system will also obtain categories or concept classes by using RCFA or ICFA to generate semantic equivalents for "see kangaroos" (e.g., "watch kangaroos") and "in America" (e.g., "in the US"). The system can therefore recognize combinations of members of different classes whose appearance triggers a CFA on the next group of words and/or word strings. The user can thus train the system to recognize patterns of these class members in particular sequences, so that they trigger the CFA strategy needed to identify the answer to this type of "Where Does One Find __" question.
Furthermore, the "Where Does One Find" portion need not come at the beginning of the sentence, as in "If I want to see kangaroos while I'm in America, where do you suggest I go". Here "where do you suggest I go" is the last concept in the sequence. The user therefore trains the system to recognize this form and sequence as a member of the "Where Does One Find __" question category, so that the artificial intelligence application can be carried out through CFA.
In one embodiment, the user can set a trigger so that, when the system encounters a sequence of concepts in the category of "Where Does One Find __" questions, it supplies an answer that belongs to the concept category "Places". Finding the correct place is the goal of the CFA triggered by recognizing this group of word strings in a "Where Does One Find __" question.
The user can train the system so that, on encountering a "Where Does One Find __" question, it searches the "Places" category for the member most closely related to the object the query asks to see (in this example, "kangaroos"), that is, the member that most frequently appears directly to the left or right of the object, or near it. Determining which "Place" is most closely related to the "object" requires frequency counts of places appearing directly to the left or right of the object, or near it, in text; it may also involve training the system to recognize, around the object, concrete word string signatures or cradles indicating that the object can be found somewhere. If this were the only information in the question, the "Places" member most relevant to "kangaroos" might be "Australia". In this example, however, the question also contains "in America", which the user has trained the system to recognize as a member of the "Place Restriction" category. The user can train the invention to trigger a CFA between the thing the questioner wants to see ("kangaroos") and the place restriction ("in America"). The highest associations between these two data fragments might be "the zoo", "the San Diego Zoo", or "on TV". Note that "on TV" may not fit the conventional "Places" category. The query "where can I see", however, fits the "How Can One View __" category (as well as the "Where Does One Find __" category), which can include "on TV". The intelligent application can therefore allow answers from the "Places" category and, for example, from a "Ways to View Things" category that the user defines or the system learns.
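The frequency comparison in this example can be sketched as a co-occurrence count. As a simplifying assumption, the sketch treats appearing in the same sentence as "near"; the function name `rank_places`, the toy corpus, and the two-member "Places" list are all hypothetical.

```python
from collections import Counter


def rank_places(corpus, obj, places, restriction=None):
    """Rank `places` by how many sentences mention them together with `obj`
    (and with `restriction`, when one is given)."""
    counts = Counter()
    for sentence in corpus:
        text = sentence.lower()
        if obj.lower() not in text:
            continue
        if restriction is not None and restriction.lower() not in text:
            continue
        for place in places:
            if place.lower() in text:
                counts[place] += 1
    return counts.most_common()


# Toy corpus, assumed for illustration only.
corpus = [
    "kangaroos live in Australia",
    "wild kangaroos roam Australia",
    "see kangaroos at the San Diego Zoo in America",
    "Australia is far from America",
]
places = ["Australia", "the San Diego Zoo"]

unrestricted = rank_places(corpus, "kangaroos", places)
restricted = rank_places(corpus, "kangaroos", places, restriction="in America")
```

Without the restriction the top answer is "Australia"; adding the "Place Restriction" member "in America" shifts it to "the San Diego Zoo", mirroring the example in the text.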
Other, more complex questions may require the result of one CFA to trigger another CFA as part of a multi-step triggering scheme for a particular type of question or request. As described above, the user can train the system to use these trigger steps based on patterns of different word strings that fit general categories and on the "thought process" or strategy the user trains the system to apply.
As just described, the user trains the system to use specific triggers for specific CFAs. As the user trains the system and it acquires enough triggers for solving problems, the system can begin to learn, on encountering a new word string pattern, how to identify the appropriate next CFA to trigger, based on the similarity (judged using CFA semantic-equivalence analysis plus overlap) between the unfamiliar multi-word-string pattern and the known multi-word-string patterns that trigger CFAs. The system then identifies the similarities among groups of triggers and uses them to set triggers for the new word string pattern. The user can also give the system strategies for setting triggers automatically to solve new problems.
Those skilled in the art will appreciate that an experienced practitioner can make changes to the apparatus and methods described above without departing from the spirit and scope of the present invention.
Appendix A: Knowledge Acquisition Lists
(examples with partial results)
Knowledge acquisition engine
Sample results obtained using an English corpus of 2.4 billion words
Results of concept mining on "vigilant eyes"
No.  Phrase  Relative score
1  Vigilant eyes  669
2  Control  17
3  Supervision  13
4  Instruct  9
5  Authority  9
6  Indication  8
7  Protection  8
8  Order  8
9  Influence  8
10  Authority  7
11  Umbrella  6
12  Support  5
13  The leader  5
14  Protect  5
15  Patronage  5
16  Close examination  5
17  Pressure  5
18  Order  5
19  Scrupulous  5
20  Poster  4
21  Management  4
22  Eyes  4
23  Be responsible for  4
24  Nose  4
25  Pay close attention to  4
Knowledge acquisition engine
Sample results obtained using an English corpus of 2.4 billion words
Results of concept mining on "significant"
No.  Phrase  Relative score
1  Significant  984
2  Far reaching  24
3  Positive  22
4  Main  20
5  Useful  20
6  In essence  17
7  Really  16
8  Big  15
9  Directly  14
10  Constructive  13
11  Great  13
12  Important  12
13  Bigger  12
14  Greatly  11
15  Unique  11
16  Valuable  11
17  Basic  10
18  Huge  10
19  Crucial  10
20  Conclusive  10
21  Core  9
22  Big  9
23  Lasting  9
24  Special  9
25  Alright  9
Knowledge acquisition engine
Sample results obtained using an English corpus of 2.4 billion words
Results of concept mining on "demonstration"
No.  Phrase  Relative score
1  Demonstration  917
2  On probation  9
3  Version  8
4  Assessment  8
5  Download  5
6  Duplicate  4
7  Assessment in 30 days  4
8  pdf  3
9  The assessment copy  3
10  30 days on probation  3
11  30 days beta releases  3
12  Pamphlet  3
13  The demo of software  3
14  Sample  3
15  Shareware  3
16  The onlooker  3
17  The player  3
18  Shareware  3
19  The Acrobat reader  2
20  Help  2
21  Final version  2
22  The assessment version  2
23  The realization of this product  2
24  The reader  2
25  Show  2
Knowledge acquisition engine
Sample results obtained using an English corpus of 2.4 billion words
Results of concept mining on "God"
No.  Phrase  Relative score
1  God  956
2  The paradise  19
3  Main  17
4  Refreshing  17
5  He (objective case)  15
6  Jehovah  13
7  The angle  12
8  He (nominative)  10
9  The people  9
10  The Allah  9
11  Christ  8
12  Godhead  7
13  Jesus  7
14  Refreshing  7
15  Last main  7
16  You  6
17  It  6
18  One  6
19  The father  6
20  Canada  5
21  The priest  5
22  God he  5
23  Like that refreshing  5
24  Rule  4
25  God in the conviction  4
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "meeting"
Rank Phrase Relative score
1 Meeting 982
2 Meeting 73
3 Parliament 45
4 Symposium 40
5 Forum 30
6 Summary 27
7 Conference 23
8 Repeatedly meeting 23
9 Discussion 18
10 The meeting of holding 18
11 Summit 16
12 The meeting phase 16
13 Project 15
14 Forum 15
15 Congress 15
16 Rally 14
17 Repeatedly have a meeting 13
18 Engineering 13
19 Incident 13
20 Fact-finding mission 13
21 The council 13
22 International conference 12
23 During this time 10
24 Joint conference 9
25 General talks 9
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "Arizona"
Rank Phrase Relative score
1 Arizona 953
2 Florida 52
3 California 50
4 Iowa 42
5 Ohio 41
6 Illinois 40
7 Michigan 40
8 Colorado 37
9 Washington 35
10 Utah 32
11 Georgia 32
12 Arizona, American Samoa 31
13 Tennessee 31
14 Oregon 30
15 Pennsylvania 29
16 Texas 28
17 Minnesota 28
18 New Mexico 28
19 Kansas 27
20 North Carolina 24
21 Louisiana 24
22 Oklahoma 23
23 That 23
24 Virginia 23
25 Arkansas 22
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "WWW"
Rank Phrase Relative score
1 WWW 940
2 Web 122
3 The Internet 81
4 www 35
5 World wide web 13
6 Whole world WWW 12
7 The Internet WWW 11
8 New 10
9 Official 9
10 Website 9
11 The website 9
12 Website 9
13 New WWW 8
14 The website of company 7
15 Company web page 7
16 Main 6
17 Company's site 6
18 Homepage 6
19 Support 6
20 Official website 5
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "analyze"
Rank Phrase Relative score
1 Analyze 971
2 Analyze 12
3 Determine 8
4 Improve 8
5 Assessment 7
6 Estimation 7
7 Understand 7
8 Check 7
9 Estimate 7
10 Right 6
11 In 6
12 Use 5
13 Relatively 5
14 Tolerance 5
15 Obtain 5
16 Verification 5
17 Research 4
18 Minimize 4
19 Investigation 4
20 Reduce 4
21 Test 4
22 That 4
23 Skew 4
24 Check 4
25 Isolated 4
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "for information about"
Rank Phrase Relative score
1 For information about 978
2 Relevant information 167
3 Related information 73
4 Relevant details 63
5 Relevant 51
6 Correlative detail 46
7 About 42
8 Relevant information 31
9 Relevent information 28
10 Informational linkage 25
11 In detail for information about 25
12 Details 24
13 Relevant details 17
14 Information please be got in touch 16
15 Related consulting 16
16 Relevant wherein any information 13
17 With wherein any relevant information 12
18 Related details 12
19 Message reference 12
20 Information inspection 12
21 Relevant financial information 11
22 Relevant information 9
23 Relevant general information 9
24 Information or registration 9
25 The information of relevant use 8
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "it is safe to say"
Rank Phrase Relative score
1 It is safe to say 148
2 In all conscience 24
3 That is very important 16
4 You can find 12
5 Clearly 12
6 Say liberally 11
7 We agree 11
8 True 10
9 We can say 10
10 Very important 9
11 Importantly 9
12 Equally 8
13 Importantly recognize 8
14 Unfortunately 7
15 Very clear 7
16 Now 7
17 It is fair saying so 7
18 Obviously 7
19 We know 7
20 It is said 7
21 Obviously 7
22 Commonly knownly be 7
23 We should also remember 7
24 Importantly remember 6
25 He can find 6
26 Say safely 6
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "country's maximum"
Rank Phrase Relative score
1 Country's maximum 674
2 Maximum 70
3 Whole nation maximum 29
4 The biggest in the world 25
5 Take the lead 23
6 Best 20
7 Maximum 19
8 The oldest 14
9 Quality is best 14
10 First 12
11 Main 9
12 Greatest 8
13 The whole nation takes the lead 8
14 The strongest 8
15 Few 7
16 Advanced in the world 7
17 The biggest in the world 7
18 Top 6
19 Fastest-rising 6
20 Most important 6
21 Britain's maximum 6
22 The most successful 6
23 The earliest 5
24 The richest 5
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "CEO"
Rank Phrase Relative score
1 CEO 953
2 The first executive officer 178
3 The top manager 74
4 The general manager (GM) 35
5 The chief operating officer 28
6 The founder 25
7 President 24
8 Chairman 24
9 The director 20
10 Common founder 16
11 The vice president 13
12 General counsellor 12
13 Head 12
14 The managing director 12
15 The chief financial officer 11
16 The executive director 11
17 The vice president 10
18 CFO 9
19 COO 9
20 The member 9
21 The publisher 9
22 The cashier 7
23 The secretary 6
24 Always 6
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "terms and conditions"
Rank Phrase Relative score
1 Terms and conditions 969
2 Clause 334
3 Condition 153
4 Terms of Use 105
5 Collateral condition 83
6 Terms of service 65
7 Rule 58
8 Terms of Use and condition 48
9 Requirement 44
10 Guiding principle 35
11 Flow process 28
12 Restriction 25
13 Policy 24
14 Principle 19
15 Restriction 19
16 Regulation 19
17 Standard 17
18 Service condition 17
19 TOS 16
20 Information 15
21 Clause and collateral condition 15
22 Criterion 15
23 Following terms and conditions 14
24 Regulation and regulation 13
25 The website clause 13
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "rule and regulation"
Rank Phrase Relative score
1 Rule and regulation 978
2 Rule 61
3 Regulation 48
4 Guiding principle 28
5 Terms and conditions 26
6 Requirement 23
7 Condition 22
8 Flow process 21
9 Institute clause 19
10 Clause 18
11 Policy 18
12 Law 17
13 Standard 13
14 Principle 13
15 Criterion 11
16 Decree 11
17 Rule and flow process 9
18 Flow process 8
19 Rule 8
20 Instruction 8
21 Policy and flow process 8
22 Policy 8
23 Instruct 7
24 Arrange 7
25 Laws and regulations 6
Knowledge acquisition engine
Sample results obtained using an English corpus of 2,400,000,000 words
Results of concept mining for "the Al-Qaeda terrorist organization"
Rank Phrase Relative score
1 The Al-Qaeda terrorist organization *
2 Al-qaida *
3 Al-qaeda *
4 Al?qaida *
5 Al-qa?eda *
6 Osama Ben Ladeng *
7 The terrorist *
8 Al-qaeda *
9 It *
10 Al-qa’ida *
11 The whole world *
12 They *
13 Al?queda *
Figure A0382572901681
Appendix A - Knowledge acquisition lists
(Example with complete results)
Results of knowledge acquisition for "it is important to note that"
Rank Phrase Relative score
1 It is important to note that 249
2 In all conscience 16
3 Importantly 16
4 Clearly 12
5 That 8
6 This is very important 8
7 We agree 8
8 Importantly be familiar with 8
9 You can find 7
10 In fact 7
11 That is disgraced 7
12 It is safe to say 7
13 It is important to note that 6
14 Ironically 6
15 Be 6
16 Should point out 6
17 We should remember 6
18 Unfortunately 5
19 Be clear that very much 5
20 He can find 5
21 In all conscience 5
22 So to say that 5
23 Clearly 5
24 We know 5
25 We must know 5
26 We should remember 5
27 That is well-known 5
28 This shows 5
29 We know 5
30 He knows 4
31 Key is 4
32 I we can say 4
33 Obviously 4
34 Be clear that very much 4
35 Should note 4
36 We can find 4
37 You it will be appreciated that 4
38 We know 4
39 It must be admitted that for we 4
40 {。##.##1}, 3
41 In time 3
42 It is 3
43 He can agree 3
44 Necessary 3
45 You can find 3
46 Relevant 3
47 Everyone can agree 3
48 It is important to remember 3
49 We must understand 3
50 You can agree 3
51 We are verified 3
52 You know 3
53 You can agree 3
54 He says 3
55 I should 3
56 And importantly 3
57 Very unfortunately 3
58 It shows 3
59 People understand 3
60 Answer is 3
61 We should look at 3
62 Equally 3
63 Everyone can agree 3
64 He is clear 3
65 His meaning is 3
66 I am listening 3
67 I am verified 3
68 I say 3
69 I must say 3
70 In view of such fact 3
71 Very ignominiously be 3
72 It is apparent that 3
Figure A0382572901741
Figure A0382572901771
Figure A0382572901791
Figure A0382572901801
Figure A0382572901821
Figure A0382572901831
Figure A0382572901851
Figure A0382572901891
Figure A0382572901941
Figure A0382572901961
Figure A0382572901981
Figure A0382572902021
Figure A0382572902041
Figure A0382572902051
Figure A0382572902061
Figure A0382572902091
Figure A0382572902102
Figure A0382572902111
Figure A0382572902121
Figure A0382572902131
Figure A0382572902142
Figure A0382572902151
Figure A0382572902161
Figure A0382572902191
Figure A0382572902201
Figure A0382572902202
Figure A0382572902211
Figure A0382572902241
Figure A0382572902251
Appendix B
Example of translation using parallel text and overlap
Attempted translation (from English to Spanish):
you can also rename the file and write code that affects the project in order to complete the application for information on creating applications
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete?the?application?for?information?on?creating?applications found?in?1?files(took?0.085?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete?the?application?for?information?on?creating found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete?the?application?for?information?on found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete?the?application?for?information found?in?1?files(took?0.084?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete?the?application?for found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete?the?application found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete?the found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to?complete found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order?to found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in?order found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project in found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the?project found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects?the found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that?affects found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code?that found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write?code found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file?and?write found?in?1?files(took?0.083?Seconds)
Checking?db?for:you?can?also?rename?the?file?and found?in?1?files(took?0.082?Seconds)
Checking?db?for:you?can?also?rename?the?file found?in?1?files(took?0.053?Seconds)
Checking?db?for:you?can?also?rename?the found?in?1?files(took?0.048?Seconds)
Checking?db?for:you?can?also?rename found?in?4?files(took?0.047?Seconds)
Checking db for:you can also found in 1000 files(took 0.032 Seconds)
Will check 100 files
File comparison took 4.865 Seconds.
Frequency table for "you can also"
Figure A0382572902281
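The log above can be read as a lookup loop: the database is queried for the longest word string starting at the current word, one word is dropped from the right after each query, and the loop stops when the string is found in enough files to compare. The following is a minimal Python sketch of that reading, not the engine's actual code. The names `longest_usable_string` and `count_files` are hypothetical, and the three constants are assumptions inferred from the log (no query is shorter than three words, a hit count of 6 is skipped while 14 is used, and at most 100 files are compared).

```python
from typing import Callable, List, Optional, Tuple

MIN_WORDS = 3    # assumption: the log never queries fewer than three words
MIN_FILES = 10   # assumption: the log skips 6 hits and uses 14, so the real cutoff lies in between
MAX_FILES = 100  # the log reports "Will check 100 files" whenever more files are found


def longest_usable_string(
    words: List[str],
    start: int,
    count_files: Callable[[str], int],
) -> Optional[Tuple[str, int]]:
    """Shrink the word string that begins at `start` from the right until
    enough files contain it. Returns (word string, files to compare) or None."""
    for end in range(len(words), start + MIN_WORDS - 1, -1):
        phrase = " ".join(words[start:end])
        hits = count_files(phrase)  # stands in for one "Checking db for:" query
        if hits >= MIN_FILES:
            return phrase, min(hits, MAX_FILES)
    return None  # nothing usable at this position; the caller moves on one word


# Toy hit counts taken from the log; every other word string is treated as found in 1 file.
counts = {"you can also": 1000, "you can also rename": 4, "also rename the": 4}
words = "you can also rename the file".split()
lookup = lambda phrase: counts.get(phrase, 1)

print(longest_usable_string(words, 0, lookup))
print(longest_usable_string(words, 2, lookup))
```

With these toy counts, the first call settles on "you can also" and caps the comparison at 100 files. The second call finds no usable word string starting at "also", which matches the log moving straight on to "rename".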
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete?the?application?for?information?on?creating?applications found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete?the?application?for?information?on?creating found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete?the?application?for?information?on found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete?the?application?for?information found?in?1?files(took?0.037?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete?the?application?for found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete?the?application found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete?the found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to?complete found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order?to found?in?1?files(took?0.580?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in order found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project?in found?in?1?files(took?0.038?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the?project found?in?1?files(took?0.037?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects?the found?in?1?files(took?0.037?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that?affects found?in?1?files(took?0.037?Seconds)
Checking?db?for:can?also?rename?the?file?and?write?code?that found?in?1?files(took?0.037?Seconds)
Checking db for:can also rename the file and write code found in 1 files(took 0.040 Seconds)
Checking?db?for:can?also?rename?the?file?and?write found?in?1?files(took?0.039?Seconds)
Checking?db?for:can?also?rename?the?file?and found?in?1?files(took?0.037?Seconds)
Checking?db?for:can?also?rename?the?file found?in?1?files(took?0.008?Seconds)
Checking?db?for:can?also?rename?the found?in?4?files(took?0.003?Seconds)
Checking db for:can also rename found in 33 files(took 0.002 Seconds)
Will check 33 files
File comparison took 1.774 Seconds.
Frequency table for "can also rename"
Figure A0382572902301
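The "file comparison" step that produces each frequency table is not spelled out in this appendix. One plausible reading is that the engine counts which target-language word strings recur in the text aligned with each occurrence of the source word string. The Python sketch below illustrates that reading only. The function names, the `max_len` limit, and the Spanish sample sentences are all hypothetical and are not taken from the patent's data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs in `tokens` as a contiguous run of words."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def frequency_table(
    phrase: str,
    aligned_pairs: Iterable[Tuple[str, str]],
    max_len: int = 4,  # assumption: longest target word string considered
) -> Counter:
    """Count target-language word strings in segments whose aligned
    source segment contains `phrase`."""
    phrase_tokens = phrase.lower().split()
    table: Counter = Counter()
    for source, target in aligned_pairs:
        if not contains_phrase(source.lower().split(), phrase_tokens):
            continue
        tokens = target.lower().split()
        # Every word string of 1..max_len words, counted once per aligned pair.
        candidates = {
            " ".join(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)
        }
        table.update(candidates)
    return table


# Made-up parallel segments, for illustration only.
pairs = [
    ("you can also rename the file", "también puede cambiar el nombre del archivo"),
    ("you can also print it", "también puede imprimirlo"),
    ("save the file", "guarde el archivo"),
]
table = frequency_table("you can also", pairs)
print(table["también puede"], table["archivo"])
```

Word strings that keep appearing alongside the source phrase, such as "también puede" in this toy data, rise to the top of the table, while words tied to one particular sentence stay at a count of 1.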
Possible translations of "you can also rename" (using overlap)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete?the?application?for?information?on?creating?applications found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete?the?application?for?information?on?creating found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete?the?application?for?information?on found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete?the?application?for?information found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete?the?application?for found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete?the?application found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete?the found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to?complete found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order to found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in?order found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project?in found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the?project found?in?1?files(took?0.040?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects?the found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that?affects found?in?1?files(took?0.039?Seconds)
Checking?db?for:also?rename?the?file?and?write?code?that found?in?1?files(took?0.038?Seconds)
Checking?db?for:also?rename?the?file?and?write?code found?in?1?files(took?0.038?Seconds)
Checking db for:also rename the file and write found in 1 files(took 0.035 Seconds)
Checking?db?for:also?rename?the?file?and found?in?1?files(took?0.034?Seconds)
Checking?db?for:also?rename?the?file found?in?1?files(took?0.007?Seconds)
Checking?db?for:also?rename?the found?in?4?files(took?0.001?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete?the?application?for?information?on?creating?applications found?in?1?files(took?0.045?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete?the?application?for?information?on?creating found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete?the?application?for?information?on found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete?the?application?for?information found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete?the?application?for found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete?the?application found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete?the found?in?1?files(took?0.043?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to complete found?in?1?files(took?0.045?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order?to found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in?order found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project?in found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the?project found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects?the found?in?1?files(took?0.043?Seconds)
Checking?db?for:rename?the?file?and?write?code?that?affects found?in?1?files(took?0.044?Seconds)
Checking?db?for:rename?the?file?and?write?code?that found?in?1?files(took?0.043?Seconds)
Checking?db?for:rename?the?file?and?write?code found?in?1?files(took?0.037?Seconds)
Checking?db?for:rename?the?file?and?write found?in?1?files(took?0.036?Seconds)
Checking?db?for:rename?the?file?and found?in?3?files(took?0.034?Seconds)
Checking db for:rename the file found in 117 files(took 0.005 Seconds)
Will check 100 files
File comparison took 5.326 Seconds.
Frequency table for "rename the file"
Figure A0382572902341
Possible translations of "you can also rename the file" (using overlap)
Figure A0382572902351
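The captions in this appendix describe extending the translation "using overlap": candidates for neighbouring source word strings are joined where the end of one coincides with the start of the next. The Python sketch below shows one simple way to do such a join. The matching rule (longest common run of whole words) and the Spanish candidates are assumptions made for illustration, not the patent's exact procedure or data.

```python
from typing import List, Optional


def join_on_overlap(left: str, right: str) -> Optional[str]:
    """Merge two word strings if some suffix of `left` equals a prefix of
    `right`; the longest such overlap wins. Returns None if none exists."""
    a, b = left.split(), right.split()
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return " ".join(a + b[k:])
    return None


def combine(left_candidates: List[str], right_candidates: List[str]) -> List[str]:
    """Keep every pairing of candidates that can be joined on an overlap."""
    merged = []
    for left in left_candidates:
        for right in right_candidates:
            joined = join_on_overlap(left, right)
            if joined is not None:
                merged.append(joined)
    return merged


# Hypothetical candidates for "you can also" and "can also rename".
first = ["también puede", "puede también"]
second = ["puede cambiar el nombre", "renombrar"]
print(combine(first, second))  # ['también puede cambiar el nombre']
```

Only the pairing that shares the word "puede" at the join survives, so candidates that do not agree with their neighbours are discarded without any grammar rules being consulted.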
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order?to?complete the?application?for?information?on?creating?applications found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order?to?complete the?application?for?information?on?creating found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order?to?complete the?application?for?information?on found?in?1?files(took?0.039?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order?to?complete the?application?for?information found?in?1?files(took?0.043?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order?to?complete the?application?for found?in?1?files(took?0.041?Seconds)
Checking db for:the file and write code that affects the project in order to complete the application found in 1 files(took 0.040 Seconds)
Checking db for:the file and write code that affects the project in order to complete the found in 1 files(took 0.040 Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order?to?complete found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order?to found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in?order found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project?in found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the?project found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects?the found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that?affects found?in?1?files(took?0.040?Seconds)
Checking?db?for:the?file?and?write?code?that found?in?1?files(took?0.039?Seconds)
Checking?db?for:the?file?and?write?code found?in?1?files(took?0.033?Seconds)
Checking db for:the file and write found in 6 files(took 0.031 Seconds)
Checking db for:the file and found in 664 files(took 0.432 Seconds)
Will check 100 files
File comparison took 10.28 Seconds.
Frequency table for "the file and"
Figure A0382572902371
Possible translations of "you can also rename the file and" (using overlap)
Figure A0382572902372
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to?complete?the application?for?information?on?creating?applications found?in?1?files(took?0.012?Seconds)
Checking db for:file and write code that affects the project in order to complete the application for information on creating found in 1 files(took 0.011 Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to?complete?the application?for?information?on found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to?complete?the application?for?information found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to?complete?the application?for found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to?complete?the application found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to?complete?the found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to?complete found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order?to found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in?order found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project?in found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the?project found?in?1?files(took?0.011?Seconds)
Checking?db?for:file?and?write?code?that?affects?the found?in?1?files(took?0.011?Seconds)
Checking db for:file and write code that affects found in 1 files(took 0.009 Seconds)
Checking?db?for:file?and?write?code?that found?in?1?files(took?0.696?Seconds)
Checking?db?for:file?and?write?code found?in?1?files(took?0.003?Seconds)
Checking db for:file and write found in 14 files(took 0.001 Seconds)
Will check 14 files
File comparison took 0.949 Seconds.
Frequency table for "file and write"
Possible translations of "you can also rename the file and write" (using overlap)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete?the application?for?information?on?creating?applications found?in?1?files(took?0.011?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete?the application?for?information?on?creating found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete?the application?for?information?on found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete?the application?for?information found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete?the application?for found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete?the application found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete?the found?in?1?files(took?0.012?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to?complete found?in?1?files(took?0.011?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order?to found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in?order found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the?project?in found?in?1?files(took?0.011?Seconds)
Checking?db?for:and?write?code?that?affects?the?project found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects?the found?in?1?files(took?0.010?Seconds)
Checking?db?for:and?write?code?that?affects found?in?1?files(took?0.008?Seconds)
Checking?db?for:and?write?code?that found?in?3?files(took?0.008?Seconds)
Checking db for:and write code found in 35 files(took 0.002 Seconds)
Will check 35 files
File comparison took 2.702 Seconds.
Frequency table for "and write code"
Possible translations of "you can also rename the file and write code" (using overlap)
Figure A0382572902421
Checking?db?for:write?code?that?affects?the?project?in?order?to?complete?the application?for?information?on?creating?applications found?in?1?files(took?0.018?Seconds)
Checking db for:write code that affects the project in order to complete the application for information on creating found in 1 files(took 0.017 Seconds)
Checking?db?for:write?code?that?affects?the?project?in?order?to?complete?the application?for?information?on found?in?1?files(took?0.018?Seconds)
Checking db for:write code that affects the project in order to complete the application for information found in 1 files(took 0.017 Seconds)
Checking?db?for:write?code?that?affects?the?project?in?order?to?complete?the application?for found?in?1?files(took?0.017?Seconds)
Checking?db?for:write?code?that?affects?the?project?in?order?to?complete?the application found?in?1?files(took?0.017?Seconds)
Checking db for:write code that affects the project in order to complete the found in 1 files(took 0.017 Seconds)
Checking?db?for:write?code?that?affects?the?project?in?order?to?complete found?in?1?files(took?0.017?Seconds)
Checking?db?for:write?code?that?affects?the?project?in?order?to found?in?1?files(took?0.017?Seconds)
Checking?db?for:write?code?that?affects?the?project?in?order found?in?1?files(took?0.017?Seconds)
Checking?db?for:write?code?that?affects?the?project?in found?in?1?files(took?0.017?Seconds)
Checking?db?for:write?code?that?affects?the?project found?in?1?files(took?0.009?Seconds)
Checking?db?for:write?code?that?affects?the found?in?1?files(took?0.008?Seconds)
Checking db for:write code that affects found in 1 files(took 0.006 Seconds)
Checking db for:write code that found in 126 files(took 0.005 Seconds)
Will check 100 files
File comparison took 9.389 Seconds.
Frequency table for "write code that"
Figure A0382572902431
Figure A0382572902441
Possible translations of "you can also rename the file and write code that" (using overlap)
Figure A0382572902442
Checking?db?for:code?that?affects?the?project?in?order?to?complete?the?application for?information?on?creating?applications found?in?1?files(took?0.013?Seconds)
Checking db for:code that affects the project in order to complete the application for information on creating found in 1 files(took 0.013 Seconds)
Checking?db?for:code?that?affects?the?project?in?order?to?complete?the?application for?information?on found?in?1?files(took?0.012?Seconds)
Checking?db?for:code?that?affects?the?project?in?order?to?complete?the?application for?information found?in?1?files(took?0.012?Seconds)
Checking?db?for:code?that?affects?the?project?in?order?to?complete?the?application for found?in?1?files(took?0.013?Seconds)
Checking?db?for:code?that?affects?the?project?in?order?to?complete?the?application found?in?1?files(took?0.012?Seconds)
Checking?db?for:code?that?affects?the?project?in?order?to?complete?the found?in?1?files(took?0.014?Seconds)
Checking?db?for:code?that?affects?the?project?in?order?to?complete found?in?1?files(took?0.012?Seconds)
Checking?db?for:code?that?affects?the?project?in?order?to found?in?1?files(took?0.012?Seconds)
Checking?db?for:code?that?affects?the?project?in?order found?in?1?files(took?0.012?Seconds)
Checking?db?for:code?that?affects?the?project?in found?in?1?files(took?0.011?Seconds)
Checking?db?for:code?that?affects?the?project found?in?1?files(took?0.003?Seconds)
Checking?db?for:code?that?affects?the found?in?1?files(took?0.002?Seconds)
Checking?db?for:code?that?affects found?in?1?files(took?0.699?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete?the?application?for information?on?creating?applications found?in?1?files(took?0.056?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete?the?application?for information?on?creating found?in?1?files(took?0.055?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete?the?application?for information?on found?in?1?files(took?0.055?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete?the?application?for information found?in?1?files(took?0.055?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete?the?application?for found?in?1?files(took?0.055?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete?the?application found?in?1?files(took?0.055?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete?the found?in?1?files(took?0.055?Seconds)
Checking?db?for:that?affects?the?project?in?order?to?complete found?in?1?files(took?0.054?Seconds)
Checking?db?for:that?affects?the?project?in?order?to found?in?1?files(took?0.055?Seconds)
Checking db for:that affects the project in order found in 1 files(took 0.011 Seconds)
Checking?db?for:that?affects?the?project?in found?in?1?files(took?0.010?Seconds)
Checking?db?for:that?affects?the?project found?in?1?files(took?0.002?Seconds)
Checking?db?for:that?affects?the found?in?27?files(took?0.001?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□□□□Will?check? 27?files
File?comparison?took?1.895?Seconds.
Frequency table for: that affects the
Figure A0382572902461
Figure A0382572902471
"you can also rename the file and write code that affects" can be translated (using overlap)
Figure A0382572902472
Figure A0382572902481
Checking db for:affects the project in order to complete the application for information on creating applications found in 1 files(took 0.059 Seconds)
Checking?db?for:affects?the?project?in?order?to?complete?the?application?for information?on?creating found?in?1?files(took?0.058?Seconds)
Checking?db?for:affects?the?project?in?order?to?complete?the?application?for information?on found?in?1?files(took?0.058?Seconds)
Checking?db?for:affects?the?project?in?order?to?complete?the?application?for information found?in?1?files(took?0.058?Seconds)
Checking?db?for:affects?the?project?in?order?to?complete?the?application?for found?in?1?files(took?0.058?Seconds)
Checking?db?for:affects?the?project?in?order?to?complete?the?application found?in?1?files(took?0.058?Seconds)
Checking?db?for:affects?the?project?in?order?to?complete?the found?in?1?files(took?0.058?Seconds)
Checking?db?for:affects?the?project?in?order?to?complete found?in?1?files(took?0.058?Seconds)
Checking?db?for:affects?the?project?in?order?to found?in?1?files(took?0.054?Seconds)
Checking?db?for:affects?the?project?in?order found?in?1?files(took?0.010?Seconds)
Checking?db?for:affects?the?project?in found?in?1?files(took?0.008?Seconds)
Checking?db?for:affects?the?project found?in?2?files(took?0.001?Seconds)
Checking?db?for:the?project?in?order?to?complete?the?application?for?information?on?creating?applications found?in?1?files(took?0.099?Seconds)
Checking?db?for:the?project?in?order?to?complete?the?application?for?information?on?creating found?in?1?files(took?0.098?Seconds)
Checking db for:the project in order to complete the application for information on found in 1 files(took 0.099 Seconds)
Checking db for:the project in order to complete the application for information found in 1 files(took 0.999 Seconds)
Checking db for:the project in order to complete the application for found in 1 files(took 0.098 Seconds)
Checking db for:the project in order to complete the application found in 1 files(took 0.098 Seconds)
Checking?db?for:the?project?in?order?to?complete?the found?in?1?files(took?0.099?Seconds)
Checking?db?for:the?project?in?order?to?complete found?in?1?files(took?0.058?Seconds)
Checking?db?for:the?project?in?order?to found?in?1?files(took?0.054?Seconds)
Checking?db?for:the?project?in?order found?in?12?files(took?0.010?Seconds) □□□□□□□□□□□□Will?check? 12?files
File?comparison?took?1.033?Seconds.
Frequency table for: the project in order
No overlap found; please try other input.
Checking?db?for:the?project?in found?in?181?files(took?0.007?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ □□□□□□□□Will?check? 100?files
File?comparison?took?8.229?Seconds.
Frequency table for: the project in
Figure A0382572902511
"you can also rename the file and write code that affects the project" can be translated (using overlap)
Figure A0382572902512
Figure A0382572902521
Checking?db?for:project?in?order?to?complete?the?application?for?information?on creating?applications found?in?1?files(took?0.092?Seconds)
Checking?db?for:project?in?order?to?complete?the?application?for?information?on creating found?in?1?files(took?0.092?Seconds)
Checking?db?for:project?in?order?to?complete?the?application?for?information?on found?in?1?files(took?0.090?Seconds)
Checking?db?for:project?in?order?to?complete?the?application?for?information found?in?1?files(took?0.091?Seconds)
Checking?db?for:project?in?order?to?complete?the?application?for found?in?1?files(took?0.091?Seconds)
Checking?db?for:project?in?order?to?complete?the?application found?in?1?files(took?0.090?Seconds)
Checking?db?for:project?in?order?to?complete?the found?in?1?files(took?0.089?Seconds)
Checking?db?for:project?in?order?to?complete found?in?1?files(took?0.049?Seconds)
Checking?db?for:project?in?order?to found?in?1?files(took?0.044?Seconds)
Checking?db?for:project?in?order found?in?24?files(took?0.001?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□Will?check? 24?files
File?comparison?took?1.656?Seconds.
Frequency table for: project in order
"you can also rename the file and write code that affects the project in order" can be translated (using overlap)
Checking?db?for:in?order?to?complete?the?application?for?information?on?creating applications found?in?1?files(took?0.096?Seconds)
Checking?db?for:in?order?to?complete?the?application?for?information?on?creating found?in?1?files(took?0.095?Seconds)
Checking?db?for:in?order?to?complete?the?application?for?information?on found?in?1?files(took?0.095?Seconds)
Checking?db?for:in?order?to?complete?the?application?for?information found?in?1?files(took?0.095?Seconds)
Checking?db?for:in?order?to?complete?the?application?for found?in?1?files(took?0.094?Seconds)
Checking?db?for:in?order?to?complete?the?application found?in?1?files(took?0.091?Seconds)
Checking?db?for:in?order?to?complete?the found?in?5?files(took?0.090?Seconds)
Checking?db?for:in?order?to?complete found?in?7?files(took?0.053?Seconds)
Checking?db?for:in?order?to found?in?1000?files(took?0.033?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ □□□□□□□□Will?check? 100?files
File?comparison?took?7.183?Seconds.
Frequency table for: in order to
Figure A0382572902561
"you can also rename the file and write code that affects the project in order to" can be translated (using overlap)
Figure A0382572902562
Checking?db?for:order?to?complete?the?application?for?information?on?creating applications found?in?1?files(took?0.055?Seconds)
Checking?db?for:order?to?complete?the?application?for?information?on?creating found?in?1?files(took?0.053?Seconds)
Checking?db?for:order?to?complete?the?application?for?information?on found?in?1?files(took?0.053?Seconds)
Checking?db?for:order?to?complete?the?application?for?information found?in?1?files(took?0.050?Seconds)
Checking?db?for:order?to?complete?the?application?for found?in?1?files(took?0.048?Seconds)
Checking?db?for:order?to?complete?the?application found?in?1?files(took?0.045?Seconds)
Checking?db?for:order?to?complete?the found?in?33?files(took?0.044?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□Will?check? 33?files
File?comparison?took?1.949?Seconds.
Frequency table for: in order to complete the
"you can also rename the file and write code that affects the project in order to complete the" can be translated (using overlap)
Figure A0382572902591
Checking?db?for:to?complete?the?application?for?information?on?creating applications found?in?1?files(took?0.096?Seconds)
Checking?db?for:to?complete?the?application?for?information?on?creating found?in?1?files(took?0.095?Seconds)
Checking?db?for:to?complete?the?application?for?information?on found?in?1?files(took?0.095?Seconds)
Checking?db?for:to?complete?the?application?for?information found?in?1?files(took?0.049?Seconds)
Checking?db?for:to?complete?the?application?for found?in?1?files(took?0.048?Seconds)
Checking?db?for:to?complete?the?application found?in?4?files(took?0.043?Seconds)
Checking?db?for:complete?the?application?for?information?on?creating?applications found?in?1?files(took?0.067?Seconds)
Checking?db?for:complete?the?application?for?information?on?creating found?in?1?files(took?0.070?Seconds)
Checking?db?for:complete?the?application?for?information?on found?in?1?files(took?0.050?Seconds)
Checking?db?for:complete?the?application?for?information found?in?1?files(took?0.005?Seconds)
Checking?db?for:complete?the?application?for found?in?1?files(took?0.004?Seconds)
Checking?db?for:complete?the?application found?in?4?files(took?0.001?Seconds)
Checking?db?for:the?application?for?information?on?creating?applications found?in?1?files(took?0.067?Seconds)
Checking?db?for:the?application?for?information?on?creating found?in?1?files(took?0.065?Seconds)
Checking?db?for:the?application?for?information?on found?in?1?files(took?0.049?Seconds)
Checking?db?for:the?application?for?information found?in?1?files(took?0.005?Seconds)
Checking?db?for:the?application?for found?in?74?files(took?0.003?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ □□□□□□□□□□□□□□□□□□□□□□□□□□□□Will?check? 74?files
File?comparison?took?4.957?Seconds.
Frequency table for: the application for
Figure A0382572902611
"you can also rename the file and write code that affects the project in order to complete the application for information on creating" can be translated (using overlap)
Figure A0382572902612
Figure A0382572902621
Checking?db?for:application?for?information?on?creating?applications found?in?1?files(took?0.063?Seconds)
Checking?db?for:application?for?information?on?creating found?in?1?files(took?0.061?Seconds)
Checking?db?for:application?for?information?on found?in?1?files(took?0.044?Seconds)
Checking?db?for:application?for?information found?in?7?files(took?0.001?Seconds)
Checking?db?for:for?information?on?creating?applications found?in?1?files(took?0.063?Seconds)
Checking?db?for:for?information?on?creating found?in?88?files(took?0.063?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□Will check? 88?files
File?comparison?took?7.270?Seconds.
Frequency table for: for information on creating
Figure A0382572902631
"you can also rename the file and write code that affects the project in order to complete the application for information on creating applications" can be translated (using overlap)
Figure A0382572902632
aplicación?para
Checking?db?for:information?on?creating?applications found?in?1?files(took?0.017?Seconds)
Checking?db?for:on?creating?applications found?in?1?files(took?0.001?Seconds)
Checking?db?for:creating?applications found?in?50?files(took?0.002?Seconds) □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ □□□□Will?check? 50?files
File?comparison?took?2.627?Seconds.
Frequency table for: creating applications
Figure A0382572902661
"you can also rename the file and write code that affects the project in order to complete the application for information on creating applications" can be translated (using overlap)
Figure A0382572902671
Translation?process?complete(took?245.6?seconds)
English:you can also rename the file and write code that affects the project in order to complete the application for information on creating applications
Spanish: también puede cambiar el nombre de un archivo y escribir código que afecta al proyecto para completar la aplicación para obtener información sobre cómo crear aplicaciones
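The lookup order visible in the log above (fix a starting word, try the longest remaining word string, drop one trailing word at a time, then advance to the next starting word) can be sketched in Python as follows. This is only an illustration, not the system's actual code: the names `candidate_strings` and `first_frequent`, the `min_words` cutoff, the `min_files` threshold, and the small dictionary standing in for the database are all assumptions. The log does not state what file count triggers a file comparison.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def candidate_strings(words: List[str], start: int, min_words: int = 2) -> Iterator[str]:
    """Yield word strings that begin at `start`, longest first.

    Each step drops one trailing word, stopping at `min_words` words.
    """
    for end in range(len(words), start + min_words - 1, -1):
        yield " ".join(words[start:end])


def first_frequent(
    words: List[str],
    start: int,
    file_counts: Dict[str, int],
    min_files: int,
) -> Optional[Tuple[str, int]]:
    """Return the longest candidate found in at least `min_files` files."""
    for candidate in candidate_strings(words, start):
        count = file_counts.get(candidate, 0)
        if count >= min_files:
            return candidate, count
    return None


# File counts copied from the log; strings not listed count as 0 here.
counts = {"the project in order": 12, "the project in": 181}
sentence = "the project in order to complete".split()
print(first_frequent(sentence, 0, counts, min_files=10))
```

With the hypothetical threshold of 10 files, the search stops at "the project in order", the same string for which the log runs its first file comparison at that starting word.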
Appendix C
Translation of the phrase "unless we will have a copy" from English to French.
Checking:unless?we?will?have?a?copy
db check took 0.269 Seconds
0?files?found **
Calling?Triangulation
′unless we will have a copy′,from EN to FR=à moins que nous ayons une copie
′unless we will have a copy′,from EN to DE=′es sei denn wir eine Kopie haben′ and back to FR its ′c′est nous que une copie a′
′unless?we?will?have?a?copy′,from?EN?to?EL=′εκτóζ?αν?θα
Figure A0382572902701
and back to FR its ′à moins que nous ayons une copie′
′unless we will have a copy′,from EN to ES=′a menos que tengamos una copia′ and back to FR its ′à moins que nous ayons une copie′
′unless we will have a copy′,from EN to IT=′a meno che abbiamo una copia′ and back to FR its ′moins que nous avons une copie′
′unless we will have a copy′,from EN to KO= and back to FR its ′Nous quand il y a une copie la rancune′
′unless we will have a copy′,from EN to NL=′tenzij wij een exemplaar zullen hebben′ and back to FR its ′à moins que nous une copie′
′unless we will have a copy′,from EN to PT=′a menos que nós tivermos uma cópia′ and back to FR its ′à moins que nous ayons une copie′
′unless we will have a copy′,from EN to RU=′Если мы не будем иметь копию′ and back to FR its ′Si nous n′aurons pas une copie′
The?Triangulation?process?took?12.58?sec.
Checking ″à moins que nous ayons une copie″ back to original language.
′à moins que nous ayons une copie′,from FR to EN=unless we have a copy
′à moins que nous ayons une copie′,from FR to DE=′es sei denn wir eine Kopie haben′ and back to EN its ′it is we a copy has′
′à moins que nous ayons une copie′,from FR to EL=′moins que nous ayons une copie′ and back to EN its ′moins que nous y′!ayons une copie′
′à moins que nous ayons une copie′,from FR to ES=′a menos que tengamos una copia′ and back to EN its ′unless we have a copy′
′à moins que nous ayons une copie′,from FR to IT=′a meno che abbiamo una copia′ and back to EN its ′less that we have one copy′
′àmoins?que?nous?ayons?une?copie′,from?FR?to?KO=
Figure A0382572902712
and?back?to?EN?its′Grudge?us?who?are?not?when?it?is?the?copy′
′à moins que nous ayons une copie′,from FR to NL=′tenzij wij een exemplaar hebben′ and back to EN its ′unless we have a copy′
′à moins que nous ayons une copie′,from FR to PT=′a menos que nós tivermos uma cópia′ and back to EN its ′unless we have a copy′
′àmoins?que?nous?ayons?une?copie′,from?FR?to?RU=″and?back?to?EN?its″
The?Triangulation?process?took?12.90?sec.
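The first triangulation pass above can be read as a vote: the direct EN to FR result and the results routed through each intermediate language are compared, and the string that recurs most often is carried forward to the back-check. The Python sketch below assumes a simple majority rule. That rule and the name `pick_by_vote` are assumptions, since the log shows only the candidates and the string that was checked back, not how it was selected. The candidate strings are copied from the log with spacing normalized.

```python
from collections import Counter
from typing import Dict, Tuple


def pick_by_vote(candidates: Dict[str, str]) -> Tuple[str, int]:
    """Return the most frequent non-empty candidate and its vote count.

    Keys are route labels (hypothetical), values are the French strings
    obtained through that route.
    """
    counts = Counter(
        text.strip().lower() for text in candidates.values() if text.strip()
    )
    if not counts:
        raise ValueError("no non-empty candidates to vote on")
    return counts.most_common(1)[0]


# French results for 'unless we will have a copy', one per route.
results = {
    "direct": "à moins que nous ayons une copie",
    "via DE": "c'est nous que une copie a",
    "via EL": "à moins que nous ayons une copie",
    "via ES": "à moins que nous ayons une copie",
    "via IT": "moins que nous avons une copie",
    "via KO": "Nous quand il y a une copie la rancune",
    "via NL": "à moins que nous une copie",
    "via PT": "à moins que nous ayons une copie",
    "via RU": "Si nous n'aurons pas une copie",
}
best, votes = pick_by_vote(results)
print(best, votes)
```

Under this assumed rule the winner is the string the log then checks back to English, supported by the direct route and the EL, ES and PT routes.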
Checking:unless?we?will?have?a
db check took 0.225 Seconds
0?files?found **
Calling?Triangulation
′unless?we?will?have?a′,from?EN?to?FR=àmoins?que?nous?ayons?a
′unless we will have a′,from EN to DE=′es sei denn wir a haben′ and back to FR its ′c′est que nous A a′
′unless?we?will?have?a′,from?EN?to?EL=′εκτóζ?ανθα
Figure A0382572902714
το?α′and?back?to
FR?its′àmoins?que?nous?ayons?le?a′
′unless?we?will?have?a′,from?EN?to?ES=′a?menos?que?tengamos?a′and?back?toFR?its′àmoins?que?nous?ayons?a′
′unless?we?will?have?a′,from?EN?to?IT=′a?meno?che?abbiamo?a′and?back?to?FRits′moins?que?nous?devons′
′unless?we?will?have?a′,from?EN?to?KO=
Figure A0382572902721
and back to FR its ′Nous quand il y a un}a{la rancune′
′unless?we?will?have?a′,from?EN?to?NL=′tenzij?wij?a?zullen?hebben′and?backto?FR?its′àmoins?que?nous?a′
′unless?we?will?have?a′,from?EN?to?PT=′a?menos?que?nós?tivermos?a′and?backto?FR?its′àmoins?que?nous?ayons′
′unless?we?will?have?a′,from?EN?to?RU=′Если?мы?не?будем?иметьa′andback?to?FR?its′Si?nous?n′aurons?pas?A′
The?Triangulation?process?took?12.51?sec.
Checking:unless we will have
db?check?took?0.124?Seconds
0?files?found **
Calling?Triangulation
′unless?we?will?have′,from?EN?to?FR=àmoins?que?nous?ayons
′unless?we?will?have′,from?EN?to?DE=′es?sei?denn?wir?haben′and?back?to?FRits′c′est?nous?a′
′unless?we?will?have′,from?EN?to?EL=′εκτóζ?αν?θα
Figure A0382572902722
and?back?to?FR?its
′àmoins?que?nous?ayons′
′unless?we?will?have′,from?EN?to?ES=′a?menos?que?tengamos′and?back?to?FRits′àmoins?que?nous?ayons′
′unless?we?will?have′,from?EN?to?IT=′a?meno?che?abbiamo′and?back?to?FR?its′moins?que?nous?avons′
′unless we will have′,from EN to KO= and back to FR its ′Quand il y a de nous la rancune′
′unless?we?will?have′,from?EN?to?NL=′tenzij?wij?zullen?hebben′and?back?toFR?its′àmoins?que?nous′
′unless?we?will?have′,from?EN?to?PT=′a?menos?que?nós?tivermos′and?back?toFR?its′àmoins?que?nous?ayons′
′unless?we?will?have′,from?EN?to?RU=′Если?мы?не?будем?иметь′and?back?toFR?its′Si?nous?n′aurons?pas′
The?Triangulation?process?took?7.314?sec.
Checking″àmoins?que?nous?ayons″back?to?original?language.
′àmoins?que?nous?ayons′,from?FR?to?EN=unless?we?have
′àmoins?que?nous?ayons′,from?FR?to?DE=′es?sei?denn?wir?haben′and?back?toEN?its′it?is?we?has′
′àmoins?que?nous?ayons′,from?FR?to?EL=′moins?que?nous
Figure A0382572902732
ayons′and?backto?EN?its′moins?que?nous?y′!ayons′
′àmoins?que?nous?ayons′,from?FR?to?ES=′a?menos?que?tengamos′and?back?toEN?its′unless?we?have′
′àmoins?que?nous?ayons′,from?FR?to?IT=′a?meno?che?abbiamo′and?back?toEN?its′less?that?we?have′
′àmoins?que?nous?ayons′,from?FR?to?KO=′ and?backto?EN?its′When?there?are?grudge?we?who?are?not′
′àmoins?que?nous?ayons′,from?FR?to?NL=′tenzij?wij?hebben′and?back?to?ENits′unless?we?have′
′àmoins?que?nous?ayons′,from?FR?to?PT=′a?menos?que?nós?tivermos′andback?to?EN?its′unless?we?have′
′àmoins?que?nous?ayons′,from?FR?to?RU=″and?back?to?EN?its″
The?Triangulation?process?took?12.15?sec.
Checking:unless?we?will
db?check?took?0.001?Seconds
0?files?found **
Calling?Triangulation
′unless?we?will′,from?EN?to?FR=àmoins?que?nous
′unless we will′,from EN to DE=′es sei denn wir werden′ and back to FR its ′c′est nous devient′
′unless?we?will′,from?EN?to?EL=′εκτóζ?αν′and?back?to?FR?its′àmoins?que′
′unless?we?will′,from?EN?to?ES=′a?menos?que′and?back?to?FR?its′àmoinsque′
′unless?we?will′,from?EN?to?IT=′a?meno?che′and?back?to?FR?its′moins?que′
′unless?we?will′,from?EN?to?KO= and?back?to?FR?its′Larancune?oùnous?ne?sommes?pas′
′unless?we?will′,from?EN?to?NL=′tenzij?wij?zullen′and?back?to?FR?its′àmoinsque?nous′
′unless?we?will′,from?EN?to?PT=′a?menos?que?nós′and?back?to?FR?its′àmoinsque?nous′
′unless?we?will′,from?EN?to?RU=′Если?мы?не?будем′and?back?to?FR?its′Sinous?ne?serons?pas′
The?Triangulation?process?took?10.56?sec.
Checking″àmoins?que″back?to?original?language.
′àmoins?que′,from?FR?to?EN=unless
′à moins que′,from FR to DE=′es sei denn′ and back to EN its ′it is′
′àmoins?que′,from?FR?to?EL= moins?que′and?back?to?EN?its′y′!moins?que′
′àmoins?que′,from?FR?to?ES=′a?menos?que′and?back?to?EN?its′unless′
′àmoins?que′,from?FR?to?IT=′a?meno?che′and?back?to?EN?its′less?than′
′àmoins?que′,from?FR?to?KO= and?back?to?EN?its′The?grudge?whichis?not′
′àmoins?que′,from?FR?to?NL=′tenzij′and?back?to?EN?its′unless′
′àmoins?que′,from?FR?to?PT=′a?menos?que′and?back?to?EN?its′unless′
′à moins que′,from FR to RU=″ and back to EN its ″
The?Triangulation?process?took?7.903?sec.
Checking:unless?we
db?check?took?0.093?Seconds
first?grep?took?2.003?Seconds
found?in?1000?files
Figure A0382572902753
translated?it?in?0.702?Seconds
Rule-based?translation?#2=àmoins?que?nous
translated?it?in?5.394?Seconds
999?of?1000?files?contain?a?pair(source?and?target?language).
Checking:
Figure A0382572902761
moins?que?nous
grep?in?target?language?took?0.233?Seconds?20?found.
counting?in?files?took?0.018?Seconds
Found?in?16?files.
File?#0? eng/hansard_disc/set_a/a0/a_012.89.eng--total?words:1786;Locations:578. french?file.
File?#1? eng/hansard_disc/set_a/a0/a_020.29.eng--total?words:2004;Locations:760. french?file.
File?#2? eng/hansard_disc/set_a/a0/a_008.9.eng--total?words:1972;Locations:919. french?file.
File?#3? eng/hansard_disc/set_a/a0/a_009.24.eng--total?words:2319;Locations:953. french?file.
File?#4? eng/hansard_disc/set_a/a0/a_026.37.eng--total?words:2320;Locations:1895. french?file.
File #5 eng/hansard_disc/set_a/a0/a_006.25.eng--total words:2285;Locations:1637. french file.
File?#6? eng/hansard_disc/set_a/a0/a_015.61.eng--total?words:2314;Locations:236,948. french?file.
File?#7? eng/hansard_disc/set_a/a0/a_031.53.eng--total?words:2495;Locations:1446. french?file.
File #8 eng/hansard_disc/set_a/a0/a_011.78.eng--total words:2448;Locations:1470. french file.
File?#9? eng/hansard_disc/set_a/a0/a_014.92.eng--total?words:2511;Locations:1867. french?file.
File?#10? eng/hansard_disc/set_a/a0/a_014.38.eng--total?words:2387;Locations:2098. french?file.
File?#11? eng/hansard_disc/set_a/a0/a_017.82.eng--total?words:2437;Locations:1333. french?file.
File?#12? eng/hansard_disc/set_a/a0/a_013.1.eng--total?words:2380;Locations:1638,2213. french?file.
File?#13? eng/hansard_disc/set_a/a0/a_029.25.eng--total?words:2526;Locations:1514. french?file.
File?#14? eng/hansard_disc/set_a/a0/a_027.42.eng--total?words:2577;Locations:2124. french?file.
File?#15? eng/hansard_disc/set_a/a0/a_006.93.eng--total?words:2621;Locations:2534. french?file.
Checking:àmoins?que?nous
grep?in?target?language?took?0.237?Seconds?20?found.
counting?in?files?took?0.019?Seconds
Found?in?16?files.
File?#0? eng/hansard_disc/set_a/a0/a_012.89.eng--total?words:1786;Locations:578. french?file.
File?#1? eng/hansard_disc/set_a/a0/a_020.29.eng--total?words:2004;Locations:760. french?file.
File?#2? eng/hansard_disc/set_a/a0/a_008.9.eng--total?words:1972;Locations:919. french?file.
File?#3? eng/hansard_disc/set_a/a0/a_009.24.eng--total?words:2319;Locations:953. french?file.
File?#4? eng/hansard_disc/set_a/a0/a_026.37.eng--total?words:2320;Locations:1895. french?file.
File?#5? eng/hansard_disc/set_a/a0/a_006.25.eng--total?words:2285;Locations:1637. french?file.
File?#6? eng/hansard_disc/set_a/a0/a_015.61.eng--total?words:2314;Locations:236,948. french?file.
File?#7? eng/hansard_disc/set_a/a0/a_031.53.eng--total?words:2495;Locations:1446. french?file.
File?#8? eng/hansard_disc/set_a/a0/a_011.78.eng--total?words:2448;Locations:1470. french?file.
File #9 eng/hansard_disc/set_a/a0/a_014.92.eng--total words:2511;Locations:1867. french file.
File?#10? eng/hansard_disc/set_a/a0/a_014.38.eng--total?words:2387;Locations:2098. french?file.
File?#11? eng/hansard_disc/set_a/a0/a_017.82.eng--total?words:2437;Locations:1333. french?file.
File?#12? eng/hansard_disc/set_a/a0/a_013.1.eng--total?words:2380;Locations:1638,2213. french?file.
File?#13? eng/hansard_disc/set_a/a0/a_029.25.eng--total?words:2526;Locations:1514. french?file.
File?#14? eng/hansard_disc/set_a/a0/a_027.42.eng--total?words:2577;Locations:2124. french?file.
File?#15? eng/hansard_disc/set_a/a0/a_006.93.eng--total?words:2621;Locations:2534. french?file.
Last?search?took?13.44
*true *
Frequency table for: unless we
No. | Associated files | English count | French
1 | 13 docs | 13 times | à moins que nous
Starting?to?translate,false,false,french,true,eng,fre
Trying?to?translate
So?far?I?have?a?good?overlap?O
Checking:we?will?have?a?copy
db?check?took?0.297?Seconds
0?files?found **
Calling?Triangulation
′we?will?have?a?copy′,from?EN?to?FR=nous?aurons?une?copie
′we?will?have?a?copy′,from?EN?to?DE=′wir?haben?eine?Kopie′and?back?to?FRits′nous?avons?une?copie′
′we?will?have?a?copy′,from?EN?to?EL=′θα
Figure A0382572902781
and?back?toFR?its′nous?aurons?une?copie′
′we?will?have?a?copy′,from?EN?to?ES=′tendremos?una?copia′and?back?to?FRits′nous?aurons?un?copie′
′we?will?have?a?copy′,from?EN?to?IT=′avremo?una?copia′and?back?to?FR?its′nous?aurons?une?copie′
′we will have a copy′,from EN to KO=
Figure A0382572902782
and?backto?FR?its′Nous?serons?la?copie′
′we will have a copy′,from EN to NL=′wij zullen een exemplaar hebben′ and back to FR its ′nous aurons une copie′
′we?will?have?a?copy′,from?EN?to?PT=′nós?teremos?uma?cópia′and?back?to?FRits′nous?aurons?une?copie′
′we?will?have?a?copy′,from?EN?to?RU=′Мы?будем?иметь?копию′and?back?to
FR?its′Nous?aurons?une?copie′
The?Triangulation?process?took?17.77?sec.
Checking″nous?aurons?une?copie″back?to?original?language.
′nous?aurons?une?copie′,from?FR?to?EN=we?will?have?a?copy
′nous?aurons?une?copie′,from?FR?to?DE=′wir?haben?eine?Kopie′and?back?toEN?its′we?have?a?copy′
′nous?aurons?une?copie′,from?FR?to?EL=′nous?aurons?une?copie′and?back?toEN?its′nous?aurons?une?copie′
′nous?aurons?une?copie′,from?FR?to?ES=′tendremos?una?copia′and?back?toEN?its′we?will?have?one?copies′
′nous?aurons?une?copie′,from?FR?to?IT=′avremo?una?copia′and?back?to?EN?its′We?will?have?one?copy′
′nous?aurons une?copie′,from?FR?to?KO= andback?to?EN?its′The?copy?which?means?will?be?we′
′nous?aurons?une?copie′,from?FR?to?NL=′wij?zullen?een?exemplaar?hebben′and?back?to?EN?its′we?will?have?a?copy′
′nous?aurons?une?copie′,from?FR?to?PT=′nós?teremos?uma?cópia′and?back?toEN?its′we?will?have?a?copy′
′nous?aurons?une?copie′,from?FR?to?RU=″and?back?to?EN?its″
The?Triangulation?process?took?8.645?sec.
Frequency table for: we will have a copy
No. | Associated files | English count | French
1 | 20 docs | 9 times | nous aurons une copie
English:unless?we?will?have?a?copy
French:
Starting to translate unless we will have a copy,false,false,french,true,eng,fre
select lang,olang from peanut where lang=′unless we will have a copy′ order by langcount desc-0
Current?string?to?be?translated=unless?we?will?have?a?copy
Got?Here....
What?now?true
1)àmoins?que?nous?aurons?une?copie
The?translation?process?took?117.0?sec.
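The final result above joins two partial translations that share a word at their boundary: "unless we" with "we will have a copy" on the source side, and "à moins que nous" with "nous aurons une copie" on the target side. A minimal Python sketch of that kind of overlap join follows. The function name `merge_on_overlap` and the exact matching rule (longest suffix of the left string that equals a prefix of the right string, compared word by word) are assumptions made for illustration.

```python
from typing import Optional


def merge_on_overlap(left: str, right: str) -> Optional[str]:
    """Join two word strings where a suffix of `left` equals a prefix of `right`.

    The longest shared run of words is used and appears once in the result.
    Returns None when the strings share no boundary words.
    """
    left_words, right_words = left.split(), right.split()
    for size in range(min(len(left_words), len(right_words)), 0, -1):
        if left_words[-size:] == right_words[:size]:
            return " ".join(left_words + right_words[size:])
    return None


# Source side overlaps on "we", target side overlaps on "nous".
print(merge_on_overlap("unless we", "we will have a copy"))
print(merge_on_overlap("à moins que nous", "nous aurons une copie"))
```

The second call reproduces the string reported as result 1) in the log. A real system would also need to confirm that the source-side and target-side overlaps correspond to each other, which this sketch does not attempt.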
Appendix D-
Example of translation using target-language flooding and overlap
Starting?to?translate?brake?and?over(hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuegocon?israel)
Figure A0382572902822
hamas?anuncióeste?jueves?was?just?translated?and?returned?results
Number?of?results=1000
Translation?for?hamas?anuncióeste?jueves?took?1.328
Figure A0382572902824
Figure A0382572902825
hamas?anuncióeste?jueves?el?was?just?translated?and?returned?results
Number?of?results=1000
Translation?for?hamas?anuncióeste?jueves?el?took?0.946
Figure A0382572902826
Figure A0382572902827
hamas?anuncióeste?jueves?el?fin?was?just?translated?and?returned?results
Number?of?results=1000
Translation?for?hamas?anuncióeste?jueves?el?fin?took?1.29
Skipping?anuncióeste?jueves?el(2<2)
anuncióeste?jueves?el?fin?was?just?translated?and?returned?results
Number?of?results=306
Translation?for?anuncióeste?jueves?el?fin?took?0.827
going?to?try?and?overlap?this?piece?with?the?hashmap
@@@Pre?3@@@
@@@Post?4@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin′,′anuncióeste?jueves?el?fin′(4,null,1)-(306)
No?good?source?overlap
@@@Pre?4@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el′,′anuncióeste?jueves?el?fin′(2,hamasanuncióeste?jueves?el?fin,1)-(306)
Got?an?overlap?in?source,checking?target
1000-306
Overlap?check?for′hamas?anuncióeste?jueves?el′,′anuncióeste?jueves?el?fin′took?0.722
***hamas?anuncióeste?jueves?el(1000),(306)anuncióeste?jueves?el?fin=hamas?anuncióeste?jueves?el?fin
@@@1223->0
Overlap results for ′hamas anunció este jueves el fin′
1)′hamas?announced?thursday,the?completion′-85(Repeated?11?times)(hamas,announced?thursday?the∷announced?thursday?the?completion)
2)′hamas,announced?thursday?the?termination′-85(Repeated?5?times)(null)
3)′hamas announced thursday,the end′-85(Repeated 4 times)(hamas,announced thursday the∷announced thursday the end)
4)′hamas,announced?thursday?the?end′-85(Repeated?9?times)(null)
5)′hamas?announced?thursday,the?termination′-85(Repeated?4?times)(hamas,announced?thursday?the∷announced?thursday?the?termination)
6)′hamas,announced?thursday?the?completion′-85(Repeated?8?times)(null)
7)′hamas,announced?thursday?that?the?completion′-80(Repeated?3?times)(null)
8)′hamas?announced?on?thursday,the?end′-80(Repeated?1?times)(hamas,announced?on?thursday?the∷announced?on?thursday?the?end)
9)′hamas,announced thursday the end of′-80(Repeated 8 times)(null)
10)′hamas?announced?thursday,the?end?of'-80(Repeated?3?times)(hamas,announced?thursday?the∷announced?thursday?the?end?of)
11)′of,hamas?announced?thursday?the?end′-80(Repeated?7?times)(null)
12)′that,hamas?announced?thursday?the?termination′-80(Repeated?3?times)(null)
13)′and,hamas?announced?thursday?the?end′-80(Repeated?10?times)(null)
14)′as,hamas?announced?thursday?the?termination′-80(Repeated?4?times)(null)
15)′hamas?announced?thursday,the?termination?of′-80(Repeated?3?times)(hamas,announced?thursday?the∷announced?thursday?the?termination?of)
16)′hamas,announced?thursday?the?completion?of′-80(Repeated?7?times)(null)
17)′of,hamas?announced?thursday?the?completion′-80(Repeated?4?times)(null)
18)′the,hamas?announced?thursday?the?completion′-80(Repeated?4?times)(null)
19)′hamas,announced?thursday?is?the?end′-80(Repeated?2?times)(null)
20)′and,hamas?announced?thursday?the?termination′-80(Repeated?6?times)(null)
Sorted by number of repetitions
1)thursday?announced,the?completion-32(Score=65?times)
2)thursday?announced,the?completion?of-26(Score=60?times)
3)announced?thursday,the?completion-22(Score=65?times)
4)announced?thursday,the?completion?of-20(Score=60?times)
5)on?thursday?announced,the?completion-16(Score=60?times)
6)day,hamas?announced?thursday?the?end-15(Score=65?times)
7)thursday?announced,the?termination-14(Score=65?times)
8)announced?on?thursday,the?end-13(Score=60?times)
9)day,hamas?announced?thursday?the?completion-13(Score=65?times)
10)on?thursday?announced,the?completion?of-13(Score=55?times)
11)thursday?announced,the?termination?of-12(Score=60?times)
12)announced?on?thursday,the?completion-12(Score=60?times)
13)thursday?announced,the?completion?of?its-12(Score=55?times)
14)announced?thursday,the?completion?of?its-12(Score=55?times)
15)announced?on,thursday?an?end-12(Score=50?times)
16)hamas?announced?thursday,the?completion-11(Score=85?times)
17)they?announced,thursday?the?completion-11(Score=60?times)
18)day,hamas?announced?thursday?the?end?of-11(Score=60?times)
19)announced?on?thursday,the?end?of-10(Score=55?times)
20)announced?on,thursday?an?end?to-10(Score=45?times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves′,′anuncióeste?jueves?el?fin′(2,hamasanuncióeste?jueves?el?fin,1)-(306)
Got?an?overlap?in?source,checking?target
997-306
Overlap?check?for′hamas?anuncióeste?jueves′,′anuncióeste?jueves?el?fin′took?0.958
***hamas?anuncióeste?jueves(997),(306)anuncióeste?jueves?el?fin=hamasanuncióeste?jueves?el?fin
@@@3169->0
Overlap results for ′hamas anunció este jueves el fin′
1)′hamas?announced,thursday?the?completion′-85(Repeated?11?times)(hamas,announced?thursday∷announced?thursday?the?completion)
2)′hamas,announced?thursday?the?termination′-85(Repeated?5?times)(null)
3)′hamas,announced?thursday?the?completion′-85(Repeated?8?times)(null)
4)′hamas?announced?thursday,the?completion′-85(Repeated?11?times)(null)
5)′hamas?announced,thursday?the?termination′-85(Repeated?4?times)(hamas,announced?thursday∷announced?thursday?the?termination)
6)′hamas?announced?thursday,the?end′-85(Repeated?4?times)(null)
7)′hamas,announced?thursday?the?end′-85(Repeated?9?times)(null)
8)′hamas?announced?thursday,the?termination′-85(Repeated?4?times)(null)
9)′hamas announced,thursday the end′-85(Repeated 4 times)(hamas,announced thursday∷announced thursday the end)
10)′hamas announced on,thursday the completion′-80(Repeated 4 times)(hamas,announced on thursday∷announced on thursday the completion)
11)′that,hamas?announced?thursday?the?termination′-80(Repeated?3?times)(null)
12)′hamas,announced?thursday?the?completion?of′-80(Repeated?7?times)(null)
13)′the,hamas?announced?thursday?the?completion′-80(Repeated?4?times)(null)
14)′hamas,announced thursday in the finale′-80(Repeated 3 times)(null)
15)′hamas,announced?on?thursday?the?end′-80(Repeated?6?times)(null)
16)′that,hamas announced thursday the completion′-80 (Repeated 4 times)(null)
17)′hamas,announced thursday and end the′-80 (Repeated 2 times)(null)
18)′hamas,announced?on?thursday?the?completion′-80(Repeated?4?times)(null)
19)′the,hamas?announced?thursday?the?termination′-80(Repeated?4?times)(null)
20)′that,hamas announced thursday the end′-80(Repeated 7 times)(null)
Sorted by number of repetitions
1)announced?on,thursday?an?end-18?(Score=50?times)
2)announced?on,thursday?the?completion-16(Score=60?times)
3)announced?thursday,the?completion-16(Score=65?times)
4)day,hamas?announced?thursday?the?end-15(Score=65?times)
5)announced?on,thursday?the?end-15(Score=60?times)
6)announced on,thursday completion-15(Score=55 times)
7)thursday?announced,the?completion-14(Score=65?times)
8)announced?on,thursday?an?end?to-13(Score=45?times)
9)day,hamas?announced?thursday?the?completion-13(Score=65?times)
10)announced thursday,the completion of-13(Score=60 times)
11)e?announced,thursday?the?completion-12(Score=45?times)
12)announced?on,thursday?the?completion?of-11(Score=55?times)
13)hamas?announced,thursday?the?completion-11(Score=85?times)
14)announced?on,thursday?the?termination-11(Score=60?times)
15)day,hamas?announced?thursday?the?end?of-11(Score=60?times)
16)hamas?announced?thursday,the?completion-11(Score=85?times)
17)e?announced,thursday?the?end-10(Score=45?times)
18)and,hamas?announced?thursday?the?end-10(Score=80?times)
19)hamas?announced,thursday?the?completion?of-10(Score=80?times)
20)announced?on?thursday,the?completion-10(Score=60?times)
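The two listings above rank the same kind of candidate in two ways: first by the number after the dash, then (under "Sort according to multiplicity") by how often a candidate was seen. The following is a minimal Python sketch of those two orderings. The `Candidate` type, its field names, and both function names are hypothetical and do not come from the log. The sample rows are copied from the first listing with the comma markers dropped, and the tie handling (input order is kept) is an assumption.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str          # candidate target word string
    score: int         # number after the dash in the first listing
    multiplicity: int  # the "Repeated N times" count


# Rows 8, 9, 12, 15 and 17 of the first listing (comma markers dropped).
SAMPLE = [
    Candidate("hamas announced thursday the termination", 85, 4),
    Candidate("hamas announced thursday the end", 85, 4),
    Candidate("hamas announced thursday the completion of", 80, 7),
    Candidate("hamas announced on thursday the end", 80, 6),
    Candidate("hamas announced thursday and end the", 80, 2),
]


def sort_by_score(cands: List[Candidate]) -> List[Candidate]:
    """Highest score first; ties keep their input order (assumption)."""
    return sorted(cands, key=lambda c: c.score, reverse=True)


def sort_by_multiplicity(cands: List[Candidate]) -> List[Candidate]:
    """Most frequently seen first; ties keep their input order (assumption)."""
    return sorted(cands, key=lambda c: c.multiplicity, reverse=True)


if __name__ == "__main__":
    for c in sort_by_multiplicity(SAMPLE):
        print(f"{c.text}-{c.multiplicity}(Score={c.score})")
```

Keeping the two keys separate mirrors the log, where a candidate with a high score can still rank low by multiplicity and the reverse.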
Figure A0382572902851
anunció este jueves el fin de was just translated and returned results
Number of results=1000
Translation for anunció este jueves el fin de took 1.195
going to try and overlap this piece with the hashmap
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el fin′,′anunció este jueves el fin de′(2,hamas anunció este jueves el fin de,1)-(1000)
Got an overlap in source,checking target
1500-1000
Overlap check for′hamas anunció este jueves el fin′,′anunció este jueves el fin de′took 4.251
***hamas anunció este jueves el fin(1500),(1000)anunció este jueves el fin de=hamas anunció este jueves el fin de
###1839->1839
The overlapping result of hamas anunció este jueves el fin de
1)hamas announced thursday the,end of-90(Repeated 1 times)(hamas announced,thursday the end∷announced thursday the end of)
2)hamas announced thursday the,completion of-90(Repeated 1 times)(hamas,announced thursday the completion∷announced thursday the completion of)
3)hamas announced thursday the,termination of-90(Repeated 1 times)(hamas announced thursday,the termination∷announced thursday the termination of)
4)hamas announced thursday the end,of its-85(Repeated 1 times)(hamas announced,thursday the end of∷announced thursday the end of its)
5)hamas announced on thursday the,completion of-85(Repeated 1 times)(hamas,announced on thursday the completion∷announced on thursday the completion of)
6)hamas announced thursday the completion,of its-85(Repeated 1 times)(hamas announced thursday,the completion of∷announced thursday the completion of its)
7)hamas announced on thursday the,end of-85(Repeated 1 times)(hamas announced on,thursday the end∷announced on thursday the end of)
8)hamas announced thursday that completion,of the-85(Repeated 1 times)(hamas,announced thursday that completion of∷announced thursday that completion of the)
9)hamas announced thursday that by the,end of this-85(Repeated 1 times)(hamas announced thursday,that by the end∷that by the end of this)
10)hamas announced on thursday the,termination of-85(Repeated 1 times)(hamas,announced on thursday the termination∷announced on thursday the termination of)
11)hamas announced thursday the completion,of a-85(Repeated 1 times)(hamas announced thursday,the completion of∷announced thursday the completion of a)
12)hamas announced on thursday the completion,of its-80(Repeated 1 times)(hamas announced on thursday,the completion of∷thursday the completion of its)
13)hamas announced on thursday the end,of its-80(Repeated 1 times)(hamas announced on thursday the,end of∷thursday the end of its)
14)hamas announced on thursday the completion,of a-80(Repeated 1 times)(hamas,announced on thursday the completion of∷announced on thursday the completion of a)
15)hamas announced thursday that,completion of-80(Repeated 1 times)(hamas,announced thursday that completion∷announced thursday that completion of)
16)hamas announced thursday that at the,end of-80(Repeated 2 times)(hamas announced thursday,that at the end∷thursday that at the end of)
17)hamas announced on thursday,completion of-80(Repeated 1 times)(hamas announced on,thursday completion∷announced on thursday completion of)
18)thursday announced the completion,of this-75(Repeated 15 times)(thursday announced,the completion of∷announced the completion of this)
19)thursday announced the end,of this-75(Repeated 8 times)(thursday announced,the end of∷announced the end of this)
20)hamas announced on thursday completion,of its-75(Repeated 1 times)(hamas,announced on thursday completion of∷announced on thursday completion of its)
Sort according to multiplicity
1)announced?thursday?the,completion?of-186(Score=70?times)
2)announced?thursday?the,end?of-135(Score=70?times)
3)announced?thursday?the,termination?of-98(Score=70?times)
4)thursday?announced?the,end?of-60(Score=70?times)
5)announced?thursday?the?completion,of?its-58(Score=65?times)
6)announced?thursday?the?completion,of?a-53(Score=65?times)
7)announced?thursday?the?termination,of?all-47(Score=50?times)
8)announced?thursday?the?end,of?its-44(Score=65?times)
9)thursday?announced?the?completion,of?the-43(Score=65?times)
10)on?thursday?announced?the,end?of-42(Score=65?times)
11)thursday?announced?the,completion?of-41(Score=70?times)
12)on?thursday?announced?the,completion?of-37(Score=65?times)
13)thursday?announced?the?completion,of?a-35(Score=65?times)
14)thursday?announced?the?termination,of?the-33(Score=65?times)
15)announced?thursday?the?termination,of?200-28(Score=50?times)
16)announced thursday the end,of cash-28(Score=50 times)
17)announced?thursday?the?end,of?major-28(Score=50?times)
18)announced?thursday?the?end,of?fighting-28(Score=50?times)
19)thursday?announced,completion?of-21(Score=65?times)
20)e?announced?thursday?the,completion?of-19(Score=50?times)
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el′,′anunció este jueves el fin de′(2,hamas anunció este jueves el fin de,1)-(1000)
Got an overlap in source,checking target
1000-1000
Overlap check for′hamas anunció este jueves el′,′anunció este jueves el fin de′took 0.979
***hamas anunció este jueves el(1000),(1000)anunció este jueves el fin de=hamas anunció este jueves el fin de
@@@2205->0
The overlapping result of hamas anunció este jueves el fin de
1)′hamas?announced?thursday?the,end?of′-90(Repeated?1?times)(null)
2)′hamas?announced?thursday,the?end?of′-90(Repeated?3?times)(hamas,announced?thursday?the∷announced?thursday?the?end?of)
3)′hamas?announced?thursday,the?termination?of′-90(Repeated?3?times)(hamas,announced?thursday?the∷announced?thursday?the?termination?of)
4)′hamas?announced?thursday?the,completion?of′-90(Repeated?1?times)(null)
5)′hamas?announced?thursday,the?completion?of′-90(Repeated?10?times)(hamas,announced?thursday?the∷announced?thursday?the?completion?of)
6)′hamas?announced?thursday?the,termination?of′-90(Repeated?1?times)(null)
7)′hamas?announced?on?thursday,the?completion?of′-85(Repeated?3?times)(hamas,announced?on?thursday?the∷announced?on?thursday?the?completion?of)
8)′hamas?announced?thursday?the?completion,of?its′-85(Repeated?1?times)(null)
9)′hamas?announced?thursday,the?completion?of?its′-85(Repeated?6?times)(hamas,announced?thursday?the∷announced?thursday?the?completion?of?its)
10)′hamas?announced?thursday?that?completion,of?the′-85(Repeated?1?times)(null)
11)′hamas?announced?thursday,the?completion′-85(Repeated?11?times)(hamas,announced?thursday?the∷announced?thursday?the?completion)
12)′hamas announced thursday,the end′-85(Repeated 4 times)(hamas,announced thursday the∷announced thursday the end)
13)′hamas?announced?thursday?the?completion,of?a′-85(Repeated?1?times)(null)
14)′hamas?announced?on?thursday,the?termination?of′-85(Repeated?2?times)(hamas,announced?on?thursday?the∷announced?on?thursday?the?termination?of)
15)′hamas?announced?thursday,the?end?of?its′-85(Repeated?2?times)(hamas,announced?thursday?the∷announced?thursday?the?end?of?its)
16)′hamas?announced?thursday,that?completion?of?the′-85(Repeated?2?times)(hamas,announced?thursday?that∷announced?thursday?that?completion?of?the)
17)′hamas?announced?thursday?the?end,of?its′-85(Repeated?1?times)(null)
18)′hamas?announced?on?thursday?the,completion?of′-85(Repeated?1?times)(null)
19)′hamas?announced?thursday,the?termination′-85(Repeated?4?times)(hamas,announced?thursday?the∷announced?thursday?the?termination)
20)′hamas?announced?on?thursday?the,end?of′-85(Repeated?7?times)(hamas,announced?on?thursday?the?end∷announced?on?thursday?the?end?of)
Sort according to multiplicity
1)announced?thursday?the,end?of-123(Score=70?times)
2)announced?thursday?the,completion?of-93(Score=70?times)
3)announced?thursday?the,termination?of-85(Score=70?times)
4)thursday?announced?the,end?of-41(Score=70?times)
5)thursday?announced?the?completion,of?the-34(Score=65?times)
6)announced?thursday?the?termination,of?all-33(Score=50?times)
7)thursday?announced,the?completion-31(Score=65?times)
8)announced?thursday?the?end,of?major-28(Score=50?times)
9)announced?thursday?the?end,of?its-28(Score=65?times)
10)announced?thursday?the?termination,of?200-28(Score=50?times)
11)announced thursday the end,of cash-28(Score=50 times)
12)announced?thursday?the?end,of?fighting-28(Score=50?times)
13)announced,thursday?the-28(Score=45?times)
14)thursday?announced?the?termination,of?the-25(Score=65?times)
15)thursday?announced,the?completion?of-25(Score=70?times)
16)on?thursday?announced?the,end?of-25(Score=65?times)
17)announced?thursday?the?completion,of?its-24(Score=65?times)
18)they?announced,thursday?the-24(Score=40?times)
19)announced?thursday?the?completion,of?a-24(Score=65?times)
20)announced?thursday,the?completion-22(Score=65?times)
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves′,′anunció este jueves el fin de′(2,hamas anunció este jueves el fin de,1)-(1000)
Got an overlap in source,checking target
997-1000
Overlap check for′hamas anunció este jueves′,′anunció este jueves el fin de′took 1.358
***hamas anunció este jueves(997),(1000)anunció este jueves el fin de=hamas anunció este jueves el fin de
@@@4950->0
The overlapping result of hamas anunció este jueves el fin de
1)′hamas?announced?thursday?the,end?of′-90(Repeated?1?times)(null)
2)′hamas?announced?thursday,the?end?of′-90(Repeated?3?times)(null)
3)′hamas?announced,thursday?the?end?of′-90(Repeated?3?times)(hamas,announced?thursday∷announced?thursday?the?end?of)
4)′hamas?announced?thursday,the?termination?of′-90(Repeated?3?times)(null)
5)′hamas?announced?thursday?the,completion?of′-90(Repeated?1?times)(null)
6)′hamas?announced,thursday?the?completion?of′-90(Repeated?10?times)(hamas,announced?thursday∷announced?thursday?the?completion?of)
7)′hamas?announced?thursday,the?completion?of′-90(Repeated?10?times)(null)
8)′hamas?announced,thursday?the?termination?of′-90(Repeated?3?times)(hamas,announced?thursday∷announced?thursday?the?termination?of)
9)′hamas?announced?thursday?the,termination?of′-90(Repeated?1?times)(null)
10)′hamas?announced,thursday?the?completion′-85(Repeated?11?times)(hamas,announced?thursday∷announced?thursday?the?completion)
11)′hamas?announced?on?thursday,the?completion?of′-85(Repeated?3?times)(null)
12)′hamas announced thursday the completion,of its′-85(Repeated 1 times)(null)
13)′hamas?announced?thursday,the?completion?of?its′-85(Repeated?6?times)(null)
14)′hamas?announced?thursday?that?completion,of?the′-85(Repeated?1?times)(null)
15)′hamas?announced?thursday,the?completion′-85(Repeated?11?times)(null)
16)′hamas?announced,thursday?the?termination′-85(Repeated?4?times)(hamas,announced?thursday∷announced?thursday?the?termination)
17)′hamas?announced?thursday,the?end′-85(Repeated?4?times)(null)
18)′hamas?announced?thursday?the?completion,of?a′-85(Repeated?1?times)(null)
19)′hamas?announced?on,thursday?the?end?of′-85(Repeated?6?times)(hamas,announced?on?thursday∷announced?on?thursday?the?end?of)
20)′hamas?announced?on?thursday,the?termination?of′-85(Repeated?2?times)(null)
Sort according to multiplicity
1)announced,thursday?the-431(Score=45?times)
2)announced?thursday?the,completion?of-93(Score=70?times)
3)announced?thursday?the,end?of-66(Score=70?times)
4)announced?thursday?the,termination?of-47(Score=70?times)
5)hamas?announced,thursday?the-41(Score=65?times)
6)thursday,announced?the-38(Score=45?times)
7)announced?thursday?the?end,of?its-27(Score=65?times)
8)announced?thursday,the?completion-24(Score=65?times)
9)announced?thursday?the?completion,of?its-24(Score=65?times)
10)thursday?announced,the?completion-23(Score=65?times)
11)announced?thursday,that?completion-23(Score=55?times)
12)announced?thursday?the?completion,of?a-22(Score=65?times)
13)announced?thursday,the?completion?of-21(Score=70?times)
14)announced?thursday,that?completion?of-21(Score=60?times)
15)announced?thursday,that?completion?of?the-19(Score=65?times)
16)announced?on,thursday?the?end-19(Score=60?times)
17)thursday?announced,the?completion?of-18(Score=70?times)
18)announced?on,thursday?the?completion-17(Score=60?times)
19)thursday?announced?the?completion,of?the-16(Score=65?times)
20)announced?on,thursday?completion-16(Score=55?times)
Skipping?este?jueves?el?fin(2<2)
Skipping?este?jueves?el?fin?de(2<2)
Skipping?este?jueves?el?fin?de?su(2<2)
Skipping?jueves?el?fin?de(2<2)
Skipping?jueves?el?fin?de?su?(2<2)
Figure A0382572902901
jueves el fin de su cese was just translated and returned results
Number of results=998
Translation for jueves el fin de su cese took 1.205
going to try and overlap this piece with the hashmap
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el fin de′,′jueves el fin de su cese′(2,hamas anunció este jueves el fin de su cese,3)-(998)
Got an overlap in source,checking target
1500-998
Overlap check for′hamas anunció este jueves el fin de′,′jueves el fin de su cese′took 1.705
***hamas anunció este jueves el fin de(1500),(998)jueves el fin de su cese=hamas anunció este jueves el fin de su cese
###1235->1235
The overlapping result of hamas anunció este jueves el fin de su cese
1)hamas announced thursday the termination,of cease-110(Repeated 3 times)(hamas announced thursday the,termination of∷thursday the termination of cease)
2)hamas announced thursday the end,of cease-110(Repeated 2 times)(hamas announced,thursday the end of∷thursday the end of cease)
3)hamas announced thursday the completion,of cease-110(Repeated 2 times)(hamas announced thursday the,completion of∷thursday the completion of cease)
4)hamas announced on thursday the end,of cease-105(Repeated 2 times)(hamas announced on thursday the,end of∷thursday the end of cease)
5)hamas announced thursday the termination,of cease and-105(Repeated 2 times)(hamas announced thursday the,termination of∷thursday the termination of cease and)
6)hamas announced thursday the end,of the cease-105(Repeated 3 times)(hamas announced,thursday the end of∷thursday the end of the cease)
7)hamas announced on thursday the termination,of cease-105(Repeated 3 times)(hamas announced on thursday,the termination of∷thursday the termination of cease)
8)hamas announced on thursday the completion,of cease-105(Repeated 2 times)(hamas announced on thursday,the completion of∷thursday the completion of cease)
9)hamas announced on thursday the termination,of cease and-100(Repeated 2 times)(hamas announced on thursday,the termination of∷thursday the termination of cease and)
10)hamas announced on thursday completion,of cease-100(Repeated 2 times)(hamas announced on thursday,completion of∷thursday completion of cease)
11)hamas announced on thursday the end,of the cease-100(Repeated 3 times)(hamas announced on thursday the,end of∷thursday the end of the cease)
12)hamas announced thursday the end of,its unilateral cease-95(Repeated 2 times)(hamas announced thursday,the end of its∷thursday the end of its unilateral cease)
13)hamas announced thursday the successful completion,of cease-90(Repeated 1 times)(hamas announced thursday the successful,completion of∷thursday the successful completion of cease)
14)hamas announced thursday the,end of-90(Repeated 1 times)(hamas announced,thursday the end∷thursday the end of)
15)hamas announced on thursday the end of,its unilateral cease-90(Repeated 2 times)(hamas announced on thursday the end,of its∷thursday the end of its unilateral cease)
16)announced thursday the completion,of cease-90(Repeated 94 times)(announced thursday,the completion of∷thursday the completion of cease)
17)hamas announced thursday the end,of cease fire-90(Repeated 1 times)(hamas announced,thursday the end of∷thursday the end of cease fire)
18)announced thursday the end,of cease-90(Repeated 94 times)(announced thursday the,end of∷thursday the end of cease)
19)announced thursday the termination,of cease-90(Repeated 141 times)(announced thursday,the termination of∷thursday the termination of cease)
20)hamas announced thursday the completion,of cease project-90(Repeated 1 times)(hamas announced thursday the,completion of∷thursday the completion of cease project)
Sort according to multiplicity
1)announced?thursday?the?end,of?the-188(Score=65?times)
2)announced?thursday?the?termination,of?cease-141(Score=90?times)
3)announced?thursday?the?end,of?the?cease-141(Score=85?times)
4)announced?thursday?the?termination,of?cease?and-94(Score=85?times)
5)announced?thursday?the?end?of,its?unilateral?cease-94(Score=75?times)
6)announced?thursday?the?end,of?the?cease?fire-94(Score=65?times)
7)announced?thursday?the?completion,of?cease-94(Score=90?times)
8)announced?thursday?the?end,of?cease-94(Score=90?times)
9)announced?thursday?the?end,of?cash-47(Score=50?times)
10)announced?thursday?the?termination,of?cease?and?desist-47(Score=65?times)
11)announced?thursday?the?end,of?cease?fire-47(Score=70?times)
12)announced?thursday?the?completion,of?cease?project-47(Score=70?times)
13)announced thursday the end of,its unilateral cease fire-47(Score=55 times)
14)announced?thursday?the?end,of?the?cease?fire?which-47(Score=60?times)
15)announced?thursday?the?end?of,its?annual-46(Score=55?times)
16)thursday?announced?that?by?the?end,of?thursday-45(Score=40?times)
17)announced?thursday?the,end?of-44(Score=70?times)
18)announced?on?thursday?the?end,of?the-24(Score=60?times)
19)announced?on?thursday?the?termination,of?cease-21(Score=85?times)
20)e?announced?thursday?the?end,of?the-20(Score=45?times)
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el fin′,′jueves el fin de su cese′(2,hamas anunció este jueves el fin de su cese,3)-(998)
Got an overlap in source,checking target
1500-998
Overlap check for′hamas anunció este jueves el fin′,′jueves el fin de su cese′took 1.531
***hamas anunció este jueves el fin(1500),(998)jueves el fin de su cese=hamas anunció este jueves el fin de su cese
@@@1581->0
The overlapping result of hamas anunció este jueves el fin de su cese
1)′hamas announced thursday the end,of cease′-110(Repeated 2 times)(hamas announced,thursday the end of∷thursday the end of cease)
2)′hamas announced thursday the termination,of cease′-110(Repeated 3 times)(hamas announced,thursday the termination of∷thursday the termination of cease)
3)′hamas announced thursday the completion,of cease′-110(Repeated 2 times)(hamas announced thursday,the completion of∷thursday the completion of cease)
4)′hamas announced on thursday the termination,of cease′-105(Repeated 3 times)(hamas announced on thursday,the termination of∷thursday the termination of cease)
5)′hamas announced thursday the end,of the cease′-105(Repeated 3 times)(hamas announced,thursday the end of∷thursday the end of the cease)
6)′hamas announced on thursday the completion,of cease′-105(Repeated 2 times)(hamas announced on thursday,the completion of∷thursday the completion of cease)
7)′hamas announced on thursday the end,of cease′-105(Repeated 2 times)(hamas announced on thursday the,end of∷thursday the end of cease)
8)′hamas announced thursday the termination,of cease and′-105(Repeated 2 times)(hamas announced,thursday the termination of∷thursday the termination of cease and)
9)′hamas announced on thursday completion,of cease′-100(Repeated 2 times)(hamas announced on,thursday completion of∷thursday completion of cease)
10)′hamas announced on thursday the end,of the cease′-100(Repeated 3 times)(hamas announced on thursday the,end of∷thursday the end of the cease)
11)′hamas announced on thursday the termination,of cease and′-100(Repeated 2 times)(hamas announced on thursday,the termination of∷thursday the termination of cease and)
12)′hamas announced thursday the end of,its unilateral cease′-95(Repeated 2 times)(hamas announced thursday,the end of its∷thursday the end of its unilateral cease)
13)′hamas announced on thursday the end,of its unilateral cease′-90(Repeated 2 times)(hamas announced on thursday the,end of∷thursday the end of its unilateral cease)
14)′hamas announced on thursday the end of,its unilateral cease′-90(Repeated 2 times)(null)
15)′hamas announced thursday the end,of cease fire′-90(Repeated 1 times)(hamas announced,thursday the end of∷thursday the end of cease fire)
16)′announced thursday the termination,of cease′-90(Repeated 141 times)(announced thursday,the termination of∷thursday the termination of cease)
17)′hamas announced thursday the completion,of cease project′-90(Repeated 1 times)(hamas announced thursday,the completion of∷thursday the completion of cease project)
18)′hamas announced thursday the successful completion,of cease′-90(Repeated 1 times)(hamas announced thursday,the successful completion of∷thursday the successful completion of cease)
19)′hamas announced thursday the,end of′-90(Repeated 1 times)(hamas announced,thursday the end∷thursday the end of)
20)′announced thursday the completion,of cease′-90(Repeated 94 times)(announced thursday,the completion of∷thursday the completion of cease)
Sort according to multiplicity
1)announced?thursday?the,end?of-211(Score=70?times)
2)announced?thursday?the?end,of?the-188(Score=65?times)
3)announced?thursday?the?termination,of?cease-141(Score=90?times)
4)announced?thursday?the?end,of?the?cease-141(Score=85?times)
5)announced?thursday?the?end?of,its?unilateral?cease-94(Score=75?times)
6)announced?thursday?the?termination,of?cease?and-94(Score=85?times)
7)announced?thursday?the?completion,of?cease-94(Score=90?times)
8)announced?thursday?the?end,of?cease-94(Score=90?times)
9)announced?thursday?the?end,of?the?cease?fire-94(Score=65?times)
10)announced?thursday?the?end?of,its?unilateral?cease?fire-47(Score=55?times)
11)announced?thursday?the?termination,of?cease?and?desist-47(Score=65?times)
12)announced?thursday?the?end,of?the?cease?fire?which-47(Score=60?times)
13)announced?thursday?the?end,of?cease?fire-47(Score=70?times)
14)announced?thursday?the?completion,of?cease?project-47(Score=70?times)
15)announced?thursday?the?end?of,its?annual-46(Score=55?times)
16)announced?thursday?the?end,of?cash-29(Score=50?times)
17)announced?on?thursday?the?end,of?the-24(Score=60?times)
18)e?announced?thursday?the,end?of-22(Score=50?times)
19)announced?on?thursday?the?termination,of?cease-21(Score=85?times)
20)e?announced?thursday?the?end,of?the-20(Score=45?times)
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el′,′jueves el fin de su cese′(2,hamas anunció este jueves el fin de su cese,3)-(998)
Got an overlap in source,checking target
1000-998
Overlap check for′hamas anunció este jueves el′,′jueves el fin de su cese′took 1.348
***hamas anunció este jueves el(1000),(998)jueves el fin de su cese=hamas anunció este jueves el fin de su cese
@@@1512->0
The overlapping result of hamas anunció este jueves el fin de su cese
1)′hamas announced thursday the end,of cease′-110(Repeated 2 times)(null)
2)′hamas announced thursday the termination,of cease′-110(Repeated 3 times)(null)
3)′hamas announced thursday the completion,of cease′-110(Repeated 2 times)(null)
4)′hamas announced on thursday the termination,of cease′-105(Repeated 3 times)(null)
5)′hamas announced thursday the end,of the cease′-105(Repeated 3 times)(null)
6)′hamas announced on thursday the completion,of cease′-105(Repeated 2 times)(null)
7)′hamas announced on thursday the end,of cease′-105(Repeated 2 times)(null)
8)′hamas announced thursday the termination,of cease and′-105(Repeated 2 times)(null)
9)′hamas announced on thursday completion,of cease′-100(Repeated 2 times)(null)
10)′hamas announced on thursday the end,of the cease′-100(Repeated 3 times)(null)
11)′hamas announced on thursday the termination,of cease and′-100(Repeated 2 times)(null)
12)′hamas announced thursday the end of,its unilateral cease′-95(Repeated 2 times)(null)
13)′hamas announced on thursday the end,of its unilateral cease′-90(Repeated 2 times)(null)
14)′hamas announced on thursday the end of,its unilateral cease′-90(Repeated 2 times)(null)
15)′hamas announced thursday the end,of cease fire′-90(Repeated 1 times)(null)
16)′announced thursday the termination,of cease′-90(Repeated 141 times)(null)
17)′hamas announced thursday the completion,of cease project′-90(Repeated 1 times)(null)
18)′hamas announced thursday the successful completion,of cease′-90(Repeated 1 times)(null)
19)′hamas announced thursday the,end of′-90(Repeated 1 times)(null)
20)′announced thursday the completion,of cease′-90(Repeated 94 times)(null)
Sort according to multiplicity
1)announced?thursday?the,end?of-207(Score=70?times)
2)announced?thursday?the?end,of?the-188(Score=65?times)
3)announced thursday the termination,of cease-141(Score=90 times)
4)announced?thursday?the?end,of?the?cease-141(Score=85?times)
5)announced?thursday?the?end?of,its?unilateral?cease-94(Score=75?times)
6)announced?thursday?the?termination,of?cease?and-94(Score=85?times)
7)announced?thursday?the?completion,of?cease-94(Score=90?times)
8)announced?thursday?the?end,of?cease-94(Score=90?times)
9)announced?thursday?the?end,of?the?cease?fire-94(Score=65?times)
10)announced?thursday?the?end?of,its?unilateral?cease?fire-47(Score=55?times)
11)announced?thursday?the?termination,of?cease?and?desist-47(Score=65?times)
12)announced?thursday?the?end,of?the?cease?fire?which-47(Score=60?times)
13)announced?thursday?the?end,of?cease?fire-47(Score=70?times)
14)announced?thursday?the?completion,of?cease?project-47(Score=70?times)
15)announced thursday the end of,its annual-46(Score=55 times)
16)announced?on?thursday?the?end,of?the-24(Score=60?times)
17)e?announced?thursday?the,end?of-22(Score=50?times)
18)announced?thursday?the?end,of?cash-22(Score=50?times)
19)announced?on?thursday?the?termination,of?cease-21(Score=85?times)
20)e?announced?thursday?the?end,of?the-20(Score=45?times)
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves′,′jueves el fin de su cese′(2,null,3)-(998)
No good source overlap
Skipping?el?fin?de?su(2<1)
Skipping?el?fin?de?su?cese(2<2)
Skipping?el?fin?de?su?cese?del(2<2)
Skipping?fin?de?su?cese(2<2)
Skipping?fin?de?su?cese?del(2<2)
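The "Trying to overlap" entries above join two source word strings when the end of the first matches the start of the second, and report "No good source overlap" otherwise. A minimal Python sketch of such a join is given here. It assumes that the leading 2 in entries such as (2,null,3) is a minimum number of shared words. That reading, the function name `overlap_join`, and its parameters are illustrative assumptions, not the logged system's actual code.

```python
from typing import Optional


def overlap_join(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Join two word strings on their longest shared boundary.

    Returns the merged string when the last k words of `left` equal the
    first k words of `right` for some k >= min_overlap, otherwise None.
    min_overlap=2 is an assumed reading of the log, not a documented value.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    lw, rw = left.split(), right.split()
    # Try the longest possible overlap first, then shrink.
    for k in range(min(len(lw), len(rw)), min_overlap - 1, -1):
        if lw[-k:] == rw[:k]:
            return " ".join(lw + rw[k:])
    return None


if __name__ == "__main__":
    # Shares the two words "jueves el", so the strings are joined.
    print(overlap_join("hamas anunció este jueves el", "jueves el fin de su cese"))
    # Shares only "jueves", which is below the assumed minimum of two words.
    print(overlap_join("hamas anunció este jueves", "jueves el fin de su cese"))
```

Under this assumption the sketch reproduces the logged outcomes for the source side: the first pair merges into "hamas anunció este jueves el fin de su cese", while the second yields no overlap.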
fin de su cese del fuego was just translated and returned results
Number of results=999
Translation for fin de su cese del fuego took 1.246
going to try and overlap this piece with the hashmap
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el fin de′,′fin de su cese del fuego′(2,hamas anunció este jueves el fin de su cese del fuego,5)--(999)
Got an overlap in source,checking target
1500-999
Overlap check for′hamas anunció este jueves el fin de′,′fin de su cese del fuego′took 2.114
***hamas anunció este jueves el fin de(1500),(999)fin de su cese del fuego=hamas anunció este jueves el fin de su cese del fuego
###218->218
The overlapping result of hamas anunció este jueves el fin de su cese del fuego
1)hamas announced thursday the end of,its unilateral cease fire-115(Repeated 1 times)(hamas announced thursday the end,of its∷end of its unilateral cease fire)
2)hamas announced on thursday the end of,its unilateral cease fire-110(Repeated 1 times)(hamas announced on thursday the end,of its∷end of its unilateral cease fire)
3)thursday announced the end of,the cease fire-105(Repeated 20 times)(thursday announced,the end of the∷the end of the cease fire)
4)which thursday announced the end of,the cease fire-100(Repeated 4 times)(which thursday announced the end,of the∷the end of the cease fire)
5)on thursday announced the end of,the cease fire-100(Repeated 4 times)(on thursday announced,the end of the∷the end of the cease fire)
6)thursday announced the end of,the cease fire which-100(Repeated 15 times)(thursday announced the end,of the∷end of the cease fire which)
7)thursday announced the end of,its unilateral cease fire-95(Repeated 4 times)(thursday announced the end,of its∷end of its unilateral cease fire)
8)hamas announced thursday the end of,its unilateral cease-95(Repeated 2 times)(hamas announced thursday the end,of its∷end of its unilateral cease)
9)announced thursday the end of,its unilateral cease fire-95(Repeated 46 times)(announced thursday the end,of its∷end of its unilateral cease fire)
10)which thursday announced the end of,the cease fire which-95(Repeated 3 times)(which thursday announced the end,of the∷end of the cease fire which)
11)on thursday announced the end of,the cease fire which-95(Repeated 3 times)(on thursday announced the end,of the∷end of the cease fire which)
12)thursday announced the end of,his light-95(Repeated 6 times)(thursday announced the end,of his∷the end of his light)
13)which thursday announced the end of,its unilateral cease fire-90(Repeated 1 times)(which thursday announced the end,of its∷end of its unilateral cease fire)
14)on thursday announced the end of,its unilateral cease fire-90(Repeated 1 times)(on thursday announced the end,of its∷end of its unilateral cease fire)
15)on thursday announced the end of,his light-90(Repeated 2 times)(on thursday announced the end,of his∷the end of his light)
16)they announced thursday the end of,its unilateral cease fire-90(Repeated 1 times)(they announced thursday the end,of its∷end of its unilateral cease fire)
17)and announced thursday the end of,its unilateral cease fire-90(Repeated 1 times)(and announced thursday the end,of its∷end of its unilateral cease fire)
18)were announced thursday the end of,its unilateral cease fire-90(Repeated 1 times)(were announced thursday the end,of its∷end of its unilateral cease fire)
19)was announced thursday the end of,its unilateral cease fire-90(Repeated 1 times)(was announced thursday the end,of its∷end of its unilateral cease fire)
20)be announced thursday the end of,its unilateral cease fire-90(Repeated 1 times)(be announced thursday the end,of its∷end of its unilateral cease fire)
Sort according to multiplicity
1)announced?thursday?the?end?of,its?unilateral?cease-92(Score=75?times)
2)announced?thursday?the?end?of,its?unilateral?cease?fire-46(Score=95?times)
3)thursday?announced?the?end?of,the?fire-40(Score=85?times)
4)thursday?announced?the?end?of,the?cease-25(Score=85?times)
5)thursday?announced?the?end?of,the?cease?fire-20(Score=105?times)
6)thursday?announced?the?end?of,the?fire?and-15(Score=80?times)
7)thursday announced the end of,the unconditional cease fire-15(Score=85 times)
8)thursday announced the end of,the cease fire which-15(Score=100 times)
9)thursday announced the end of,a 14-month cease-10(Score=65 times)
10)thursday announced the end of,the unconditional cease fire that-10(Score=80 times)
11)thursday announced the end of,the fire his-10(Score=90 times)
12)thursday announced the end of,the cease fire which ended-10(Score=80 times)
13)thursday?announced?the?end?of,the?fire?and?his-10(Score=85?times)
14)announced?on?thursday?the?end?of,its?unilateral?cease-10(Score=70?times)
15)e?announced?thursday?the?end?of,its?unilateral?cease-10(Score=55?times)
16)thursday?announced?the?end?of,the?hearth-10(Score=85?times)
17)thursday?announced?the?end?of,its?unilateral?cease-8(Score=75?times)
18)on?thursday?announced?the?end?of,the?fire-8(Score=80?times)
19)officials?thursday?announced?the?end?of,the?fire-8(Score=65?times)
20)which?thursday?announced?the?end?of,the?fire-8(Score=80?times)
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el fin′,′fin de su cese del fuego′(2,null,5)--(999)
No good source overlap
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el′,′fin de su cese del fuego′(2,null,5)--(999)
No good source overlap
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el fin de su cese′,′fin de su cese del fuego′(2,hamas anunció este jueves el fin de su cese del fuego,5)-(999)
Got an overlap in source,checking target
1500-999
Overlap check for′hamas anunció este jueves el fin de su cese′,′fin de su cese del fuego′took 2.737
***hamas anunció este jueves el fin de su cese(1500),(999)fin de su cese del fuego=hamas anunció este jueves el fin de su cese del fuego
@@@3369->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego
1)′hamas announced thursday the end of,cease fire′-130(Repeated 1 times)(hamas announced thursday the end,of cease∷end of cease fire)
2)′hamas announced thursday the end of cease,fire the′-125(Repeated 2 times)(hamas announced thursday the end,of cease fire∷of cease fire the)
3)′hamas?announced?thursday?the?end?of?the,cease?fire′-125(Repeated?1?times)(hamas?announced?thursday?the?end,of?the?cease∷the?end?of?the?cease?fire)
4)′hamas?announced?thursday?the?end?of?cease,fire?it′-125(Repeated?2?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?it)
5)′hamas?announced?thursday?the?end?of?cease,fire?by′-125(Repeated?3?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?by)
6)′hamas?announced?thursday?the?end?of?cease,fire?in′-125(Repeated?3?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?in)
7)′hamas?announced?thursday?the?end?of?cease,fire?was′-125(Repeated?2?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?was)
8)′hamas?announced?on?thursday?the?end?of,cease?fire′-125(Repeated?1?times)(hamas?announced?on?thursday?the?end,of?cease∷end?of?cease?fire)
9)′hamas?announced?thursday?the?end?of?cease,fire?or′-125(Repeated?2?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?or)
10)′hamas?announced?thursday?the?end?of?cease,fire?and′-125(Repeated?1?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?and)
11)′hamas?announced?thursday?the?end?of?cease,fire?is′-125(Repeated?2?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?is)
12)′hamas?announced?thursday?the?end?of?cease,fire?for′-125(Repeated?1?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?for)
13)′hamas?announced?on?thursday?the?end?of?cease,fire?by′-120(Repeated?3?times)(hamas?announced?on?thursday?the?end,of?cease?fire∷of?cease?fire?by)
14)′hamas?announced?on?thursday?the?end?of?cease,fire?the′-120(Repeated?2times)(hamas?announced?on?thursday?the?end,of?cease?fire∷of?cease?fire?the)
15)′hamas?announced?thursday?the?end?of?cease,fire?by?the′-120(Repeated?1times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?by?the)
16)′hamas?announced?on?thursday?the?end?of?cease,fire?is′-120(Repeated?2?times)(hamas?announced?on?thursday?the?end,of?cease?fire∷of?cease?fire?is)
17)′hamas?announced?on?thursday?the?end?of?cease,fire?and′-120(Repeated?1times)(hamas?announced?on?thursday?the?end,of?cease?fire∷of?cease?fire?and)
18)′hamas?announced?thursday?the?end?of?cease,fire?in?the′-120(Repeated?1?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?in?the)
19)′hamas?announced?thursday?the?end?of?cease,fire?it?has′-120(Repeated?1?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?it?has)
20)′hamas?announced?on?thursday?the?end?of?cease,fire?in′-120(Repeated?3?times)(hamas?announced?on?thursday?the?end,of?cease?fire∷of?cease?fire?in)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?in-101(Score=105?times)
2)announced?thursday?the?end?of?cease,fire?by-101(Score=105?times)
3)announced?thursday?the?end?of?cease,fire?it-94(Score=105?times)
4)announced?thursday?the?end?of?cease,fire?or-94(Score=105?times)
5)announced?thursday?the?end?of?cease,fire?was-94(Score=105?times)
6)announced?thursday?the?end?of?the?cease,fire?at-74(Score=100?times)
7)announced?thursday?the?end?of?cease,fire?the-54(Score=105?times)
8)announced?thursday?the?end?of?cease,fire?is-54(Score=105?times)
9)announced?thursday?the?end?of?the?cease,fire?to-47(Score=100?times)
10)announced?thursday?the?end?of?cease,fire?and-47(Score=105?times)
11)announced?thursday?the?end?of,cease?fire-47(Score=110?times)
12)announced?thursday?the?end?of?cease,fire?in?the-47(Score=100?times)
13)announced?thursday?the?end?of?cease,fire?for-47(Score=105?times)
14)announced?thursday?the?end?of?the?cease,fire?which-47(Score=100?times)
15)announced?thursday?the?end?of?cease,fire?by?the-47(Score=100?times)
16)announced?thursday?the?end?of?cease,fire?was?the-47(Score=100?times)
17)announced?thursday?the?end?of?cease,fire?or?what-47(Score=100?times)
18)announced?thursday?the?end?of?the,cease?fire-47(Score=105?times)
19)announced?thursday?the?end?of?cease,fire?it?has-47(Score=100?times)
20)announced?thursday?the?end?of?its?unilateral,cease?fire-30(Score=95?times)
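The source-side overlap attempts logged above can be sketched as a suffix/prefix match on words. This is only an illustration under stated assumptions: the function name `merge_source_overlap` is hypothetical, and the reading of the leading `2` in `(2,null,5)` as a minimum overlap of two words is inferred from the log, not stated by it. The sketch returns the merged word string when the end of the left segment matches the start of the right segment, and `None` otherwise.

```python
from typing import Optional


def merge_source_overlap(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Merge two word strings on a shared suffix/prefix, or return None.

    Assumption (not confirmed by the source): an overlap is accepted only if
    at least `min_overlap` words are shared and the right segment contributes
    at least one new word.
    """
    left_words, right_words = left.split(), right.split()
    # Look for the longest k such that the last k words of `left`
    # equal the first k words of `right`.
    for k in range(min(len(left_words), len(right_words)), 0, -1):
        if left_words[-k:] == right_words[:k]:
            if k < min_overlap or k == len(right_words):
                return None  # too short, or nothing new to append
            return " ".join(left_words + right_words[k:])
    return None


if __name__ == "__main__":
    right = "fin de su cese del fuego"
    # Shares only "fin": reported in the log as "No good source overlap".
    print(merge_source_overlap("hamas anunció este jueves el fin", right))
    # Shares "fin de su cese": the log reports a merged source string.
    print(merge_source_overlap("hamas anunció este jueves el fin de su cese", right))
```

With these assumptions the sketch agrees with the three attempts logged above: the segments ending in "el fin" and "el" give no usable overlap, while the segment ending in "fin de su cese" merges into the full phrase.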
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves′,′fin?de?su?cese?del?fuego′(2,null,5)-(999)
No?good?source?overlap
Skipping?de?su?cese?del(2<1)
Skipping?de?su?cese?del?fuego(2<2)
Figure A0382572903001
de?su?cese?del?fuego?con?was?just?translated?and?returned?results
Number?of?results=1000
Translation?for?de?su?cese?del?fuego?con?took?1.176
going?to?try?and?overlap?this?piece?with?the?hashmap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de′,′de?su?cese?del?fuego?con′(2,null,6)--(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin′,′de?su?cese?del?fuego?con′(2,null,6)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el′,′de?su?cese?del?fuego?con′(2,null,6)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego′,′de?su?cese?del?fuego?con′(2,hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con,6)-(1000)
Got?an?overlap?in?source,checking?target
1500-1000
Overlap?check?for′hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego′,′de?su?cese?del?fuego?con′took?6.308
***hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego(1500),(1000)de?su?cese?del?fuego?con=hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con
###16233->16233
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con
1)hamas?announced?thursday?the?end?of?cease,fire?with?their-140(Repeated?4?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?their)
2)hamas?announced?thursday?the?end?of?cease,fire?with-135(Repeated?21?times)(hamas?announced?thursday?the?end?of,cease?fire∷of?cease?fire?with)
3)hamas?announced?on?thursday?the?end?of?cease,fire?with?their-135(Repeated?4times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?their)
4)announced?thursday?the?end?of?cease,fire?with?hamas-135(Repeated?94?times)(announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
5)hamas?announced?thursday?the?end?of?the?cease,fire?with?their-135(Repeated?4times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?with?their)
6)be?announced?thursday?the?end?of?cease,fire?with?hamas-130(Repeated?2?times)(be?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
7)hamas?announced?on?thursday?the?end?of?cease,fire?with-130(Repeated?21times)(hamas?announced?on?thursday?the?end?of,cease?fire∷of?cease?fire?with)
8)announced?thursday?the?end?of?cease,fire?with?hamas?and-130(Repeated?47times)(announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas?and)
9)and?announced?thursday?the?end?of?cease,fire?with?hamas-130(Repeated?4times)(and?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
10)announced?on?thursday?the?end?of?cease,fire?with?hamas-130(Repeated?12times)(announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
11)announced?thursday?the?end?of?the?cease,fire?with?hamas-130(Repeated?94times)(announced?thursday?the?end?of?the,cease?fire∷cease?fire?with?hamas)
12)hamas?announced?thursday?the?end?of?the?cease,fire?with-130(Repeated?21times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?with)
13)hamas?announced?thursday?the?end?of?cease,fire?with?the-130(Repeated?13times)(hamas?announced?thursday?the?end?of,cease?fire∷of?cease?fire?with?the)
14)hamas?announced?on?thursday?the?end?of?the?cease,fire?with?their-130(Repeated?4?times)(hamas?announced?on?thursday?the?end?of?the,cease?fire∷the?ceasefire?with?their)
15)they?announced?thursday?the?end?of?cease,fire?with?hamas-130(Repeated?2times)(they?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
16)were?announced?thursday?the?end?of?cease,fire?with?hamas-130(Repeated?2times)(were?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
17)hamas?announced?thursday?the?end?of?cease,fire?with?them-130(Repeated?1times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?them)
18)was?announced?thursday?the?end?of?cease,fire?with?hamas-130(Repeated?2times)(was?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
19)thursday?announced?the?end?of?the?cease?fire,with?hamas-130(Repeated?10times)(thursday?announced?the?end?of?the?cease,fire?with∷cease?fire?with?hamas)
20)hamas?announced?thursday?the?end?of?cease,fire?as-125(Repeated?3?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?as)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with-246(Score=115?times)
2)announced?thursday?the?end?of?the?cease,fire?with-186(Score=110?times)
3)announced?thursday?the?end?of?cease,fire?with?hamas-94(Score=135?times)
4)announced?thursday?the?end?of?cease,fire?with?the-94(Score=110?times)
5)announced?thursday?the?end?of?the?cease,fire?with?hamas-94(Score=130?times)
6)announced?thursday?the?end?of?its?unilateral?cease,fire?with-86(Score=100times)
7)announced?thursday?the?end?of?the?cease,fire?with?the-74(Score=105?times)
8)announced?thursday?the?end?of?cease,fire?with?their-64(Score=120?times)
9)announced?thursday?the?end?of?its?unilateral?cease,fire?with?hamas-60(Score=120?times)
10)announced?thursday?the?end?of?the?cease,fire?with?their-53(Score=115?times)
11)announced?thursday?the?end?of?the?cease,fire?a-51(Score=100?times)
12)announced?on?thursday?the?end?of?cease,fire?with-51(Score=110?times)
13)announced?thursday?the?end?of?cease,fire?a-49(Score=105?times)
14)announced?on?thursday?the?end?of?the?cease,fire?with-47(Score=105?times)
15)announced?thursday?the?end?of?the?cease,fire?with?hamas?and-47(Score=125times)
16)announced?thursday?the?end?of?cease,fire?with?hamas?and-47(Score=130times)
17)announced?on?thursday?the?end?of?cease,fire?a-33(Score=100?times)
18)announced?on?thursday?the?end?of?the?cease,fire?a-32(Score=95?times)
19)hamas?announced?thursday?the?end?of?the?cease,fire?a-30(Score=120?times)
20)announced?thursday?the?end?of?its?unilateral?cease,fire?with?hamas?and-30(Score=115?times)
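The target-side check that follows "Got an overlap in source, checking target" can be pictured as a cross-comparison of the candidate translations of the left segment (1500 in the log) with those of the right segment (1000 in the log). The sketch below is an illustration, not the logged implementation: `join_on_overlap` and `cross_check` are hypothetical names, the two-word minimum is an assumption, and the idea that "Repeated N times" counts how many candidate pairs yield the same joined string is an assumption as well.

```python
from collections import Counter
from typing import List, Optional


def join_on_overlap(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Join two target-language word strings on a shared suffix/prefix."""
    left_words, right_words = left.split(), right.split()
    # Try the longest possible overlap first, down to `min_overlap` words.
    for k in range(min(len(left_words), len(right_words)), min_overlap - 1, -1):
        if left_words[-k:] == right_words[:k]:
            return " ".join(left_words + right_words[k:])
    return None


def cross_check(left_candidates: List[str], right_candidates: List[str],
                min_overlap: int = 2) -> Counter:
    """Count how many (left, right) candidate pairs produce each joined string.

    Assumption: this count plays the role of "Repeated N times" in the log.
    """
    repeats: Counter = Counter()
    for left in left_candidates:
        for right in right_candidates:
            joined = join_on_overlap(left, right, min_overlap)
            if joined is not None:
                repeats[joined] += 1
    return repeats


if __name__ == "__main__":
    left = ["hamas announced thursday the end of cease fire",
            "hamas announced thursday the end of the cease fire"]
    right = ["cease fire with israel",
             "of cease fire with israel",
             "the cease fire with israel"]
    for text, count in cross_check(left, right).most_common():
        print(count, text)
```

A joined string that several candidate pairs agree on gets a higher count, which is one plausible reason the log reports a repeat count next to each overlap result.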
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese′,′de?su?cese?del?fuegocon′(2,hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con,6)--(1000)Got?an?overlap?in?source,checking?target
1500-1000
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese′,′de?su?cese?del?fuego?con′took?3.087
***hamas?anuncióeste?jueves?el?fin?de?su?cese(1500),(1000)de?su?cese?delfuego?con=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con
@@@17704->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con
1)′hamas?announced?thursday?the?end?of?cease,fire?with?their′-140(Repeated?4times)(null)
2)′hamas?announced?thursday?the?end?of?cease,fire?with′-135(Repeated?21?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?with)
3)′hamas?announced?on?thursday?the?end?of?cease,fire?with?their′-135(Repeated?4times)(null)
4)′announced?thursday?the?end?of?cease,fire?with?hamas′-135(Repeated?94?times)(null)
5)′hamas?announced?thursday?the?end?of?the?cease,fire?with?their′-135(Repeated4?times)(null)
6)′be?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?2times)(null)
7)′hamas?announced?on?thursday?the?end?of?cease,fire?with′-130(Repeated?21times)(hamas?announced?on?thursday?the?end,of?cease?fire∷of?cease?fire?with)
8)′announced?thursday?the?end?of?cease,fire?with?hamas?and′-130(Repeated?47times)(null)
9)′and?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?4times)(null)
10)′announced?on?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?12times)(null)
11)′announced?thursday?the?end?of?the?cease,fire?with?hamas′-130(Repeated?94times)(null)
12)′hamas?announced?thursday?the?end?of?the?cease,fire?with′-130(Repeated?21times)(null)
13)′hamas?announced?thursday?the?end?of?cease,fire?with?the′-130(Repeated?13times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?with?the)
14)′hamas?announced?on?thursday?the?end?of?the?cease,fire?with?their′-130(Repeated?4?times)(null)
15)′they?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?2times)(null)
16)′were?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?2times)(null)
17)′hamas?announced?thursday?the?end?of?cease,fire?with?them′-130(Repeated?1times)(null)
18)′was?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?2times)(null)
19)′thursday?announced?the?end?of?the?cease?fire,with?hamas′-130(Repeated?10times)(null)
20)′hamas?announced?thursday?the?end?of?cease,fire?as′-125(Repeated?3?times)(null)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with-229(Score=115?times)
2)announced?thursday?the?end?of?the?cease,fire?with-172(Score=110?times)
3)announced?thursday?the?end?of?cease,fire?with?hamas-94(Score=135?times)
4)announced?thursday?the?end?of?the?cease,fire?with?hamas-94(Score=130?times)
5)announced?thursday?the?end?of?cease,fire?with?the-83(Score=110?times)
6)announced?thursday?the?end?of?its?unilateral?cease,fire?with-80(Score=100times)
7)announced?thursday?the?end?of?the?cease,fire?with?the-66(Score=105?times)
8)announced?thursday?the?end?of?cease,fire?with?their-62(Score=120?times)
9)announced?thursday?the?end?of?its?unilateral?cease,fire?with?hamas-58(Score=120?times)
10)announced?thursday?the?end?of?cease,fire?a-49(Score=105?times)
11)announced?on?thursday?the?end?of?cease,fire?with-49(Score=110?times)
12)announced?thursday?the?end?of?the?cease,fire?a-47(Score=100?times)
13)announced?on?thursday?the?end?of?the?cease,fire?with-47(Score=105?times)
14)announced?thursday?the?end?of?the?cease,fire?with?hamas?and-47(Score=125?times)
15)announced?thursday?the?end?of?cease,fire?with?hamas?and-47(Score=130times)
16)announced?thursday?the?end?of?the?cease,fire?with?their-45(Score=115?times)
17)announced?on?thursday?the?end?of?cease,fire?a-33(Score=100?times)
18)announced?on?thursday?the?end?of?the?cease,fire?a-32(Score=95?times)
19)hamas?announced?thursday?the?end?of?the?cease,fire?a-30(Score=120?times)
20)hamas?announced?on?thursday?the?end?of?the?cease,fire?a-29(Score=115times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves′,′de?su?cese?del?fuego?con′(2,null,6)-(1000)
No?good?source?overlap
Skipping?su?cese?del?fuego(2<2)
Figure A0382572903042
su?cese?del?fuego?con?was?just?translated?and?returned?results
Number?of?results=1000
Translation?for?su?cese?del?fuego?con?took?0.949
going?to?try?and?overlap?this?piece?with?the?hashmap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de′,′su?cese?del?fuego?con′(2,null,7)--(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin′,′su?cese?del?fuego?con′(2,null,7)--(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap?′hamas?anuncióeste?jueves?el′,′su?cese?del?fuego?con′(2,null,7)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′su?cese?delfuego?con′(2,hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con,7)--(1000)Got?an?overlap?in?source,checking?target
1500-1000
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′su?cese?delfuego?con′took?7.002
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego(1500),(1000)su?cesedel?fuego?con=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con
@@@19781->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con
1)′hamas?announced?thursday?the?end?of?cease,fire?with?their′-140(Repeated?4times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?their)
2)′hamas?announced?thursday?the?end?of?cease,fire?with′-135(Repeated?21?times)(hamas?announced?thursday?the?end?of,cease?fire∷of?cease?fire?with)
3)′hamas?announced?on?thursday?the?end?of?cease,fire?with?their′-135(Repeated?4times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?their)
4)′hamas?announced?thursday?the?end?of?cease,fire?his′-135(Repeated?3?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?his)
5)′announced?thursday?the?end?of?cease,fire?with?hamas′-135(Repeated?94?times)(announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
6)′hamas?announced?thursday?the?end?of?the?cease,fire?with?their′-135(Repeated4?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?with?their)
7)′be?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?2times)(be?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
8)′hamas?announced?on?thursday?the?end?of?cease,fire?with′-130(Repeated?21times)(hamas?announced?on?thursday?the?end?of,cease?fire∷of?cease?fire?with)
9)′announced?thursday?the?end?of?cease,fire?with?hamas?and′-130(Repeated?47times)(announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas?and)
10)′and?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?4times)(and?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
11)′hamas?announced?thursday?the?end?of?cease?fire,in?their′-130(Repeated?3times)(hamas?announced?thursday?the?end?of?cease,fire?in∷cease?fire?in?their)
12)′hamas?announced?thursday?the?end?of?cease,fire?to?his′-130(Repeated?2?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?to?his)
13)′announced?on?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?12times)(announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
14)′announced?thursday?the?end?of?the?cease,fire?with?hamas′-130(Repeated?94times)(announced?thursday?the?end?of?the,cease?fire∷cease?fire?with?hamas)
15)′hamas?announced?thursday?the?end?of?cease,fire?had?his′-130(Repeated?2times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?had?his)
16)′hamas?announced?thursday?the?end?of?the?cease,fire?with′-130(Repeated?21times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?with)
17)′hamas?announced?thursday?the?end?of?cease,fire?on?their′-130(Repeated?2times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?on?their)
18)′hamas?announced?thursday?the?end?of?cease?fire,for?their′-130(Repeated?2times)(hamas?announced?thursday?the?end?of?cease,fire?for∷cease?fire?for?their)
19)′hamas?announced?thursday?the?end?of?cease,fire?with?the′-130(Repeated?13times)(hamas?announced?thursday?the?end?of,cease?fire∷of?cease?fire?with?the)
20)′hamas?announced?thursday?the?end?of?cease?fire,in?his′-130(Repeated?2?times)(hamas?announced?thursday?the?end?of?cease,fire?in∷cease?fire?in?his)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with-178(Score=115?times)
2)announced?thursday?the?end?of?the?cease,fire?with-136(Score=110?times)
3)announced?thursday?the?end?of?the?cease,fire?with?hamas-94(Score=130?times)
4)announced?thursday?the?end?of?cease,fire?with?hamas-94(Score=135?times)
5)announced?thursday?the?end?of?cease,fire?with?the-72(Score=110?times)
6)announced?thursday?the?end?of?cease,fire?with?their-51(Score=120?times)
7)announced?thursday?the?end?of?the?cease,fire?a-50(Score=100?times)
8)announced?thursday?the?end?of?cease,fire?a-48(Score=105?times)
9)announced?thursday?the?end?of?cease,fire?with?hamas?and-47(Score=130times)
10)announced?thursday?the?end?of?the?cease,fire?with?hamas?and-47(Score=125times)
11)hamas?announced?thursday?the?end?of?the?cease,fire?a-47(Score=120?times)
12)announced?on?thursday?the?end?of?cease,fire?with-47(Score=110?times)
13)announced?thursday?the?end?of?its?unilateral?cease,fire?with-45(Score=100times)
14)announced?on?thursday?the?end?of?the?cease,fire?with-39(Score=105?times)
15)announced?thursday?the?end?of?its?unilateral?cease,fire?with?hamas-36(Score=120?times)
16)announced?on?thursday?the?end?of?cease,fire?a-30(Score=100?times)
17)announced?thursday?the?end?of?the?cease,fire?with?the-30(Score=105?times)
18)hamas?announced?thursday?the?end?of?cease,fire?a-29(Score=125?times)
19)hamas?announced?on?thursday?the?end?of?cease,fire?a-27(Score=120?times)
20)hamas?announced?on?thursday?the?end?of?the?cease,fire?a-26(Score=115times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con′,′su?cesedel?fuego?con′(2,null,7)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anunció?este?jueves?el?fin?de?su?cese′,′su?cese?del?fuego?con′(2,hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con,7)-(1000)
Got?an?overlap?in?source,checking?target
1500-1000
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese′,′su?cese?del?fuego?con′took?2.612
***hamas?anuncióeste?jueves?el?fin?de?su?cese(1500),(1000)su?cese?del?fuegocon=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con
@@@2475->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con
1)′hamas?announced?thursday?the?end?of?cease,fire?with?their′-140(Repeated?4times)(null)
2)′hamas?announced?thursday?the?end?of?cease,fire?with′-135(Repeated?21?times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?with)
3)′hamas?announced?on?thursday?the?end?of?cease,fire?with?their′-135(Repeated?4times)(null)
4)′hamas?announced?thursday?the?end?of?cease,fire?his′-135(Repeated?3?times)(null)
5)′announced?thursday?the?end?of?cease,fire?with?hamas′-135(Repeated?94?times)(null)
6)′hamas?announced?thursday?the?end?of?the?cease,fire?with?their′-135(Repeated4?times)(null)
7)′be?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?2times)(null)
8)′hamas?announced?on?thursday?the?end?of?cease,fire?with′-130(Repeated?21times)(hamas?announced?on?thursday?the?end,of?cease?fire∷of?cease?fire?with)
9)′announced?thursday?the?end?of?cease,fire?with?hamas?and′-130(Repeated?47times)(null)
10)′and?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?4times)(null)
11)′hamas?announced?thursday?the?end?of?cease?fire,in?their′-130(Repeated?3times)(null)
12)′hamas?announced?thursday?the?end?of?cease,fire?to?his′-130(Repeated?2?times)(null)
13)′announced?on?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?12?times)(null)
14)′announced?thursday?the?end?of?the?cease,fire?with?hamas′-130(Repeated?94times)(null)
15)′hamas?announced?thursday?the?end?of?cease,fire?had?his′-130(Repeated?2times)(null)
16)′hamas?announced?thursday?the?end?of?the?cease,fire?with′-130(Repeated?21times)(null)
17)′hamas?announced?thursday?the?end?of?cease,fire?on?their′-130(Repeated?2times)(null)
18)′hamas?announced?thursday?the?end?of?cease?fire,for?their′-130(Repeated?2times)(null)
19)′hamas?announced?thursday?the?end?of?cease,fire?with?the′-130(Repeated?13times)(hamas?announced?thursday?the?end,of?cease?fire∷of?cease?fire?with?the)
20)′hamas?announced?thursday?the?end?of?cease?fire,in?his′-130(Repeated?2?times)(null)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with-178(Score=115?times)
2)announced?thursday?the?end?of?the?cease,fire?with-136(Score=110?times)
3)announced?thursday?the?end?of?cease,fire?with?hamas-94(Score=135?times)
4)announced?thursday?the?end?of?the?cease,fire?with?hamas-94(Score=130?times)
5)announced?thursday?the?end?of?cease,fire?with?the-72(Score=110?times)
6)announced?thursday?the?end?of?cease,fire?with?their-51(Score=120?times)
7)announced?thursday?the?end?of?the?cease,fire?a-50(Score=100?times)
8)announced?thursday?the?end?of?cease,fire?a-48(Score=105?times)
9)announced?on?thursday?the?end?of?cease,fire?with-47(Score=110?times)
10)hamas?announced?thursday?the?end?of?the?cease,fire?a-47(Score=120?times)
11)announced?thursday?the?end?of?the?cease,fire?with?hamas?and-47(Score=125times)
12)announced?thursday?the?end?of?cease,fire?with?hamas?and-47(Score=130times)
13)announced?thursday?the?end?of?its?unilateral?cease,fire?with-45(Score=100times)
14)announced?on?thursday?the?end?of?the?cease,fire?with-39(Score=105?times)
15)announced?thursday?the?end?of?its?unilateral?cease,fire?with?hamas-36(Score=120?times)
16)announced?thursday?the?end?of?the?cease,fire?with?the-30(Score=105?times)
17)announced?on?thursday?the?end?of?cease,fire?a-30(Score=100?times)
18)hamas?announced?thursday?the?end?of?cease,fire?a-29(Score=125?times)
19)hamas?announced?on?thursday?the?end?of?cease,fire?a-27(Score=120?times)
20)hamas?announced?on?thursday?the?end?of?the?cease,fire?a-26(Score=115times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves′,′su?cese?del?fuego?con′(2,null,7)-(1000)
No?good?source?overlap
Figure A0382572903082
su?cese?del?fuego?con?israel?was?just?translated?and?returned?results
Number?of?results=631
Translation?for?su?cese?del?fuego?con?israel?took?1.12
going?to?try?and?overlap?this?piece?with?the?hashmap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de′,′su?cese?del?fuego?con?israel′(2,null,7)-(631)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin′,′su?cese?del?fuego?con?israel′(2,null,7)-(631)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el′,′su?cese?del?fuego?con?israel′(2,null,7)-(631)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′su?cese?delfuego?con?israel′(2,hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con?israel,7)-(631)
Got?an?overlap?in?source,checking?target
1500-631
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′su?cese?delfuego?con?israel′took?7.102
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego(1500),(631)su?cesedel?fuego?con?israel=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?conisrael
###14957->14957
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con israel
1)hamas?announced?thursday?the?end?of?cease,fire?with?israel-155(Repeated?30times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel)
2)hamas?announced?thursday?the?end?of?cease,fire?israel-150(Repeated?10?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?israel)
3)hamas?announced?on?thursday?the?end?of?cease,fire?with?israel-150(Repeated?26times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?israel)
4)hamas?announced?thursday?the?end?of?cease,fire?with?israel?was-150(Repeated?1?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel?was)
5)hamas?announced?thursday?the?end?of?cease?fire,by?israel?with-150(Repeated?3times)(hamas?announced?thursday?the?end?of?cease,fire?by∷cease?fire?by?israel?with)
6)hamas?announced?thursday?the?end?of?cease,fire?with?israel?and-150(Repeated?12?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel?and)
7)hamas?announced?thursday?the?end?of?the?cease,fire?with?israel-150(Repeated27?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?withisrael)
8)hamas?announced?thursday?the?end?of?cease,fire?with?israel?the-150(Repeated?3times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel?the)
9)hamas?announced?thursday?the?end?of?cease?fire,by?israel-145(Repeated?4times)(hamas?announced?thursday?the?end?of?cease,fire?by∷cease?fire?by?israel)
10)hamas?announced?thursday?the?end?of?the?cease,fire?with?israel?the-145(Repeated?3?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷cease?fire?withisrael?the)
11)hamas?announced?thursday?the?end?of?cease,fire?israel?is-145(Repeated?5times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?israel?is)
12)hamas?announced?thursday?the?end?of?the?cease,fire?with?israel?and-145(Repeated?9?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?firewith?israel?and)
13)hamas?announced?on?thursday?the?end?of?cease,fire?with?israel?the-145(Repeated?2?times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?withisrael?the)
14)hamas?announced?thursday?the?end?of?cease?fire,and?israel-145(Repeated?5times)(hamas?announced?thursday?the?end?of?cease,fire?and∷cease?fire?and?israel)
15)hamas?announced?on?thursday?the?end?of?the?cease,fire?with?israel-145(Repeated?20?times)(hamas?announced?on?thursday?the?end?of?the,cease?fire∷the?ceasefire?with?israel)
16)hamas?announced?on?thursday?the?end?of?cease,fire?with?israel?and-145(Repeated?9?times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?withisrael?and)
17)hamas?announced?on?thursday?the?end?of?cease,fire?israel-145(Repeated?7times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?israel)
18)hamas?announced?thursday?the?end?of?the?cease,fire?by?israel?with-145(Repeated?3?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷cease?fire?byisrael?with)
19)hamas?announced?on?thursday?the?end?of?cease?fire,by?israel?with-145(Repeated?3?times)(hamas?announced?on?thursday?the?end?of?cease,fire?by∷cease?fire?byisrael?with)
20)hamas?announced?thursday?the?end?of?the?cease,fire?with?israel?was-145(Repeated?1?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷cease?fire?withisrael?was)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with?israel-279(Score=135?times)
2)announced?thursday?the?end?of?the?cease,fire?with?israel-209(Score=130?times)
3)announced?thursday?the?end?of?cease,fire?israel-113(Score=130?times)
4)announced?thursday?the?end?of?cease?fire,by?israel-91(Score=125?times)
5)announced?thursday?the?end?of?cease,fire?with?israel?and-85(Score=130?times)
6)announced?on?thursday?the?end?of?cease,fire?with?israel-65(Score=130?times)
7)announced?thursday?the?end?of?the?cease,fire?by?israel-53(Score=120?times)
8)announced?thursday?the?end?of?cease,fire?with?israel?the-53(Score=130?times)
9)announced?thursday?the?end?of?cease?fire,by?israel?with-52(Score=130?times)
10)announced?thursday?the?end?of?cease?fire,and?israel-50(Score=125?times)
11)announced?thursday?the?end?of?cease,fire?israel?is-50(Score=125?times)
12)announced?thursday?the?end?of?the?cease,fire?israel-49(Score=125?times)
13)announced?thursday?the?end?of?cease,fire?with?israel?was-47(Score=130times)
14)announced?thursday?the?end?of?the?cease,fire?with?israel?and-46(Score=125times)
15)announced?thursday?the?end?of?the?cease,fire?by?israel?with-46(Score=125times)
16)announced?thursday?the?end?of?the?cease,fire?with?israel?the-43(Score=125times)
17)announced?thursday?the?end?of?its?unilateral?cease,fire?with?israel-43(Score=120?times)
18)e?announced?thursday?the?end?of?cease,fire?with?israel-39(Score=115?times)
19)announced?on?thursday?the?end?of?the?cease,fire?with?israel-38(Score=125times)
20)announced?thursday?the?end?of?the?cease,fire?with?israel?was-37(Score=125times)
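The "Sort according to multiplicity" listings can be reproduced from their own lines with a small parser. This is a sketch under assumptions: the number after the final hyphen is read as the multiplicity and the parenthesised value as the score, which matches the descending order seen above but is not spelled out in the log. The names `Candidate`, `parse_line` and `sort_by_multiplicity` are hypothetical. The "?" characters that stand in for spaces in this listing are converted back to spaces before parsing.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches lines such as: 3)announced thursday ... israel-113(Score=130 times)
LINE = re.compile(
    r"^\d+\)(?P<text>.*)-(?P<mult>\d+)\(Score=(?P<score>\d+)\s*times\)$"
)


class Candidate(NamedTuple):
    text: str
    multiplicity: int
    score: int


def parse_line(line: str) -> Candidate:
    """Parse one listing line; '?' is treated as a stand-in for a space."""
    cleaned = line.replace("?", " ").strip()
    match = LINE.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognised listing line: {line!r}")
    return Candidate(match["text"].strip(), int(match["mult"]), int(match["score"]))


def sort_by_multiplicity(lines: Iterable[str]) -> List[Candidate]:
    """Order candidates by descending multiplicity; ties keep input order."""
    return sorted((parse_line(l) for l in lines),
                  key=lambda c: c.multiplicity, reverse=True)


if __name__ == "__main__":
    listing = [
        "3)announced?thursday?the?end?of?cease,fire?israel-113(Score=130?times)",
        "1)announced?thursday?the?end?of?cease,fire?with?israel-279(Score=135?times)",
        "13)announced?thursday?the?end?of?cease,fire?with?israel?was-47(Score=130times)",
    ]
    for cand in sort_by_multiplicity(listing):
        print(cand.multiplicity, cand.score, cand.text)
```

Because the sort is stable, candidates with equal multiplicity keep their input order, which is consistent with the tied entries above not being ordered by score.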
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con′,′su?cesedel?fuego?con?israel′(2,hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?conisrael,7)-(631)
Got?an?overlap?in?source,checking?target
1500-631
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con′,′su?cese?delfuego?con?israel′took?3.371
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con(1500),(631)sucese?del?fuego?con?israel=hamas?anuncióeste?jueves?el?fin?de?su?cese?delfuego?con?israel
@@@16056->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con israel
1)′hamas?announced?thursday?the?end?of?cease?fire,with?israel′-155(Repeated?1times)(hamas?announced?thursday?the?end?of?cease,fire?with∷cease?fire?with?israel)
2)′hamas?announced?thursday?the?end?of?cease,fire?with?israel′-155(Repeated?27times)(null)
3)′hamas?announced?on?thursday?the?end?of?cease?fire,with?israel′-150(Repeated1?times)(hamas?announced?on?thursday?the?end?of?cease,fire?with∷cease?fire?with?israel)
4)′hamas?announced?thursday?the?end?of?cease,fire?israel′-150(Repeated?8?times)(null)
5)′hamas?announced?on?thursday?the?end?of?cease,fire?with?israel′-150(Repeated22?times)(null)
6)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?was′-150(Repeated1?times)(null)
7)′hamas?announced?thursday?the?end?of?cease?fire,by?israel?with′-150(Repeated3?times)(hamas?announced?thursday?the?end?of?cease,fire?by∷cease?fire?by?israel?with)
8)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?and′-150(Repeated9?times)(null)
9)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?and′-150(Repeated10?times)(hamas?announced?thursday?the?end?of?cease,fire?with?israel∷cease?fire?withisrael?and)
10)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?was′-150(Repeated?1?times)(hamas?announced?thursday?the?end?of?cease,fire?with?israel∷ceasefire?with?israel?was)
11)′hamas?announced?thursday?the?end?of?the?cease,fire?with?israel′-150(Repeated?23?times)(null)
12)′hamas?announced?thursday?the?end?of?the?cease?fire,with?israel′-150(Repeated?1?times)(hamas?announced?thursday?the?end?of?the?cease,fire?with∷the?ceasefire?with?israel)
13)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?the′-150(Repeated?3?times)(hamas?announced?thursday?the?end?of?cease,fire?with?israel∷ceasefire?with?israel?the)
14)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?the′-150(Repeated?3?times)(null)
15)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?the′-145(Repeated?2?times)(hamas?announced?thursday?the?end?of?the?cease,fire?withisrael∷cease?fire?with?israel?the)
16)′hamas?announced?thursday?the?end?of?the?cease?fire,by?israel?with′-145(Repeated?2?times)(hamas?announced?thursday?the?end?of?the?cease,fire?by∷cease?fireby?israel?with)
17)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?was′-145(Repeated?1?times)(hamas?announced?on?thursday?the?end?of?cease,fire?withisrael∷cease?fire?with?israel?was)
18)′hamas?announced?on?thursday?the?end?of?the?cease?fire,with?israel′-145(Repeated?1?times)(hamas?announced?on?thursday?the?end?of?the?cease,fire?with∷thecease?fire?with?israel)
19)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?was′-145(Repeated?1?times)(hamas?announced?thursday?the?end?of?the?cease,fire?withisrael∷cease?fire?with?israel?was)
20)′hamas?announced?thursday?the?end?of?cease?fire,by?israel′-145(Repeated?4times)(hamas?announced?thursday?the?end?of?cease,fire?by∷cease?fire?by?israel)
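The listing above pairs each combined translation with the two target-language pieces it was built from (shown as left∷right) and a "Repeated n times" figure. One way such a figure could arise is sketched below: every left candidate is joined with every right candidate that overlaps it, and identical joined strings are tallied. This is only an illustrative sketch, not the method of the log itself. The names `join_words` and `count_joined`, the two-word minimum overlap, and the second right-hand candidate (`fire with israel`) are assumptions introduced for the example, and the comma notation of the log is simplified to plain words.

```python
from collections import Counter
from itertools import product


def join_words(left, right, min_overlap=2):
    """Join two word strings where the end of `left` repeats at the start of `right`.

    Returns None when fewer than `min_overlap` words are shared or when
    `right` would add no new word. (Hypothetical helper.)
    """
    l, r = left.split(), right.split()
    # Longest shared run first; k < len(r) so the right piece extends the left.
    for k in range(min(len(l), len(r) - 1), min_overlap - 1, -1):
        if l[-k:] == r[:k]:
            return " ".join(l + r[k:])
    return None


def count_joined(left_candidates, right_candidates):
    """Tally how often each joined string is produced by some (left, right) pair."""
    counts = Counter()
    for left, right in product(left_candidates, right_candidates):
        joined = join_words(left, right)
        if joined is not None:
            counts[joined] += 1
    return counts


# Left pieces taken from the listing; the second right piece is made up.
lefts = [
    "hamas announced thursday the end of cease fire with",
    "hamas announced on thursday the end of cease fire with",
]
rights = ["cease fire with israel", "fire with israel"]

counts = count_joined(lefts, rights)
```

Here each left piece joins with both right pieces to give the same sentence, so each joined string is counted twice.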
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with?israel-253(Score=135?times)
2)announced?thursday?the?end?of?the?cease,fire?with?israel-129(Score=130?times)
3)announced?thursday?the?end?of?cease,fire?israel-82(Score=130?times)
4)announced?thursday?the?end?of?cease,fire?with?israel?and-68(Score=130?times)
5)announced?thursday?the?end?of?cease?fire,by?israel-66(Score=125?times)
6)announced?thursday?the?end?of?cease?fire,with?israel-66(Score=135?times)
7)announced?on?thursday?the?end?of?cease,fire?with?israel-51(Score=130?times)
8)announced?thursday?the?end?of?cease?fire,by?israel?with-50(Score=130?times)
9)announced?thursday?the?end?of?cease,fire?with?israel?the-50(Score=130?times)
10)announced?thursday?the?end?of?cease,fire?with?israel?was-47(Score=130times)
11)announced?thursday?the?end?of?its?unilateral?cease,fire?with?israel-43(Score=120?times)
12)announced?on?thursday?the?end?of?the?cease,fire?with?israel-35(Score=125times)
13)announced?thursday?the?end?of?the?cease,fire?by?israel-33(Score=120?times)
14)announced?thursday?the?end?of?the?cease?fire,with?israel-32(Score=130?times)
15)e?announced?thursday?the?end?of?cease,fire?with?israel-31(Score=115?times)
16)announced?thursday?the?end?of?the?cease,fire?israel-30(Score=125?times)
17)announced?thursday?the?end?of?the?cease,fire?with?israel?and-29(Score=125times)
18)hamas?announced?thursday?the?end?of?cease,fire?with?israel-27(Score=155times)
19)announced?on?thursday?the?end?of?its?unilateral?cease,fire?with?israel-26(Score=115?times)
20)announced?thursday?the?end?of?the?cease,fire?by?israel?with-26(Score=125times)
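The list above is ordered by the number that follows each candidate, not by its score: the entry scored 155 appears in 18th place because its count is only 27. A minimal sketch of that ordering follows, assuming the number after the hyphen is the multiplicity and the number in parentheses is the score. The `Candidate` type and `sort_by_multiplicity` are hypothetical names; the four rows are copied from the list.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str
    multiplicity: int  # assumed meaning of the number after the hyphen
    score: int         # assumed meaning of "Score=..."


def sort_by_multiplicity(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by descending multiplicity, ignoring the score."""
    # sorted() is stable, so ties keep their input order.
    return sorted(candidates, key=lambda c: c.multiplicity, reverse=True)


rows = [
    Candidate("hamas announced thursday the end of cease,fire with israel", 27, 155),
    Candidate("announced thursday the end of cease,fire israel", 82, 130),
    Candidate("announced thursday the end of cease,fire with israel", 253, 135),
    Candidate("announced thursday the end of the cease,fire with israel", 129, 130),
]

ranked = sort_by_multiplicity(rows)
```

The log does not show how ties are broken (two entries with count 66 appear in either order in different passes), so the sketch leaves ties in input order.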
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese′,′su?cese?del?fuego?conisrael′(2,hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con?israel,7)-(631)
Got?an?overlap?in?source,checking?target
1500-631
Overlap?check?for′hamas?anunció?este?jueves?el?fin?de?su?cese′,′su?cese?del?fuego?con?israel′took?2.783
***hamas?anuncióeste?jueves?el?fin?de?su?cese(1500),(631)su?cese?del?fuegocon?israel=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con?israel
@@@1575->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con israel
1)′hamas?announced?thursday?the?end?of?cease?fire,with?israel′-155(Repeated?1times)(null)
2)′hamas?announced?thursday?the?end?of?cease,fire?with?israel′-155(Repeated?27times)(null)
3)′hamas?announced?on?thursday?the?end?of?cease?fire,with?israel′-150(Repeated1?times)(null)
4)′hamas?announced?thursday?the?end?of?cease,fire?israel′-150(Repeated?8?times)(null)
5)′hamas?announced?on?thursday?the?end?of?cease,fire?with?israel′-150(Repeated22?times)(null)
6)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?was′-150(Repeated1?times)(null)
7)′hamas?announced?thursday?the?end?of?cease?fire,by?israel?with′-150(Repeated3?times)(null)
8)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?and′-150(Repeated9?times)(null)
9)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?and′-150(Repeated9?times)(null)
10)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?was′-150(Repeated?1?times)(null)
11)′hamas?announced?thursday?the?end?of?the?cease,fire?with?israel′-150(Repeated?23?times)(null)
12)′hamas?announced?thursday?the?end?of?the?cease?fire,with?israel′-150(Repeated?1?times)(null)
13)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?the′-150(Repeated?3?times)(null)
14)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?the′-150(Repeated?3?times)(null)
15)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?the′-145(Repeated?2?times)(null)
16)′hamas?announced?thursday?the?end?of?the?cease?fire,by?israel?with′-145(Repeated?2?times)(null)
17)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?was′-145(Repeated?1?times)(null)
18)′hamas?announced?on?thursday?the?end?of?the?cease?fire,with?israel′-145(Repeated?1?times)(null)
19)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?was′-145(Repeated?1?times)(null)
20)′hamas?announced?thursday?the?end?of?cease?fire,by?israel′-145(Repeated?4times)(null)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with?israel-252(Score=135?times)
2)announced?thursday?the?end?of?the?cease,fire?with?israel-126(Score=130?times)
3)announced?thursday?the?end?of?cease,fire?israel-81(Score=130?times)
4)announced?thursday?the?end?of?cease,fire?with?israel?and-67(Score=130?times)
5)announced?thursday?the?end?of?cease?fire,with?israel-66(Score=135?times)
6)announced?thursday?the?end?of?cease?fire,by?israel-66(Score=125?times)
7)announced?on?thursday?the?end?of?cease,fire?with?israel-51(Score=130?times)
8)announced?thursday?the?end?of?cease,fire?with?israel?the-50(Score=130?times)
9)announced?thursday?the?end?of?cease?fire,by?israel?with-50(Score=130?times)
10)announced?thursday?the?end?of?cease,fire?with?israel?was-47(Score=130times)
11)announced?thursday?the?end?of?its?unilateral?cease,fire?with?israel-43(Score=120?times)
12)announced?on?thursday?the?end?of?the?cease,fire?with?israel-35(Score=125times)
13)announced?thursday?the?end?of?the?cease,fire?by?israel-33(Score=120?times)
14)announced?thursday?the?end?of?the?cease?fire,with?israel-32(Score=130?times)
15)e?announced?thursday?the?end?of?cease,fire?with?israel-31(Score=115?times)
16)announced?thursday?the?end?of?the?cease,fire?israel-29(Score=125?times)
17)hamas?announced?thursday?the?end?of?cease,fire?with?israel-27(Score=155times)
18)announced?thursday?the?end?of?the?cease,fire?with?israel?and-27(Score=125times)
19)announced?on?thursday?the?end?of?its?unilateral?cease,fire?with?israel-26(Score=115?times)
20)announced?thursday?the?end?of?the?cease,fire?by?israel?with-26(Score=125times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves′,′su?cese?del?fuego?con?israel′(2,null,7)--(631)
No?good?source?overlap
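The source-side check in the attempts above can be sketched as follows: two source word strings are joined only when the end of the first repeats at the start of the second and the second adds at least one new word. This is a sketch under assumptions, not the logged implementation. The function name `merge_on_overlap` is hypothetical, and the two-word minimum is inferred from the log, where one-word overlaps such as `...su cese` with `cese del fuego con` are reported as "No good source overlap".

```python
from typing import Optional


def merge_on_overlap(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Join `left` and `right` on their shared words, or return None.

    `min_overlap` = 2 is an assumption consistent with the logged outcomes.
    """
    l, r = left.split(), right.split()
    # Try the longest shared run first. k stays below len(r) so that the
    # right-hand piece always contributes at least one new word.
    for k in range(min(len(l), len(r) - 1), min_overlap - 1, -1):
        if l[-k:] == r[:k]:
            return " ".join(l + r[k:])
    return None


good = merge_on_overlap(
    "hamas anunció este jueves el fin de su cese del fuego con",
    "su cese del fuego con israel",
)
no_shared_words = merge_on_overlap(
    "hamas anunció este jueves",
    "su cese del fuego con israel",
)
one_word_only = merge_on_overlap(
    "hamas anunció este jueves el fin de su cese",
    "cese del fuego con",
)
```

The first call reproduces the join reported in the log; the other two return `None`, matching the "No good source overlap" lines for those pairs.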
Figure A0382572903141
cese?del?fuego?con?was?just?translated?and?returned?results
Number?of?results=1000
Translation?for?cese?del?fuego?con?took?0.705
going?to?try?and?overlap?this?piece?with?the?hashmap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de′,′cese?del?fuego?con′(2,null,8)--(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin′,′cese?del?fuego?con′(2,null,8)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el′,′cese?del?fuego?con′(2,null,8)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con?israel′,′cese?del?fuego?con′(2,null,8)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′cese?delfuego?con′(2,hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con,8)-(1000)
Got?an?overlap?in?source,checking?target
1500-1000
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′cese?del?fuegocon′took?9.486
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego(1500),(1000)cesedel?fuego?con=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con
@@@29730->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con
1)′hamas?announced?thursday?the?end?of?cease,fire?with?their′-140(Repeated?4times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?their)
2)′hamas?announced?thursday?the?end?of?cease,fire?with′-135(Repeated?93?times)(hamas?announced?thursday?the?end?of,cease?fire∷of?cease?fire?with)
3)′hamas?announced?on?thursday?the?end?of?cease,fire?with?their′-135(Repeated?4times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?their)
4)′hamas?announced?thursday?the?end?of?cease,fire?his′-135(Repeated?3?times)(null)
5)′hamas?announced?thursday?the?end?of?cease,fire?of′-135(Repeated?10?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?of)
6)′announced?thursday?the?end?of?cease,fire?with?hamas′-135(Repeated?141?times)(announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
7)′hamas?announced?thursday?the?end?of?the?cease,fire?with?their′-135(Repeated4?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷cease?fire?with?their)
8)′hamas?announced?on?thursday?the?end?of?cease,fire?with′-130(Repeated?80times)(hamas?announced?on?thursday?the?end?of,cease?fire∷of?cease?fire?with)
9)′announced?thursday?the?end?of?cease,fire?with?hamas?and′-130(Repeated?94times)(announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas?and)
10)′and?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?6times)(and?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
11)′hamas?announced?thursday?the?end?of?cease?fire,in?their′-130(Repeated?3times)(null)
12)′hamas?announced?thursday?the?end?of?cease,fire?with?in′-130(Repeated?6times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?in)
13)′announced?thursday?the?end?of?the?cease,fire?with?hamas′-130(Repeated?103times)(announced?thursday?the?end?of?the,cease?fire∷cease?fire?with?hamas)
14)′hamas?announced?thursday?the?end?of?the?cease,fire?with′-130(Repeated?80times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?with)
15)′hamas?announced?thursday?the?end?of?cease,fire?on?their′-130(Repeated?2times)(null)
16)′hamas?announced?thursday?the?end?of?cease?fire,for?their′-130(Repeated?2times)(null)
17)′hamas?announced?thursday?the?end?of?cease,fire?with?the′-130(Repeated?52times)(hamas?announced?thursday?the?end?of,cease?fire∷of?cease?fire?with?the)
18)′hamas?announced?on?thursday?the?end?of?the?cease,fire?with?their′-130(Repeated?4?times)(hamas?announced?on?thursday?the?end?of?the,cease?fire∷cease?firewith?their)
19)′they?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?3times)(they?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
20)′were?announced?thursday?the?end?of?cease,fire?with?hamas′-130(Repeated?3times)(were?announced?thursday?the?end?of,cease?fire∷cease?fire?with?hamas)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with-276(Score=115?times)
2)announced?thursday?the?end?of?the?cease,fire?with-199(Score=110?times)
3)announced?thursday?the?end?of?cease,fire?with?hamas-141(Score=135?times)
4)announced?on?thursday?the?end?of?cease,fire?with-106(Score=110?times)
5)announced?thursday?the?end?of?the?cease,fire?with?hamas-103(Score=130times)
6)announced?thursday?the?end?of?cease,fire?with?hamas?and-94(Score=130times)
7)hamas?announced?thursday?the?end?of?cease,fire?with-93(Score=135?times)
8)hamas?announced?on?thursday?the?end?of?cease,fire?with-80(Score=130?times)
9)hamas?announced?thursday?the?end?of?the?cease,fire?with-80(Score=130?times)
10)announced?thursday?the?end?of?cease,fire?with?the-78(Score=110?times)
11)announced?on?thursday?the?end?of?the?cease,fire?with-58(Score=105?times)
12)announced?thursday?the?end?of?the?cease,fire?with?hamas?and-56(Score=125times)
13)hamas?announced?thursday?the?end?of?cease,fire?with?the-52(Score=130times)
14)announced?thursday?the?end?of?the?cease,fire?with?the-52(Score=105?times)
15)announced?on?thursday?the?end?of?cease,fire?with?the-49(Score=105?times)
16)announced?thursday?the?end?of?cease,fire?with?hamas?and?the-47(Score=125times)
17)hamas?announced?thursday?the?end?of?the?cease,fire?with?the-43(Score=125times)
18)hamas?announced?on?thursday?the?end?of?cease,fire?with?the-43(Score=125times)
19)hamas?announced?on?thursday?the?end?of?the?cease,fire?with-40(Score=125times)
20)announced?thursday?the?end?of?cease,fire?a-38(Score=105?times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con′,′cese?delfuego?con′(2,null,8)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese′,′cese?del?fuego?con′(2,null,8)-(1000)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves′,′cese?del?fuego?con′(2,null,8)-(1000)
No?good?source?overlap
cese?del?fuego?con?israel?was?just?translated?and?returned?results
Number?of?results=748
Translation?for?cese?del?fuego?con?israel?took?0.888
going?to?try?and?overlap?this?piece?with?the?hashmap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de′,′cese?del?fuego?con?israel′(2,null,8)-(748)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anunció?este?jueves?el?fin′,′cese?del?fuego?con?israel′(2,null,8)-(748)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el′,′cese?del?fuego?con?israel′(2,null,8)--(748)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con?israel′,′cese?del?fuego?con?israel′(2,null,8)-(748)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′cese?delfuego?con?israel′(2,hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con?israel,8)-(748)
Got?an?overlap?in?source,checking?target
1500-748
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′cese?del?fuegocon?israel′took?7.89
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego(1500),(748)cese?delfuego?con?israel=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?conisrael
@@@18681->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con israel
1)′hamas?announced?thursday?the?end?of?cease,fire?with?israel′-155(Repeated?28times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel)
2)′hamas?announced?thursday?the?end?of?cease?fire,with?israel′-155(Repeated?1times)(null)
3)′hamas?announced?on?thursday?the?end?of?cease?fire,with?israel′-150(Repeated1?times)(null)
4)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?and′-150(Repeated9?times)(null)
5)′hamas?announced?thursday?the?end?of?the?cease,fire?with?israel′-150(Repeated24?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?withisrael)
6)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?the′-150(Repeated3?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel?the)
7)′hamas?announced?thursday?the?end?of?cease,fire?israel′-150(Repeated?8?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?israel)
8)′hamas?announced?on?thursday?the?end?of?cease,fire?with?israel′-150(Repeated23?times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?israel)
9)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?was′-150(Repeated1?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel?was)
10)′hamas?announced?thursday?the?end?of?cease?fire,by?israel?with′-150(Repeated3?times)(hamas?announced?thursday?the?end?of?cease,fire?by∷cease?fire?by?israel?with)
11)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?and′-150(Repeated?9?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?withisrael?and)
12)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?was′-150(Repeated?1?times)(null)
13)′hamas?announced?thursday?the?end?of?the?cease?fire,with?israel′-150(Repeated?1?times)(null)
14)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?the′-150(Repeated?3?times)(null)
15)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?the′-145(Repeated?2?times)(null)
16)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?was′-145(Repeated?1?times)(null)
17)′hamas?announced?on?thursday?the?end?of?the?cease?fire,with?israel′-145(Repeated?1?times)(null)
18)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?was′-145(Repeated?1?times)(null)
19)′hamas?announced?thursday?the?end?of?the?cease,fire?with?israel?the′-145(Repeated?3?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷cease?fire?withisrael?the)
20)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?and′-145(Repeated?8?times)(null)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with?israel-259(Score=135?times)
2)announced?thursday?the?end?of?the?cease,fire?with?israel-122(Score=130?times)
3)announced?thursday?the?end?of?cease,fire?israel-71(Score=130?times)
4)announced?thursday?the?end?of?cease,fire?with?israel?and-67(Score=130?times)
5)announced?thursday?the?end?of?cease?fire,by?israel-62(Score=125?times)
6)announced?thursday?the?end?of?cease?fire,with?israel-61(Score=135?times)
7)announced?on?thursday?the?end?of?cease,fire?with?israel-51(Score=130?times)
8)announced?thursday?the?end?of?cease,fire?with?israel?the-51(Score=130?times)
9)announced?thursday?the?end?of?cease?fire,by?israel?with-50(Score=130?times)
10)announced?thursday?the?end?of?cease,fire?with?israel?was-47(Score=130times)
11)announced?thursday?the?end?of?its?unilateral?cease,fire?with?israel-44(Score=120?times)
12)announced?on?thursday?the?end?of?the?cease,fire?with?israel-37(Score=125times)
13)e?announced?thursday?the?end?of?cease,fire?with?israel-34(Score=115?times)
14)announced?thursday?the?end?of?the?cease,fire?israel-32(Score=125?times)
15)announced?thursday?the?end?of?the?cease?fire,with?israel-30(Score=130?times)
16)hamas?announced?thursday?the?end?of?cease,fire?with?israel-28(Score=155times)
17)announced?on?thursday?the?end?of?its?unilateral?cease,fire?with?israel-26(Score=115?times)
18)hamas?announced?thursday?the?end?of?the?cease,fire?with?israel-24(Score=150?times)
19)announced?thursday?the?end?of?cease?fire,and?israel-23(Score=125?times)
20)announced?thursday?the?end?of?the?cease,fire?with?israel?and-23(Score=125times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con′,′cese?del?fuego?con?israel′(2,hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con?israel,8)-(748)
Got?an?overlap?in?source,checking?target
1500-748
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con′,′cese?delfuego?con?israel′took?3.299
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con(1500),(748)cesedel?fuego?con?israel=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?conisrael
@@@2840->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con israel
1)′hamas?announced?thursday?the?end?of?cease,fire?with?israel′-155(Repeated?28times)(null)
2)′hamas?announced?thursday?the?end?of?cease?fire,with?israel′-155(Repeated?1times)(hamas?announced?thursday?the?end?of?cease,fire?with∷cease?fire?with?israel)
3)′hamas?announced?on?thursday?the?end?of?cease?fire,with?israel′-150(Repeated1?times)(hamas?announced?on?thursday?the?end?of?cease,fire?with∷cease?fire?with?israel)
4)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?and′-150(Repeated9?times)(hamas?announced?thursday?the?end?of?cease,fire?with?israel∷cease?fire?withisrael?and)
5)′hamas?announced?thursday?the?end?of?the?cease,fire?with?israel′-150(Repeated24?times)(null)
6)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?the′-150(Repeated3?times)(null)
7)′hamas?announced?thursday?the?end?of?cease,fire?israel′-150(Repeated?8?times)(null)
8)′hamas?announced?on?thursday?the?end?of?cease,fire?with?israel′-150(Repeated23?times)(null)
9)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?was′-150(Repeated1?times)(null)
10)′hamas?announced?thursday?the?end?of?cease?fire,by?israel?with′-150(Repeated3?times)(hamas?announced?thursday?the?end?of?cease,fire?by∷cease?fire?by?israel?with)
11)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?as′-150(Repeated3?times)(hamas?announced?thursday?the?end?of?cease,fire?with?israel∷fire?with?israel?as)
12)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?and′-150(Repeated?9?times)(null)
13)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?was′-150(Repeated?1?times)(hamas?announced?thursday?the?end?of?cease,fire?with?israel∷cease?fire?with?israel?was)
14)′hamas?announced?thursday?the?end?of?the?cease?fire,with?israel′-150(Repeated?1?times)(hamas?announced?thursday?the?end?of?the?cease,fire?with∷the?cease?fire?with?israel)
15)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?the′-150(Repeated?3?times)(hamas?announced?thursday?the?end?of?cease,fire?with?israel∷cease?fire?with?israel?the)
16)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?the′-145(Repeated?2?times)(hamas?announced?thursday?the?end?of?the?cease,fire?withisrael∷cease?fire?with?israel?the)
17)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?was′-145(Repeated?1?times)(hamas?announced?on?thursday?the?end?of?cease,fire?withisrael∷cease?fire?with?israel?was)
18)′hamas?announced?on?thursday?the?end?of?the?cease?fire,with?israel′-145(Repeated?1?times)(hamas?announced?on?thursday?the?end?of?the?cease,fire?with∷thecease?fire?with?israel)
19)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?was′-145(Repeated?1?times)(hamas?announced?thursday?the?end?of?the?cease,fire?withisrael∷cease?fire?with?israel?was)
20)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?as′-145(Repeated?3?times)(hamas?announced?on?thursday?the?end?of?cease,fire?with?israel∷firewith?israel?as)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with?israel-250(Score=135?times)
2)announced?thursday?the?end?of?the?cease,fire?with?israel-101(Score=130?times)
3)announced?thursday?the?end?of?cease,fire?israel-65(Score=130?times)
4)announced?thursday?the?end?of?cease?fire,with?israel-64(Score=135?times)
5)announced?thursday?the?end?of?cease,fire?with?israel?and-60(Score=130?times)
6)announced?thursday?the?end?of?cease?fire,by?israel-58(Score=125?times)
7)announced?thursday?the?end?of?cease?fire,by?israel?with-50(Score=130?times)
8)announced?thursday?the?end?of?cease,fire?with?israel?the-50(Score=130?times)
9)announced?on?thursday?the?end?of?cease,fire?with?israel-47(Score=130?times)
10)announced?thursday?the?end?of?cease,fire?with?israel?was-47(Score=130times)
11)announced?thursday?the?end?of?its?unilateral?cease,fire?with?israel-44(Score=120?times)
12)announced?on?thursday?the?end?of?the?cease,fire?with?israel-37(Score=125times)
13)e?announced?thursday?the?end?of?cease,fire?with?israel-31(Score=115?times)
14)announced?thursday?the?end?of?the?cease?fire,with?israel-31(Score=130?times)
15)hamas?announced?thursday?the?end?of?cease,fire?with?israel-28(Score=155times)
16)hamas?announced?thursday?the?end?of?the?cease,fire?with?israel-24(Score=150?times)
17)announced?thursday?the?end?of?its?unilateral?cease?fire,with?israel-24?(Score=120?times)
18)hamas?announced?on?thursday?the?end?of?cease,fire?with?israel-23(Score=150times)
19)announced?on?thursday?the?end?of?its?unilateral?cease,fire?with?israel-23(Score=115?times)
20)announced?thursday?the?end?of?the?cease,fire?israel-22(Score=125?times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese′,′cese?del?fuego?conisrael′(2,null,8)-(748)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves′,′cese?del?fuego?con?israel′(2,null,8)-(748)
No?good?source?overlap
del?fuego?con?israel?was?just?translated?and?returned?results
Number?of?results=604
Translation?for?del?fuego?con?israel?took?0.634
going?to?try?and?overlap?this?piece?with?the?hashmap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de′,′del?fuego?con?israel′(2,null,9)-(604)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin′,′del?fuego?con?israel′(2,null,9)-(604)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el′,′del?fuego?con?israel′(2,null,9)-(604)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con?israel′,′del?fuego?con?israel′(2,null,9)-(604)
No?good?source?overlap
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego′,′del?fuego?con?israel′(2,hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con?israel,9)-(604)
Got?an?overlap?in?source,checking?target
1500-604
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego′,′del?fuego?conisrael′took?3.242
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego(1500),(604)delfuego?con?israel=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?conisrael
@@@2927->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con israel
1)′hamas?announced?thursday?the?end?of?cease,fire?with?israel′-155(Repeated?28times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel)
2)′hamas?announced?thursday?the?end?of?cease?fire,with?israel′-155(Repeated?1times)(null)
3)′hamas?announced?on?thursday?the?end?of?cease?fire,with?israel′-150(Repeated1?times)(null)
4)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?and′-150(Repeated9?times)(null)
5)′hamas?announced?thursday?the?end?of?the?cease,fire?with?israel′-150(Repeated24?times)(hamas?announced?thursday?the?end?of?the,cease?fire∷the?cease?fire?withisrael)
6)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?the′-150(Repeated3?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?with?israel?the)
7)′hamas?announced?thursday?the?end?of?cease,fire?israel′-150(Repeated?8?times)(null)
8)′hamas?announced?on?thursday?the?end?of?cease,fire?with?israel′-150(Repeated23?times)(hamas?announced?on?thursday?the?end?of,cease?fire∷cease?fire?with?israel)
9)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?was′-150(Repeated1?times)(null)
10)′hamas?announced?thursday?the?end?of?cease?fire,by?israel?with′-150(Repeated3?times)(hamas?announced?thursday?the?end?of?cease,fire?by∷cease?fire?by?israel?with)
11)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?as′-150(Repeated3?times)(null)
12)′hamas?announced?thursday?the?end?of?cease,fire?with?israel?and′-150(Repeated?9?times)(hamas?announced?thursday?the?end?of,cease?fire∷cease?fire?withisrael?and)
13)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?was′-150(Repeated?1?times)(null)
14)′hamas?announced?thursday?the?end?of?the?cease?fire,with?israel′-150(Repeated?1?times)(null)
15)′hamas?announced?thursday?the?end?of?cease?fire?with,israel?the′-150(Repeated?3?times)(null)
16)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?the′-145(Repeated?2?times)(null)
17)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?was′-145(Repeated?1?times)(null)
18)′hamas?announced?on?thursday?the?end?of?the?cease?fire,with?israel′-145(Repeated?1?times)(null)
19)′hamas?announced?thursday?the?end?of?the?cease?fire?with,israel?was′-145(Repeated?1?times)(null)
20)′hamas?announced?on?thursday?the?end?of?cease?fire?with,israel?as′-145(Repeated?3?times)(null)
Sort according to multiplicity
1)announced?thursday?the?end?of?cease,fire?with?israel-250(Score=135?times)
2)announced?thursday?the?end?of?the?cease,fire?with?israel-101(Score=130?times)
3)announced?thursday?the?end?of?cease,fire?israel-65(Score=130?times)
4)announced?thursday?the?end?of?cease?fire,with?israel-64(Score=135?times)
5)announced?thursday?the?end?of?cease,fire?with?israel?and-60(Score=130?times)
6)announced?thursday?the?end?of?cease?fire,by?israel-58(Score=125?times)
7)announced?thursday?the?end?of?cease?fire,by?israel?with-50(Score=130?times)
8)announced?thursday?the?end?of?cease,fire?with?israel?the-50(Score=130?times)
9)announced?on?thursday?the?end?of?cease,fire?with?israel-47(Score=130?times)
10)announced?thursday?the?end?of?cease,fire?with?israel?was-47(Score=130times)
11)announced?thursday?the?end?of?its?unilateral?cease,fire?with?israel-44(Score=120?times)
12)announced?on?thursday?the?end?of?the?cease,fire?with?israel-37(Score=125times)
13)e?announced?thursday?the?end?of?cease,fire?with?israel-31(Score=115?times)
14)announced?thursday?the?end?of?the?cease?fire,with?israel-31(Score=130?times)
15)hamas?announced?thursday?the?end?of?cease,fire?with?israel-28(Score=155times)
16)hamas?announced?thursday?the?end?of?the?cease,fire?with?israel-24(Score=150?times)
17)announced?thursday?the?end?of?its?unilateral?cease?fire,with?israel-24(Score=120?times)
18)hamas?announced?on?thursday?the?end?of?cease,fire?with?israel-23(Score=150times)
19)announced?on?thursday?the?end?of?its?unilateral?cease,fire?with?israel-23(Score=115?times)
20)announced?thursday?the?end?of?the?cease,fire?israel-22(Score=125?times)
@@@Pre?2@@@
@@@Post?2@@@
Trying?to?overlap′hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con′,′del?fuego?con?israel′(2,hamas?anunció?este?jueves?el?fin?de?su?cese?del?fuego?con?israel,9)-(604)
Got?an?overlap?in?source,checking?target
1500-604
Overlap?check?for′hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con′,′del?fuegocon?israel′took?2.82
***hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?con(1500),(604)delfuego?con?israel=hamas?anuncióeste?jueves?el?fin?de?su?cese?del?fuego?conisrael
@@@1577->0
The overlapping result of hamas anunció este jueves el fin de su cese del fuego con israel
1)′hamas announced thursday the end of cease,fire with israel′-155(Repeated 28 times)(null)
2)′hamas announced thursday the end of cease fire,with israel′-155(Repeated 1 times)(hamas announced thursday the end of cease,fire with∷cease fire with israel)
3)′hamas announced on thursday the end of cease fire,with israel′-150(Repeated 1 times)(hamas announced on thursday the end of cease,fire with∷cease fire with israel)
4)′hamas announced thursday the end of cease fire with,israel and′-150(Repeated 9 times)(hamas announced thursday the end of cease,fire with israel∷cease fire with israel and)
5)′hamas announced thursday the end of the cease,fire with israel′-150(Repeated 24 times)(null)
6)′hamas announced thursday the end of cease,fire with israel the′-150(Repeated 3 times)(null)
7)′hamas announced thursday the end of cease,fire israel′-150(Repeated 8 times)(null)
8)′hamas announced on thursday the end of cease,fire with israel′-150(Repeated 23 times)(null)
9)′hamas announced thursday the end of cease,fire with israel was′-150(Repeated 1 times)(null)
10)′hamas announced thursday the end of cease fire,by israel with′-150(Repeated 3 times)(hamas announced thursday the end of cease,fire by∷cease fire by israel with)
11)′hamas announced thursday the end of cease fire with,israel as′-150(Repeated 3 times)(hamas announced thursday the end of cease,fire with israel∷fire with israel as)
12)′hamas announced thursday the end of cease,fire with israel and′-150(Repeated 9 times)(null)
13)′hamas announced thursday the end of cease fire with israel was′-150(Repeated 1 times)(null)
14)′hamas announced thursday the end of the cease fire,with israel′-150(Repeated 1 times)(hamas announced thursday the end of the cease,fire with∷the cease fire with israel)
15)′hamas announced thursday the end of cease fire with,israel the′-150(Repeated 3 times)(hamas announced thursday the end of cease,fire with israel∷cease fire with israel the)
16)′hamas announced thursday the end of the cease fire with,israel the′-145(Repeated 2 times)(hamas announced thursday the end of the cease,fire with israel∷cease fire with israel the)
17)′hamas announced on thursday the end of cease fire with,israel was′-145(Repeated 1 times)(null)
18)′hamas announced on thursday the end of the cease fire,with israel′-145(Repeated 1 times)(hamas announced on thursday the end of the cease,fire with∷the cease fire with israel)
19)′hamas announced thursday the end of the cease fire with,israel was′-145(Repeated 1 times)(null)
20)′hamas announced on thursday the end of cease fire with,israel as′-145(Repeated 3 times)(hamas announced on thursday the end of cease,fire with israel∷fire with israel as)
Sorted by multiplicity
1)announced thursday the end of cease,fire with israel-249 times(Score=135)
2)announced thursday the end of the cease,fire with israel-99 times(Score=130)
3)announced thursday the end of cease,fire israel-65 times(Score=130)
4)announced thursday the end of cease fire,with israel-64 times(Score=135)
5)announced thursday the end of cease,fire with israel and-59 times(Score=130)
6)announced thursday the end of cease fire,by israel-58 times(Score=125)
7)announced thursday the end of cease,fire with israel the-50 times(Score=130)
8)announced thursday the end of cease fire,by israel with-50 times(Score=130)
9)announced thursday the end of cease,fire with israel was-47 times(Score=130)
10)announced on thursday the end of cease,fire with israel-47 times(Score=130)
11)announced thursday the end of its unilateral cease,fire with israel-44 times(Score=120)
12)announced on thursday the end of the cease,fire with israel-37 times(Score=125)
13)announced thursday the end of the cease fire,with israel-31 times(Score=130)
14)e announced thursday the end of cease,fire with israel-30 times(Score=115)
15)hamas announced thursday the end of cease,fire with israel-28 times(Score=155)
16)hamas announced thursday the end of the cease,fire with israel-24 times(Score=150)
17)announced thursday the end of its unilateral cease fire,with israel-24 times(Score=120)
18)hamas announced on thursday the end of cease,fire with israel-23 times(Score=150)
19)announced on thursday the end of its unilateral cease,fire with israel-23 times(Score=115)
20)announced thursday the end of the cease,fire israel-22 times(Score=125)
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves el fin de su cese′,′del fuego con israel′(2,null,9)-(604)
No good source overlap
@@@Pre 2@@@
@@@Post 2@@@
Trying to overlap′hamas anunció este jueves′,′del fuego con israel′(2,null,9)-(604)
No good source overlap
Run results for hamas anunció este jueves el fin de su cese del fuego con israel (ran 22 overlap checks)
1)hamas announced thursday the end of cease fire with israel-155(Repeated 35 times)
2)hamas announced thursday the end of cease fire by israel with-150(Repeated 3 times)
3)hamas announced thursday the end of cease fire with israel as-150(Repeated 3 times)
4)hamas announced thursday the end of cease fire israel-150(Repeated 8 times)
5)hamas announced thursday the end of cease fire with israel and-150(Repeated 9 times)
6)hamas announced thursday the end of cease fire with israel was-150(Repeated 1 times)
7)hamas announced on thursday the end of cease fire with israel-150(Repeated 29 times)
8)hamas announced thursday the end of the cease fire with israel-150(Repeated 28 times)
9)hamas announced thursday the end of cease fire with israel the-150(Repeated 3 times)
10)hamas announced thursday the end of cease fire by israel-145(Repeated 4 times)
11)hamas announced on thursday the end of cease fire with israel was-145(Repeated 1 times)
12)hamas announced thursday the end of the cease fire by israel with-145(Repeated 2 times)
13)hamas announced on thursday the end of the cease fire with israel-145(Repeated 25 times)
14)hamas announced on thursday the end of cease fire israel-145(Repeated 7 times)
15)hamas announced thursday the end of cease fire israel is-145(Repeated 3 times)
16)hamas announced thursday the end of the cease fire israel-145(Repeated 7 times)
17)hamas announced on thursday the end of cease fire with israel and-145(Repeated 8 times)
18)hamas announced on thursday the end of cease fire with israel the-145(Repeated 2 times)
19)hamas announced thursday the end of the cease fire with israel the-145(Repeated 2 times)
20)hamas announced thursday the end of the cease fire with israel was-145(Repeated 1 times)
21)hamas announced on thursday the end of cease fire by israel with-145(Repeated 2 times)
22)hamas announced thursday the end of cease fire and israel-145(Repeated 4 times)
23)hamas announced thursday the end of the cease fire with israel as-145(Repeated 3 times)
24)hamas announced on thursday the end of cease fire with israel as-145(Repeated 3 times)
25)hamas announced thursday the end of the cease fire with israel and-145(Repeated 8 times)
26)hamas announced on thursday the end of cease fire israel is-140(Repeated 3 times)
27)hamas announced thursday the end of cease fire and on israel-140(Repeated 4 times)
28)hamas announced on thursday the end of the cease fire israel-140(Repeated 7 times)
29)hamas announced thursday the end of the cease fire by israel-140(Repeated 3 times)
30)hamas announced thursday the end of the cease fire and israel-140(Repeated 3 times)
31)hamas announced on thursday the end of cease fire and israel-140(Repeated 3 times)
32)hamas announced on thursday the end of the cease fire with israel and-140(Repeated 8 times)
33)hamas announced on thursday the end of the cease fire with israel the-140(Repeated 2 times)
34)hamas announced thursday the end of the cease fire israel is-140(Repeated 3 times)
35)hamas announced on thursday the end of the cease fire by israel with-140(Repeated 2 times)
36)hamas announced on thursday the end of cease fire by israel-140(Repeated 3 times)
37)hamas announced on thursday the end of the cease fire with israel was-140(Repeated 1 times)
38)hamas announced on thursday the end of the cease fire with israel as-140(Repeated 3 times)
39)hamas announced thursday the end of its unilateral cease fire with israel-140(Repeated 20 times)
40)hamas announced thursday the end of its unilateral cease fire with israel and-135(Repeated 8 times)
41)hamas announced thursday the end of its unilateral cease fire with israel was-135(Repeated 1 times)
42)hamas announced on thursday the end of its unilateral cease fire with israel-135(Repeated 16 times)
43)hamas announced thursday the end of the cease fire and on israel-135(Repeated 4 times)
44)hamas announced thursday the end of cease fire hudna with israel-135(Repeated 3 times)
45)hamas announced on thursday the end of the cease fire and israel-135(Repeated 3 times)
46)hamas announced thursday the end of cease fire and on israel to-135(Repeated 3 times)
47)hamas announced thursday the end of cease fire against israel with-135(Repeated 2 times)
48)announced thursday the end of cease fire with israel-135(Repeated 235 times)
49)hamas announced on thursday the end of the cease fire by israel-135(Repeated 3 times)
50)hamas announced thursday the end of cease fire with israel defense-135(Repeated 2 times)
51)hamas announced on thursday the end of the cease fire israel is-135(Repeated 3 times)
52)hamas announced thursday the end of cease fire with israel since-135(Repeated 1 times)
53)hamas announced on thursday the end of cease fire and on israel-135(Repeated 3 times)
54)hamas announced thursday the end of cease fire with israel renew-135(Repeated 3 times)
55)hamas announced thursday the end of its unilateral cease fire with israel the-135(Repeated 2 times)
56)hamas announced thursday the end of its unilateral cease fire israel-135(Repeated 7 times)
57)hamas announced thursday the end of cease fire with israel when-135(Repeated 4 times)
58)hamas announced thursday the end of cease fire with israel but-135(Repeated 3 times)
59)hamas announced thursday the end of cease fire tems with israel-135(Repeated 3 times)
60)hamas announced thursday the end of its unilateral cease fire by israel with-135(Repeated 2 times)
61)hamas announced thursday the end of cease fire with israel defence-135(Repeated 2 times)
62)hamas announced thursday the end of cease fire with israel even-135(Repeated 3 times)
63)announced thursday the end of cease fire with israel the-130(Repeated 45 times)
64)hamas announced on thursday the end of cease fire and on israel to-130(Repeated 2 times)
65)hamas announced thursday the end of the cease fire against israel with-130(Repeated 1 times)
66)hamas announced thursday the end of cease fire then israel-130(Repeated 2 times)
67)hamas announced on thursday the end of its unilateral cease fire by israel with-130(Repeated 2 times)
68)hamas announced thursday the end of the cease fire with israel since-130(Repeated 1 times)
69)hamas announced on thursday the end of cease fire with israel since-130(Repeated 1 times)
70)hamas announced on thursday the end of cease fire hudna with israel-130(Repeated 3 times)
71)hamas announced on thursday the end of cease fire with israel renew-130(Repeated 2 times)
72)announced thursday the end of the cease fire with israel-130(Repeated 91 times)
73)hamas announced thursday the end of cease fire declaration israel-130(Repeated 2 times)
74)hamas announced on thursday the end of cease fire with israel but-130(Repeated 3 times)
75)announced thursday the end of cease fire with israel and-130(Repeated 54 times)
76)hamas announced thursday the end of cease fire with israel when in-130(Repeated 3 times)
77)hamas announced thursday the end of cease fire with israel and pretty-130(Repeated 2 times)
78)hamas announced thursday the end of its unilateral cease fire by israel-130(Repeated 3 times)
79)announced on thursday the end of cease fire with israel-130(Repeated 50 times)
80)hamas announced on thursday the end of cease fire terms with israel-130(Repeated 2 times)
81)hamas announced thursday the end of cease fire between israel-130(Repeated 3 times)
82)hamas announced thursday the end of the cease fire with israel but-130(Repeated 3 times)
83)hamas announced thursday the end of the cease fire with israel defence-130(Repeated 1 times)
84)hamas announced on thursday the end of its unilateral cease fire israel-130(Repeated 4 times)
85)hamas announced on thursday the end of its unilateral cease fire with israel was-130(Repeated 1 times)
86)hamas announced thursday the end of cease fire agreement israel-130(Repeated 2 times)
87)hamas announced thursday the end of cease fire israel should-130(Repeated 2 times)
88)hamas announced on thursday the end of cease fire against israel with-130(Repeated 2 times)
89)hamas announced thursday the end of cease fire israel conquered-130(Repeated 2 times)
90)hamas announced thursday the end of its unilateral cease fire israel is-130(Repeated 3 times)
91)hamas announced thursday the end of the cease fire and on israel to-130(Repeated 3 times)
92)hamas announced thursday the end of cease fire by israel with continued-130(Repeated 2 times)
93)hamas announced on thursday the end of cease fire with israel when-130(Repeated 3 times)
94)hamas announced on thursday the end of cease fire with israel defense-130(Repeated 2 times)
95)hamas announced on thursday the end of cease fire with israel even-130(Repeated 1 times)
96)hamas announced thursday the end of the cease fire with israel even-130(Repeated 2 times)
97)hamas announced thursday the end of the cease fire with israel renew-130(Repeated 2 times)
98)and announced thursday the end of cease fire with israel-130(Repeated 12 times)
99)hamas announced thursday the end of the cease fire terms with israel-130(Repeated 2 times)
100)announced thursday the end of cease fire israel-130(Repeated 55 times)
Time so far took 101.26(0)
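The overlap attempts recorded in the log above can be illustrated with a minimal sketch. It assumes that two word strings are joined when the trailing words of the first equal the leading words of the second, and that a source-side join is accepted only if at least one pair of candidate target strings can be joined the same way. The function names `overlap_merge` and `dual_overlap` are hypothetical, and the scoring and repeat counts printed in the log are not reproduced.

```python
def overlap_merge(left, right, min_overlap=1):
    """Join two word strings on their longest shared boundary.

    Returns the merged string when a suffix of `left` equals a prefix of
    `right` (at least `min_overlap` words), otherwise None.
    """
    a, b = left.split(), right.split()
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return " ".join(a + b[k:])
    return None


def dual_overlap(src_left, src_right, tgt_left_options, tgt_right_options):
    """Merge on the source side, then keep only target pairs that also overlap.

    The target option lists stand in for the stored associations of each
    source segment (an assumption made for illustration).
    """
    src = overlap_merge(src_left, src_right)
    if src is None:
        return None, []  # corresponds to "No good source overlap" in the log
    targets = []
    for tl in tgt_left_options:
        for tr in tgt_right_options:
            merged = overlap_merge(tl, tr)
            if merged is not None and merged not in targets:
                targets.append(merged)
    return src, targets
```

Calling `dual_overlap` with the two Spanish segments from the log and the candidate strings "hamas announced thursday the end of cease fire with" and "cease fire with israel" yields one merged source string and one merged target string; a candidate such as "truce with israel" is dropped because it shares no boundary words with the left-hand string.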

Claims (236)

1. A method of obtaining a knowledge base of associated ideas, characterized in that the method comprises the steps of:
Providing a document pair that represents the same ideas in two different languages, wherein the first document of the pair is expressed in a first language and the second document of the pair is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first document of the pair to identify all occurrences of the query in the first document;
Selecting a plurality of word ranges in the second document of the pair, the selected ranges corresponding to the occurrences of the query in the first document;
Calculating the frequency of the words and word strings contained in the selected ranges;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
2. The method of claim 1, characterized in that the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range.
3. A method of obtaining a knowledge base of associated ideas, characterized in that the method comprises the steps of:
Providing a plurality of document pairs that represent the same ideas in two different languages, wherein a first group of the documents in the pairs is expressed in a first language and a second group of the documents in the pairs is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first group of the plurality of pairs to identify all occurrences of the query in the first group;
Selecting a plurality of word ranges in the second group of the plurality of pairs, the selected ranges corresponding to the occurrences of the query in the first group;
Calculating the frequency of the words and word strings contained in the selected ranges;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
4. The method of claim 3, characterized in that the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range.
5. A computer device comprising a processor, a memory connected to the processor, and a program stored in the memory, characterized in that the computer is configured to execute the program and carry out the following steps:
Providing a document pair that represents the same ideas in two different languages, wherein the first document of the pair is expressed in a first language and the second document of the pair is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first document of the pair to identify all occurrences of the query in the first document;
Selecting a plurality of word ranges in the second document of the pair, the selected ranges corresponding to the occurrences of the query in the first document;
Calculating the frequency of the words and word strings contained in the selected ranges;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
6. The computer device of claim 5, characterized in that the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range.
7. A computer device comprising a processor, a memory connected to the processor, and a program stored in the memory, characterized in that the computer is configured to execute the program and carry out the following steps:
Providing a plurality of document pairs that represent the same ideas in two different languages, wherein a first group of the documents in the pairs is expressed in a first language and a second group of the documents in the pairs is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first group of the plurality of pairs to identify all occurrences of the query in the first group;
Selecting a plurality of word ranges in the second group of the plurality of pairs, the selected ranges corresponding to the occurrences of the query in the first group;
Calculating the frequency of the words and word strings contained in the selected ranges, the frequency being based on the occurrences of all unique words and word strings;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
8. The computer device of claim 7, characterized in that the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range.
9. A computer-readable medium on which is stored a program executable by a computer processor, characterized in that the program is used to carry out the following steps:
Providing a document pair that represents the same ideas in two different languages, wherein the first document of the pair is expressed in a first language and the second document of the pair is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first document of the pair to identify all occurrences of the query in the first document;
Selecting a plurality of word ranges in the second document of the pair, the selected ranges corresponding to the occurrences of the query in the first document;
Calculating the frequency of the words and word strings contained in the selected ranges;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
10. The computer-readable storage medium of claim 9, characterized in that the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range.
11. A computer-readable medium on which is stored a program executable by a computer processor, characterized in that the program is used to carry out the following steps:
Providing a plurality of document pairs that represent the same ideas in two different languages, wherein a first group of the documents in the pairs is expressed in a first language and a second group of the documents in the pairs is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first group of the plurality of pairs to identify all occurrences of the query in the first group;
Selecting a plurality of word ranges in the second group of the plurality of pairs, the selected ranges corresponding to the occurrences of the query in the first group;
Calculating the frequency of the words and word strings contained in the selected ranges;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
12. The computer-readable storage medium of claim 11, characterized in that the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range.
13. A method of symbolically representing an association for efficient transmission of information, characterized in that the method comprises the following steps:
Creating an association; and
Symbolically representing the association by designating a token equivalent to the association;
Wherein creating the association comprises:
Providing a document pair that represents the same ideas in two different languages, wherein the first document of the pair is expressed in a first language and the second document of the pair is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first document of the pair to identify all occurrences of the query in the first document;
Selecting a plurality of word ranges in the second document of the pair, the selected ranges corresponding to the occurrences of the query in the first document;
Calculating the frequency of the words and word strings contained in the selected ranges, wherein the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
14. The method of claim 13, characterized in that it further comprises:
Sending the token from one location to a second location or a plurality of second locations;
At the second location or plurality of second locations, analyzing the designated token to identify the association; and
Providing the association to a user.
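The token representation of claims 13 and 14 can be illustrated with a minimal sketch, assuming that the sending and receiving locations share the same table of associations and that a token is simply a small integer index into that table. The class `AssociationCodebook` and the helper functions are hypothetical names introduced for the example; the claims do not specify the token format.

```python
class AssociationCodebook:
    """Maps each association (source string, target string) to an integer token."""

    def __init__(self):
        self._assocs = []   # token -> association
        self._tokens = {}   # association -> token

    def assign(self, source, target):
        """Designate a token for the association, reusing an existing one."""
        key = (source, target)
        if key not in self._tokens:
            self._tokens[key] = len(self._assocs)
            self._assocs.append(key)
        return self._tokens[key]

    def identify(self, token):
        """Recover the association that a received token stands for."""
        return self._assocs[token]


def encode(codebook, associations):
    """Sender side: replace each association by its token."""
    return [codebook.assign(s, t) for s, t in associations]


def decode(codebook, tokens):
    """Receiver side: turn received tokens back into associations."""
    return [codebook.identify(t) for t in tokens]
```

Only the token list needs to be transmitted; a repeated association costs one repeated integer rather than a repeated pair of word strings.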
15. A method of symbolically representing an association for efficient transmission of information, characterized in that the method comprises the following steps:
Creating an association; and
Symbolically representing the association by designating a token equivalent to the association;
Wherein creating the association comprises:
Providing a plurality of document pairs that represent the same ideas in two different languages, wherein a first group of the documents in the pairs is expressed in a first language and a second group of the documents in the pairs is expressed in a second language;
Receiving a query to be analyzed, the query being expressed in the first language and consisting of a word or word string;
Analyzing the first group of the plurality of pairs to identify all occurrences of the query in the first group;
Selecting a plurality of word ranges in the second group of the plurality of pairs, the selected ranges corresponding to the occurrences of the query in the first group;
Calculating the frequency of the words and word strings contained in the selected ranges, wherein the calculating step ignores an occurrence of a word or word string if it is a subset of a longer word string that appears in more than one selected range;
Tabulating the frequencies based on the occurrences of all unique words and word strings obtained by the calculating step; and
Using the tabulated frequencies, returning a list of the occurrences of all unique words and word strings that appear in more than one selected range.
16. The method of claim 15, characterized in that it further comprises:
Sending the token from one location to a second location or a plurality of second locations;
At the second location or plurality of second locations, analyzing the designated token to identify the association; and
Providing the association to a user.
17. A method of creating a knowledge base of associated ideas involving a source language, a target language, and a third language, characterized in that the method comprises the following steps:
Receiving a query to be analyzed, the query being expressed in the source language and consisting of a word or word string;
Translating the query into a result expressed in the third language;
Translating the result into a second result expressed in the target language; and
Associating the query with the second result in the target language.
18. A method of creating a knowledge base of associated ideas involving a source language, a target language, and a plurality of third languages, characterized in that the method comprises the following steps:
a. Receiving a query to be analyzed, the query being expressed in the source language and consisting of a word or word string;
b. Translating the query into a result expressed in one of the third languages;
c. Translating the result into a second result expressed in the target language;
d. Repeating steps b and c for each of the plurality of third languages;
e. Returning each of the second results; and
f. Among all the second results, associating with the query the one or more second results that are produced by two or more of the plurality of third languages.
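Steps a to f of claim 18 can be sketched as below. The sketch assumes that translation into and out of each third language is available as a simple lookup table; the dictionaries `to_pivot` and `to_target`, the function name `pivot_associate`, and the `min_votes` parameter are hypothetical stand-ins for whatever translation resource is actually used.

```python
from collections import defaultdict


def pivot_associate(query, to_pivot, to_target, min_votes=2):
    """Associate a source query with target results confirmed by several third languages.

    to_pivot:  {third_language: {source_string: third_language_string}}
    to_target: {third_language: {third_language_string: target_string}}
    Returns {target_string: [third languages that produced it]}.
    """
    votes = defaultdict(set)
    for lang, table in to_pivot.items():               # step d: each third language
        pivot = table.get(query)                       # step b
        if pivot is None:
            continue
        target = to_target.get(lang, {}).get(pivot)    # step c
        if target is not None:
            votes[target].add(lang)                    # step e
    # Step f: keep results produced by two or more third languages.
    return {t: sorted(langs) for t, langs in votes.items() if len(langs) >= min_votes}
```

A target string reached through only one third language is discarded, which is the agreement filter of step f.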
19. The method of claim 17 or 18, characterized in that it comprises the steps of:
Translating the query into a third result in the target language using one or more existing translation schemes;
Returning the third result and adding it to the returned second results in the target language; and
For all second or third results that are produced more than once, associating one or more of the second and third results with the query.
20. a computer equipment, described computer equipment comprise processor, are connected to the storer of described processor, and are stored in the program in the described storer, it is characterized in that, described computer configuration is for carrying out described program and carrying out following step:
The inquiry that reception will be analyzed, described inquiry is expressed with source language, and described inquiry is made up of word or word strings;
With described query translation is result with described the 3rd language performance;
Described result is translated as second result who expresses with described target language; And
Related described inquiry and described second result in described target language.
21. a computer equipment, described computer equipment comprise processor, are connected to the storer of described processor, and are stored in the program in the described storer, it is characterized in that, described computer configuration is for carrying out described program and carrying out following step:
A. receive the inquiry that will analyze, described inquiry is expressed with source language, and described inquiry is made up of word or word strings;
B. with described query translation result with described the 3rd language performance;
C. described result is translated as second result who expresses with described target language;
D. in described multiple the 3rd language each, repeating step b and c;
E. return each among described second result; And
F. to all second results, that one or more described second results are related with described inquiry by two or more generations in the described multilingual.
22. The computer apparatus of claim 20 or 21, characterized in that it is further configured to perform the following steps:
Translating said query into a third result in said target language using one or more existing translation schemes;
Returning said third result and adding it to the returned second results in said target language; and
For all second or third results produced more than once, associating one or more of said second results and third results with said query.
23. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that said program is for performing the following steps:
Receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
Translating said query into a result expressed in said third language;
Translating said result into a second result expressed in said target language; and
Associating said query with said second result in said target language.
24. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that said program is for performing the following steps:
a. receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
b. translating said query into a result expressed in said third language;
c. translating said result into a second result expressed in said target language;
d. repeating steps b and c for each of said plurality of third languages;
e. returning each of said second results; and
f. for all second results produced by two or more of said plurality of languages, associating one or more of said second results with said query.
25. The computer-readable medium of claim 23 or 24, characterized in that it is further configured to perform the following steps:
Translating said query into a third result in said target language using one or more existing translation schemes;
Returning said third result and adding it to the returned second results in said target language; and
For all second or third results produced more than once, associating one or more of said second results and third results with said query.
26. A method of symbolically representing an association for efficient transmission of information, characterized in that said method comprises the following steps:
Creating an association involving a source language, a target language and a third language, using the following steps:
Receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
Translating said query into a result expressed in said third language;
Translating said result into a second result expressed in said target language;
Associating said query with said second result in said target language; and
Symbolically representing said association by assigning a mark equivalent to said association.
27. The method of claim 26, characterized in that it further comprises:
Transmitting said mark from one location to a second location or to a plurality of second locations;
At said second location or plurality of second locations, analyzing said assigned mark and identifying said association; and
Providing said association to a user.
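The assignment, transmission and resolution of a mark can be sketched as follows. This is only an illustration under stated assumptions: the mark is taken to be a small integer, both locations are assumed to hold the same table of associations, and the names `AssociationRegistry`, `encode` and `decode` are hypothetical.

```python
import json


class AssociationRegistry:
    """Maps each (source, target) association to a compact integer mark."""

    def __init__(self):
        self._mark_of = {}
        self._association_of = {}

    def assign(self, source, target):
        """Return the mark for an association, creating one if needed."""
        key = (source, target)
        if key not in self._mark_of:
            mark = len(self._association_of)
            self._mark_of[key] = mark
            self._association_of[mark] = key
        return self._mark_of[key]

    def resolve(self, mark):
        """Identify the association that a received mark stands for."""
        return self._association_of[mark]


def encode(marks):
    """Serialize a list of marks for transmission (JSON is an assumption)."""
    return json.dumps(marks).encode("utf-8")


def decode(payload):
    """Recover the list of marks at the receiving location."""
    return json.loads(payload.decode("utf-8"))


registry = AssociationRegistry()
mark = registry.assign("thank you", "gracias")
payload = encode([mark])                      # sent instead of the full text
received = [registry.resolve(m) for m in decode(payload)]
```

Only the mark travels between locations, so the payload stays small however long the associated word strings are.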
28. A method of symbolically representing an association for efficient transmission of information, characterized in that said method comprises the following steps:
Creating an association involving a source language, a target language and a plurality of third languages, using the following steps:
a. receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
b. translating said query into a result expressed in said third language;
c. translating said result into a second result expressed in said target language;
d. repeating steps b and c for each of said plurality of third languages;
e. returning each of said second results;
f. for all second results produced by two or more of said plurality of languages, associating one or more of said second results with said query; and
Symbolically representing said association by assigning a mark equivalent to said association.
29. The method of claim 28, characterized in that it further comprises:
Transmitting said mark from one location to a second location or to a plurality of second locations;
At said second location or plurality of second locations, analyzing said assigned mark and identifying said association; and
Providing said association to a user.
30. A method of creating a knowledge base of associated concepts, characterized in that said method comprises the steps of:
Providing translations of word strings expressed in a first language by means of words and/or word strings expressed in a second language;
Providing a corpus of documents expressed in said second language;
Receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
For said query, using said provided translations to identify all translations in said second language of each word that makes up said word string query;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said identifying step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once; and
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results.
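The corpus analysis of claim 30 can be sketched in Python as below. The sketch assumes whitespace tokenization, a word-level translation table, and a corpus given as a list of sentences; `find_candidates`, `max_words` and `min_hits` are hypothetical names for the user-defined limits.

```python
def find_candidates(query, translations, corpus, max_words, min_hits):
    """Return second-language word strings that may translate `query`.

    translations: {first_language_word: iterable of second-language words}
    corpus:       list of second-language sentences (plain strings)
    max_words:    user-defined maximum length of a returned word string
    min_hits:     user-defined minimum number of query words whose
                  translation must appear in a returned word string
    """
    # All known translations of each word in the query (the identifying step).
    # Using a dict keyed by query word means each word is counted only once.
    options = {word: set(translations.get(word, ())) for word in query.split()}

    results = []
    for sentence in corpus:
        tokens = sentence.split()
        for start in range(len(tokens)):
            longest = min(start + max_words, len(tokens))
            for end in range(start + 1, longest + 1):
                window = tokens[start:end]
                present = set(window)
                hits = sum(1 for word in options if options[word] & present)
                if hits >= min_hits:
                    candidate = " ".join(window)
                    if candidate not in results:
                        results.append(candidate)
    return results


translations = {"red": {"rojo", "roja"}, "car": {"coche", "carro"}}
corpus = ["el coche rojo es nuevo"]
candidates = find_candidates("red car", translations, corpus, max_words=2, min_hits=2)
```

With a maximum of two words and a minimum of two translated query words, the only word string returned from the sample sentence is "coche rojo".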
31. The method of claim 30, characterized in that said word strings expressed in said second language each have at least a first portion and a second portion, and said list associates the query expressed in said first language with expressions in said second language, said method further comprising the following steps:
Examining said list of returned word string results to find any two returned word string results whose first and second portions overlap;
Combining each such pair of overlapping returned word strings into a third word string, wherein said third word string is the combination of said first word string and said second word string with the overlapping words merged; and
Adding all said third word strings to said list of word string results.
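The overlap-and-merge steps of claim 31 can be sketched as follows, assuming that an overlap means the last words of one word string equal the first words of another. The function names `merge_overlap` and `add_merged` are hypothetical.

```python
def merge_overlap(first, second):
    """Merge two word strings when the end of `first` overlaps the start of
    `second`; return None when there is no overlap."""
    a, b = first.split(), second.split()
    # Try the longest possible overlap first.
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return " ".join(a + b[size:])
    return None


def add_merged(results):
    """Return the result list extended with every merged (third) word string."""
    extended = list(results)
    for first in results:
        for second in results:
            if first == second:
                continue
            combined = merge_overlap(first, second)
            if combined is not None and combined not in extended:
                extended.append(combined)
    return extended


results = ["el coche rojo", "coche rojo es nuevo"]
extended = add_merged(results)
```

Here the two returned word strings share "coche rojo", so the combined word string "el coche rojo es nuevo" is added to the list.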
32. The method of claim 30, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
33. The method of claims 30, 31 and 32, characterized in that it further comprises:
Ranking said list of word string results based on user-defined criteria.
34. A computer apparatus comprising a processor, a memory connected to said processor, and a program stored in said memory, characterized in that said computer is configured to execute said program and to perform the following steps:
Providing translations of word strings expressed in a first language by means of words and/or word strings expressed in a second language;
Providing a corpus of documents expressed in said second language;
Receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
For said query, using said provided translations to identify all translations in said second language of each word that makes up said word string query;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said identifying step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once; and
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results.
35. The method of claim 34, characterized in that said word strings expressed in said second language each have at least a first portion and a second portion, and said list associates the query expressed in said first language with expressions in said second language, said method further comprising the following steps:
Examining said list of returned word string results to find any two returned word string results whose first and second portions overlap;
Combining each such pair of overlapping returned word strings into a third word string, wherein said third word string is the combination of said first word string and said second word string with the overlapping words merged; and
Adding all said third word strings to said list of word string results.
36. The method of claim 34, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
37. The method of claim 34, characterized in that it further comprises:
Ranking said list of word string results based on user-defined criteria.
38. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that said program is for performing the following steps:
Providing translations of word strings expressed in a first language by means of words and/or word strings expressed in a second language;
Providing a corpus of documents expressed in said second language;
Receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
For said query, using said provided translations to identify all translations in said second language of each word that makes up said word string query;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said identifying step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once; and
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results.
39. The computer-readable medium of claim 38, characterized in that said word strings expressed in said second language each have at least a first portion and a second portion, and said list associates the query expressed in said first language with expressions in said second language, said method further comprising the following steps:
Examining said list of returned word string results to find any two returned word string results whose first and second portions overlap;
Combining each such pair of overlapping returned word strings into a third word string, wherein said third word string is the combination of said first word string and said second word string with the overlapping words merged; and
Adding all said third word strings to said list of word string results.
40. The computer-readable medium of claim 38, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
41. The computer-readable medium of claim 38, characterized in that it further comprises:
Ranking said list of word string results based on user-defined criteria.
42. A method of symbolically representing an association for efficient transmission of information, characterized in that said method comprises the following steps:
Creating an association; and
Symbolically representing said association by assigning a mark equivalent to said association;
Wherein said creating an association comprises:
Providing translations of word strings expressed in a first language by means of words and/or word strings expressed in a second language;
Providing a corpus of documents expressed in said second language;
Receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
For said query, using said provided translations to identify all translations in said second language of each word that makes up said word string query;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said identifying step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once; and
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results.
43. The method of claim 42, characterized in that it further comprises:
Transmitting said mark from one location to a second location or to a plurality of second locations;
At said second location or plurality of second locations, analyzing said assigned mark and identifying said association; and
Providing said association to a user.
44. The method of claim 42, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
45. The method of claim 30, characterized in that it further comprises:
Providing a corpus of documents expressed in said first language;
Identifying, in said corpus of documents expressed in said first language, a user-defined number of occurrences of said query;
Analyzing a user-defined number of words and/or word strings to the left and to the right of each said occurrence of said query, and identifying word strings composed of the user-defined number of words and/or word strings to the left of said query, said query itself, and the user-defined number of words and/or word strings to the right of said query;
Creating a list of the returned word strings that constitute the results of said analyzing step;
Analyzing each returned word string separately, and using said provided translations to identify all translations in said second language of each word that makes up each said returned word string;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said creating step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once;
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results;
Analyzing said word string list and said second word string list to identify the number of times each word string in said word string list occurs as a subset word string of a word string in said second word string list; and
Returning a list based on said step of analyzing said word string list and said second word string list.
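The first-language context step of claim 45 can be sketched as below. The sketch assumes whitespace tokenization and counts context in words; `context_strings`, `left`, `right` and `max_occurrences` are hypothetical names for the user-defined quantities.

```python
def context_strings(query, corpus, left, right, max_occurrences):
    """Collect word strings made of `left` words before the query, the query
    itself, and `right` words after it, for up to `max_occurrences`
    occurrences of the query in a first-language corpus."""
    if max_occurrences < 1:
        return []
    query_tokens = query.split()
    width = len(query_tokens)
    found = []
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(len(tokens) - width + 1):
            if tokens[i:i + width] != query_tokens:
                continue
            start = max(0, i - left)                    # clip at sentence start
            end = min(len(tokens), i + width + right)   # clip at sentence end
            found.append(" ".join(tokens[start:end]))
            if len(found) >= max_occurrences:
                return found
    return found


corpus = ["I bought a red car yesterday", "the red car is fast"]
contexts = context_strings("red car", corpus, left=1, right=1, max_occurrences=2)
```

Each returned word string can then be passed separately through the second-language analysis, giving the second word string list that the final counting step compares against.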
46. The method of claim 45, characterized in that said step of analyzing said word string list and said second word string list comprises revising said number of occurrences by ignoring each occurrence of a word string in which that word string is a subset of a longer word string on the same returned list.
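The counting step of claim 45 and the revision of claim 46 admit more than one reading. The sketch below shows one interpretation: a word string counts as a subset when it appears as a contiguous run of words, and an occurrence is ignored when a longer word string from the same list covers it in the same context. The names `contains`, `count_subsets` and `revised_counts` are hypothetical.

```python
def contains(haystack, needle):
    """True when the words of `needle` appear contiguously in `haystack`."""
    h, n = haystack.split(), needle.split()
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))


def count_subsets(candidates, second_list):
    """Count how often each candidate occurs inside a string of second_list."""
    return {
        c: sum(1 for entry in second_list if contains(entry, c))
        for c in candidates
    }


def revised_counts(candidates, second_list):
    """Same count, ignoring occurrences where a longer candidate from the
    same list is also present in the containing string."""
    counts = {}
    for c in candidates:
        longer = [o for o in candidates if o != c and contains(o, c)]
        counts[c] = sum(
            1
            for entry in second_list
            if contains(entry, c) and not any(contains(entry, o) for o in longer)
        )
    return counts


candidates = ["coche", "coche rojo"]
second_list = ["el coche rojo es nuevo", "un coche rojo", "mi coche azul"]
raw = count_subsets(candidates, second_list)
revised = revised_counts(candidates, second_list)
```

In the sample data "coche" occurs three times in total, but two of those occurrences are part of "coche rojo", so its revised count drops to one.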
47. The method of claim 45, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
48. The method of claim 45 or 46, characterized in that it further comprises:
Ranking said list of word string results based on user-defined criteria.
49. The computer apparatus of claim 34, characterized in that it is further configured to perform the following steps:
Providing a corpus of documents expressed in said first language;
Identifying, in said corpus of documents expressed in said first language, a user-defined number of occurrences of said query;
Analyzing a user-defined number of words and/or word strings to the left and to the right of each said occurrence of said query, and identifying word strings composed of the user-defined number of words and/or word strings to the left of said query, said query itself, and the user-defined number of words and/or word strings to the right of said query;
Creating a list of the returned word strings that constitute the results of said analyzing step;
Analyzing each returned word string separately, and using said provided translations to identify all translations in said second language of each word that makes up each said returned word string;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said creating step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once;
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results;
Analyzing said word string list and said second word string list to identify the number of times each word string in said word string list occurs as a subset word string of a word string in said second word string list; and
Returning a list based on said step of analyzing said word string list and said second word string list.
50. The computer apparatus of claim 49, characterized in that said step of analyzing said word string list and said second word string list comprises revising said number of occurrences by ignoring each occurrence of a word string in which that word string is a subset of a longer word string on the same returned list.
51. The computer apparatus of claim 49, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
52. The computer apparatus of claim 49 or 50, characterized in that it is further configured to perform the following step:
Ranking said list of word string results based on user-defined criteria.
53. The computer-readable storage medium of claim 38, characterized in that it is further configured to perform the following steps:
Providing a corpus of documents expressed in said first language;
Identifying, in said corpus of documents expressed in said first language, a user-defined number of occurrences of said query;
Analyzing a user-defined number of words and/or word strings to the left and to the right of each said occurrence of said query, and identifying word strings composed of the user-defined number of words and/or word strings to the left of said query, said query itself, and the user-defined number of words and/or word strings to the right of said query;
Creating a list of the returned word strings that constitute the results of said analyzing step;
Analyzing each returned word string separately, and using said provided translations to identify all translations in said second language of each word that makes up each said returned word string;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said creating step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once;
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results;
Analyzing said word string list and said second word string list to identify the number of times each word string in said word string list occurs as a subset word string of a word string in said second word string list; and
Returning a list based on said step of analyzing said word string list and said second word string list.
54. The computer-readable medium of claim 53, characterized in that said step of analyzing said word string list and said second word string list comprises revising said number of occurrences by ignoring each occurrence of a word string in which that word string is a subset of a longer word string on the same returned list.
55. The computer-readable medium of claim 53, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
56. The computer-readable medium of claim 53, characterized in that it is further configured to perform the following step:
Ranking said list of word string results based on user-defined criteria.
57. A method of symbolically representing an association for efficient transmission of information, characterized in that said method comprises the following steps:
Creating an association; and
Symbolically representing said association by assigning a mark equivalent to said association;
Wherein said creating an association comprises:
Providing translations of word strings expressed in a first language by means of words and/or word strings expressed in a second language;
Providing a corpus of documents expressed in said second language;
Receiving a query to be analyzed, said query being expressed in a source language and consisting of a word or a word string;
For said query, using said provided translations to identify all translations in said second language of each word that makes up said word string query;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said identifying step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once;
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as results;
Providing a corpus of documents expressed in said first language;
Identifying, in said corpus of documents expressed in said first language, a user-defined number of occurrences of said query;
Analyzing a user-defined number of words and/or word strings to the left and to the right of each said occurrence of said query, and identifying word strings composed of the user-defined number of words and/or word strings to the left of said query, said query itself, and the user-defined number of words and/or word strings to the right of said query;
Creating a list of the returned word strings that constitute the results of said analyzing step;
Analyzing each returned word string separately, and using said provided translations to identify all translations in said second language of each word that makes up each said returned word string;
Analyzing said corpus of documents to find word strings expressed in said second language, wherein said analysis identifies only word strings having no more than a user-defined maximum number of words, said analysis identifies only word strings containing translations, obtained in said creating step, of at least a user-defined minimum number of the words expressed in the first language, and said analysis counts the translation of each of said words expressed in the first language only once;
Returning, from said analysis of said corpus of documents, a list of said word strings expressed in said second language as word string results;
Analyzing said word string list and said second word string list to identify the number of times each word string in said word string list occurs as a subset word string of a word string in said second word string list; and
Returning a list based on said step of analyzing said word string list and said second word string list.
58. The method of claim 57, characterized in that it further comprises:
Transmitting said mark from one location to a second location or to a plurality of second locations;
At said second location or plurality of second locations, analyzing said assigned mark and identifying said association; and
Providing said association to a user.
59. The method of claim 57, characterized in that the words expressed in the first language include specific word strings of the first language, such as idioms and collocations.
60. A method of acquiring a knowledge base of associated concepts, characterized in that said method comprises the steps of:
Providing translations of word strings expressed in a source language by means of word strings expressed in a target language;
Receiving two content segments expressed in said source language, wherein the first segment and the second segment have an overlapping portion of said content;
Translating said first content segment using said translations of word strings, thereby returning a third segment expressed in said target language;
Translating said second content segment using said translations of word strings, thereby returning a fourth segment expressed in said target language;
Analyzing said third segment and said fourth segment to determine whether said third segment and said fourth segment have an overlapping portion;
If said third segment and said fourth segment have an overlapping portion, associating the overlapping portion of said third segment and said fourth segment with the overlapping portion of said first segment and said second segment; and
If said third segment and said fourth segment have an overlapping portion, associating the single target-language word string obtained by combining said third segment and said fourth segment with said overlapping portion merged, with the single source-language word string obtained by combining said first segment and said second segment with said overlapping portion merged.
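The steps of claim 60 can be sketched as follows, assuming that the provided translations form a phrase-level lookup table and that an overlap is a run of words shared by the end of one segment and the start of the next. The names `overlap_size`, `associate_segments` and `phrase_table` are hypothetical.

```python
def overlap_size(a, b):
    """Number of words shared by the end of list `a` and the start of `b`."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return size
    return 0


def associate_segments(first, second, translate):
    """Return (source, target) associations derived from two overlapping
    source-language segments; `translate` maps a segment to its translation."""
    s1, s2 = first.split(), second.split()
    t1, t2 = translate(first).split(), translate(second).split()   # third and fourth segments
    src_n = overlap_size(s1, s2)
    tgt_n = overlap_size(t1, t2)
    if src_n == 0 or tgt_n == 0:
        return []          # no overlap on one side, so nothing is associated
    return [
        # overlapping portion of the source <-> overlapping portion of the target
        (" ".join(s1[-src_n:]), " ".join(t1[-tgt_n:])),
        # merged source word string <-> merged target word string
        (" ".join(s1 + s2[src_n:]), " ".join(t1 + t2[tgt_n:])),
    ]


phrase_table = {
    "I want to buy": "quiero comprar",
    "to buy a car": "comprar un coche",
}
pairs = associate_segments("I want to buy", "to buy a car", phrase_table.get)
```

For the sample table, the shared source words "to buy" are associated with "comprar", and the merged source string is associated with the merged target string.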
61. A computer apparatus comprising a processor, a memory connected to said processor, and a program stored in said memory, characterized in that said computer is configured to execute said program and to perform the following steps:
Providing translations of word strings expressed in a source language by means of word strings expressed in a target language;
Receiving two content segments expressed in said source language, wherein the first segment and the second segment have an overlapping portion of said content;
Translating said first content segment using said translations of word strings, thereby returning a third segment expressed in said target language;
Translating said second content segment using said translations of word strings, thereby returning a fourth segment expressed in said target language;
Analyzing said third segment and said fourth segment to determine whether said third segment and said fourth segment have an overlapping portion;
If said third segment and said fourth segment have an overlapping portion, associating the overlapping portion of said third segment and said fourth segment with the overlapping portion of said first segment and said second segment; and
If said third segment and said fourth segment have an overlapping portion, associating the single target-language word string obtained by combining said third segment and said fourth segment with said overlapping portion merged, with the single source-language word string obtained by combining said first segment and said second segment with said overlapping portion merged.
62. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that the program is for performing the following steps:
Providing translations of word strings expressed in a source language using word strings expressed in a target language;
Receiving two content fragments expressed in the source language, wherein the first fragment and the second fragment have an overlapping portion of the content;
Translating the first content fragment using the translations of the word strings, thereby returning a third fragment expressed in the target language;
Translating the second content fragment using the translations of the word strings, thereby returning a fourth fragment expressed in the target language;
Analyzing the third fragment and the fourth fragment to determine whether they have an overlapping portion;
If the third fragment and the fourth fragment have an overlapping portion, associating the overlapping portion of the third and fourth fragments with the overlapping portion of the first and second fragments; and
If the third fragment and the fourth fragment have an overlapping portion, associating the single target-language word string obtained by combining the third and fourth fragments with their overlapping portion merged with the single source-language word string obtained by combining the first and second fragments with their overlapping portion merged.
63. A method of symbolically representing an association for efficient information transfer, characterized in that the method comprises the following steps:
Creating an association; and
Symbolically representing the association by assigning a token equivalent to the association;
Wherein creating the association comprises:
Providing translations of word strings expressed in a source language using word strings expressed in a target language;
Receiving two content fragments expressed in the source language, wherein the first fragment and the second fragment have an overlapping portion of the content;
Translating the first content fragment using the translations of the word strings, thereby returning a third fragment expressed in the target language;
Translating the second content fragment using the translations of the word strings, thereby returning a fourth fragment expressed in the target language;
Analyzing the third fragment and the fourth fragment to determine whether they have an overlapping portion;
If the third fragment and the fourth fragment have an overlapping portion, associating the overlapping portion of the third and fourth fragments with the overlapping portion of the first and second fragments; and
If the third fragment and the fourth fragment have an overlapping portion, associating the single target-language word string obtained by combining the third and fourth fragments with their overlapping portion merged with the single source-language word string obtained by combining the first and second fragments with their overlapping portion merged.
64. The method as claimed in claim 63, characterized in that it further comprises:
Sending the token from one location to a second location or a plurality of second locations;
At the second location or the plurality of second locations, analyzing the assigned token and identifying the association; and
Providing the association to a user.
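One way to picture the token scheme of claims 63 and 64 is a codebook that maps each association to a small integer, so that only the integer has to be transmitted. The sketch below is an assumption-laden illustration: the class name, the integer tokens, and the premise that sender and receiver already hold the same codebook are not taken from the claims.

```python
class AssociationCodebook:
    """Hypothetical codebook assigning one integer token per association."""

    def __init__(self):
        self._by_token = []        # token -> association
        self._by_association = {}  # association -> token

    def assign(self, association):
        """Return the token for `association`, creating one if needed."""
        token = self._by_association.get(association)
        if token is None:
            token = len(self._by_token)
            self._by_association[association] = token
            self._by_token.append(association)
        return token

    def resolve(self, token):
        """Identify the association that a received token stands for."""
        return self._by_token[token]


book = AssociationCodebook()
token = book.assign(("to buy", "acheter"))  # value sent to the second location
received = book.resolve(token)              # association shown to the user
```

Transmitting `token` instead of the full pair is what makes the transfer compact; the saving depends on how the codebook itself is shared, which this sketch leaves open.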
65. A method of converting content and rebuilding a knowledge base, characterized in that the method comprises the following steps:
a. receiving content expressed in a first language;
b. parsing the content expressed in the first language into a plurality of fragments;
c. selecting a first fragment and a second fragment, the first fragment and the second fragment having an overlapping portion of the content;
d. accessing a first target fragment of the content expressed in a second language, the first target fragment corresponding to one of the first fragment and the second fragment;
e. accessing a second target fragment of the content expressed in the second language, the second target fragment corresponding to the other of the first fragment and the second fragment and having an overlapping portion with the first target fragment;
f. determining the content expressed in the second language based on the combination obtained by merging the overlapping portions of the first target fragment and the second target fragment;
g. providing the content expressed in the second language;
h. for all of the plurality of fragments, repeating steps c to g, wherein the second fragment is designated as the first fragment and a next fragment having an overlapping portion with the second fragment is designated as the second fragment; and
i. for all next fragments of the plurality of fragments, repeating step h until the content has been entirely converted into the second language.
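The chaining described in steps c to i can be sketched as follows. This is a hedged illustration, not the claimed method itself: the names `overlap_merge`, `convert`, and `candidates` are hypothetical, the candidate table is toy data, and an overlap is assumed to be a run of words that ends one target fragment and begins the next.

```python
def overlap_merge(left, right):
    """Merge two strings whose word-level end and start overlap, else None."""
    a, b = left.split(), right.split()
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return " ".join(a + b[size:])
    return None


def convert(fragments, candidates):
    """Convert a chain of overlapping source fragments.

    `candidates` maps each source fragment to its possible target
    fragments. A candidate is kept only if it overlaps the target text
    built so far, which is how ambiguous translations are discarded.
    """
    results = list(candidates.get(fragments[0], []))
    for fragment in fragments[1:]:
        merged = []
        for partial in results:
            for candidate in candidates.get(fragment, []):
                combined = overlap_merge(partial, candidate)
                if combined is not None:
                    merged.append(combined)
        results = merged
    return results[0] if results else None


fragments = ["I want to buy", "to buy a car", "a car today"]
candidates = {
    "I want to buy": ["je veux acheter"],
    "to buy a car": ["acheter un wagon", "acheter une voiture"],
    "a car today": ["une voiture aujourd'hui"],
}
converted = convert(fragments, candidates)
```

In this toy data the candidate "acheter un wagon" is dropped because it shares no overlapping words with the target fragment for "a car today", leaving a single consistent conversion.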
66. A method of converting document content by rebuilding a knowledge base, characterized in that the method comprises the following steps:
Using a database of fragment associations between content in a first language and content in a second language, wherein the conversion comprises: parsing and examining overlapping fragments of the document content in the first language against their appropriate translations in the second language that likewise have overlapping content fragments; merging the overlapping fragments of the examined first-language content and of the examined second-language content; and, after the overlapping fragments are merged, associating the first-language content with the second-language content.
67. A method of converting a document and rebuilding a knowledge base, characterized in that the method comprises the following steps:
a. providing content comprising data fragments in a first language and the data fragments in a second language associated with them;
b. selecting, from a document to be translated, a data fragment in the first language that starts at the first word of the document and is present in the database;
c. retrieving from the database the second-language fragment associated with the located first fragment in the first language;
d. selecting at least one second parsed fragment in the first language that has one or more overlapping portions with the previously parsed first-language fragment;
e. retrieving from the database the second fragment in the second language associated with the selected second fragment in the first language;
f. returning the two data fragments in the first language and merging their overlapping portion so that they become a single data fragment in the first language;
g. if the two data fragments in the second language have an overlapping portion, returning the single data fragment in the second language obtained by merging the overlapping portion; and
h. associating the single data fragment in the first language with the single data fragment in the second language, thereby returning the conversion of the single data fragment from the first language to the second language.
68. The method as claimed in claim 67, characterized in that it further comprises:
Designating the next data fragment in the first-language document that overlaps the previous first-language data fragment as the second parsed fragment in the first language, and repeating steps d to h.
69. The method as claimed in claim 68, characterized in that it further comprises:
For every next data fragment in the first-language document that overlaps the previous first-language data fragment, repeating steps d to h until the entire document has been converted.
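Steps b and d of claim 67, repeated as in claims 68 and 69, amount to cutting the document into database fragments that each overlap their predecessor. The sketch below shows one possible greedy strategy; the longest-match rule, the `max_len` limit, and all names are assumptions introduced for illustration, since the claims do not prescribe how candidate fragments are chosen.

```python
def longest_known(words, start, database, max_len):
    """End index of the longest database fragment starting at `start`."""
    for end in range(min(len(words), start + max_len), start, -1):
        if " ".join(words[start:end]) in database:
            return end
    return None


def split_into_overlapping(document, database, max_len=5):
    """Cut `document` into database fragments, each overlapping the last.

    Returns None when the document cannot be covered this way.
    """
    words = document.split()
    end = longest_known(words, 0, database, max_len)  # step b: first word
    if end is None:
        return None
    spans = [(0, end)]
    while spans[-1][1] < len(words):
        prev_start, prev_end = spans[-1]
        best = None
        # Step d: the next fragment must start inside the previous one
        # (so they overlap) and must extend beyond it.
        for start in range(prev_start + 1, prev_end):
            end = longest_known(words, start, database, max_len)
            if end is not None and end > prev_end:
                if best is None or end > best[1]:
                    best = (start, end)
        if best is None:
            return None
        spans.append(best)
    return [" ".join(words[s:e]) for s, e in spans]


database = {"I want to buy", "to buy a car", "a car today"}
parts = split_into_overlapping("I want to buy a car today", database)
```

Each fragment in `parts` could then be looked up for its second-language counterpart (steps c and e) and merged on the overlaps (steps f to h).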
70. The method as claimed in claim 67, characterized in that the fragment takes the form of one word or a plurality of words.
71. The method as claimed in claim 67, characterized in that the fragment takes the form of a plurality of words.
72. A method of converting a document, characterized in that the method comprises the following steps:
a. providing content comprising data fragments in a first language and the data fragments in a second language associated with them;
b. selecting, from a document to be translated, a data fragment in the first language that starts at the first word of the document and is present in the database;
c. retrieving from the database the second-language fragment associated with the located first fragment in the first language;
d. selecting at least one second parsed fragment in the first language that has one or more overlapping portions with the previously parsed first-language fragment;
e. retrieving from the database the second fragment in the second language associated with the selected second fragment in the first language, the retrieved fragment having an overlapping portion, in the second language, with the previously retrieved second-language fragment; and
f. the two second-language fragments, combined with their overlapping portion merged, constituting the translation of the two first-language fragments with their overlapping portion merged.
73. The method as claimed in claim 72, characterized in that it further comprises:
Designating the next fragment as the second parsed fragment, and repeating steps d to f until the document has been completely converted into the second language.
74. A computer device comprising a processor, a memory connected to the processor, and a program stored in the memory, characterized in that the computer device is configured to execute the program and perform the following steps:
a. receiving content expressed in a first language;
b. parsing the content expressed in the first language into a plurality of fragments;
c. selecting a first fragment and a second fragment, the first fragment and the second fragment having an overlapping portion of the content;
d. accessing a first target fragment of the content expressed in a second language, the first target fragment corresponding to one of the first fragment and the second fragment;
e. accessing a second target fragment of the content expressed in the second language, the second target fragment corresponding to the other of the first fragment and the second fragment and having an overlapping portion with the first target fragment;
f. determining the content expressed in the second language based on the combination obtained by merging the overlapping portions of the first target fragment and the second target fragment;
g. providing the content expressed in the second language;
h. for all of the plurality of fragments, repeating steps c to g, wherein the second fragment is designated as the first fragment and a next fragment having an overlapping portion with the second fragment is designated as the second fragment; and
i. for all next fragments of the plurality of fragments, repeating step h until the content has been entirely converted into the second language.
75. A computer device comprising a processor, a memory connected to the processor, and a program stored in the memory, characterized in that the computer device is configured to execute the program and perform the following steps:
a. providing content comprising data fragments in a first language and the data fragments in a second language associated with them;
b. selecting, from a document to be translated, a data fragment in the first language that starts at the first word of the document and is present in the database;
c. retrieving from the database the second-language fragment associated with the located first fragment in the first language;
d. selecting at least one second parsed fragment in the first language that has one or more overlapping portions with the previously parsed first-language fragment;
e. retrieving from the database the second fragment in the second language associated with the selected second fragment in the first language;
f. returning the two data fragments in the first language and merging their overlapping portion so that they become a single data fragment in the first language;
g. if the two data fragments in the second language have an overlapping portion, returning the single data fragment in the second language obtained by merging the overlapping portion; and
h. associating the single data fragment in the first language with the single data fragment in the second language, thereby returning the conversion of the single data fragment from the first language to the second language.
76. The computer device as claimed in claim 75, characterized in that it is further configured to:
Designate the next data fragment in the first-language document that overlaps the previous first-language data fragment as the second parsed fragment in the first language, and repeat steps d to h.
77. The computer device as claimed in claim 76, characterized in that it is further configured to:
For every next data fragment in the first-language document that overlaps the previous first-language data fragment, repeat steps d to h until the content of the entire document has been converted.
78. The computer device as claimed in claim 75, characterized in that the fragment takes the form of one word or a plurality of words.
79. The computer device as claimed in claim 75, characterized in that the fragment takes the form of a plurality of words.
80. A computer device comprising a processor, a memory connected to the processor, and a program stored in the memory, characterized in that the computer device is configured to execute the program and perform the following steps:
a. receiving content expressed in a first language;
b. parsing the content expressed in the first language into a plurality of fragments;
c. selecting a first fragment and a second fragment, the first fragment and the second fragment having an overlapping portion of the content;
d. accessing a first target fragment of the content expressed in a second language, the first target fragment corresponding to one of the first fragment and the second fragment;
e. retrieving from the database the second fragment in the second language associated with the selected second fragment in the first language, the retrieved fragment having an overlapping portion, in the second language, with the previously retrieved second-language fragment; and
f. the two second-language fragments, combined with their overlapping portion merged, constituting the translation of the two first-language fragments with their overlapping portion merged.
81. The computer device as claimed in claim 80, characterized in that it is further configured to:
Designate the next fragment as the second parsed fragment, and repeat steps d to f until the document has been completely converted into the second language.
82. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that the program is for performing the following steps:
a. receiving content expressed in a first language;
b. parsing the content expressed in the first language into a plurality of fragments;
c. selecting a first fragment and a second fragment, the first fragment and the second fragment having an overlapping portion of the content;
d. accessing a first target fragment of the content expressed in a second language, the first target fragment corresponding to one of the first fragment and the second fragment;
e. accessing a second target fragment of the content expressed in the second language, the second target fragment corresponding to the other of the first fragment and the second fragment and having an overlapping portion with the first target fragment;
f. determining the content expressed in the second language based on the combination obtained by merging the overlapping portions of the first target fragment and the second target fragment;
g. providing the content expressed in the second language;
h. for all of the plurality of fragments, repeating steps c to g, wherein the second fragment is designated as the first fragment and a next fragment having an overlapping portion with the second fragment is designated as the second fragment; and
i. for all next fragments of the plurality of fragments, repeating step h until the content has been entirely converted into the second language.
83. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that the program is for performing the following steps:
a. providing content comprising data fragments in a first language and the data fragments in a second language associated with them;
b. selecting, from a document to be translated, a data fragment in the first language that starts at the first word of the document and is present in the database;
c. retrieving from the database the second-language fragment associated with the located first fragment in the first language;
d. selecting at least one second parsed fragment in the first language that has one or more overlapping portions with the previously parsed first-language fragment;
e. retrieving from the database the second fragment in the second language associated with the selected second fragment in the first language;
f. returning the two data fragments in the first language and merging their overlapping portion so that they become a single data fragment in the first language;
g. if the two data fragments in the second language have an overlapping portion, returning the single data fragment in the second language obtained by merging the overlapping portion; and
h. associating the single data fragment in the first language with the single data fragment in the second language, thereby returning the conversion of the single data fragment from the first language to the second language.
84. The computer-readable medium as claimed in claim 83, characterized in that the program is further for:
Designating the next data fragment in the first-language document that overlaps the previous first-language data fragment as the second parsed fragment in the first language, and repeating steps d to h.
85. The computer-readable medium as claimed in claim 84, characterized in that the program is further for:
For every next data fragment in the first-language document that overlaps the previous first-language data fragment, repeating steps d to h until the content of the entire document has been converted.
86. The computer-readable medium as claimed in claim 84, characterized in that the fragment takes the form of one word or a plurality of words.
87. The computer-readable medium as claimed in claim 83, characterized in that the fragment takes the form of a plurality of words.
88. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that the program is for performing the following steps:
a. providing content comprising data fragments in a first language and the data fragments in a second language associated with them;
b. selecting, from a document to be translated, a data fragment in the first language that starts at the first word of the document and is present in the database;
c. retrieving from the database the second-language fragment associated with the located first fragment in the first language;
d. selecting at least one second parsed fragment in the first language that has one or more overlapping portions with the previously parsed first-language fragment;
e. retrieving from the database the second fragment in the second language associated with the selected second fragment in the first language, the retrieved fragment having an overlapping portion, in the second language, with the previously retrieved second-language fragment; and
f. the two second-language fragments, combined with their overlapping portion merged, constituting the translation of the two first-language fragments with their overlapping portion merged.
89. The computer-readable medium as claimed in claim 88, characterized in that the program is further for:
Designating the next fragment as the second parsed fragment, and repeating steps d to f until the document has been completely converted into the second language.
90. A computer system for converting content and rebuilding a knowledge base, characterized in that the system comprises:
a. a computing device that receives content expressed in a first language and parses the content into at least a first fragment and a second fragment, the first fragment having a first portion and the second fragment having a second portion, the first portion and the second portion having an overlapping portion of the content;
b. the computing device accessing third and fourth fragments of the content expressed in a second language, the third fragment corresponding to one of the first fragment and the second fragment, the fourth fragment corresponding to the other and having a portion that overlaps the third fragment; and
c. the computing device determining the content expressed in the second language based on the third and fourth fragments having the overlapping portion, and providing the content in the second language.
91. The computer system as claimed in claim 90, characterized in that it further comprises a database system storing the third and fourth fragments, wherein the computing device accesses the third and fourth fragments through the database system.
92. The computer system as claimed in claim 90, characterized in that the second fragment of the content is designated as the first fragment of the content in the first language, a next content fragment in the first language having an overlapping portion with the designated first fragment is designated as the second fragment of the content in the first language, and steps a to c are repeated for each next fragment of the content until the entire content has been converted.
93. A method of creating a frequency association database for a single language, characterized in that the method comprises:
Providing a set of documents, wherein the set comprises at least one document;
Receiving from a user a word or word string query to be analyzed;
Searching the set of documents for occurrences of the query;
Creating a list of the words and word strings that appear within a user-defined number of words of the query; and
Tabulating the frequency of occurrence of all recurring words and word strings that appear within the user-defined number of words of the query.
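The frequency table of claim 93 can be sketched in a few lines of Python. The sketch makes several assumptions that the claim does not state: text is split on whitespace and lowercased, word strings are limited to `max_len` words, and "recurring" is read as appearing at least twice. All names are hypothetical.

```python
from collections import Counter


def nearby_frequencies(documents, query, window, max_len=2):
    """Count words and word strings found within `window` words of `query`.

    Only recurring entries (count of two or more) are returned.
    """
    q = query.lower().split()
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        for i in range(len(words) - len(q) + 1):
            if words[i:i + len(q)] != q:
                continue
            left = words[max(0, i - window):i]
            right = words[i + len(q):i + len(q) + window]
            for side in (left, right):
                # Single words (n == 1) and longer word strings (n > 1).
                for n in range(1, max_len + 1):
                    for j in range(len(side) - n + 1):
                        counts[" ".join(side[j:j + n])] += 1
    return {term: c for term, c in counts.items() if c >= 2}


documents = [
    "the stock market fell sharply",
    "the stock market rose",
    "a flea market",
]
table = nearby_frequencies(documents, "market", window=2)
```

With these three toy documents, `table` keeps "the", "stock", and "the stock", each of which occurs twice near the query.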
94. The method as claimed in claim 93, characterized in that it further comprises the step of creating a proximity list of the words and word strings that appear within the user-defined number of words of the query.
95. The method as claimed in claim 93, characterized in that it further comprises associating two or more words, word strings, or both on the word list.
96. The method as claimed in claim 93 or 94, characterized in that one or more of the list of words and word strings, the list of frequencies of occurrence, and the proximity list of words and word strings are returned to the user.
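The claims do not define the layout of the proximity list of claim 94. One plausible reading, sketched below purely as an assumption, records for each nearby word its signed distance from the query: negative to the left, positive to the right. Single words only are tracked here to keep the example short, and the names are hypothetical.

```python
from collections import defaultdict


def proximity_list(documents, query, window):
    """Map each word near `query` to its signed distances from the query."""
    q = query.lower().split()
    offsets = defaultdict(list)
    for doc in documents:
        words = doc.lower().split()
        for i in range(len(words) - len(q) + 1):
            if words[i:i + len(q)] != q:
                continue
            # Words to the left of the query get negative distances.
            for j in range(max(0, i - window), i):
                offsets[words[j]].append(j - i)
            # Words to the right of the query get positive distances.
            end = i + len(q)
            for j in range(end, min(len(words), end + window)):
                offsets[words[j]].append(j - end + 1)
    return dict(offsets)


documents = ["the stock market fell", "stock market rose"]
near = proximity_list(documents, "market", window=2)
```

A structure like `near` could be returned to the user alongside the word list and the frequency list mentioned in claim 96.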
97. A computer device comprising a processor, a memory connected to the processor, and a program stored in the memory, characterized in that the computer device is configured to execute the program and perform the following steps:
Providing a set of documents, wherein the set comprises at least one document;
Receiving from a user a word or word string query to be analyzed;
Searching the set of documents for occurrences of the query;
Creating a list of the words and word strings that appear within a user-defined number of words of the query; and
Tabulating the frequency of occurrence of all recurring words and word strings that appear within the user-defined number of words of the query.
98. The computer device as claimed in claim 97, characterized in that it is further configured to create a proximity list of the words and word strings that appear within the user-defined number of words of the query.
99. The computer device as claimed in claim 97, characterized in that it is further configured to associate two or more words, word strings, or both on the word list.
100. The computer device as claimed in claim 97 or 98, characterized in that one or more of the list of words and word strings, the list of frequencies of occurrence, and the proximity list of words and word strings are returned to the user.
101. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that the program is for performing the following steps:
Providing a set of documents, wherein the set comprises at least one document;
Receiving from a user a word or word string query to be analyzed;
Searching the set of documents for occurrences of the query;
Creating a list of the words and word strings that appear within a user-defined number of words of the query; and
Tabulating the frequency of occurrence of all recurring words and word strings that appear within the user-defined number of words of the query.
102. The computer-readable medium as claimed in claim 101, characterized in that the steps further comprise creating a proximity list of the words and word strings that appear within the user-defined number of words of the query.
103. The computer-readable medium as claimed in claim 101, characterized in that the steps further comprise associating two or more words, word strings, or both on the word list.
104. The computer-readable medium as claimed in claim 101 or 102, characterized in that one or more of the list of words and word strings, the list of frequencies of occurrence, and the proximity list of words and word strings are returned to the user.
105. The method as claimed in claim 93, characterized in that it further comprises:
Receiving from the user a second word or word string query to be analyzed;
Searching the set of documents for occurrences of the second query;
Creating a second list of the words and word strings that appear within a user-defined number of words of the second query;
Creating a second list of the frequencies of occurrence of all recurring words and word strings that appear within the user-defined number of words of the second query;
Creating a third list of the words and word strings that appear on both the list of words and word strings within the user-defined number of words of the query and the second list of words and word strings within the user-defined number of words of the second query; and
Associating the words and word strings on the third list with the first query and the second query.
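The third list of claim 105 is the intersection of the two context lists. The sketch below illustrates this under stated assumptions: whitespace tokenization, lowercasing, single words only, and hypothetical function names. It does not cover the user-defined modification or ranking of claims 106 and 107.

```python
def context_terms(documents, query, window):
    """Set of words found within `window` words of `query`."""
    q = query.lower().split()
    terms = set()
    for doc in documents:
        words = doc.lower().split()
        for i in range(len(words) - len(q) + 1):
            if words[i:i + len(q)] == q:
                terms.update(words[max(0, i - window):i])
                terms.update(words[i + len(q):i + len(q) + window])
    return terms


def shared_context(documents, first_query, second_query, window):
    """Third list: terms near both queries, each tied to the two queries."""
    first = context_terms(documents, first_query, window)
    second = context_terms(documents, second_query, window)
    third = first & second
    return {term: (first_query, second_query) for term in sorted(third)}


documents = [
    "doctor works at hospital",
    "nurse works at clinic",
    "doctor prescribes medicine",
]
third_list = shared_context(documents, "doctor", "nurse", window=2)
```

In this toy corpus only "at" and "works" occur near both queries, so they are the entries associated with the pair ("doctor", "nurse").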
106. The method as claimed in claim 105, characterized in that the third list of words and word strings is modified according to user-defined criteria.
107. The method as claimed in claim 105, characterized in that the third list of words and word strings is ranked based on user-defined parameters.
108. The computer device as claimed in claim 97, characterized in that it is further configured to perform the following steps:
Receiving from the user a second word or word string query to be analyzed;
Searching the set of documents for occurrences of the second query;
Creating a second list of the words and word strings that appear within a user-defined number of words of the second query;
Creating a second list of the frequencies of occurrence of all recurring words and word strings that appear within the user-defined number of words of the second query;
Creating a third list of the words and word strings that appear on both the list of words and word strings within the user-defined number of words of the query and the second list of words and word strings within the user-defined number of words of the second query; and
Associating the words and word strings on the third list with the first query and the second query.
109. The computer device as claimed in claim 108, characterized in that the third list of words and word strings is modified according to user-defined criteria.
110. The computer device as claimed in claim 108, characterized in that the third list of words and word strings is ranked based on user-defined parameters.
111. The computer medium of claim 101, characterized in that it further comprises:
Receiving from the user a second word or word string query to be analyzed;
Searching said set of documents for occurrences of said second query;
Creating a second list of the words and word strings that appear within a user-defined number of words of said second query;
Creating a second list of the frequency of occurrence of all recurring words and word strings that appear within a user-defined number of words of said second query;
Creating a third list of the words and word strings that appear both within a user-defined number of words of said query and within a user-defined number of words of said second query, that is, the words and word strings found on both said word and word string list and said second word and word string list; and
Associating the words and word strings on said third list with said first query and said second query.
112. The computer medium of claim 111, characterized in that said third word and word string list is modified according to user-defined criteria.
113. The computer medium of claim 111, characterized in that said third word and word string list is ranked based on user-defined parameters.
114. the method for the word in a kind of language of association is characterized in that described method comprises:
One group of document is provided, comprises a document at least in wherein said one group of document;
Select first word or word strings, reach second word or word strings;
Location wherein first word or word strings appears at second word or word strings and has all documents in the scope of proximity, and the scope of the proximity of described definition has upper and lower bound;
The range of definition in the document of being located wherein defines described scope with respect to first word or word strings and second word or word strings;
Search for described scope, search and reappear word and word strings; And
Based on reappearing the word and the frequency of occurrences of word strings in described scope, related first word or word strings and second word or word strings and reproduction word and word strings.
115. as the described method of claim 114, it is characterized in that, strengthen first word of described association or word strings and second word or word strings by the word or the higher frequency of occurrences of word strings.
116. as the described method of claim 114, it is characterized in that, strengthen first word of described association or word strings and second word or word strings by the word or the lower frequency of occurrences of word strings.
117., it is characterized in that the described upper limit and the described lower limit of the scope of the proximity of described definition equate as the described method of claim 114.
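As a rough illustration of claims 114 to 117, the sketch below locates pairs of occurrences whose distance falls between a lower and an upper limit, treats the words lying between the two terms as the searched range, and counts the words recurring there. The function name `associate`, the choice of "words between the two terms" as the range, the measurement of distance as a difference in token positions, and the handling of single words only are assumptions, not details taken from the claims.

```python
from collections import Counter

def associate(docs, first, second, lower, upper):
    """Count words lying between `first` and `second` when their
    distance in tokens is within [lower, upper]."""
    first, second = first.lower(), second.lower()
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        pos_a = [i for i, t in enumerate(tokens) if t == first]
        pos_b = [i for i, t in enumerate(tokens) if t == second]
        for a in pos_a:
            for b in pos_b:
                distance = abs(a - b)
                if lower <= distance <= upper:
                    lo, hi = min(a, b), max(a, b)
                    # Assumed range: the words strictly between the two terms.
                    counts.update(tokens[lo + 1:hi])
    # Highest frequency first; a higher count is read as a stronger association.
    return counts.most_common()

docs = ["coffee with hot milk", "coffee and hot milk", "milk tea"]
print(associate(docs, "coffee", "milk", 1, 3))
```

Setting `lower` equal to `upper` corresponds to the case of claim 117, in which only one exact distance qualifies.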
118. A computer apparatus comprising a processor, a memory coupled to said processor, and a program stored in said memory, characterized in that said computer is configured to execute said program and perform the following steps:
Providing a set of documents, wherein said set of documents contains at least one document;
Selecting a first word or word string and a second word or word string;
Locating all documents in which the first word or word string appears within a defined range of proximity of the second word or word string, said defined range of proximity having an upper limit and a lower limit;
Defining a range within the located documents, wherein said range is defined relative to the first word or word string and the second word or word string;
Searching said range for recurring words and word strings; and
Associating the first word or word string and the second word or word string with the recurring words and word strings, based on the frequency of occurrence of the recurring words and word strings within said range.
119. The computer apparatus of claim 118, characterized in that said association of the first word or word string and the second word or word string is strengthened by a higher frequency of occurrence of a word or word string.
120. The computer apparatus of claim 118, characterized in that said association of the first word or word string and the second word or word string is strengthened by a lower frequency of occurrence of a word or word string.
121. The computer apparatus of claim 118, characterized in that said upper limit and said lower limit of said defined range of proximity are equal.
122. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that said program is for performing the following steps:
Providing a set of documents, wherein said set of documents contains at least one document;
Selecting a first word or word string and a second word or word string;
Locating all documents in which the first word or word string appears within a defined range of proximity of the second word or word string, said defined range of proximity having an upper limit and a lower limit;
Defining a range within the located documents, wherein said range is defined relative to the first word or word string and the second word or word string;
Searching said range for recurring words and word strings; and
Associating the first word or word string and the second word or word string with the recurring words and word strings, based on the frequency of occurrence of the recurring words and word strings within said range.
123. The computer medium of claim 122, characterized in that said association of the first word or word string and the second word or word string is strengthened by a higher frequency of occurrence of a word or word string.
124. The computer medium of claim 122, characterized in that said association of the first word or word string and the second word or word string is strengthened by a lower frequency of occurrence of a word or word string.
125. The computer medium of claim 122, characterized in that said upper limit and said lower limit of said defined range of proximity are equal.
126. The method of claim 114, characterized in that it further comprises:
Designating either the first word or word string or the second word or word string as the first word or word string;
Selecting a third word or word string, wherein said third word or word string is a result of said associating step, and designating that result as the second word or word string; and
Repeating said selecting, locating, defining, searching and associating steps.
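The iteration described in claim 126 can be sketched as a loop in which the strongest new result of one association round becomes the second term of the next round. The names `between_counts` and `chain`, the decision to always keep the original first term, the use of a single `max_gap` limit, and the stop condition are illustrative assumptions only.

```python
from collections import Counter

def between_counts(docs, first, second, max_gap):
    """Count words between `first` and `second` when they are at most
    `max_gap` token positions apart."""
    counts = Counter()
    for doc in docs:
        t = doc.lower().split()
        for a in [i for i, w in enumerate(t) if w == first]:
            for b in [i for i, w in enumerate(t) if w == second]:
                if 0 < abs(a - b) <= max_gap:
                    counts.update(t[min(a, b) + 1:max(a, b)])
    return counts

def chain(docs, first, second, max_gap, rounds):
    """Repeat the association, feeding each top result back in as the second term."""
    path = [first, second]
    for _ in range(rounds):
        counts = between_counts(docs, first, second, max_gap)
        fresh = [w for w, _ in counts.most_common() if w not in path]
        if not fresh:
            break  # nothing new to associate
        second = fresh[0]  # assumed choice: keep `first`, replace `second`
        path.append(second)
    return path

docs = ["tea with hot milk", "tea and hot milk", "tea is hot"]
print(chain(docs, "tea", "milk", 3, 1))
```

Each pass widens the set of associated terms, which is one plausible reading of how repeated application builds up a network of related words.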
127. The computer apparatus of claim 118, characterized in that it is further configured to:
Designate either the first word or word string or the second word or word string as the first word or word string;
Select a third word or word string, wherein said third word or word string is a result of said associating step, and designate that result as the second word or word string; and
Repeat said selecting, locating, defining, searching and associating steps.
128. The computer medium of claim 122, characterized in that it is further configured to:
Designate either the first word or word string or the second word or word string as the first word or word string;
Select a third word or word string, wherein said third word or word string is a result of said associating step, and designate that result as the second word or word string; and
Repeat said selecting, locating, defining, searching and associating steps.
129. The method of claim 105, characterized in that it further comprises:
Designating either the first word or word string query or the second word or word string query as the first word or word string query;
Selecting a third word or word string, wherein said third word or word string is a result of said step of associating words and word strings, and designating that result as the second word or word string association; and
Repeating said steps of searching, creating the second word and word string list, creating the second frequency-of-occurrence list, creating the third word and word string list, and associating.
130. The computer apparatus of claim 108, characterized in that it further comprises:
Designating either the first word or word string query or the second word or word string query as the first word or word string query;
Selecting a third word or word string, wherein said third word or word string is a result of said step of associating words and word strings, and designating that result as the second word or word string association; and
Repeating said steps of searching, creating the second word and word string list, creating the second frequency-of-occurrence list, creating the third word and word string list, and associating.
131. The computer medium of claim 111, characterized in that it further comprises:
Designating either the first word or word string query or the second word or word string query as the first word or word string query;
Selecting a third word or word string, wherein said third word or word string is a result of said step of associating words and word strings, and designating that result as the second word or word string association; and
Repeating said steps of searching, creating the second word and word string list, creating the second frequency-of-occurrence list, creating the third word and word string list, and associating.
132. A method of associating words and word strings in a language, characterized in that said method comprises:
a. Providing a set of documents, wherein said set of documents contains at least one document;
b. Receiving from the user a word or word string query to be analyzed;
c. Searching said set of documents for the query to be analyzed, and returning the documents that contain the query to be analyzed;
d. In said returned documents, determining, based on their frequency, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a left signature list comprising said words or word strings or both located to the left of said query to be analyzed in said returned documents;
e. Searching said set of documents for the words and word strings on said left signature list;
f. Determining a user-defined number of words or word strings or both located to the right of said words or word strings or both on said left signature list, and, based on their frequency in the set of documents, creating a left anchor list comprising said words or word strings or both located to the right of said words or word strings or both on said left signature list;
g. In said returned documents, determining a user-defined number of words or word strings or both located to the right of said query to be analyzed, and, based on their frequency, creating a right signature list comprising said words or word strings or both located to the right of said query to be analyzed in said returned documents;
h. Searching said set of documents for the words and word strings on said right signature list;
i. Determining a user-defined number of words or word strings or both located to the left of said words or word strings or both on said right signature list, and, based on their frequency, creating a right anchor list comprising said words or word strings or both located to the left of said words or word strings or both on said right signature list;
j. Ranking the results based on the frequency of each word or word string appearing on said left anchor list and the frequency of said word or word string appearing on said right anchor list.
133. The method of claim 132, characterized in that said ranking of the results comprises multiplying the total frequency of each word or word string appearing on said left anchor list by the total frequency of said word or word string appearing on said right anchor list.
134. The method of claim 132, characterized in that said ranking of the results comprises, for each word or word string appearing on at least one left anchor list and at least one right anchor list, adding the total frequency of each word or word string appearing on said left anchor list to the total frequency of said word or word string appearing on said right anchor list.
135. The method of claim 133, characterized in that said ranking of the results is based on the total number of left anchor lists and the total number of right anchor lists on which said word or word string appears.
136. The method of claim 133, characterized in that said ranking of the results is based on user-defined parameters.
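Steps a to j of claim 132, together with the scoring rules of claims 133 and 134, can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `neighbors` and `rank_candidates` are hypothetical, signatures and anchors are limited to the single word string immediately adjacent to the term, `top` stands in for the user-defined number of signatures, and only candidates found on both anchor lists are scored.

```python
from collections import Counter

def neighbors(docs, phrase, side, width=1):
    """Count the `width`-word strings immediately left or right of `phrase`."""
    p = phrase.split()
    counts = Counter()
    for doc in docs:
        t = doc.lower().split()
        for i in range(len(t) - len(p) + 1):
            if t[i:i + len(p)] != p:
                continue
            if side == "left":
                span = t[max(0, i - width):i]
            else:
                span = t[i + len(p):i + len(p) + width]
            if len(span) == width:
                counts[" ".join(span)] += 1
    return counts

def rank_candidates(docs, query, top=5, combine="product"):
    query = query.lower()
    # Steps d and g: most frequent left and right signatures of the query.
    left_sigs = [s for s, _ in neighbors(docs, query, "left").most_common(top)]
    right_sigs = [s for s, _ in neighbors(docs, query, "right").most_common(top)]
    left_anchor, right_anchor = Counter(), Counter()
    for sig in left_sigs:    # steps e and f: what follows each left signature
        left_anchor.update(neighbors(docs, sig, "right"))
    for sig in right_sigs:   # steps h and i: what precedes each right signature
        right_anchor.update(neighbors(docs, sig, "left"))
    scores = {}
    for cand in set(left_anchor) & set(right_anchor):
        if cand == query:
            continue
        l, r = left_anchor[cand], right_anchor[cand]
        # Step j: multiply (as in claim 133) or add (as in claim 134).
        scores[cand] = l * r if combine == "product" else l + r
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

docs = ["a big dog barked", "a large dog barked", "a big cat slept",
        "a large dog ran", "a large cat slept"]
print(rank_candidates(docs, "big"))
```

In this toy corpus the only candidate that shares both the left context and the right contexts of the query "big" is "large", which is the kind of interchangeable term the anchor lists are meant to surface.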
137. The method of claim 133, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j to determine and return the results of the new query, and modifying said ranking of the result of said query based on the rank of the query on the results list of the new query.
138. The method of claim 133, characterized in that said results are modified by designating said result as a new query, repeating steps a to j to determine and return the results of the new query, and modifying said result of said query based on the rank of the query on the results list of the new query.
139. The method of claim 133, characterized in that said results are modified by designating each of a plurality of said results as a new query, repeating steps a to j to determine and return the results of each of the plurality of new queries, and modifying said results of said query based on the number of new-query lists on which the query and the result appear together.
140. The method of claim 133, characterized in that said ranking of the results is modified by designating each of a plurality of said results as a new query, repeating steps a to j to determine and return the results of each of the plurality of new queries, and modifying said ranking of the results of said query based on the number of new-query lists on which the query and the result appear together.
141. The method of claim 133, characterized in that said ranking of the results is modified by automatically designating each of said results as a new query, repeating steps a to j to determine and return the results of each new query, and modifying the ranking of the results of said query based on the rank of the query and of said results on the lists of the new queries.
142. The method of claim 133, characterized in that the results are modified by automatically designating each of said results as a new query, repeating steps a to j to determine and return the results of each new query, and modifying said results of said query based on the rank of the query and of said results on the lists of the new queries.
143. The method of claim 133, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of said result of said query based on the words and word strings on the left signature list and/or right signature list of the new query that do not appear on the left signature list and/or right signature list of the query.
144. The method of claim 133, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the left signature list and/or right signature list of the new query that do not appear on the left signature list and/or right signature list of the query.
145. The method of claim 133, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of said result of said query based on the words and word strings on the left signature list and/or right signature list of the query that do not appear on the left signature list and/or right signature list of the new query.
146. The method of claim 133, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the left signature list and/or right signature list of the query that do not appear on the left signature list and/or right signature list of the new query.
147. The method of claim 133, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
148. The method of claim 133, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the result of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
149. The method of claim 133, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the right signature list of the query that appear on the left signature list of the new query.
150. The method of claim 133, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the result of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
151. The method of claim 133, characterized in that it further comprises the following additional steps:
k. In said returned documents, determining, based on their frequency, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a second word string list comprising the query together with said words or word strings or both located to the left of said query;
l. For each word string on the second word string list, creating a word and word string association list by designating that word string as a new query and repeating steps c to h;
m. In said returned documents, determining, based on their frequency, a user-defined number of words or word strings or both located to the right of said query to be analyzed, and creating a third word string list comprising the query together with said words or word strings or both located to the right of said query;
n. For each word string on the third word string list, creating a second word and word string association list by designating that word string as a new query and repeating steps c to j;
o. Determining the word strings on said association list that overlap with word strings on said second association list; and
p. Identifying the words or word strings in the overlapping portion of the overlapping word strings as synonyms or near-synonyms of the query.
152. A computer apparatus comprising a processor, a memory coupled to said processor, and a program stored in said memory, characterized in that said computer is configured to execute said program and perform the following steps:
a. Providing a set of documents, wherein said set of documents contains at least one document;
b. Receiving from the user a word or word string query to be analyzed;
c. Searching said set of documents for the query to be analyzed, and returning the documents that contain the query to be analyzed;
d. In said returned documents, determining, based on their frequency, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a left signature list comprising said words or word strings or both located to the left of said query to be analyzed in said returned documents;
e. Searching said set of documents for the words and word strings on said left signature list;
f. Determining a user-defined number of words or word strings or both located to the right of said words or word strings or both on said left signature list, and, based on their frequency in the set of documents, creating a left anchor list comprising said words or word strings or both located to the right of said words or word strings or both on said left signature list;
g. In said returned documents, determining a user-defined number of words or word strings or both located to the right of said query to be analyzed, and, based on their frequency, creating a right signature list comprising said words or word strings or both located to the right of said query to be analyzed in said returned documents;
h. Searching said set of documents for the words and word strings on said right signature list;
i. Determining a user-defined number of words or word strings or both located to the left of said words or word strings or both on said right signature list, and, based on their frequency, creating a right anchor list comprising said words or word strings or both located to the left of said words or word strings or both on said right signature list;
j. Ranking the results based on the frequency of each word or word string appearing on said left anchor list and the frequency of said word or word string appearing on said right anchor list.
153. The computer apparatus of claim 152, characterized in that said ranking of the results comprises multiplying the total frequency of each word or word string appearing on said left anchor list by the total frequency of said word or word string appearing on said right anchor list.
154. The computer apparatus of claim 152, characterized in that said ranking of the results comprises, for each word or word string appearing on at least one left anchor list and at least one right anchor list, adding the total frequency of each word or word string appearing on said left anchor list to the total frequency of said word or word string appearing on said right anchor list.
155. The computer apparatus of claim 152, characterized in that said ranking of the results is based on the total number of left anchor lists and the total number of right anchor lists on which said word or word string appears.
156. The computer apparatus of claim 152, characterized in that said ranking of the results is based on user-defined parameters.
157. The computer apparatus of claim 152, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j to determine and return the results of the new query, and modifying said ranking of the result of said query based on the rank of the query on the results list of the new query.
158. The computer apparatus of claim 152, characterized in that said results are modified by designating said result as a new query, repeating steps a to j to determine and return the results of the new query, and modifying said result of said query based on the rank of the query on the results list of the new query.
159. The computer apparatus of claim 152, characterized in that said results are modified by designating each of a plurality of said results as a new query, repeating steps a to j to determine and return the results of each of the plurality of new queries, and modifying said results of said query based on the number of new-query lists on which the query and the result appear together.
160. The computer apparatus of claim 152, characterized in that said ranking of the results is modified by designating each of a plurality of said results as a new query, repeating steps a to j to determine and return the results of each of the plurality of new queries, and modifying said ranking of the results of said query based on the number of new-query lists on which the query and the result appear together.
161. The computer apparatus of claim 152, characterized in that said ranking of the results is modified by automatically designating each of said results as a new query, repeating steps a to j to determine and return the results of each new query, and modifying the ranking of the results of said query based on the rank of the query and of said results on the lists of the new queries.
162. The computer apparatus of claim 152, characterized in that the results are modified by automatically designating each of said results as a new query, repeating steps a to j to determine and return the results of each new query, and modifying said results of said query based on the rank of the query and of said results on the lists of the new queries.
163. The computer apparatus of claim 152, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of said result of said query based on the words and word strings on the left signature list and/or right signature list of the new query that do not appear on the left signature list and/or right signature list of the query.
164. The computer apparatus of claim 152, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the left signature list and/or right signature list of the new query that do not appear on the left signature list and/or right signature list of the query.
165. The computer apparatus of claim 152, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of said result of said query based on the words and word strings on the left signature list and/or right signature list of the query that do not appear on the left signature list and/or right signature list of the new query.
166. The computer apparatus of claim 152, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the left signature list and/or right signature list of the query that do not appear on the left signature list and/or right signature list of the new query.
167. The computer apparatus of claim 152, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
168. The computer apparatus of claim 152, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the result of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
169. The computer apparatus of claim 152, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said result of said query based on the words and word strings on the right signature list of the query that appear on the left signature list of the new query.
170. The computer apparatus of claim 152, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the result of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
171. The computer apparatus of claim 152, characterized in that it further comprises the following additional steps:
k. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a list of second word strings, each comprising the query and said words or word strings or both located to the left of said query;
l. for each word string on the list of second word strings, creating an association list of words and word strings by designating that word string as a new query and repeating steps c to h;
m. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the right of said query to be analyzed, and creating a second list, of third word strings, each comprising the query and said words or word strings or both located to the right of said query;
n. for each word string on the second list of third word strings, creating a second association list of words and word strings by designating that word string as a new query and repeating steps c to j;
o. determining the word strings on said association list that have a portion overlapping with word strings on said second association list; and
p. identifying the words or word strings in the overlapping portion of the overlapping word strings as synonyms or near-synonyms of the query.
172. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that said program performs the following steps:
a. providing a collection of documents, wherein said collection of documents comprises at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed, and returning the documents that contain the query to be analyzed;
d. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a left signature list comprising said words or word strings or both located to the left of said query to be analyzed in said returned documents;
e. searching said collection of documents for the words and word strings on said left signature list;
f. determining a user-defined number of words or word strings or both located to the right of said words or word strings or both on said left signature list, and creating, based on their frequency in the collection of documents, a left anchor list comprising said words or word strings or both located to the right of said words or word strings or both on said left signature list;
g. determining, in said returned documents, a user-defined number of words or word strings or both located to the right of said query to be analyzed, and creating, based on their frequency, a right signature list comprising said words or word strings or both located to the right of said query to be analyzed in said returned documents;
h. searching said collection of documents for the words and word strings on said right signature list;
i. determining a user-defined number of words or word strings or both located to the left of said words or word strings or both on said right signature list, and creating, based on their frequency, a right anchor list comprising said words or word strings or both located to the left of said words or word strings or both on said right signature list;
j. ranking the results based on the frequency of each word or word string appearing on said left anchor list and the frequency of said word or word string appearing on said right anchor list.
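Steps a to j of claim 172 can be illustrated with a minimal Python sketch. It is not the patented implementation: whitespace tokenisation, lowercasing, the helper names (`find_all`, `neighbours`, `rank_associations`), the parameters `top_n` and `max_len`, and the use of a frequency product in step j are all assumptions made for illustration. For brevity the anchor lists here keep every neighbouring word string rather than a user-defined number of them.

```python
from collections import Counter


def find_all(tokens, phrase):
    """Yield every index at which the token list `phrase` starts in `tokens`."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            yield i


def neighbours(docs, phrase, side, max_len):
    """Count word strings of 1..max_len tokens directly left or right of `phrase`."""
    counts = Counter()
    for tokens in docs:
        for i in find_all(tokens, phrase):
            end = i + len(phrase)
            for k in range(1, max_len + 1):
                if side == "left" and i - k >= 0:
                    counts[tuple(tokens[i - k:i])] += 1
                elif side == "right" and end + k <= len(tokens):
                    counts[tuple(tokens[end:end + k])] += 1
    return counts


def rank_associations(documents, query, top_n=5, max_len=2):
    """Hypothetical end-to-end sketch of steps a to j."""
    docs = [d.lower().split() for d in documents]            # step a
    q = query.lower().split()                                # step b
    # Step c: keep only the documents that contain the query.
    returned = [t for t in docs if next(find_all(t, q), None) is not None]
    # Steps d and g: most frequent left and right neighbours of the query.
    left_sig = [p for p, _ in neighbours(returned, q, "left", max_len).most_common(top_n)]
    right_sig = [p for p, _ in neighbours(returned, q, "right", max_len).most_common(top_n)]
    # Steps e and f: what follows each left-signature entry in the whole collection.
    left_anchor = Counter()
    for sig in left_sig:
        left_anchor.update(neighbours(docs, list(sig), "right", max_len))
    # Steps h and i: what precedes each right-signature entry.
    right_anchor = Counter()
    for sig in right_sig:
        right_anchor.update(neighbours(docs, list(sig), "left", max_len))
    # Step j: score candidates found on both anchor lists (product is an assumption).
    scores = {
        p: left_anchor[p] * right_anchor[p]
        for p in left_anchor
        if p in right_anchor and list(p) != q
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In a toy collection where "big" and "large" occur in the same left and right contexts, `rank_associations(docs, "big")` would place `("large",)` at the top, because it follows the same left-signature words and precedes the same right-signature words as the query.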
173. The computer-readable medium of claim 172, characterized in that said ranking of the results comprises multiplying the total frequency of each word or word string appearing on said left anchor lists by the total frequency of said word or word string appearing on said right anchor lists.
174. The computer-readable medium of claim 172, characterized in that said ranking of the results comprises, for each word or word string appearing on at least one left anchor list and at least one right anchor list, adding the total frequency of said word or word string appearing on said left anchor lists to the total frequency of said word or word string appearing on said right anchor lists.
175. The computer-readable medium of claim 172, characterized in that said ranking of the results is based on the total number of left anchor lists and right anchor lists on which said word or word string appears.
176. The computer-readable medium of claim 172, characterized in that said ranking of the results is based on user-defined parameters.
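The ranking alternatives of claims 173 to 175 differ only in how the left and right anchor frequencies are combined. The following sketch shows the three variants side by side; the function name `rank`, the `mode` argument, and the representation of each anchor list as a `Counter` are hypothetical, and restricting all three modes to candidates present on both sides is an assumption (the claims state it explicitly only for the additive variant).

```python
from collections import Counter


def rank(left_lists, right_lists, mode="product"):
    """Rank candidates from per-signature anchor lists.

    left_lists / right_lists: one Counter per signature-list entry,
    mapping a candidate word string to its frequency on that anchor list.
    """
    left_total, right_total = Counter(), Counter()
    for counts in left_lists:
        left_total.update(counts)
    for counts in right_lists:
        right_total.update(counts)

    scores = {}
    for item in left_total.keys() & right_total.keys():
        if mode == "product":        # claim 173: multiply total frequencies
            scores[item] = left_total[item] * right_total[item]
        elif mode == "sum":          # claim 174: add total frequencies
            scores[item] = left_total[item] + right_total[item]
        elif mode == "list_count":   # claim 175: number of anchor lists containing the item
            scores[item] = (sum(item in c for c in left_lists)
                            + sum(item in c for c in right_lists))
        else:
            raise ValueError(f"unknown mode: {mode}")
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The product rewards candidates that are frequent on both sides, while the list count rewards candidates that recur across many distinct contexts regardless of raw frequency.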
177. The computer-readable medium of claim 172, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j to determine and return the results of the new query, and modifying said ranking of the results of said query based on the rank of the query on the result list of the new query.
178. The computer-readable medium of claim 172, characterized in that said results are modified by designating said result as a new query, repeating steps a to j to determine and return the results of the new query, and modifying said results of said query based on the rank of the query on the result list of the new query.
179. The computer-readable medium of claim 172, characterized in that said results are modified by designating each of a plurality of said results as a new query, repeating steps a to j to determine and return the results of each of the plurality of new queries, and modifying said results of said query based on the number of new-query lists on which the query and the result both appear together.
180. The computer-readable medium of claim 172, characterized in that said ranking of the results is modified by designating each of a plurality of said results as a new query, repeating steps a to j to determine and return the results of each of the plurality of new queries, and modifying said ranking of the results of said query based on the number of new-query lists on which the query and the result both appear together.
181. The computer-readable medium of claim 172, characterized in that said ranking of the results is modified by automatically designating each of said results as a new query, repeating steps a to j to determine and return the results of each new query, and modifying the ranking of the results of said query based on the rank of the query and of said result on the lists of the new queries.
182. The computer-readable medium of claim 172, characterized in that the results are modified by automatically designating each of said results as a new query, repeating steps a to j to determine and return the results of each new query, and modifying said results of said query based on the rank of the query and of said result on the lists of the new queries.
183. The computer-readable medium of claim 172, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the results of said query based on the words and word strings on the left signature list and/or the right signature list of the new query that do not appear on the left signature list and/or the right signature list of the query.
184. The computer-readable medium of claim 172, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said results of said query based on the words and word strings on the left signature list and/or the right signature list of the new query that do not appear on the left signature list and/or the right signature list of the query.
185. The computer-readable medium of claim 172, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the results of said query based on the words and word strings on the left signature list and/or the right signature list of the query that do not appear on the left signature list and/or the right signature list of the new query.
186. The computer-readable medium of claim 172, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said results of said query based on the words and word strings on the left signature list and/or the right signature list of the query that do not appear on the left signature list and/or the right signature list of the new query.
187. The computer-readable medium of claim 172, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said results of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
188. The computer-readable medium of claim 172, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the results of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
189. The computer-readable medium of claim 172, characterized in that said results are modified by designating said result as a new query, repeating steps a to j, and modifying said results of said query based on the words and word strings on the right signature list of the query that appear on the left signature list of the new query.
190. The computer-readable medium of claim 172, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to j, and modifying said ranking of the results of said query based on the words and word strings on the left signature list of the query that appear on the right signature list of the new query.
191. The computer-readable medium of claim 172, characterized in that it further comprises the following additional steps:
k. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a list of second word strings, each comprising the query and said words or word strings or both located to the left of said query;
l. for each word string on the list of second word strings, creating an association list of words and word strings by designating that word string as a new query and repeating steps c to h;
m. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the right of said query to be analyzed, and creating a second list, of third word strings, each comprising the query and said words or word strings or both located to the right of said query;
n. for each word string on the second list of third word strings, creating a second association list of words and word strings by designating that word string as a new query and repeating steps c to j;
o. determining the word strings on said association list that have a portion overlapping with word strings on said second association list; and
p. identifying the words or word strings in the overlapping portion of the overlapping word strings as synonyms or near-synonyms of the query.
192. A method for associating words and word strings of a language, characterized in that said method comprises:
a. providing a collection of documents, wherein said collection of documents comprises at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed, and returning the documents that contain the query to be analyzed;
d. determining, in said returned documents containing the query to be analyzed, a user-defined number of words or word strings or both, of user-defined length, located to the left and to the right of the query;
e. returning a list having one or more entries, wherein said one or more entries comprise said determined words or word strings or both located to the left and to the right of the query in said returned documents;
f. searching said collection of documents for said one or more entries on said returned list; and
g. returning a list of the words or word strings or both, of user-defined length, that occur most frequently between said determined words or word strings located to the left and to the right of said query in said returned documents.
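Steps a to g of claim 192 amount to collecting the contexts that surround the query and then asking what else fills the same slot. The sketch below is one possible reading, not the patented implementation: it pairs each left context with its right context, tokenises on whitespace, and uses hypothetical names and parameters (`between_contexts`, `n_contexts`, `ctx_len`, `max_fill`).

```python
from collections import Counter


def between_contexts(documents, query, n_contexts=5, ctx_len=1, max_fill=2):
    """Return word strings that occur between the query's left/right contexts."""
    docs = [d.lower().split() for d in documents]   # step a
    q = query.lower().split()                       # step b

    # Steps c to e: count (left, right) context pairs of ctx_len tokens around the query.
    pairs = Counter()
    for t in docs:
        for i in range(ctx_len, len(t) - len(q) - ctx_len + 1):
            if t[i:i + len(q)] == q:
                left = tuple(t[i - ctx_len:i])
                right = tuple(t[i + len(q):i + len(q) + ctx_len])
                pairs[(left, right)] += 1
    contexts = [p for p, _ in pairs.most_common(n_contexts)]

    # Steps f and g: search the whole collection for each context pair and
    # count the fillers of 1..max_fill tokens found between left and right.
    fills = Counter()
    for t in docs:
        for left, right in contexts:
            for i in range(len(t) - ctx_len + 1):
                if tuple(t[i:i + ctx_len]) != left:
                    continue
                start = i + ctx_len
                for k in range(1, max_fill + 1):
                    end = start + k
                    if tuple(t[end:end + ctx_len]) == right:
                        filler = tuple(t[start:end])
                        if list(filler) != q:       # skip the query itself
                            fills[filler] += 1
    return fills.most_common()
```

Treating the left and right contexts as a pair keeps the sketch short; an implementation closer to the claim wording could track the left and right lists separately and combine them only in step g.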
193. The method of claim 192, characterized in that said returned list of words or word strings or both is ranked based on the number of unique said determined words or word strings or both located to the left and to the right of the words on said returned list.
194. The method of claim 192 or 193, characterized in that said returned list of words or word strings or both is ranked based on user-defined parameters.
195. The method of claim 192, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said ranking of the results of said query based on the rank of the query among the results of the new query.
196. The method of claim 192, characterized in that said results are modified by designating said result as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said results of said query based on the rank of the query among the results of the new query.
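Claims 195 and 196 feed a result back in as a new query and use the position of the original query in that result's own list. A minimal sketch of such a reciprocal re-ranking follows; the function name `rerank_by_reciprocity` and the `associate` callback (standing in for steps a to g) are hypothetical, and the claims do not specify how the reverse rank is weighted, so sorting by it directly is only one possible choice.

```python
def rerank_by_reciprocity(query, results, associate):
    """Re-rank `results` by where `query` appears when each result is re-queried.

    associate: callable taking a query string and returning its ranked
    list of associated strings (a stand-in for steps a to g).
    """
    def sort_key(item):
        original_rank, result = item
        reverse_list = associate(result)          # result used as the new query
        if query in reverse_list:
            reverse_rank = reverse_list.index(query)
        else:
            reverse_rank = float("inf")           # query never comes back
        # Primary key: rank of the query in the new results;
        # secondary key: the original rank, to keep ties stable.
        return (reverse_rank, original_rank)

    ordered = sorted(enumerate(results), key=sort_key)
    return [result for _, result in ordered]
```

A result whose own association list ranks the original query highly is treated as a stronger, mutual association than one that never returns the query.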
197. The method of claim 192, characterized in that said ranking of the results is modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said ranking of the results of said query based on the number of new-query lists on which the query and the result both appear together.
198. The method of claim 192, characterized in that said results are modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said results of said query based on the number of new-query lists on which the query and the result both appear together.
199. The method of claim 192, characterized in that said ranking of the results is modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said ranking of the results of said query based on the rank of the query and of the result on the new-query lists on which they both appear together.
200. The method of claim 192, characterized in that said results are modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said results of said query based on the rank of the query and of the result on the new-query lists on which they both appear together.
201. The method of claim 192, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said ranking of the results of said query based on the words or word strings or both located to the left of the query and/or the words or word strings or both located to the right of the query that do not appear to the left and/or to the right of the new query.
202. The method of claim 192, characterized in that said results are modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said results of said query based on the words or word strings or both located to the left of the query and/or the words or word strings or both located to the right of the query that do not appear to the left and/or to the right of the new query.
203. The method of claim 192, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said ranking of the results of said query based on the words or word strings or both located to the left of the new query and/or the words or word strings or both located to the right of the new query that do not appear to the left and/or to the right of the query.
204. The method of claim 192, characterized in that said results are modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said results of said query based on the words or word strings or both located to the left of the new query and/or the words or word strings or both located to the right of the new query that do not appear to the left and/or to the right of the query.
205. The method of claim 192, characterized in that it further comprises:
h. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a list of second word strings, each comprising the query and said words or word strings or both located to the left of said query;
i. for each word string on the list of second word strings, creating an association list of words and word strings by designating that word string as a new query and repeating steps c to g;
j. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the right of said query to be analyzed, and creating a second list, of third word strings, each comprising the query and said words or word strings or both located to the right of said query;
k. for each word string on the second list of third word strings, creating a second association list of words and word strings by designating that word string as a new query and repeating steps c to g;
l. determining the word strings on said association list that have a portion overlapping with word strings on said second association list; and
m. identifying the words or word strings in the overlapping portion of the overlapping word strings as synonyms or near-synonyms of the query.
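The overlap test in the last two steps of claim 205 can be sketched as follows. The claim does not define "overlapping portion" precisely; this illustration assumes it means the longest run of tokens that ends a word string on the first association list and begins a word string on the second. The names `overlap` and `synonyms_from_overlap` are hypothetical.

```python
from collections import Counter


def overlap(a, b):
    """Longest suffix of token tuple `a` that is also a prefix of token tuple `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a[-k:]
    return ()


def synonyms_from_overlap(left_associations, right_associations):
    """Count overlapping portions between the two association lists.

    left_associations: word strings associated with "left context + query".
    right_associations: word strings associated with "query + right context".
    """
    candidates = Counter()
    for a in left_associations:
        for b in right_associations:
            shared = overlap(a, b)
            if shared:
                candidates[shared] += 1
    return candidates
```

For a query such as "big", the first list might hold strings like ("the", "large") and the second strings like ("large", "dog"); the shared portion ("large",) would then be proposed as a synonym or near-synonym candidate.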
206. A computer apparatus comprising a processor, a memory connected to said processor, and a program stored in said memory, characterized in that said computer is configured to execute said program and perform the following steps:
a. providing a collection of documents, wherein said collection of documents comprises at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed, and returning the documents that contain the query to be analyzed;
d. determining, in said returned documents containing the query to be analyzed, a user-defined number of words or word strings or both, of user-defined length, located to the left and to the right of the query;
e. returning a list having one or more entries, wherein said one or more entries comprise said determined words or word strings or both located to the left and to the right of the query in said returned documents;
f. searching said collection of documents for said one or more entries on said returned list; and
g. returning a list of the words or word strings or both, of user-defined length, that occur most frequently between said determined words or word strings located to the left and to the right of said query in said returned documents.
207. The computer apparatus of claim 206, characterized in that said returned list of words or word strings or both is ranked based on the number of unique said determined words or word strings or both located to the left and to the right of the words on said returned list.
208. The computer apparatus of claim 206 or 207, characterized in that said returned list of words or word strings or both is ranked based on user-defined parameters.
209. The computer apparatus of claim 206, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said ranking of the results of said query based on the rank of the query among the results of the new query.
210. The computer apparatus of claim 206, characterized in that said results are modified by designating said result as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said results of said query based on the rank of the query among the results of the new query.
211. The computer apparatus of claim 206, characterized in that said ranking of the results is modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said ranking of the results of said query based on the number of new-query lists on which the query and the result both appear together.
212. The computer apparatus of claim 206, characterized in that said results are modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said results of said query based on the number of new-query lists on which the query and the result both appear together.
213. The computer apparatus of claim 206, characterized in that said ranking of the results is modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said ranking of the results of said query based on the rank of the query and of the result on the new-query lists on which they both appear together.
214. The computer apparatus of claim 206, characterized in that said results are modified by designating each of said results as a new query, repeating steps a to g to determine and return the results of the new query, and modifying said results of said query based on the rank of the query and of the result on the new-query lists on which they both appear together.
215. The computer apparatus of claim 206, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said ranking of the results of said query based on the words or word strings or both located to the left of the query and/or the words or word strings or both located to the right of the query that do not appear to the left and/or to the right of the new query.
216. The computer apparatus of claim 206, characterized in that said results are modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said results of said query based on the words or word strings or both located to the left of the query and/or the words or word strings or both located to the right of the query that do not appear to the left and/or to the right of the new query.
217. The computer apparatus of claim 206, characterized in that said ranking of the results is modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said ranking of the results of said query based on the words or word strings or both located to the left of the new query and/or the words or word strings or both located to the right of the new query that do not appear to the left and/or to the right of the query.
218. The computer apparatus of claim 206, characterized in that said results are modified by designating said result as a new query, repeating steps a to e to determine and return the words or word strings or both located to the left and to the right of the new query, and modifying said results of said query based on the words or word strings or both located to the left of the new query and/or the words or word strings or both located to the right of the new query that do not appear to the left and/or to the right of the query.
219. The computer apparatus of claim 206, characterized in that it further comprises:
h. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the left of said query to be analyzed, and creating a list of second word strings, each comprising the query and said words or word strings or both located to the left of said query;
i. for each word string on the list of second word strings, creating an association list of words and word strings by designating that word string as a new query and repeating steps c to g;
j. determining, based on their frequency in said returned documents, a user-defined number of words or word strings or both located to the right of said query to be analyzed, and creating a second list, of third word strings, each comprising the query and said words or word strings or both located to the right of said query;
k. for each word string on the second list of third word strings, creating a second association list of words and word strings by designating that word string as a new query and repeating steps c to g;
l. determining the word strings on said association list that have a portion overlapping with word strings on said second association list; and
m. identifying the words or word strings in the overlapping portion of the overlapping word strings as synonyms or near-synonyms of the query.
220. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that said program performs the following steps:
a. providing a collection of documents, wherein said collection of documents comprises at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed, and returning the documents that contain the query to be analyzed;
d. determining, in said returned documents containing the query to be analyzed, a user-defined number of words or word strings or both, of user-defined length, located to the left and to the right of the query;
e. returning a list having one or more entries, wherein said one or more entries comprise said determined words or word strings or both located to the left and to the right of the query in said returned documents;
f. searching said collection of documents for said one or more entries on said returned list; and
g. returning a list of the words or word strings or both, of user-defined length, that occur most frequently between said determined words or word strings located to the left and to the right of said query in said returned documents.
221. The computer medium of claim 220, characterized in that the returned list of words or word strings, or both, is ranked based on the number of unique determined words or word strings, or both, located to the left and to the right of the words in the returned list.
222. The computer medium of claim 220 or 221, characterized in that the returned list of words or word strings, or both, is ranked based on user-defined parameters.
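The ranking of claim 221 can be sketched as follows. This is a hedged illustration only: the input format (a list of `(candidate, left_context, right_context)` tuples) and the function name `rank_by_unique_contexts` are assumptions, and ties are broken alphabetically purely to make the output deterministic.

```python
from collections import defaultdict


def rank_by_unique_contexts(hits):
    """Rank candidates by how many distinct left and right context
    strings they were found next to (more distinct contexts first)."""
    contexts = defaultdict(set)
    for candidate, left, right in hits:
        # Tag each context with its side so a left "the" and a right "the"
        # are counted as two different contexts.
        contexts[candidate].update({("L", left), ("R", right)})
    return sorted(contexts, key=lambda c: (-len(contexts[c]), c))


hits = [
    ("market", "the", "today"),
    ("store", "the", "today"),
    ("store", "the", "quickly"),
    ("store", "a", "today"),
]
print(rank_by_unique_contexts(hits))
```

A user-defined ranking parameter, as in claim 222, could be accommodated by swapping the sort key for any other scoring function.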
223. The computer medium of claim 220, characterized in that the ranking of the results is modified by designating the results as new queries, repeating steps A to G to determine and return the results of the new queries, and modifying the ranking of the results of the query based on the ranking of the query in the results of the new queries.
224. The computer medium of claim 220, characterized in that the results are modified by designating the results as new queries, repeating steps A to G to determine and return the results of the new queries, and modifying the results of the query based on the ranking of the query in the results of the new queries.
225. The computer medium of claim 220, characterized in that the ranking of the results is modified by designating each of the results as a new query, repeating steps A to G to determine and return the results of the new queries, and modifying the ranking of the results of the query based on the number of new-query lists on which both the query and the result appear together.
226. The computer medium of claim 220, characterized in that the results are modified by designating each of the results as a new query, repeating steps A to G to determine and return the results of the new queries, and modifying the results of the query based on the number of new-query lists on which both the query and the result appear together.
227. The computer medium of claim 220, characterized in that the ranking of the results is modified by designating each of the results as a new query, repeating steps A to G to determine and return the results of the new queries, and modifying the ranking of the results of the query based on the rankings of the query and the result on the new-query lists on which both appear together.
228. The computer medium of claim 220, characterized in that the results are modified by designating each of the results as a new query, repeating steps A to G to determine and return the results of the new queries, and modifying the results of the query based on the rankings of the query and the result on the new-query lists on which both appear together.
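The re-ranking idea in claim 223 (re-run the analysis with each result as a new query and see how highly the original query ranks in return) might be sketched as below. The helper `rerank_by_reciprocity` and the `analyze` callable, assumed to return a ranked list of strings for a query, are hypothetical; treating an absent query as the worst possible rank is also an assumption.

```python
def rerank_by_reciprocity(query, results, analyze):
    """Reorder `results` so that results whose own analysis ranks the
    original query highly come first; original order breaks ties."""
    def score(item):
        position, result = item
        back = analyze(result)
        if query in back:
            back_rank = back.index(query)
        else:
            # Assumption: a result that never leads back to the query
            # is pushed behind every result that does.
            back_rank = len(back) + len(results)
        return (back_rank, position)

    return [result for _, result in sorted(enumerate(results), key=score)]


# Toy stand-in for steps A to G: a fixed table of ranked results per query.
table = {
    "shop": ["store", "market"],
    "market": ["bazaar", "store"],
    "car": ["vehicle"],
}
print(rerank_by_reciprocity("store", ["car", "market", "shop"],
                            lambda q: table.get(q, [])))
```

Claims 225 to 228 vary the scoring signal (number of shared lists, or ranks on shared lists); under this sketch that would amount to changing what `score` returns.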
229. The computer medium of claim 220, characterized in that the ranking of the results is modified by designating the results as new queries, repeating steps A to E to determine and return the words or word strings, or both, located to the left and to the right of the new queries, and modifying the ranking of the results of the query based on the words or word strings, or both, located to the left of the query and/or to the right of the query that do not appear to the left and/or right of the new queries.
230. The computer medium of claim 220, characterized in that the results are modified by designating the results as new queries, repeating steps A to E to determine and return the words or word strings, or both, located to the left and to the right of the new queries, and modifying the results of the query based on the words or word strings, or both, located to the left of the query and/or to the right of the query that do not appear to the left and/or right of the new queries.
231. The computer medium of claim 220, characterized in that the ranking of the results is modified by designating the results as new queries, repeating steps A to E to determine and return the words or word strings, or both, located to the left and to the right of the new queries, and modifying the ranking of the results of the query based on the words or word strings, or both, located to the left of the new queries and/or to the right of the new queries that do not appear to the left and/or right of the query.
232. The computer medium of claim 220, characterized in that the results are modified by designating the results as new queries, repeating steps A to E to determine and return the words or word strings, or both, located to the left and to the right of the new queries, and modifying the results of the query based on the words or word strings, or both, located to the left of the new queries and/or to the right of the new queries that do not appear to the left and/or right of the query.
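Claims 229 to 232 compare the context strings around the query with those around each result. A minimal sketch of the claim 229 direction (query contexts missing around the result) is shown here; the function name `rerank_by_context_gap`, the dictionary input format, and the use of a simple missing-count as the score are assumptions for illustration.

```python
def rerank_by_context_gap(query_context, result_contexts):
    """Order results by how many of the query's context strings do NOT
    appear around them; fewer missing contexts ranks higher."""
    wanted = set(query_context)

    def gap(result):
        return len(wanted - set(result_contexts[result]))

    # Alphabetical tie-break keeps the output deterministic.
    return sorted(result_contexts, key=lambda r: (gap(r), r))


query_context = ["the", "a", "today"]
result_contexts = {
    "market": ["the", "today"],
    "vehicle": ["a"],
    "shop": ["the", "a", "today", "old"],
}
print(rerank_by_context_gap(query_context, result_contexts))
```

The reverse comparison of claims 231 and 232 (result contexts missing around the query) would swap the two sets in the difference.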
233. The computer medium of claim 220, characterized in that the steps further comprise:
H. determining, in the returned documents and based on their frequency, a user-defined number of words or word strings, or both, located to the left of the query to be analyzed, and creating a list of second word strings comprising the query and the words or word strings, or both, located to the left of the query;
I. for each word string in the list of second word strings, creating a linked list of words and word strings by designating each word string in the list of second word strings as a new query and repeating steps C to G;
J. determining, in the returned documents and based on their frequency, a user-defined number of words or word strings, or both, located to the right of the query to be analyzed, and creating a second list of third word strings comprising the query and the words or word strings, or both, located to the right of the query;
K. for each word string in the second list of third word strings, creating a second linked list of words and word strings by designating each word string in the second list of third word strings as a new query and repeating steps C to G;
L. determining the word strings on the linked list that overlap with word strings on the second linked list; and
M. identifying the words or word strings in the overlapping portion of the overlapping word strings as synonyms or near-synonyms of the query.
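Steps L and M can be sketched as an overlap test between the two linked lists. The sketch assumes the lists from steps I and K are already available as plain Python lists of strings, that "overlap" means the end of a string from the first list equals the start of a string from the second list, and that the query itself is excluded from the output; none of these details is fixed by the claim, and the names `overlap` and `synonyms_from_overlap` are hypothetical.

```python
def overlap(left_string, right_string):
    """Longest run of words that ends left_string and starts right_string."""
    a, b = left_string.split(), right_string.split()
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return " ".join(a[-size:])
    return None


def synonyms_from_overlap(left_list, right_list, query):
    """Steps L-M: collect the overlapping parts as synonym candidates."""
    found = []
    for left_string in left_list:
        for right_string in right_list:
            part = overlap(left_string, right_string)
            # Assumption: the query itself is not reported as its own synonym.
            if part and part != query and part not in found:
                found.append(part)
    return found


# Hypothetical linked lists for the query "car":
left_list = ["the red automobile", "the red motor vehicle", "a blue sky"]
right_list = ["automobile is fast", "motor vehicle is fast", "bike is slow"]
print(synonyms_from_overlap(left_list, right_list, "car"))
```

Requiring agreement between a left-extended and a right-extended analysis filters out strings that merely share one side of the query's context.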
234. A method of performing content conversion within a single language, characterized in that the method comprises the following steps:
A. providing a first group of a plurality of word strings;
B. providing a second group of a plurality of word strings, wherein each word string in the second group corresponds, as a synonym or near-synonym, to a word string in the first group;
C. receiving a word string query to be analyzed;
D. parsing the word string query into a plurality of subset word strings, wherein a portion of each subset overlaps a second portion of one or more adjacent subsets;
E. analyzing each parsed subset word string using the second group of word strings to identify a synonymous word string for each parsed subset word string; and
F. when any parsed word string overlaps the adjacent subsets, replacing it with the synonymous word string.
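The overlapping parse and replacement of claim 234 can be illustrated with a short sketch. It assumes fixed-size word windows that share one word with their neighbour, a dictionary `synonyms` mapping first-group strings to second-group strings, and a fallback to the original words when a replacement does not agree with its neighbour on the shared word; the names `overlapping_chunks` and `convert` and all of these choices are illustrative assumptions, not details taken from the claim.

```python
def overlapping_chunks(words, size=3, shared=1):
    """Step D: split words into windows that share `shared` words."""
    step = size - shared
    return [words[i:i + size]
            for i in range(0, max(len(words) - shared, 1), step)]


def convert(query, synonyms, size=3, shared=1):
    """Steps E-F: replace each chunk by its synonym string when the
    replacement agrees with the text built so far on the shared words."""
    out = []
    for chunk in overlapping_chunks(query.split(), size, shared):
        key = " ".join(chunk)
        replacement = synonyms.get(key, key).split()
        if not out:
            out.extend(replacement)
        elif out[-shared:] == replacement[:shared]:
            out.extend(replacement[shared:])   # overlap confirmed
        else:
            out.extend(chunk[shared:])         # keep the original words
    return " ".join(out)


synonyms = {
    "i want to": "i wish to",
    "to buy a": "to purchase a",
    "a car today": "a vehicle today",
}
print(convert("i want to buy a car today", synonyms))
```

Checking the shared words before accepting a replacement is what lets adjacent replacements be stitched together without duplicating or dropping words at the seams.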
235. A computer apparatus comprising a processor, a memory connected to the processor, and a program stored in the memory, characterized in that the computer is configured to execute the program and perform the following steps:
A. providing a first group of a plurality of word strings;
B. providing a second group of a plurality of word strings, wherein each word string in the second group corresponds, as a synonym or near-synonym, to a word string in the first group;
C. receiving a word string query to be analyzed;
D. parsing the word string query into a plurality of subset word strings, wherein a portion of each subset overlaps a second portion of one or more adjacent subsets;
E. analyzing each parsed subset word string using the second group of word strings to identify a synonymous word string for each parsed subset word string; and
F. when any parsed word string overlaps the adjacent subsets, replacing it with the synonymous word string.
236. A computer-readable medium having stored thereon a program executable by a computer processor, characterized in that the program is used to perform the following steps:
A. providing a first group of a plurality of word strings;
B. providing a second group of a plurality of word strings, wherein each word string in the second group corresponds, as a synonym or near-synonym, to a word string in the first group;
C. receiving a word string query to be analyzed;
D. parsing the word string query into a plurality of subset word strings, wherein a portion of each subset overlaps a second portion of one or more adjacent subsets;
E. analyzing each parsed subset word string using the second group of word strings to identify a synonymous word string for each parsed subset word string; and
F. when any parsed word string overlaps the adjacent subsets, replacing it with the synonymous word string.
CNB038257297A 2002-10-29 2003-09-22 Knowledge system method and apparatus Expired - Fee Related CN100380373C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US10/281,997 US7711547B2 (en) 2001-03-16 2002-10-29 Word association method and apparatus
US10/281,997 2002-10-29
US10/659,792 2003-09-11

Publications (2)

Publication Number Publication Date
CN1720524A true CN1720524A (en) 2006-01-11
CN100380373C CN100380373C (en) 2008-04-09

Family

ID=35931770

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB038257297A Expired - Fee Related CN100380373C (en) 2002-10-29 2003-09-22 Knowledge system method and apparatus

Country Status (1)

Country Link
CN (1) CN100380373C (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178720B (en) * 2007-10-23 2010-12-15 浙江大学 Distributed clustering method facing to internet micro-content
CN101981566A (en) * 2008-03-28 2011-02-23 微软公司 Intra-language statistical machine translation
CN102193912A (en) * 2010-03-12 2011-09-21 富士通株式会社 Phrase division model establishing method, statistical machine translation method and decoder
CN102272754B (en) * 2008-11-05 2015-04-01 谷歌公司 Custom language models
CN107169310A (en) * 2017-03-20 2017-09-15 上海基银生物科技有限公司 A kind of genetic test construction of knowledge base method and system
CN107273503A (en) * 2017-06-19 2017-10-20 北京百度网讯科技有限公司 Method and apparatus for generating the parallel text of same language
CN111709431A (en) * 2020-06-15 2020-09-25 厦门大学 Instant translation method and device, computer equipment and storage medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
TWI765322B (en) * 2020-08-21 2022-05-21 伊斯酷軟體科技股份有限公司 Knowledge management device, method, and computer program product for a software project

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
WO1994006086A1 (en) * 1992-09-04 1994-03-17 Caterpillar Inc. Integrated authoring and translation system
GB2279164A (en) * 1993-06-18 1994-12-21 Canon Res Ct Europe Ltd Processing a bilingual database.
JP3408291B2 (en) * 1993-09-20 2003-05-19 株式会社東芝 Dictionary creation support device
US5659765A (en) * 1994-03-15 1997-08-19 Toppan Printing Co., Ltd. Machine translation system
EP0834139A4 (en) * 1995-06-07 1998-08-05 Int Language Engineering Corp Machine assisted translation tools

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN101178720B (en) * 2007-10-23 2010-12-15 浙江大学 Distributed clustering method facing to internet micro-content
CN101981566A (en) * 2008-03-28 2011-02-23 微软公司 Intra-language statistical machine translation
CN102272754B (en) * 2008-11-05 2015-04-01 谷歌公司 Custom language models
CN102193912A (en) * 2010-03-12 2011-09-21 富士通株式会社 Phrase division model establishing method, statistical machine translation method and decoder
CN102193912B (en) * 2010-03-12 2013-11-06 富士通株式会社 Phrase division model establishing method, statistical machine translation method and decoder
CN107169310A (en) * 2017-03-20 2017-09-15 上海基银生物科技有限公司 A kind of genetic test construction of knowledge base method and system
CN107169310B (en) * 2017-03-20 2020-06-26 上海基银生物科技有限公司 Gene detection knowledge base construction method and system
CN107273503A (en) * 2017-06-19 2017-10-20 北京百度网讯科技有限公司 Method and apparatus for generating the parallel text of same language
CN107273503B (en) * 2017-06-19 2020-07-10 北京百度网讯科技有限公司 Method and device for generating parallel text in same language
CN111709431A (en) * 2020-06-15 2020-09-25 厦门大学 Instant translation method and device, computer equipment and storage medium
CN111709431B (en) * 2020-06-15 2023-02-10 厦门大学 Instant translation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN100380373C (en) 2008-04-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee