Abstract:
Bureau Works TMill is a state-of-the-art semantic Translation Memory clean-up mechanism that can make your legacy Translation Memories fresh and as good as new.
Scenario:
Translation memories degrade and depreciate over time. Large translation memories tend to have a wide array of problems including but not limited to:
Major Severity
- discrepancy between the language/locale declared on a Translation Unit Variant (TUV) and its actual content, e.g. the TUV is labeled PT-BR but the translated text is in Korean
- incorrect translations in TUVs that deviate significantly from source meaning
Minor Severity
- tag mismatches between source and target Translation Unit Variants (TUVs)
- terminology mismatches between TUV and glossary
- spelling mistakes
- grammar mistakes
- outdated tone, e.g. formal vs. informal
- cultural inappropriateness (could be major depending on the case)
The Challenge:
Most of these issues cannot be picked up by purely syntactic verification. They require semantic analysis, which used to be prohibitively expensive. As a result, as Translation Memories degraded with age and scale, there were only two options: apply TM-wide penalties, which had a negative economic impact due to lost leverage, or live with the endless propagation of errors, which hurt translation quality.
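For contrast, tag mismatches are one of the few issues in the list above that a purely syntactic check can catch. The following is a minimal sketch, not TMill's implementation: the function name `tag_mismatch` and the tag pattern (HTML-like tags and `{1}`-style placeholders) are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed tag syntax for this sketch: HTML-like tags and {n} placeholders.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>|\{\d+\}")


def tag_mismatch(source: str, target: str) -> bool:
    """Return True when source and target do not carry the same set of tags."""
    source_tags = Counter(TAG_PATTERN.findall(source))
    target_tags = Counter(TAG_PATTERN.findall(target))
    return source_tags != target_tags


# Matching tags: no mismatch reported.
balanced = tag_mismatch("Click <b>Save</b>", "Clique em <b>Salvar</b>")
# The closing tag is missing in the target: mismatch reported.
broken = tag_mismatch("Click <b>Save</b>", "Clique em <b>Salvar")
```

A check like this says nothing about whether "Salvar" is the right translation of "Save", which is exactly the gap semantic analysis has to fill.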
The Solution: Bureau Works TMill
TMill is a semantic Translation Memory clean-up tool that leverages Bureau Works' ML Tech Stack. Using a variety of models, including GPT-3.5 and GPT-4, combined with our NLP tools and methodology, TMill can clean up TMs of any size, locale, and state of health.
Requirements: TMX files
Minimum processing unit per request: 1,000,000 TUVs
The TMill Methodology
Stage 1: Preparation of Assets
Once we have received client assets in TMX format, we perform file integrity checks to ensure the content is ingestion-ready. This means that simple checks reveal no major discrepancies between locales, no significant number of empty TUVs, no file structure issues, and no other red flags.
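As an illustration of what such simple checks might look like, the sketch below parses a TMX document with Python's standard library and counts TUVs per locale and empty TUVs. It is an assumption-laden example, not the TMill code: `integrity_summary` and `SAMPLE_TMX` are hypothetical names, and the sample assumes the locale is carried in the `xml:lang` attribute of each `tuv` element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit sample; the second pt-BR segment is empty.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Save</seg></tuv>
    <tuv xml:lang="pt-BR"><seg>Salvar</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Cancel</seg></tuv>
    <tuv xml:lang="pt-BR"><seg></seg></tuv></tu>
</body></tmx>"""


def integrity_summary(tmx_text: str) -> dict:
    """Count TUVs, empty TUVs, and TUVs per declared locale."""
    # A malformed file raises xml.etree.ElementTree.ParseError here,
    # which already flags a file structure issue.
    root = ET.fromstring(tmx_text)
    per_locale = Counter()
    total = 0
    empty = 0
    for tuv in root.iter("tuv"):
        total += 1
        per_locale[tuv.get(XML_LANG, "unknown")] += 1
        seg = tuv.find("seg")
        text = "".join(seg.itertext()).strip() if seg is not None else ""
        if not text:
            empty += 1
    return {
        "total_tuvs": total,
        "empty_tuvs": empty,
        "tuvs_per_locale": dict(per_locale),
    }


summary = integrity_summary(SAMPLE_TMX)
```

Large imbalances in `tuvs_per_locale` or a high `empty_tuvs` count would be the kind of red flag worth resolving before ingestion.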
Stage 2: Initial Analysis
Ingestion-ready assets are processed through our TMill engine. Our engine will read and check all TUVs for the error types identified above and produce a detailed report. This report will outline per locale:
- the total number of TUVs analyzed
- the number of TUVs in each error category
Note: If a TUV contains errors in multiple categories, we group it under the category of highest severity.
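The grouping rule in the note can be sketched as follows. This is a hypothetical illustration: the category identifiers, the `build_report` function, and the ordering within the major and minor groups are assumptions, since only the major/minor split is defined above.

```python
from collections import Counter

# Assumed ranking, most severe first. Only the major/minor split comes from
# the list of error types; the order inside each group is illustrative.
SEVERITY_ORDER = [
    "locale_mismatch", "incorrect_translation",            # major
    "tag_mismatch", "terminology", "spelling", "grammar",  # minor
    "tone", "cultural",
]
RANK = {category: i for i, category in enumerate(SEVERITY_ORDER)}


def build_report(findings):
    """findings: iterable of (locale, set of error categories) per TUV."""
    report = {}
    for locale, errors in findings:
        entry = report.setdefault(locale, {"total": 0, "by_category": Counter()})
        entry["total"] += 1
        if errors:
            # A TUV with several errors is counted once, under the worst one.
            worst = min(errors, key=RANK.__getitem__)
            entry["by_category"][worst] += 1
    return report


report = build_report([
    ("pt-BR", {"spelling", "locale_mismatch"}),
    ("pt-BR", set()),
    ("pt-BR", {"grammar"}),
    ("ko-KR", {"tag_mismatch", "tone"}),
])
```

Counting each TUV once keeps the per-category numbers additive, so they never exceed the total number of TUVs analyzed for a locale.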
Stage 3: Joint Strategy Definition
Based on the initial analysis, Bureau Works will make clean-up recommendations and take client input into account to arrive at the final desired scenario. For instance, certain error types may be considered too toxic and marked for removal, while others may be considered fixable. It's a matter of making use-case-sensitive decisions.
Stage 4: Fixing
At this stage, Bureau Works TMill will re-ingest all of the segments, stratified by error type, and make a best attempt at fixing the identified issues. For some issue types, such as proofreading, the best attempt is nearly flawless, while others, such as tag fixing, are more subject to file-structure and locale dependencies.
Stage 5: Packaging and Deliverables
TMs will be reassembled according to the knowledge management hierarchy defined by the joint strategy. TMs can be regrouped into a single TM or divided by error kind, author, time frame, and other relevant metadata that can be used to further guide the overall TM management strategy.
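As a rough illustration of dividing a TM by metadata, the sketch below groups translation units by one of the standard TMX `tu` attributes (`creationid` here; `creationdate` or `changeid` would work the same way). The function `group_tus` and the sample data are hypothetical, and the actual packaging depends on the strategy agreed in Stage 3.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample with two authors recorded in the creationid attribute.
SAMPLE_TM = """<tmx version="1.4"><header/><body>
<tu creationid="alice" creationdate="20190304T101500Z">
  <tuv xml:lang="en-US"><seg>Save</seg></tuv></tu>
<tu creationid="bob" creationdate="20230110T083000Z">
  <tuv xml:lang="en-US"><seg>Cancel</seg></tuv></tu>
<tu creationid="alice" creationdate="20230212T120000Z">
  <tuv xml:lang="en-US"><seg>Open</seg></tuv></tu>
</body></tmx>"""


def group_tus(tmx_text: str, attribute: str = "creationid") -> dict:
    """Group <tu> elements by the value of one of their attributes."""
    root = ET.fromstring(tmx_text)
    groups = defaultdict(list)
    for tu in root.iter("tu"):
        groups[tu.get(attribute, "unknown")].append(tu)
    return dict(groups)


by_author = group_tus(SAMPLE_TM)
group_sizes = {author: len(tus) for author, tus in by_author.items()}
```

Each group could then be written out as its own TMX file or kept together with a marker, depending on how the TM is meant to be managed afterwards.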
For a detailed proposal please reach out to help@bureauworks.com