Re: Cv Arabic Docx


Germana Layng

Jul 9, 2024, 3:30:52 AM
to maefersnetsu

The language I am working with is Farsi/Arabic (both are written right to left), so I have a difficult time using python-docx. I can't extract the text in a usable form; it all gets mixed up in the .txt file.

I think the appropriate form needs to be defined first. If you are working on an NLP project, you need the sentences and each word in the sentences. I think the following code can be helpful for extracting text from a docx file. (Python 2.7)
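The code itself is missing from this post, so what follows is not the original. It is only a minimal sketch of one way to do it, written for Python 3 rather than 2.7 and using the standard library instead of python-docx. It assumes the text lives in word/document.xml and only collects w:t text nodes, so tabs, line breaks, headers and footnotes are ignored. The names docx_paragraphs and save_as_utf8 are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the text of each paragraph in logical (typing) order."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join every w:t text node in order.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs


def save_as_utf8(paragraphs, out_path):
    """Write one paragraph per line, explicitly encoded as UTF-8."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as handle:
        for text in paragraphs:
            handle.write(text + "\n")
```

The strings come out in logical order, which is what an NLP pipeline wants. If the .txt file still looks scrambled, the editor displaying it may simply not be applying right-to-left layout; the stored characters can be fine even then.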



Download the file: https://urllio.com/2yZ6xh



I use LibreOffice on Linux and sometimes have to proofread .docx files for a friend. These files are written in German, but they are created on a Windows computer where the default language setting is Arabic. They are also formatted in an intolerable way (along the lines of all bold and 20 pt).

In the document as I receive it, the formatting is garbage, but the text flow is actually correct. When I select all text and click Clear formatting, the text flow changes to right-to-left, and the periods move to the left as well.

First things first: language has nothing to do with your plight. You must look for something called "Text direction". Arabic and Persian are mistakenly called "right-to-left languages", but they are in fact "bi-directional languages".
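To make the "bi-directional" point a bit more concrete, here is a small sketch using Python's standard unicodedata module. Direction is a property of each character, not of the document language: letters are strongly typed, digits are weak, and punctuation such as the period has no strong direction of its own, so it follows the paragraph direction. The helper names are made up for illustration.

```python
import unicodedata


def bidi_classes(text):
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def has_strong_rtl(text):
    """True if the text contains at least one strongly right-to-left letter."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# "L" = strong left-to-right, "AL" = Arabic letter, "EN" = European number,
# "CS" = common separator (no strong direction of its own).
print(bidi_classes("a\u06271."))
print(has_strong_rtl("Hallo Welt."))
```

A German sentence contains no strongly right-to-left characters at all, so the only thing that can push its period to the left is the paragraph direction stored in the style.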

You must edit the "Normal" style of the document and change the text direction from "Right-to-left" to "Left-to-right". This is the style that gets enforced when you click on "Clear Formatting". (In some rare cases, "Normal (web)" gets enforced.) After that, select the whole document, open the Paragraph formatting dialog box and set direction to Left-to-Right.

Now, there is a frightening part too: sometimes I receive documents of unknown origin that, for some reason, cause the Direction radio buttons to be grayed out! In this case, I use 7-Zip to peek into the document and edit styles.xml in Visual Studio Code. Here is the part for Normal style:
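The XML excerpt did not survive in this post, so the following is not the original. It is a minimal sketch of the same manual fix done in Python instead of 7-Zip plus an editor. It assumes the part is word/styles.xml encoded as UTF-8, that the default paragraph style has w:styleId="Normal" (localized documents may use another id), that the namespace prefix is w, and that the right-to-left flag is the w:bidi element inside that style. The function names are made up for illustration. Plain text substitution is used on purpose so that the rest of the XML is left byte-for-byte untouched.

```python
import re
import zipfile

# Matches one <w:style ...> ... </w:style> element (not the <w:styles> root).
STYLE_RE = re.compile(r"<w:style\b[^>]*>.*?</w:style>", re.DOTALL)
# Matches <w:bidi/> as well as <w:bidi w:val="..."/>.
BIDI_RE = re.compile(r"<w:bidi\b[^>]*/>")


def strip_bidi_from_style(styles_xml, style_id="Normal"):
    """Remove the w:bidi flag from the style with the given id (assumed id)."""
    marker = 'w:styleId="%s"' % style_id

    def fix(match):
        block = match.group(0)
        opening_tag = block[: block.index(">") + 1]
        if marker not in opening_tag:
            return block  # some other style: leave it alone
        return BIDI_RE.sub("", block)

    return STYLE_RE.sub(fix, styles_xml)


def rewrite_docx(src_path, dst_path, style_id="Normal"):
    """Copy a .docx, patching only word/styles.xml on the way."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/styles.xml":
                text = data.decode("utf-8")
                data = strip_bidi_from_style(text, style_id).encode("utf-8")
            dst.writestr(item, data)
```

Always write to a new file and keep the original; if the result does not open, nothing is lost.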

I run a Magento-based e-commerce group-buying deals website in the UAE, and my developers have coded it to send PDF coupon vouchers in Arabic to customers who have bought a deal.

The problem is that the characters inside the PDF look very awkward! The letters are disjointed instead of connected, and they are displayed in reversed order, so the text can only be read from left to right. In other words, the Arabic coupon vouchers look extremely messy!

From what I see, this is not related to PDF or Acrobat. Your CMS may not fully support Arabic, or it is not generating the PDF correctly. The developers need to look into this matter, but I apologize for not having any suggestions for them.

If you can't get the people who designed your coupon to fix the problems, you can save the PDF as a Microsoft Word document and edit it in an Arabic version of Word. Word should have proper Arabic fonts and right-to-left text flow. Then you convert to PDF from Word's Acrobat tab. It is not the best answer, but if you cannot go back to the place that designed your coupon, you may have to redo it in a proper editor and export it to PDF.

Even searching in the Chrome PDF viewer works properly, while in Adobe Reader, and also in Internet Explorer and Edge, the text is extracted in REVERSED order. Even highlighting text is a BIG MESS in these products, while in Chrome things go very smoothly.

@Yamani_De you are right, I have the same issue when I convert PDF files to Excel: the Arabic text is displayed in reverse. I think this problem must be solved in the Acrobat software.

Thousands of Arabic Acrobat PDF users are complaining about the SAME EXACT thing: disjointed Arabic letters. Adobe needs to FIX this chronic problem. Stop beating around the bush and start fixing.

I believe this is a compilation issue between the program and the PDF; I have had the same challenge before. On our system we can write Arabic letters with no problem at all, but when we export a report to PDF, the PDF reverses the letters and separates them.

See, I know Acrobat has issues with Arabic in some aspects. However, whether on Mac or PC, we have been generating PDFs with Arabic content for decades now using Microsoft Office (on Mac lately), Adobe Illustrator, InDesign, Photoshop, AutoCAD, and from web pages. These PDFs with Arabic content have been the standard medium in the printing industry, with no issues apart from common problems unrelated to the Arabic language.

The original poster is using a specific content management system and generating PDFs from it. The OP didn't come back to tell us how the PDF is generated from the CMS, but you may tell us how you're doing it; only then can we judge whether it is an Acrobat issue or not.

Even in Microsoft SharePoint, when searchable PDFs with right-to-left scripts (such as Arabic) are indexed, SharePoint only recognizes the reversed text and shows the search preview with weirdly reversed text.

This is surely rooted in the filter that Microsoft bundles with SharePoint. That filter suffers from the same issues seen in Adobe Reader, Internet Explorer and MS Edge, while Google's Chrome does not show such issues and handles right-to-left words properly.

Ziad, for someone who comments a lot in this thread with the title 'expert', you have failed to present a solution to our common problem. Either you don't have the solution, or your corporate interests require that you don't share it with us. Thanks for nothing.

@THinkFirst: We're proud of our language, the richest in the world, living for more than 2000 years.
We Arabs invented the decimal system, which is the basis of the civilized world, while your people were living in caves and spending their time hunting and fighting.

I have the same problem when converting a PDF file containing Arabic words to Excel using Export PDF from within Adobe Acrobat Reader DC, after buying the annual subscription. The resulting Excel file contained the Arabic words in a messy form: the Arabic letters were disjointed and inverted (running from left to right).

I also converted the file to a Word (.docx) file using Export PDF, and the resulting file was quite acceptable: the Arabic words ran in the right direction and the letters were joined.
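One common cause of the "disjointed and inverted" symptom, and it is only an assumption that it applies here, is that the exporter emits the text in visual order using Arabic presentation-form code points (U+FE70 to U+FEFF) instead of logical-order base letters. Under that assumption, a rough clean-up can be sketched with the standard library: flip each line back, keep embedded numbers and Latin words readable, then fold presentation forms to base letters with NFKC. This is a heuristic, not an implementation of the Unicode Bidirectional Algorithm, and the function name is made up for illustration.

```python
import unicodedata

# Characters that keep their left-to-right order inside an Arabic line.
LTR_CLASSES = ("L", "EN", "AN")


def repair_visual_arabic(line):
    """Heuristically turn a visual-order Arabic line into logical order."""
    flipped = list(line[::-1])
    ltr = [unicodedata.bidirectional(ch) in LTR_CLASSES for ch in flipped]
    # A single separator sandwiched between LTR characters ("12.5", "abc 12")
    # is treated as part of that left-to-right run.
    for i in range(1, len(flipped) - 1):
        is_rtl = unicodedata.bidirectional(flipped[i]) in ("R", "AL")
        if not ltr[i] and not is_rtl and ltr[i - 1] and ltr[i + 1]:
            ltr[i] = True
    out, run = [], []
    for ch, keep_ltr in zip(flipped, ltr):
        if keep_ltr:
            run.append(ch)
        else:
            out.extend(reversed(run))  # undo the flip for the LTR run
            run = []
            out.append(ch)
    out.extend(reversed(run))
    # Fold presentation forms (initial, medial, final, isolated) to base
    # letters only after reordering, so ligatures decompose in logical order.
    return unicodedata.normalize("NFKC", "".join(out))
```

Applied to every cell of the exported spreadsheet, this would typically restore lines that are mostly Arabic with an occasional number or Latin word. Mixed-direction lines with brackets or several embedded phrases need a real bidi implementation.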

The purpose of this topic is to demonstrate how the Duxbury Braille Translator can work with documents in many different languages. This DBT Help holds a significant number of language samples in Microsoft Word format. There are actually a number of ways to import these sample files into DBT: with or without using Word, and with or without SWIFT.

SWIFT is an add-in to MS Word designed to facilitate quick operations using a combination of Word and Duxbury DBT. You can obtain the installation file for SWIFT from the Duxbury Systems website or from your DBT installation CD-ROM.

Find your favorite language in the tables below, and click on the name of the .docx file you want to use. This opens a dialog that invites you to Save the file, Open it, or Cancel. Clicking Open launches Word with the sample file.

If you have SWIFT installed, your ribbon interface now shows a Braille tab. Click on the Braille tab to access SWIFT. It performs a number of services for you. First, each of the sample files is internally marked with the name of the most appropriate DBT template (the one most commonly used with that language). SWIFT reads the template name and passes it along to DBT automatically. Second, it provides you with several quick processing options. Depending on your needs, you choose one of three output options (Emboss Direct, Open in DBT or Print Braille):

I wrote a document in Arabic using the Arial font. After I finished, I saved the document with the .docx extension to open it later on another device, but when I opened it in LibreOffice again I noticed that all my written words had turned into ? question marks. What is the solution to this problem, and how can I recover my document? Help, please.

Are you really sure you saved as .docx? It looks more like you somehow saved as .txt. The .txt and .docx (OOXML variant) entries follow one another in the menu, so it is easy to click on the wrong one. You then get a popup dialog asking you to choose the encoding. You can of course select one of the Arabic candidates, but they are legacy encodings.

The attached file probably comes from a Windows machine (line breaks are CR+LF). Apart from a few numbers, it consists exclusively of U+003F QUESTION MARK and some spaces. As @gabix pointed out, it is neither an .odt nor a .docx document. It was probably a plain text file, but it underwent an encoding conversion to ASCII, losing all non-ASCII characters. There is some math inside, but everything is damaged beyond repair.
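To see why nothing can be recovered, here is a small sketch of what a lossy conversion to ASCII does, using only Python's built-in codecs. The function name and the sample string are made up for illustration; the actual export path taken by the office suite is not known.

```python
def simulate_ascii_export(text):
    """Encode to ASCII the lossy way: every non-ASCII character becomes '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


original = "\u0645\u0631\u062d\u0628\u0627 2024"  # an Arabic word, then a number

damaged = simulate_ascii_export(original)
print(damaged)

# UTF-8 and the legacy Windows Arabic code page both round-trip this sample,
# so choosing either of them in the encoding dialog would have kept the text.
assert original.encode("utf-8").decode("utf-8") == original
assert original.encode("cp1256").decode("cp1256") == original
```

Every Arabic letter maps to the same byte, 0x3F, so the file no longer contains any information about which letter was there. Only the digits and spaces survive, which matches what is described above.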

The first few pages of my document (introductory material) should be numbered using roman numerals. Starting with a specific chapter, the remaining pages should be numbered with arabic numerals, starting from one (1) again.

If you are using the book class, start your document with \frontmatter. This changes the page numbering to roman numerals. Then mark the start of the main part with \mainmatter, which switches to arabic numerals and restarts at 1. There are also \appendix (which changes the chapter numbering to uppercase letters) and \backmatter.

\frontmatter and \mainmatter will suffice for different styles of page numbering, but not for roman numbering of introductory tables (which are quite uncommon). Assuming you have only one frontmatter chapter that includes tables, the following should do the trick:

Converting PDFs to other formats, such as Word and text, can help fulfill various requirements. The same is true of PDFs written in Arabic. Not many tools support this language, but such conversion needs can arise at any time. It gets even more difficult when you have a scanned PDF and need to OCR an Arabic PDF to Word.

When dealing with scanned PDFs in Arabic, extracting data from them can seem an uphill task. For that, we will discuss a few free online tools that help you convert scanned Arabic PDFs to Word or text using OCR technology. Moreover, this guide will cover one of the best PDF editors to help you perform OCR on PDFs and edit them on the spot.
