Discussion:
Here's a Word macro to fix wrapped multi-line text when pasting
josh
2009-05-29 03:24:44 UTC
Announcing a humble little tool that can save a lot of time, not to
mention tedium and frustration.

Problem: when copying multiple lines of text from PDF into Word
documents, there’s no easy way to merge multiple lines of text into a
single paragraph in Word. The busy author must hand-edit, line by
line, removing line breaks and adding spaces.

Solution: With this simple macro for Microsoft Word 2007, press
CTRL-SHIFT-V and the clipboard’s text is pasted without CR/LF “wrap”
characters, so you get a single, merged line of text.
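
For readers who want to see the idea without downloading the project, here is a minimal sketch in VBA. It is not the code from the SourceForge project: the names MergeLines and PasteMerged are made up for illustration, and it assumes the VBA project has a reference to the "Microsoft Forms 2.0 Object Library" (which provides DataObject).

```vba
' Hypothetical sketch of "paste without line breaks".
' Assumes a reference to the Microsoft Forms 2.0 Object Library.

' Collapse every line break in s into a single space.
Function MergeLines(ByVal s As String) As String
    s = Replace(s, vbCrLf, " ")     ' Windows line breaks
    s = Replace(s, vbCr, " ")       ' stray carriage returns
    s = Replace(s, vbLf, " ")       ' stray line feeds
    Do While InStr(s, "  ") > 0     ' squeeze runs of spaces left behind
        s = Replace(s, "  ", " ")
    Loop
    MergeLines = Trim$(s)
End Function

' Type the clipboard's plain text at the cursor as one merged line.
Sub PasteMerged()
    Dim clip As New DataObject
    clip.GetFromClipboard
    If clip.GetFormat(1) Then       ' 1 = plain-text clipboard format
        Selection.TypeText MergeLines(clip.GetText(1))
    End If
End Sub
```

A macro like PasteMerged could then be assigned to CTRL-SHIFT-V through Word's keyboard customization. Note that this sketch also squeezes any double spaces that were already in the text.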

You can get it at https://sourceforge.net/projects/wordwrap-macro/

V1 works with Word 2007 on Vista in English, but it’s open-source. I
had very little time to implement this, so feel free to improve its
crude but effective design! (Please update the SourceForge project if
you do, so your work is available to everyone.)

-Josh
Josh Whitkin - Lecturer, Games Art & Design, Murdoch University, Perth
Australia
DeanH
2009-05-29 12:00:59 UTC
Have you seen these?
http://sbarnhill.mvps.org/WordFAQs/CleanWebText.htm
http://www.word.mvps.org/FAQs/Formatting/CleanWebText.htm
http://gregmaxey.mvps.org/Clean_Up_Text.htm

All the best
DeanH
Larry
2009-06-16 09:21:01 UTC
Here is a new macro that seems to work cleanly. It gets around a snag in
Josh's approach by combining part of his technique (manipulating text in the
clipboard) with another technique I had been using for a couple of years,
which had its own awkwardness. The problem of pasting text with hard carriage
returns into Word has long aggravated me, and after looking many times on the
web I have found no good solution -- but I think this does it!

From a selected block of pasted text, after the user makes some minor edits
to mark the desired paragraph breaks, this macro strips out excess CRLFs and
hanging hyphens (which otherwise would become embedded in mid-line). Here is
the code:

-- Larry

Sub X_XS_CRLFs()

' ---------- Copy from below here down to similar line near bottom ---------
'
' X_XS_CRLFs Macro (macro names cannot contain hyphens, hence underscores)
' Macro created 6/15/09.
'
' This is an MS Word macro by Larry Edwards, June 2009.
' ***@gci.net (Box 6484 Sitka, Ak 99835)

' PURPOSE: Remove excess CR-LFs and hanging hyphens from a selected block
' of text that has been pasted into MS Word (as from a PDF file).

' INSTALLATION: (1) In Word, click Tools/Macro/Macros/Create. If prompted
' to replace a macro, click "No." In the Macro Name box, type or paste this:
' X_XS_CRLFs and click the Create button again. (2) You will
' be transferred to a Visual Basic window, with the cursor located where you
' need to paste this macro. Do not change any text you see there. Copy
' everything between the dashed lines near the top and bottom of this file,
' and paste it at that cursor. (3) Click File/Save-Normal, and exit Visual
' Basic. (4) If you wish, you may add a button on your toolbar to activate
' the macro, making it more accessible than described in (D) below.

' INSTRUCTIONS FOR USE:

' (A) Paste the copied PDF text into the Word document. (These
' instructions assume multiple paragraphs going in between existing,
' correctly formatted paragraphs.) Click the paragraph symbol in
' the toolbar to turn on Word's formatting symbols. There likely is one
' symbol at the end of each line of text (meaning Word treats each line as a
' separate paragraph), and those lines are likely far short of the right
' margin. (B) Leave the paragraph symbols intact, but hit the Enter key to
' add a second paragraph symbol everywhere there should be a real paragraph
' break. This is not necessary for the last pasted paragraph that you want
' to correct. (C) With the mouse, select all text from which you want to
' remove excess CRLFs. (D) Run this macro by hitting Alt-F8 (or alternately
' Tools/Macro/Macros), highlighting the X_XS_CRLFs macro,
' and clicking Run. Excess CRLFs and hanging hyphens are removed.

' FINISHING UP: Text will appear best if you have a "normal style" that is
' set to provide whitespace above or below each paragraph. If not, you
' will likely want to manually add a CRLF (Enter key) at the end of each
' paragraph. Also, in removing hyphens at ends of lines, this macro may have
' removed some necessary hyphens. Proofread for this, such as by looking
' for red spell-checker underlines.

' TROUBLESHOOTING: (1) If you forget to insert the paragraph breaks and
' end up with a wad of text, use Alt-Backspace until you get back to your
' original paste. (2) If the macro does not work, you likely need to
' activate the "Microsoft Forms 2.0 Object Library," which can be done
' from within Word's macro editor. (Click Tools/References and check-mark
' the item of that name. If it is not there, use the Browse button to find
' C:\Windows\System32\FM20.dll, add it to the list, and be sure it is
' checked. Click OK.)

' HOW IT WORKS:

' The macro will copy the selected text to the clipboard, clean it up, and
' paste it back in place of the selection. This allows operation on the
' selected text in isolation from any text that will be above or below in
' the actual document, and works effectively once you have put in the real
' paragraph breaks that Adobe Reader, Foxit (or whatever) cannot detect.

' The macro combines strategies developed by (1) PC Magazine and (2) Josh
' Whitkin (https://sourceforge.net/projects/wordwrap-macro/).

' (1) An old article in PCM showed how to replace the various
' permutations of "new line" characters, but the technique operates on an
' entire document, which will corrupt pre-existing text in the document.
' Unfortunately, I was unable to expand the macro to work on blocks of
' selected text, because -- even in a macro -- the "find and replace"
' operation the technique relies on causes the found text to become the
' selection, preventing operations on the pasted text from completing.

' (2) In May 2009, Josh Whitkin posted an incomplete macro on SourceForge
' which proposed doing the CR-LF removals while the text is still within
' the Windows clipboard. Doing such operations in the clipboard before the
' text is actually pasted to the document works to a degree, but suffers
' from not being able to tell where the real paragraph breaks should be.
' The result is a wad of text that has to be untangled manually, although
' at least preexisting text in the document is undisturbed.

' So, (3) ... this macro combines parts of both approaches, but uses some
' minimal prep-work by the user so that the end result will be accurate.
' Pulling the prepped text back into the clipboard allows text manipulation
' while leaving the selected text in the document unaffected and in a state
' ready to be substituted by pasting the converted text. Clean and easy.

' T H E M A C R O :

' ------- VARIABLES --------

Dim q() As Byte ' target (cleaned) char array
Dim h As Long ' index of the current character in z
Dim i As Long ' h + 1
Dim j As Long ' h + 2
Dim k As Long ' h + 3

Dim z() As Byte ' source char array (the copied text)
Dim n As Long ' next write position in q

Dim u As Long ' Shadow for UBound(z) length

Dim x As String ' temp string

Dim NoLastPair As Boolean ' Set once the scan is within three bytes of
' the end of z, so no further CRLF pair can follow
Dim HyphenDeleted As Boolean ' Set when an end-of-line hyphen was skipped

Dim y As DataObject ' Holds the clipboard contents.


' ------- CODE ------

Set y = New DataObject

' CLEAN UP THE TEXT A BIT.

Selection.ParagraphFormat.Alignment = wdAlignParagraphLeft
Selection.Find.ClearFormatting

Selection.Find.ParagraphFormat.Alignment = wdAlignParagraphLeft
Selection.Find.Replacement.ClearFormatting

' COPY TEXT TO THE CLIPBOARD AND GO TO WORK.

Selection.Copy

y.GetFromClipboard
x = y.GetText(1)

z = StrConv(x, vbFromUnicode) ' Convert to an array of bytes.
q = z ' Make a duplicate string as target.

n = 0 ' As index and as true target EOS.

NoLastPair = False
HyphenDeleted = False

For h = 0 To UBound(z)

    u = UBound(z)
    i = h + 1
    j = h + 2
    k = h + 3

    ' The tests use >= rather than =, because the skips below can leave
    ' h at u, which would otherwise let z(i) run past the end of the array.
    If i >= u Then ' Keep i in bounds when h is ...
        GoTo FINAL_STEPS
    End If
    If j >= u Then ' Keep j in bounds when h is ...
        NoLastPair = True
    End If
    If k >= u Then ' Keep k in bounds when h is ...
        NoLastPair = True
    End If

    If z(h) = 12 Then ' ^L (form feed).
        GoTo Next_Char ' Pass over ^L.
    End If ' "Next" increments h.

    If z(h) = 45 And z(i) = 13 Then ' Hyphen + CR (end-of-line hyphen).
        h = h + 1 ' Bypass hyphen, to CR.
        i = i + 1 ' Move 2nd pointer to next char.
        j = j + 1
        k = k + 1
        HyphenDeleted = True
        GoTo TEST_DOUBLE_CRLF
    End If

    If z(h) = 13 And z(i) = 10 Then ' CRLF, possibly followed by a space.
        If z(j) = 32 Then ' If yes, bypass all three.
            h = h + 2 ' "Next" increments h again.
            GoTo Next_Char
        End If
        GoTo TEST_DOUBLE_CRLF
    End If

    If z(h) = 32 And z(i) = 13 Then ' Space + CRLF.
        h = h + 1 ' Bypass the space.
        i = i + 1
        j = j + 1
        k = k + 1
        GoTo TEST_DOUBLE_CRLF
    End If

    GoTo COPY_TEXT_CHAR ' "Next" increments h again.

TEST_DOUBLE_CRLF:
    If NoLastPair Then
        GoTo FINAL_STEPS
    End If
    If h >= u Then
        GoTo FINAL_STEPS
    End If
    If i >= u Then ' Keep i in bounds when h is ...
        GoTo FINAL_STEPS
    End If
    If j >= u Then ' Keep j in bounds when h is ...
        GoTo FINAL_STEPS
    End If
    If k >= u Then ' Keep k in bounds when h is ...
        GoTo FINAL_STEPS
    End If

    If z(h) = 13 And z(i) = 10 Then ' CRLF?
        If z(j) = 13 And z(k) = 10 Then ' Double CRLF?
            q(n) = 13 ' Copy only one CRLF.
            n = n + 1
            q(n) = 10
            n = n + 1 ' Increment target index.
            h = h + 1 ' "Next" increments h again.
            GoTo Next_Char
        End If

        h = h + 1 ' Bypass single CRLF.
        If HyphenDeleted Then
            GoTo Next_Char
        End If
        If Not z(j) = 32 Then ' Add space after CRLF, if none.
            q(n) = 32
            n = n + 1
        End If
        GoTo Next_Char ' "Next" increments h again.
    End If

COPY_TEXT_CHAR:
    q(n) = z(h) ' Copy a text character.
    n = n + 1

Next_Char:
    HyphenDeleted = False
Next ' Access next source character.


' CLEAR OUT THE X STRING, AND MOVE THE CONVERTED STRING THERE.

FINAL_STEPS:

x = ""

' Below, we truncate the target array at n, because it has extra chars at
' the end (it started out the same length as the input string).
' Chr() converts a byte code to its character, so we build the output
' string char by char.

For i = 0 To n - 1 ' n-1 = the end of the converted string.

    x = x & Chr(q(i))

Next

Selection.TypeText (x) ' Paste back into the Word document.

' ---------- Copy down to above here from similar line near top ----------

End Sub
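
The clean-up rules described under "HOW IT WORKS" can also be sketched with VBA's Replace() on a String instead of a byte-by-byte loop. This is only an illustration under stated assumptions, not Larry's implementation, and its rules are similar but not identical to the macro's. UnwrapText is a made-up name, the input is assumed to use CRLF line ends (as the text returned by GetText does in the macro above), and Chr$(1) is assumed never to occur in the text, so it can stand in temporarily for real paragraph breaks.

```vba
' Hypothetical helper, not part of the macro above.
' Rules: a double CRLF marks a real paragraph break and becomes one CRLF;
' a hyphen at a line end is dropped and the word halves are rejoined;
' every other CRLF becomes a single space; form feeds are dropped.
Function UnwrapText(ByVal s As String) As String
    Dim mark As String
    mark = Chr$(1)                           ' assumed absent from the text

    s = Replace(s, vbFormFeed, "")           ' drop ^L
    s = Replace(s, vbCrLf & vbCrLf, mark)    ' protect real paragraph breaks
    s = Replace(s, "-" & vbCrLf, "")         ' rejoin words split by a hyphen
    s = Replace(s, " " & vbCrLf, vbCrLf)     ' avoid doubling an existing space
    s = Replace(s, vbCrLf & " ", vbCrLf)
    s = Replace(s, vbCrLf, " ")              ' join the remaining lines
    UnwrapText = Replace(s, mark, vbCrLf)    ' restore paragraph breaks
End Function

' Example, e.g. from the Immediate window:
'   ? UnwrapText("exam-" & vbCrLf & "ple one" & vbCrLf & "two" & _
'                vbCrLf & vbCrLf & "next")
' gives "example one two" & vbCrLf & "next"
```

As with the macro, a hyphen that legitimately ends a line is removed too, so the result still needs proofreading.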
