Ill-formed documents:

- 132 instances of unescaped "&" in text-elements (fixed)
- doc "investor.bg - 2020-01-04.xml" contains ill-formed line "<p><div</p>" (line 168)
- doc "svobodnaevropa.bg - 2020-01-04.xml" lacks author name for second text
- doc "webcafe.bg - 2020-01-10.xml" lacks at least one author name
- in all of the 10 docs from dnevnik.bg, there is a string of the following form:
  [class*="general-article"] .article-content > p:first-of-type::first-letter { float: none; font-size: 17px; line-height: 1.42em; padding: 0; }
  it can be found by the command grep -e "\[.*\}" *.xml
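
The checks listed above could be automated roughly as follows. This is only a minimal sketch using the Python standard library, not the procedure actually used to produce this list. The function name check_document and the problem labels are hypothetical. The bare-ampersand pattern is an assumption about what counts as "unescaped", and the CSS pattern simply mirrors the grep expression above.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an "&" counts as unescaped when it does not start a
# named entity (&amp;) or a character reference (&#38; / &#x26;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# Mirrors the grep pattern above: "[" followed later by "}" on one line.
CSS_RESIDUE = re.compile(r"\[.*\}")


def check_document(text):
    """Return a list of (line_number, problem) pairs for one XML document."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if BARE_AMP.search(line):
            problems.append((number, "unescaped &"))
        if CSS_RESIDUE.search(line):
            problems.append((number, "CSS residue"))
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        # error.position is a (line, column) tuple; only the first
        # well-formedness error in the document is reported.
        problems.append((error.position[0], "not well-formed"))
    return problems
```

To run it over the corpus, one could loop over pathlib.Path(".").glob("*.xml"), read each file as UTF-8 (an assumption about the encoding), and print the result of check_document for every file that returns a non-empty list. Missing author names are not covered, since that check depends on the element names used in the corpus.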