Working With Line Numbers and Errors Using Bulk Insert

Published: 2019-05-11


In the first part of this series, where we reviewed the basics of bulk insert, we looked at importing entire files, specifying delimiters for rows and columns, and bypassing error messages. Sometimes we’ll want to skip first and ending lines, log errors and bad records for review after inserting data, and work with data types directly without first importing using a varchar and converting to the data type later. In this part, we look at these techniques using T-SQL’s native bulk insert.

Skipping Lines and Working With Data Types

With many imports, the first row of data in the file will specify the columns of the file. We don’t necessarily have to keep the column names when we use bulk insert to insert the data, but we should be careful about inserting data into a specific format if we also insert a first row that doesn’t match that format. As an example, in the below code that creates the table and the below image of the file, we see that the first line of the file has values like SpefzA, SpefzB, SpefzC, SpefzD, which don’t match the table’s data types (except in two cases). We also see that the other values in the file don’t match those header values either – for instance, the 0 or 1 under SpefzC looks like a bit, not a varchar. The file’s first line in this case describes the data in the 2nd row through the end of the file, so when we insert the data from the file, we’ll want to skip the first row.

CREATE TABLE etlImport5(
  VarcharSpefzTen VARCHAR(10),
  IntSpefz INT,
  BitSpefz BIT,
  VarcharSpefzMax VARCHAR(MAX)
)

We want to exclude the first line in our file, as it won’t match the above format of our table.
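Since the file image isn’t reproduced here, below is a sketch of what the file’s contents might look like. Only the header line and the A,1,0,"Range" line are confirmed by the text; the remaining rows are hypothetical placeholders that fit the table’s varchar, int, bit, and varchar(max) columns.

SpefzA,SpefzB,SpefzC,SpefzD
A,1,0,"Range"
B,2,1,"Range"
C,3,0,"Range"
...remaining data rows...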

In the below bulk insert, we specify the file, a comma as the column terminator (called FIELDTERMINATOR), and a new line character as the row terminator. We also add another option – FIRSTROW, which we specify as 2 (the default is the start of the file – row 1). This will skip the first line that we see in the above image and start where the A,1,0,"Range" line is. After our insert, we check our table with a select and see the data without the first row from the file.

BULK INSERT etlImport5
FROM 'C:\ETL\Files\Read\Bulk\daily_20180101.txt'
WITH (
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n',
  FIRSTROW = 2
)

SELECT *
FROM etlImport5

Our bulk insert skipped the first line, which only specified the columns in the file.

In some cases, a file’s first row simply specifies what we’ll find in the file, and we can skip it by starting on row 2 (or if there are several rows to skip, we can start on row 3, 4, and so on). In rarer cases, we may come across files that have an ending line that doesn’t match the format at all. Sometimes these have a sentence with a statement about the end of the file, or in some cases, they may have a line that says something to the effect of “File created at 2018-01-01 at 12 AM.” Since these don’t match the file format at all, they would normally throw an error, but just as we can specify the first row of a file, we can specify the last row of a file.

We add two lines – one that matches the format, and a last line that is a sentence.
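As a hypothetical illustration (the exact values aren’t shown in the text), the end of the updated file might look like the following – a valid data row on line 7, followed by a free-text sentence on line 8:

G,6,1,"Range"
File created at 2018-01-01 at 12 AM.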

In our file, we’ll add two lines – one of the lines matches our current format, with three commas separating data values that match our created table. The last line is a sentence with no delimiters that doesn’t match our format. In the below code, we’ll empty our table by truncating it, then try to bulk insert our updated file without specifying an ending row, and see the error. After that, we’ll bulk insert the file again, but this time specifying that row 7 is the last row of data we want to insert (the bad row is on row 8).

TRUNCATE TABLE etlImport5

--- Fails on last line
BULK INSERT etlImport5
FROM 'C:\ETL\Files\Read\Bulk\daily_20180101.txt'
WITH (
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n',
  FIRSTROW = 2
)

--- Passes
BULK INSERT etlImport5
FROM 'C:\ETL\Files\Read\Bulk\daily_20180101.txt'
WITH (
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n',
  FIRSTROW = 2,
  LASTROW = 7
)

SELECT *
FROM etlImport5

Our first bulk insert fails because the ending line doesn’t match our format.

When we specify a last row of 7 – the last line that matches our file’s format – the bulk insert passes.

While we won’t often see ending lines like the one above, it’s useful to know that bulk insert can insert a specified range of rows from a file, which makes leaving out an ending line easy.
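As a minimal sketch of this range behavior (the row numbers here are arbitrary, not from the example above), we could load only rows 4 through 6 of the file by combining FIRSTROW and LASTROW:

--- Load only a middle slice of the file (rows 4-6)
BULK INSERT etlImport5
FROM 'C:\ETL\Files\Read\Bulk\daily_20180101.txt'
WITH (
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n',
  FIRSTROW = 4,
  LASTROW = 6
)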

Logging Errors With Bulk Insert

As we saw in the first part of this series, sometimes we will have data that don’t match our table’s format even when the correct number of delimiters is present, and we can specify a maximum number of errors to allow the insert to continue even if some lines don’t get added to our table. This might be easy to review when it’s one or two lines, but what if we have a large file and we want to see a log of all the errors? In our example file, we’ll add some bad data alongside good data and test logging these erroneous rows.

We add three lines of bad data that we’ll log for review.
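The bad lines themselves aren’t listed here, but based on the error details discussed below (a bit column receiving the value 2, on rows 8, 11 and 13), they might look something like these hypothetical rows – each has the correct number of delimiters but fails the bit conversion:

X,8,2,"Range"
Y,11,2,"Range"
Z,13,2,"Range"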

TRUNCATE TABLE etlImport5

BULK INSERT etlImport5
FROM 'C:\ETL\Files\Read\Bulk\daily_20180101.txt'
WITH (
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n',
  FIRSTROW = 2,
  MAXERRORS = 100,
  ERRORFILE = 'C:\ETL\Files\Read\Bulk\daily_20180101_log'
)

SELECT *
FROM etlImport5

We see the three error messages from the three bad rows of data.

We still have the correct data inserted even with the errors.

Because we’ve allowed up to 100 errors and we only experienced 3, bulk insert continued to insert data, and we see the 9 rows of good data even with the error message about the 3 bad records. We also specified an error file, and when we look at our file location, we see two additional files – daily_20180101_log and daily_20180101_log.Error. The below images show these files in the directory and their contents (I opened these files in Notepad).

Two error files appear after the bulk insert passes with three errors.

We see the error records, with the latter one appearing due to end-of-file reasons.

We see details for the three error row numbers encountered during the bulk insert.

The general log file shows us the erroneous rows, with the latter erroneous row appearing due to end-of-file reasons (if you add a row with the data J,10,0,"Range" and retest, you’ll only get three error rows, because the last row of data is then legitimate). The .Error file specifies the line numbers of the erroneous rows of data – in this example rows 8, 11 and 13 – and gives us the type of problem with these rows as they relate to the table: HRESULT 0x80020005, which indicates a type mismatch. When we review the file and table, we see that the expected type was a bit, but these values are a 2, and a bit can only be a 1 or 0. From a debugging perspective, this can save us a lot of time, as our data context may allow us to bypass some erroneous rows, which we can quickly review along with seeing where they occurred in the file. It is worth noting that in order to use the error file, the user executing this will need write access to the file location (or another file location).
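One practical note from experience, not covered above: bulk insert won’t overwrite existing error files, so a re-run fails if the files from a previous run are still present. A hedged cleanup sketch, assuming xp_cmdshell is enabled in your environment (it’s disabled by default and carries security implications), with file names matching the ERRORFILE parameter used above:

--- Delete prior error files before re-running the bulk insert
EXEC master..xp_cmdshell 'DEL "C:\ETL\Files\Read\Bulk\daily_20180101_log"'
EXEC master..xp_cmdshell 'DEL "C:\ETL\Files\Read\Bulk\daily_20180101_log.Error"'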

More important than understanding how to allow and log errors is recognizing when they are appropriate for our data context. As an example, if we’re testing a data set for validity, usage, or other purposes, we want the data even if it has some errors, since we’re in the process of testing it. This is also true for other business use cases where we may accept some bad data if most of the data can be validated (for instance, manual data entry is error-prone, so with data like this, we will expect a percentage to be erroneous). On the other hand, there may be contexts where one or two erroneous rows invalidate a data set; in these situations, we can set the maximum errors to zero so that any bad row fails the insert, and avoid logging errors altogether.
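As a minimal sketch of that stricter approach, setting MAXERRORS to 0 makes any single bad row abort the entire load, with no error file needed:

--- Any row that fails conversion aborts the whole insert
BULK INSERT etlImport5
FROM 'C:\ETL\Files\Read\Bulk\daily_20180101.txt'
WITH (
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n',
  FIRSTROW = 2,
  MAXERRORS = 0
)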

Summary

We’ve reviewed bulk insert’s first and last row features, which can be helpful when we have delimited files with headers that map out the data but that we don’t want to import because they’re incompatible with our data structure. We’ve also seen how we can use the error file parameter along with maximum errors to quickly review what failed during a bulk insert, as well as where those failures exist in our file. Finally, we reviewed some thoughts on the contexts where these options fit best.

Table of contents

Working With Line Numbers and Errors Using Bulk Insert

Translated from: Working With Line Numbers and Errors Using Bulk Insert

Repost address: http://ouiwd.baihongyu.com/
