Details
-
Type: Bug
-
Status: Closed
-
Priority: Major
-
Resolution: Not a bug
-
Affects Version/s: 1.3.2
-
Fix Version/s: None
-
Component/s: Engine
-
Labels:None
-
Environment:QuickFIX 1.3.2
Description
It seems that if we should receive a sequence reset while QuickFIX is processing queued messages, those queued messages may not be processed correctly. Is this the case?
Here are log file extracts demonstrating this case. This shows a FIX-level failover between ourselves and a FIX 4.1 counterparty; we have broken the connection while receiving messages, failed over, and restored state from the message store.
Mon Dec 01 14:15:01 JST 2008 MsgSeqNum too high, expecting 1147 but received 1285
Mon Dec 01 14:15:01 JST 2008 Sent ResendRequest FROM: 1147 TO: 999999
[Here, we notice that we have just received message 1285, but we wanted 1147, because that was the extent of our processing. We ask for a resend.]
Mon Dec 01 14:15:01 JST 2008 Received ResendRequest FROM: 2 TO: 999999
[Our counterparty doesn't keep track of our messages after a failover and asks us to send everything again. We do it.]
Mon Dec 01 14:15:05 JST 2008 MsgSeqNum too high, expecting 1286 but received 1288
Mon Dec 01 14:15:05 JST 2008 Sent ResendRequest FROM: 1286 TO: 999999
Mon Dec 01 14:15:05 JST 2008 MsgSeqNum too high, expecting 1286 but received 1289
Mon Dec 01 14:15:05 JST 2008 Already sent ResendRequest FROM: 1286 TO: 1287. Not sending another.
Mon Dec 01 14:15:05 JST 2008 MsgSeqNum too high, expecting 1286 but received 1290
Mon Dec 01 14:15:05 JST 2008 Already sent ResendRequest FROM: 1286 TO: 1287. Not sending another.
[They keep sending us new messages. We queue them up].
Mon Dec 01 14:16:32 JST 2008 Received SequenceReset FROM: 1286 TO: 1744
[They send us a sequence reset, the FIX sequence number of which is 1743].
Mon Dec 01 14:16:32 JST 2008 Processing queued message: 1744, pending: [1288, 1289, 1290, ..., 1999]
Mon Dec 01 14:16:32 JST 2008 Processing queued message: 1745, pending: [1288, 1289, 1290, ..., 1999]
Mon Dec 01 14:16:32 JST 2008 Processing queued message: 1746, pending: [1288, 1289, 1290, ..., 1999]
[We start processing the queued messages. But the strange thing is that we start from 1744, and not 1288.
Mon Dec 01 14:16:35 JST 2008 Processing queued message: 1999, pending: [1288, 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314, 1315, 1316, 1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1382, 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418, 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457, 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1472, 1473, 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483, 1484, 1485, 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496, 1497, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509, 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522, 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535, 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1547, 1548, 1549, 1550, 1551, 1552, 1553, 1554, 1555, 1556, 1557, 1558, 1559, 1560, 1561, 1562, 1563, 1564, 1565, 1566, 1567, 1568, 1569, 1570, 1571, 1572, 1573, 1574, 1575, 1576, 1577, 1578, 1579, 1580, 1581, 1582, 1583, 1584, 1585, 1586, 1587, 1588, 1589, 1590, 1591, 1592, 1593, 1594, 1595, 1596, 1597, 1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1606, 1607, 1608, 1609, 1610, 1611, 1612, 1613, 1614, 1615, 1616, 1617, 1618, 1619, 1620, 1621, 1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629, 1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638, 1639, 1640, 1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651, 1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662, 1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673, 1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684, 1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708, 1709, 1710, 1711, 1712, 1713, 1714, 1715, 1716, 1717, 1718, 1719, 1720, 1721, 1722, 1723, 1724, 1725, 1726, 1727, 1728, 1729, 1730, 1731, 1732, 1733, 1734, 1735, 1736, 1737, 1738, 1739, 1740, 1741, 1742, 1743]
[We get to the end, and are missing all of the messages in the brackets].
Is this the appropriate behaviour? Should QuickFIX burn off messages in the queue if it receives a sequence number reset? In this case it would have been better for us if we had ignored it, since we would not have lost messages from the counterparty. We are investigating why the counterparty sent us this reset message, but since we cannot ask them to change their system, we might have to find some way of working around it.
What I suspect is happening is that the counterparty doesn't support resend requests at all and is just resetting our sequence number - but since this is the first time I've seen this behaviour, I want to make sure that the queue logic is correct in this case. It just seems strange to process it this way - normally, the messages that I have seen are 'processing message 2, remaining [3, 4, 5]': ie: the item being processed always has a sequence number that is lower than the other items in the queue.